From eb74f85a1f384c037972014c56ec9da9c2f02bb9 Mon Sep 17 00:00:00 2001 From: kuyo chang Date: Sun, 15 Jun 2025 21:10:56 +0800 Subject: [PATCH] sched/deadline: Fix RT task potential starvation when expiry time passed [Symptom] The fair server mechanism, which is intended to prevent fair starvation when higher-priority tasks monopolize the CPU. Specifically, RT tasks on the runqueue may not be scheduled as expected. [Analysis] --------- The log "sched: DL replenish lagged too much" triggered. By memory dump of dl_server: -------------- curr = 0xFFFFFF80D6A0AC00 ( dl_server = 0xFFFFFF83CD5B1470( dl_runtime = 0x02FAF080, dl_deadline = 0x3B9ACA00, dl_period = 0x3B9ACA00, dl_bw = 0xCCCC, dl_density = 0xCCCC, runtime = 0x02FAF080, deadline = 0x0000082031EB0E80, flags = 0x0, dl_throttled = 0x0, dl_yielded = 0x0, dl_non_contending = 0x0, dl_overrun = 0x0, dl_server = 0x1, dl_server_active = 0x1, dl_defer = 0x1, dl_defer_armed = 0x0, dl_defer_running = 0x1, dl_timer = ( node = ( expires = 0x000008199756E700), _softexpires = 0x000008199756E700, function = 0xFFFFFFDB9AF44D30 = dl_task_timer, base = 0xFFFFFF83CD5A12C0, state = 0x0, is_rel = 0x0, is_soft = 0x0, clock_update_flags = 0x4, clock = 0x000008204A496900, - The timer expiration time (rq->curr->dl_server->dl_timer->expires) is already in the past, indicating the timer has expired. - The timer state (rq->curr->dl_server->dl_timer->state) is 0. [Suspected Root Cause] -------------------- The relevant code flow in the throttle path of update_curr_dl_se() as follows: dequeue_dl_entity(dl_se, 0); // the DL entity is dequeued if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) { if (dl_server(dl_se)) // timer registration fails enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH);//enqueue immediately ... } The failure of `start_dl_timer` is caused by attempting to register a timer with an expiration time that is already in the past. When this situation persists, the code repeatedly re-enqueues the DL entity without properly replenishing or restarting the timer, resulting in RT task may not be scheduled as expected. [Proposed Solution]: ------------------ Instead of immediately re-enqueuing the DL entity on timer registration failure, this change ensures the DL entity is properly replenished and the timer is restarted, preventing RT potential starvation. Signed-off-by: kuyo chang Closes: https://lore.kernel.org/CAMuHMdXn4z1pioTtBGMfQM0jsLviqS2jwysaWXpoLxWYoGa82w@mail.gmail.com Tested-by: Geert Uytterhoeven --- kernel/sched/deadline.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c index be1b917dc8ce..9f60a71c7562 100644 --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1537,10 +1537,12 @@ throttle: } if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(dl_se))) { - if (dl_server(dl_se)) - enqueue_dl_entity(dl_se, ENQUEUE_REPLENISH); - else + if (dl_server(dl_se)) { + replenish_dl_new_period(dl_se, rq); + start_dl_timer(dl_se); + } else { enqueue_task_dl(rq, dl_task_of(dl_se), ENQUEUE_REPLENISH); + } } if (!is_leftmost(dl_se, &rq->dl))