From c5ec32dfd97fd2929183fce5c7d172a500db91ce Mon Sep 17 00:00:00 2001 From: John Stultz Date: Fri, 23 Feb 2024 14:28:19 -0800 Subject: [PATCH] FROMLIST: sched: Fix psi_dequeue for Proxy Execution Currently, if the sleep flag is set, psi_dequeue() doesn't change any of the psi_flags. This is because psi_switch_task() will clear TSK_ONCPU as well as other potential flags (TSK_RUNNING), and the assumption is that a voluntary sleep always consists of a task being dequeued followed shortly there after with a psi_sched_switch() call. Proxy Execution changes this expectation, as mutex-blocked tasks that would normally sleep stay on the runqueue. In the case where the mutex owning task goes to sleep, we will then deactivate the blocked task as well. However, in that case, the mutex-blocked task will have had its TSK_ONCPU cleared when it was switched off the cpu, but it will stay TSK_RUNNING. Then when we later dequeue it becaues of a sleeping-owner, as it is sleeping psi_dequeue() won't change any state (leaving it TSK_RUNNING), as it incorrectly expects a psi_task_switch() call to immediately follow. Later on when it get re enqueued, and psi_flags are set for TSK_RUNNING, we hit an error as the task is already TSK_RUNNING: psi: inconsistent task state! To resolve this, extend the logic in psi_dequeue() so that if the sleep flag is set, we also check if psi_flags have TSK_ONCPU set (meaning the psi_task_switch is imminient) before we do the shortcut return. If TSK_ONCPU is not set, that means we've already switched away, and this psi_dequeue call needs to clear the flags. Cc: Joel Fernandes Cc: Qais Yousef Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Valentin Schneider Cc: Steven Rostedt Cc: Ben Segall Cc: Zimuzo Ezeozue Cc: Mel Gorman Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: "Paul E. McKenney" Cc: Metin Kaya Cc: Xuewen Yan Cc: K Prateek Nayak Cc: Thomas Gleixner Cc: Daniel Lezcano Cc: kernel-team@android.com Change-Id: I13a56956a3af5c9af1456b13739e7abd93e52db8 Signed-off-by: John Stultz Link: https://lore.kernel.org/lkml/20241125195204.2374458-5-jstultz@google.com/ Bug: 306081722 --- v13: * Reworked for collision --- kernel/sched/stats.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h index 8ee0add5a48a..c313fe76a772 100644 --- a/kernel/sched/stats.h +++ b/kernel/sched/stats.h @@ -176,8 +176,12 @@ static inline void psi_dequeue(struct task_struct *p, int flags) * avoid walking all ancestors twice, psi_task_switch() handles * TSK_RUNNING and TSK_IOWAIT for us when it moves TSK_ONCPU. * Do nothing here. + * In the SCHED_PROXY_EXECUTION case we may do sleeping + * dequeues that are not followed by a task switch, so check + * TSK_ONCPU is set to ensure the task switch is imminent. + * Otherwise clear the flags as usual. */ - if (flags & DEQUEUE_SLEEP) + if ((flags & DEQUEUE_SLEEP) && (p->psi_flags & TSK_ONCPU)) return; /*