ANDROID: Update proxy-exec logic from v14 to v18
This updates the proxy-exec logic in android16-6.12 which was added at v14, to be synced with the v18 series of the patchset. v14 series: https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v14-6.12 v18 series: https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v18-6.12 Changes since v14: * Improved naming consistency and using the guard macro where appropriate * Improved comments * Build fixes for !CONFIG_SMP * Fixes for when sched_proxy_exec() is disabled * Renamed update_curr_se to update_se_times, as suggested by Steven Rostedt. * Use put_prev_set_next_task as suggested by K Prateek Nayak * Try to rework find_proxy_task() locking to use guard and proxy_deactivate_task() in the way Peter suggested. * Simplified changes to enqueue_task_rt to match deadline's logic, as pointed out by Peter * Get rid of preserve_need_resched flag and rework per Peter's suggestion * Rework find_proxy_task() to use guard to cleanup the exit gotos as Peter suggested. * Reworked the forced return-migration from find_proxy_task to use Peter’s dequeue+wakeup approach, which helps resolve the cpuhotplug issues I had also seen, caused by the manual return migration sending tasks to offline cpus. * A number of improvements to the commit messages and comments suggested by Juri Lelli and Peter Zijlstra * Added missing logic to put_prev_task_dl as pointed out by K Prateek Nayak * Add lockdep_assert_held_once and drop the READ_ONCE in __get_task_blocked_on(), as suggested by Juri Lelli * Moved update_curr_task logic into update_curr_se to simplify things * Renamed update_se_times to update_se, as suggested by Peter * Reworked logic to fix an issue Peter pointed out with thread group accounting being done on the donor, rather than the running execution context. * Fixed typos caught by Metin Kaya * Suleiman Souhlal noticed an inefficiency in that we evaluate if the lock owner’s task_cpu() is the current cpu, before we look to see if the lock owner is on_rq at all. 
With v17 this would result in us proxy-migrating a donor to a remote cpu, only to then realize the task wasn’t even on the runqueue, and doing the sleeping owner enqueuing. Suleiman suggested instead that we evaluate on_rq first, so we can immediately do sleeping owner enqueueing. Then only if the owner is on a runqueue do we proxy-migrate the donor (which requires the more costly lock juggling). While not a huge logical change, it did uncover other problems, which needed to be resolved. * One issue found was a race where the do_activate_blocked_waiter() call from the sleeping owner wakeup was delayed and the task had already been woken up elsewhere. It’s possible if that task was running and called into schedule() to be blocked, it would be dequeued from the runqueue, but before we switched to the new task, do_activate_blocked_waiter() might try to activate it on a different cpu. Clearly the do_activate_blocked_waiter() needed to check the task on_cpu value as well. * I found that we still can hit wakeups that end up skipping the BO_WAKING -> BO_RUNNABLE transition (causing find_proxy_task() to end up spinning waiting for that transition), so I re-added the logic to handle doing return migrations from find_proxy_task() if we hit that case. * Hupu suggested a tweak in ttwu_runnable() to evaluate proxy_needs_return() slightly earlier. * Kuyo Chang reported and isolated a fix for a problem with __task_is_pushable() in the !sched_proxy_exec case, which was folded into the “sched: Fix rt/dl load balancing via chain level balance” patch * Reworked some of the logic around releasing the rq->donor reference on migrations, using rq->idle directly. * Suleiman also pointed out that some added task_struct elements were not being initialized in the init_task code path, so that was good to fix. Bug: 427820735 Change-Id: I20ce778e474124a917dbf51378dc1301535ac858 Signed-off-by: John Stultz <jstultz@google.com>
This commit is contained in:
@@ -1261,8 +1261,8 @@ struct task_struct {
|
||||
enum blocked_on_state blocked_on_state;
|
||||
struct mutex *blocked_on; /* lock we're blocked on */
|
||||
struct task_struct *blocked_donor; /* task that is boosting this task */
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
struct list_head migration_node;
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
struct list_head blocked_head; /* tasks blocked on this task */
|
||||
struct list_head blocked_node; /* our entry on someone elses blocked_head */
|
||||
/* Node for list of tasks to process blocked_head list for blocked entitiy activations */
|
||||
@@ -2198,6 +2198,18 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
|
||||
__cond_resched_rwlock_write(lock); \
|
||||
})
|
||||
|
||||
static inline void __force_blocked_on_runnable(struct task_struct *p)
|
||||
{
|
||||
lockdep_assert_held(&p->blocked_lock);
|
||||
p->blocked_on_state = BO_RUNNABLE;
|
||||
}
|
||||
|
||||
static inline void force_blocked_on_runnable(struct task_struct *p)
|
||||
{
|
||||
guard(raw_spinlock_irqsave)(&p->blocked_lock);
|
||||
__force_blocked_on_runnable(p);
|
||||
}
|
||||
|
||||
static inline void __set_blocked_on_runnable(struct task_struct *p)
|
||||
{
|
||||
lockdep_assert_held(&p->blocked_lock);
|
||||
@@ -2208,17 +2220,14 @@ static inline void __set_blocked_on_runnable(struct task_struct *p)
|
||||
|
||||
static inline void set_blocked_on_runnable(struct task_struct *p)
|
||||
{
|
||||
unsigned long flags;
|
||||
|
||||
if (!sched_proxy_exec())
|
||||
return;
|
||||
|
||||
raw_spin_lock_irqsave(&p->blocked_lock, flags);
|
||||
guard(raw_spinlock_irqsave)(&p->blocked_lock);
|
||||
__set_blocked_on_runnable(p);
|
||||
raw_spin_unlock_irqrestore(&p->blocked_lock, flags);
|
||||
}
|
||||
|
||||
static inline void set_blocked_on_waking(struct task_struct *p)
|
||||
static inline void __set_blocked_on_waking(struct task_struct *p)
|
||||
{
|
||||
lockdep_assert_held(&p->blocked_lock);
|
||||
|
||||
@@ -2226,25 +2235,37 @@ static inline void set_blocked_on_waking(struct task_struct *p)
|
||||
p->blocked_on_state = BO_WAKING;
|
||||
}
|
||||
|
||||
static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
|
||||
static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
|
||||
{
|
||||
lockdep_assert_held(&p->blocked_lock);
|
||||
|
||||
WARN_ON_ONCE(!m);
|
||||
/* The task should only be setting itself as blocked */
|
||||
WARN_ON_ONCE(p != current);
|
||||
/* Currently we serialize blocked_on under the task::blocked_lock */
|
||||
lockdep_assert_held_once(&p->blocked_lock);
|
||||
/*
|
||||
* Check we are clearing values to NULL or setting NULL
|
||||
* to values to ensure we don't overwrite existing mutex
|
||||
* values or clear already cleared values
|
||||
* Check ensure we don't overwrite existing mutex value
|
||||
* with a different mutex.
|
||||
*/
|
||||
WARN_ON((!m && !p->blocked_on) || (m && p->blocked_on));
|
||||
|
||||
WARN_ON_ONCE(p->blocked_on);
|
||||
p->blocked_on = m;
|
||||
p->blocked_on_state = m ? BO_BLOCKED : BO_RUNNABLE;
|
||||
p->blocked_on_state = BO_BLOCKED;
|
||||
}
|
||||
|
||||
static inline struct mutex *get_task_blocked_on(struct task_struct *p)
|
||||
static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
|
||||
{
|
||||
lockdep_assert_held(&p->blocked_lock);
|
||||
/* The task should only be clearing itself */
|
||||
WARN_ON_ONCE(p != current);
|
||||
/* Currently we serialize blocked_on under the task::blocked_lock */
|
||||
lockdep_assert_held_once(&p->blocked_lock);
|
||||
/* Make sure we are clearing the relationship with the right lock */
|
||||
WARN_ON_ONCE(p->blocked_on != m);
|
||||
p->blocked_on = NULL;
|
||||
p->blocked_on_state = BO_RUNNABLE;
|
||||
}
|
||||
|
||||
static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
|
||||
{
|
||||
lockdep_assert_held_once(&p->blocked_lock);
|
||||
return p->blocked_on;
|
||||
}
|
||||
|
||||
|
||||
@@ -172,6 +172,15 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
|
||||
#ifdef CONFIG_CPUSETS
|
||||
.mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
|
||||
&init_task.alloc_lock),
|
||||
#endif
|
||||
.blocked_on_state = BO_RUNNABLE,
|
||||
.blocked_donor = NULL,
|
||||
.migration_node = LIST_HEAD_INIT(init_task.migration_node),
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
.blocked_head = LIST_HEAD_INIT(init_task.blocked_head),
|
||||
.blocked_node = LIST_HEAD_INIT(init_task.blocked_node),
|
||||
.blocked_activation_node = LIST_HEAD_INIT(init_task.blocked_activation_node),
|
||||
.sleeping_owner = NULL,
|
||||
#endif
|
||||
#ifdef CONFIG_RT_MUTEXES
|
||||
.pi_waiters = RB_ROOT_CACHED,
|
||||
|
||||
@@ -2355,8 +2355,8 @@ __latent_entropy struct task_struct *copy_process(
|
||||
p->blocked_on_state = BO_RUNNABLE;
|
||||
p->blocked_on = NULL; /* not blocked yet */
|
||||
p->blocked_donor = NULL; /* nobody is boosting p yet */
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
INIT_LIST_HEAD(&p->migration_node);
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
INIT_LIST_HEAD(&p->blocked_head);
|
||||
INIT_LIST_HEAD(&p->blocked_node);
|
||||
INIT_LIST_HEAD(&p->blocked_activation_node);
|
||||
|
||||
@@ -54,13 +54,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
|
||||
lockdep_assert_held(&lock->wait_lock);
|
||||
|
||||
/* Current thread can't be already blocked (since it's executing!) */
|
||||
DEBUG_LOCKS_WARN_ON(get_task_blocked_on(task));
|
||||
DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task));
|
||||
}
|
||||
|
||||
void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
|
||||
struct task_struct *task)
|
||||
{
|
||||
struct mutex *blocked_on = get_task_blocked_on(task);
|
||||
struct mutex *blocked_on = __get_task_blocked_on(task);
|
||||
|
||||
DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
|
||||
DEBUG_LOCKS_WARN_ON(waiter->task != task);
|
||||
|
||||
@@ -652,7 +652,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
|
||||
}
|
||||
|
||||
trace_android_vh_mutex_wait_start(lock);
|
||||
set_task_blocked_on(current, lock);
|
||||
__set_task_blocked_on(current, lock);
|
||||
set_current_state(state);
|
||||
trace_contention_begin(lock, LCB_F_MUTEX);
|
||||
for (;;) {
|
||||
@@ -713,8 +713,10 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
|
||||
bool opt_acquired;
|
||||
|
||||
/*
|
||||
* mutex_optimistic_spin() can schedule, so we need to
|
||||
* release these locks before calling it.
|
||||
* mutex_optimistic_spin() can call schedule(), so
|
||||
* we need to release these locks before calling it,
|
||||
* and clear blocked on so we don't become unselectable
|
||||
* to run.
|
||||
*/
|
||||
current->blocked_on_state = BO_RUNNABLE;
|
||||
raw_spin_unlock(¤t->blocked_lock);
|
||||
@@ -729,7 +731,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
|
||||
trace_contention_begin(lock, LCB_F_MUTEX);
|
||||
}
|
||||
}
|
||||
set_task_blocked_on(current, NULL);
|
||||
__clear_task_blocked_on(current, lock);
|
||||
__set_current_state(TASK_RUNNING);
|
||||
trace_android_vh_mutex_wait_finish(lock);
|
||||
|
||||
@@ -763,12 +765,12 @@ skip_wait:
|
||||
return 0;
|
||||
|
||||
err:
|
||||
set_task_blocked_on(current, NULL);
|
||||
__clear_task_blocked_on(current, lock);
|
||||
__set_current_state(TASK_RUNNING);
|
||||
trace_android_vh_mutex_wait_finish(lock);
|
||||
__mutex_remove_waiter(lock, &waiter);
|
||||
err_early_kill:
|
||||
WARN_ON(get_task_blocked_on(current));
|
||||
WARN_ON(__get_task_blocked_on(current));
|
||||
trace_contention_end(lock, ret);
|
||||
raw_spin_unlock(¤t->blocked_lock);
|
||||
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
|
||||
@@ -990,10 +992,10 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
|
||||
struct mutex *next_lock;
|
||||
|
||||
raw_spin_lock_nested(&donor->blocked_lock, SINGLE_DEPTH_NESTING);
|
||||
next_lock = get_task_blocked_on(donor);
|
||||
next_lock = __get_task_blocked_on(donor);
|
||||
if (next_lock == lock) {
|
||||
next = donor;
|
||||
set_blocked_on_waking(donor);
|
||||
__set_blocked_on_waking(donor);
|
||||
wake_q_add(&wake_q, donor);
|
||||
current->blocked_donor = NULL;
|
||||
}
|
||||
@@ -1014,10 +1016,10 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
|
||||
|
||||
raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING);
|
||||
debug_mutex_wake_waiter(lock, waiter);
|
||||
WARN_ON(get_task_blocked_on(next) != lock);
|
||||
set_blocked_on_waking(next);
|
||||
wake_q_add(&wake_q, next);
|
||||
WARN_ON_ONCE(__get_task_blocked_on(next) != lock);
|
||||
__set_blocked_on_waking(next);
|
||||
raw_spin_unlock(&next->blocked_lock);
|
||||
wake_q_add(&wake_q, next);
|
||||
}
|
||||
|
||||
if (owner & MUTEX_FLAG_HANDOFF)
|
||||
|
||||
@@ -281,21 +281,20 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
|
||||
return false;
|
||||
|
||||
if (waiter->ww_ctx->acquired > 0 && __ww_ctx_less(waiter->ww_ctx, ww_ctx)) {
|
||||
/* nested as we should hold current->blocked_lock already */
|
||||
raw_spin_lock_nested(&waiter->task->blocked_lock, SINGLE_DEPTH_NESTING);
|
||||
#ifndef WW_RT
|
||||
debug_mutex_wake_waiter(lock, waiter);
|
||||
#endif
|
||||
/* nested as we should hold current->blocked_lock already */
|
||||
raw_spin_lock_nested(&waiter->task->blocked_lock, SINGLE_DEPTH_NESTING);
|
||||
/*
|
||||
* When waking up the task to die, be sure to set the
|
||||
* blocked_on_state to WAKING. Otherwise we can see
|
||||
* circular blocked_on relationships that can't
|
||||
* resolve.
|
||||
* blocked_on_state to BO_WAKING. Otherwise we can see
|
||||
* circular blocked_on relationships that can't resolve.
|
||||
*/
|
||||
WARN_ON(get_task_blocked_on(waiter->task) != lock);
|
||||
#endif
|
||||
set_blocked_on_waking(waiter->task);
|
||||
wake_q_add(wake_q, waiter->task);
|
||||
WARN_ON_ONCE(__get_task_blocked_on(waiter->task) != lock);
|
||||
__set_blocked_on_waking(waiter->task);
|
||||
raw_spin_unlock(&waiter->task->blocked_lock);
|
||||
wake_q_add(wake_q, waiter->task);
|
||||
}
|
||||
|
||||
return true;
|
||||
@@ -347,12 +346,12 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
|
||||
raw_spin_lock_nested(&owner->blocked_lock, SINGLE_DEPTH_NESTING);
|
||||
/*
|
||||
* When waking up the task to wound, be sure to set the
|
||||
* blocked_on_state flag. Otherwise we can see circular
|
||||
* blocked_on relationships that can't resolve.
|
||||
* blocked_on_state to BO_WAKING. Otherwise we can see
|
||||
* circular blocked_on relationships that can't resolve.
|
||||
*/
|
||||
set_blocked_on_waking(owner);
|
||||
wake_q_add(wake_q, owner);
|
||||
__set_blocked_on_waking(owner);
|
||||
raw_spin_unlock(&owner->blocked_lock);
|
||||
wake_q_add(wake_q, owner);
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
@@ -2141,7 +2141,7 @@ inline bool dequeue_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
return dequeue_task_result;
|
||||
}
|
||||
|
||||
void activate_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
static inline void __activate_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
{
|
||||
if (task_on_rq_migrating(p))
|
||||
flags |= ENQUEUE_MIGRATED;
|
||||
@@ -2155,6 +2155,61 @@ void activate_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
}
|
||||
EXPORT_SYMBOL_GPL(activate_task);
|
||||
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
static inline
|
||||
void __proxy_remove_from_sleeping_owner(struct task_struct *owner, struct task_struct *p)
|
||||
{
|
||||
lockdep_assert_held(&owner->blocked_lock);
|
||||
|
||||
if (p->sleeping_owner == owner) {
|
||||
list_del_init(&p->blocked_node);
|
||||
WRITE_ONCE(p->sleeping_owner, NULL);
|
||||
put_task_struct(owner); // matches get in proxy_enqueue_on_owner
|
||||
}
|
||||
}
|
||||
|
||||
static inline void proxy_remove_from_sleeping_owner(struct task_struct *p)
|
||||
{
|
||||
struct task_struct *owner = READ_ONCE(p->sleeping_owner);
|
||||
|
||||
if (owner) {
|
||||
raw_spin_lock(&owner->blocked_lock);
|
||||
__proxy_remove_from_sleeping_owner(owner, p);
|
||||
raw_spin_unlock(&owner->blocked_lock);
|
||||
}
|
||||
}
|
||||
|
||||
void activate_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
{
|
||||
if (!sched_proxy_exec()) {
|
||||
__activate_task(rq, p, flags);
|
||||
return;
|
||||
}
|
||||
|
||||
lockdep_assert_rq_held(rq);
|
||||
proxy_remove_from_sleeping_owner(p);
|
||||
/*
|
||||
* By calling __activate_task() with blocked_lock held, we
|
||||
* order against the find_proxy_task() blocked_task case
|
||||
* such that no more blocked tasks will be enqueued on p
|
||||
* once we release p->blocked_lock.
|
||||
*/
|
||||
raw_spin_lock(&p->blocked_lock);
|
||||
WARN_ON(task_cpu(p) != cpu_of(rq));
|
||||
__activate_task(rq, p, flags);
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
}
|
||||
#else
|
||||
static inline void proxy_remove_from_sleeping_owner(struct task_struct *p)
|
||||
{
|
||||
}
|
||||
|
||||
void activate_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
{
|
||||
__activate_task(rq, p, flags);
|
||||
}
|
||||
#endif
|
||||
|
||||
void deactivate_task(struct rq *rq, struct task_struct *p, int flags)
|
||||
{
|
||||
SCHED_WARN_ON(flags & DEQUEUE_SLEEP);
|
||||
@@ -3794,68 +3849,14 @@ static inline void ttwu_do_wakeup(struct task_struct *p)
|
||||
}
|
||||
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
static inline
|
||||
void __proxy_remove_from_sleeping_owner(struct task_struct *owner, struct task_struct *p)
|
||||
{
|
||||
lockdep_assert_held(&owner->blocked_lock);
|
||||
|
||||
if (p->sleeping_owner == owner) {
|
||||
list_del_init(&p->blocked_node);
|
||||
WRITE_ONCE(p->sleeping_owner, NULL);
|
||||
put_task_struct(owner); // matches get in proxy_enqueue_on_owner
|
||||
}
|
||||
}
|
||||
|
||||
static inline void proxy_remove_from_sleeping_owner(struct task_struct *p)
|
||||
{
|
||||
struct task_struct *owner = READ_ONCE(p->sleeping_owner);
|
||||
|
||||
if (owner) {
|
||||
raw_spin_lock(&owner->blocked_lock);
|
||||
__proxy_remove_from_sleeping_owner(owner, p);
|
||||
raw_spin_unlock(&owner->blocked_lock);
|
||||
}
|
||||
}
|
||||
|
||||
static void do_activate_task(struct rq *rq, struct task_struct *p, int en_flags)
|
||||
{
|
||||
if (!sched_proxy_exec()) {
|
||||
activate_task(rq, p, en_flags);
|
||||
return;
|
||||
}
|
||||
|
||||
lockdep_assert_rq_held(rq);
|
||||
proxy_remove_from_sleeping_owner(p);
|
||||
/*
|
||||
* By calling activate_task with blocked_lock held, we
|
||||
* order against the find_proxy_task() blocked_task case
|
||||
* such that no more blocked tasks will be enqueued on p
|
||||
* once we release p->blocked_lock.
|
||||
*/
|
||||
raw_spin_lock(&p->blocked_lock);
|
||||
WARN_ON(task_cpu(p) != cpu_of(rq));
|
||||
activate_task(rq, p, en_flags);
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
}
|
||||
|
||||
static bool proxy_task_runnable_but_waking(struct task_struct *p)
|
||||
{
|
||||
if (!sched_proxy_exec())
|
||||
return false;
|
||||
return (READ_ONCE(p->__state) == TASK_RUNNING &&
|
||||
READ_ONCE(p->blocked_on_state) == BO_WAKING);
|
||||
}
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
static inline void proxy_set_task_cpu(struct task_struct *p, int cpu)
|
||||
{
|
||||
unsigned int wake_cpu;
|
||||
|
||||
/* Sanity check to make sure we can return safely */
|
||||
WARN_ON(!is_cpu_allowed(p, p->wake_cpu));
|
||||
/*
|
||||
* Since we enqueuing blocked tasks on a cpu it may not
|
||||
* be able to run on, preserve wake_cpu when we
|
||||
* Since we are enqueuing a blocked task on a cpu it may
|
||||
* not be able to run on, preserve wake_cpu when we
|
||||
* __set_task_cpu so we can return the task to where it
|
||||
* was previously runnable.
|
||||
*/
|
||||
@@ -3869,38 +3870,53 @@ static inline void proxy_set_task_cpu(struct task_struct *p, int cpu)
|
||||
__set_task_cpu(p, cpu);
|
||||
}
|
||||
#endif /* CONFIG_SMP */
|
||||
static bool proxy_task_runnable_but_waking(struct task_struct *p)
|
||||
{
|
||||
if (!sched_proxy_exec())
|
||||
return false;
|
||||
return (READ_ONCE(p->__state) == TASK_RUNNING &&
|
||||
READ_ONCE(p->blocked_on_state) == BO_WAKING);
|
||||
}
|
||||
|
||||
static void do_activate_blocked_waiter(struct rq *target_rq, struct task_struct *p, int en_flags)
|
||||
{
|
||||
unsigned long flags;
|
||||
unsigned int state;
|
||||
struct rq_flags rf;
|
||||
int target_cpu = cpu_of(target_rq);
|
||||
|
||||
raw_spin_lock_irqsave(&p->pi_lock, flags);
|
||||
state = READ_ONCE(p->__state);
|
||||
/* Avoid racing with ttwu */
|
||||
if (state == TASK_WAKING)
|
||||
goto out;
|
||||
scoped_guard (raw_spinlock_irqsave, &p->pi_lock) {
|
||||
state = READ_ONCE(p->__state);
|
||||
/* Avoid racing with ttwu */
|
||||
if (state == TASK_WAKING)
|
||||
return;
|
||||
|
||||
if (READ_ONCE(p->on_rq)) {
|
||||
/*
|
||||
* We raced with a non mutex handoff activation of p.
|
||||
* That activation will also take care of activating
|
||||
* all of the tasks after p in the blocked_head list,
|
||||
* so we're done here.
|
||||
*/
|
||||
goto out;
|
||||
if (READ_ONCE(p->on_rq)) {
|
||||
/*
|
||||
* We raced with a non mutex handoff activation of p.
|
||||
* That activation will also take care of activating
|
||||
* all of the tasks after p in the blocked_head list,
|
||||
* so we're done here.
|
||||
*/
|
||||
return;
|
||||
}
|
||||
if (READ_ONCE(p->on_cpu)) {
|
||||
/*
|
||||
* Its possible this activation is very late, and
|
||||
* we already were woken up and are running on a
|
||||
* different cpu. If that task blocked, it could be
|
||||
* dequeued (so on_rq == 0), but still on_cpu.
|
||||
* Bail in this case, as we definitely don't want to
|
||||
* activate a task when its on_cpu elsewhere.
|
||||
*/
|
||||
return;
|
||||
}
|
||||
proxy_set_task_cpu(p, target_cpu);
|
||||
rq_lock_irqsave(target_rq, &rf);
|
||||
update_rq_clock(target_rq);
|
||||
activate_task(target_rq, p, en_flags);
|
||||
resched_curr(target_rq);
|
||||
rq_unlock_irqrestore(target_rq, &rf);
|
||||
}
|
||||
|
||||
proxy_set_task_cpu(p, target_cpu);
|
||||
rq_lock_irqsave(target_rq, &rf);
|
||||
update_rq_clock(target_rq);
|
||||
do_activate_task(target_rq, p, en_flags);
|
||||
resched_curr(target_rq);
|
||||
rq_unlock_irqrestore(target_rq, &rf);
|
||||
out:
|
||||
raw_spin_unlock_irqrestore(&p->pi_lock, flags);
|
||||
}
|
||||
|
||||
static void activate_blocked_waiters(struct rq *target_rq,
|
||||
@@ -3920,10 +3936,10 @@ static void activate_blocked_waiters(struct rq *target_rq,
|
||||
en_flags |= ENQUEUE_MIGRATED;
|
||||
|
||||
/*
|
||||
* A whole bunch of 'proxy' tasks back this blocked task, wake
|
||||
* them all up to give this task its 'fair' share.
|
||||
*/
|
||||
/*
|
||||
* A whole bunch of waiting donor tasks back this blocked
|
||||
* lock owner task, wake them all up to give this task its
|
||||
* 'fair' share.
|
||||
*
|
||||
* This is a little unique here and the locking is messy.
|
||||
* At this point we only hold the blocked_lock, so the
|
||||
* owner task may be able to run and do all sorts of
|
||||
@@ -4105,16 +4121,6 @@ void move_queued_task_locked(struct rq *src_rq, struct rq *dst_rq, struct task_s
|
||||
}
|
||||
#endif /* CONFIG_SMP */
|
||||
#else /* !CONFIG_SCHED_PROXY_EXEC */
|
||||
static inline void proxy_remove_from_sleeping_owner(struct task_struct *p)
|
||||
{
|
||||
}
|
||||
|
||||
static inline void do_activate_task(struct rq *rq, struct task_struct *p,
|
||||
int en_flags)
|
||||
{
|
||||
activate_task(rq, p, en_flags);
|
||||
}
|
||||
|
||||
static bool proxy_task_runnable_but_waking(struct task_struct *p)
|
||||
{
|
||||
return false;
|
||||
@@ -4128,6 +4134,13 @@ static inline void activate_blocked_waiters(struct rq *target_rq,
|
||||
#endif /* CONFIG_SCHED_PROXY_EXEC */
|
||||
|
||||
#ifdef CONFIG_SMP
|
||||
/*
|
||||
* Checks to see if task p has been proxy-migrated to another rq
|
||||
* and needs to be returned. If so, we deactivate the task here
|
||||
* so that it can be properly woken up on the p->wake_cpu
|
||||
* (or whichever cpu select_task_rq() picks at the bottom of
|
||||
* try_to_wake_up()
|
||||
*/
|
||||
static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
|
||||
{
|
||||
bool ret = false;
|
||||
@@ -4136,7 +4149,7 @@ static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
|
||||
return false;
|
||||
|
||||
raw_spin_lock(&p->blocked_lock);
|
||||
if (get_task_blocked_on(p) && p->blocked_on_state == BO_WAKING) {
|
||||
if (__get_task_blocked_on(p) && p->blocked_on_state == BO_WAKING) {
|
||||
if (!task_current(rq, p) && (p->wake_cpu != cpu_of(rq))) {
|
||||
if (task_current_donor(rq, p)) {
|
||||
put_prev_task(rq, p);
|
||||
@@ -4161,6 +4174,7 @@ static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
|
||||
{
|
||||
return false;
|
||||
}
|
||||
|
||||
static inline void _trace_sched_pe_return_migration(struct task_struct *p)
|
||||
{
|
||||
}
|
||||
@@ -4192,7 +4206,7 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
|
||||
atomic_dec(&task_rq(p)->nr_iowait);
|
||||
}
|
||||
|
||||
do_activate_task(rq, p, en_flags);
|
||||
activate_task(rq, p, en_flags);
|
||||
wakeup_preempt(rq, p, wake_flags);
|
||||
|
||||
ttwu_do_wakeup(p);
|
||||
@@ -4260,6 +4274,10 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
|
||||
proxy_remove_from_sleeping_owner(p);
|
||||
enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
|
||||
}
|
||||
if (proxy_needs_return(rq, p)) {
|
||||
_trace_sched_pe_return_migration(p);
|
||||
goto out;
|
||||
}
|
||||
if (!task_on_cpu(rq, p)) {
|
||||
/*
|
||||
* When on_rq && !on_cpu the task is preempted, see if
|
||||
@@ -4267,10 +4285,6 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
|
||||
*/
|
||||
wakeup_preempt(rq, p, wake_flags);
|
||||
}
|
||||
if (proxy_needs_return(rq, p)) {
|
||||
_trace_sched_pe_return_migration(p);
|
||||
goto out;
|
||||
}
|
||||
ttwu_do_wakeup(p);
|
||||
ret = 1;
|
||||
}
|
||||
@@ -5546,6 +5560,7 @@ static void do_balance_callbacks(struct rq *rq, struct balance_callback *head)
|
||||
}
|
||||
}
|
||||
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
/*
|
||||
* Only called from __schedule context
|
||||
*
|
||||
@@ -5573,6 +5588,7 @@ static void zap_balance_callbacks(struct rq *rq)
|
||||
}
|
||||
rq->balance_callback = found ? &balance_push_callback : NULL;
|
||||
}
|
||||
#endif /* CONFIG_SCHED_PROXY_EXEC */
|
||||
|
||||
static void balance_push(struct rq *rq);
|
||||
|
||||
@@ -5642,9 +5658,11 @@ void balance_callbacks(struct rq *rq, struct balance_callback *head)
|
||||
|
||||
#else
|
||||
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
static inline void zap_balance_callbacks(struct rq *rq)
|
||||
{
|
||||
}
|
||||
#endif /* CONFIG_SCHED_PROXY_EXEC */
|
||||
|
||||
static inline void __balance_callbacks(struct rq *rq)
|
||||
{
|
||||
@@ -7165,13 +7183,10 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
|
||||
}
|
||||
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
|
||||
static inline struct task_struct *
|
||||
proxy_resched_idle(struct rq *rq)
|
||||
static inline struct task_struct *proxy_resched_idle(struct rq *rq)
|
||||
{
|
||||
put_prev_task(rq, rq->donor);
|
||||
put_prev_set_next_task(rq, rq->donor, rq->idle);
|
||||
rq_set_donor(rq, rq->idle);
|
||||
set_next_task(rq, rq->idle);
|
||||
set_tsk_need_resched(rq->idle);
|
||||
return rq->idle;
|
||||
}
|
||||
@@ -7192,11 +7207,10 @@ proxy_resched_idle(struct rq *rq)
|
||||
static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
|
||||
struct task_struct *p, int target_cpu)
|
||||
{
|
||||
struct rq *target_rq = cpu_rq(target_cpu);
|
||||
LIST_HEAD(migrate_list);
|
||||
struct rq *target_rq;
|
||||
|
||||
lockdep_assert_rq_held(rq);
|
||||
target_rq = cpu_rq(target_cpu);
|
||||
|
||||
/*
|
||||
* Since we're going to drop @rq, we have to put(@rq->donor) first,
|
||||
@@ -7219,8 +7233,8 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
|
||||
/* XXX - Added to address problems with changed dl_server semantics - double check */
|
||||
__put_prev_set_next_dl_server(rq, rq->donor, rq->curr);
|
||||
put_prev_task(rq, rq->donor);
|
||||
rq_set_donor(rq, rq->curr);
|
||||
set_next_task(rq, rq->curr);
|
||||
rq_set_donor(rq, rq->idle);
|
||||
set_next_task(rq, rq->idle);
|
||||
|
||||
for (; p; p = p->blocked_donor) {
|
||||
WARN_ON(p == rq->curr);
|
||||
@@ -7246,12 +7260,41 @@ static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
|
||||
raw_spin_rq_unlock(target_rq);
|
||||
raw_spin_rq_lock(rq);
|
||||
rq_repin_lock(rq, rf);
|
||||
}
|
||||
|
||||
/*
|
||||
* Ok, now we have the lock again, put rq->curr and
|
||||
* set_next_task() to idle
|
||||
*/
|
||||
proxy_resched_idle(rq);
|
||||
static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
|
||||
struct task_struct *p)
|
||||
{
|
||||
lockdep_assert_rq_held(rq);
|
||||
|
||||
_trace_sched_pe_return_migration(p);
|
||||
|
||||
put_prev_task(rq, rq->donor);
|
||||
rq_set_donor(rq, rq->idle);
|
||||
set_next_task(rq, rq->idle);
|
||||
|
||||
WARN_ON(p == rq->curr);
|
||||
|
||||
p->blocked_on_state = BO_WAKING;
|
||||
get_task_struct(p);
|
||||
block_task(rq, p, 0);
|
||||
|
||||
zap_balance_callbacks(rq);
|
||||
rq_unpin_lock(rq, rf);
|
||||
raw_spin_rq_unlock(rq);
|
||||
|
||||
wake_up_process(p);
|
||||
put_task_struct(p);
|
||||
|
||||
raw_spin_rq_lock(rq);
|
||||
rq_repin_lock(rq, rf);
|
||||
}
|
||||
|
||||
static inline bool proxy_can_run_here(struct rq *rq, struct task_struct *p)
|
||||
{
|
||||
if (p == rq->curr || p->wake_cpu == cpu_of(rq))
|
||||
return true;
|
||||
return false;
|
||||
}
|
||||
#else /* !CONFIG_SMP */
|
||||
static inline
|
||||
@@ -7259,6 +7302,17 @@ void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
|
||||
struct task_struct *p, int target_cpu)
|
||||
{
|
||||
}
|
||||
|
||||
static inline
|
||||
void proxy_force_return(struct rq *rq, struct rq_flags *rf,
|
||||
struct task_struct *p)
|
||||
{
|
||||
}
|
||||
|
||||
static inline bool proxy_can_run_here(struct rq *rq, struct task_struct *p)
|
||||
{
|
||||
return true;
|
||||
}
|
||||
#endif /* CONFIG_SMP */
|
||||
|
||||
static void proxy_enqueue_on_owner(struct rq *rq, struct task_struct *owner,
|
||||
@@ -7314,7 +7368,6 @@ static struct task_struct *
|
||||
find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
|
||||
{
|
||||
struct task_struct *owner = NULL;
|
||||
struct task_struct *ret = NULL;
|
||||
bool curr_in_chain = false;
|
||||
int this_cpu = cpu_of(rq);
|
||||
struct task_struct *p;
|
||||
@@ -7331,18 +7384,48 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
|
||||
* By taking mutex->wait_lock we hold off concurrent mutex_unlock()
|
||||
* and ensure @owner sticks around.
|
||||
*/
|
||||
raw_spin_lock(&mutex->wait_lock);
|
||||
raw_spin_lock(&p->blocked_lock);
|
||||
guard(raw_spinlock)(&mutex->wait_lock);
|
||||
guard(raw_spinlock)(&p->blocked_lock);
|
||||
|
||||
/* Check again that p is blocked with blocked_lock held */
|
||||
if (mutex != get_task_blocked_on(p)) {
|
||||
if (mutex != __get_task_blocked_on(p)) {
|
||||
/*
|
||||
* Something changed in the blocked_on chain and
|
||||
* we don't know if only at this level. So, let's
|
||||
* just bail out completely and let __schedule
|
||||
* just bail out completely and let __schedule()
|
||||
* figure things out (pick_again loop).
|
||||
*/
|
||||
goto out;
|
||||
return NULL;
|
||||
}
|
||||
|
||||
/* Double check blocked_on_state now we're holding the lock */
|
||||
if (p->blocked_on_state == BO_RUNNABLE)
|
||||
return p;
|
||||
|
||||
/*
|
||||
* If a ww_mutex hits the die/wound case, it marks the task as
|
||||
* BO_WAKING and calls try_to_wake_up(), so that the mutex
|
||||
* cycle can be broken and we avoid a deadlock.
|
||||
*
|
||||
* However, if at that moment, we are here on the cpu which the
|
||||
* die/wounded task is enqueued, we might loop on the cycle as
|
||||
* BO_WAKING still causes task_is_blocked() to return true
|
||||
* (since we want return migration to occur before we run the
|
||||
* task).
|
||||
*
|
||||
* Unfortunately since we hold the rq lock, it will block
|
||||
* try_to_wake_up from completing and doing the return
|
||||
* migration.
|
||||
*
|
||||
* So when we hit a BO_WAKING task try to wake it up ourselves.
|
||||
*/
|
||||
if (p->blocked_on_state == BO_WAKING) {
|
||||
if (task_current(rq, p)) {
|
||||
/* If its current just set it runnable */
|
||||
__force_blocked_on_runnable(p);
|
||||
return p;
|
||||
}
|
||||
goto needs_return;
|
||||
}
|
||||
|
||||
if (task_current(rq, p))
|
||||
@@ -7351,61 +7434,22 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
|
||||
owner = __mutex_owner(mutex);
|
||||
if (!owner) {
|
||||
/* If the owner is null, we may have some work to do */
|
||||
if (!proxy_can_run_here(rq, p))
|
||||
goto needs_return;
|
||||
|
||||
/* First if p is no longer blocked, just return it to run */
|
||||
if (!task_is_blocked(p)) {
|
||||
ret = p;
|
||||
goto out;
|
||||
}
|
||||
|
||||
goto needs_return;
|
||||
__force_blocked_on_runnable(p);
|
||||
return p;
|
||||
}
|
||||
|
||||
owner_cpu = task_cpu(owner);
|
||||
if (owner_cpu != this_cpu) {
|
||||
trace_sched_pe_migration(donor, owner);
|
||||
|
||||
/*
|
||||
* @owner can disappear, simply migrate to @owner_cpu and leave that CPU
|
||||
* to sort things out.
|
||||
*/
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
if (curr_in_chain)
|
||||
return proxy_resched_idle(rq);
|
||||
|
||||
proxy_migrate_task(rq, rf, p, owner_cpu);
|
||||
return NULL;
|
||||
}
|
||||
|
||||
if (task_on_rq_migrating(owner)) {
|
||||
trace_sched_pe_owner_is_migrating(owner, p);
|
||||
|
||||
/*
|
||||
* One of the chain of mutex owners is currently migrating to this
|
||||
* CPU, but has not yet been enqueued because we are holding the
|
||||
* rq lock. As a simple solution, just schedule rq->idle to give
|
||||
* the migration a chance to complete. Much like the migrate_task
|
||||
* case we should end up back in find_proxy_task(), this time
|
||||
* hopefully with all relevant tasks already enqueued.
|
||||
*/
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
return proxy_resched_idle(rq);
|
||||
}
|
||||
|
||||
if (!owner->on_rq || owner->se.sched_delayed) {
|
||||
if (!READ_ONCE(owner->on_rq) || owner->se.sched_delayed) {
|
||||
/*
|
||||
* rq->curr must not be added to the blocked_head list or else
|
||||
* ttwu_do_activate could enqueue it elsewhere before it switches
|
||||
* out here. The approach to avoid this is the same as in the
|
||||
* migrate_task case.
|
||||
*/
|
||||
if (curr_in_chain) {
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
if (curr_in_chain)
|
||||
return proxy_resched_idle(rq);
|
||||
}
|
||||
|
||||
/*
|
||||
* If !@owner->on_rq, holding @rq->lock will not pin the task,
|
||||
@@ -7415,26 +7459,52 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
|
||||
* We use @owner->blocked_lock to serialize against ttwu_activate().
|
||||
* Either we see its new owner->on_rq or it will see our list_add().
|
||||
*/
|
||||
if (owner != p) {
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_lock(&owner->blocked_lock);
|
||||
}
|
||||
|
||||
WARN_ON(owner == p);
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_lock(&owner->blocked_lock);
|
||||
proxy_resched_idle(rq);
|
||||
proxy_enqueue_on_owner(rq, owner, p);
|
||||
|
||||
raw_spin_unlock(&owner->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
raw_spin_lock(&p->blocked_lock);
|
||||
|
||||
return NULL; /* retry task selection */
|
||||
}
|
||||
|
||||
owner_cpu = task_cpu(owner);
|
||||
if (owner_cpu != this_cpu) {
|
||||
trace_sched_pe_migration(donor, owner);
|
||||
/*
|
||||
* @owner can disappear, simply migrate to @owner_cpu
|
||||
* and leave that CPU to sort things out.
|
||||
*/
|
||||
if (curr_in_chain)
|
||||
return proxy_resched_idle(rq);
|
||||
goto migrate;
|
||||
}
|
||||
|
||||
if (task_on_rq_migrating(owner)) {
|
||||
trace_sched_pe_owner_is_migrating(owner, p);
|
||||
/*
|
||||
* One of the chain of mutex owners is currently migrating to this
|
||||
* CPU, but has not yet been enqueued because we are holding the
|
||||
* rq lock. As a simple solution, just schedule rq->idle to give
|
||||
* the migration a chance to complete. Much like the migrate_task
|
||||
* case we should end up back in find_proxy_task(), this time
|
||||
* hopefully with all relevant tasks already enqueued.
|
||||
*/
|
||||
return proxy_resched_idle(rq);
|
||||
}
|
||||
|
||||
/*
|
||||
* We could race with ttwu's return migration, so holding the
|
||||
* rq lock, double check owner is both on_rq & on this cpu, as
|
||||
* it might not even be on our RQ still
|
||||
* Its possible to race where after we check owner->on_rq
|
||||
* but before we check (owner_cpu != this_cpu) that the
|
||||
* task on another cpu was migrated back to this cpu. In
|
||||
* that case it could slip by our checks. So double check
|
||||
* we are still on this cpu and not migrating. If we get
|
||||
* inconsistent results, try again.
|
||||
*/
|
||||
if (!(task_on_rq_queued(owner) && task_cpu(owner) == this_cpu))
|
||||
goto out;
|
||||
if (!task_on_rq_queued(owner) || task_cpu(owner) != this_cpu)
|
||||
return NULL;
|
||||
|
||||
if (owner == p) {
|
||||
/*
|
||||
@@ -7456,81 +7526,34 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
|
||||
*
|
||||
* Which leaves us to finish the ttwu_runnable() and make it go.
|
||||
*
|
||||
* So schedule rq->idle so that ttwu_runnable can get the rq lock
|
||||
* and mark owner as running.
|
||||
* So schedule rq->idle so that ttwu_runnable() can get the rq
|
||||
* lock and mark owner as running.
|
||||
*/
|
||||
if (p->blocked_on_state == BO_WAKING)
|
||||
goto needs_return;
|
||||
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
return proxy_resched_idle(rq);
|
||||
}
|
||||
/*
|
||||
* If a ww_mutex hits the die/wound case, it marks the task as
|
||||
* BO_WAKING and calls try_to_wake_up(), so that the mutex
|
||||
* cycle can be broken and we avoid a deadlock.
|
||||
*
|
||||
* However, if at that moment, we are here on the cpu which the
|
||||
* die/wounded task is enqueued, we might loop on the cycle as
|
||||
* BO_WAKING still causes task_is_blocked() to return true
|
||||
* (since we want return migration to occur before we run the
|
||||
* task).
|
||||
*
|
||||
* Unfortunately since we hold the rq lock, it will block
|
||||
* try_to_wake_up from completing and doing the return
|
||||
* migration.
|
||||
*
|
||||
* So when we hit a BO_WAKING task that has a valid mutex, and
|
||||
* that mutex has an owner, we're hitting a mid-chain wakeup,
|
||||
* so we can briefly schedule idle so we release the rq and
|
||||
* let the wakeup complete.
|
||||
*/
|
||||
if (p->blocked_on_state == BO_WAKING)
|
||||
goto needs_return;
|
||||
|
||||
/*
|
||||
* OK, now we're absolutely sure @owner is on this
|
||||
* rq, therefore holding @rq->lock is sufficient to
|
||||
* guarantee its existence, as per ttwu_remote().
|
||||
*/
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
|
||||
owner->blocked_donor = p;
|
||||
}
|
||||
|
||||
WARN_ON_ONCE(owner && !owner->on_rq);
|
||||
return owner;
|
||||
|
||||
needs_return:
|
||||
#ifdef CONFIG_SMP
|
||||
WARN_ON(!is_cpu_allowed(p, p->wake_cpu));
|
||||
if (p->wake_cpu == this_cpu) {
|
||||
/* We can actually run here fine */
|
||||
p->blocked_on_state = BO_RUNNABLE;
|
||||
ret = p;
|
||||
goto out;
|
||||
}
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
|
||||
if (curr_in_chain)
|
||||
return proxy_resched_idle(rq);
|
||||
|
||||
p->blocked_on_state = BO_RUNNABLE;
|
||||
_trace_sched_pe_return_migration(p);
|
||||
proxy_migrate_task(rq, rf, p, p->wake_cpu);
|
||||
/*
|
||||
* NOTE: This logic is down here, because we need to call
|
||||
* the functions with the mutex wait_lock and task
|
||||
* blocked_lock released, so we have to get out of the
|
||||
* guard() scope.
|
||||
*/
|
||||
migrate:
|
||||
proxy_migrate_task(rq, rf, p, owner_cpu);
|
||||
return NULL;
|
||||
needs_return:
|
||||
proxy_force_return(rq, rf, p);
|
||||
return NULL;
|
||||
#else
|
||||
/* Nowhere else to migrate on UP */
|
||||
p->blocked_on_state = BO_RUNNABLE;
|
||||
ret = p;
|
||||
#endif
|
||||
out:
|
||||
raw_spin_unlock(&p->blocked_lock);
|
||||
raw_spin_unlock(&mutex->wait_lock);
|
||||
return ret;
|
||||
}
|
||||
#else /* SCHED_PROXY_EXEC */
|
||||
static struct task_struct *
|
||||
@@ -7612,7 +7635,6 @@ static void __sched notrace __schedule(int sched_mode)
|
||||
struct rq *rq;
|
||||
bool prev_not_proxied;
|
||||
int cpu;
|
||||
bool preserve_need_resched = false;
|
||||
|
||||
cpu = smp_processor_id();
|
||||
rq = cpu_rq(cpu);
|
||||
@@ -7668,7 +7690,8 @@ static void __sched notrace __schedule(int sched_mode)
|
||||
goto picked;
|
||||
}
|
||||
} else if (!preempt && prev_state) {
|
||||
block = try_to_block_task(rq, prev, prev_state, !task_is_blocked(prev));
|
||||
block = try_to_block_task(rq, prev, prev_state,
|
||||
!task_is_blocked(prev));
|
||||
switch_count = &prev->nvcsw;
|
||||
}
|
||||
|
||||
@@ -7681,19 +7704,16 @@ pick_again:
|
||||
next->blocked_donor = NULL;
|
||||
if (unlikely(task_is_blocked(next))) {
|
||||
next = find_proxy_task(rq, next, &rf);
|
||||
if (!next) {
|
||||
/* zap the balance_callbacks before picking again */
|
||||
zap_balance_callbacks(rq);
|
||||
if (!next)
|
||||
goto pick_again;
|
||||
}
|
||||
if (next == rq->idle)
|
||||
preserve_need_resched = true;
|
||||
goto keep_resched;
|
||||
}
|
||||
trace_sched_finish_task_selection(rq->donor, next, cpu);
|
||||
picked:
|
||||
if (!preserve_need_resched)
|
||||
clear_tsk_need_resched(prev);
|
||||
clear_tsk_need_resched(prev);
|
||||
clear_preempt_need_resched();
|
||||
keep_resched:
|
||||
#ifdef CONFIG_SCHED_DEBUG
|
||||
rq->last_seen_need_resched_ns = 0;
|
||||
#endif
|
||||
|
||||
@@ -2485,6 +2485,10 @@ static void put_prev_task_dl(struct rq *rq, struct task_struct *p, struct task_s
|
||||
update_curr_dl(rq);
|
||||
|
||||
update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 1);
|
||||
|
||||
if (task_is_blocked(p))
|
||||
return;
|
||||
|
||||
if (on_dl_rq(&p->dl) && p->nr_cpus_allowed > 1)
|
||||
enqueue_pushable_dl_task(rq, p);
|
||||
}
|
||||
@@ -2679,34 +2683,18 @@ static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
|
||||
}
|
||||
|
||||
static inline bool __dl_revalidate_rq_state(struct task_struct *task, struct rq *rq,
|
||||
struct rq *later, bool *retry)
|
||||
struct rq *later)
|
||||
{
|
||||
if (task_rq(task) != rq)
|
||||
return false;
|
||||
|
||||
if (!cpumask_test_cpu(later->cpu, &task->cpus_mask))
|
||||
return false;
|
||||
|
||||
if (task_on_cpu(rq, task))
|
||||
return false;
|
||||
|
||||
if (!dl_task(task))
|
||||
return false;
|
||||
|
||||
if (is_migration_disabled(task))
|
||||
return false;
|
||||
|
||||
if (!task_on_rq_queued(task))
|
||||
return false;
|
||||
|
||||
return true;
|
||||
return __revalidate_rq_state(task, rq, later);
|
||||
}
|
||||
|
||||
static inline bool dl_revalidate_rq_state(struct task_struct *task, struct rq *rq,
|
||||
struct rq *later, bool *retry)
|
||||
{
|
||||
if (!sched_proxy_exec())
|
||||
return __dl_revalidate_rq_state(task, rq, later, retry);
|
||||
return __dl_revalidate_rq_state(task, rq, later);
|
||||
|
||||
if (!dl_task(task) || is_migration_disabled(task))
|
||||
return false;
|
||||
|
||||
@@ -1192,7 +1192,7 @@ static void update_tg_load_avg(struct cfs_rq *cfs_rq)
|
||||
}
|
||||
#endif /* CONFIG_SMP */
|
||||
|
||||
static s64 update_curr_se(struct rq *rq, struct sched_entity *se)
|
||||
static s64 update_se(struct rq *rq, struct sched_entity *se)
|
||||
{
|
||||
u64 now = rq_clock_task(rq);
|
||||
s64 delta_exec;
|
||||
@@ -1203,6 +1203,7 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *se)
|
||||
|
||||
se->exec_start = now;
|
||||
if (entity_is_task(se)) {
|
||||
struct task_struct *donor = task_of(se);
|
||||
struct task_struct *running = rq->curr;
|
||||
/*
|
||||
* If se is a task, we account the time against the running
|
||||
@@ -1210,8 +1211,14 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *se)
|
||||
*/
|
||||
running->se.exec_start = now;
|
||||
running->se.sum_exec_runtime += delta_exec;
|
||||
|
||||
trace_sched_stat_runtime(running, delta_exec);
|
||||
account_group_exec_runtime(running, delta_exec);
|
||||
|
||||
/* cgroup time is always accounted against the donor */
|
||||
cgroup_account_cputime(donor, delta_exec);
|
||||
} else {
|
||||
/* If not task, account the time against se */
|
||||
/* If not task, account the time against donor se */
|
||||
se->sum_exec_runtime += delta_exec;
|
||||
}
|
||||
|
||||
@@ -1226,13 +1233,6 @@ static s64 update_curr_se(struct rq *rq, struct sched_entity *se)
|
||||
return delta_exec;
|
||||
}
|
||||
|
||||
static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
|
||||
{
|
||||
trace_sched_stat_runtime(p, delta_exec);
|
||||
account_group_exec_runtime(p, delta_exec);
|
||||
cgroup_account_cputime(p, delta_exec);
|
||||
}
|
||||
|
||||
static inline bool did_preempt_short(struct cfs_rq *cfs_rq, struct sched_entity *curr)
|
||||
{
|
||||
if (!sched_feat(PREEMPT_SHORT))
|
||||
@@ -1271,13 +1271,8 @@ static inline bool do_preempt_short(struct cfs_rq *cfs_rq,
|
||||
s64 update_curr_common(struct rq *rq)
|
||||
{
|
||||
struct task_struct *donor = rq->donor;
|
||||
s64 delta_exec;
|
||||
|
||||
delta_exec = update_curr_se(rq, &donor->se);
|
||||
if (likely(delta_exec > 0))
|
||||
update_curr_task(donor, delta_exec);
|
||||
|
||||
return delta_exec;
|
||||
return update_se(rq, &donor->se);
|
||||
}
|
||||
|
||||
/*
|
||||
@@ -1285,6 +1280,12 @@ s64 update_curr_common(struct rq *rq)
|
||||
*/
|
||||
static void update_curr(struct cfs_rq *cfs_rq)
|
||||
{
|
||||
/*
|
||||
* Note: cfs_rq->curr corresponds to the task picked to
|
||||
* run (ie: rq->donor.se) which due to proxy-exec may
|
||||
* not necessarily be the actual task running
|
||||
* (rq->curr.se). This is easy to confuse!
|
||||
*/
|
||||
struct sched_entity *curr = cfs_rq->curr;
|
||||
struct rq *rq = rq_of(cfs_rq);
|
||||
s64 delta_exec;
|
||||
@@ -1293,7 +1294,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
|
||||
if (unlikely(!curr))
|
||||
return;
|
||||
|
||||
delta_exec = update_curr_se(rq, curr);
|
||||
delta_exec = update_se(rq, curr);
|
||||
if (unlikely(delta_exec <= 0))
|
||||
return;
|
||||
|
||||
@@ -1302,10 +1303,6 @@ static void update_curr(struct cfs_rq *cfs_rq)
|
||||
update_min_vruntime(cfs_rq);
|
||||
|
||||
if (entity_is_task(curr)) {
|
||||
struct task_struct *p = task_of(curr);
|
||||
|
||||
update_curr_task(p, delta_exec);
|
||||
|
||||
/*
|
||||
* If the fair_server is active, we need to account for the
|
||||
* fair_server time whether or not the task is running on
|
||||
|
||||
@@ -1511,25 +1511,14 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags)
|
||||
|
||||
enqueue_rt_entity(rt_se, flags);
|
||||
|
||||
/*
|
||||
* Current can't be pushed away. Selected is tied to current,
|
||||
* so don't push it either.
|
||||
*/
|
||||
if (task_current(rq, p) || task_current_donor(rq, p))
|
||||
return;
|
||||
/*
|
||||
* Pinned tasks can't be pushed.
|
||||
*/
|
||||
if (p->nr_cpus_allowed == 1)
|
||||
return;
|
||||
|
||||
if (should_honor_rt_sync(rq, p, sync))
|
||||
return;
|
||||
|
||||
if (task_is_blocked(p))
|
||||
return;
|
||||
|
||||
enqueue_pushable_task(rq, p);
|
||||
if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
|
||||
enqueue_pushable_task(rq, p);
|
||||
}
|
||||
|
||||
static bool dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags)
|
||||
@@ -2035,42 +2024,18 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)
|
||||
}
|
||||
|
||||
static inline bool __rt_revalidate_rq_state(struct task_struct *task, struct rq *rq,
|
||||
struct rq *lowest, bool *retry)
|
||||
struct rq *lowest)
|
||||
{
|
||||
/*
|
||||
* We had to unlock the run queue. In the mean time, task could have
|
||||
* migrated already or had its affinity changed. Also make sure that it
|
||||
* wasn't scheduled on its rq. It is possible the task was scheduled,
|
||||
* set "migrate_disabled" and then got preempted, so we must check the
|
||||
* task migration disable flag here too.
|
||||
*/
|
||||
if (task_rq(task) != rq)
|
||||
return false;
|
||||
|
||||
if (!cpumask_test_cpu(lowest->cpu, &task->cpus_mask))
|
||||
return false;
|
||||
|
||||
if (task_on_cpu(rq, task))
|
||||
return false;
|
||||
|
||||
if (!rt_task(task))
|
||||
return false;
|
||||
|
||||
if (is_migration_disabled(task))
|
||||
return false;
|
||||
|
||||
if (!task_on_rq_queued(task))
|
||||
return false;
|
||||
|
||||
return true;
|
||||
return __revalidate_rq_state(task, rq, lowest);
|
||||
}
|
||||
|
||||
/* XXX: TODO: Consolidate this w/ dl_revalidate_rq_state */
|
||||
static inline bool rt_revalidate_rq_state(struct task_struct *task, struct rq *rq,
|
||||
struct rq *lowest, bool *retry)
|
||||
{
|
||||
if (!sched_proxy_exec())
|
||||
return __rt_revalidate_rq_state(task, rq, lowest, retry);
|
||||
return __rt_revalidate_rq_state(task, rq, lowest);
|
||||
/*
|
||||
* Releasing the rq lock means we need to re-check pushability.
|
||||
* Some scenarios:
|
||||
|
||||
@@ -2344,7 +2344,7 @@ static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
|
||||
|
||||
static inline int task_on_rq_queued(struct task_struct *p)
|
||||
{
|
||||
return p->on_rq == TASK_ON_RQ_QUEUED;
|
||||
return READ_ONCE(p->on_rq) == TASK_ON_RQ_QUEUED;
|
||||
}
|
||||
|
||||
static inline int task_on_rq_migrating(struct task_struct *p)
|
||||
@@ -3169,6 +3169,34 @@ extern void set_rq_offline(struct rq *rq);
|
||||
|
||||
extern bool sched_smp_initialized;
|
||||
|
||||
static inline bool __revalidate_rq_state(struct task_struct *task, struct rq *rq,
|
||||
struct rq *lowest)
|
||||
{
|
||||
/*
|
||||
* We had to unlock the run queue. In the mean time, task could have
|
||||
* migrated already or had its affinity changed. Also make sure that it
|
||||
* wasn't scheduled on its rq. It is possible the task was scheduled,
|
||||
* set "migrate_disabled" and then got preempted, so we must check the
|
||||
* task migration disable flag here too.
|
||||
*/
|
||||
if (task_rq(task) != rq)
|
||||
return false;
|
||||
|
||||
if (!cpumask_test_cpu(lowest->cpu, &task->cpus_mask))
|
||||
return false;
|
||||
|
||||
if (task_on_cpu(rq, task))
|
||||
return false;
|
||||
|
||||
if (is_migration_disabled(task))
|
||||
return false;
|
||||
|
||||
if (!task_on_rq_queued(task))
|
||||
return false;
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
#else /* !CONFIG_SMP: */
|
||||
|
||||
/*
|
||||
@@ -3922,6 +3950,7 @@ int __task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
|
||||
|
||||
return 0;
|
||||
}
|
||||
#endif /* CONFIG_SMP */
|
||||
|
||||
#ifdef CONFIG_SCHED_PROXY_EXEC
|
||||
void move_queued_task_locked(struct rq *rq, struct rq *dst_rq, struct task_struct *task);
|
||||
@@ -3946,7 +3975,6 @@ struct task_struct *find_exec_ctx(struct rq *rq, struct task_struct *p)
|
||||
return p;
|
||||
}
|
||||
#endif /* CONFIG_SCHED_PROXY_EXEC */
|
||||
#endif /* CONFIG_SMP */
|
||||
|
||||
#ifdef CONFIG_RT_MUTEXES
|
||||
|
||||
|
||||
Reference in New Issue
Block a user