rcu: Restrict access to RCU CPU stall notifiers

The RCU CPU stall notifiers can be useful for dumping state when
tracking down delicate forward-progress bugs in which NUMA effects cause
cache lines to be delivered to a given CPU regularly, but always in a
state that prevents that CPU from making forward progress.  These bugs can
be detected by the RCU CPU stall-warning mechanism, but in some cases
the stall-warning printk()s disrupt the forward-progress bug before
any useful state can be obtained.

Unfortunately, the notifier mechanism added by commit 5b404fdaba ("rcu:
Add RCU CPU stall notifier") can make matters worse if used at all
carelessly. For example, if the stall warning was caused by a lock not
being released, then any attempt to acquire that lock in the notifier
will hang.  This hang prevents the notifier from producing any useful
output, and it also prevents the stall-warning message from ever
appearing.
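
The hazard described above can be modeled in userspace.  The following
C11 sketch is only an illustration and is not kernel code: model_lock,
model_progress, and every function name in it are hypothetical stand-ins,
and an atomic_flag plays the role of the lock that was never released.
It contrasts a notifier that would spin on that lock with one that only
performs an atomic load.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: every name below is hypothetical, not kernel API. */

static atomic_flag model_lock = ATOMIC_FLAG_INIT; /* stands in for the leaked lock */
static atomic_ulong model_progress;               /* state worth dumping */

/* Non-blocking acquire: returns true only if the lock was free. */
bool model_trylock(void)
{
	return !atomic_flag_test_and_set_explicit(&model_lock,
						  memory_order_acquire);
}

void model_unlock(void)
{
	atomic_flag_clear_explicit(&model_lock, memory_order_release);
}

/* Model of the buggy path: takes the lock, records progress, never unlocks. */
void model_leak_lock(unsigned long progress)
{
	(void)model_trylock();
	atomic_store(&model_progress, progress);
}

/*
 * Careless notifier, shown for contrast: if the lock was leaked, this
 * loop never exits, so nothing after the notifier would ever run.
 */
void careless_notifier(void)
{
	while (!model_trylock())
		; /* spins forever on a leaked lock */
	printf("progress=%lu\n", atomic_load(&model_progress));
	model_unlock();
}

/* Lockless notifier: a single atomic load, which cannot block. */
unsigned long lockless_notifier(void)
{
	return atomic_load_explicit(&model_progress, memory_order_relaxed);
}

/*
 * Model of the stall-warning path: run the notifier first, then print
 * the warning.  Returns the snapshot that the notifier obtained.
 */
unsigned long model_stall_warning(void)
{
	unsigned long snap = lockless_notifier();

	printf("model stall warning: progress=%lu\n", snap);
	return snap;
}
```

In this model, calling model_leak_lock() and then model_stall_warning()
still reaches the printf(), whereas calling careless_notifier() after
model_leak_lock() would spin without ever returning.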

This commit therefore hides this new RCU CPU stall notifier
mechanism under a new RCU_CPU_STALL_NOTIFIER Kconfig option that
depends on both DEBUG_KERNEL and RCU_EXPERT.  In addition, the
rcupdate.rcu_cpu_stall_notifiers=1 kernel boot parameter must be
specified.  The RCU_CPU_STALL_NOTIFIER Kconfig option's help text
contains a warning and explains the dangers of careless use, recommending
lockless notifier code.  Finally, a WARN() is triggered each time
that an attempt is made to register a stall-warning notifier in kernels
built with CONFIG_RCU_CPU_STALL_NOTIFIER=y.
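
As a rough illustration of the two levels of opt-in described above, a
debug setup might combine a build-time fragment with a boot-time
parameter along the following lines.  This is a sketch rather than a
tested configuration: the Kconfig entry also depends on RCU_STALL_COMMON,
which is not shown here, and other options may be needed on a given tree.

```
# Build time (.config fragment)
CONFIG_DEBUG_KERNEL=y
CONFIG_RCU_EXPERT=y
CONFIG_RCU_CPU_STALL_NOTIFIER=y

# Boot time (append to the kernel command line)
rcupdate.rcu_cpu_stall_notifiers=1
```

With only the build-time options set, the mechanism is compiled in but
the boot parameter is still required, as stated above.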

This combination of measures will keep use of this mechanism confined to
debug kernels and away from routine deployments.

[ paulmck: Apply Dan Carpenter feedback. ]

Fixes: 5b404fdaba ("rcu: Add RCU CPU stall notifier")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.iitr10@gmail.com>
Paul E. McKenney
2023-11-01 18:28:38 -07:00
committed by Neeraj Upadhyay (AMD)
parent 98b1cc82c4
commit 4e58aaeebb
7 changed files with 62 additions and 12 deletions
@@ -105,6 +105,31 @@ config RCU_CPU_STALL_CPUTIME
 	  The boot option rcupdate.rcu_cpu_stall_cputime has the same function
 	  as this one, but will override this if it exists.
 
+config RCU_CPU_STALL_NOTIFIER
+	bool "Provide RCU CPU-stall notifiers"
+	depends on RCU_STALL_COMMON
+	depends on DEBUG_KERNEL
+	depends on RCU_EXPERT
+	default n
+	help
+	  WARNING: You almost certainly do not want this!!!
+
+	  Enable RCU CPU-stall notifiers, which are invoked just before
+	  printing the RCU CPU stall warning. As such, bugs in notifier
+	  callbacks can prevent stall warnings from being printed.
+	  And the whole reason that a stall warning is being printed is
+	  that something is hung up somewhere. Therefore, the notifier
+	  callbacks must be written extremely carefully, preferably
+	  containing only lockless code. After all, it is quite possible
+	  that the whole reason that the RCU CPU stall is happening in
+	  the first place is that someone forgot to release whatever lock
+	  that you are thinking of acquiring. In which case, having your
+	  notifier callback acquire that lock will hang, preventing the
+	  RCU CPU stall warning from appearing.
+
+	  Say Y here if you want RCU CPU stall notifiers (you don't want them)
+	  Say N if you are unsure.
+
 config RCU_TRACE
 	bool "Enable tracing for RCU"
 	depends on DEBUG_KERNEL