> Ingo Molnar wrote:
> > Why not use the obvious solution: a _single_ wrlock for global
> > access and read_can_lock() plus per cpu locks in the fastpath?
>
> Obvious is not the qualifier I would use :)
>
> Brilliant yes :)

Note that C1 got "ripped apart" into C1A and C1B, with C2 injected in between - reducing cache locality between C1A and C1B. We have to execute C1B no matter what, so we didn't actually win anything in terms of total work to do by processing C2 out of order.
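
The nesting pattern in question can be sketched like this (L1/L2 and the section names follow the example above; the trace[] recording is purely for illustration):

```c
/* Taking L2 inside L1's critical section splits C1 into C1A and
 * C1B, with C2 run in between. */
#include <pthread.h>
#include <string.h>

static pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;

static const char *trace[3];
static int traced;

static void run(const char *section)
{
	trace[traced++] = section;
}

static void critical_sections(void)
{
	pthread_mutex_lock(&L1);
	run("C1A");			/* first half of C1 */

	pthread_mutex_lock(&L2);	/* nested lock preempts C1 ... */
	run("C2");			/* ... to process C2 out of order */
	pthread_mutex_unlock(&L2);

	run("C1B");			/* second half of C1 still has to
					 * run - but C2 may have evicted
					 * its cachelines in the meantime */
	pthread_mutex_unlock(&L1);
}
```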

[ Preemption of work (which this kind of nesting is really about) is _the antithesis of performance_, and we try to delay it as much as possible and to batch up work as much as possible. For example the CPU scheduler will try _real_ hard not to preempt a typical workload, as long as external latency boundaries allow that. ]

2) Locking complexity and robustness. Nested locking is rank #1 in terms of introducing regressions into the kernel.

3) Instrumentation/checking complexity. Checking locking dependencies is good and catches a boatload of bugs before they hit upstream, and nested locks are supported - but they cause an exponential explosion in the number of dependencies to check.
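
To make the dependency-checking point concrete, here is a toy sketch of the kind of bookkeeping such a checker does (this is NOT the lockdep implementation - just an illustration of why every nesting relationship adds state to track and cross-check):

```c
/* Toy lock-ordering checker: remember every "class A was held while
 * class B was taken" edge, and flag the inverse edge as a potential
 * AB-BA deadlock.  A real checker (like lockdep) also follows chains
 * transitively, which is where the explosion in dependencies to
 * check comes from as nesting depth grows. */
#include <stdbool.h>

#define MAX_CLASSES 8

static bool taken_before[MAX_CLASSES][MAX_CLASSES];

/* Record that lock class 'held' was held when 'acquired' was taken.
 * Returns false if the inverse ordering was already observed. */
static bool check_acquire(int held, int acquired)
{
	if (taken_before[acquired][held])
		return false;	/* inversion: possible deadlock */
	taken_before[held][acquired] = true;
	return true;
}
```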

Also, whenever the instrumentation explodes, that is typically the sign of some true, physical complexity that has been introduced into the code. So it often is a canary for a misdesign at a fundamental level, not a failure of the instrumentation framework.

In the past I often saw lock nesting used as the wrong solution when the critical sections were too long (causing too-long latencies for critical work - e.g. delaying hardirq completion processing unreasonably), or just plain out of confusion about the items above.

I don't know whether that's the case here - it could be one of the rare exceptions that calls for a new locking primitive (which should then be introduced at the core kernel level IMHO) - I don't know the code that well.