On Sun, Aug 31, 2008 at 07:45:02PM +0200, Manfred Spraul wrote:
> Paul E. McKenney wrote:
>> Assuming that the ordering of processing pending irqs and marking the
>> CPU offline in cpu_online_mask can be resolved as noted above, it should
>> work fine -- if a CPU's bit is clear, we can safely ignore it.  The race
>> can be resolved by checking the CPU's bit in force_quiescent_state().
>>
>> Or am I missing something?
>>
> Yes, that would work:
> Rule 1: after CPU_DEAD, a cpu is gone. The cpu is quiet, rcu callbacks must
> be moved to other cpus, ...
> Rule 2: if a cpu is not listed in cpu_online_mask, then it can be
> considered as outside a read-side critical section.
>
> The problem with rule 2 is that it means someone [force_quiescent_state()]
> must poll the cpu_online_mask and look for changes.
> I'd really prefer a notifier. CPU_DYING is nearly the correct thing, it
> only has to be moved down 3 lines ;-)
> (I want to kill the bitmaps, not add a hierarchical bitmap polling system!)

But some later CPU_DYING notifier might decide that the CPU cannot be
removed after all, which would mean bringing the CPU back.  And then
whatever the CPU was needed for might have actually happened in the
meantime, which does not sound good to me...

>> It is entirely possible that rcu_try_flip_waitack() and
>> rcu_try_flip_waitmb() need to check the AND of rcu_cpu_online_map and
>> cpu_online_map.  If this really is a problem (and it might well be),
>> then the easiest fix is to check for cpu_is_offline(cpu) in both
>> rcu_try_flip_waitmb_needed() and rcu_try_flip_waitack_needed(), and
>> that in both versions of both functions.  Thoughts?
>>
> I made a mistake, get_online_cpus() stores current, not a cpu number. Thus
> the described race is not possible. Perhaps there are other users that
> could deadlock.
> I don't know enough about the preempt algorithm, thus I can't confirm if
> your proposal would work or not.
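
For illustration only, the cpu_is_offline() check suggested above can be
modeled in userspace C.  This is a rough sketch under stated assumptions,
not the kernel code: struct rcu_model, the model_* function names, and the
32-bit mask layout are all hypothetical stand-ins for cpu_online_map and the
per-CPU acknowledgment state.  It only shows the idea that a CPU whose
online bit is clear is never waited for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8	/* hypothetical small CPU count for the model */

/* Hypothetical userspace model; none of these names exist in the kernel. */
struct rcu_model {
	uint32_t online_mask;	/* stand-in for cpu_online_map */
	uint32_t ack_pending;	/* CPUs that have not yet acknowledged */
};

/* Models cpu_is_offline(cpu): true if the CPU's online bit is clear. */
static bool model_cpu_is_offline(const struct rcu_model *m, unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return !(m->online_mask & (UINT32_C(1) << cpu));
}

/*
 * Models the proposed check in a *_waitack_needed() style helper:
 * an offline CPU is ignored even if its acknowledgment is still pending.
 */
static bool model_waitack_needed(const struct rcu_model *m, unsigned int cpu)
{
	if (model_cpu_is_offline(m, cpu))
		return false;
	return (m->ack_pending & (UINT32_C(1) << cpu)) != 0;
}

/* True if any online CPU still owes an acknowledgment. */
static bool model_flip_must_wait(const struct rcu_model *m)
{
	unsigned int cpu;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (model_waitack_needed(m, cpu))
			return true;
	return false;
}
```

In this model, a pending acknowledgment from an offline CPU does not hold
up the flip, while the same pending bit on an online CPU does.  Whether the
real preempt-RCU state machine tolerates that, given the ordering of
interrupt processing and clearing the online bit, is exactly the open
question above.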