Having only one test was really nice here; maybe we could simply issue a
barrier before reading the status?

I agree - but the alternative is letting all modifications of
xnsched::status use atomic bitops (which is required when folding all
bits into a single word). And that would be much more costly, especially
on SMP.

What about issuing a barrier before testing the status?

The problem is not about reading but writing the status concurrently,
thus it's not about the code you see above.

The bits are modified under nklock, which implies a barrier when it is
unlocked. Furthermore, an IPI is guaranteed to be received on the remote
CPU after this barrier, so a barrier should be enough to see the
modifications which have been made remotely.

Check nucleus/intr.c for tons of unprotected status modifications.

Ok. Then maybe, we should reconsider the original decision to start
fiddling with the XNRESCHED bit remotely.

...which removed complexity and fixed a race? Let's rather review the
checks done in xnpod_schedule vs. its callers; I bet there is more to
save (IOW: remove the need to test for sched->resched).

Not that much complexity... and the race was a false positive in debug
code, no big deal. At least it worked, and it has done so for a long
time. No atomics needed, no barriers, only one test in xnpod_schedule,
and a nice invariant: sched->status is always accessed on the local CPU.
What else?

Take a step back and look at the root cause of this issue again. An
unlocked

	if (need_resched)
		__xnpod_schedule();

is inherently racy, and always will be (not only for the remote
reschedule case, BTW). So we either have to accept this and remove the
debugging check from the scheduler, or push the check back into
__xnpod_schedule, where it once came from. Once this is cleaned up, we
can look into the remote resched protocol again.

Probably being daft here, but why not stop fiddling with remote CPU
status bits and always do a reschedule on IPI IRQs?