Re: SPL NOT ZERO ON SYSCALL ENTRY

On Wed, Mar 12, 2008 at 12:46:07AM +0100, Matthias Drochner wrote:
> wrstuden%NetBSD.org@localhost said:
> > My guess is the problem was a missing splx() in an error-handling
> > case.
>
> That's what I'd also have assumed if the reason was
> "not lowered in syscall exit". But this was on entry...
>
> Looking at the kernel msgbuf in the crashdump I found
> however that there were indeed a number of "not lowered
> on syscall exit" and "not lowered on trap exit" messages,
> just these don't trap into DDB.
> So I've added a DDB trap to the syscall exit check and
> triggered the problem again.
> What I'm seeing now is
>
> (gdb) print cpu_info_primary.ci_next->ci_istate
> $15 = {ipending = 0x40000000, ilevel = 0x7}
>
> This makes me think that the softintr() code
> is to blame.
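
To make that gdb output easier to read: assuming ipending is a bitmask with
one bit per pending interrupt source, 0x40000000 has exactly one bit set,
bit 30, and ilevel is 7. The helpers below are a hypothetical user-space
sketch for decoding such a mask, not kernel code. Which source bit 30 maps
to depends on the port's interrupt definitions, so it is not named here.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not NetBSD functions) for reading a
 * pending-interrupt bitmask such as the ipending value above.
 */

/* Index of the highest set bit in the mask, or -1 if no bit is set. */
int highest_pending_bit(uint32_t ipending)
{
    int bit = -1;

    while (ipending != 0) {
        ipending >>= 1;
        bit++;
    }
    return bit;
}

/* Number of bits set in the mask, i.e. how many sources are pending. */
int pending_count(uint32_t ipending)
{
    int n = 0;

    while (ipending != 0) {
        n += (int)(ipending & 1u);
        ipending >>= 1;
    }
    return n;
}
```

For the value in the crashdump, highest_pending_bit(0x40000000u) is 30 and
pending_count(0x40000000u) is 1.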

It's not an splhigh/splx mismatch. I've checked all those numerous times and
there isn't a problem in the MI or x86 parts of the kernel that I can see.
It's not sloppy locking because that would have been detected by LOCKDEBUG
by now, and it's not an out-of-order lock release because the mutexes handle
that.

I also suspect it's some corner case or race condition with the soft
interrupt code.

> I didn't completely understand how this is supposed to
> work, but it could be a wrong error message as well:
> the softintr() code doesn't even try to save and
> restore its ilevel. So it apparently relies on the
> syscall code to clean up? Can someone explain?

The previous priority is always restored by Xspllower or Xdoreti on return.
If the soft interrupt sleeps, then the first time the priority is restored
there, and on subsequent sleeps during the same run it's restored as for any
other normal thread, by the thread switched to in mi_switch() or
lwp_exit_switchaway().

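
As a rough illustration of why the handler itself never has to save or
restore ilevel, here is a toy user-space model. All names (toy_cpu,
toy_run_softint and so on) are made up for this sketch and are not the
kernel's. It only covers the simple case where the handler runs to
completion without sleeping; the dispatcher stands in for the
Xspllower/Xdoreti return path.

```c
/* Toy model only: hypothetical names, not NetBSD's actual code. */
struct toy_cpu {
    int ilevel;                     /* current interrupt priority level */
};

typedef void (*toy_handler_t)(struct toy_cpu *);

/* Level the handler observed while running, recorded for inspection. */
int toy_seen_level = -1;

/* Raise the level if needed and return the previous one. */
int toy_splraise(struct toy_cpu *ci, int level)
{
    int old = ci->ilevel;

    if (level > old)
        ci->ilevel = level;
    return old;
}

/* Restore a level previously returned by toy_splraise(). */
void toy_splx(struct toy_cpu *ci, int saved)
{
    ci->ilevel = saved;
}

/* The handler just does its work; it does not touch ilevel. */
void toy_handler(struct toy_cpu *ci)
{
    toy_seen_level = ci->ilevel;
}

/*
 * Dispatcher: raises the level, runs the handler, then restores the
 * previous level on the way out. The restore lives in the return path,
 * not in the handler.
 */
void toy_run_softint(struct toy_cpu *ci, int level, toy_handler_t handler)
{
    int saved = toy_splraise(ci, level);

    handler(ci);
    toy_splx(ci, saved);
}
```

Starting from ilevel 0, toy_run_softint(&ci, 3, toy_handler) runs the handler
at level 3 and leaves ilevel back at 0 afterwards.
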
Andrew