Re: tstile lockups - test case

In article <p06240800cdd916f42451@[71.39.101.51]>,
Donald Lee <MacPPC2%c.icompute.com@localhost> wrote:
>At 11:55 PM +0000 3/30/13, Valery Ushakov wrote:
>>Donald Lee <MacPPC2%c.icompute.com@localhost> wrote:
>>> At 5:22 PM +0000 3/30/13, Valery Ushakov wrote:
>>>
>>>>Next time it hangs you can break into DDB from console and check with
>>>>ps (ddb command, not ordinary command; see ddb(4)). I've seen such
>>>>lockup once on my newly installed mini g4 when I was copying over
>>>>several cvs trees from the old machine.
>>>
>>> Console does not respond (aside from the first CR, sometimes).
>>> I don't think I can use ddb. I'll read the man page...... Thanks
>>
>>Press Ctrl-Alt-Esc on the console keyboard to break into ddb. It
>>should work on tstiled machine even when normal console input isn't.
>>IIRC, NetBSD 6 should have the necessary fixes for USB keyboard to be
>>usable with ddb.
>>
>>-uwe
>
>I can reproduce the tstile hang fairly easily and am looking for
>a path to fix it. I don't really know how to use ddb, and am not very
>familiar with the internals of the kernel. What I have done so
>far is put together a test case that reliably causes the problem
>in a few hours or less.
>
>The test case is just a shell script. I have apache running on the
>machine, and have a shell script doing
>wget operations on http://127.0.0.1/... The network interface
>is down (gem0) when the failure occurs, so it doesn't look like a driver
>problem. When the hang occurs, I cannot get a command launched, so no
>user-level debugging is possible, but I can break into ddb with the
>ctrl-alt-esc sequence. When I break into ddb I can do "ps", and see
>many, many processes waiting on "tstile".
>
>I have run this test maybe 10 times, and the failures all look about
>the same.
>
>This smells to me like a race condition - a small timing window that gets
>hit somewhere. Those are always fun to find. ;->
>
>I have run the same test case on a VM instance of an i386 NetBSD install,
>and it does not fail (so far), so I'm pretty sure it's a macppc-specific
>problem.
>
>Anyone have suggestions on how to track this down? What's the shortest path
>to my getting enough ddb expertise to help track this down, or getting my
>test case in the hands of someone with the requisite skill?
I would start by running a DIAGNOSTIC/DEBUG/LOCKDEBUG kernel. If that does
not find the deadlock, at least it will let us look at the locks more easily.
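
As a rough sketch, assuming you build from a copy of the stock macppc
GENERIC config, these are the lines to add or uncomment (skip any that
your copy already enables):

    options 	DIAGNOSTIC	# extra kernel consistency checks
    options 	DEBUG		# additional, more expensive debugging checks
    options 	LOCKDEBUG	# lock debugging (noticeably slower kernel)

Then rebuild and boot that kernel and rerun the test case.
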
christos