On Thu, Sep 28, 2006 at 02:24:12PM -0400, Peter Watkins wrote:
> Greetings SGI Origin wizards,
>
> I'm doing some SMP testing on an SGI Origin 200 (ip27).
>
> I started with a 2.6.15 vintage kernel and added changes from here:
> ftp://ftp.linux-mips.org/pub/linux/mips/people/ralf/ip27/

There have been a few significant fixes to the Origin (IP27) code since
2.6.15 ...

> It boots both processors and runs OK.
>
> Then I turn on CONFIG_DEBUG_SPINLOCK and CONFIG_DEBUG_SPINLOCK_SLEEP,
> and get lots of lockup messages. A typical one is below.
>
> Anyone seen this? Some of the low-level lock code has R10000_LLSC_WAR
> versions, but I don't see anything wrong there.

R10000_LLSC_WAR is a workaround for a CPU bug in certain relatively
old versions of the R10000 processor. Versions 2.6 and older were affected,
though the cutoff version number could have been 2.7. Anyway, the symptom
was that multiple processors could possibly grab the same spinlock at
once, which obviously is a recipe for disaster. I originally found the
problem when analyzing why rebuilding an MD RAID array was resulting in a
crash.
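
To illustrate the guarantee a spinlock relies on, here is a toy,
single-threaded model of LL/SC in plain C. It is a sketch only, not the
kernel's lock code, and it does not try to reproduce the actual R10000
erratum. The names llsc_mem, ll(), sc() and race_two_cpus() are made up
for this example. Two "CPUs" both load-link a free lock word, then both
try the store-conditional; with correct hardware only the first store
succeeds.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 2

/* Hypothetical model: one lock word plus a per-CPU link bit. */
struct llsc_mem {
    int  lock;          /* 0 = free, 1 = held */
    bool link[NCPU];    /* set by ll(), cleared by any successful store */
};

/* Load-linked: read the lock word and remember that this CPU is watching it. */
static int ll(struct llsc_mem *m, int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    m->link[cpu] = true;
    return m->lock;
}

/* Store-conditional: store only if this CPU's link is still intact. */
static bool sc(struct llsc_mem *m, int cpu, int val)
{
    assert(cpu >= 0 && cpu < NCPU);
    if (!m->link[cpu])
        return false;               /* somebody else wrote first: fail */
    m->lock = val;
    for (int i = 0; i < NCPU; i++)  /* a successful store breaks every link */
        m->link[i] = false;
    return true;
}

/* Worst-case interleaving: both CPUs see the lock free before either stores.
 * Returns how many CPUs end up believing they hold the lock. */
static int race_two_cpus(void)
{
    struct llsc_mem m = {0};
    int holders = 0;

    int seen0 = ll(&m, 0);
    int seen1 = ll(&m, 1);

    if (seen0 == 0 && sc(&m, 0, 1))
        holders++;
    if (seen1 == 0 && sc(&m, 1, 1))
        holders++;

    return holders;
}
```

In this model race_two_cpus() returns 1. If the second store-conditional
ever succeeded, both CPUs would consider themselves lock holders, which
is the kind of symptom described above.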

Even with that fix applied I found MD RAID 5 / 6 not very stable as of
last week; this instability seems to be limited to IP27 and IP30, and the
various kernel debugging options I tried seemed to aggravate the problem
significantly.

Ralf