Server crashes using 2.0.32 and SMP

My server has been crashing intermittently. On two occasions I was able to have staff at the co-location facility where the server lives look at the console immediately after a crash.

The machine is a dual Pentium Pro box running kernel 2.0.32 with SMP support.

Each time the server crashed, the following message was being printed continuously on the console (but it never made it into the syslog):

Aiee: scheduling in interrupt: 0012BBD1

A search of the sources found that this condition is tested for in /usr/src/linux/kernel/sched.c on line 396, and that the message is printed on line 497.

It would appear that an interrupt was encountered during the schedule() operation. This would be a bad thing (it's not nice to re-enter the scheduler via an interrupt).
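
To make my reading of that check concrete, here is a minimal user-space sketch in C. It is not the kernel source: the names intr_count_sketch, schedule_sketch and bad_isr_sketch are made up for illustration, and I am assuming the test amounts to "is an interrupt currently being serviced". It relies on the GCC/Clang __builtin_return_address extension to print the caller's address.

```c
#include <stdio.h>

/* Hypothetical stand-in for an interrupt nesting counter:
 * non-zero means "we are currently inside an interrupt handler". */
static int intr_count_sketch = 0;

/* Sketch of a scheduler entry point that refuses to run in interrupt
 * context. Returns 0 if scheduling could proceed, -1 if it was called
 * while the counter was non-zero. */
static int schedule_sketch(void)
{
    if (intr_count_sketch != 0) {
        /* Report where we were called from (compiler extension). */
        fprintf(stderr, "Aiee: scheduling in interrupt %p\n",
                __builtin_return_address(0));
        return -1;
    }
    /* ... a real scheduler would pick the next task here ... */
    return 0;
}

/* Models a misbehaving interrupt handler that ends up in the scheduler. */
static int bad_isr_sketch(void)
{
    int rc;

    intr_count_sketch++;        /* entering interrupt context */
    rc = schedule_sketch();     /* the offending call */
    intr_count_sketch--;        /* leaving interrupt context */
    return rc;
}
```

Called from ordinary code, schedule_sketch() returns 0. Called through bad_isr_sketch(), it prints the complaint along with an address in the calling code and returns -1. If the real check works along these lines, a constant address would point at one particular call site.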

Since the address being printed is presumably the return address of the schedule() call, and it is the same every time, I am assuming that the scheduler is being re-entered from within the same ISR while it services some sort of interrupt.
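
One way to test that assumption would be to look the address up in the System.map produced when the running kernel was built, and see which function it falls in. The sketch below assumes the usual "address type name" layout, one symbol per line. The helper name nearest_symbol is my own, and the symbol names in the usage note are invented placeholders, not real kernel symbols.

```c
#include <stdio.h>

/*
 * Given System.map-style lines ("address type name"), copy into `out`
 * the name of the symbol with the highest address that is still
 * <= target. Lines that do not parse are skipped.
 * Returns 1 if a symbol was found, 0 otherwise.
 */
static int nearest_symbol(const char *const *lines, size_t nlines,
                          unsigned long target,
                          char *out, size_t outlen)
{
    unsigned long addr, best = 0;
    char type, name[128];
    int found = 0;
    size_t i;

    for (i = 0; i < nlines; i++) {
        if (sscanf(lines[i], "%lx %c %127s", &addr, &type, name) != 3)
            continue;                    /* not a symbol line */
        if (addr <= target && (!found || addr >= best)) {
            best = addr;
            snprintf(out, outlen, "%s", name);
            found = 1;
        }
    }
    return found;
}
```

For example, given the made-up lines "0012bb00 T placeholder_a" and "0012bc00 T placeholder_b", a lookup of 0x0012BBD1 would select placeholder_a, since it is the last symbol starting at or below that address. This only gives meaningful results if the map file matches the kernel that was actually running.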

First, are my assumptions even close to reality?

Secondly, is this a "known" issue with the 2.0.32 kernel? I understand there have been some changes in the kernel SMP code between 2.0.32 and 2.0.33, so I am wondering whether upgrading the kernel will fix this.

Thirdly, does this indicate some sort of hardware failure and, if so, how can I trace it back to the device in question?

Finally, I am open to suggestions for other ideas and/or options here.

As always, any help is appreciated. Most suggestions taken seriously :)