On Tue, 2005-01-25 at 19:09 -0700, Kim Lux wrote:
> How do I figure out what is causing the problem ? I've checked the
> system logs, but they are clean.
I've had lots of crashes lately, but never an oops or panic message to
report, so I was about to ask the same question. Just to be safe I left
memtest86 running today, and it found bad RAM :(
I'm running with mem=236M for now to cut off the bad region (and
everything above it), but has there been an RFE for the badram kernel
patch? (I'm not seeing one on bugzilla, not even closed GOAWAY or
BADIDEA or whatever.) We've already got a version of memtest86 that can
spit out the badram values. Assuming the labor of maintaining it in the
patchset isn't too high, I think it's better to recognize that people
are going to use imperfect hardware and give them a way to deal with
it than to decide that everyone needs new hardware. (start flamewar
now)
http://rick.vanrein.org/linux/badram/
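
To show why I'd rather have badram than mem=, here's a rough sketch in
Python of the difference. The fault address is made up (I'm not quoting
my real memtest86 output), the helper names are mine, and the matching
rule is my reading of the BadRAM docs (an address is excluded when it
agrees with the fault address on every bit set in the mask), so treat it
as an illustration rather than what the patch actually does internally:

```python
MIB = 1024 * 1024

# Hypothetical fault address, chosen only so it lands just above 236 MiB.
FAULT_ADDR = 0x0EC12340
# Hypothetical mask: every bit significant except the low two,
# so the pattern covers a single 4-byte word.
FAULT_MASK = 0xFFFFFFFC


def mem_limit_mib(lowest_bad_addr: int) -> int:
    """Largest whole-MiB value for mem= that stays below the first bad address."""
    return lowest_bad_addr // MIB


def is_excluded(addr: int, fault: int, mask: int) -> bool:
    """BadRAM-style match: addr agrees with fault on every bit set in mask."""
    return (addr & mask) == (fault & mask)


def bytes_lost_with_mem_limit(total_mib: int, lowest_bad_addr: int) -> int:
    """RAM given up when everything from the mem= limit upward is ignored."""
    return (total_mib - mem_limit_mib(lowest_bad_addr)) * MIB


if __name__ == "__main__":
    print("mem=%dM" % mem_limit_mib(FAULT_ADDR))
    print("badram=0x%08x,0x%08x" % (FAULT_ADDR, FAULT_MASK))
    # Assuming a 256 MiB machine, purely for the sake of the comparison.
    print("lost with mem=:", bytes_lost_with_mem_limit(256, FAULT_ADDR), "bytes")
```

With mem= you throw away everything above the limit; with an
address/mask pair you'd only give up the pages that contain the matching
addresses.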
If that turns out not to be the (only) problem, what *is* the best way
to get debug info from bad crashes, where even alt-sysrq-jitsu does no
good? I know about the serial console capability; lately I've also
seen stuff about diskdump and netdump. Which of these is most likely
to survive a serious kernel problem long enough to produce a useful
report that can be bugzilla'ed?