Sup everybody. I've been getting this error ever since I got my new computer (a dual AMD MP system on a Tyan Tiger MPX board). Does ANYBODY have any idea how to fix it or what is causing it? Basically, it happens about 50% of the time right after the kernel gets loaded, and the system immediately halts. The other 50% of the time I boot into Gentoo and everything works 100% fine (great job on the Audigy drivers, btw).

I've been researching this on LNO for a while, but have come up with no answers yet. At first I thought it was a hardware issue, but my drives are all fine; I checked and tested them thoroughly. I also thought it was a cable issue (switched them out for normal ATA66/100 cables), but that didn't fix it. I've also played around with the kernel about a million times, and searched Google extensively but found nothing (except suggestions of hardware issues, which I don't believe is the case).

So basically... have any of you guys experienced this problem? I am thinking it is either a motherboard BIOS issue (I am waiting for the new BIOS to come out, it's still in beta), or a kernel driver issue with the motherboard chipset or with the Promise ATA133 controller (I tried installing twice, once on the onboard IDE and once on the Promise ATA133 card, same thing with both installs). Please help, as it is rather annoying to only be able to boot into Linux 50% of the time (as opposed to Windows' 99% [heh, notice it's not 100%...]), and I could really use some Gentoo-specific help here from y'all Gentoo experts.

One last thing, I have tried it with both vanilla and gentoo-sources kernels.
_________________
McManus
----
Linux user #267375 - http://counter.li.org

If you're getting a CRC error on boot from the kernel -- and it's sporadic -- it is almost certainly a hardware issue. Your hard drive might be going bad, you might have a bad cable, your mobo could be having issues, a lower chunk of your RAM might be flaky, etc. If the same data comes up with two different CRCs, then something is wrong.

Hrm, what do you mean by the same data coming up with 2 different CRCs? Is there a way to check? (It doesn't end up in any of my logs when I get the CRC error.)
_________________
McManus

Hrm, I am thinking it's a memory issue. However, I should note that Windows never has any problems. Well, then again... it pauses every once in a while for the ECC, but it almost always recovers from it (even in the middle of a game).
_________________
McManus

Your error indicated that the CRC computed from the read-from-disk data was different from the CRC that was stored in it, meaning that something didn't compute right somewhere along the line. Plus, it changes, because it sometimes comes up correctly (matching the stored CRC). The kernel, as stored on disk, is being read or acted upon in two different ways -- randomly -- which just screams to me of a hardware issue. Something is running too hot/too fast/too hard/too long or is simply not working. I would say to check your hard drive, RAM, and CPU; definitely stop overclocking if you are.
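
To make the "stored CRC versus computed CRC" idea concrete, here is a small sketch using gzip, which keeps a CRC-32 of the uncompressed data at the end of each file and rechecks it with gzip -t. It assumes the boot-time check behaves in a similar way; the function name crc_demo and the sample data are made up for illustration and have nothing to do with a real kernel image.

```shell
#!/bin/sh
# Illustration only: a gzip file ends with 8 bytes, a 4-byte CRC-32 of the
# uncompressed data followed by a 4-byte length. "gzip -t" decompresses the
# data, recomputes the CRC, and compares it with the stored value.
crc_demo() {
    work=$(mktemp -d) || return 1

    # Sample data standing in for a compressed image (an assumption).
    seq 1 20000 | gzip -c > "$work/image.gz"

    if gzip -t "$work/image.gz" 2>/dev/null; then
        echo "intact: CRC matches"
    else
        echo "intact: CRC mismatch"
    fi

    # Change the first byte of the stored CRC so it can no longer match.
    # Damaging the data instead would also fail the check; altering the
    # stored value is used here because the outcome is deterministic.
    size=$(wc -c < "$work/image.gz")
    offset=$((size - 8))
    old=$(od -An -tu1 -j "$offset" -N1 "$work/image.gz" | tr -d ' \n')
    new=$(( (old + 1) % 256 ))
    printf "\\$(printf '%03o' "$new")" |
        dd of="$work/image.gz" bs=1 seek="$offset" conv=notrunc 2>/dev/null

    if gzip -t "$work/image.gz" 2>/dev/null; then
        echo "damaged: CRC matches"
    else
        echo "damaged: CRC mismatch"
    fi

    rm -rf "$work"
}
```

Calling crc_demo should report a match for the untouched file and a mismatch after the byte is changed. On healthy hardware the same file gives the same answer every time, which is why an error that comes and goes points away from the file itself.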

I am definitely not overclocking, as it tends to break things, and I don't really need that much more speed. I am testing the RAM tonight, and have also tested the hard drives (and cables) extensively. If the RAM turns out to be okay, the only thing left will be the CPU. How on earth would I test it? The system almost NEVER crashes, and is very stable, WHEN I actually get into Linux, that is. Maybe it's an SMP issue? No clue...
_________________
McManus

As far as the CPU goes, you could try each CPU separately in another computer (one that doesn't have any issues) and stress it (like doing sixteen concurrent kernel compiles from a RAM disk... or something). Or (and this is less reliable) you could try each CPU individually in your own machine.
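
A much lighter stand-in for the concurrent-compile idea, as a sketch only: start several identical CPU-bound jobs at once and check that they all produce the same result. This is not a kernel compile and stresses the machine far less; the function name stress_compare, the default job count, and the workload are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: run JOBS identical, deterministic CPU-bound jobs in parallel and
# compare their results. Any disagreement means some job computed or stored
# something differently, which should never happen on healthy hardware.
stress_compare() {
    jobs=${1:-16}
    work=$(mktemp -d) || return 1

    i=1
    while [ "$i" -le "$jobs" ]; do
        # -n keeps the timestamp out of the gzip header so output is repeatable.
        ( seq 1 200000 | gzip -9 -n | md5sum | cut -d' ' -f1 \
            > "$work/result.$i" ) &
        i=$((i + 1))
    done
    wait

    distinct=$(cat "$work"/result.* | sort -u | wc -l)
    rm -rf "$work"

    if [ "$distinct" -eq 1 ]; then
        echo "all $jobs jobs agree"
    else
        echo "MISMATCH: $distinct distinct results"
        return 1
    fi
}
```

For example, stress_compare 16 runs sixteen jobs. A clean result proves little on its own, since a short run may simply not hit the fault; a mismatch is the interesting outcome.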

Do you have problems booting, say, the Gentoo install CD?

The install CD? Never any problems. And I have done like sixteen concurrent kernel compiles (tho not from a RAM disk, so er, never mind, hehe). I also wish I had the $$$ to test it in another system.

Oh, one thing I should mention. I "feel" that the system boots up more often in non-fb mode (versus me using the vga=791 line in lilo.conf) and that I get into Linux more often. Though that is only a very rough impression. It's not like I sat there with pen & paper & counted (usually I am just very happy to be in Linux, so I stay, hehe). I am using a GF4 Ti4600, so who knows, maybe it's that nVidia & AMD incompatibility thingy.
_________________
McManus

If you haven't had any issues with the install CD (why not do a half dozen reboots or so just to make sure), then it's most likely hard-drive related. Could be the cable, but that's doubtful; your disk could be failing.

Try getting md5sums of your /boot directory; i.e. mount -o ro /dev/BOOT /boot; md5sum --binary `find /boot -type f` and comparing them across reboots, both from the local disk and from the CD. You should be able to see then if the hard drive is returning the data correctly.

It'll really thrash your hard drive about but will a) give you a nice stress test and b) give you a quick and easy visual check that your boot partition is in order. My advice would be to run it once within Gentoo and once off the install CD and compare sums. If the numbers turn up different in sequential passes, you know you have a problem.
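
A minimal sketch of a per-file checksum loop in the spirit of the test described above (not the poster's exact commands). The helper names sum_tree and check_passes and the default of five passes are assumptions; md5sum --binary and find -type f are the same tools mentioned in the post.

```shell
#!/bin/sh
# Hypothetical helper: print one combined MD5 covering every regular file
# under a directory. Paths are taken relative to the directory and sorted in
# the C locale so the result is comparable between boots.
sum_tree() {
    dir=$1
    ( cd "$dir" &&
      find . -type f -print0 | LC_ALL=C sort -z |
      xargs -0 -r md5sum --binary | md5sum | cut -d' ' -f1 )
}

# Hypothetical helper: repeat sum_tree and stop at the first pass that
# disagrees with the initial result.
check_passes() {
    dir=$1
    passes=${2:-5}
    first=$(sum_tree "$dir")
    i=1
    while [ "$i" -le "$passes" ]; do
        current=$(sum_tree "$dir")
        if [ "$current" != "$first" ]; then
            echo "MISMATCH on pass $i: $current != $first"
            return 1
        fi
        i=$((i + 1))
    done
    echo "OK $first"
}
```

With /boot mounted read-only, something like check_passes /boot 10 would do ten passes. Repeated reads within one boot may be served from the page cache rather than the disk, so comparing the printed sum across reboots (and against the install CD) is the more telling check.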

That's a cool little test, you should put that into the tips and tricks section.

I just ran it on my drive to make sure, and it definitely works.

Well, that's actually the hard way. It checks each of the individual files separately, and tends to stress things more (which is probably what we want in this case). You can md5 the whole partition like this:

Code:

dd if=/dev/hda1 2>/dev/null | md5sum

But, that's just one sequential read of the whole partition (which is less error-prone, but we want to find errors), so if it's got a lot of free space you'll run into problems.

Okay.. I will try your "little" test. But one quick question: why is it that if I can boot the install CD's kernel fine but not my own, then it's a HD issue? Why wouldn't that be a "misconfigured" kernel issue?
_________________
McManus

ahhh..... that's NOT an apostrophe.. it's the thing next to the 1...

btw, I get different results with the for-loop (for blah blah blah) than with the:

dd if=/dev/hde1 2> /dev/null | md5sum

but.. both are consistent across the board with themselves. It seems like, tho, that every time I do it I get different results... weird
_________________
McManus

Well, I'm going to bed, but here: trust the big long for loop thing, because if you md5 the whole partition, you'll also sum the journal and mount count and whatnot which can and probably will change when you run the big loop.

So, if the big loop at any time ever produces a different number -- across passes, retries, or reboots -- then something is definitely wrong, and it's likely your hard drive. You should get the same number booted from Gentoo as you would booted from the rescue disk, as long as the partition is always mounted read-only.
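
One way to compare results across reboots, as a sketch: save the per-file sums to a manifest once, then verify against it later with md5sum --check. The helper names save_manifest and verify_manifest are hypothetical, and the manifest is assumed to live somewhere outside the partition being checked (which is mounted read-only anyway).

```shell
#!/bin/sh
# Hypothetical helper: write "checksum *path" lines for every regular file
# under DIR into MANIFEST, with paths relative to DIR.
save_manifest() {
    dir=$1
    manifest=$2
    ( cd "$dir" &&
      find . -type f -print0 | LC_ALL=C sort -z |
      xargs -0 -r md5sum --binary ) > "$manifest"
}

# Hypothetical helper: re-read every file listed in MANIFEST and compare.
# Returns non-zero, and names the failing files, if any sum differs.
verify_manifest() {
    dir=$1
    manifest=$2
    ( cd "$dir" && md5sum --check --quiet ) < "$manifest"
}
```

For instance, run save_manifest /boot /tmp/boot.md5 once, copy the manifest somewhere that survives a reboot, and run verify_manifest /boot against it after booting from the hard disk and again from the CD. Because only file contents are summed, journal and mount-count changes in the filesystem do not affect the result.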

It seems like, tho, that everytime I do it I get different results... weird

You mean that the command loops are internally consistent (i.e. between passes) but not consistent between runs (i.e. rebooting or whatever and running the loop again)?

Wow, it's great being able to get help @ 1 am

What I mean is that when I ran the dd if= test, then the big for loop, and then the dd if= test again, I got different results for the dd if= test. The for-loop test always stayed the same, though.

I am running memtest86 right now, seeing if I can find something wrong with my RAM. Overall, I bet it's just some archaic incompatibility between my mobo's BIOS and the kernel.
_________________
McManus

Yeah, I'm going to bed too. Thanks a lot for the help.

Thanks for clearing up the diff. between the two tests. Tomorrow evening I will test out the md5 for loop test whilst booting off of the install CD. I think the HD & cables are fine, tho.

BTW, how are you supposed to mount /boot? I am using defaults 1 1, or should it be defaults 1 2 or defaults 0 0 ?? Prolly doesn't even make a difference, oh well. I bet it's that weird AMD incompatibility thing with NVidia cards or something like that, 'cuz I don't get CRC errors as OFTEN when I'm not using fb. (Because the only way I can tell it's a CRC error is when I'm not in fb mode, 'cuz when I am all I get is a blank screen.)
_________________
McManus

As far as framebuffer support goes, if the kernel is giving you CRC errors, that (AFAIK) happens before pretty much anything else, including initializing framebuffer stuff.

Er, well, I meant that I can't see what kind of error it is when I use framebuffer mode because it never pops up. The screen changes for fb mode, but nothing comes up on the screen. I only found out (a week ago?) that they were CRC errors after I turned off fb mode.

Update: I ran memtest86 just fine last night when I went to bed, but that was with ECC off. After I turned it on, I started getting a few ECC errors (which were corrected, according to the test). But get this.. this is an excerpt from Crucial about my motherboard:

"The Tiger MPX supports non-registered DDR SDRAM in the first 2 memory sockets only (DIMM1 and DIMM2, as labeled on the motherboard). Registered DIMMs are supported in all sockets. "

So.... maybe it just doesn't like that configuration. I will try memtest86 on a different DIMM slot tonight and see how things work. But either way, either the mobo has some BIOS issues/quirks or I need new RAM (and I think this is where my troubles lie).
_________________
McManus

Okay, I really think my memory is bad. When I run memtest86 I get errors at failing address 00000000000 (I guess the very first location). *sigh* I have bought new RAM, and am going to get the old one RMA'd.
_________________
McManus