Bad RAM or bad CPU + random freezing

I have two issues I believe are unrelated, but I will hopefully get them resolved in this thread.

My first issue is more a lack of skill on my part. I've been Googling around and looking through the logs I have, and I don't see anything that helps me find an answer.

I have a system with 2x Opteron 2435 and 8x1GB of DDR2 400 registered RAM. I am getting a LOT of ECC correction messages. My first thought was to take all the memory out and try the system with each DIMM individually. I did that and thought I had found the culprits: 2 DIMMs. I don't think that is the case anymore. Here is what I noticed.

When I boot my system, everything usually goes fine; I did once notice an ECC correction message during boot, but only that one time. Once the machine is up, I start X and browse the web for a bit. If errors are going to appear, they usually appear within the first 5 minutes. I spent an hour or more browsing the net with each DIMM. With 2 of the DIMMs, I get ECC messages within 5 minutes, and about 10 minutes after the first message there are about 50 corrections. The other 6 DIMMs seem fine.

So I put the 6 DIMMs in and I go about my merry way. I boot the system and within 5 minutes, again, ECC correction messages. So I tried 1 CPU and then I tested the 6 good DIMMs again. They all tested fine. So I tested them in pairs. That seemed to be fine. Once I go beyond 2 DIMMs it seems that ECC messages like to pop up.
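For anyone repeating this kind of DIMM-by-DIMM test, tallying corrections per memory controller and chip-select row can be easier than eyeballing the log. Below is a minimal sketch, assuming the system runs Linux with an EDAC driver loaded and the legacy csrow layout present under /sys/devices/system/edac/mc. The function name read_ce_counts and its root parameter are made up for illustration, and mapping a csrow back to a physical DIMM slot depends on the board.

```python
from pathlib import Path


def read_ce_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mcN/csrowM": corrected_error_count} for every csrow found.

    Assumes the Linux EDAC sysfs layout mc*/csrow*/ce_count; 'root' is a
    hypothetical parameter so the function can be pointed at a test tree.
    """
    counts = {}
    for ce_file in sorted(Path(root).glob("mc*/csrow*/ce_count")):
        # Label each counter by its memory controller and csrow directory.
        label = f"{ce_file.parent.parent.name}/{ce_file.parent.name}"
        try:
            counts[label] = int(ce_file.read_text().strip())
        except (OSError, ValueError):
            # Skip counters that cannot be read or parsed.
            continue
    return counts


if __name__ == "__main__":
    for label, count in read_ce_counts().items():
        print(f"{label}: {count} corrected errors")
```

Running it once after boot and again after the browsing session, then comparing the two sets of numbers, would show which rows are accumulating corrections for a given DIMM arrangement.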

Here is 5 minutes of errors. Coincidentally the first error happens at nearly 1 second into the 5th minute of the system running. This is just a coincidence. You will notice that sometimes it's only one message and sometimes it's many messages.

Only two things remain the same: the operating system and the hard drive.

I would love to get to the bottom of the hard drive issue, but the memory diagnostics would be of more interest since those messages are more annoying...although random freezing seems to be quite annoying as well.

[EDIT]
Random freezing still occurred while I only had 2 DIMMs and no ECC messages.

As you may have noticed from my nick, you would be right in assuming that I live in the boonies. We experience very high lightning activity.

I have lost at least three loaded computers due to storms, and my hunch is that part of your system has been fried (maybe not by lightning though). It sounds as if that CPU is a goner if the other CPU does act up.

You might want to google the CPU and the memory combo and see if anything shows up, then add your OS as a kicker.

Back to lightning: a lot of my woes did not show up until the winter. Then I'd experience what you've experienced, and on my last beauty, it would just turn itself off randomly. (Gotta watch the modem/router connections -- they can zap a machine rather quickly and thoroughly.)

1) Their website doesn't contain the pertinent revision information: only the LAST revision of the motherboard model H8DAE-2 supports the Opteron 2400 series. It does, however, say the board supports hex-core Opteron 2000 series CPUs (which are only available in the 2400 series). Revision 2.01a supports 2400s; I have 2.01.

2) They won't tell me what component change is required and they want me to set up an out-of-warranty RMA. They want to charge me $45 to diagnose the issue and they won't tell me how much it will cost unless I send them the board first.

3) They're not sure if it's a BIOS issue or a component change...well some of their support techs don't know.

Unsatisfied customer that will never buy another SuperMicro motherboard ever again.

P.S. Wouldn't a surge protector with built-in LAN surge protection resolve your issue with lightning? $35 is a small price to pay to save so much headache.

Hello, I've recently been following this thread after I ordered an SM H8DAE-2 board with dual-core Opterons and some memory.

I think what I got was an old revision, so apparently six-cores either would not work at all or would POST but not be stable. When you say "resolved" in your previous post, does this mean you made six-core Opterons work on an older revision with just a BIOS update?