I got a new system which will eventually make its way to my nephew, but in the meantime I wanted to run Folding@home and grab a few extra points.

Problem is the system reboots itself like clockwork after processing 2% of a WU.

I'm running the newest 6.22 Deino beta client in console mode.

I've shut off the Windows reboot-on-BSOD option, so that's not the issue. I've checked temps: CPU cores run 45-48 and the GPU is 39-40.
Memtest86 ran for 1 hour with no errors.
I can play WoW for hours on end with no problems.

Your northbridge may be overheating or have inadequate voltage, or even both. If you grab the Northbridge heatsink with your fingers you should be able to maintain grip, but if you must let go in like 10 seconds or less it's probably too hot.

Run CPU-Z while you're running Prime 95; that way you see voltages under load. DDR2-1066 rarely runs at the board's lowest voltages at 1066. You could configure it as DDR2-800 and see if that gets you past 2 checkpoints.

I've run Prime 95 for 18 hours before my first failure. A small bump in memory voltage didn't fix it, but a small bump in NB voltage did.

There is also a non-DeinoMPI version of SMP, that's the one I use without any issues.

Welp, that's the version of Prime95 I was running. It ran for 24 hours with no issues.

I had a feeling you'd say that

Good luck with the Gromacs stress tester! I wish I could help you more, but I've never had any SMP folding hang-ups bleed into the operation of the rest of my computer...

EDIT: On an unrelated note, wouldn't it be much better to Fold on the GPU and one core instead of messing with SMP? If you ever get fed up with tracking down this bug, ignoring the problem and avoiding its causes may be a good option...

In theory yeah... but
a) this PC isn't going to stay around all that long. It's getting a new home come Christmas.
b) I have other issues with the GPU client (nvidia + XP x64 ~ 10ppd).

And lastly, the reboot problem has me worried there's something buried in the hardware. If it was just a crashing FAH client, I wouldn't feel so bad.

Updates:
I ran the gromacs tester for about 4 hours with no issues, then I fired up the SMP client and it rebooted again right on schedule after processing 2% of the WU.
So it's now back to running 2 single-core instances.

Now the single-core FAH processes are causing the PC to reboot too, where before I was able to complete a few WUs.

1 step forward... 3 steps back.

I messed with the BIOS settings a little, then realized I had no clue what I was doing, so I reset the CMOS back to its defaults and went from there. No change in stability.
However, I notice that CPU-Z shows a core frequency of 2GHz when this is supposed to be a 3GHz CPU. The BIOS shows 3GHz on startup, so I'm confused. Is CPU-Z outdated? Or is there some funky power-saving mode going on even though something is running?

This may be apocryphal, but I recall hearing that Folding at Home will not kick a processor out of SpeedStep or Cool'n'Quiet when it's set to run at Idle priority. If CPU-Z reports that the voltage is also lower than what you're expecting, then we've identified at least one part of a potential suite of problems. Try monkeying around with the power-management console in the Windows control panel (or disabling "Enhanced EIST" or somesuch in the BIOS), or setting Folding@home's Core "Priority" to low. The last option might be easiest...

The CnQ settings might have something to do with it as well.
I've gone into the Windows power control panel applet and set everything to "always on".
I do see the voltages in CPU-Z jump up to normal-looking levels when I start FAH. But if what you're saying is true, then the system may try to drop back to CnQ mode later on and be playing havoc with things.

I'll look for something relating to 'EIST' in the BIOS and try to disable it when I get home tonight.

Any other names that setting can go by? The BIOS in this motherboard has ZERO documentation, and was not written by English-speaking people, so other abbreviations may not make sense to me.

Someone there can probably help you out with bios settings. Some of the helpful people who used to hang out on DFI-Street (later DIY-Street; later still merged with http://forums.overclockersclub.com/) are over there now.

Well, I found an EIST setting in the BIOS and turned it off, but there was no visible change in the reboot frequency.

Is cpu-z still reporting that your CPU is running at a lower speed than you expect while you're folding?

Quote:

Somebody in the FAH forums mentioned fussing with the memory divider. Not sure where that is; nothing in the BIOS talks about a divider of any kind.

Just from looking at RGone's pictures, I'm positive that the memory divider is controlled by the "DRAM Speed" setting on the first Genie Bios page.

Quote:

I'm way beyond the info posted in that thread. What I need is the WHY of all those numbers and settings.

I saw you posted over at DFUClub, I should have warned you to make a signature! Sorry!

Anywho, DFI boards are top-notch performers but they're notoriously exacting and fussy. I think the only place where you're guaranteed to find people familiar with the quirks of your board is over there. I'm not surprised that RGone blamed your power supply, but maybe someone else there will humor you.

Interesting on the power supply too - I may have to go to Fry's and buy an 800W power supply for "testing", or maybe I could try my server-grade PSU that was replaced with a quieter one - it's only 420W but it has a lot more amps on all the channels.

I did some more fiddling last night and I'm getting the impression that something is indeed failing, because the reboot timer is now random instead of at the same spot all the time.
I also managed to find the NB heatsink -- buried between the graphics card and the HSF. It was barely warm to the touch, so no heat issues there.

I mentioned it because RGone has a history of mistrusting Seasonic S12s. He could very well be right, but in my very, very limited experience (one DFI board (a Lanparty UT NF4 SLI-DR) and one old-school Seasonic S12 500W), the power supply was up to the task.

Hi, I have a few random thoughts. The PSU, as long as not faulty somehow, is way big enough. I estimate your system at 150-200w full load. By all means try another PSU for testing. The E8400 is 9x333 = 3GHz. EIST (Enhanced Intel Speedstep Technology?) allows it to run at 6x333 = 2GHz when idle. I think XP can only set 6x or 9x but I think when I was running Vista it could set 7x & 8x as well if it saw fit. (I have E6600, 9x266 = 2.4GHz). You can likely fix the CPU multiplier in the BIOS.
You list your memory as 1066; have you tried setting the DRAM frequency lower, 800 or even 667? Try with one DIMM at a time, and try different memory slots. Have you tried under-clocking, maybe setting the CPU for 9x266 = 2.4GHz to see if that makes any difference?
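To spell out the multiplier arithmetic above, here is a minimal Python sketch. It assumes the round 333 MHz and 266 MHz bus figures quoted in this post (the real base clocks are fractionally higher), and the function names and the 5% tolerance are made up for illustration; they are not anything CPU-Z or the BIOS exposes.

```python
def core_clock_mhz(multiplier, fsb_mhz):
    """Core clock in MHz = CPU multiplier x FSB base clock."""
    return multiplier * fsb_mhz


def looks_throttled(reported_mhz, multiplier, fsb_mhz, tolerance=0.05):
    """True if the reported clock sits more than `tolerance` below the
    expected full-speed clock (the 5% default is an arbitrary assumption)."""
    expected = core_clock_mhz(multiplier, fsb_mhz)
    return reported_mhz < expected * (1 - tolerance)


# Multiplier / bus combinations mentioned in the post.
cases = [
    ("E8400 full speed", 9, 333),
    ("E8400 with EIST at idle", 6, 333),
    ("E8400 under-clocked", 9, 266),
]
for label, mult, fsb in cases:
    ghz = core_clock_mhz(mult, fsb) / 1000
    print(f"{label}: {mult} x {fsb} MHz = {ghz:.1f} GHz")

# A 2 GHz reading on a chip that should run 9 x 333 counts as throttled.
print(looks_throttled(2000, 9, 333))
```

If CPU-Z keeps showing the 6x figure while the machine is under full load, that points at the power-saving multiplier rather than an outdated tool.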
What OS are you running? (sorry if you've mentioned it and I missed it)
Have you reinstalled the OS, or tried a different one?
Personally I find Memtest86 isn't that stressful and it can fail to show errors for hours; I'm barely that patient! I find Orthos to be a pretty quick way of picking up stability problems, and 3D Mark is also quite sensitive. I also find the Windows install process to be quite a good test: if it goes through fine the system is likely stable, and if the Windows install fails randomly it usually indicates a hardware problem.
One other thing I've found is that you can have problems from lack of Southbridge cooling. /Long Ramble My Dad's Abit NF-M2 AMD system wasn't very stable for about the first year! It would be fine for a day or two or three and then crash to a blue screen. Putting it under load, e.g. Folding@home, tended to make it worse. I eventually decided to investigate properly and found that heavy cooling of the north and south bridges fixed the stability issue. The size and fit of the northbridge cooling seemed fine, but the southbridge cooler was pretty weedy and once off only had a tiny dot of thermal paste under it. Replacing it with an old, much bigger northbridge heatsink and good thermal paste has fixed the issue and the machine is now stable, overclocked, folding 24/7 for days on end! My current PC (Asus P5B-E Plus) has had the southbridge heatsink swapped for a Zalman NB47J. I don't think I had issues as such, but I did notice errors in the System Event Log relating to disk issues, and I don't have these any more. I had a similar experience with my previous AMD Socket A system, except that one could actually crash. A better sink on the southbridge seemed to help. It may well be that my quiet systems with limited air flow don't cool the southbridge as the motherboard designer intended. /Long Ramble.
Anyway, good luck!
Seb

Swapped the PSU for my noisy Supermicro server-grade PSU - no changes
Moved the video card to a different slot - no changes
uninstalled video drivers - no changes
put in my old Asus EN6800 card - no changes
moved the old card to the 2nd PCIe slot - no changes
ran with 1 stick of ram - no changes
swapped the memory sticks - no changes

I also noticed the problems seem to be behaving like a heat issue. From a cold start, Folding@home will run for an hour or so, then reboot. After that, the reboots come much more frequently until I run out of patience and just shut the thing off.
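One way to put numbers on that pattern is a heartbeat log: have something append a timestamp to a file once a minute, then look for holes in the log after each reboot. The sketch below covers only the analysis half. The function names, the one-minute period, and the sample timestamps are all made up for illustration; they are not measurements from this machine.

```python
from datetime import datetime, timedelta


def find_gaps(stamps, period_s=60, slack=2.0):
    """Return (last_seen, next_seen) pairs where two consecutive
    heartbeats are more than slack * period_s seconds apart."""
    limit = timedelta(seconds=period_s * slack)
    return [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > limit]


def uptimes_minutes(stamps, period_s=60, slack=2.0):
    """Minutes of continuous running before each gap (i.e. each reboot).
    The final run is not reported, since it never ended in a gap."""
    uptimes = []
    if not stamps:
        return uptimes
    limit = timedelta(seconds=period_s * slack)
    run_start = stamps[0]
    for prev, cur in zip(stamps, stamps[1:]):
        if cur - prev > limit:
            uptimes.append((prev - run_start).total_seconds() / 60)
            run_start = cur
    return uptimes


def sample_run(start, minutes):
    """Hypothetical data: one heartbeat per minute for `minutes` minutes."""
    return [start + timedelta(minutes=i) for i in range(minutes + 1)]


# Invented example: an hour of uptime from cold, then a much shorter run.
t0 = datetime(2008, 1, 1, 10, 0)
stamps = (
    sample_run(t0, 60)
    + sample_run(t0 + timedelta(minutes=65), 15)
    + sample_run(t0 + timedelta(minutes=84), 6)
)
print(uptimes_minutes(stamps))
```

Uptimes that shrink from one reboot to the next would fit the heat theory, while uptimes that jump around at random would fit the "something is failing" theory.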

@SebRad
I fiddled with the memory timings a little, but I know very little about what to do and what not to do with respect to setting up memory timings.
I think I briefly tried underclocking the CPU to 2.8GHz, but again no change in behavior.
The memory is listed as 1066 but I want to run it at 800, and that's the way the system was shipped to me.

I'm running Win XP with SP2.

As for the cooling - where is the SB and/or NB on the motherboard? How do I tell one from the other? Where does one find acceptable temps? The monitoring tools have a "chipset" temperature which sits steady at 40-41C no matter what I'm doing.
There are 3 extra heatsinks on the motherboard: one with a tiny black heatsink, and 2 that are connected via a heatpipe near the CPU socket. None of them seem to get real warm while the system is running. I can easily grab hold with no fear of burning.

I can try 3dmark and/or orthos to see what happens. I assume they are freely downloadable?

They're named "North" and "South" because of their relative positions in a standard tower case - the closer chip to the top of the case is the northbridge or Memory Controller Hub (MCH), and the other is the southbridge or I/O Controller Hub (ICH).

Disjointed thoughts:
Is Orthos necessary now that Prime95 25.6 is around?

Looping test 5 will find errors faster than running the whole Memtest86+ suite. Or at least that's what the old Athlon 64 Overclocking Guide on DFI/DIY Street (and a little experience) taught me.

OK - SMP folding launches four processes and the Gromacs stress tester only launches 2... If two instances of the tester won't crash your system, we're in the same boat we've been in all thread. If they do cause a reboot, you'll have your first result that can eliminate MPICH, Deino, or something else closely tied to Folding@home as the culprit.

EDIT: Nevermind - I forgot that single core folding causes you to reboot, too.

OK, thanks for the NB vs SB descriptions. I can figure out which one is which by looking at the motherboard manual.

And there are actually 3 controller chips on the board with cooling fins on them. None of them get hot enough that I can't grab and hold tightly while the system is under load.

I also ran a raytraced video render Monday night and all day yesterday. It completed without any issues. This one would stress the CPU and, a little bit, the HD.

Seeing as all the other stress test programs run w/o issue, I highly suspect some kind of software issue with FAH. The guys at the FAH forums are starting to admit they are stumped, so I see that as a little progress.
