About my unstable NV host again...
It has entered a period of increased instability again.
Blue screens or reboots happen almost immediately after login, or even before login (BOINC runs as a service).
And this happens under both installed OSes, Win2003 Server x64 and Win7 x64.

I booted Win7 into safe mode and moved BOINC's data folder, so I was able to reboot into Win2003 Server w/o a BSoD.
Then I started testing the GPU with MSI Afterburner.
The burn test (like FurMark) ran ~10 mins; GPU temp went above 70C, GPU load was 98% or more, one CPU core was completely busy... and no BSoDs/restarts.

But when I restored the BOINC setup (configured to run 1 CPU core + GPU), a BSoD happened almost immediately.

So, the puzzle is: in what way does the system load from FurMark/MSI Afterburner differ so radically from the BOINC load?

IMHO power draw from the PSU should be even higher with the burn-in test...
Unfortunately, I can't measure power directly, but GPU temperature was lower with the CUDA app....

Assuming the card or anything else isn't broken:
If you applied Windows updates since June/July this year, then you have a fairly major technology mismatch (as far as Cuda is concerned) between Windows and the old driver. There are substantial changes to texture/font cache management, most of which would be resolved by using the newest WHQL driver [clean install advanced option] & the x41zc public beta application. These synchronisation issues aren't 'correctable' with the old setup, as they are deemed critical security issues (hence the BSOD), and are a function of the evolving landscape of gpgpu technology.

Happy new year,
Jason
"Living by the wisdom of computer science doesn't sound so bad after all. And unlike most advice, it's backed up by proofs." -- Algorithms to live by: The computer science of human decisions.

The latest 310.70 NV driver has been getting good comments.
You might try a clean install of that on the Win 7 OS, as Jason suggested.

I updated my Win 7 rig (my daily driver) to it a few days ago, and it seems to be working very well. I am still running the x41z app, but will be updating that soon as well.
Always remember.....kitties are all Angels with fur.
'Cat lives matter.'

Running legacy GPUs on the 304.xx and later drivers introduces quite a slowdown in the x41 Cuda32 and Cuda42 apps, and the Cuda5 app is even slower; the Cuda22 and Cuda23 apps don't seem to be affected
(at least on my 9800GTX+ Win Vista x64 host; I haven't managed to get anyone with a GTX2** GPU to run similar benches yet). That's why my 9800GTX+ runs the 301.42 (Cuda42) drivers. My last posted bench: Message 1284483

Claggy

I believe Jason recommends 2.3 for 200 series cards.

I tried 310.70, the beta version, and got a BSOD; it might have been a problem with my hardware before I thoroughly cleaned out the PC. I'm now running 306.97 x64 on Win 7 Pro x64 with x41zc. I see regular slowdowns, from around 10 minutes to about 18-20 minutes per WU crunched, and temps fall from the low-to-mid 70s to the mid 60s when this happens. I'm also using BOINC 6.10.58 x64 and BoincTasks 1.44 x64. I don't know if the author sees this as important or not, but it should be looked into.

All this happens on an EVGA GTX590 Classified (model #1598, in fact). I run from 7pm to 7am in the winter and 8pm to 8am the rest of the time, with the fan at 100% (not a mere 95%) using Precision X 3.04. I do have clean 12V power going to the PCIe bus, as I have an EVGA Power Booster x1 PCIe card in place, so plenty of power is going to the GTX590. I get driver crashes once a day when doing Seti, but the card just picks up and just keeps going. This isn't a complaint...
Batman: Some days you just can't get rid of a bomb.

I get driver crashes once a day when doing Seti, but the card just picks up and just keeps going.

Did you try increasing the watchdog timer value via the Windows registry?
A driver restart can be caused simply by a kernel call (or sequence of kernel calls) that runs too long. If so, increasing that timer value will solve the problem or at least make driver restarts less frequent.
SETI apps news
We're not gonna fight them. We're gonna transcend them.

If this is the DCI value of 7, then what do you suggest, Raistmer? Would 60 be alright?

1) TDR only applies to GPUs with an active display connected. So if the issue is TDR related (at all), it should only show on particular GPUs with a monitor connected. There are inbuilt settings in x41zc for controlling process priority and the pulsefinding (pf) parameters, either globally or for individual GPUs. To use them you create a mbcuda.cfg text file in the project directory & reference it in the app_info.xml, as per the provided example mbcuda.cfg. As the default settings are conservative (for Pre-Fermi: belownormal, pfblockspersm=1, pf=100), I doubt this is an issue unless there is something particularly unusual about the particular system, but a stripped down example to reduce the settings while retaining abovenormal process priority would look like this for global control:

2) The TDR period is pretty long by default on XP, like 10 seconds or something. As this hasn't been reported by others to particularly manifest with default settings on newer OSes, which have a much shorter TDR timeout period, I would recommend investigating/diagnosing all hardware and BIOS settings in detail, as well as applying the reduced settings as in 1 while diagnosing.
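
For anyone wanting to try the settings from point 1, here is a rough sketch of how such a global mbcuda.cfg could be generated. It is not the shipped sample: the `[mbcuda]` section name and the `processpriority` / `pfperiodsperlaunch` key names are assumptions (the post itself only names `pfblockspersm` and "pf=100"), and `build_mbcuda_cfg` is a made-up helper. Check the names against the example mbcuda.cfg that comes with x41zc before relying on it.

```python
import configparser
import io


def build_mbcuda_cfg(priority="abovenormal", blocks_per_sm=1, periods_per_launch=100):
    """Return the text of a minimal global mbcuda.cfg.

    Section and key names other than pfblockspersm are assumptions;
    the numeric defaults are the conservative Pre-Fermi values quoted in the post.
    """
    cfg = configparser.ConfigParser()
    cfg["mbcuda"] = {
        "processpriority": priority,                   # assumed key name
        "pfblockspersm": str(blocks_per_sm),           # named in the post
        "pfperiodsperlaunch": str(periods_per_launch), # assumed name for "pf"
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Print the file contents; save them as mbcuda.cfg in the project directory.
    print(build_mbcuda_cfg())
```

The values here only mirror the defaults mentioned above with the priority raised; adjust them to taste while diagnosing.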

HTH
Jason

In Windows 7 x64 the Timeout is set at 7; I set it to 60. I also did the following:

What I did was add the following DWORDs to the registry (using "regedit"): under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\ I added "TdrLevel=0" and "TdrDelay=10", and under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Timeout I changed the "Timeout" value to 0x60.

I put both newly created 64 bit Dwords in the same folder as the DCI Timeout, and afterwards I rebooted the PC into Windows 7 Pro x64. If one has a 32bit Windows OS, one would use 32bit Dwords by default. Whether this is the right place or not, I don't know.
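
On the "right place" question: as far as Microsoft's TDR documentation goes, `TdrLevel` and `TdrDelay` live directly under the `GraphicsDrivers` key (not the DCI subkey) and are plain REG_DWORD values, which are 32-bit even on 64-bit Windows. Below is a minimal sketch that builds a .reg file for that location. `tdr_reg_text` is a hypothetical helper name, and the 10 second delay is just the value from the post. Note that .reg files store DWORDs in hex, so 10 is written as 0000000a, and a value entered as 0x60 is 96 decimal, not 60.

```python
TDR_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_text(tdr_delay_seconds=10, tdr_level=3):
    """Build .reg file text that sets TdrDelay and TdrLevel as REG_DWORD values.

    tdr_level=3 is the Windows default (detect and recover); 0 turns TDR off.
    """
    for value in (tdr_delay_seconds, tdr_level):
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("REG_DWORD values must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{TDR_KEY}]",
        f'"TdrLevel"=dword:{tdr_level:08x}',          # hex, zero-padded to 8 digits
        f'"TdrDelay"=dword:{tdr_delay_seconds:08x}',  # seconds before a timeout fires
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(tdr_reg_text())
```

Leaving TdrLevel at 3 and only lengthening TdrDelay keeps driver recovery enabled; a reboot is needed before the new values take effect.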

Completely disabling CPU again makes the host much more stable. Looks like it's a hardware problem after all. Need some burn-in CPU tests to check.

As well, as a side note on the original thread issues:
I had a stark reminder today on my i5 w/GTX560ti, with a BSOD, that it needed a cleanout & reapplication of heatsink goo. As it uses the stock heatsink, which IMO is too small, any kind of paste tends to dry out over a few months, so it needed a good going through. Combined with several months of dust bunnies, that was enough to cause its only issues.

It's winter here now, and the crunchers heat my house...
But during the summer months, any time I get a rig that starts to act up in any way.......the first thing I do is shut it down and clean the kitty furs out of the heat sinks. Many times, that is all that is wrong.

Here is what AMD recommends for disabling the watchdog timer under Vista:

Under Windows Vista, to prevent long programs from causing a dialog to be displayed
indicating that the display driver has stopped responding, disable the Vista Timeout Detection
and Recovery (TDR) feature, which is trying to detect hangs in graphics hardware. To do this,
use regedit.exe to create the following REG_DWORD entry in the registry, and set its value to 0:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel
This avoids the constant polling by the driver and the kernel to prevent long work units from
monopolizing the device. (To restore default functionality, set the TdrLevel to 3.)
Note that Microsoft strongly discourages disabling this feature, and only recommends doing
so for debugging purposes. Do so at your own risk.

But, as Jason stated, try tuning the app first. This measure is just to check whether an overly long kernel call is behind the problem or not.

I went with what I'd found and I've not had one video driver crash since, so I won't be disabling anything, as I'm happy right where the PC is set.