I had the same problem. I am running W7 64-bit, nVidia GTX 470. Had the problem with nVidia driver package 296.10. Removed it and installed 280.26 and the problem went away. Tried that twice just to make sure. There seems to be a problem with 296.10 as far as the 400 series of video cards goes, at least. I hope this helps. In years past this sort of problem often popped up; that's why I always keep the last two revisions of a driver package. Saves having to download them again in case of problems. This is the first that I've heard of the "sleep bug", but it sounds like the culprit. D

Driver 301 seems to work fine but is slower than 280. Milkyway@home nVidia completion times increased from 4'50" to over 7'. That's roughly 2.5 minutes longer for those work units. Unacceptable. Think that I'll go back to the 280 drivers for now.

Unbeknownst to me, the GPU has apparently been kicking out lots of errors...

Rig is a Win 7 x64 with two nVidia 560Ti in SLi, 16GB DDR3 and all top notch components.

Have been a SETI participant since 1999 and have never had any dramas, until some guy from the US messaged me the other day saying my cards are sending lots of errors and have a serious problem!

How he knows is beyond me; why he would care, same!

Sure, none of us wants to be a nuisance for the project, but I would have thought the system would have alerted me directly if there was anything untoward.

Now, my machine is in tip-top shape and is also used to play some seriously graphics-intensive, latest-gen video games (not at the same time as crunching). The nVidia cards are always run on the latest stable driver available and never BSOD, freeze or drop frames anywhere.

Hibernation is physically turned off (elevated command prompt > powercfg -h off) and the power scheme is "High Performance" with sleep disabled.
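
Roughly, the equivalent commands from an elevated prompt would be something like the following (the plan alias and the exact -change syntax are from memory and may vary slightly between Windows versions, so treat this as a sketch):

    :: disable hibernation entirely (also removes hiberfil.sys)
    powercfg -h off
    :: switch to the built-in "High performance" plan (SCHEME_MIN is its alias)
    powercfg -setactive SCHEME_MIN
    :: never go to sleep while on AC power (0 = disabled)
    powercfg -change -standby-timeout-ac 0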

So now this guy has got my attention... I have a look at my results and find that most of my CUDA Fermi WUs are inconclusive or end in a computation error. Most errors appear to be "-9" (result overflow?) or some other triplet result exceeding the maximum allowed by the CUDA application; so I am not sure it actually is a graphics card problem.

Nevertheless, I decided to come out here and ask the question, to see whether I could really be harming the project and whether there is something I can do about it.

Any help clarifying this issue would be highly appreciated.

Cheers

Paul
Meerkat

The reason the person knows is that he is paired up with you on a work unit and saw the excessive inconclusives.

The -12 errors are not a problem with your card so they can be dismissed.

The -9 errors are usually caused by a factory-set voltage that is slightly too low. The cards run fine for games, but produce errors when running full out.

Thanks, done a little reading and a little tweaking over the past couple days.

I have separated the two cards in AfterBurner, so I can tweak the settings separately as they are not exactly the same.

I raised the voltage ever so slightly, lowered all three clocks (Core, Shader and Memory) considerably, and raised the fan speed slightly, thus reducing the peak temperature during 97% GPU usage.

It seems that most of the errors have disappeared since the first tweak was applied a couple of days ago; now I can only wait to see what results the latest 1100-odd pending WUs will return.

One thing I must disagree with is the assumption that the cards do not run full steam when playing games: trust me, when you are in Battlefield 3 with the highest detail settings on, the fans go full speed, the cards are both running flat tack and the heat from the 200mm exhaust fans can be felt; so, if there are no errors then, I do not believe the cards are responsible.

As a further comparison, I looked at Einstein@Home, which also uses CUDA: never any errors at all... so I am afraid the common denominator here is either the SETI app or the CUDA implementation with these particular cards.

One thing I must disagree with is the assumption that the cards do not run full steam when playing games ...
Happy to be proven wrong.

1) You can't be sure that a game loads all the computing parts of the GPU all the time
(you can 'run full steam' jumping on one leg; you will sweat if you do that for a long time, but one leg will be 'idle').

2) Games can tolerate GPU computing errors (wrong bits); all that may happen is a pixel with the wrong colour or some other small intermittent glitch for a split second.

3) The SETI CUDA app tries to use all the computing units (shaders), especially with VHARs, as the computation is highly parallel, and it needs mathematically accurate results.
(Of course SETI, like all CUDA apps, does not use some parts of the GPU, e.g. the texturing units, video decoder, ...)

- Get OCCT and test your GPU for errors (at the previous settings (MHz, V), with no BOINC running; use several 'Shader Complexity' values to find the 'best'): http://www.ocbase.com/

While I am on this, I may as well ask a seemingly silly question that has been bugging me...

Why is it that SETI@Home only uses one graphics card?!?

It does seem a real waste that, although there are two identical 560Ti cards, linked in SLi, recognised by the PC as such and equally used (with great results) by all high-end video games, the SETI@Home routines always utilise one and the same card (#2).

Surely there must be a relatively simple way of getting the application either to share the unit load between the two GPUs or to run two units at the same time, to avoid stressing and heating up only one card...

You can actually create a configuration file to make BOINC recognize all GPU cards in a system. Once BOINC recognizes all GPUs, the projects can then assign tasks to each GPU.
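
As a rough illustration (a minimal sketch using the standard BOINC option, not tailored to any particular setup), such a cc_config.xml could look like this:

    <cc_config>
      <options>
        <use_all_gpus>1</use_all_gpus>
      </options>
    </cc_config>

With <use_all_gpus> set to 1, BOINC will use every usable GPU it detects instead of only the best one.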

Personally, I would like to see a graphical interface option as easy as checking a box to use all GPUs, or even better make a detection routine to see if there's more than one GPU and to start using it if the user has "Use GPUs" set in their preferences. Requiring users to create a special configuration file is not exactly user friendly.

BOINC will by default use the best GPU. If these two 560Ti's are really identical, BOINC will by default use both of them.

However, the actual selection takes place based on 1) compute capability; 2) CUDA version; 3) available memory and 4) speed. If any of these differ between the two 560Ti's, the one with the lower values will be considered the lesser card and will not be used by default.

You can see these values when you start up BOINC: the event log shows, per GPU, what the values of these criteria are.

Here goes: the two cards are admittedly different, but only in make and model number, while they are identical in specs:

The only difference is the BIOS version, and to be honest, although I have a sound computer hardware knowledge and build experience, I am not prepared to flash a video card BIOS just yet.

Also, by using MSI AfterBurner I am forcing both cards to identical settings: following many reports that these 560Ti cards needed to be tweaked down in MHz and up in voltage, as you can see from the pic, I have done just that and hence resolved most if not all of my unit "errors".

As you can see on AfterBurner, the usage is only on GPU2:

Interestingly, I have twice set the PhysX card setting in the NVIDIA Control Panel to "Auto", which defaults to GPU1; but every time I reboot, the PhysX setting returns to GPU2. Not sure what that really means.

This is the BOINC log, from which it seems the choice of GPU may not necessarily be the best... is that possible? GPU(1) is not used but has a little more memory available and slightly higher GFLOPS capability...

Lastly, I have searched the entire computer for the "cc_config.xml" file with no luck. Since I am now using the Lunatics optimised apps, should I not have found one already there, or do I still have to create one?

Thanks again for your help, I will wait for your replies before attempting to create a dangerous config file... ;-)

The Lunatics optimized apps installer creates an "app_info.xml" file. The core client configuration file "cc_config.xml" is never created by default and must always be created manually; it has nothing to do with the Lunatics optimized apps and is therefore not created by their installer. In other words, you'll have to create one on your own.
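
If you do create one (for example the minimal <use_all_gpus> sketch shown earlier), it normally goes in the BOINC data directory, which on Windows 7 is typically C:\ProgramData\BOINC; then restart BOINC, or use the "Read config file" option in BOINC Manager's Advanced menu, for it to take effect.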

The higher available memory calculation (which is a bug in BOINC) on the first GPU comes before the higher GFlops value of the second, therefore the second isn't used. You can't have 8381331MB available out of 1024MB though, so you've been hit by a bug! This bug is fixed in a later BOINC version, still in testing (7.0.31; see post 44713 in the Change Log thread for links).

Jord

Ancient Astronaut Theorists suggest that in many ways, you can be considered an alien conspiracy!