I noticed that my system freezes spontaneously when I suspend GPU tasks, or when they get suspended automatically (the "Use GPU based on preferences" setting, which suspends GPU tasks when the computer is in use).

Finally I found the reason: if I force GPU apps to run always, whether the box is in use or not, the freezes disappear completely.

After some experimentation with OpenCL programming, I noticed that the freezes are most likely caused by the NVIDIA drivers: if I terminate an OpenCL app that is actively running GPU code, I sometimes get the familiar nasty freezes.

The solution is simple: your app must always shut down GPU computations gracefully (i.e., wait till the kernels are done, read the data back from the GPU, release buffers, contexts, kernels, etc.). For windowed apps this is simple (you get a Windows message about the close/shutdown event and can do the cleanup). Console apps seem to get terminated without any notification, but that's in fact not exactly true: you can install a control handler that will receive console events.

I guess the SETI CUDA/OpenCL apps are in fact all console applications on the Windows platform.

I checked the SETI CUDA source code for any mention of SetConsoleCtrlHandler() and found none, so here is my proposal: could you please add a handler to the CUDA/OpenCL console apps under Windows, so that they can shut down gracefully? Like this:
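
A minimal sketch of what such a handler could look like. This is an illustration only, not the SETI code and not a tested patch. The names install_stop_handler, request_stop, mark_cleanup_done, run_chunks and the CLEANUP_WAIT_MS value are all made up for the example. The Windows branch uses SetConsoleCtrlHandler(); the #else branch is just a signal() fallback so the same file also builds on other systems.

```c
#include <assert.h>
#include <signal.h>

#ifdef _WIN32
#include <windows.h>
#define CLEANUP_WAIT_MS 4000          /* assumed budget, not a documented value */
static HANDLE g_cleanup_done = NULL;  /* signalled once GPU cleanup has finished */
#endif

/* Set by the handler, polled by the compute loop between GPU chunks. */
static volatile sig_atomic_t g_stop_requested = 0;

void request_stop(void) { g_stop_requested = 1; }

#ifdef _WIN32
static BOOL WINAPI console_ctrl_handler(DWORD ctrl_type)
{
    switch (ctrl_type) {
    case CTRL_C_EVENT:
    case CTRL_BREAK_EVENT:
    case CTRL_CLOSE_EVENT:
    case CTRL_LOGOFF_EVENT:
    case CTRL_SHUTDOWN_EVENT:
        request_stop();
        /* The handler runs on its own thread. For close/logoff/shutdown the
           process is ended once the handler returns, so wait (bounded) for
           the main thread to report that cleanup is complete. */
        if (g_cleanup_done != NULL)
            WaitForSingleObject(g_cleanup_done, CLEANUP_WAIT_MS);
        return TRUE;
    default:
        return FALSE;
    }
}
#else
static void on_signal(int signo) { (void)signo; g_stop_requested = 1; }
#endif

/* Returns 0 on success, -1 on failure. */
int install_stop_handler(void)
{
#ifdef _WIN32
    g_cleanup_done = CreateEvent(NULL, TRUE, FALSE, NULL);
    if (g_cleanup_done == NULL) return -1;
    return SetConsoleCtrlHandler(console_ctrl_handler, TRUE) ? 0 : -1;
#else
    if (signal(SIGINT, on_signal) == SIG_ERR) return -1;
    if (signal(SIGTERM, on_signal) == SIG_ERR) return -1;
    return 0;
#endif
}

/* Call after buffers, kernels, queues and contexts have been released. */
void mark_cleanup_done(void)
{
#ifdef _WIN32
    if (g_cleanup_done != NULL) SetEvent(g_cleanup_done);
#endif
}

/* Placeholder compute loop: returns the number of chunks completed. */
int run_chunks(int max_chunks)
{
    int done = 0;
    assert(max_chunks >= 0);
    while (done < max_chunks && !g_stop_requested) {
        /* Real app: enqueue one batch of kernels here, then clFinish() or
           cudaDeviceSynchronize() so that no GPU work is left in flight. */
        ++done;
    }
    return done;
}
```

The app would call install_stop_handler() once at startup, run its loop, release all GPU objects after the loop exits, and only then call mark_cleanup_done(). The point of the wait inside the handler is that the process never disappears while a kernel is still running.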

Plus, with your computers hidden we can't see which apps you are running. If you take a look at my PCs and compare that with what you see for yours, you will see that no one can see any private details of yours to hack, while what we can see may be of help to you.

Both the CUDA and OpenCL scientific apps run under BOINC's control.
It's the boinc.exe process that spawns them and terminates them.
Both apps are supposed to communicate with BOINC about termination and not terminate until GPU processing is finished. A special BOINC API (meant to be portable between OSes) is used for this instead of the low-level, non-portable Windows API.
Could you please check the app's stderr (in BOINC's slot directory) after such a HW freeze and reboot?
Does it contain lines about a termination request and something like "device synched" (the wording differs slightly between the CUDA and OpenCL apps)?
We need to know this to decide whether the syncing was missed on your host for some reason, or whether the current precautions work but are not enough for your config.
____________

Thanks for the info, Raistmer! I suspected that BOINC didn't use signals to communicate with spawned processes, but I never gave it any deeper thought.

The problem with HW freezes is that disk caches are left out of sync. After a reboot I can see that files which had been opened or written to shortly before the freeze have become corrupt: empty, truncated, or containing all zero bytes. Still, I will try to reproduce the freeze when I get home. I hope the stderr file will not get too corrupted. :)
____________

Sorry for the late post, but I have remembered a couple of HW freezes that occurred at system shutdown time.

In the long run, portable ways of ending GPU tasks may not be enough for Windows. Look:
- You shut down the system. Windows sends the WM_QUERYENDSESSION message to all windowed applications in unspecified order. I say "unspecified" because this order has already been changed a couple of times in different versions of Windows.
- Console-based applications without a control handler are terminated as soon as their turn to receive the message comes.
- If BOINC receives WM_QUERYENDSESSION after its tasks have been terminated, it has no way (portable or not) to shut down GPU tasks gracefully.

So, while I am all for code portability (seriously), I still insist on a small Windows-specific "trick", just to be on the safe side.
____________

I will try and see what can be done in this direction. But expect some problems from the BOINC API side.
At app startup there is a BOINC API call that configures BOINC's diagnostic subsystem. That in turn installs its own signal handlers. In particular, this makes it impossible to catch exceptions via a try/catch(...) block.
The structured-exception-related option did not help with this. BOINC's control thread intercepts the exception before my code does and terminates the app.
So I'm not sure what the behavior will be if the app tries to install its own signal handlers directly...
____________

Well, I added the code you proposed to the OpenCL app sources anyway; I don't think it will cause any slowdown.
But the already released OpenCL apps don't contain it. It will be included in the forthcoming OpenCL MB7 app and in new releases of AP6.
Regarding CUDA app issues, contact Jason G. on these boards.
____________

That output is from the stock CUDA app, which doesn't contain the improved exit code that Jason's apps have. The author of that app is also not Jason, but NVIDIA.
If Zmey Petroff has problems with Jason's x41zc apps, then he should report them to Jason.