Hi,
I just ran into a situation where BOINC does not request any work for my CPU, only for the GPU. The CPU has nothing to crunch at the moment.

I'm attached to 4 projects with the following shares: CPDN 30, Einstein 25, LHC 150 and SETI 100. CPDN and Einstein are set to NNT (Einstein is down anyway), and LHC doesn't have anything to crunch, so that leaves only SETI. But the CPU is just idling. This system has previously been doing both GPU and CPU WUs (the last 6.03 CPU WU finished yesterday). Since then the CPU has been doing Einstein, and now those are finished too (5 ready to report and 5 uploading).

BOINC is 6.6.36 (WinVista 32), opti apps SSSE3 (Lunatics 0.2 Unified Installer w. VLARnokill), CPU = Q9400, GPU = 9800GT (driver 190.38, CUDA 2.3). Preferences are set to accept all kinds of work, and all types are in my app_info.xml.

I switched on the work_fetch_debug log flag and here is the output:

The idea is, starting with the v6.6 range of BOINC clients, that if no work is available to be sent out, BOINC doesn't waste everybody's time by continually asking for it. The retry interval ('int') doubles at each failure to get any work, up to a maximum of 86,400 seconds (1 day). At the moment, you still have 10,501.36 seconds ('dt' - just under 3 hours) to wait until the next retry - but read on, all is not lost.
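The doubling behaviour described above can be sketched in a few lines of Python. This is purely illustrative and not the actual BOINC client source; the function name and the initial 60-second interval are my own assumptions for the sketch - only the 86,400-second cap comes from the post above.

```python
# Illustrative sketch of the doubling work-fetch backoff (NOT the real
# BOINC source; next_backoff and the 60 s starting value are invented).
MAX_BACKOFF = 86_400  # seconds - the 1-day cap mentioned above

def next_backoff(current: float) -> float:
    """Double the retry interval after each failed work request, capped at 1 day."""
    if current <= 0:
        return 60.0  # assumed initial interval for the sketch
    return min(current * 2, MAX_BACKOFF)

# After enough consecutive failures the interval pins at the cap:
interval = 0.0
for _ in range(12):
    interval = next_backoff(interval)
print(interval)
```

Whatever the real starting value is, the shape is the same: a handful of quick retries, then hours-long waits, then the one-day ceiling.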

Unfortunately, BOINC servers don't tell the client why they're not receiving any work, and the BOINC client wouldn't behave any differently even if it knew. So you get the same one-day maximum backoff whether Einstein is down (fileserver crash - could be a couple of days), LHC has vanished into its own private black hole (for the last several months), CPDN hasn't written a CUDA app (and probably never will) - or SETI is a bit busy right now and should have some more in the feeder cache in a couple of seconds. I've said that before, but it fell on deaf ears.

The backoff interval, and the time until the next retry, are reset to zero when any of the following three things happens:

You get some of the work it isn't asking for

An existing task finishes running (but you haven't got any)

You click the 'Update project' button
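The three reset conditions above can be modelled as a tiny state object. Again, this is a toy sketch, not BOINC code - the class and method names are invented to mirror the list, and the only real fact it encodes is that each event clears both the interval and the retry timer.

```python
# Toy model of the reset rules listed above (illustrative only; class
# and method names are invented, not taken from the BOINC client).
class WorkFetchBackoff:
    def __init__(self) -> None:
        self.interval = 0.0  # current backoff length, seconds ('int')
        self.retry_at = 0.0  # time remaining until next retry ('dt')

    def _reset(self) -> None:
        # Any of the three events clears both values immediately.
        self.interval = 0.0
        self.retry_at = 0.0

    def on_work_received(self) -> None:        # got some of the work
        self._reset()

    def on_task_finished(self) -> None:        # an existing task completed
        self._reset()

    def on_project_update(self) -> None:       # user clicked 'Update'
        self._reset()

b = WorkFetchBackoff()
b.interval, b.retry_at = 21_000.0, 10_501.36
b.on_project_update()        # the manual escape hatch described below
print(b.interval, b.retry_at)
```

This is why the single button click works: it takes the same code path as actually receiving work.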

So it turns out to be very simple - just click that button (once is all it needs), and work requests will resume. If you don't succeed after the first few attempts (which will come in quick succession), click it again.

Just why BOINC management chose to hide all this information in the changelogs and debug messages, I'll leave for someone else to explain.

Edit - to show I'm not making all this up, it's in changesets [trac]changeset:17664[/trac] and [trac]changeset:17665[/trac]. Those came out in BOINC v6.6.18.

Does the backoff work only halfway if it was only affecting the CPU requests and not the GPU requests? If a request to server is made anyway shouldn't it then ask work for both GPU and CPU at the same time if host is lacking work for CPU and GPU?

The backoff is calculated and applied separately for every individual project/resource combination. So SETI/CPU and SETI/CUDA have different backoffs, and one may be requesting work when the other isn't. I had a situation yesterday while I was researching that reply, where a host had a shortfall of 40,000 seconds for CUDA, and 200 seconds for CPU - yet it only asked for (and got) the CPU work, during a CUDA backoff.
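That per-pair behaviour is easiest to see with the backoff table keyed by (project, resource). A minimal sketch, with invented names and the 40,000 s / 0 s values taken from the example above:

```python
# Backoffs keyed by (project, resource) pairs, showing why SETI/CPU and
# SETI/CUDA can be in different states (illustrative; names invented).
backoffs: dict[tuple[str, str], float] = {}

def set_backoff(project: str, resource: str, seconds: float) -> None:
    backoffs[(project, resource)] = seconds

def may_request(project: str, resource: str,
                now: float, last_failure: float) -> bool:
    # A request is allowed once this pair's OWN backoff has expired;
    # the other pair's state is never consulted.
    return now - last_failure >= backoffs.get((project, resource), 0.0)

set_backoff("SETI", "CUDA", 40_000)  # CUDA still backed off
set_backoff("SETI", "CPU", 0)        # CPU free to ask

# 200 seconds after the last failure, only the CPU request goes out:
print(may_request("SETI", "CPU", 200, 0))   # True
print(may_request("SETI", "CUDA", 200, 0))  # False
```

Each entry rises and resets independently, which is exactly the behaviour observed on the host above.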

That sounds odd, but you have to remember that BOINC is designed to be as general as possible, and not make assumptions about how any particular project or its applications are going to work.

So, you and I know that for both SETI and Einstein, the CUDA work is exactly the same as CPU work, and a CUDA task can be 'rebranded' for processing on the CPU. So the temptation is to lump them together, and say "if you get work for anything, clear both sets of backoff". But in doing so, we're using the typical human selective memory - we're forgetting to allow for AP and S5R5, which can be processed by CPU but not (yet) by CUDA. So maybe clearing the CUDA backoff isn't always such a good idea.

And it gets worse. AQUA have just suspended further development of their CUDA application (because it's so much slower than their CPU application - yup). So although CUDA work was available in the past, it won't be for a long time to come (and CPU and CUDA work was never interchangeable there). And CPDN will probably never develop a CUDA app - too much data to handle. That's another backoff you never want to reset.

No, unless you want BOINC to implement a whole rules-based management system (and if you do, you'd better try writing it yourself), it's probably better to keep everything independent. And it's less work for projects - if you allow for complex rules, the projects would have to supply and maintain the data inputs for the rules to work on - we're having enough problems getting them to manage simple values like <rsc_fpops_est>!

@ Ned,

Yes, those two changesets were only the final coats of polish on a system that was implemented even earlier - trac is quite cumbersome for historical research, and I gave up looking when I'd found something even vaguely relevant - even that took me back five months! It's just interesting that the backoffs have been in use for all that time, and this is (so far as I can remember) the first time that anyone has asked in detail about them on this board.

All of this is based on very conflicting needs. You need to keep work on the clients, and you need to keep the clients from DoSing the servers.

... and you can't use the servers to tell the clients that they need to stop DoSing the servers because the servers are undergoing a denial-of-service attack and can't answer to tell you to stop being part of the attack.

Edit: it's also something that may look fine 99.9% of the time, and only cause these odd "edges" under very specific circumstances.