Don't forget that the tasks here automatically regenerate as they're returned - my short-queue cruncher fetched a new one within the last hour.

Yes, it's true. However, this automatic regeneration is a finite process, so the workunits will eventually run out if no new batches are issued soon. The first sign is when the number of unsent workunits drops below 10. The second sign is when the number of workunits in progress drops below the value it held before the unsent queue ran low.
There are now 1,684 workunits in progress; it was around 1,750 before the unsent queue ran out.

Note: I believe setting a Resource Share to 0 means "Only ask that project if all other projects have been asked first. And, when we ask this 0-project, only get 1 work unit per idle resource. We don't want to create a cache for this 0-project."

It's good to have certain projects set to 0 Resource Share, as a backup, for when your main projects happen to run out of work for a while.

But I wonder how many of them dropped out because of lack of communication...

I would bet 1% or less. This isn't Facebook. The admins don't post about thinking all morning they wanted a cheeseburger for lunch then deciding at the last minute to have a taco instead. This communication thing has been ballyhooed way out of proportion. Stop and think... what genuine harm does it cause any of us if a project runs out of work? If the long queue running out of tasks, or a project temporarily running out of work, were my biggest problem, I would have a very comfortable life.

Now if you intend to argue that we have no other explanation so it must certainly be lack of communication then go right ahead but be prepared to be severely ridiculed.

And another thing... pissing and moaning about no work has been known to motivate projects to reissue work that's been done just to shut the whiners up. I would not want to buy electricity to crunch tasks that don't need to be crunched just because some cry baby can't find anything else to whine about.

One last item... maybe, just maybe, nobody at the project knows exactly when more tasks will be available. Maybe, just maybe, nobody has time to go around to every scientist on the team and ask and come up with a firm date. The point being that should one of them dare to just estimate the time and date and be wrong then 20 seagulls suddenly arrive out of nowhere, squawking and pissing and moaning all over everything about lack of comms and how their important nose is just soooo out of joint and the world's coming to an end all because of no tasks.

Anybody else need a new one ripped while I'm at it? Or maybe ya just wanna take a valium and a laxative and relax and have a good bowel movement instead?
____________
BOINC <<--- credit whores, pedants, alien hunters

Yes, we do hear you. We heard you yesterday too. Once everyone is back in the lab later today we will discuss what needs to be simulated, or whether we have any new interesting projects. As Dagorath says, it can sometimes take a while to come up with something good. Unfortunately, projects don't just flow out of our brains; we need to do some research and find interesting systems which answer interesting biological questions, or make new collaborations with other teams. It can sometimes take a while.

Personally, one thing I find problematic is that, AFAIK, we don't have a good estimate of when the queues will run out, so we can't plan ahead. You can only really tell in the last few hours before they go dry. Before that, since WUs get resent, it's quite tricky.

Also, out of curiosity, can't BOINC users configure BOINC to crunch other projects once one runs out? What would the disadvantages be for the crunchers if that happens? We obviously prefer to keep you here :D I just want to know why it is a problem if we shortly run out of WU's.

Also, out of curiosity, can't BOINC users configure BOINC to crunch other projects once one runs out? What would the disadvantages be for the crunchers if that happens? We obviously prefer to keep you here :D I just want to know why it is a problem if we shortly run out of WU's.

Yes, crunchers can crunch for other projects. You can set their weighting to be very low, or zero, and only get the odd task, or only get tasks if the main project runs dry.

You will find that many crunchers purchase GPUs to suit their choicest project. For a start, if we wanted to crunch for certain other projects we would have bought ATI cards; while NVidia cards work on some of those projects, their performance relative to ATI cards is poor. MilkyWay@home is an example of such a project. Obviously BOINC credits are important to many, so if you use a GPU that is not suited to a project, you will earn reduced credit. Some projects might not be stable, or there could be compatibility issues running some mixes of work.

Other projects also run dry, or come to an end (WCG and POEM no longer have GPU WUs). You might need 3 or 4 configured; for example, Einstein, MW, and Albert as backup projects. This can be a lot of hassle, especially if many projects frequently run dry.

If your project's reputation deteriorates, you become perceived as an intermittent project and lose many, if not most, of your crunchers. Crunchers come and go, but even those that would hang around can't stay at a project with no work! When they leave, they might not come back for a long time.

What you could do is have another Long queue, and that way always have a backup project here. Send the work out from that project/queue as low priority; only if the other work is low or unavailable.

IMO you should be trying to model some more fundamental bio-techniques, and thus better establish your working model.
____________
FAQ's

Also, out of curiosity, can't BOINC users configure BOINC to crunch other projects once one runs out?

Yes, we can. This is maybe my personal obsession, but I don't have any spare GPU project: I've built my rig especially for GPUGrid. Sure, I could crunch for some other GPU project(s) as well, but then I could have built my computers using cheaper AMD GPUs. So their best use is here, at GPUGrid.

What would the disadvantages be for the crunchers if that happens?

My computers provide the heating of our apartment. So in the winter we could feel cold if there's no work on them :) (Don't worry, we have natural gas heating also, and we use it when it's really cold outside.)

We obviously prefer to keep you here :D I just want to know why it is a problem if we shortly run out of WU's.

The annoying part is that the RAC drops very quickly when it happens. But no real harm is done.

But I'd like to turn this question around.
You're using a 1.3 PFlop supercomputer, which is granted to you free of charge. Its building cost is around €455,000, and its monthly running cost is around €17,000 (probably more, as there are older GPUs too). I would go mad if I couldn't feed this monster with work. Don't you feel the same way?

Wow, guys. It is seriously not a big deal, if a project runs out of work. It's your decision on whether you want your computer to be able to be "kept busy" in that scenario. If you do, you can add other projects. And if you only want to crunch those other projects when your main project runs out of work, then set those other projects to a Resource Share of 0.

Want to know something interesting? This outage triggered a bugfix in BOINC!

In my 3-GPU computer, I had previously set up my 2 main GPUs to exclude even the backup projects. But when GPUGrid went dry, I realized that was the wrong setting for me, because I wanted to keep those GPUs busy.

So, in my quest to set things exactly like I wanted, I found a bug. Check out the following email exchange. :) The bugfix won't land until the next version of BOINC is released, but it's good to have been found and fixed.

Anyway, again, it all boils down to: If you want to keep your resources busy even when a project goes idle, then add more projects, and set the settings appropriately! For reference, I'm attached to 29 projects, and have the settings set such that the 2 main GPUs focus on GPUGrid.

Keep up the good work, GPUGrid, and if you run out of work sometimes, so be it.

I'm seeing a behavior, and I'm having trouble deciding if it's unsupported functionality, or if it's a bug. Can you please answer and help?

- On the BOINCstats BAM! account manager website, for one of the projects (SETI Beta), I have set a host-specific Resource share (to 0 in my case)
- This is a host-specific setting, separate from the "global project" resource share value.
- I tell BOINC to communicate with BAM!, and I see my client Resource Share get updated (RS changes to 0)
- I click Update on the project, and because the project does not have a "Resource Share change" queued up, my client still correctly says 0.
- I then go to the project website, and change the Resource Share, to 2 for instance
- I click Update on the project, and this time, because the project DOES have a "Resource Share change" queued up, my client now says 2.

Shouldn't it have respected my host-specific account manager value of 0? Or is the concept of a host-specific value, unsupported functionality?
For reference, the client sees that the Project is indeed managed by the Account Manager (I cannot click "Remove")

Please let me know,
Thanks,
Jacob

PS: For anyone who cares to hear the detailed reason of why I'm even playing with this, it goes like this:

- My setup was: 2 beefy GPUs set to work on GPUGrid, with other projects excluded, alongside 1 tiny GPU that had GPUGrid excluded. Exclusions are all done using <exclude_gpu> config in cc_config.xml
- That third GPU was set to only work Albert/Einstein/SETI/SETIBETA, but because I didn't want CPU tasks from them, and wanted to minimize RPCs, those 4 projects were set to Resource Share (RS) of 1.
- GPUGrid recently ran out of work, and my beefy GPUs were idle, due to my settings.
- So I changed the GPU Exclusions to allow the 2 beefy GPUs to work on the 4 projects, but because I wanted to prefer GPUGrid, I wanted to change the 4 projects RSs to 0.
- But I have a separate host with a tiny GPU that actually works those 4 projects and cannot do GPUGrid, and to minimize RPCs, I wanted to change the 4 projects RSs to 1.
- So I decided to leave the project RSs set to 1, but do an Account Manager host-specific setting of 0, on my main 3-GPU-rig, for the 4 projects.
- But that host-specific setting of 0, is getting trumped, and changing to a 1. Bug, right? Sigh.
- If anyone has a better way to accomplish what I want, without doing venues, I'm listening.
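For context, the GPU exclusions described above live in the BOINC client's cc_config.xml. A minimal sketch of that kind of setup follows; the project URLs and device numbers here are illustrative assumptions, not the poster's actual file:

```xml
<cc_config>
  <options>
    <!-- Keep the two beefy GPUs (devices 0 and 1) off a backup project
         by excluding that project from them. Repeat per device/project. -->
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>0</device_num>
    </exclude_gpu>
    <exclude_gpu>
      <url>http://einstein.phys.uwm.edu/</url>
      <device_num>1</device_num>
    </exclude_gpu>
    <!-- Keep the tiny third GPU (device 2) off GPUGrid -->
    <exclude_gpu>
      <url>http://www.gpugrid.net/</url>
      <device_num>2</device_num>
    </exclude_gpu>
  </options>
</cc_config>
```

The client picks up changes after re-reading the config files from the BOINC Manager, or after a client restart.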

Ok, thanks for the info :) So mostly dedicated equipment, RAC and messy BOINC configuration.
Yes, of course we want to use it to its maximum! And I am actually impressed at how much we do, considering the number of scientists in the lab. There is always, though, the "problem" that after simulating we need to analyze the data and publish :P And in parallel we need to send out new projects. It's tricky to time everything so well, and sometimes the timing doesn't work out, like now.

I'm sickened by the number of "volunteers" who think this project should serve them instead of the other way around. "My RAC is falling, the sky is falling, I broke a nail, it's all your fault, now do the impossible and get me some tasks or I'll whine like a baby, wah, wah, wah, wah, hurry up, you're not doing the impossible fast enough".

I don't think I want to be part of a group that acts that way. I feel like I'm in with a bunch of 15 year old Twitter twits and bubble-gummers. I don't want anything to do with people like that. Thanks for all the advice and jokes but you're not the kind of people I want to be associated with.

Detached.

I think it's more the times: almost everything we do now is instant. A cup of coffee on every corner, a food place too, the internet with ALL of its info at our fingertips 24/7; need a new part for something, it can be delivered tomorrow! And then when there is a problem we want it fixed NOW too, just like everything else in our lives. We have forgotten about THE PEOPLE behind the scenes making everything work, unless of course we ourselves are involved in those problems or fixes, or whatever. Want to file an FOI request? No problem, here's the form. You say you want to wait for the response? GIVE ME A BREAK, it can take MONTHS or YEARS to even be considered, let alone fulfilled! People have forgotten what it takes to make this an 'everything at our fingertips' World; the data is available and we want it NOW!!!

ps I HOPE you didn't REALLY leave, but if you did good luck in your next endeavor!

In the meantime, I recommend attaching to multiple GPU projects, so that you can still keep your GPUs busy until then. You can even set their Resource Shares to 0, which will only get work when your main project(s) are empty.

I'm sickened by the number of "volunteers" who think this project should serve them instead of the other way around. "My RAC is falling, the sky is falling, I broke a nail, it's all your fault, now do the impossible and get me some tasks or I'll whine like a baby, wah, wah, wah, wah, hurry up, you're not doing the impossible fast enough".

I don't think I want to be part of a group that acts that way. I feel like I'm in with a bunch of 15 year old Twitter twits and bubble-gummers. I don't want anything to do with people like that. Thanks for all the advice and jokes but you're not the kind of people I want to be associated with.

Detached.

Wow..... that must be a world record distance for spitting the dummy out!!!

GPUGRID must be one of the most reliable BOINC projects, and is much more worthwhile than a lot of them. A lot of people have invested a lot of money in expensive GPU cards to run this project, and "Credit" and "RAC" are almost addictive.

Alright!! It looks like we're actually going to run the server dry of GPU tasks again - Congratulations everyone!

I hope you've got your backup projects in place. I know I do!

I've got the following projects ready to accept GPU work:
- World Community Grid (which currently only has CPU apps)
- POEM@Home (very rarely has GPU tasks available)

... and I've got the following 4 projects set at 0 resource share, for exactly this situation, so my GPUs stay busy while I don't have any work available for my main projects:
- Einstein@Home
- SETI
- Albert@Home
- SETI Beta

So, remember, instead of complaining about lack of GPUGrid work, treat it as an accomplishment (we killed it - hurray!), and also keep those GPUs busy (to further serve humanity!)

Yep - GPUGrid running out of work, is a GOOD thing. It means we are keeping pace with their needs :) This is exactly why I have backup GPU projects, to keep my GPUs busy, for science.

It's easy to set up backup projects -- just add another project and set its Resource Share to 0. That way, the backup project only gets work when your main projects are dry, and it only gets a single task at a time, so it can frequently check whether the main projects are no longer dry.

There are no unsent workunits in the long and the short queue.
There are 1712 workunits in progress in the short queue, and 1445 in the long, so there could be a shortage in a couple of days when these numbers begin to drop.

Would be nice to have all servers completely empty for a few hours.
I still have errored WUs from last August and November(!) on my error list. I would love to see those leave so I get an error-free page.
New weaponry needs to be installed tomorrow, so no jobs for half a day suits me.
I did some re-wiring a few days ago; as it took longer than anticipated, my RAC dropped like a brick...

But I'm being selfish here, I hope you all get new WU's to crunch very soon.
____________
Greetings from TJ

This is a good thing, to some extent, because it means we've done a great job chewing through the work they had for us.

Hope you have those backup projects configured (projects with 0-resource-share). I'm anticipating my BOINC installation to automatically start running some Einstein/SETI tasks, when my current GPUGrid units get done. Then, it'll automatically switch back over to GPUGrid when they have more work again.

My house is getting cold. Might have to turn on some space heaters since the computers aren't doing the work.
____________
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-) http://tbc-pa.org

It will be interesting to see how many of the new users remain as 'active' users, or 'users with recent credit', over the weeks and months to come. A recent encounter with something close to the "new user experience" makes me rather pessimistic.

I recently upgraded the video driver on two older hosts (43362, 43404). The GTX 750 Ti GPUs are a little marginal for returning long tasks within 24 hours - especially Gerard's latest offerings! - but do contribute successfully to this project.

The significant result of upgrading the driver was to increase my capability from cuda60 to cuda65 - and I started being allocated cuda65 tasks on those machines for the first time.

So I was braced for the BOINC server's generic handling of runtime estimates for a new app_version, and it was as bad as I expected. This isn't a criticism of the GPUGrid project - they have to use the server software provided by BOINC - but it makes for a very bad user experience.

Both these hosts show a long-term APR of over 100 GFlops for the cuda60 long tasks, both averaged over more than 250 tasks. But on a version change, BOINC throws all that accumulated knowledge away, and starts at rock bottom all over again.

And I mean rock bottom. I monitored 43404 most closely: BOINC started the new version off with an estimated speed of 2.1 GFlops, and gradually dropped it to 1.77 GFlops (probably those long Gerards again!). Long tasks at those speeds translate to estimated runtimes of 789 hours and 887 hours respectively - around 5 weeks. With a 5 day deadline, the BOINC client locally is clearly in deadline trouble, and reacts by preempting running GPU tasks from other projects to give priority to the GPUGrid task.
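As a rough sketch of the arithmetic above (the task-size figure is a hypothetical value back-calculated from the runtimes in this post, not a published GPUGrid number):

```python
def estimated_runtime_hours(task_fpops_est, projected_gflops):
    """BOINC-style runtime estimate: task size (FLOPs) divided by the
    projected speed of the app version on this host."""
    seconds = task_fpops_est / (projected_gflops * 1e9)
    return seconds / 3600.0

# Hypothetical long-task size (~6 PFLOPs), back-calculated from the post
TASK_FPOPS = 5.96e15

# At the server's rock-bottom starting estimate of 2.1 GFlops:
print(estimated_runtime_hours(TASK_FPOPS, 2.1))    # roughly 788 hours, ~5 weeks

# At the long-term APR of over 100 GFlops these hosts had earned:
print(estimated_runtime_hours(TASK_FPOPS, 100.0))  # roughly 16.6 hours
```

The same task size against a realistic speed estimate lands comfortably inside the 5-day deadline, which is why discarding the accumulated APR on a version change causes the panic-mode behaviour described above.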

Those new users will see the same behaviour: multi-week estimates, 5-day deadlines, and GPUGrid 'monopolising' (as they will see it) their GPUs. That sort of thing gives projects a bad name, but - I stress - it's not GPUGrid's fault.

If the users persevere, and complete their initial 11 tasks, estimates will normalise and become realistic, and the 'high priority' running will go away - but how many users will have that patience? It took me over a week to nurse 43362 back to normality, and 43404 still isn't there yet.

I have written - yet again - to David Anderson urging him to address this problem of initial speed estimates for GPUs, but I'm not optimistic. His algorithm is designed to cope with the steady-state estimates for hosts which have completed hundreds or thousands of short tasks, and he sees that it is adequate for that purpose. But it's incomplete. I would urge the project administrators of GPUGrid (and any other project administrators who read this) to monitor the drop-out rate for those newly-recruited users, and if it causes them any concern, to raise the subject with David Anderson directly.

As I am not sure whether this thread about empty feeder servers is directly related to speed estimates, I would like to add that I have never seen multi-week estimates on any of my systems. From the start of a new system and through upgrades of software and GPU hardware, I have never seen an estimate over 40 hours on any of my machines. If anything, I have noticed the exact opposite. Sometimes (perhaps around updates and upgrades) a task will show something like 7 hours to complete, and then the countdown stalls for seconds at a time, stretching the actual runtime 2-3 days beyond the initial estimate. I have also seen estimates of 38-40 hours with the countdown dropping 2-4, even 5-8, seconds per actual second; when that happens, the actual runtime always ends up being the estimate divided by that rate (if it says 30 hours remain and it counts down 3 seconds per actual second, it will definitely finish in right around 10 hours).
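That countdown-rate rule of thumb can be written down directly (a trivial sketch of the observation above, not anything from the BOINC source):

```python
def actual_hours_remaining(displayed_hours, countdown_rate):
    """True remaining time when the countdown display drops
    `countdown_rate` display-seconds for every real second."""
    return displayed_hours / countdown_rate

# The example from the post: 30 hours shown, ticking 3x real time
print(actual_hours_remaining(30, 3))  # 10.0 hours
```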

Those are all long-run-only computers. I do have one short-run-only computer, a laptop with a Quadro K2100M that cannot finish even the shorter long runs within the deadline (which, unfortunately, I had to prove to myself by waiting them out), but even there the countdown (estimated) time counted one second for every 3-10 actual seconds on the count-up timer. So maybe I am not seeing what you are seeing; maybe my cards are all newer and estimate better than older ones (the CUDA versioning you mentioned probably tells that story, though I don't know the versions and cards as well as most), or maybe I just missed it whenever it happened. But even if I saw a task finish much, much quicker than its estimate, I don't think it would turn me off much, because I would know the work is being done and the bad estimate was just a bug, or something that needed time to learn my actual speed. When I have to run something else (outside of BOINC) to occupy the GPU because GPUGrid is out of work, and then don't notice its return for a day or two, a task that normally finishes in 7 hours takes some 40 actual hours; afterwards, it keeps folding those actuals into the estimates for a while, counting 4-10 seconds on the countdown for each actual second, and then after a few tasks it gets back to normal. My point is, I have never seen what you are describing, if I am understanding it correctly.

Now, more to the point of this thread...

The queues went dry on and off on April Fools' Day, but went completely dry on the night of April 3rd (Eastern time; maybe early morning on the 4th in Europe and Asia). Since then, my laptop has not gotten any work from the short queue long enough for me to notice it. Of the 4 long-queue workers, I have opened up the 2 with 780s to both short and long, and left the one with the 980s open only to longs. The 4th happened to die on the 4th: the heatsink popped loose from the motherboard and it was blowing smoke from the processor, so I'm not sure how long it ran. I hope the processor and motherboard aren't dead and that new thermal paste and popping the heatsink back on will fix it, though I suspect that by the time there is smoke, it is worse off than that. So: one computer down and out, one down from not getting anything, and 3 running somewhere between 3 tasks a day and none for hours. Everyone's RAC is about 3/4 of what we started at on the 4th, and dropping every hour.

So I really hope more work comes out, and that these suspected newcomers who gobbled up the new work will soon show they are here to stay, or give back the work the servers reassigned. Or that GPUGrid would even, to keep us happy and running, let us crunch already-run workunits as a second and third validation, to account for the jitter of hardware that is always present and possible, at least during times when the queues are empty and there is no work to give out. Would it be too difficult to feed each task out more than once, when around 2,500 tasks are fed to around 7,000 computers with recent credit, several of them with more than one GPU? If an adaptive task-feeding server could give out 2,500 tasks when 2,500 computers are asking for work on 4,000 (or more) GPUs/CPUs/Androids, then give us 7,500-10,000 task units by issuing those same 2,500 tasks 3 and 4 times. I think getting no work is a much bigger turn-off for any project than getting bad runtime estimates that turn out wrong when the work finishes much sooner. If a 5-day deadline is set and a workunit's estimate says 336 hours but it finishes in 33.6, with the estimate dropping 10 times faster than real time, I think I get it, hang in there, and learn to ignore it. But if I sign up for a project and never see work, or only see work once every day or two and finish it in 9 hours, and the well runs dry every time I come for water, I find another well and leave this one behind. And if I learn I am getting repeat work, it may sit a little wrong with me, but it sits better than getting none.

Being hungry and not getting food is worse than getting too much to eat and not finishing and it is much worse than being told you have too much to eat and then only getting enough to fill you. That is for sure!

Now, the solution to that, I suppose, is to sign up for other projects, set them to lower priorities than this one, and let BOINC come back here when there is food..... but that leaves people like me out. People who (1) believe one thing done well is best and more is too much (minimalists, or at least minimalists willing to go all-out maximum on one or two exclusive things), and (2) want to see BOINC as it is meant to be, a set-it-and-forget-it program. We are now being told, in effect: "Your house is cold because your computers are not running hard, because GPUGrid is out of work and who knows when there will be more. They could certainly give you more work, because old workunits can be reissued, but they won't, and who knows why. So turn on a heater or run something else, even though you donated about $1,000 to them, spent enough to buy 8 CUDA GPUs solely to run this project (no inexpensive thing, with 780s and 980s and even the Quadro in the laptop), and you run their projects exclusively on purpose because you really believe in this work, and for the other projects, scientific or medical as they are, you don't believe in the people or the project itself." Maybe nobody would ever actually say that, but you get the point. (BTW, I think a while back I started talking to the project rather than to Richard, so sorry for addressing this to you and then going off, Richard.) (Also, I am the king of the run-on sentence, so please forgive me for running on and on.) But again, you see my point. Thanks for reading, if any of you got through this rant/tired mumbling. And please, if someone who runs or helps run this project reads this: reissuing tasks would solve a TON of user frustration when the well runs dry, and may help eliminate some jitter in the task results themselves.

If an adaptive task-feeding server could give out 2500 tasks when 2500 computers are asking for work on 4000 (or more) GPUs/CPUs/Androids, then give us 7500-10,000 task units by giving us those same 2500 tasks 3 and 4 times.

If the work units do not need validation, then your proposal is a horrible idea.

I'm not here to waste my energy. I am here to complete the work that is available. To complete scientific work that can better humanity, and in as optimal a way as we can.

Why are you here? You make it sound like all you care about is that your GPUs are busy, the heat is kept up in your home, and the stats keep coming in. You may need to rethink your priorities, and join some additional projects.

The moment a project decides to just reissue tasks for the sole purpose of "keeping devices fed", is the moment I leave the project.

GPUGrid would even, to keep us happy and running, allow us to crunch on already run work units as a second and third validation of work to account for possible jitter of hardware, which is always present and possible, and most likely probable, at least during times when the queues are empty and there is no work to give out.

Couldn't disagree with you more.

I don't want (busy)work. Who wants to rack up electricity costs running WUs that have already completed successfully, without some solid justification?

Betting Slip is right, both of your GTX 980 computers are showing that message "Simulation has become unstable", which means (so far as I know) that the GPUs are clocked too high for their current voltage.

You should be able to complete work units, without receiving that message at all. Try lowering your clocks. You'll even have more valid results!

As I am not sure if this thread about empty feeding servers is directly related to speed estimates, I would like to add that I have never seen multi-week estimates on any of my systems.

Sorry about the hijack. After I posted, I did report myself to the moderators for being 'off topic', and suggested that they split off your post about new users, and my response, as the starting point for another discussion in a new thread. They didn't choose to do that, as you can see.

I concede because that all makes sense. And yes, I do make it sound that way, but I am speaking more out of a tongue in cheek off humor than reality when I speak of heating the house and that sort of thing. But I think my experience with distributed projects may, and I stress MAY, be out-dated, but it may be based in reality. My experience with medical distributed projects goes back to the days of United Devices and the original cancer research project that they ran for Oxford and the National Foundation for Cancer Research. Back then it was all CPU and no GPU. It was also Pentium 3 and original Pentium 4 CPUs that were doing the work. So maybe the technology back then and the technology now has completely eliminated jitter between the hardware, but they needed several different computers to finish every work unit so that the jitter between them could be figured and only then could a single work unit be validated. So maybe now, a distributed project only needs to issue and return one copy of any work unit and either the technology on the user end is always completely trusted to always give back a proper result or the error detection in the process itself will detect every error possible OR the back-end "validator" servers can detect all the jitter of every unit based on one single result returned, but I doubt either of those are completely true.

And yes, it may sound like I am asking for "busy work", but I think there are two sides to that. One is the practical point you made: it takes electricity to run them, and that costs money, and there are other projects available for people who are not as selective as I am about what my computers do all day and night. The second side is the one I have stated: I believe in this project and want to do all the work I can for it. I spent the money not for stats; the stats are simply the indication of how much work you are actually doing for the medical science, and without them you would not know how much work you or anyone is doing, but would have to take the project's word that "you are doing a good job, guys". And because the problem of varied results from hardware and software differences and usage jitter still exists, the solution would be to compare multiple results, and not take a single result per workunit as the final validation.

I don't believe technology has solved the problems of varied technologies. I actually think the different cards, different software, different versions of software, different CPU/GPU combinations, different I/O rates and configurations, different everything in general have become more diverse, and what people do with their computers is much more diverse than in 2003, to the point that jitter in results is most likely a bigger problem now than back then! I am not doing this and spending all this time and money to heat the house, keep the GPUs full, and keep up my stats. I am doing all of this and care so deeply because the work is so valuable and important. All my stupid humor aside about heating the house and feeding the GPUs, I honestly believe the work needs validation of more than one run: at least 3 runs on different hardware/software/usage configurations, I believe, even if it takes us longer to complete papers and whole tasks.
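The multi-run validation described above is essentially BOINC-style replication with a quorum. A minimal sketch of the idea (hypothetical code, not GPUGrid's actual validator; the tolerance, quorum size, and function names are all assumptions for illustration):

```python
# Hypothetical sketch of quorum-based validation in the BOINC spirit:
# issue several copies of a workunit to different hosts, then accept a
# "canonical" result once `quorum` copies agree within a tolerance.

def results_agree(a, b, tol=1e-6):
    """Two result vectors 'agree' if every element matches within tol."""
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))

def validate(results, quorum=2, tol=1e-6):
    """Return the first result backed by `quorum` agreeing copies, else None."""
    for candidate in results:
        agreeing = [r for r in results if results_agree(candidate, r, tol)]
        if len(agreeing) >= quorum:
            return candidate  # canonical result reached quorum
    return None  # no quorum yet; the server would issue another replica

# Three replicas from different hosts, with tiny cross-hardware "jitter";
# the outlier (1.5) is rejected because it agrees with no one else:
replicas = [[1.0, 2.0], [1.0000001, 2.0], [1.5, 2.0]]
canonical = validate(replicas, quorum=2)
```

The tolerance is the interesting knob: set it too tight and legitimate cross-hardware jitter fails validation; too loose and real errors slip through.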

Now the other side of this is the end users, and I was specifically responding to Richard's claim that we will lose users over bad estimation times. Users will be turned off far more by a project that keeps running out of work than by an estimated-time issue that works itself out anyway. When you go for a job interview and the company says they have no work for you, you go find work somewhere else, and rarely if ever do you come back to ask again once you have found it elsewhere. When the well is dry, you find water elsewhere. When mom smacks your hand when you reach for the cookies, you learn not to reach for the cookies. Get it? If the users Richard says will leave over estimates instead leave because they can't get work, then Richard is proved right over an issue that never even happened. I was not "yelling him down"; I was only stating a different point that was more relevant. So when my most productive computer goes from 9-12 units a day to 2-3, and some of my computers don't see a task for 2 days, I know that people coming onto the project will be leaving as fast as they are coming. So let's say they do leave at the rate Richard expects, for a combination of his reasons, mine, and some others... and then a month down the road we need a lot of users in a short period of time to complete a lot of work units (which is known to happen), and those users are gone. The project is then in danger of not completing the task, and we are stuck either not doing the needed medical science in time, OR, if the work was given as a test of what our computing power could do in order to gain a new scientist who needs our GPUs, failing to prove we could do our work and his/her additional work in time.
Again, coming from United Devices cancer research: that whole almost-2-year non-profit research effort, which yielded great medical scientific results in folding, cancer, anthrax, and a few other areas, also had a second purpose, which was to prove that the United Devices GRID platform could be effectively used by large companies to complete large-scale tasks with their proprietary, for-profit software. And after they proved it with us in the non-profit realm, they agreed with the partners to stop the project so they could move on and sell the software to companies. They are still in business today as Univa, and they are still selling that software. In addition to selling the software after proving it, they also spawned the idea of medical research through computational GRID computing, which WCG and Folding@Home primarily picked up and ran with, and which eventually led to GPUGrid itself. Without those pioneers, a for-profit company doing non-profit work in the medical sciences, we would not be here having this discussion. So the idea that you retain large numbers of people, for the time when they are needed for something bigger, by simply continually giving them work when their computers ask for it, is a proven way to make sure more people stay around FOR THE SCIENCE TO GET DONE when it is needed by more people in shorter time periods. Proving that you are the best in the group of options brings more scientists to you to get their work done. And that proof only comes from a consistently large computer base to feed work to.

So to all of that: maybe there are other solutions that would address your concerns and meet the ones I bring up (which you may not even see as valid, but I can't not see as valid, since they are valid to me, based on my own experience and observation). Perhaps all original work could be set to use the current amount of GPU, which is like 65-75% of the GPU, while all validation work is set to use like 10-15%, lasts longer per unit, and has longer expiration dates. That would allow for validation, allow the validation work to not cost a ton more than (but yes, more than) not using the GPU at all, and would keep work units flowing to GPUs so that the "well" (as I have been referring to the feed server) doesn't run "dry". And, of course, an option in your "GPUGRID preferences" to reject all "Validation work units" by simply unchecking a check box. The stats would be lower than for original work, or maybe not, but would also be based on time taken to complete and whatever other point-distribution rules there are now, with the added variable that it is not original unit work. I mean, if validation is needed, and I obviously think it is, then I would definitely leave that checkbox always checked, regardless of fewer points and longer run times, simply because 1) I know people will opt out of it for points and 2) I know it is needed for the science to produce more accurate results. And I would only hope that my computer never got the same validation or original unit twice, because then it might not add the amount of jitter needed to properly validate the unit, as multiple computers running the same work unit would.

I really think this at the very least needs discussion, and not just 2 sides, OR it needs the scientists themselves (or someone who represents them and knows how the work is done and validated at the computational back-end level) to explain why the work units do not need multiple runs on differing computer configurations in order to validate that the current single-run/validation-server configuration really produces 100% accurate results 100% of the time. 99% correct in medical science is just about 100% wrong when it reaches the publication level, or the application-on-human-beings level. You need to know you got the right result, or you have to assume you got the wrong one and need validation. All experiments need to be proven by duplication before they can be validated as fact. So even if not for the validation of the work units, then for the validation of the scientific process itself. A hypothesis proven once becomes theory; a theory, if duplicatable, becomes fact. What we are currently doing, unless someone says otherwise, is making theory out of hypothesis, but not duplicating it to prove it is fact. Has the scientific process changed since I was in high school? If not, we need validation.

If the work units do not need validation, then your proposal is a horrible idea.

All my stupid humor aside with heating the house and feeding the GPUs, the work, I honestly believe, needs validation of more than one run.

I really think this either at the very least needs discussion and not just 2 sides OR it needs for the scientists themselves (or someone who represents them and knows how the work is done and validated at the computational back-end level) to explain why the work units do not need multiple runs on differing computer configurations in order to validate that the current single run/validation server configuration really produces 100% accurate results 100% of the time.

If I recall correctly, the scientists/admins here have previously explained, in a thread, why they set the minimum quorum at 1. They are intentionally not verifying the work, and they have their reasons. I'll try to dig up the thread for you, but I'd encourage you to do some digging.

... the idea of "If the work units do not need validation, then your proposal is a horrible idea." is not valid. Please consider. TY

What I said is absolutely correct, because if they don't need validation, then your proposal would only serve to waste.

By the way, there certainly is more than "electricity costs" involved here. In fact, I don't even care about the electricity costs. I care more about the harm that computing does to the environment! And so, my philosophy is that we should compute as efficiently as possible, and not waste, in order to preserve the environment.

I appreciate your concerns, I really do, and I understand where they're coming from. Validation makes sense, and people who love this project love to keep their devices busy doing work for this project. However, I think you need to reconsider your priorities, realize that they will sometimes run out of work, and have backup projects (who also desperately request your devices' usage) ready with 0-resource-shares.

TY on the 980s. I didn't realize that was happening over the past few days on that one GPU. I don't know why, other than that these things tend to happen on Windows Update and other software-update days. I will see what I can do about that GPU. I am not sure that I am overclocking it, but I do use MSI Afterburner to keep the temp down, so maybe that is also affecting the clock in some way. All 3 are reading the same settings, and only the one is having that error, and only since the 2nd.

I set the "Core Clock" into negative numbers now by a few MHz, so we shall see if that does something for the errors. But that is not the "jitter" I was referring to, although overclocking is another issue not many computers back in the day had either, except the "enthusiast's". Now everything seems to come out of the box overclocked. I was referring to the fact that if I do something on my computer and you do the same thing on yours, the results may be slightly different byte-for-byte if it takes both computers hours to perform. So not the "jitter" of one task on one computer, but the ever-so-slightly differing results that different computers create; getting at least 3 results for each unit, and the more the better, would help "triangulate" where and when that "jitter" occurred between the results. Trusting one result without learning its jitter factor can't lead to anything but a result that needs to be compared to another for verification, thus validating it. The "jitter" I was referring to is the jitter between rigs, not the jitter inside one rig.
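That between-rig "jitter" is a real phenomenon for floating-point work: different hardware, drivers, or even a different order of the same additions can change a result bitwise. A tiny self-contained illustration (pure Python, just reordering identical additions):

```python
# Floating-point addition is not associative, so summing the same numbers
# in a different order can give a different answer -- the same effect,
# writ large over hours of computation, that different rigs can produce.

values = [1e16, 1.0, -1e16, 1.0]

forward = 0.0
for v in values:
    forward += v          # 1e16 + 1.0 rounds back to 1e16: one 1.0 is lost

reordered = 0.0
for v in sorted(values):  # same numbers, different order
    reordered += v        # here BOTH 1.0s are absorbed by -1e16 first
```

The exact sum is 2.0; neither order gets it, and the two orders disagree with each other, which is precisely why comparing replicated results needs a tolerance rather than byte equality.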

(Thanks again for bringing those errors and solution to my attention.)

Two say they ran old work units during an empty queue in order to validate the work. Kind of my idea exactly.
The other says the error reporting allows 4 errors, and then on the 5th it drops the work unit and fails it out. This seems to say that 4 errors is the "jitter" threshold, which would mean 1-4 errors still need the validation.
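That re-issue-then-fail behaviour matches BOINC's per-workunit error limit (BOINC projects configure this; the exact limit here is whatever GPUGrid has set, so the value of 4 below is taken from the post, not verified). A rough sketch of the bookkeeping, under that assumption:

```python
# Hypothetical sketch of BOINC-style workunit error bookkeeping: each
# errored result re-issues the task to another host, until the error
# count exceeds the limit and the workunit fails outright.

MAX_ERRORS = 4  # assumed limit, per the post above

def process_report(wu, success):
    """Update a workunit dict after a host reports a result."""
    if success:
        wu["state"] = "completed"
    else:
        wu["errors"] += 1
        if wu["errors"] > MAX_ERRORS:
            wu["state"] = "failed"    # 5th error: give up on the workunit
        else:
            wu["state"] = "reissued"  # send another copy to a new host
    return wu

wu = {"errors": 0, "state": "in_progress"}
for _ in range(5):  # five consecutive error reports
    process_report(wu, success=False)
```

After four errors the unit is still alive ("reissued"); the fifth pushes it over the limit and it fails, matching the behaviour described above.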

Then, down in the depths of this thread from 2009, if nothing has changed since then (which it probably has, and even improved over what he promises and describes), there is at least AN answer, if not the one I was looking for, from GDF: https://www.gpugrid.net/forum_thread.php?id=901#8041
But I still KINDA like ExtraTerrestrial Apes' "interesting opportunity" in Message 8144 in there, before GDF's answer, too. lol

I am happy with this discussion, in that at least I got my thoughts "on paper", as it were, and got great feedback. I do apologize again for using my stupid idiomatic humor instead of stating my thoughts clearly originally. I would appreciate Stefan's additional feedback on the discussed points, but again, I concede that the points I made are less important than the ones that got the project to where it is today. This has obviously been not only brought up but discussed in different ways, and the current path was chosen for specific and better purposes and reasons. I will take Nate's reissues as oddities and the general single-issue/error-checking/validation-server path to be the best for the science we are all participating in today. TY again and again.
____________
1 Corinthians 9:16 "For though I preach the gospel, I have nothing to glory of: for necessity is laid upon me; yea, woe is unto me, if I preach not the gospel!"
Ephesians 6:18-20, please ;-) http://tbc-pa.org

And many thanks for sharing what you found. It's useful and informative.

:) I'm not looking to win. I just challenge your assumption that validation is needed here on GPUGrid. It's actually a great question, and I hope a scientist/admin chimes in with an explanation of their current approach. Edit: GDF's 2009 explanation is sufficient, in my mind, meaning that they are actively choosing not to do BOINC-based quorum validation. However, I hope Stefan does reply to my PM, and maybe makes a FAQ post regarding GPUGrid's decisions on validation and minimum quorum.

So... Have you got those GPUs prepared to work on backup projects yet? I think it might be time, as the well will indeed run dry here occasionally :) I am attached to 35+ projects, even though my GPUs are set up to work for GPUGrid exclusively (by setting my other GPU projects either to "don't use NVIDIA" or "0-resource-share backup project"). ... and once you have it set up, you don't need to manually fiddle with it. BOINC really is meant to be set up and left to do its thing!

I think I will continue to manually turn on and off, as needed, the other non-BOINC project that I run when GPUGrid has no work for my GPUs. That is, instead of other BOINC projects set to zero resource share inside the client.

I am sure there are many, like me, who would welcome a status report from the project team. The silence is deafening.

I'm not in the project team, but I'm keeping one eye on the project's webpages.

Well! Just got two Gerards for one of my PCs. Queue remains empty...

I see on the webpages (and on my hosts) what you've just experienced:
The long queue is continuously (a couple of times a day) filled with work, without a significant reserve.
I guess the results are needed as soon as possible for the ongoing research, so it's practical to produce short batches with no reserve; if there were twice as many batches in progress at the same time, it would take twice as much time to complete all of them.

As Retvari suggested, the amount of WU we have now in circulation allows us to have a fast turnover which is ideal for our adaptive protocol, that analyses new returned simulations as soon as they arrive and generates new ones from the results obtained.

Anyway, I will start preparing some extra WU.

I apologize for not answering right away, but as you can imagine sometimes academic life doesn't give you time for even checking the forums...

Would it not be beneficial to the project to have a policy of only 1 WU for each available and free GPU? You would save a lot of time, as some users cache WUs for hours that could be running on a machine that has no work.

As Retvari suggested, the amount of WU we have now in circulation allows us to have a fast turnover which is ideal for our adaptive protocol, that analyses new returned simulations as soon as they arrive and generates new ones from the results obtained.

Could the "very long" workunits be more appropriate for this method? Perhaps a bit better prepared than last time :)
There should be a 3rd queue for this kind of workunits, which would send work only for GTX 780, 780 Ti, 970, 980, 980 Ti, Titan, Titan X cards?

Agreed, the process is taking too long, or people are too busy to keep up with it. Lack of units is forcing people onto backup projects, which means you then have to be extra lucky to ask for new units at just the right time when there are some, or people will stay on the backup projects forever. At some point a lot of people will just stop asking, which means those who stay could get some work, but will it be worthwhile doing at that point? I hope this does not become a downward spiral for GPUGrid!!

Not sure how I have set up my backup project, which is poem@home. But when there's work available for GPUGrid, my client downloads the WU, my backup project stops, and the GPUGrid WU starts. Works well so far.

Lots of software changes have been happening in the last three weeks (essentially all analysis/simulation-building programs in Matlab have been recoded almost from scratch in Python), and most of the lab has been dedicated to this endeavour. This left me as the only net contributor to GPUGRID and, as I explained elsewhere (https://www.gpugrid.net/forum_thread.php?id=3570), the strategy I took is to send the exact number of workunits that keeps the whole network busy and allows a fast turnover following our adaptive scheme (in which each simulation generates a new one after analysis of the results of previous ones). This is the cause of the slight WU availability drop.

I normally set my "Resource share" to 0% anyway for this project, in order to obtain a zero-buffer size. That of course maximizes the time available for crunching and allows me to get the 24-hour bonus with some cards.

Maybe you could set up a separate queue for that purpose? Some people may like it, and others may not.

If they don't have work to give, then... How are they going to satisfy your request?

The solution is simple. Set up 0-resource-share backup projects, and whenever GPUGrid runs out of work, your GPUs will still keep busy with other projects until GPUGrid has more.
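The 0-resource-share mechanism can be pictured as a two-tier scheduler: projects with a positive share split time between them, and zero-share projects are asked only when no normal project has work. A toy model of that fallback (this is an illustration of the concept, not the BOINC client's actual scheduling code):

```python
# Toy model of 0-resource-share "backup project" behaviour: request work
# from share-weighted projects first, and fall back to a 0-share project
# only when every normal project's queue is dry.

def pick_project(projects):
    """projects: list of dicts with 'name', 'share', and 'has_work' keys."""
    normal = [p for p in projects if p["share"] > 0 and p["has_work"]]
    if normal:
        # Favour the normal project with the largest resource share.
        return max(normal, key=lambda p: p["share"])["name"]
    backups = [p for p in projects if p["share"] == 0 and p["has_work"]]
    return backups[0]["name"] if backups else None  # idle if nobody has work

projects = [
    {"name": "GPUGrid", "share": 100, "has_work": False},  # queue ran dry
    {"name": "Backup",  "share": 0,   "has_work": True},
]
choice = pick_project(projects)
```

The moment GPUGrid's queue refills (`has_work` flips back to true), the same logic hands the GPU straight back to it, which is why the setup needs no manual fiddling.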

Patience ... and backup projects ...

Yes, the main goal should be to do good work crunching. To that end, it's best to have backup projects that you care for too.

That being said, I prefer crunching for GPUGRID for a couple of reasons: the kind of work it does, and the fact that it can utilize my computer's available resources more effectively. So, it's nice when there are work tasks available for it.
____________
My BOINC Cruncher, Minecraft Multiserver, Mobile Device Mainframe, and Home Entertainment System/Workstation: http://www.overclock.net/lists/display/view/id/4678036#