Now that the bandwidth problem seems to have gone away (good job on a nice smooth transition - all thanks to the guys in Berkeley!!!) will they think about going back to (at least) the old per-CPU-core and per-GPU limits, rather than the current 100/200 per-machine limits?

Pretty please!

Let us pray!!!

Unfortunately, the reason for the current limits is that the science database program has reached the limit of its capacity to handle transactions; it has nothing to do with network bandwidth.

Because of this, I can't see the limits being raised or removed in the near future.

I understood that the limits were imposed due to database problems, nothing to do with bandwidth.

I'd go further, and say that the smooth-flowing bandwidth and more robust infrastructure (power supply, cooling, remote access, staff available to deal with lockups and hardware failures) should - although it's early days yet - do away with whatever few justifications there were for holding large caches in the first place.

That's a good point, and one I hadn't thought of.

But maybe, if there is a planned outage again, they could allow us to load up in such circumstances? To bridge the anticipated gap, I mean.

Now that the average turnaround time is under 36 hours, would it make sense to shorten the deadline for returning workunits? What is it now, 8 weeks? That means some of those workunits sit in the database for many months, just politely waiting for a quorum. E.g., is this one ever coming in?

http://setiathome.berkeley.edu/workunit.php?wuid=1168264448

Would halving the deadline mean that the client cache size could be doubled with little effect on the size of the database? Does it work that way?
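
For what it's worth, here is a rough back-of-envelope sketch of that trade-off. It's only a toy model with made-up numbers (host counts, rates, and fractions are all guesses, not real SETI@home figures): the database holds tasks sitting in host caches, plus tasks waiting the normal turnaround for a quorum, plus stragglers waiting out the full deadline.

```python
# Toy model: result rows ~ cached tasks + normal quorum wait + stragglers.
# Every number below is an illustrative guess, not a real SETI@home figure.

def db_rows(hosts, cache_per_host, results_per_day,
            avg_turnaround_days, deadline_days, late_fraction):
    in_caches = hosts * cache_per_host                  # tasks buffered on hosts
    in_flight = results_per_day * avg_turnaround_days   # normal quorum wait
    stragglers = results_per_day * late_fraction * deadline_days
    return in_caches + in_flight + stragglers

# 8-week deadline with a 100-task cache vs. 4-week deadline with 200 tasks:
print(f"{db_rows(100_000, 100, 500_000, 1.5, 56, 0.05):,.0f}")  # 12,150,000
print(f"{db_rows(100_000, 200, 500_000, 1.5, 28, 0.05):,.0f}")  # 21,450,000
```

With these particular guesses the cache term dominates, so doubling the cache is not 'free' even with a halved deadline; with fewer hosts or a bigger straggler fraction it could go the other way. The two knobs do trade against each other, just not one-for-one.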

Hi Richard,

The database size shouldn't be an issue if they just dropped the "number of days" down to two or so from the standard ten. If we were bumping into the limit at ten days, then allowing two days should significantly cut the number of database entries.

It's also "smarter", since there are machines out there that will not do 100 WU in two days, so they have too much cache while some of us have not enough. That's probably why the choice in BOINC was made for "days" instead of "units" in the first place.

And halving the number of CPU units allowed to 50 (or even lower) while giving us at least a 150 WU (or more) cache for GPUs could be done without changing the number of database entries. (Since a GPU does work 10-20x faster than a CPU, it would make sense to compensate for that a little.)

Why would 100 WU not be enough for GPUs?

Because there is still the matter of "no work available" messages and the new, longer delays between requests.

Some of us could use a little more buffer against running out of work.

I'm not complaining; I'm just observing. Since the move, a couple of my machines have run dry (and some more than once) due to "no work available" messages. No big deal, but I'm guessing that some less ham-handed limits could eliminate that problem.

What I don't know is whether there is something else, like a set proportion of CPU-to-GPU work, programmed into the scheduler. I admit it's likely I'm showing my ignorance.

The database size shouldn't be an issue if they just dropped the "number of days" down to two or so from the standard ten. If we were bumping into the limit at ten days, then allowing two days should significantly cut the number of database entries.

That's been my thinking pretty much since the server-side limits came into place. That way the faster crunchers would be able to remain busy even during the usual weekly outage, and the size of the database would remain small.

I think the larger workunits, once introduced, will be the solution for longer caches. They will probably also reduce the db load to the point that a limit higher than 100 can easily be chosen.

Hopefully, once they feel that everything is happy in its new home, they can get back to work on that, or maybe on some of the other issues they haven't had the time to work on.

+1 on the larger workunits. That's the way to get more work done per database entry: fewer files to handle, fewer server requests, and fewer load/unload breaks on GPUs. I hope a new workunit would keep a GPU loaded for many hours.

The database size shouldn't be an issue if they just dropped the "number of days" down to two or so from the standard ten. If we were bumping into the limit at ten days, then allowing two days should significantly cut the number of database entries.

It doesn't work that way.
The 'number of days' is a BOINC client cap; the limit on tasks in progress is a server cap. 'They' have no influence over the amount of work the client requests; AFAIK they can only limit 'tasks in progress'.
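
A minimal sketch of that distinction, assuming (as a simplification) that the client asks for enough tasks to fill its 'days of work' buffer while the scheduler separately enforces a tasks-in-progress cap. This is a hypothetical illustration, not actual BOINC client or scheduler code:

```python
# How many tasks a host ends up holding after one scheduler request,
# given two independent caps (illustrative logic only).

def tasks_held_after_request(days_of_work, tasks_per_day,
                             server_cap, in_progress):
    client_wants = days_of_work * tasks_per_day     # client-side cap ('number of days')
    request = max(0, client_wants - in_progress)    # what the client asks for
    allowed = max(0, server_cap - in_progress)      # what the server will grant
    return in_progress + min(request, allowed)

# Slow host: bounded by its own 10-day setting, never sees the server cap.
print(tasks_held_after_request(10, 5, 100, 0))    # -> 50
# Fast host: bounded by the server cap no matter what 'days' it asks for.
print(tasks_held_after_request(10, 500, 100, 0))  # -> 100
```

So the project can starve a fast host down to its cap, but it can't stop a slow host from asking for 10 days' worth.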

That does complicate things. I was thinking that each project could customize the allowed values for that field, so that BOINC would behave well with projects that have differing needs.

BOINC estimates how much work a computer can do on a given project's tasks in order to know how many workunits "10 days" of work for that project amounts to. There must be some way of "tricking" it, or setting it, so that it calculates SETI tasks as taking five times as long as they actually do.
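
If such a knob existed, the arithmetic would look something like this (a hypothetical sketch; the names and the fudge factor are made up, and this is not how BOINC internals are actually written):

```python
# The client buffers work in seconds; a runtime estimate converts
# seconds into a task count. Inflating the estimate 5x cuts the count 5x.

def tasks_to_fill_buffer(buffer_days, est_task_seconds, fudge=1.0):
    buffer_seconds = buffer_days * 24 * 3600
    return int(buffer_seconds // (est_task_seconds * fudge))

print(tasks_to_fill_buffer(10, 3600))           # -> 240 tasks for a '10 day' buffer
print(tasks_to_fill_buffer(10, 3600, fudge=5))  # -> 48 tasks, i.e. ~2 real days
```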

Either that, or set the servers to only serve 20% of what BOINC calls for, but then I guess BOINC would keep calling...

I suppose I really have no idea how to accomplish setting a two-day limit, so I should shut up.

So, the next-best ham-handed way of doing things would be to limit CPU work units per machine to 25 or 50 (whatever approximates a day's work) and raise the GPU limits by 50 or 75.

... think, think, think, bret....

Ok, ten days of work was too much, too big of a database... check.

We can't lower the number of days... check.

Ten days of work was thousands of tasks... check.

We're currently limited to 100 tasks ... check.

Raising the limits to 250 tasks should result in a database much smaller than the thousands of tasks that were formerly cached... check.

Conclusion: lower the hard number of CPU tasks, raise the freaking GPU limits by some number of tasks, and see what happens. Slower machines will still be limited by their 10 days, and faster machines might stay stocked with work; if it doesn't behave, lower them again... check.
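
Putting illustrative numbers on that conclusion, using the same min-of-two-caps logic sketched above (every figure here is made up, including the proposed caps):

```python
# A host holds whichever is smaller: its 'days' buffer or the server cap.

def tasks_held(tasks_per_day, days_setting, server_cap):
    return min(tasks_per_day * days_setting, server_cap)

# Slow CPU-only host under a lowered 50-task cap: still governed by its 10 days.
print(tasks_held(tasks_per_day=4, days_setting=10, server_cap=50))     # -> 40
# Fast GPU host under a raised 250-task cap: finally gets a useful buffer.
print(tasks_held(tasks_per_day=600, days_setting=10, server_cap=250))  # -> 250
```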

Why would you ask for my 24-core box to go from about a 7-hour cache with 100 tasks to even less? That is just plain rude, sir! :P

If they were to adjust the limits again, I would like something along the lines of how they were doing it last time, which was per CPU core and per GPU. IIRC they were using 50 per CPU core & 100 per GPU. I imagine they didn't use those values again because they knew they wouldn't do the job.

As they seem to have control over the CPU & GPU limits independently, they might consider bumping the GPU limit by 50 or so to see how the db takes it. If it doesn't cope, they would need to back it down again.

Ideally, having controls on the back end that consistently keep the db under control is the best answer. Larger jobs are another way, and probably easier to accomplish.
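
A quick sketch of why larger jobs help, assuming a fixed amount of computation per host per day (illustrative numbers in arbitrary compute units):

```python
# For the same throughput, in-flight database rows scale inversely
# with per-task size: 10x bigger tasks means 10x fewer rows.

def rows_in_flight(compute_per_day, task_size, days_buffered):
    tasks_per_day = compute_per_day / task_size
    return tasks_per_day * days_buffered

print(rows_in_flight(compute_per_day=240, task_size=1, days_buffered=2))   # -> 480.0 rows
print(rows_in_flight(compute_per_day=240, task_size=10, days_buffered=2))  # -> 48.0 rows
```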

I think PrimeGrid also uses a limit of 100 in-progress tasks. However, some of their tasks can run 30+ hours on my machines that do SETI@Home work in ~2 hours.