Wouldn't this mean all your pending uploads would get backed off by the same delay time? Then you again get x number of jobs trying to upload at the same time. I think the present system, where each failed upload gets a pseudo-random delay time, more effectively spreads out the downstream retries.

YES, and NO... BOINC has an inbuilt setting which only allows x number of simultaneous transfers (default = 2 per project).

So after the delay BOINC would try x uploads, then when they fail x more, and so on. That sounds exactly like what happens right now when a frustrated user hits "Retry Now" for all his umpteen pending uploads. Not an improvement, in my opinion.

Just to clarify...(using the default setting of two)

Yes ... ALL would be ready to attempt upload at the same time.

IF the first 2 attempting upload failed, they would all back off again.

Or IF the first two succeeded, two more would immediately attempt upload...
continuing until all uploaded (or one failed, which would initiate a new back-off).
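The behaviour described above can be modelled as a toy simulation: a queue of pending uploads, at most two simultaneous transfers, and a shared back-off whenever any attempt in a batch fails. All names and numbers here are illustrative, not actual BOINC code.

```python
import random

def run_uploads(n_pending, success_prob=0.8, max_concurrent=2,
                max_rounds=1000, seed=0):
    """Simulate the proposal: try up to max_concurrent uploads at once;
    any failure in the batch backs off the whole queue."""
    rng = random.Random(seed)
    pending = n_pending
    backoffs = 0
    rounds = 0
    while pending > 0 and rounds < max_rounds:
        rounds += 1
        batch = min(max_concurrent, pending)
        if all(rng.random() < success_prob for _ in range(batch)):
            pending -= batch   # both slots succeeded; the next pair starts
        else:
            backoffs += 1      # one failure backs off the whole queue
    return pending, backoffs
```

With `success_prob=1.0` the whole queue drains two at a time with zero back-offs; as the success rate falls, the number of queue-wide back-offs grows, which is exactly the "give the servers a breather" effect being discussed.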

I have a simple mind and I am not following this discussion too well ;-\
What is the estimate for me to be able to upload completed work units??
Or should I suspend processing until uploads are again functioning??

Many thanks,
Dave G.
"Per Ardua, ad Astra"

Welcome to the message boards.

Uploads are getting through, although it's still a bit patchy.

No need to suspend processing; things should be improving with time.


You all see those short-time work units? They take only a few minutes for the CPU or GPU to complete. Why not stack those shorties into a big zipped file and send one single file to a user? It would reduce the number of simultaneous connections to the server AND client if you had 40 or 50 work units in 1 large compressed file. When the client is done with the work units, it can compress them back into one file and send it back to the server.

If a client got bored and only managed to crunch half the work units, it could still compress the half it finished into one file and send that. When the file gets to the server, it would decompress the file into the individual work units, count and file away the ones that made it back, and reissue work units for the ones which the client did not complete or that errored out.

This would greatly reduce the amount of hammering the SETI servers see from clients begging to upload or download. Instead of seeing thousands of little files coming and going, we would see hundreds of large files, which means reduced inbound and outbound connections. The 100 Mbit pipe would still get saturated to capacity, but at least the number of connections would decrease.

You could even go so far as to limit the number of downloads an individual client is allowed per day. Set it at, for example, 2: only twice a day can a client request a new compressed stack of work units, and only if it has sent the previous ones back already. The BOINC app only lets you download up to 100 WUs a day when you first join. Why not compress those 100 into 1 file, send it out, and hope you get some of it back a week later? And just like the current BOINC app, if a client shows that it can handle 100 a day, then the number could gradually increase.

Again, it wouldn't solve the bandwidth issue, but it would greatly reduce the number of connection attempts, which are in and of themselves bandwidth hogs.

On the computers on my account alone, they are trying to upload a completed work unit every few seconds. This could be reduced to 2x per computer per day.
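The bundling scheme above can be sketched in a few lines using Python's standard `zipfile` module. The function names and the WU naming are made up for illustration; this is just the round trip of packing many small results into one archive and, on the server side, counting what made it back and listing what needs to be reissued.

```python
import io
import zipfile

def bundle_results(results):
    """Client side: pack {wu_name: result_bytes} into one zip blob,
    so a single connection carries all the small files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in results.items():
            zf.writestr(name, data)
    return buf.getvalue()

def unbundle_results(blob, expected):
    """Server side: decompress the archive, file away the results that
    made it back, and list the WUs that must be reissued."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        returned = {name: zf.read(name) for name in zf.namelist()}
    missing = [name for name in expected if name not in returned]
    return returned, missing
```

A client that only finished half its stack simply calls `bundle_results` on the half it has; `unbundle_results` on the server reports the rest as missing, matching the reissue behaviour described above.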

It seems a lot of the problem is the continual hammering of the upload server with attempt to upload by each result individually.

Why not get Boinc to apply the backoff to ALL results attempting to upload to that SAME server that caused the initial backoff.

This would mean having a backoff clock for each upload server, instead of for each result.

This would mean just one or two (whatever your # of simultaneous transfers setting) results would make the attempt; the rest of the results waiting (up to 1000s in some cases) would be backed off as well, giving the servers a breather.

Not being a programmer, I'm not sure how difficult this would be to implement (it doesn't seem like it would be to me), but the benefit of reduced wasted bandwidth should be substantial.

Please feel free to comment.
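The "one back-off clock per upload server" idea could look roughly like this. The class and method names are invented for the sketch, not actual BOINC code: the point is that every result bound for the same server consults the same timer, and one failure delays them all.

```python
class ServerBackoff:
    """One back-off clock per upload server, instead of per result."""

    def __init__(self):
        self.next_attempt = {}   # server URL -> earliest allowed time

    def may_attempt(self, server, now):
        # A result checks its server's shared clock before trying.
        return now >= self.next_attempt.get(server, 0.0)

    def record_failure(self, server, now, delay):
        # One failure delays *every* pending result for this server.
        self.next_attempt[server] = now + delay
```

Under this scheme, when the first one or two attempts fail, `record_failure` pushes the whole queue for that server into the future, rather than each result keeping its own independent retry timer.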

We were just talking about that last night in the panic thread here. Seems that something like that might be coming. :D

But I accept your point about needing to check the CPU overhead of the unzip process on Bruno when the zips arrive.

Why would they need to be decompressed on Bruno? As zip they take up less space on the disk as well as in throughput. I would expect that once you really need the results that you can import them on whatever server you're using to look at them and decompress them there. Or is that too simple?

The advantage of decompressing them on Bruno is that they end up in exactly the same place as they would have done under the existing upload handler: no change is required to all the cross-mounted complexity of the SETI file system. And the files are available immediately and individually: you don't have to teach the validator how to go fossicking around in any number of zip archives when it needs a file.

And that also makes the whole system reversible: if something goes wrong with the remote concentrator, use DNS to point all of us back to Bruno. Gets sticky again, of course, but lets the project limp along until the concentrator is revived.


This idea has already been checked in by the Boinc developer. It will probably be incorporated in the next version released for testing. As I understand it, this was tried a couple of years ago with some negative effect that will need to be looked at again.

Since CUDA was introduced into the project, the BOINC servers have been taking on an ever-increasing load. As people populate their spare PCIe slots with extra GPUs, adding an extra 112 or more cores each time they do, your bandwidth woes will only increase exponentially.

SETI@home has become a victim of its own success where CUDA is concerned. The best thing to do here is to limit the amount each GPU can download each day through the web interface; cutting it by one third or one half will free up a good portion of bandwidth. This will also decrease the load on the backend, as you will not need to create so many multibeam WUs. Increasing the chirp rate will affect the slower CPUs far more than GPUs. :-(

I have been browsing stats and looking at computers attached to BOINC, and I have noticed fellow participants who have between 1500 and 5000 WUs downloaded onto their PCs. I would consider this somewhat excessive, which is why I am making this suggestion.

Once implemented, you should notice a difference within 24 hours, and hopefully people will not feel so frustrated when trying to upload their finished WUs. This should then limit the number of people going red in the face and blowing off steam on this forum, at least until the next managed emergency comes along. Participants need to remember that science does not hurt anyone if it is running late.

Matt mentioned that they are starting to run low on work. I wonder how much of the old tapes are available to run Astropulse. Obviously the regular S@H has been run on the old (1999 to the start of Astropulse) tapes, but there are years of tapes to run Astropulse on. Does this have to do with the RFI and other factors previously mentioned, that they aren't available?

Yes, it does have to do with the RFI/radar. A while back, the recorder was set up so that one of the 14 channels of data holds the "chirping" of the radar, and the remaining channels have the data that gets cut into WUs. That's what I think I remember reading a long time ago.

Actually, what I remember reading is that we were only using 12 channels at the time, so there were two free channels left, so one was used for the radar chirping, and the other was still available for future use. No idea where to even try to find that reference now.

But at any rate, before that 13th channel was used for radar chirping, there is no way of knowing where the chirps actually are, and that's where the software radar blanker that Matt has been working on comes into play. Once he gets that up and running, it can pre-process the older tapes, find where it thinks the radar is, and fill that 13th channel with the chirps so the splitters can do what they normally do.

It would need a change to the transitioner/validator logic and timing, to avoid those 'validate error' results we get when the report arrives before the upload; but with BOINC doing delayed/batched reporting anyway, the overall effect wouldn't be big. It would finally give me an excuse to ditch v5.10.13 (no point in early reporting!). All it really needs is that if the validator can't find the file it needs at the first attempt, it goes into backoff/retry (like much of the rest of BOINC) instead of immediate error.
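The suggested validator change, backoff/retry on a missing file instead of an immediate error, amounts to something like the sketch below. The function name, retry counts, and delays are made up for illustration; only the shape of the logic matters.

```python
import os
import time

def wait_for_result_file(path, max_tries=5, base_delay=1.0):
    """Return True once the uploaded file appears, retrying with an
    exponential back-off; return False only after all tries fail."""
    delay = base_delay
    for _ in range(max_tries):
        if os.path.exists(path):
            return True        # file arrived; validate as normal
        time.sleep(delay)      # back off instead of erroring immediately
        delay *= 2             # double the wait each attempt
    return False               # only now flag a 'validate error'
```

The report-before-upload race then just costs the validator a short wait rather than a spurious error on the result.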

It's possible for the scheduler to ignore reports if the file wasn't uploaded yet; the client would just keep them queued for a while longer and try reporting them later. This can be done for individual workunits (there is a separate 'ack' for each, which the client must receive before it gets rid of the task locally).

Work is uploaded, and the moment the upload completes it is available for processing on the upload server.

Then, the result is reported. At this point, it is marked in the database as received, and subject to validation.

The validator doesn't have to check to see if the result is in local storage, because it is in local storage by definition.

This change means you have a new state: reported but not in local storage.

BOINC would have to know about that, and have some way of dealing with it (rescanning the database and checking to see if the result is actually there), probably by having the "unzip" process on the upload server report when files arrive.

There is also a chance that the result gets lost between the off-site server and the "true" upload server.
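The extra state being proposed, plus the "result lost in transit" case, could be modelled roughly as below. All state names and function names are invented for the sketch; they are not BOINC's actual state machine.

```python
# Hypothetical result states for the scheme discussed above.
UPLOADED = "uploaded"                    # file sits on the upload server
REPORTED = "reported"                    # reported, file confirmed present
REPORTED_NO_FILE = "reported_no_file"    # new state: reported, file in transit
LOST = "lost"                            # file never arrived from the edge

def on_report(file_present):
    # When a report arrives, check local storage before trusting it.
    return REPORTED if file_present else REPORTED_NO_FILE

def on_rescan(state, file_present, timed_out=False):
    # Periodic rescan (or a callback from the unzip process) promotes the
    # new state once the file shows up, or declares the result lost.
    if state == REPORTED_NO_FILE:
        if file_present:
            return REPORTED
        if timed_out:
            return LOST
    return state
```

The rescan step is where the cost lives: something has to keep revisiting `reported_no_file` results until the file arrives from the off-site server or a timeout declares it lost.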

I like the idea of doing just one, near Berkeley.

What I'm not sure about: the change that Eric made to shorten the "pending connection" queue suggests that the number of simultaneous connections is a big issue, this just moves that issue from the upload server to the server near the edge.

... but, a better idea (related to the thread which I haven't worked my way through) might be to zip all of the pending uploads into one file. All the client really needs to know is what is in the zip -- then let that go to Bruno.

The downside is that you have to push all of the work through in one session, and the bigger the .zip file, the more bytes/packets you have to push through in a row...