But doing the compression is a load on the servers. It doesn't matter if it is done on the fly or pre-compressed, it's a load, and most of S@H's servers are running hard enough already.
Bob Smith
Member of Seti PIPPS (Pluto is a Planet Protest Society)
Somewhere in the (un)known Universe?

But doing the compression is a load on the servers. It doesn't matter if it is done on the fly or pre-compressed, it's a load, and most of S@H's servers are running hard enough already.

That's why you'd only compress the scheduler request and not the raw data.

The scheduler requests are typically < 10% of the size of the raw data. Not going to help much.

The fast hosts with big caches send very big request files, and they send them very often, too.

Which brings us round robin to my original post....
Asking if it is necessary to send a host's entire cache list on every request, or if it would be practical to just send the difference in the cache since the previous request, sending the entire list for housekeeping every 10 requests or so.
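The difference-only scheme could look something like this (a rough sketch in Python; the function names and the every-10th full sync are illustrative, not actual BOINC code):

```python
# Hypothetical sketch: report only the change in a host's cached task list,
# with a full list every N requests for housekeeping.

FULL_SYNC_EVERY = 10  # send the complete list every 10th request

def build_request(cached_tasks, last_reported, request_count):
    """Return (kind, payload) for a scheduler request.

    cached_tasks / last_reported are sets of task names.
    """
    if request_count % FULL_SYNC_EVERY == 0:
        return "full", sorted(cached_tasks)
    added = sorted(cached_tasks - last_reported)
    removed = sorted(last_reported - cached_tasks)
    return "delta", {"added": added, "removed": removed}

def apply_request(server_view, kind, payload):
    """Server side: reconstruct the host's cache from the report."""
    if kind == "full":
        return set(payload)
    return (server_view | set(payload["added"])) - set(payload["removed"])
```

A delta request only carries the handful of tasks fetched or returned since last time, instead of the whole cache.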

A lot of good ideas have been floated in this thread. Hopefully at some point it can be perused by those in the project able to determine if any of it can be implemented in a manner beneficial to comms and server load.
A day without cats is like a day without sunshine.
I speak meow, do you?

And I stand by my opinion of compressing the data files before they are sent!
CPUs are getting faster and faster, RAM cheaper and cheaper. Just look at the latest three servers that the S@H community has sponsored.

Every server has 96 GB of RAM!

That's a lot.

I just tried compressing sched_request: the 510 KB file shrank to 17 KB.
The load on the S@H CPUs would increase, of course, but it could be implemented very simply: the upload and download servers do their work as usual, but a script watches the up/download queues, and every minute an `ls *.zip` pass pipes the zip files to a decompressor and deletes each archive afterwards.
The routine doesn't need to run on the up/download servers themselves; another server could poll the directories internally and handle the decompression, freeing that duty from the up/download servers.
Of course the batch script needs some "how much space is left" check, but that could be implemented in the BOINC infrastructure without much trouble.
If we could implement Mark's and my ideas, the bandwidth would drop tremendously, keeping more bandwidth available for true workunit transfers, as it should be!

(edit) You would also need a routine that checks which version of BOINC a host is running, so that this would be plausible: the scripts can watch for that version number and support it, and otherwise leave files uncompressed as usual. (/edit)
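The watcher described above might be sketched like this (hypothetical directory path; a real version would run from cron or a one-minute sleep loop, and would check free disk space first):

```python
# Rough sketch of the poll-and-decompress watcher described above.
# The directory path is made up for illustration.
import os
import zipfile

UPLOAD_DIR = "/tmp/boinc_upload"  # hypothetical queue directory

def sweep(directory):
    """Decompress every *.zip in `directory` in place, then delete the archive."""
    handled = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".zip"):
            continue
        path = os.path.join(directory, name)
        with zipfile.ZipFile(path) as zf:
            # a real server would check "how much space is left" before this
            zf.extractall(directory)
        os.remove(path)
        handled.append(name)
    return handled

# Run sweep(UPLOAD_DIR) once a minute, e.g. from cron or a sleep loop.
```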

And I stand by my opinion of compressing the data files before they are sent!
...
I just tried compressing sched_request: the 510 KB file shrank to 17 KB.

The sched_request is not a data file. And indeed, with the gigantic queues of some hosts, it would be a good idea to compress the scheduler files, but that would require a new client. Perhaps BOINC 7?

Gruß,
Gundolf
[edit]Which, by the way, was the topic of the very first post.[/edit]

Well, almost.
My original idea did not involve compression of data, but rather allowing the host computers to not send the entire cached task list back to the Seti servers on every work request, but perhaps only every 10th one or so. That would involve no extra overhead to the host or the Seti servers for compression/decompression/zip/unzip.
'Just' a change in the Boinc and server logic to handle it.
And LOL... as we have witnessed, any change in BOINC can have its unforeseen pitfalls.
A day without cats is like a day without sunshine.
I speak meow, do you?

I just wanted to correct myself about the statement of the approach.
Actually, the BOINC client that sends units to the servers could encode its ID in the filename, so the servers know which BOINC host is sending compressed files.
That way the servers automatically know which client is using compression just from listing the filename, for example:

_Idname_filename.zip
_3050453_xxxxxxx.zip

That way it's easy to arrange that BOINC at the other end responds to compressed files as well, so the servers can tag computer ID 3050453 to have compressed files sent and received.
And if anyone is worried about security and manipulation, there is no issue: plain uncompressed files are not secure either, so there is no problem with that approach.
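The naming convention above could be parsed server-side with something as small as this (purely illustrative; real BOINC uploads are not named this way):

```python
# Sketch of the suggested convention: _<hostid>_<filename>.zip, so the server
# can tell from the name alone which host is using compression.
import re

PATTERN = re.compile(r"^_(\d+)_(.+)\.zip$")

def parse_upload_name(name):
    """Return (host_id, original_filename), or None if not a tagged upload."""
    m = PATTERN.match(name)
    if not m:
        return None
    return int(m.group(1)), m.group(2)
```

Anything that doesn't match the pattern would be treated as a plain uncompressed upload.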


//Vyper

But the client can't send a compressed request file unless it is known that the server can accept compressed request files. BOINC != SETI.
BOINC WIKI

But the client can't send a compressed request file unless it is known that the server can accept compressed request files. BOINC != SETI.

Of course!

This needs to be implemented as follows.

1. Those who design BOINC need to get this information and apply it to their application.
2. They need to test that the solution works and figure out a future-proof design for the implementation.
3. When they're done, they can release a new "alpha" server version to be installed, followed by an "alpha" BOINC client.
4. SETI would update their BOINC SETI installation on the beta site and ask people to try the latest "alpha" BOINC client.
5. Tests would be performed for perhaps two weeks to see that it works as intended; otherwise problems would be reported to the BOINC developers.
6. Suppose there were errors and a new version has been released. SETI updates the beta site and people are encouraged to try the new alpha BOINC client again.
7. Things work as intended for a week without any hiccup. SETI starts to design an implementation protocol and schedule to roll this out on SETI main.
8. A week passes with the design work, and it is about to be implemented at the next Tuesday outage. People are informed about the upgrade and encouraged to download a new BOINC client; two days before that Tuesday outage, the alpha client, which hopefully hasn't shown any side effects, is promoted to "latest stable".
9. SETI monitors its site and perhaps notices an "oops": the server process that monitors and handles the in/out flow of status files is being capped by high load and starts to fall behind.
10. They implement four virtual servers (or instances, or scripts on four differently loaded machines), two per drawer, each filtering for different files, so the handling is divided among four processes guarding those in/out directories; they find it would now cope even with a week-long outage, which by Murphy's Law could and will occur in the future.
11. Things are smooth, but the servers need tuning so they don't drop connections when things aren't as sleek as they should be.
12. They are on a roll; within a week all major flaws have been ironed out, the patch has been implemented, and they never look back at uncompressed data files; bandwidth/transactions/requests/resends are no longer as capped as they were two months earlier.

There you have it, a small mini-project of an implementation basis and schedule for the BOINC infrastructure!
It wasn't that hard, was it? ;-)

Kind regards, Vyper

P.S. All this is pure speculation from my PoV. D.S.
_________________________________________________________________________
Addicted to SETI crunching!
Founder of GPU Users Group

Well, if they did the data format change as one part and the compression of request data as a separate part, they could stagger it.

The changing of the data format would require no changes to BOINC, but more than likely an app change and various back-end bits, which Josef would be better to comment on than me. It would be a substantial undertaking.

My understanding on the compression of request data is that Apache supports gzip, and I believe BOINC supports gzip. However, it doesn't currently compress the request or response, so that would require a client change. There are other people with better knowledge of it than myself.
BOINC blog
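For what it's worth, gzip-compressing a request body is cheap on the client side. A sketch of what the client change might look like (the URL is a placeholder, and a stock BOINC/Apache setup would still need changes to accept `Content-Encoding: gzip` on requests):

```python
# Sketch: gzip a scheduler request body before POSTing it.
# Assumption: the server has been modified to accept gzip-encoded requests.
import gzip
import urllib.request

def build_compressed_post(url, body_bytes):
    """Return (request, raw_size, compressed_size) for a gzip-encoded POST."""
    compressed = gzip.compress(body_bytes)
    req = urllib.request.Request(url, data=compressed, method="POST")
    req.add_header("Content-Encoding", "gzip")
    req.add_header("Content-Type", "text/xml")
    return req, len(body_bytes), len(compressed)
```

Highly repetitive XML like a sched_request compresses extremely well, which matches the 510 KB to 17 KB figure quoted earlier in the thread.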

My original idea did not involve compression of data, but rather allowing the host computers to not send the entire cached task list back to the Seti servers on every work request, but perhaps only every 10th one or so. That would involve no extra overhead to the host or the Seti servers for compression/decompression/zip/unzip.
'Just' a change in the Boinc and server logic to handle it.
And LOL... as we have witnessed, any change in BOINC can have its unforeseen pitfalls.

Perhaps an easier implementation would be an option "Only do communication every n tasks". Instead of "communication" it might be "work request" or something along those lines.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today!

Perhaps an easier implementation would be an option "Only do communication every n tasks". Instead of "communication" it might be "work request" or something along those lines.

This is more-or-less what is being planned for the next version of BOINC (being tested as v6.13.xx, planned for release as v7): rather than continually pestering the servers for one or two tasks to top up a static cache, the idea is to let the cache level run down to a chosen minimum level, and then request work to top it back up to a chosen maximum.
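The min/max (hysteresis) behaviour can be stated in a few lines (an illustrative Python sketch, not the actual BOINC work-fetch logic):

```python
# Hysteresis work fetch sketch: stay quiet until the buffer drops below the
# chosen minimum, then request enough work to refill to the chosen maximum.
# All names and units (days of estimated work) are illustrative.

def work_request(buffered_days, min_days, max_days):
    """Return how many days of work to request (0 = no request)."""
    if buffered_days >= min_days:
        return 0.0
    return max_days - buffered_days
```

The effect is a few large, infrequent requests instead of a constant trickle of one-or-two-task top-ups.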


Hysteresis.
Something I suggested about 4 years ago. So glad they are on the ball bringing in new ideas.

This is more-or-less what is being planned for the next version of BOINC (being tested as v6.13.xx, planned for release as v7): rather than continually pestering the servers for one or two tasks to top up a static cache, the idea is to let the cache level run down to a chosen minimum level, and then request work to top it back up to a chosen maximum.

In the situation we are in now, with limits on the number of tasks, I think this would be especially useful, where the heftier machines are returning and requesting tasks one at a time.
SETI@home classic workunits: 93,865 CPU time: 863,447 hours Join the BP6/VP6 User Group today!

One of the biggest problems, IMHO, is the too-big caches.
Together with still too many hosts returning too many errors!

When new work has been split and is distributed after a period of little or no work at all, the demand of filling tens of thousands of hosts with 10-day caches is what we're seeing right now: network CONGESTION!

I'd rather see a 1 or 2 day cache, with more hosts getting a chance to get work, and when a host produces errors, the number is probably also smaller.

Nobody benefits from this; more likely the downside: a complete stall of network activity, or dazzling speeds of 200 baud...

Another result: AstroPulse work has very little chance of coming through! And the chance of download errors, leading to the ghost problem!

Doubling the size of an MB WU isn't possible without a major change in the result storage, and is therefore not an option.
That leaves some way of compression as the only workable option, though it also puts more load on the server/scheduler and the hosts connected.

Maybe sending USB sticks with 4 GB worth of work to "big crunchers" in the U.S.A. is becoming an option, but only if it can be fully automated... just a thought, though.


Hysteresis.
Something I suggested about 4 years ago. So glad they are on the ball bringing in new ideas.

[edit] It wasn't 4 years ago; I discussed it with JM7 in March 2008. [/edit]

And early in BOINC's development it had hysteresis in the form of high water and low water marks. Quoting from David A's checkin_note of July 14, 2002:

- When the client's estimated work falls below low water, it ranks
projects according to their "resource deficit", then attempts RPCs to
project in that order until the estimated work is above high water.
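That low-water/high-water behaviour amounts to something like this sketch (names and numbers are illustrative, not the 2002 code):

```python
# Sketch of the quoted 2002 behaviour: when estimated work falls below the
# low-water mark, contact projects in order of "resource deficit" until the
# buffer is back above the high-water mark.

def refill(estimated_work, low, high, projects):
    """projects: list of (name, deficit, work_granted_per_rpc) tuples.
    Returns the names contacted, in order."""
    contacted = []
    if estimated_work >= low:
        return contacted  # above low water: no RPCs at all
    for name, _deficit, grant in sorted(projects, key=lambda p: -p[1]):
        contacted.append(name)        # one RPC to this project
        estimated_work += grant
        if estimated_work > high:
            break
    return contacted
```

The gap between the two water marks is what provides the hysteresis: small fluctuations in the buffer no longer trigger a request.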

More recently, the DCF sawtooth has been providing hysteresis here by the excursions in runtime estimates it causes.