And the files are going to have different names XXXXX_0 and XXXXX_1. Will squid know that these are the same file?

At least I believe that this is the case.

I looked at a work request, and it appears that the file name does not have the suffix. In other words, the scheduler says "go get XXXXX and return the result named XXXXX_0" -- but I haven't read the documentation to make sure.

The squid cache machine would need a SETI@Home IP so that it used the HE bandwidth; boinc2.ssl.berkeley.edu would point at the squid address, and the squid would be told how to find the "true" download server.

It looks like it is possible. The best part of the idea is that once configured, it would require a minimum amount of attention (as simple as making sure the machine is up). It would require a little bit of rack space in a high-bandwidth spot, and some help from Campus to make it all play.

Not convinced, yet, that this squid box idea will work -- isn't there still going to be a lot of communication between the squid box and the main servers?
How long do the WUs live on the squid box?
What happens with re-issues?
Who or what moves the WUs up and down the hill? Surely not a student, as this is a 24/7/365 job; it's also not one for the present staff, who are too overworked, and there are not enough of them as it is.
Who re-stocks the squid at 3 in the morning when it is realised that the whole previously issued batch of tasks was VHAR and is therefore only a third of what is really required for the next period?

The more I think about this, the more I am convinced this is not a good idea; the manpower and logistics are just not in place, nor can they be afforded, to make it work.

I suggest that, if you do pursue this, you think the idea through completely and then present it.

Re-issues are only a couple percent of the total number of results downloaded.

The problem is the file names. If they are different, Squid is unlikely to realize that they have the same contents.

Not convinced, yet, that this squid box idea will work -- isn't there still going to be a lot of communication between the squid box and the main servers?
How long do the WUs live on the squid box?
What happens with re-issues?
Who or what moves the WUs up and down the hill?

I'm not convinced either, but these questions are not the reason.

If you look in the BOINC data directory, you'll see the last scheduler response. Scanning through it:

Your BOINC client resolves boinc2.ssl.berkeley.edu and connects to it to transfer data.

The resolved IP is the Squid box.

Squid searches the cache and doesn't find /sah/download_fanout/3e4/21no08aa.25514.8252.11.8.39, so it uses rules in the configuration file to change the host name (maybe it's "secret-boinc2.ssl.berkeley.edu"), connects to the REAL download server, and transfers the file.

It stores a copy (it doesn't know if anyone will ask for it again) and sends a copy to the original requester.
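That flow is just Squid's reverse-proxy ("accelerator") mode. A minimal squid.conf sketch of it -- the "secret" host name and the sizes are placeholders, not the project's real configuration:

```
# Reverse-proxy sketch; host names and sizes are hypothetical.
# Clients resolve boinc2.ssl.berkeley.edu to this box.
http_port 80 accel defaultsite=boinc2.ssl.berkeley.edu

# On a cache miss, fetch from the "real" download server.
cache_peer secret-boinc2.ssl.berkeley.edu parent 80 0 no-query originserver name=realdl

# Disk cache sized for roughly one hour of WU files (~45 GB).
cache_dir ufs /var/spool/squid 46080 16 256
```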

Answers to questions:

How long do they stay: it depends on the hard drive, but it's not important as long as most paired downloads occur before the cache expires. Something in my head says an hour would be plenty. Squid can always get the WU again.

(A second request within an hour would skip talking to the "real" download server)

What happens with reissues: Squid checks the cache, and if the file has expired, gets the work from the download server again.

Who or what moves the work: Squid does, automatically. It also automatically manages the cache.

The idea is that the cache would give 2:1 leverage -- files currently transit the 100Mbit line twice, and this way they'd transit it once.

In reality, it'll be something between 2:1 and a slight bottleneck caused by extra overhead.
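As a sanity check on that figure, here's a toy model (not project code): each WU file is requested exactly twice, with the second request arriving within an assumed ~30 seconds of the first, and a TTL cache turns the second request into a hit whenever it arrives before expiry.

```python
import random

def simulate(ttl_s, n_wus=1000, gap_max_s=30, seed=1):
    """Toy model: each WU file is downloaded exactly twice (initial
    replication of 2); the second request arrives a random 0-30 s
    after the first.  A TTL cache turns the second request into a
    hit whenever it arrives before the entry expires."""
    random.seed(seed)
    origin_fetches = 0
    for _ in range(n_wus):
        origin_fetches += 1          # first request always misses
        gap = random.uniform(0, gap_max_s)
        if gap >= ttl_s:
            origin_fetches += 1      # cache already expired: miss again
    return origin_fetches

assert simulate(ttl_s=3600) == 1000  # every pair hits: full 2:1 leverage
assert simulate(ttl_s=0) == 2000     # no cache: every request hits the origin
```

With any TTL comfortably longer than the gap between the paired sends, the model gives the full 2:1 reduction; shorter TTLs land somewhere in between.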

100% agreed. But would this box even need to be connected to the SSL servers? Why not a dedicated box that only distributes pre-split WUs to clients on request. This could be housed somewhere down the hill from SSL and would simply require someone from the Lab (one of the students) to come by and hot-swap in a fresh drive full of pre-split WUs every few days.
That's all this box would do, just a distribution mirror - clients still return all processed data to the SSL servers, as usual.
I emphasise that this box does not take over any function of the current servers - it just acts as an additional distribution service, which should take some pressure off the SSL servers. If it works, then maybe several could be set up on campus?

This is exactly what I suggested in my post in this thread. I think the existing 100 mbit pipe can handle the DB traffic, so maybe the splitter could also be moved outside the lab, if needed.

The DB is getting hammered. There are already problems with DB access speeds (this was the last bottleneck that was worked on).BOINC WIKI

Moving parts of the setup is not an option according to Matt's post. It's an all-or-nothing kind of deal.

More importantly, you'd get a 10:1 improvement over the present setup, in that the squid box would be facing the 1000Mbit/s link.

A hypothetical cache would be limited by the 100Mbit link for fetching files from the download server.

It could at most get a 2:1 improvement because one workunit file is sent to no more than two people.

To get a 10:1 ratio, you'd need to send that work to ten people.

That's where the idea starts hitting limits: there is too much randomness.

Not to mention that if the outgoing connection to everyone else is 1Gbit but the incoming connection to the splitter/db is 100Mbit, you're sending faster than you're receiving, so there will still be a slight bottleneck -- but it should still be an improvement.
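The arithmetic behind that, using the figures quoted in this thread:

```python
# Link arithmetic with the figures from the thread.
uplink_mbit = 100     # cache box -> lab download server (fetch side)
downlink_mbit = 1000  # cache box -> clients (HE side)
leverage = 2          # each WU file goes to at most two hosts

# Best case: every byte fetched once over the 100 Mbit line goes out
# twice, so the client-facing rate the cache can sustain is:
max_client_rate = uplink_mbit * leverage
assert max_client_rate == 200            # Mbit/s
assert max_client_rate < downlink_mbit   # the 1 Gbit side is never the limit
```

So the 1 Gbit client side sits mostly idle; the 100 Mbit fetch side is what caps the cache, which is why 10:1 isn't reachable without sending each file to more hosts.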

The place where it would work really well is whenever a new science app. needs to be downloaded -- assuming it comes from the DL server of course.

I don't know, but if the high bandwidth was caused by downloading the AP 5.05 application, then this particular "burst" could have been the app being downloaded as needed.

Matt doesn't have time to post all possible details of how BOINC works; there is documentation and source code elsewhere for those who want full detail. Certainly most parts of the BOINC backend need to interact with the database and therefore need to be kept local. Gigabit ethernet is needed for those transactions; 10Gbit would be even better. The splitters definitely need database access.

The download server is an exception to the rule: it does not interact with the database. In fact, there is no "download server" code in BOINC; the splitters simply store created work in a directory structure. The URL of the WU is passed to hosts, and after they report the work done and validation is complete, the file deleter removes the WU. That allows the download server to be located anywhere, and I believe some projects operate that way. But of course that requires a remote file server to have all the WUs it is serving -- around 6 terabytes for this project now. The proposed squid proxy would need about 45 gigabytes for 1 hour of cache.

As the project is operating now, the 2 initial replication tasks are sent within a few seconds of each other. The proxy would certainly provide the 2:1 reduction for those cases. If one of those hosts immediately trashes the task and reports the error, a reissue is generated and put at the end of the "Results ready to send" queue. If that queue is short, the reissue can be sent before the proxy cache expires, and typically the queue is shortest when the project is in difficulty. But it's a small advantage; probably squid would perform better with a smaller cache. Could a 3 minute cache (~2.25 GB), which would easily fit in memory, be a better approach?
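The sizing figures above check out against each other; a quick back-of-envelope calculation using only the numbers quoted in this thread:

```python
gb_per_hour = 45                       # 1-hour cache size quoted above

# A 3-minute cache at the same fill rate:
three_min_gb = gb_per_hour * 3 / 60
assert three_min_gb == 2.25            # matches the ~2.25 GB figure

# Sanity check: 45 GB/hour is exactly a saturated 100 Mbit/s link.
mbit_per_s = gb_per_hour * 1e9 * 8 / 3600 / 1e6
assert round(mbit_per_s) == 100
```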

Joe

Just gaming this around in my head, I doubt there would be a big difference in "gain" between a few minutes of cache and a few hours of cache, since most of the "hits" would be two downloads in short succession, and then the file would sit until expired.

If I were doing the work, I'd set up to measure the effects, and then experiment until I found the optimal cache lifetime.
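One way to do that measuring would be to watch Squid's own hit ratio. A sketch, assuming the default native access-log format (field 4 carries e.g. TCP_HIT/200); the sample lines below are invented stand-ins for a real log:

```python
# Sketch: extract the hit ratio from Squid's access log (default native
# format: field 4 is e.g. "TCP_HIT/200").  The sample lines are invented
# stand-ins for /var/log/squid/access.log.
sample = """\
1.0 12 10.0.0.1 TCP_MISS/200 375000 GET http://boinc2.ssl.berkeley.edu/sah/a - DIRECT/x -
2.0 3 10.0.0.2 TCP_HIT/200 375000 GET http://boinc2.ssl.berkeley.edu/sah/a - NONE/- -
3.0 15 10.0.0.3 TCP_MISS/200 375000 GET http://boinc2.ssl.berkeley.edu/sah/b - DIRECT/x -
"""
codes = [line.split()[3].split("/")[0] for line in sample.splitlines()]
hit_ratio = codes.count("TCP_HIT") / len(codes)
assert round(hit_ratio, 2) == 0.33
```

Run that over a day's log for each candidate cache lifetime and the optimum should fall out of the comparison.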

Input filenames are unrelated to workunit/result filenames. And the file isn't duplicated for each result (that would make no sense at all). Only output files have the _0, _1 suffix.

A possibility that might help would be to generate a second task, or not, based on the "reputation" of the first computer to get the task. A computer that keeps returning error-free results that validate would keep having its reputation increased. You ought to be able to get around 40% of the bandwidth back.
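For scale, that ~40% figure is consistent with something like 80% of tasks going to trusted hosts with single initial replication; the 80% share here is an assumption picked to reproduce the number, not anything from BOINC's actual scheduler:

```python
# The 80% "trusted" share is an assumption chosen to reproduce the
# thread's ~40% figure; BOINC's real scheduler logic is not modeled.
trusted_share = 0.80
copies_now = 2                                   # initial replication today
copies_with_rep = trusted_share * 1 + (1 - trusted_share) * 2
savings = 1 - copies_with_rep / copies_now
assert round(savings, 2) == 0.4                  # about 40% of bandwidth back
```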