I have a hard time imagining this has not been considered before, and I know the SETI@home team has been pretty busy recently, so this is a "maybe someday" request which might already be in the planning stages.

Perhaps allow home users to do the splitting? I know this would require several different layers of implementation, but it seems as if the science the team can actually perform is limited by the sheer amount of bean-counting and non-analysis work required to keep new data flowing in.

I'm envisioning the following implementation:

1) The same raw data is sent to several different users.

2) That data is then split into work units independently by each user.

3) The resulting hashes are crosschecked against each other.

4) When enough identical readings are received, each user with a "good" dataset is assigned to analyze a certain % of the resulting data that they already have on their machines.
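The cross-checking in steps 3 and 4 could be sketched roughly as follows. This is only a hypothetical illustration, not anything the SETI@home servers actually do: `split_and_hash` stands in for whatever the real splitter does, and the quorum number is an assumption.

```python
import hashlib
from collections import Counter

# Hypothetical sketch: each client splits the same raw tape into work
# units locally and reports one hash per unit. The server then only
# needs to compare short hash lists, never the bulk data itself.

def split_and_hash(raw_data: bytes, unit_size: int) -> list[str]:
    """Split raw data into fixed-size work units and hash each one."""
    return [
        hashlib.sha256(raw_data[i:i + unit_size]).hexdigest()
        for i in range(0, len(raw_data), unit_size)
    ]

def crosscheck(reports: dict[str, list[str]], quorum: int) -> list[str]:
    """Return the IDs of clients whose hash lists match at least
    `quorum` reports (counting their own), i.e. clients holding a
    "good" dataset that can be assigned analysis work."""
    tally = Counter(tuple(hashes) for hashes in reports.values())
    return [
        client for client, hashes in reports.items()
        if tally[tuple(hashes)] >= quorum
    ]
```

For example, if clients "a" and "b" split identical data while "c" got a corrupted copy, `crosscheck(reports, quorum=2)` would return only "a" and "b", and the analysis work would be divided between them.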

This implementation would move splitting off the server side and remove the need to send independent work units to at least some users, freeing those resources for actual science.

How would one get the large initial data sets to users? I'm thinking one might be able to find a respectable p2p provider out there willing to work with SETI@Home and the BOINC project. That way the data would only need to be sent once, and the p2p network would propagate it to other SETI@Home users. I know that several MMOs use dedicated channels in p2p networks for their client downloads, so this would be nothing new for them.

The same p2p network could be used for crosschecking the workunits generated by splitting.

No can do unless you can include some completely robust checking for cheats.

... And the robust checking would require wasted duplicated effort in doing the splitting...

<snip some interesting stuff>

Keep searchin',
Martin

Well, the robust data checking would simply be to give the same data to multiple users with comparable work completion rates, compare the results, and not allow clients to work on or report results for work units that are not "registered" as coming from one of the set of clients that generated identical data. Since I don't know how many work units a typical "split" data set generates, I can't say what an efficient number of clients per data set would be. I strongly suspect that each work unit is distributed several times in any case under the current system. The number of clients required for sufficiently robust cross-checking of client-side splitting might be perfectly in line with the number of clients that currently rework data to provide the existing level of work unit verification.
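The "registration" gate described above could look something like this minimal sketch. The `Registry` class and its methods are entirely hypothetical, invented here to show the idea: the server records which clients produced a verified identical split, and later refuses results for a work unit from anyone outside that set.

```python
# Hypothetical sketch of the registration gate: results are accepted
# only from clients that were part of the verified set which generated
# the work unit in the first place.

class Registry:
    def __init__(self) -> None:
        # Maps a work unit hash to the set of client IDs allowed
        # to report results on that unit.
        self.registered: dict[str, set[str]] = {}

    def register(self, unit_hash: str, verified_clients: set[str]) -> None:
        """Record the clients whose splits of this unit agreed."""
        self.registered.setdefault(unit_hash, set()).update(verified_clients)

    def accept_result(self, unit_hash: str, client: str) -> bool:
        """Accept a result only if this client is registered for the unit."""
        return client in self.registered.get(unit_hash, set())
```

A client that never produced a matching split simply has no registered units, so any results it tries to report are rejected outright.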