Wednesday, June 10, 2009

Distributed Computing: More than CPU Cycles

I wrote a previous post about wanting a system to automatically seed Ubuntu disk images, but I have been thinking about this concept at a more general level.

Distributed computing involves people from across the Internet donating their CPU time to a common cause. Why does it have to just be the use of your CPU? In my previous post, I began to introduce this concept involving bandwidth. The other resource that I thought about sharing is disk space.

As an algorithmist, I love the idea of having a constant time algorithm, sometimes called an oracle. One way to simulate an oracle is to cache every possible answer. Well, most algorithms, including my favorite numerical algorithms, have infinitely many answers, so we should just try to cache everything we know.

This is not a foreign concept. GIMPS is a distributed computing project that searches for prime numbers even though we know that there are infinitely many primes. (As a side note, GIMPS found another prime in 2009.) Similarly, there are infinitely many positive integers, and each integer greater than 1 has a unique prime factorization. I think that it would be interesting to create an online service that would return the prime factorization of a positive integer.
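To make the caching idea concrete, here is a minimal sketch in Python of a factorization function that remembers every answer it has already computed. The name factorize and the use of plain trial division are my own assumptions for illustration; a real service would need a much faster factoring method and persistent storage rather than an in-memory cache.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # in-memory cache standing in for the "oracle" store
def factorize(n: int) -> tuple:
    """Return the prime factors of n in nondecreasing order.

    Uses simple trial division, which is only practical for small n.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = []
    d = 2
    while d * d <= n:
        # Divide out each prime factor as many times as it appears.
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        # Whatever remains is a prime larger than sqrt of the original n.
        factors.append(n)
    return tuple(factors)
```

The first call for a given number does the work; repeated calls for the same number are answered from the cache, which is the behavior the oracle is meant to approximate.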

This factorization oracle could greatly benefit from a distributed computing project in which people donate their disk space, because it would require a very large amount of storage. When a user queries the service for the prime factorization of a number, they would be redirected to the computer of the "disk space" donor that holds the answer.
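One way the redirect could work, sketched in Python under my own assumptions: split the positive integers into fixed-size blocks and assign the blocks to donors in round-robin order. The function name donor_for, the block size, and the host names are all hypothetical. The sketch also ignores donors joining or leaving, which a real system would have to handle.

```python
from typing import Sequence


def donor_for(n: int, donors: Sequence[str], block_size: int = 1_000_000) -> str:
    """Pick the donor responsible for storing the factorization of n.

    Integers 1..block_size form block 0, the next block_size integers
    form block 1, and so on. Blocks are dealt out to donors round-robin.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    if not donors:
        raise ValueError("at least one donor is required")
    block = (n - 1) // block_size
    return donors[block % len(donors)]


# Hypothetical donor host names, for illustration only.
donors = ["donor-a.example", "donor-b.example", "donor-c.example"]
target = donor_for(2_500_000, donors)  # the service would redirect the query here
```

Keeping neighboring integers on the same donor means a range of queries mostly lands on one machine, at the cost of uneven load if some ranges are more popular than others.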

Whether or not you like my online oracle idea, I think there is some benefit in expanding the distributed computing concept to resources other than the CPU. The mirroring of file servers and the BitTorrent protocol are both ways to distribute bandwidth. The distribution of bandwidth by either of these methods is still more difficult than the distribution of computation, so there is still room for improvement. Finally, I do not know of any current solution that could be considered a type of distributed disk space.

One reason things might be the way they are now is cost. Computation probably costs more than bandwidth, which in turn costs more than disk space.

Are there other services that a computer could donate in a distributed computing-like fashion?