Today's snazzy systems paper is High
Availability, Scalable Storage, Dynamic Peer Networks: Pick Two,
another 6-pager from HotOS IX. Surely you
are familiar with all those peer-to-peer systems papers motivated by
greed-tinged descriptions of gigabytes of unused disk space just
waiting to be used on people's personal computers? Charles Blake was
skeptical of a world where unused storage falls like manna from the
sky, so he created a few models and looked at some real-world data.
It turns out that with realistic node membership times, member node
bandwidths, and number of data copies necessary to provide reasonable
availability, the amount of data that can be stored in a peer-to-peer
network is severely limited by cross-network bandwidth, rather
than available disk space. Blake puts it all so much more politely in
his paper, with sentences like "We conclude that when redundancy, data
scale, and dynamics are all high, the requisite cross-system bandwidth
is beyond reasonable expectations."

This makes sense intuitively: How did most Gnutella users behave?
They connected to the network long enough to download the songs they
wanted, over their 56Kbps modem or 256Kbps cable modem, then left the
network as soon as they were done. Most files were actually served by
a relatively small number of reliable, high-bandwidth hosts; over a
3-day trace, the most reliable 5% of Gnutella peers provided 40% of
the service. More pointedly, the authors estimate that the best 5% of
Gnutella peers could be replaced by 10 university-run servers on cheap
PC hardware - at which point the question becomes, "Why use many
unreliable peers when a few reliable servers can serve as much data?"
The answer is "When anonymity, security, or freedom from central
control is important" - but never for performance or storage capacity
reasons. In my opinion, this paper is a must-read for anyone working
on peer-to-peer systems - but don't blame me if you switch fields
shortly thereafter...