OpenCola offers two products that use peer-to-peer
techniques to make content management and retrieval easier,
both of them open source. I talked to chief evangelist and
founder Cory Doctorow by phone and email to obtain some of
the details of their architecture.

Swarmcast distributes content across multiple
systems and provides sophisticated retrieval. Folders allows
people to search for the content they want.
Interestingly, Folders preceded Swarmcast. In other words,
the developers started by helping people find content on
other systems. They had trouble scaling that service
and realized they needed another service to distribute
and retrieve the content efficiently.

Swarmcast is meant for rapidly propagating large (over 1MB)
files that are very popular, such as recently released
movies. People can request a file from a Web site in the
usual manner, but the request is
redirected to Swarmcast. The site's developer is thus freed from
providing the enormous bandwidth and complicated storage
networks traditionally required to serve content to a lot of
people. Ultimately, the files spread out in an unplanned but
very efficient way through a far-flung network of users in
classic peer-to-peer fashion.

Swarmcast stores a file by breaking it into multiple chunks
and storing them on a variety of systems. So when users
click to request the file, each one ends up getting
different chunks from different servers.
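The chunking step can be pictured with a short Python sketch. It is illustrative only: the article doesn't give Swarmcast's chunk size or placement policy, so the tiny `CHUNK_SIZE`, the round-robin assignment, and the helper names `split_into_chunks` and `place_chunks` are all assumptions.

```python
"""Split a file into chunks and spread them over several peers.

Hypothetical sketch: CHUNK_SIZE and round-robin placement are assumptions,
not Swarmcast's documented behavior.
"""

CHUNK_SIZE = 4  # bytes; deliberately tiny so the output is easy to inspect


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut data into consecutive chunks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_chunks(chunks: list[bytes], peers: list[str]) -> dict[str, dict[int, bytes]]:
    """Assign chunk i to peers[i % len(peers)] (round-robin placement)."""
    placement: dict[str, dict[int, bytes]] = {peer: {} for peer in peers}
    for index, chunk in enumerate(chunks):
        placement[peers[index % len(peers)]][index] = chunk
    return placement


if __name__ == "__main__":
    chunks = split_into_chunks(b"hello swarmcast!")  # 16 bytes -> 4 chunks
    layout = place_chunks(chunks, ["peer-a", "peer-b", "peer-c"])
    for peer, held in layout.items():
        print(peer, sorted(held))
    # peer-a [0, 3]
    # peer-b [1]
    # peer-c [2]
```

No single peer holds the whole file, so a requester has to collect pieces from several of them.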

If the requester has plenty of bandwidth, his or her system can
receive multiple chunks in parallel and thus retrieve the file
much more quickly than a traditional download from a single
source. Swarmcast monitors throughput on the requester's
system and checks the availability of peers, so it can
draw on as many sources as possible. As
an example of the success of the method, Doctorow said that
students in dorms at Carnegie Mellon University were getting
500 kilobytes/second throughput, or better, during tests of
Swarmcast.
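Here is a rough sketch of pulling chunks from several sources at once, using the standard library's `ThreadPoolExecutor`. The in-memory "peers," the `fetch_chunk` stand-in for a network request, and the least-loaded assignment rule in `plan_downloads` are all hypothetical. Throughput monitoring is not modeled; `max_workers` simply caps how many transfers run at the same time.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(peer_store: dict[int, bytes], index: int) -> bytes:
    """Hypothetical stand-in for requesting one chunk from one peer."""
    return peer_store[index]


def plan_downloads(sources: dict[str, dict[int, bytes]], total_chunks: int) -> dict[int, str]:
    """Assign each chunk to the holder with the fewest chunks assigned so far."""
    load = {name: 0 for name in sources}
    plan: dict[int, str] = {}
    for index in range(total_chunks):
        holders = [name for name, store in sources.items() if index in store]
        if not holders:
            raise LookupError(f"no source holds chunk {index}")
        chosen = min(holders, key=lambda name: load[name])  # spread the load
        load[chosen] += 1
        plan[index] = chosen
    return plan


def parallel_download(sources: dict[str, dict[int, bytes]], total_chunks: int,
                      max_workers: int = 4) -> bytes:
    """Fetch all chunks concurrently and reassemble them in order."""
    plan = plan_downloads(sources, total_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_chunk, sources[plan[i]], i)
                   for i in range(total_chunks)]
        return b"".join(f.result() for f in futures)
```

A real client would presumably raise or lower the number of parallel sources as it measured throughput, rather than fixing it in advance as this sketch does.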

As people using Swarmcast get chunks of a file, the system
automatically makes these chunks available to other
requesters; this is where Swarmcast exploits the duplication
and redundancy that peer-to-peer systems are famous for.
However, once somebody has the complete file, his or her
system stops serving up content. Swarmcast does not expect
systems to be available for serving up data all the time.
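The serve-while-downloading behavior amounts to a small rule on each peer. The hypothetical `SwarmPeer` class below (not OpenCola code) shares whatever chunks it already holds and goes quiet once its own copy is complete, as described above.

```python
from typing import Optional


class SwarmPeer:
    """Hypothetical peer: serves chunks only while its own download is incomplete."""

    def __init__(self, total_chunks: int) -> None:
        self.total_chunks = total_chunks
        self.chunks: dict[int, bytes] = {}

    @property
    def complete(self) -> bool:
        return len(self.chunks) == self.total_chunks

    def receive(self, index: int, data: bytes) -> None:
        """Store a chunk; from now on it can be offered to other requesters."""
        self.chunks[index] = data

    def serve(self, index: int) -> Optional[bytes]:
        """Return a held chunk, or None if it is missing or this peer has finished."""
        if self.complete:
            return None  # finished peers stop serving, per the article
        return self.chunks.get(index)
```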

To make storage more robust (because systems hosting
content can go offline, as in any P2P system), Swarmcast adds
parity information to each chunk. With parity added, each
chunk grows to 1.5 times its original size; however,
the requester can regenerate the whole file by combining a
subset of the chunks. For instance, if parity turns a 2MB
file into a 3MB file, each requester needs to download only
2MB worth of distinct chunks to reconstruct the file. Parity
thus makes downloading more flexible while guarding against
file corruption.
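The 1.5 ratio is exactly what the simplest possible parity scheme produces: split the data into two halves A and B and store a third piece P = A XOR B. Any two of the three pieces rebuild the file. The sketch below uses that XOR scheme purely for illustration. The article doesn't say which code Swarmcast actually uses, and a real system might use a more general erasure code that tolerates more missing pieces.

```python
def xor_bytes(x: bytes, y: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))


def encode(data: bytes) -> dict[str, bytes]:
    """Split data into halves A and B and add a parity piece P = A XOR B.

    Total stored size is 3 * half, i.e. 1.5x the (padded) input.
    """
    half = (len(data) + 1) // 2
    a = data[:half]
    b = data[half:].ljust(half, b"\0")  # pad B so both halves match in length
    return {"A": a, "B": b, "P": xor_bytes(a, b)}


def decode(pieces: dict[str, bytes], length: int) -> bytes:
    """Rebuild the original `length` bytes from any two of A, B, P."""
    if "A" in pieces and "B" in pieces:
        a, b = pieces["A"], pieces["B"]
    elif "A" in pieces and "P" in pieces:
        a = pieces["A"]
        b = xor_bytes(a, pieces["P"])  # B = A XOR P
    elif "B" in pieces and "P" in pieces:
        b = pieces["B"]
        a = xor_bytes(b, pieces["P"])  # A = B XOR P
    else:
        raise ValueError("need at least two of the three pieces")
    return (a + b)[:length]  # drop the padding added in encode
```

Scaled up, a 2MB file becomes three 1MB pieces (3MB stored), and any 2MB of them is enough, matching the example in the text.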

As with Napster or Freenet, people who request files end up
storing a copy on their own systems. Thus, popular content
multiplies quickly, and people are more and more likely to
find available servers nearby. Like Freenet and Gnutella,
Swarmcast does a cascading referral: when you're asked for
something, you send it back if you have it, and also pass on
the request to your peers.
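A cascading referral is essentially a flood search. The sketch below models it breadth-first over a toy network. The `Node` class, the hop limit `ttl`, and the `seen` set that keeps a node from handling the same request twice are assumptions borrowed from Gnutella-style designs, not documented Swarmcast details.

```python
from collections import deque
from collections.abc import Iterable


class Node:
    """Hypothetical peer holding a set of file names and a list of neighbors."""

    def __init__(self, name: str, files: Iterable[str]) -> None:
        self.name = name
        self.files = set(files)
        self.neighbors: list["Node"] = []


def cascade_query(start: Node, filename: str, ttl: int = 3) -> list[str]:
    """Flood a request outward from start for up to ttl hops.

    Every node reached reports a hit if it holds filename, then passes
    the request on to neighbors that have not seen it yet.
    """
    hits: list[str] = []
    seen = {start.name}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if filename in node.files:
            hits.append(node.name)      # "send it back if you have it"
        if hops == ttl:
            continue                    # the request expires at this hop
        for peer in node.neighbors:     # "...and pass on the request"
            if peer.name not in seen:
                seen.add(peer.name)
                frontier.append((peer, hops + 1))
    return hits
```

The hop limit is what keeps such a flood from reaching every machine on the network for every request.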

Swarmcast works best for content that suddenly becomes
popular, like a fast-breaking news story or the clip from a
just-released movie. Like many P2P systems, Swarmcast
substitutes redundancy for reliability. (That is, if
somebody takes down his system, it's OK because other people
are likely to host the same content.)

As with Napster, if you visit the system of someone from whom
you've downloaded Swarmcast material, you are likely to find
more files of interest to you. (This is not true in Freenet
because of the anonymity requirement; you don't even know
what's on each system there.)

Like XDegrees, Swarmcast can also intelligently determine
the best system from which to download a file. For instance,
it chooses a system on the same LAN before trying a system
that's far away.
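Locality-aware source selection could be as simple as a sort key. In this sketch, which is an assumption rather than Swarmcast's or XDegrees's actual method, peers on the requester's subnet come first and ties are broken by measured round-trip time. The `rank_sources` helper and its RTT figures are hypothetical; `ipaddress` is the Python standard-library module.

```python
import ipaddress


def rank_sources(lan: str, rtt_ms: dict[str, float]) -> list[str]:
    """Order candidate peers: same-LAN first, then by round-trip time.

    lan is the requester's subnet in CIDR form; rtt_ms maps each candidate's
    IP address to a (hypothetical) measured round-trip time in milliseconds.
    """
    network = ipaddress.ip_network(lan)

    def sort_key(addr: str) -> tuple[bool, float]:
        off_lan = ipaddress.ip_address(addr) not in network
        return (off_lan, rtt_ms[addr])  # False sorts before True

    return sorted(rtt_ms, key=sort_key)
```

Note that a LAN peer wins even when a distant peer happens to answer faster. That mirrors the "same LAN before far away" rule in the text, though a real client might weigh the two factors differently.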

Folders: TiVo for the Internet

Folders is a search system, or, in Doctorow's words, a "TiVo
for the Internet." If somebody has one thing you like, they
probably have other things you like too, and Folders lets
you check what they have. The way to find interesting
content, he says, is to find interesting people. You then
automatically get the content they find interesting.
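The "find interesting people" idea maps naturally onto user-to-user similarity. The following sketch scores other users by Jaccard overlap with your collection and suggests what they hold that you lack. Doctorow didn't describe Folders' algorithm, so the similarity measure and the `recommend` helper are assumptions.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two collections: len(a & b) / len(a | b)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(me: str, collections: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Suggest items held by users whose collections resemble mine.

    Each candidate item scores the sum of the similarities of the users who hold it.
    """
    mine = collections[me]
    scores: dict[str, float] = {}
    for other, items in collections.items():
        if other == me:
            continue
        similarity = jaccard(mine, items)
        for item in items - mine:
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:top_n]
```

Users who share nothing with you contribute a score of zero, so their files never show up as suggestions.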

Unlike TiVo, Folders has no centralized database. It is
designed for anything and everything, so there's no hope of
controlling the metadata.

In fact, Folders manages to figure out metadata for content
without people having to add it explicitly. When you force
individual users to tag their own content, you are plagued
by low participation, errors, and inconsistent tagging.
Folders just works with anything it can figure out. Content
that interests the same group of people tends to accumulate
in the same place. One O'Reilly editor who looked at
Folders in action said it was a "unique pleasure" to see a
directory fill up automatically and gradually with new, related material.
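Inferring relatedness without explicit tags can be approximated by counting co-occurrence: two items that keep turning up in the same people's collections are probably related. This is a hypothetical sketch of that idea, not Folders' actual method; the helper names and the `min_count` threshold are arbitrary assumptions.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(collections: dict[str, set[str]]) -> Counter:
    """Count, for each pair of items, how many collections contain both."""
    pairs: Counter = Counter()
    for items in collections.values():
        for pair in combinations(sorted(items), 2):  # sorted -> stable (a, b) order
            pairs[pair] += 1
    return pairs


def related_to(item: str, pairs: Counter, min_count: int = 2) -> list[str]:
    """Items appearing alongside `item` in at least min_count collections."""
    related = []
    for (a, b), count in pairs.items():
        if count >= min_count and item in (a, b):
            related.append(b if a == item else a)
    return sorted(related)
```

No one has to label `jazz.mp3` and `blues.mp3` as related; the link emerges because the same users keep both.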

OpenCola's business model consists of providing servers that
insert data into Swarmcast systems for its customers and
provide access to people behind firewalls who can't run
Swarmcast and Folders directly.

Andy Oram
is an editor for O'Reilly Media, specializing in Linux and
free software books, and a member of Computer Professionals for Social
Responsibility. His web site is www.praxagora.com/andyo.