Distributing Content with BitTorrent

For those who grew up with the internet, it's a bit hard to relate to an
earlier time when the primary means of sharing knowledge was paper. It wasn't
that long ago that the only way to brush up on the latest research was to go
down to the university library and read your favorite trade magazine.

Now all you need to do to share your latest and greatest is to put up a
web site on the space your ISP provides. But what happens if you don't have
enough space for something large? Worse yet, what do you do if the
entire rest of the internet suddenly discovers your site and your bandwidth
usage exceeds the acceptable limit? Your ISP will either shut you down or
charge you mightily for "excessive" bandwidth usage.

Everybody knows where I'm going with this: peer-to-peer file sharing is a
technique allowing anybody to publish and access files such as documents,
videos, CD ISOs, even music, without being limited by the publisher's own
bandwidth. It does so by harnessing the computers of other people who are
running clients on the same file-sharing system at the same time.

There are many peer-to-peer protocols, each with its own strengths and
weaknesses. Some are not well known, others are infamous, and still
others have faded away and gone out of use.

This article shows how easy it is to publish your content online by using BitTorrent.

How BitTorrent Works

BitTorrent has three distinct components: the client, the web server, and
the tracker. The client is the person/machine that downloads the content. The
web server provides a link to a file called a torrent. The torrent is
a specially created file that describes the shared file and gives the location
of the tracker. The tracker, the third component, is a service that waits for
connections from clients. It listens on a user-assigned port, either on the same
machine as the web server or at another location. The tracker not only
supervises the sharing of the content between multiple clients, but also logs all
downloading activities. One tracker can manage many files at the same time,
from many different torrents on many different web servers. A client can even
reach the tracker from a torrent already saved as a file on the local machine,
which eliminates the need for the web server.
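
To make the torrent's role more concrete, here is a minimal Python sketch of what such a file holds. It assumes the standard BitTorrent metainfo layout: a "bencoded" dictionary with an announce key for the tracker's URL and an info key describing the shared file. The bencode helper, the file name, and the sizes are all illustrative values of my own, not code taken from the BitTorrent distribution.

```python
def bencode(value):
    """Encode ints, strings, lists, and dicts in BitTorrent's bencode format."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Byte strings are written as <length>:<data>.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


# Hypothetical metainfo for a tiny one-piece file.
metainfo = {
    "announce": "http://localhost:8099/announce",  # where the tracker listens
    "info": {
        "name": "example.iso",      # hypothetical file name
        "length": 1024,             # hypothetical file size in bytes
        "piece length": 262144,     # size of each piece in bytes
        "pieces": b"\x00" * 20,     # placeholder for one 20-byte SHA-1 digest
    },
}
torrent_bytes = bencode(metainfo)
```

Writing torrent_bytes to a file would give something shaped like a torrent, although a real one carries the actual digests of the shared file rather than a placeholder.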

Beware the trap of false assumptions: surfing and downloading are so familiar
that people take it for granted that supplying a file via BitTorrent is pretty
much a case of uploading it to some server. No! Peer-to-peer file sharing means
that files come from other clients and not from a server. The tracker only
manages the mechanics of sharing the file between clients.

Suppose you wish to share an ISO image and you're the only person who has
it. Others can have it only if you are running a client yourself. This first client
is the seed. If another client comes along and wants to download your
content, the tracker guides it over to the seed and its download
begins. Now suppose a third client shows up and wants the same content; the
tracker coordinates all three clients so that the first two provide the content
to client No. 3. The more clients connect, the faster the downloads become,
because each client provides additional bandwidth.
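
The matchmaking described above can be sketched as a toy model in Python. This is only an illustration under my own assumptions: a real tracker speaks HTTP and identifies each torrent by a hash rather than a name, and the ToyTracker class and the peer addresses are hypothetical.

```python
from collections import defaultdict


class ToyTracker:
    """In-memory stand-in for a tracker: it remembers who joined each swarm."""

    def __init__(self):
        # torrent id -> set of peer addresses currently sharing it
        self.swarms = defaultdict(set)

    def announce(self, torrent_id, peer):
        """Register a peer and return the other peers it can fetch data from."""
        others = sorted(self.swarms[torrent_id] - {peer})
        self.swarms[torrent_id].add(peer)
        return others


tracker = ToyTracker()
seed_view = tracker.announce("pg_live", "seed:6881")        # nobody else yet
second_view = tracker.announce("pg_live", "client2:6881")   # told about the seed
third_view = tracker.announce("pg_live", "client3:6881")    # told about both
```

Note that the tracker never touches the file itself. It only hands out addresses, and each newcomer has more sources to draw from than the one before it.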

The Scenario: Sharing an ISO

The company I work for, Software Research Associates (SRA), has supported
the creation of a community project, a live CD called pg_live. This
Knoppix-based distribution showcases PostgreSQL. It comes equipped with
replication, a half dozen programming languages, and documentation that
includes how-tos, FAQs, references, and a book. It's the only live CD distro I
know of that boasts a full-scale enterprise-ready RDBMS. It first saw use at
OSCON, when the PostgreSQL community gave away pressed CDs at the booth.
Recently, we updated the ISO, and it made a big splash at LinuxWorld Boston.

In early 2005, SRA decided to continue updating the CD and make sure the
PostgreSQL community had full access to the most recent version. The easiest
way to do this was by providing the ISO via BitTorrent.

The Assumptions

I've made a few assumptions in writing this article. If your setup is different, you'll have to adjust it.

I've opted to use the Debian implementation of BitTorrent, which includes
man pages as well as wrappers for the Python-based BitTorrent utilities.

The current version of the pg_live image is between 300 and 400MB.

My test server hosts both the web server and the tracker.

Installing BitTorrent on a Debian machine is easy:

# apt-get update
# apt-get install bittorrent

Don't worry about missing dependencies, because the installation procedure
resolves them automatically, including the Python programming language.

The Steps

The first step in sharing your file is to create the torrent. The utility
that does this is the Python script btmaketorrent.py. The Debian wrapper is
btmakemetafile. The command line is:

$ btmakemetafile myfile tracker_announce_address [option...]

where myfile is the name of the file that I want to share and
tracker_announce_address is the location where I'll install the
tracker service. There is only one option switch,
--piece_size_pow2, which sets the piece size as a power of two. A bigger
number means that you can share/transfer more of the file at a time, but it
requires that you limit the number of connections to your peer once you've
reached the network's maximum bandwidth. With a smaller number your peer will
accept more connections, though the file will transfer less quickly.
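
To get a feel for the numbers, here is a small Python sketch of how the exponent translates into piece size and piece count. It assumes the option value is the power of two used as the piece size in bytes, as the name suggests. The 380MB file size and the three exponents are example values of my own, and piece_layout is a hypothetical helper.

```python
def piece_layout(file_size, piece_size_pow2):
    """Return (piece size in bytes, number of pieces) for a given exponent."""
    piece_size = 2 ** piece_size_pow2
    # Round up: the last piece may be shorter than the others.
    piece_count = -(-file_size // piece_size)
    return piece_size, piece_count


file_size = 380 * 2 ** 20  # an example 380MB image

for exponent in (16, 18, 20):
    size, count = piece_layout(file_size, exponent)
    print("2^%d = %d-byte pieces -> %d pieces" % (exponent, size, count))
```

Each step up of two in the exponent makes the pieces four times larger and cuts their number to a quarter.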

Because this was a test platform, I decided to use the localhost URL. The
tracker can listen on any port. The default is port number 80, but the
BitTorrent documentation recommends using port 6969. I chose port 8099. You can
use either a domain name or an IP address in the URL of the tracker's server.
Remember, you must be root to be able to set up a service that listens on any
port below 1024.

From the directory containing the ISO, I used the command:

$ btmakemetafile pg_live.1.3.3-SRA.iso http://localhost:8099/announce

This produced a torrent file named
pg_live.1.3.3-SRA.iso.torrent.

The btmakemetafile utility creates a hash used to verify the
data's integrity as clients download it. The larger the file, the longer it
takes to generate the torrent. The resulting torrent file size varies according
to the size of the file you want to share. For example, an 11MB tarball will
produce a torrent of approximately 1.1K. On the other hand, a large ISO
of 380MB will produce a torrent of about 31K.
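
The relationship between file size and torrent size is easier to see with a sketch of the hashing step. The Python below is my own illustration, not the utility's actual code: it assumes, as the BitTorrent protocol specifies, that the file is cut into fixed-size pieces and that each piece contributes one 20-byte SHA-1 digest to the torrent. The piece_hashes function and the in-memory test data are hypothetical.

```python
import hashlib
import io


def piece_hashes(stream, piece_size):
    """Return the concatenated SHA-1 digests of consecutive pieces of a stream."""
    digests = []
    while True:
        piece = stream.read(piece_size)
        if not piece:
            break
        digests.append(hashlib.sha1(piece).digest())
    return b"".join(digests)


# Demo on in-memory data so the example runs without a real ISO.
data = io.BytesIO(b"x" * 1000)
pieces = piece_hashes(data, 256)  # 4 pieces: 256 + 256 + 256 + 232 bytes
print(len(pieces))                # 4 digests of 20 bytes each
```

If the pieces are 256K each (an exponent of 18, which I believe is the default), a 380MB ISO splits into 1,520 pieces, or 30,400 bytes of digests, which lines up with a torrent of roughly 31K. An 11MB tarball needs only 44 pieces, or 880 bytes, plus a little overhead.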

Notice that I appended the path announce to my localhost's URL.
This is a hardcoded value in BitTorrent that must always be present in the
tracker's URL. No real directory named announce needs to exist on your
web server.
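
Since a mistyped tracker URL ends up baked into the torrent, it can be worth checking the URL before running btmakemetafile. The following Python sketch is one way to do that, using the standard urllib.parse module. The check_announce_url helper is hypothetical, it assumes a plain http URL, and the "needs root" rule reflects the usual Unix restriction on ports below 1024.

```python
from urllib.parse import urlsplit


def check_announce_url(url):
    """Summarize the host, port, and path of a tracker announce URL."""
    parts = urlsplit(url)
    # Assumption: an http URL without an explicit port means port 80.
    port = parts.port if parts.port is not None else 80
    return {
        "host": parts.hostname,
        "port": port,
        "needs_root": port < 1024,                  # privileged port on Unix
        "has_announce_path": parts.path == "/announce",
    }


report = check_announce_url("http://localhost:8099/announce")
print(report)
```

For the URL used in this article, the report shows port 8099, no need for root, and the required announce path in place.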