BitTorrent

Introduction

BitTorrent on its own is a protocol to enable file-downloading. When
used in conjunction with directory style websites (the most famous one
is the now defunct Suprnova), BitTorrent becomes a powerful tool enabling
users of the system to share large files. The users of the system download
the files from each other but they rely on a centralised system in order
to do this. The Overhaul protocol, however, is an example of a more decentralised
application.

File Sharing (Pre-BitTorrent)

A computer network is a system for communication among two or more computers
allowing exchange of traffic back and forth between them. The Internet
is an interconnected system of networks that connects computers around
the world via the TCP/IP protocol.[1]

One of the most popular uses of the Internet is file downloading. Initially,
users typically downloaded files from a central server. This method was
limited by the number of people attempting to download files from the
server at the same time.

Napster brought the P2P method of downloading to the mainstream. In this
paradigm, a user would view and download files from other Napster users
(generally known as peers) after being connected to them through a central
server. P2P file-sharing is a more efficient way of downloading
high-bandwidth material like music and video. The increase in efficiency
can be attributed to each member of the network using some of their own
upload bandwidth to share the files with others, which raises the speed
at which downloads occur.

In order for a P2P application to be successful, the system should meet
several criteria [2]:

It should have a high availability of different files

The content should be of good quality, i.e. the network should not be
polluted with hoax files [3]

Flashcrowds should be efficiently dealt with

High download speeds should be available

File sharing applications which followed Napster's lead eventually evolved
into decentralised systems with no central server (hence they could not
be shut down by the authorities as Napster was). Examples of these
applications are Kazaa and Morpheus. However, there were limits to the
success of this file-sharing paradigm. These systems were afflicted by
"leeching", which is when a user downloads but refuses to upload. Most
importantly, a user could only download as fast as the user they were
downloading from could upload. This does not take advantage of broadband
connections, where users have far greater download capacity than upload
capacity.

The BitTorrent Protocol

Illustration of the BitTorrent Protocol
The green blocks represent fully downloaded chunks, available for upload.
The red blocks are the blocks currently being downloaded. Each peer can
upload only two blocks at a time.

In BitTorrent, a file is split into chunks, typically of the order of
a thousand chunks per file. To download a complete file, a user downloads
different chunks of the desired file from other users. The chunks are not
downloaded sequentially; instead, the order is based on the rarity of each
chunk at that time, with the rarest chunks requested first. When all the
chunks have been downloaded, they are reassembled and the user has the
complete file. This method of splitting a file into many pieces greatly
facilitates the sharing of large files such as MPEG videos and software
applications. In fact, one of the original applications of BitTorrent
was to share GNU/Linux software.[4]
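
The rarity-based ordering described above can be illustrated with a small
sketch. This is only an illustration under simplifying assumptions, not
the code of any real client: the function name pick_rarest_chunk and the
peer names are hypothetical, and each peer is assumed to advertise the
set of chunk indices it holds.

```python
from collections import Counter


def pick_rarest_chunk(have, peer_chunks):
    """Return the index of the rarest chunk we still need, or None.

    have        -- set of chunk indices this client already holds
    peer_chunks -- dict mapping a (hypothetical) peer name to the set of
                   chunk indices that peer holds
    """
    counts = Counter()
    for chunks in peer_chunks.values():
        # Only count chunks we are still missing.
        counts.update(chunks - have)
    if not counts:
        return None  # nothing useful is on offer
    # Fewest holders first; ties go to the lowest index so the choice
    # is deterministic.
    return min(counts, key=lambda idx: (counts[idx], idx))


peers = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
print(pick_rarest_chunk({2}, peers))  # prints 0
```

Chunks 0 and 3 are each held by a single peer, so either would be a valid
"rarest" choice; the sketch simply breaks the tie by index.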

A seed is a computer that has a complete copy of a particular
file, whereas a peer is one that has a partial copy. In order to download
a file, one must first obtain its .torrent file.
The .torrent file holds the metadata, available from a web server, about
the file you wish to download: typically the filename, the size, the hash
of each block in the file (which allows users to make sure they are
downloading the real thing) and the address of a tracker server. The
.torrent file is sent to the downloader's computer when they click on a
link, and it can then be used for downloading via BitTorrent. When a client
finishes downloading, it will remain open (and keep uploading to others)
until it is closed or the 'finish' button is clicked.[5]
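
As a rough sketch of how the per-block hashes let a downloader check that
it is getting "the real thing": the publisher hashes every block, and the
downloader re-hashes each block it receives and compares. The function
names and the tiny block size below are assumptions made for illustration;
SHA-1 is used because it is the hash in the published BitTorrent
specification.

```python
import hashlib


def block_hashes(data, block_size):
    """Hash every block of the file, as the publisher would when
    preparing the metadata (illustrative helper, not a real API)."""
    return [
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def verify_block(index, block, expected_hashes):
    """Return True if a received block matches the published hash."""
    return hashlib.sha1(block).digest() == expected_hashes[index]


original = b"abcdefghij"
hashes = block_hashes(original, 4)       # blocks: abcd, efgh, ij

print(verify_block(1, b"efgh", hashes))  # genuine block
print(verify_block(1, b"efgX", hashes))  # corrupted block is rejected
```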

The .torrent file is not stored on the website itself but is distributed
among a number of tracker servers. These servers store a global registry
of all the downloaders and seeds of the corresponding file. When a user
wishes to start downloading, they click on a link from the website to a
.torrent file. The tracker server responds to the user's request by sending
back a list of other users that have (part of) the file. A direct connection
is set up between the users after they have bartered for the file. In general,
a user with a high upload rate will be allowed a high download speed.
A user who has finished downloading a file and stays online also
becomes a seed for the file.

Should a tracker server go offline while a user is downloading, the transfer
is not affected, as the user is downloading from peers, independent of the
server. However, the user will not be able to commence any new downloads.[5]
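
The tracker's role as a registry can be pictured with a toy, in-memory
sketch. It is not how a real tracker is implemented (real trackers answer
requests over the network); the class name, the announce method and the
addresses are all hypothetical.

```python
from collections import defaultdict


class ToyTracker:
    """Keeps, per file, the set of users currently sharing it."""

    def __init__(self):
        self._swarms = defaultdict(set)

    def announce(self, file_id, peer):
        """Register `peer` for `file_id` and return the other users
        already known to have (part of) the file."""
        others = sorted(self._swarms[file_id] - {peer})
        self._swarms[file_id].add(peer)
        return others


tracker = ToyTracker()
print(tracker.announce("file-1", "10.0.0.1:6881"))  # first user: nobody yet
print(tracker.announce("file-1", "10.0.0.2:6881"))  # learns about the first
```

Once a user holds such a list, transfers run directly between users,
which is why an outage of the tracker only blocks new downloads.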

The above graph (our representation of Figure 7 of The Register's study [2])
deals with another aspect of BitTorrent. Here the horizontal axis represents
the number of seeds for a file after its injection into mainstream traffic.
The vertical axis shows how long the file needs to stay available so
that a given number of users can download it.

As the number of seeds increases, the required lifetime of the file
declines dramatically. This illustrates the need for users to become seeds
in order to reduce download times. The attraction of becoming a seed is low,
because all upload capacity is used for distribution of one file. On the
other hand, as the number of seeds increases, not only does each seed need
to stay online for a shorter time, but the bandwidth each one uses for
uploading the file is also reduced. Even though the number of seeds for a
particular file decreases as time goes by, the file remains available so
long as there is at least one seed.
When no seeds are available, a user with the complete file must come in
and act as the new seed. This is known as reseeding.

A swarm is the term given to a group of computers, potentially including
both seeds and peers, that are connected for a particular file. If a swarm
of peers has a complete copy between them, but none of them possesses it
individually, this is said to be a distributed copy.[5]
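
The difference between a seeded swarm and a distributed copy can be
expressed as a simple set check. The sketch below is illustrative only;
the function name swarm_status and the member names are hypothetical.

```python
def swarm_status(total_chunks, holdings):
    """Classify a swarm.

    total_chunks -- number of chunks in the complete file
    holdings     -- dict mapping a member name to the set of chunk
                    indices that member holds
    """
    everything = set(range(total_chunks))
    # A seed is any member holding every chunk on its own.
    if any(chunks >= everything for chunks in holdings.values()):
        return "seeded"
    # Otherwise, check whether the members cover the file between them.
    combined = set().union(*holdings.values())
    if combined >= everything:
        return "distributed copy"
    return "incomplete"


print(swarm_status(3, {"a": {0, 1}, "b": {2}}))   # distributed copy
print(swarm_status(3, {"a": {0, 1, 2}, "b": {2}}))  # seeded
print(swarm_status(3, {"a": {0}, "b": {0, 1}}))   # incomplete
```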

Choked is a term from the BitTorrent protocol which indicates
the state an uploader is in when it refuses to send anything on a link.
It usually happens when a client has too many simultaneous uploads, i.e.
a limit is set on the number of files a computer can upload concurrently,
and when another file is requested the request is denied.

Interested is another term, which specifies that a downloader wants
something from an uploader. It is used while the choked flag is set, to let
it be known that a connection is wanted whenever possible.

Snubbed is a term from the protocol that indicates that a client
has not received anything for a substantial period of time. It is used
to improve download speeds, i.e. to let other peers know that the client
has been ignored and that it should probably receive some attention.[6]
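
The three terms can be thought of as flags kept for each link between two
computers. The following is a minimal sketch under stated assumptions: the
class name Link and its methods are hypothetical, and the 60-second
threshold for "a substantial period of time" is an assumed value, not one
taken from the source.

```python
from dataclasses import dataclass

SNUB_TIMEOUT = 60.0  # seconds; assumed threshold, for illustration only


@dataclass
class Link:
    choked: bool = True         # uploader refuses to send on this link
    interested: bool = False    # downloader wants something from uploader
    last_received: float = 0.0  # time (in seconds) data last arrived

    def can_transfer(self):
        """Data flows only if the downloader is interested and the
        uploader is not choking the link."""
        return self.interested and not self.choked

    def is_snubbed(self, now):
        """True if nothing has arrived for longer than the threshold."""
        return now - self.last_received > SNUB_TIMEOUT


link = Link(interested=True)     # wants data, but is still choked
print(link.can_transfer())       # False
link.choked = False              # the uploader has a free upload slot
print(link.can_transfer())       # True
print(link.is_snubbed(now=90.0)) # True: nothing received for 90 seconds
```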

The above graph displays measurements taken during a study on the tolerance
of BitTorrent to the "flash crowd effect" (our representation of Figure
5 of The Register's study [2]). The blue line
shows the number of downloads of the tracker for "Lord of the Rings 3" from "FutureZone.TV".
The red line signifies the number of seeds for the file.

For the first five days, "FutureZone.TV" was the only seed, causing the
sustained high download rates. As the number of seeds increased, the download
rate from the website dropped. Notice that a small increase in the number
of seeds dramatically decreases the load on the provider. From these results,
it can be concluded that BitTorrent is well capable of handling large,
sudden crowds.

Decentralisation

Traditionally, the BitTorrent model has relied on a small number of centralised
servers to provide the trackers for specific files. There is an inherent
problem with this model. If any of the servers go down, this places
an extra burden on the remaining few. Also, their public nature
makes them susceptible to attack. Although it has been shown that
this is difficult to do, it is also commonly accepted that a determined
hacker usually finds a way.

When a popular new file is introduced by a seed, there can be a large surge
of simultaneous downloads from this source. This is termed the 'flash crowd
effect'. A way to avoid or at least alleviate this bottleneck is to
decentralise the system. This eliminates the need for a central server, while
increasing the load on client machines, as they need to track who they are
downloading from and who they are uploading to. A user offering a file must
then make the .torrent file and tracker available, either by e-mail or on a
website. In this scenario, a new user must initiate the download from a
different location from where the original file came. (One example of this
is "Exeem". It is not as popular among the BitTorrent community, as it is
closed source and installs adware on client machines.)

A possible disadvantage of this system is administration. When Suprnova
was still in operation, an article was written detailing the performance
of BitTorrent.[7] One test the authors carried out was to
donate an account for hosting a mirror and to add spyware to the HTML
code. This experiment failed: all the corrupted code they provided was,
surprisingly, filtered out. Unfortunately, this system of moderation relies
on global components and is difficult to distribute, because there are no
mediators to administer the file content. Serious security issues are thus
raised. A simple case to illustrate this is a client that downloads a file,
and so is registered as having that file, and then gives the same name to a
file containing corrupted data. Since the infrastructure is not there to
oversee the file sharing, this file may well infect many machines. Another
problem that may arise is obtaining the torrent. There is no longer a
"torrent site" from which to download, so a much more difficult search is
needed in order to get a torrent.

Pure Decentralisation with the Overhaul Protocol

When a file is large (a gigabyte or more), it is better to divide it into
separate chunks. One way of doing this is to use Overhaul [8].
Overhaul changes the HTTP behaviour of an overloaded server so that it acts
like a peer to peer network. It splits the requested document into n chunks.
Each request results in a response that includes the i-th chunk and the IP
addresses of m other clients accessing the document. A signature for each of
the n chunks is also provided in the header. A client supporting Overhaul
connects to other clients to retrieve the remaining chunks. This saves the
bandwidth that a regular fetch would use. By transferring only a small
portion of the document, the Overhaul process frees up the server to satisfy
requests from other clients.
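
The server side of this exchange can be sketched as follows. This is a
loose illustration of the description above, not the Overhaul
specification: the class and field names are hypothetical, chunks are
assumed to be handed out in round-robin order, the m most recent other
clients are returned, and a SHA-1 digest stands in for the per-chunk
signature.

```python
import hashlib


class OverhaulSketch:
    def __init__(self, document, n, m):
        size = -(-len(document) // n)  # ceiling division: bytes per chunk
        self.chunks = [document[i * size:(i + 1) * size] for i in range(n)]
        # Stand-in "signatures": one digest per chunk (an assumption).
        self.signatures = [hashlib.sha1(c).hexdigest() for c in self.chunks]
        self.m = m
        self.clients = []  # clients seen so far, oldest first
        self._next = 0     # index of the next chunk to hand out

    def handle_request(self, client_ip):
        """Answer one request with a single chunk, up to m other clients
        and the signatures of all n chunks."""
        i = self._next % len(self.chunks)
        self._next += 1
        others = [ip for ip in self.clients if ip != client_ip]
        peers = others[max(0, len(others) - self.m):]
        if client_ip not in self.clients:
            self.clients.append(client_ip)
        return {
            "chunk_index": i,
            "chunk": self.chunks[i],
            "peers": peers,
            "signatures": self.signatures,
        }


server = OverhaulSketch(b"abcdefgh", n=4, m=2)
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]:
    reply = server.handle_request(ip)
    print(ip, reply["chunk_index"], reply["chunk"], reply["peers"])
```

Each client receives a different chunk plus the addresses it needs to
fetch the rest from other clients, so the server sends the full document
only once across the four requests.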

The Overhaul Protocol

An example is shown above: four clients request the same document from
the server. The server becomes overloaded and goes into Overhaul mode,
where it splits the document into chunks and distributes these amongst
the clients. The clients collaborate (using the headers of the chunks) to
merge the chunks together so that each can form a coherent document. This
is different from BitTorrent, which is a specialized tool for distributing
large files over existing peer to peer networks (the Internet). Also, BitTorrent
requires a dedicated tracker and meta-info file for each requested document,
resulting in extra traffic to the server.

Conclusion

As can be seen, BitTorrent may be ushering in a new paradigm in downloading.
At the moment it is the cheapest and one of the fastest ways to share large
files in the mainstream medium, the Internet. But as with everything, there
are disadvantages. One of the less attractive features of BitTorrent
is that a user in turn has to become a seed and stay online to share their
copy of the requested file. This can tie up one's upload bandwidth with
unwanted traffic. Whatever disadvantages lie in BitTorrent, if it is used
properly, these are quite acceptable.