A bit about BitTorrent

I’ve been hearing about BitTorrent for at least a year. It’s an exciting technology in principle, because it solves traditional central repository file distribution problems, uses peer-to-peer (P2P) file sharing technology, and is written in my favorite programming language, Python. All interesting material—but what about practical utility? I first actually used BitTorrent earlier this year, when I needed to get copies of the Elephants Dream and The Boy Who Never Slept movies in order to view and review them. I found it to be a great tool that really ought to be used even more than it is. But many people still don’t seem to understand how it works. Not surprising, I suppose, since on the surface, it sounds impossible. It’s actually a pretty simple idea.

Time out

I know I promised some more philosophical posts for the next few weeks, but I’m still working on them and I have very little time to blog this week. This one is based on an email I sent to a friend about a month ago, explaining why BitTorrent is so useful.

What is a torrent file?

Basically a torrent download is like a scavenger hunt and the .torrent file is the list of things you have to find. I can’t imagine how a client could possibly get the file without getting the .torrent file first.

It’s sort of a map of the file. It tells how many chunks it’s broken into, file hashes for each chunk, and the URL of the “tracker”. The client then goes to the tracker for pointers on who to start getting chunks of the file from.
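To make the "map" idea concrete, here is a minimal sketch in Python. It is not the real .torrent encoding (real files use their own serialization format); it is just a plain dictionary holding the same kinds of information: the tracker URL, the chunk size, and one SHA-1 hash per chunk. The function name, the dictionary keys other than "announce", and the tracker URL are all made up for illustration.

```python
import hashlib

def describe_file(data, piece_length, tracker_url):
    """Build a toy 'map' of a file: tracker URL, sizes, and per-chunk hashes."""
    # Cut the file into fixed-size chunks; the last one may be shorter.
    pieces = [data[i:i + piece_length] for i in range(0, len(data), piece_length)]
    return {
        "announce": tracker_url,          # where clients go to find peers
        "length": len(data),              # total file size in bytes
        "piece_length": piece_length,     # chunk size in bytes
        "piece_hashes": [hashlib.sha1(p).hexdigest() for p in pieces],
    }

# Hypothetical example: a 2500-byte "file" cut into 1024-byte chunks.
torrent_map = describe_file(b"x" * 2500, 1024, "http://tracker.example.org/announce")
print(len(torrent_map["piece_hashes"]))  # number of chunks in the map
```

Note that the map stays small no matter how big the file is, because it only stores one short hash per chunk.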

The idea is to move the burden of the download from the “server” to the “clients” by having the clients serve each other (of course they’re no longer strictly “clients” once you do that—so BitTorrent has its own terminology, which I don’t completely know yet).

In a conventional download, if you have 100 people who want one 500 MB file, you send them all to one “server” which then gives each of those 100 people the entire 500 MB file. This is hugely demanding on the server.

With a torrent, you instead “seed” the file by breaking it up into 500 chunks and giving each downloader a few chunks. Then you tell them to sort it out amongst themselves. The clients trade chunks of file with each other until all 100 users have all 500 chunks.
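The arithmetic behind those two cases can be sketched in a few lines of Python. This is an idealized lower bound, not a measurement: it assumes the seed hands out each chunk exactly once and the downloaders trade for everything else, which a real swarm will only approximate. The function name is my own.

```python
def server_upload_mb(downloaders, file_mb, via_torrent):
    """Megabytes the original host must send, under idealized assumptions."""
    if via_torrent:
        # Assumption: the seed gives out each chunk once, and the
        # downloaders get everything else from each other.
        return file_mb
    # Conventional download: everyone pulls the whole file from the server.
    return downloaders * file_mb

conventional = server_upload_mb(100, 500, via_torrent=False)
torrent = server_upload_mb(100, 500, via_torrent=True)
# 500 one-megabyte chunks dealt out evenly to 100 downloaders.
chunks_per_downloader = 500 // 100
print(conventional, torrent, chunks_per_downloader)  # 50000 500 5
```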

The .torrent file is the spec that each client uses—that’s how it knows when it has acquired the whole file and whether all the checksums match.
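Here is a rough sketch of how a client could use that spec to decide whether it is finished. It assumes the hashes are SHA-1 hex digests, one per chunk, and that received chunks are kept in a dictionary keyed by chunk number; both the function name and that storage layout are assumptions for illustration only.

```python
import hashlib

def missing_or_bad_pieces(expected_hashes, received):
    """Return the chunk numbers that are absent or fail their checksum."""
    bad = []
    for index, expected in enumerate(expected_hashes):
        piece = received.get(index)
        if piece is None or hashlib.sha1(piece).hexdigest() != expected:
            bad.append(index)
    return bad

# Hypothetical three-chunk file.
original = [b"first chunk", b"second chunk", b"third chunk"]
expected = [hashlib.sha1(p).hexdigest() for p in original]

# Chunk 1 arrived corrupted and chunk 2 has not arrived yet.
received = {0: b"first chunk", 1: b"sec0nd chunk"}
still_needed = missing_or_bad_pieces(expected, received)

# After fetching those two chunks again, nothing is left to get.
received[1] = b"second chunk"
received[2] = b"third chunk"
after_retry = missing_or_bad_pieces(expected, received)
print(still_needed, after_retry)  # [1, 2] []
```

The download is complete when the list comes back empty, and a damaged chunk only costs you that one chunk, not the whole file.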

The tracker is a server (probably also run by you) which acts like a traffic-controller—if clients can’t figure out where to get a piece of the file, they ask the tracker for directions (they can also discover this information from each other). So there’s a lot of short control packets going back and forth between clients and the tracker.
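A toy version of the tracker's job might look like the following. This is only an in-memory sketch of the idea (remember who is downloading what, and hand each newcomer a list of the others); it does not speak the real tracker protocol, and the class name, method name, and addresses are invented.

```python
class ToyTracker:
    """Remembers which peers are working on which download."""

    def __init__(self):
        self._swarms = {}  # download id -> set of peer addresses

    def announce(self, download_id, peer):
        """Register a peer and return the other peers it can ask for chunks."""
        swarm = self._swarms.setdefault(download_id, set())
        others = sorted(swarm - {peer})
        swarm.add(peer)
        return others

tracker = ToyTracker()
first = tracker.announce("movie", "10.0.0.1:6881")
second = tracker.announce("movie", "10.0.0.2:6881")
other_download = tracker.announce("distro", "10.0.0.3:6881")
print(first, second, other_download)
```

Because the swarms are keyed by download, one tracker object can look after several downloads at once.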

I know it sounds kind of crazy, but it works. And, especially for large files, it’s much more reliable than ordinary downloads because it does all those checksums along the way. And, of course, it greatly reduces the server bandwidth. Obviously the total bandwidth is a little bit higher because of the control data being passed around, but it’s distributed over all the downloaders’ machines.

I experienced this personally when I downloaded the movies The Boy Who Never Slept and Elephants Dream that I reviewed a while back. Those were monster files (multiple gigabytes), but the .torrent files were only about 30-40 KB in size.

I installed and used “qtorrent”, one of several GUI torrent client applications (predictably, one based on the Qt GUI toolkit) that are available. It worked very nicely, but I haven’t got much to compare it with. My impression is that the clients are probably pretty consistent, so which one you use probably has more to do with your GUI environment preferences.

The BitTorrent documentation says that a torrent download should begin when you click on the torrent link in Mozilla, but this never worked for me. Instead, I had to save the torrent file to disk, then open it with qtorrent in order to begin retrieving the file. This is actually very convenient, because it means I can save the small torrent file immediately, and then actually download the monster files when I’m not making much other use of my internet connection—a nice feature if your broadband connection is intermittent or not quite as broad as you’d like!

When I start it, qtorrent pauses and makes a big “skeleton” version of the file (it allocates all the necessary space). Then it starts filling in chunks, and tells me how many I need, how many I’ve got, and gives me a percentage completion bar, etc. It’s considered polite to leave the torrent app running for a while after you download the whole file, because it then acts as a seed for others.
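I don't know how qtorrent does this internally, but the "skeleton" approach is easy to sketch in Python: reserve the full size up front, then write each chunk at its own offset as it arrives, in whatever order. The function names and the tiny sizes below are only for illustration.

```python
import os
import tempfile

def preallocate(path, total_size):
    """Create the 'skeleton' file at its final size."""
    with open(path, "wb") as f:
        f.truncate(total_size)

def write_piece(path, index, piece_length, data):
    """Drop one chunk into its slot without touching the rest."""
    with open(path, "r+b") as f:
        f.seek(index * piece_length)
        f.write(data)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "movie.bin")
    preallocate(path, 10)                  # a 10-byte "movie"
    size_after_allocation = os.path.getsize(path)
    write_piece(path, 1, 4, b"abcd")       # chunk 1 arrives before chunk 0
    with open(path, "rb") as f:
        contents = f.read()

print(size_after_allocation, contents[4:8])
```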

BitTorrent discourages cheating or “leeching” (only downloading, not uploading) through the algorithm it uses to decide which peers to serve: the ones that play nice get served first (i.e. it’s a “tit for tat” algorithm).
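The core of a tit-for-tat rule can be sketched like this. It is a simplification under my own assumptions: peers are ranked purely by how many bytes they have sent us, and only the top few get served. Real clients are more elaborate than this, and the function name and the slot count are made up.

```python
def choose_peers_to_serve(bytes_received_from, slots=4):
    """Pick the peers that have uploaded the most to us."""
    # Sort by bytes received (largest first), then by name to break ties.
    ranked = sorted(bytes_received_from.items(), key=lambda kv: (-kv[1], kv[0]))
    return [peer for peer, _ in ranked[:slots]]

# Hypothetical swarm: peer "b" has sent us nothing so far.
received = {"a": 500, "b": 0, "c": 900, "d": 100}
served = choose_peers_to_serve(received, slots=2)
print(served)  # ['c', 'a']
```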

This is quite different from gnutella or kazaa which are completely decentralized file-sharing applications. With BitTorrent, you still have a server which moderates the download process and you have to get the torrent file to know what to download. It just uses peer-to-peer methods to complete the file.

Hosting BitTorrent files

If you want to distribute your own files via BitTorrent, you can save a lot of load on your server, which can be a life-saver for small, community-driven projects where maintaining a high-traffic server isn’t really an option. The great thing about BitTorrent is that it scales automatically: the more popular your project, the more BitTorrent saves you on bandwidth. Or to put it another way, the growth with project popularity is more or less nil—you have an almost constant effort regardless of how many people are downloading (in technical terms, the burden of sharing a file among “N” users probably follows “order of log N” growth).

To host a torrent, you need three things:

The .torrent file, which describes the download. Each client gets this first (by ordinary HTTP)

A tracker, which will serve clients when they need help to find parts of the file

A “seed”, which is basically just like one of the clients, except it already has the whole file, so it only gives out chunks

I think the same tracker can be used for many downloads.

Anyway, there’s a HOWTO on providing torrents, which I was mostly trying to ignore while looking for information on how to download, but what I saw didn’t look that hard.

So you just keep track of the .torrent file downloads if you want to know who’s getting the file. This can be important for traditional, advertising-based distributors. Theoretically, of course, people could pass torrent files to each other, but in practice, they won’t: if yours is the authoritative source, and the torrent file isn’t big enough to load your server (it’s similar in size to a normal HTML web page), then they’ll want to get the torrent direct from you.

Torrent distribution is optimal for large downloads for large numbers of users. The typical sorts of things that are distributed this way are entire GNU/Linux distributions (ISO 9660 CD or DVD images), entire feature-length motion pictures, and so on. Those tend to be hundreds of megabytes, or even multiple gigabytes in size, and have many people wanting to download them, often simultaneously. This is the case that BitTorrent is designed to optimize.

For large enough files, where downloads may take hours or days, however, even a couple dozen users will make overlapping requests. Alternatively, even relatively small (a few MB) files may have overlapping requests if there are thousands or tens of thousands of users—as in the case of a blockbuster hit song or video short.

The alternate case, where BitTorrent doesn’t help, is the “deep catalog” case where you have a server with a huge amount of small content and collisions between users asking for the same information are rare (this might describe a government document server, an OGG/MP3 music catalog site, or an ordinary static website). Such a site wouldn’t gain from BitTorrent, and loads on such a server can only be solved by upgrading the system. Note, though, that in this case, bandwidth upgrades will generally only be needed when storage, memory, CPU, and even number of servers have to be increased as well.

Generally, though, if either the number of users is very large (100/day+), and/or your distribution file sizes are large (10MB+), then you really need to make friends with BitTorrent. Note that the motivations for using BitTorrent are pretty nearly the same as the motivations for upgrading your download server’s bandwidth. Basically any time you start thinking “I’ve got too much load on my FTP server, I need to upgrade my bandwidth,” you should start thinking about BitTorrent distribution instead.
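That rule of thumb is simple enough to write down as a one-line check. The thresholds are the rough figures from the paragraph above, not hard limits, and the function name is my own.

```python
def worth_considering_bittorrent(downloads_per_day, file_size_mb):
    """Rule of thumb: many downloaders and/or large files."""
    return downloads_per_day >= 100 or file_size_mb >= 10

popular_distro = worth_considering_bittorrent(5000, 700)   # big and popular
niche_movie = worth_considering_bittorrent(20, 2000)       # few users, huge file
deep_catalog = worth_considering_bittorrent(3, 0.2)        # small, rarely requested
print(popular_distro, niche_movie, deep_catalog)  # True True False
```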

I just read your article and found it quite interesting. But I also thought: "Why do so many people cherish BitTorrent while not seeing the potential of filesharing networks like Gnutella for their Websites?"

But since I don't want to write a full article just now, there's one specific thing I want to talk about: the part of Gnutella which works just like BitTorrent, just decentralized and without fair-use practices.

That part is called the "Download Mesh", and it works quite simply:
If you ask a client for a file, it also tells you up to 10 (in some cases some more) other sources, and it tells you which sources are "bad" sources.
If 3 other clients tell a client that a source is bad and no one else reports it as good, it is discarded, which means no download is attempted from that URL.
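The discard rule described here can be sketched in Python as follows. This only illustrates the rule as stated (three bad reports and no good ones), not how any actual Gnutella client implements it; the class and method names are hypothetical.

```python
class SourceReputation:
    """Tracks which clients called a download source good or bad."""

    def __init__(self):
        self.bad_reporters = set()
        self.good_reporters = set()

    def report(self, reporter, is_good):
        # Sets mean the same client reporting twice only counts once.
        (self.good_reporters if is_good else self.bad_reporters).add(reporter)

    def discarded(self, threshold=3):
        """True if enough clients said 'bad' and nobody said 'good'."""
        return len(self.bad_reporters) >= threshold and not self.good_reporters

source = SourceReputation()
for client in ("client1", "client2", "client3"):
    source.report(client, is_good=False)
dropped = source.discarded()

# A single good report is enough to keep the source in play.
source.report("client4", is_good=True)
dropped_after_good_report = source.discarded()
print(dropped, dropped_after_good_report)  # True False
```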

Doing fair use is naturally much harder in a decentral system, where you have no central authority, so it isn't yet done.

But now you might ask yourself: What does using Gnutella get me, that BitTorrent doesn't?

The answer is: Users can search the Gnutella Network, and they can find your website by finding your files, so you can get publicity through the content you deliver.

There's also a counterpart to torrent files: magnet links (one client, http://phex.org , also integrated magma-files, which are lists of magnets).

Formerly those magnet-links were quite ineffective, since those who clicked on them then had to search the network to find a source (or several) from which to download (and through which to enter the download-mesh of that file).
But this has changed through services like http://www.freebase.be/ , which may be in their early stages, but offer much potential.
To make a magnet more reliable, a webmaster can now simply add the freebase cache to the magnet, which then records the IP of the requesters and supplies the last 10 other requesters, which get the user into the download mesh of that file (more info: http://www.freebase.be/g2cache.php ).

Also this script keeps statistics, which (in my opinion) were a central key to BitTorrent's popularity. That way the webmaster can see how often a file is being downloaded, even though it is being downloaded decentrally.

As another plus, magnets allow the webmaster to include an http-url. So delivering a file via p2p-networks which is already being distributed via standard http is dead simple. Just create a magnet, add the http-source and the freebase cache and put it on your website alongside the http-url.
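For illustration, building such a magnet might look roughly like this in Python. The parameter names follow the usual magnet conventions as I understand them ("xt" for the file hash, "dn" for a display name, "as" for an ordinary HTTP source); which of these a given client honors is an assumption, the freebase cache entry is left out because its exact form isn't given here, and the function name, file name and URL are made up.

```python
import base64
import hashlib
from urllib.parse import quote

def make_magnet(data, display_name, http_url):
    """Build a magnet link with a SHA-1 hash and an HTTP fallback source."""
    # The SHA-1 of the file content, written in Base32.
    digest = hashlib.sha1(data).digest()
    b32 = base64.b32encode(digest).decode("ascii")
    return (
        "magnet:?xt=urn:sha1:" + b32
        + "&dn=" + quote(display_name, safe="")
        + "&as=" + quote(http_url, safe="")
    )

# Hypothetical file content and URL.
link = make_magnet(b"example file content", "demo file.ogg",
                   "http://example.org/files/demo.ogg")
print(link)
```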

I'll stop here, since I intended to write only a short summary, but if you did take interest in Gnutella, the Download-Mesh, Magnets and Magma for Websites, you might want to check http://gnufu.net , a guide to Gnutella with information about Magnets and Magma-Lists, all written in user-friendly style, avoiding code diving while using technically correct analogies, which make the guide readily understandable. And mostly using shorter sentences than the last one ;)
And you'll also find on gnufu.net some more info on how to create magnet-links.

Best wishes,
Arne Babenhauserheide ( http://drakto.de )
PS: Yes, I did write that guide, but heck, I think I managed to do it right :)

I got in to BitTorrent when looking for shows that didn't get TiVo'ed for whatever reason, whether a power-outage or schedule change. So this would be a "bad" use. I was wrongfully taking intellectual property. Horrors. It took a little hunting and the downloads were fairly slow, but it worked. I didn't miss out on my shows. (Ok, I don't mean to be flippant about the legality of it. But come on, they broadcast these shows for free, and with TiVo I wasn't going to watch their commercials anyway. And now that they offer some shows through legit channels online, like "Lost," I'd probably pay the $2 or watch it with commercials in the rare case when I miss a show. Besides being officially sanctioned, it would be easier than digging around for BitTorrents.)

I tried it for one of Bob Cringely's NerdTV segments but there wasn't nearly enough critical mass to get any Bit Love.

I understood the principle of BT and thought it was a great idea, but it wasn't until I downloaded the Fedora 5 distribution that I finally experienced the joy. It was so cool to be able to get all 5 CDs in just a few hours.

So the entertainment industry might look at me and say, you filthy pirate, you stole our content. This is why we shouldn't have programs like this.

But the distro download illustrates perfectly legitimate uses. And it is a good example of why the RIAA and MPAA and so on shouldn't be dictating the technology we use. They would squash this wonderful mechanism for distributing files.

I suppose I might be hurting the cause by mentioning the TV show thing. I've been mulling over if we should try to strictly follow the rules as they exist today or if we should engage in some civil disobedience.

You should check out metalink (http://www.metalinker.org/), it uses bittorrent and mirrors.

'Metalink makes complex download pages obsolete by replacing long lists of download mirrors and BitTorrent trackers with a single .metalink file. As you might have already guessed, a .metalink file is a file that tells a download manager all the different ways it can download a file. The file itself takes the form of an open XML standard that can list an unlimited number of HTTP and FTP sources as well as BitTorrent trackers and ed2k and magnet links.'
http://www.downloadsquad.com/2006/08/28/metalinks-integrated-bittorrent-http-and-ftp-downloads/

I have been a user of BitTorrent since last year. It is indeed a very practical program to have. In fact, I have been addicted to it for some months. I'm clean of it now though. But I still use it from time to time. Anyway, I never really did understand how this worked, just marvelled at it from afar. But now I do, thanks to you!

Author information

Biography

Terry Hancock is co-owner and technical officer of Anansi Spaceworks. Currently he is working on a free-culture animated series project about space development, called Lunatics, as well as helping out with the Morevna Project.