Tuesday, March 25, 2008

Integrate Bittorrent into Apt-get

This was just an idle wondering to see if such a system was already implemented in some deep bowel of the internets.

Background

Bittorrent is a peer-to-peer file sharing system with the advantage that what gets downloaded is still controlled centrally by whoever publishes the file. You download a special tiny file from the web site you're visiting, load it in your Bittorrent client, and it downloads the big file from other users. The tiny file contains fingerprints (hashes) of the pieces of the large file, so your client can check that what it received is the correct file.

Apt-get is a program that manages all the other software installed on your computer. It's like the Apple software manager, except instead of it only being for iTunes and Safari, it's for every single piece of software running on the computer, from the web browser all the way down to the bottom of the operating system. It's really nice because it means I KNOW my computer is up to date, because if there was a newer version of any program on here, Apt-get would know about it, download it from Ubuntu's servers, and install it for me.

So I searched Google for it. Not surprisingly, Ubuntu Brainstorm was the second result, so I started to read through it (here and here) to see what people had to say, and was a little dismayed by what I saw.

There is a fundamental lack of understanding of how the two technologies work.

Strong wrote: I think this is a bad idea because of security reasons. I want my updates coming from a trusted source and not from 'some' other user. Of course Torrents have a checksum but it's theoretically possible to pad a virus with bogus data so the checksum will match again.

This assumes that the hash being used is weak. If a peer-to-peer model were used for security updates, a strong hash (e.g. SHA-512) would be needed. I can't find any information on it right away, but I can't believe a protocol as robust as Bittorrent doesn't allow for drop-in replacement hashes. (Edit: Looks like Bittorrent uses SHA-1, but that doesn't matter anyway. Apt uses public-key cryptography to verify packages, so the injection concerns are moot: the package will be verified before it is installed.)
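To make the checksum argument concrete, here is a rough sketch of the kind of per-piece check a Bittorrent client performs (the original protocol stores a SHA-1 digest for every piece in the torrent file). The function names and the piece size are my own illustration, not taken from any real client, and this is separate from the signature check Apt does afterwards.

```python
import hashlib

# Assumed piece size for illustration; a real torrent declares its own.
PIECE_LENGTH = 256 * 1024


def piece_hashes(data, piece_length=PIECE_LENGTH):
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify(data, expected, piece_length=PIECE_LENGTH):
    """Return True only if every piece of data matches its expected digest."""
    return piece_hashes(data, piece_length) == expected
```

A peer that changes even one byte fails the check for the piece containing it, so that piece is thrown away and fetched again from someone else.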

Eldmannen wrote: Also BitTorrent is slow, you must seed, and wait until the speed goes up, because initially its slow.

This happens when a torrent is short on seeds. When a seed gets more requests than it can serve, it favors the peers that are most valuable to the network, so newer peers, which have little to share yet, are given less bandwidth. But this wouldn't be a standard Bittorrent arrangement. There should be a Bittorrent client running on Ubuntu's mirrors, so the Bittorrent download is at least as fast as the one over http, since both are served from the same machine. If a new peer is getting shunned by the Ubuntu servers, that means the Ubuntu servers are overloaded. In that case either there are plenty of other peers to download from, or the http download would have been slow anyway, so nothing is lost.

Clemdup wrote: I wonder if p2p is suitable for small files' transfers, locating peers and enqueuing may be huge for 30kb language files...

If I were programming this, one of the first features would be a lower bound on file size. Many apt packages are token packages that only depend on others and are just a few kB (e.g. ubuntu-desktop). Between fetching the torrent file and negotiating with peers, much more bandwidth would be used than by just serving the file directly.
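The lower bound could be as simple as the sketch below. The 1 MiB cutoff and the function name are made up for illustration; a real implementation would want to pick the threshold from measurements.

```python
# Hypothetical cutoff: packages smaller than this are fetched over plain http.
MIN_TORRENT_SIZE = 1024 * 1024  # 1 MiB, an arbitrary illustrative value


def choose_transport(size_bytes, min_size=MIN_TORRENT_SIZE):
    """Pick a download method based on the package size in bytes."""
    return "bittorrent" if size_bytes >= min_size else "http"
```

With this cutoff a 30 kB language file goes over http, while a 50 MB package would use the swarm.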

rgries wrote: Here on my college campus using bit torrent is against the acceptable use policy so if ubuntu switched to bittorrent for apt updates I would not be able to use ubuntu. Please Don't ruin ubuntu with bittorrent because if it is implemented it would leave many users in the dark as far as updates go.

Apt is designed with many different sources in mind. For example, I have mine set up to check Ubuntu for updates to everything, then check winehq.com for updates to wine, then check another half dozen websites for updates to their software packages. Ubuntu wouldn't scrap all their http mirrors in favor of Bittorrent; it would just be one more source in the list. If Bittorrent doesn't work for you, uncheck that line.

Remmy wrote: Not everyone's storage space is equal which could lead to serious problems if the disk should become too full.

Apt already saves everything it downloads to your hard drive. Look in /var/cache/apt/archives/ and you'll see the packages that have been installed or updated through apt. The Bittorrent client would seed from these files, so no extra hard drive space would be used. There are ways to clean out this cache (apt-get clean), and once the files are gone, the seeding would stop.
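As a sketch of the idea, a seeding client would only need to look at what is already sitting in the cache. The function name is hypothetical; the default path is the cache directory mentioned above.

```python
from pathlib import Path


def seedable_packages(cache_dir="/var/cache/apt/archives"):
    """Return the .deb files currently in the apt cache, sorted by path.

    Returns an empty list if the directory does not exist, which is also
    what a seeder would see for the files removed by `apt-get clean`.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return []
    # Only top-level .deb files; partial downloads live in a subdirectory.
    return sorted(p for p in root.glob("*.deb") if p.is_file())
```

Nothing is copied anywhere, so the only disk space involved is what Apt was using already.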

So all in all, people don't fully understand the technologies being mixed together, and while I agree Bittorrent proper isn't ideal for the workload generated by apt, it's very close and could probably be adapted to work well. The really sad part is that if you scroll down to the very bottom of the comment threads, someone mentions DebTorrent, which is what they're all asking for in the first place. Now Ubuntu just needs to install it by default, and it'll take off. Edit: It is available in Hardy; I'll need to look into how easy it is to get running on a new install.