Efficient Speed

Oct 6, 2009

Yesterday, I was given an interesting problem to tackle.

We were given a bunch of laptops, 8 of them to be exact, already cloned but missing almost 15 GB of important user-end data. There’s no way to re-clone all these machines, as the source image is not available to us. The only way is to copy the 15 GB of files to each machine, no two ways about it. The 15 GB of files lie on a 500 GB external USB harddisk. I have Ethernet cables and 2 Ethernet switches.

The big question is how?

Of course, copying from the harddisk onto each laptop one after another, manually, or via Sneakernet, is the favourite answer, but no. I can only call that desperate, physically constrained, or intellectually apathetic.

I’m a person who loves processes, systems, and automation. Having to copy a bunch of files serially and manually onto so many computers is unacceptable, especially when you have to rinse and repeat a whole 8 times. Suffering a little pain to get some infrastructure up, so that the copying then runs automatically and painlessly, is what I’m looking for. As the Chinese saying goes: bitter first, sweet later.

Out of ideas, I pinged a few people via SMS: “Hey, what is the most efficient way to transfer 15 GB of data onto 8 different laptops, without cloning?”

Portable harddisk / Sneakernet and Samba CIFS were the few answers that came in. Someone suggested copying from one to two, two to four, four to eight, but that’s too tedious and not scalable, equipment-wise. But what if the medium used is Ethernet?

I probed further, “multicast network solutions?”

“BitTorrent”. Bingo. Thanks to cflee for that great suggestion! That’s the term and I knew it would certainly work. I did read up on the BitTorrent protocol some time back and am quite disappointed that this didn’t occur to me earlier. He also mentioned that uTorrent provides a built-in tracker, and that there’s a handy guide available.

Spent 10 minutes reading through and successfully managed to give it a trial within my home network between 2 computers. Conceptually, a prototype has been demonstrated and there’s no way it can fail the next day.

Spent the following morning with a few co-workers digging up rarely used networking equipment and proceeded to wire up the machines. The two 4-port switch cum wireless APs were miserable: they left us with only 6 usable LAN ports. The other 2 machines had to make do with 802.11g wireless. It’ll work, just a little slower. I was hoping to complete this whole ordeal before the end of the day, i.e. 5.30 pm, and go home on time. After all, copying 15 GB from the portable harddisk onto one of the laptops had already taken a grand total of 60 minutes. If I had to do this serially, 8 laptops at 60 minutes each would take no less than 8 hours. Portable harddisks are scarce too, especially ones big enough for that much data.

I configured the DHCPd and got the whole network running nicely and proceeded to install uTorrent on all the machines (skipping the rubbish, ad-supported nonsense). That took hardly 10 minutes as Samba CIFS came into play. It’d be cool if there were an automatic install distributor but I didn’t have time for it.

Created the initial seeding torrent according to the guide and that process took almost 15 minutes. Thousands of tiny files, coupled with gigantic files, whatever you can imagine, the limits of the filesystem are being tested here.

Started the seed on the tracker, turned on ‘Initial Seeding’ while I distributed the newly created .torrent to the rest of the machines.

Changed back to standard Seeding once all the machines had entered the swarm.

Thinking about the 8 hours that I would have to take, going by the conventional advice, I grinned and went on to do other work, while giving my forecast of completion to ‘End of the Day’.

The seeding started at around 9 to 9.30 am. I drove out to buy breakfast for everyone and came back at around 10.30 am.

I took a peek at the progress and got the shock of my life.

All the wired Ethernet clients were now seeding! 100% download complete! Only the 2 miserable wireless clients were left struggling with the slow connection. I swapped those 2 onto wired ports, moving 2 of the finished computers to wireless, and saw the download speed go through the roof.

12.2 MB/s. It works out to ~100 Mbps.
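For anyone who wants to check the arithmetic, here is a rough Python sketch of the numbers in this post. It assumes 1 MB = 10^6 bytes and 1 GB = 1000 MB, which I did not measure precisely, and the function names are made up purely for illustration.

```python
def mb_per_s_to_mbps(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second (8 bits per byte)."""
    return mb_per_s * 8


def transfer_minutes(size_gb: float, mb_per_s: float) -> float:
    """Minutes needed to move size_gb at mb_per_s, assuming 1 GB = 1000 MB."""
    return size_gb * 1000 / mb_per_s / 60


def average_mb_per_s(size_gb: float, minutes: float) -> float:
    """Average rate implied by moving size_gb in the given number of minutes."""
    return size_gb * 1000 / (minutes * 60)


if __name__ == "__main__":
    # 12.2 MB/s expressed in megabits per second
    print(round(mb_per_s_to_mbps(12.2), 1))
    # Time for one laptop to pull 15 GB at that rate
    print(round(transfer_minutes(15, 12.2), 1))
    # Rate implied by the 60-minute copy from the USB harddisk
    print(round(average_mb_per_s(15, 60), 1))
```

Under those assumptions, 12.2 MB/s is about 97.6 Mbps, a single laptop needs roughly 20 minutes at that rate, and the 60-minute USB copy averaged only about 4.2 MB/s.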

Every 30 seconds or so, the download speed would dip a little and uTorrent would pop up a warning in the status bar, “Harddisk overload 100%”. Wow, a solid harddisk LED.

I’m impressed.

Darned. I thought the transfer would take the whole day, giving me time for a well deserved break, but little did I know, the transfer had completed before I even had lunch!

So, now you know. BitTorrent is extremely efficient in one-to-many, many-to-many, and many-to-one distribution tasks. As long as the overhead of installing and running uTorrent on every machine is well distributed and / or paid for, this is an extremely useful piece of software to add into any sysadmin’s arsenal.

Some other hidden benefits of BitTorrent are that it is resumable, repairable, distributed (many to many; any seeder or peer can enter or leave the swarm without much disruption or any human intervention), lightweight (300k installer), and automated (once past the initial start, it handles disconnections gracefully).
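The resumable and repairable parts come from the way BitTorrent splits the content into fixed-size pieces and records a SHA-1 hash for each piece in the .torrent file. Below is a minimal Python sketch of that idea, not uTorrent’s actual code: the function names and the tiny 4-byte piece size are assumptions for the demo (real torrents use far larger pieces).

```python
import hashlib

PIECE_SIZE = 4  # bytes; hypothetical, tiny on purpose so the demo is readable


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list:
    """SHA-1 digest of every fixed-size piece, as a .torrent would record them."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def pieces_to_fetch(local: bytes, expected: list, piece_size: int = PIECE_SIZE) -> list:
    """Indexes of pieces that are missing or corrupt in the local copy."""
    todo = []
    for index, want in enumerate(expected):
        chunk = local[index * piece_size:(index + 1) * piece_size]
        if hashlib.sha1(chunk).digest() != want:
            todo.append(index)
    return todo


if __name__ == "__main__":
    original = b"abcdefghij"          # pieces: abcd, efgh, ij
    expected = piece_hashes(original)
    damaged = b"abcdXXgh"             # piece 1 corrupted, piece 2 never arrived
    print(pieces_to_fetch(damaged, expected))
```

Only the pieces that fail the check need to be downloaded again, which is why an interrupted or damaged transfer can pick up where it left off instead of starting over.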

Really, BitTorrent has its legitimate use as above, quod erat demonstrandum (Q.E.D.).