Download All The Things, Round II

Those of you who have been reading this blog for a while may recall Download All The Things!, where I investigated the feasibility of downloading the entire Internet (lolcats included, of course). I’ve decided to revisit this, but with one small (or not so small) difference: change our estimation of the size of the internet.

For Round II, I’m going with one yottabyte (or “yobibyte” to keep the SI religious happy). This is a massive amount of data: 1024 to the power 8 (or 2 to the power 80) bytes (and no, I’m not typing the full figure out on account of word-wrapping weirdness); it’s just short of 70,000 times the size of our previous estimate. To give a more layman-friendly example: you know those 1 terrabyte external hard drives that you can pick up at reasonable prices from just about any computer store these days? Well, one yottabyte is equivalent to one trillion said drives. A yottabyte is so large that, as yet, no-one has yet coined a term for the next order of magnitude. (Suggestion for those wanting to do so: please go all Calvin and Hobbes on us and call 1024 yottabytes a “gazillabyte”!)

There’s two reasons why I wanted to do this:

Since writing the original post, I’ve long suspected that my initial estimate of 15 EB, later revised to 50 EB, may have been way, way too small.

In March 2012, it was reported that the NSA was planning on constructing a facility in Utah capable of storing/processing data in the yottabyte range. Since Edward Snowden’s revelations regarding NSA shenanigans, it’s a good figure to investigate for purposes of tin foil hat purchases.

Needless to say, changing the estimated size of the internet has a massive effect on the results.

You’re not going to download 1 YB via conventional means. Not via ADSL, not via WACS, not via the combined capacity of every undersea cable. (It will take you several hundred thousand years to download 1 YB via the full 5.12 Tbps design capacity of WACS.) This means that, this time around, we’re going to have to go with something far more exotic.

What would work is Internet Protocol over Avian Carriers — and yes, this is exactly what you think it is. However, the avian carriers described in RFC 1149 won’t quite cut it out, so we’ll need to submit a new RFC which includes a physical server in the definition of “data packet” and a Boeing 747 freighter in the definition of “avian carrier”. While this is getting debated and approved by the IETF, and we sort out the logistical requirements around said freighter fleet, we can get going on constructing a data centre for the entire internet.

As for the data centre requirements, we can use the NSA’s Utah DC for a baseline once more. The blueprints for the data centre indicate that around 100,000 square feet of the facility will be for housing the data, with the remainder being used for cooling, power, and making sure that us mere mortals can’t get our prying eyes on the prying eyes. Problem is, once the blueprints were revealed/leaked/whatever, we realised that such a data centre would likely only be able to hold a volume of data in the exabyte range.

How far off were the estimates that we were fed before? Taking an unkind view of the yottabyte idea, let’s presume that it was the implication that the center could hold the lowest number of yottabytes possible to be plural: 2. The smaller, and likely most reasonable, claim of 3 exabytes of storage at the center is directly comparable.

Now, let’s dig into the math a bit and see just how far off early estimates were. Stacked side by side, it would take 666,666 3-exabyte units of storage to equal 2 yottabytes. That’s because a yottabyte is 1,000 zettabytes, each of which contain 1,000 exabytes. So, a yottabyte is 1 million exabytes. The ratio of 2:3 in our example of yottabytes and exabytes is applied, and we wrap with a 666,666:1 ratio.

I highlight that fact, as the idea that the Utah data center might hold yottabytes has been bandied about as if it was logical. It’s not, given the space available for servers and the like.

Yup, we’re going to need to build a whole lot of data centres. I vote for building them up in Upington, because (1) there’s practically nothing there, and (2) the place conveniently has a 747-capable runway. Power is going to be an issue though: each data centre is estimated to use 65 MW of power. Multiply this by 666,666, and… yeah, this is going to be a bit of a problem. Just short of 44 terawatts are required here, and when one considers that xkcd’s indestructible hair dryer was “impossibly” consuming more power than every other electrical device on the planet combined when it hit 18.7 TW, we’re going to have to think outside of the box. (Pun intended for those who have read the indestructible hair dryer article.)

Or not… because this means that our estimate of one yottabyte being the size of the internet is way too high. So, we can do this in phases: build 10,000-50,000 or so data centres, fill them up, power them up, then rinse and repeat until we’ve got the entire Internet. You’ll have to have every construction crew in the world working around the clock to build the data centres and power stations, every electrical engineer in the world working on re-routing power from elsewhere in the world — especially when one considers that, due to advancements in technology (sometimes, Moore’s Law is not in our favour), the size of the Internet will be increasing all the time while we’re doing this. But it might just about be possible.