The Fourth Generation Petabox

Behind all the cool stuff users see on archive.org is some serious hardware. I was curious about the ongoing development of data storage here at the Internet Archive, so I spent a little time with Mario, Master of the Machines, while he gave me a tour of the newest generation of our staff-designed and staff-built Petabox storage units.

Here are some of the specs he gave me for the newest version:
• each Petabox has 480 terabytes of raw storage
• each Petabox contains 240 2-terabyte disks in 4U-high rack mounts
• each computer has two 4-core Xeon processors running at 2 GHz, and 12 GB of RAM
• each machine has a pair of 1 Gbit interfaces that are bonded, so it effectively has 2 Gbit
• the rack has a switch with a 10 Gbit uplink
• the Ubuntu OS is stored on a pair of mirrored internal hard drives, separate from the data disks
• each machine has an IPMI management interface (allowing remote power cycling and remote console access)
• in all, there will be a total of 8 units (that’s about 4 million gigabytes).
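The “about 4 million gigabytes” figure follows directly from the list. Here is a quick sanity check, assuming decimal units (1 TB = 1,000 GB) and counting raw disk capacity only, with no filesystem overhead or replication. The constant names are illustrative, not anything from the Archive’s own tooling.

```python
# Back-of-the-envelope capacity figures for the fourth-generation Petabox.
# Numbers come from the spec list; decimal units assumed (1 TB = 1,000 GB).

DISKS_PER_PETABOX = 240
DISK_SIZE_TB = 2
UNITS_PLANNED = 8
LINKS_PER_MACHINE = 2
LINK_SPEED_GBIT = 1


def raw_capacity_tb(disks: int, disk_size_tb: int) -> int:
    """Raw (unformatted, unreplicated) capacity of one Petabox in terabytes."""
    return disks * disk_size_tb


def total_capacity_gb(units: int, per_unit_tb: int) -> int:
    """Raw capacity of the whole deployment in gigabytes."""
    return units * per_unit_tb * 1_000


per_unit_tb = raw_capacity_tb(DISKS_PER_PETABOX, DISK_SIZE_TB)
total_gb = total_capacity_gb(UNITS_PLANNED, per_unit_tb)
# Nominal aggregate of the bonded pair; a single flow may not see all of it.
bonded_gbit = LINKS_PER_MACHINE * LINK_SPEED_GBIT

print(f"per Petabox: {per_unit_tb} TB")               # per Petabox: 480 TB
print(f"all {UNITS_PLANNED} units: {total_gb:,} GB")  # all 8 units: 3,840,000 GB
print(f"bonded link per machine: {bonded_gbit} Gbit") # bonded link per machine: 2 Gbit
```

So 8 × 480 TB is 3.84 million GB, which rounds to the “about 4 million” quoted above.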

I like to believe that there’s a small inscription in this machine: “In memory of Alan Mathison Turing (1912 – 1954), one of the 20th Century’s greatest intellects.” Moreover, I propose that this work of art, math, and science be showcased at the University of Manchester in England to celebrate A.M. Turing’s 100th birthday on June 23, 2012. Preferably, it would be displayed at the School of Mathematics, which is housed in the Alan Turing Building; Turing worked at the University of Manchester from 1948 to 1954. The demonstration would be a great honor for this organization, and would serve as a reminder to Americans of the power of our young intellects and of what a “jobs” program can produce.

Any info on whether these newer systems are still designed and built by Capricorn Technologies, or whether they’re entirely in-house work? Capricorn’s website hasn’t been updated since the first or second generation Petaboxes came out, and there is definite interest in that kind of high-density storage.

Backblaze now has an open design that you can either a) build yourself or b) just buy for around $7,400 per 4 RU, 135 TB “node”.

Looking at the physical design on a per-rack basis: if you are using a standard 44 RU rack, reserve the top 4 RU for two 1 RU switches (facing forward) with spacers between them for cable space. That leaves room for 10 × 4 RU Backblaze boxes, which would be roughly 10 × 135 TB, or 1,350 TB; let’s call it around 1.3 PB per rack.
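The rack-density estimate is simple integer arithmetic. The sketch below reproduces it under the comment’s own assumptions (a 44 RU rack, 4 RU reserved for switches and cabling, 4 RU and 135 TB per Backblaze box); the function and constant names are hypothetical.

```python
# Rack-density estimate for Backblaze-style 4 RU storage boxes.
# All inputs are the assumptions stated in the comment, not measured values.

RACK_HEIGHT_RU = 44   # rack size assumed in the comment
RESERVED_RU = 4       # two 1 RU switches plus spacers for cable space
POD_HEIGHT_RU = 4     # height of one storage box
POD_CAPACITY_TB = 135 # raw capacity of one storage box


def pods_per_rack(rack_ru: int, reserved_ru: int, pod_ru: int) -> int:
    """How many whole boxes fit in the space left after the reserved units."""
    # Floor division: a partial slot cannot hold a box.
    return (rack_ru - reserved_ru) // pod_ru


pods = pods_per_rack(RACK_HEIGHT_RU, RESERVED_RU, POD_HEIGHT_RU)
rack_tb = pods * POD_CAPACITY_TB
rack_pb = rack_tb / 1_000  # decimal petabytes

print(pods, rack_tb, rack_pb)  # 10 1350 1.35
```

Changing `RACK_HEIGHT_RU` to 42, a common rack size, gives `(42 - 4) // 4 = 9` boxes, so the one-petabyte-plus figure depends on the taller rack.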

Surely it would simplify your build to have the third-party guys build and ship you the Backblaze units prebuilt and configured for $7,400 per unit, or, let’s say, $74,000 for 10 units (discounts most likely apply at that sort of volume). Throw in the cost of a rack, two switches, and some power and Ethernet cables, and you’ve got a petabyte for around $80,000 USD.
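The cost estimate can be checked the same way. This is a rough sketch: the roughly $6,000 for the rack, switches, and cables is an assumption inferred from the gap between $74,000 and “around $80,000”, not a quoted price, and volume discounts are ignored.

```python
# Rough cost estimate for one rack of ten Backblaze-style boxes.
# Prices are the figures quoted in the comment; EXTRAS_USD is a hypothetical
# allowance for the rack, two switches, and cabling (implied, not quoted).

POD_PRICE_USD = 7_400
PODS = 10
POD_CAPACITY_TB = 135
EXTRAS_USD = 6_000  # assumption: brings the total to "around $80,000"

pods_cost = POD_PRICE_USD * PODS
total_cost = pods_cost + EXTRAS_USD
raw_tb = PODS * POD_CAPACITY_TB
cost_per_tb = total_cost / raw_tb  # raw capacity, before any redundancy

print(f"boxes: ${pods_cost:,}")              # boxes: $74,000
print(f"total: ${total_cost:,}")             # total: $80,000
print(f"per raw TB: ${cost_per_tb:.2f}")     # per raw TB: $59.26
```

The per-terabyte figure is for raw disk; any RAID or replication scheme would raise the cost per usable terabyte.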

The Backblaze guys have a great design and are wonderful about sharing it. We talked with them when we were planning this generation.

We decided to go with our current design, which is more expensive, for a couple of reasons. We wanted a more standard case (we bent our own metal last time) and thought it would give us more flexibility and lower ongoing design costs. Also, cases built in the Bay Area had very long lead times; the US really has slipped on manufacturing. It turns out that one disadvantage of the cases we have used is that their power supply fans are noisy. Since most customers do not care, we think the manufacturer is unlikely to fix this flaw.

Another reason is that we wanted replaceable disks. This request came from the system administrators, and it has turned out to be very helpful.