Big Data, the Cloud and the Exascale Dilemma

For enterprises looking to build their own private clouds, the rule of thumb is quickly becoming: Go big or go home.

It’s been clear from the beginning that one of the chief advantages the cloud brings to enterprise data environments is scale. But even the most advanced cloud architecture in the world can only expand so far before it hits the limits of physical infrastructure. That’s why enterprises that have the resources, like General Motors, are doing their best to match the hyperscale infrastructure of Amazon, Google and other top providers. With a big enough physical footprint, they will be able to broaden scalability and flexibility without having to entrust critical data and applications to public resources.

Naturally, building data infrastructure on a grand scale is not without its challenges. One of the chief obstacles is delivering adequate power to hyperscale, or even exascale, architectures – something the NSA has apparently discovered at its new Bluffdale, Utah, facility. To the joy of civil libertarians everywhere, the plant has been experiencing unexplained electrical surges that have fried components and caused mini explosions. The situation is so bad that insiders are reporting that the center is largely unusable, and even leading experts in the facilities and data fields are at a loss to explain what is going on.

This should serve as a cautionary tale for any organization looking to push its high-performance computing (HPC) infrastructure into exascale territory. According to DLB Associates' Mark Monroe, performance gains should push upper-limit capabilities to 10^18 floating-point operations per second (1 exaFLOPS) within the next decade or so, assuming of course that we can come up with a way to trim down the nearly 1 gigawatt of power that would be needed to support a system on that scale. Low-power processors like the ARM architecture and Intel Atom are expected to come to the rescue, with current calculations estimating that 1 exaFLOPS of performance can be had for about 30 MW of power – although, right now, you would need more than 100,000 compute nodes to get there.
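A quick back-of-the-envelope calculation makes those figures concrete. This is just a sketch using the estimates quoted above (1 exaFLOPS, a 30 MW budget, 100,000 nodes) – the real per-node numbers would depend on the actual system design:

```python
# Back-of-the-envelope arithmetic for the exascale estimates above.
# All inputs are the figures quoted in the text, not measured values.

EXAFLOPS = 1e18             # 10^18 floating-point operations per second
total_power_w = 30e6        # projected power budget: 30 MW
num_nodes = 100_000         # compute nodes needed at current performance levels

flops_per_node = EXAFLOPS / num_nodes        # what each node must deliver
watts_per_node = total_power_w / num_nodes   # each node's share of the budget
efficiency = EXAFLOPS / total_power_w        # system-wide FLOPS per watt

print(f"{flops_per_node / 1e12:.0f} TFLOPS per node")   # 10 TFLOPS per node
print(f"{watts_per_node:.0f} W per node")               # 300 W per node
print(f"{efficiency / 1e9:.1f} GFLOPS per watt")        # 33.3 GFLOPS/W
```

In other words, hitting exascale inside a 30 MW envelope means every node must sustain roughly 10 TFLOPS on about 300 W – which is exactly why low-power processor designs figure so prominently in these projections.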

But power is not the only challenge on the exascale obstacle course. Storage systems, for example, are perfectly capable of achieving exascale, but at what cost? As storage consultant Marc Staimer notes, storage requirements are growing at more than 60 percent per year, but drive capacities are only growing by about 1 TB per year. This means last year’s jump from 2 to 3 TB represented 50 percent growth, while this year’s increase to 4 TB is only 33 percent. Next year, we’ll likely see 5 TB models – 25 percent growth – and so on. And solid-state technology is expanding at an even lower rate. Current SAN and NAS storage models, therefore, will only scale up to the low petabytes, after which we’ll either have to come up with something else – object storage is a leading contender – or enterprises will need to deploy multiple massive storage systems to keep up with demand. Think of it this way: Whatever your storage needs are today, at this rate they will be 1,000 times greater by 2030.
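The arithmetic behind that widening gap can be sketched in a few lines. This is a simple illustration assuming the figures quoted above (demand compounding at 60 percent per year, drive capacity adding a flat 1 TB per year starting from 2 TB); actual growth rates will vary:

```python
import math

# Compound storage-demand growth vs. linear drive-capacity growth,
# using the rates quoted above (illustrative, not measured).
demand_growth = 1.60   # demand grows ~60% per year, compounding
capacity_tb = 2.0      # drive capacity starts at 2 TB
demand = 1.0           # demand normalized to 1x today

for year in range(1, 6):
    demand *= demand_growth
    capacity_tb += 1.0  # linear: +1 TB per year
    pct = 100.0 / (capacity_tb - 1.0)  # that year's jump as a percentage
    print(f"year {year}: demand {demand:.2f}x, "
          f"drive {capacity_tb:.0f} TB (+{pct:.0f}% over last year)")

# How long until compounding demand is 1,000x today's?
years_to_1000x = math.log(1000) / math.log(demand_growth)  # ~14.7 years
print(f"demand reaches 1,000x in about {years_to_1000x:.1f} years")
```

The loop reproduces the shrinking percentage gains described above (2 to 3 TB is +50 percent, 3 to 4 TB is +33 percent, 4 to 5 TB is +25 percent), while compounding demand reaches 1,000x in roughly 15 years – which is the mismatch driving the search for alternatives like object storage.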

Of course, all of this should only concern you if you plan to keep your entire computing environment in-house. The public cloud will most certainly take much of the pressure off, provided the front office can get over its fear of trusting someone else with its data. And cloud providers themselves will have an easier time justifying tera-, peta-, exa- or whatever scale is necessary, because infrastructure is their profit center, not a cost center as it is in most businesses.

The fact remains that all the world’s data will have to reside somewhere, and if we hope to accommodate worldwide demand cheaply and efficiently, then the groups that take on the task of building and maintaining physical infrastructure will have to start thinking big – real big.
