Amazon’s HPC Cloud: Supercomputing for the 99 Percent

The Amazon Elastic Compute Cloud is becoming increasingly popular for high-performance computing. It’s now capable of running many of the applications that previously required building out a large HPC cluster or renting time from a supercomputing center. But as you might expect, Amazon EC2 can’t do everything a traditional supercomputer can.

Scientific applications that require super-fast interconnects are still the reserve of the IBM Blue Genes and Cray supercomputers of the world. However, Amazon’s adoption of 10 Gigabit Ethernet connections in its high-end cluster compute instances has expanded the usefulness of EC2.

As we’ve reported before, clusters of 30,000 cores, and even 50,000 cores, have been run on Amazon’s cloud for real-world scientific applications. A caveat is that these tend to be “embarrassingly parallel” applications, in which the interconnect speed isn’t important because each calculation runs independently of all the others.

Even if you’re using Amazon’s 10 Gigabit Ethernet connection, you may still be using servers that are literally half the world away from each other. For Amazon customer Schrödinger, which makes simulation software for use in pharmaceutical and biotechnology research, the distance didn’t matter on that aforementioned 50,000-core cluster. Schrödinger pulled Amazon resources from four continents and all seven of Amazon’s data center regions to make the cluster as big as possible.

But the Amazon approach can slow down applications that do require a lot of communication between servers, even at small scales. Schrödinger President Ramy Farid tells us that in one case his firm ran a job on two 8-core Amazon instances with terrible results.

“Certain types of parallel applications do not yet seem to be appropriate to run on the Amazon cloud,” Farid said. “We have successfully run parallel jobs on their eight-core boxes, but when we tried anything more than that, we got terrible performance. In fact, in one case, a job that ran on 16 cores took more wall clock time than the same job that was run on eight cores.”

Farid was using Amazon’s eight-core server instances, so running a job on 16 cores simultaneously required two eight-core machines, he explains. (Amazon does offer a 16-core cluster compute instance, however.) “Slow interconnect speeds between the two separate machines does become a serious issue,” he said. “Those two eight-core machines might not even be in the same location.”
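A toy model can illustrate how adding cores makes a job slower. The sketch below assumes wall-clock time is the compute time divided evenly across cores plus a fixed latency cost per message exchange. The 60-second workload, the 100,000 exchanges, and the 1-microsecond same-machine latency are hypothetical figures chosen for illustration, not Schrödinger’s data; the 100-microsecond cross-machine latency mirrors the EC2 measurement R Systems reports later in this article.

```python
def wall_clock_seconds(cores, cores_per_machine=8,
                       compute_seconds=60.0, exchanges=100_000,
                       local_latency_us=1.0, network_latency_us=100.0):
    """Toy estimate: evenly divided compute plus one latency hit per exchange.

    All default values are illustrative assumptions, not measured data.
    """
    # Once the job no longer fits on one machine, exchanges cross the network.
    spans_machines = cores > cores_per_machine
    latency_us = network_latency_us if spans_machines else local_latency_us
    compute = compute_seconds / cores
    communication = exchanges * latency_us / 1_000_000  # microseconds -> seconds
    return compute + communication


for cores in (8, 16):
    print(f"{cores} cores: {wall_clock_seconds(cores):.2f} s")
# 8 cores: 7.60 s
# 16 cores: 13.75 s
```

In this simplified picture, doubling the cores halves the compute time but multiplies the communication cost a hundredfold, so the 16-core run loses. Real applications overlap communication with computation and behave less neatly.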

Cycle Computing CEO Jason Stowe, who built the 50,000-core Amazon cluster for Schrödinger, notes that the Linpack benchmark used to measure the world’s fastest supercomputers is “geared toward the 1 percent” of HPC applications that aren’t really suitable for a general purpose service like Amazon’s.

“I would say the 1 percent that need the uber-awesome supercomputers might not run on Amazon infrastructure,” Stowe said. “If you need something that has a crazy interconnect, that’s not going to run on a 10 Gigabit Ethernet interconnect, necessarily. It’ll run, but it’ll be slower.”

Amazon Cracks Top 50, but Lags InfiniBand Clusters

Amazon, Stowe said, has definitely geared its infrastructure toward the proverbial “99 percent.” Still, Amazon is trying to get closer and closer to the 1 percent with its cluster compute instances, which can run on Intel chips alone or in combination with NVIDIA GPUs.

Here’s a look at Amazon’s specs for a so-called Cluster Compute Eight Extra Large Instance:

60.5GB memory

Two Intel Xeon E5-2670 Sandy Bridge processors, eight cores each

3,370GB storage

10 Gigabit Ethernet interconnect

The Cluster Compute “Quadruple Extra Large” instance uses two quad-core Nehalem processors instead and comes with about half as much memory and storage. A Cluster GPU instance also uses two quad-core Nehalem processors, but with the extra boost from two NVIDIA Tesla M2050 GPUs. All three of Amazon’s cluster instances use 10 Gigabit Ethernet, as opposed to Gigabit Ethernet for standard EC2 instances.

In the most recent list of the Top 500 fastest supercomputers in the world, the highest-end clusters primarily used either InfiniBand or one of numerous custom or proprietary interconnects purpose-built for high-performance computing. Ethernet actually accounts for 224 of the Top 500 systems, with 210 running at 1 Gigabit per second rather than 10 Gigabit. That puts Ethernet just ahead of InfiniBand overall, but InfiniBand dominates at the top, with two of the five fastest systems and five of the 10 fastest.

The highest-ranking cluster that uses a 10 Gigabit Ethernet connection is actually one built by Amazon on its own cloud for the purpose of demonstrating its power. Amazon’s 17,000-core cluster with Intel Xeon processors hit speeds of 240 teraflops for a rank of 42nd worldwide. Amazon ran the entire cluster in a single data center region, which would certainly speed things up compared to customer-created clusters that pull from Amazon data centers on multiple continents to ensure capacity needs are met.

In the 50,000-core cluster, Stowe notes that he was able to get about 40,000 cores in Amazon’s US-East data center region alone, but pulled from Amazon resources far and wide to meet the processing needs of the application.

It’s no surprise that Amazon EC2 doesn’t quite match the performance of specially designed supercomputing clusters when it comes to the most complex scientific applications, says Adam DeConinck, an HPC systems engineer at R Systems. The company operates more than 10,000 cores’ worth of compute capacity for HPC customers.

In the clusters R Systems builds for customers, DeConinck generally favors InfiniBand for its high bandwidth and low latency, but also uses Ethernet. InfiniBand is “extremely well optimized for running scientific code and very high-speed storage,” he said.

While Amazon does offer cluster compute instances specifically tailored to HPC, HPC is not its bread and butter.

Unlike Amazon EC2, “We don’t do Web hosting. We don’t do tons of MySQL database stuff,” DeConinck notes. “Everything we do is HPC. It would take as much adjustment for us to serve an EC2 sort of model as it probably would for them to come up with a dedicated HPC option.”

Interconnect speed and I/O performance tend to be limiting factors on the Amazon cloud, problems that can be solved with InfiniBand, he said. Amazon offers something companies like R Systems and supercomputing centers don’t—instant access with a credit card and a Web browser. But that convenience isn’t always worth it.

“EC2 has done well for embarrassingly parallel applications because it makes it easy to access a huge pile of compute cores, so processes can work totally independently on small parts of a larger problem,” DeConinck explains. “The interconnect speed doesn’t matter very much, because in most cases they only really use the network at the beginning and end of a job when they either pull down data or report back.”
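The pattern DeConinck describes can be sketched in a few lines of Python. This is a minimal illustration, not anyone’s production setup: `score_chunk` and `run_independent_jobs` are hypothetical names, the squared-sum workload is a placeholder, and worker threads stand in for what would be separate cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor


def score_chunk(chunk):
    """Hypothetical independent task: it needs only its own slice of input."""
    return sum(x * x for x in chunk)


def run_independent_jobs(chunks, max_workers=4):
    # Inputs are handed out once and results collected once; the workers
    # never exchange messages with each other while they compute.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_chunk, chunks))


chunks = [range(0, 1000), range(1000, 2000), range(2000, 3000)]
partial_results = run_independent_jobs(chunks)
total = sum(partial_results)
print(total)
```

Because no worker waits on another, the network matters only when inputs are distributed and results are gathered, which is why this class of job tolerates slow or distant links.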

However, many applications require a lot of inter-process communication using the message-passing interface, or MPI. “Computational fluid dynamics is a good example of a domain with this kind of problem — every process in a given job is communicating with a bunch of other processes as they simulate specific points in something like the airflow around a wind turbine,” DeConinck says. “Often they’re so dependent on inter-process messages that the network latency is the major limiting factor in performance. In other cases, all the processes need access to a huge shared datastore, sometimes on the order of terabytes, so that bandwidth is the limiter.”

Tests run by R Systems using an MPI benchmark from Ohio State showed latencies in the passing of small messages (up to 4KB) to be between 1.4 and 6 microseconds in an InfiniBand-based cluster. By comparison, Amazon’s 10 Gigabit Ethernet connections produced latencies of 100 to 111 microseconds. For passing of larger messages (4MB), bandwidth hit 3,031 megabytes per second with InfiniBand, and only 484 megabytes per second on Amazon.
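To see what those numbers imply for a single message, the sketch below plugs them into a common first-order approximation: transfer time equals a fixed latency plus message size divided by bandwidth. Several things here are assumptions rather than part of the R Systems tests: the model itself, the use of the worst-case latency from each quoted range, the treatment of a megabyte as 10^6 bytes, and the reuse of the large-message bandwidth figure for small messages.

```python
def transfer_time_us(message_bytes, latency_us, bandwidth_mb_per_s):
    """Estimated one-way time in microseconds: fixed latency plus size / bandwidth."""
    # With 1 MB taken as 10**6 bytes, 1 MB/s equals 1 byte per microsecond.
    return latency_us + message_bytes / bandwidth_mb_per_s


# Figures quoted from the R Systems tests (upper end of each latency range).
INFINIBAND = {"latency_us": 6.0, "bandwidth_mb_per_s": 3031.0}
EC2_10GBE = {"latency_us": 111.0, "bandwidth_mb_per_s": 484.0}

for size in (4_000, 4_000_000):  # roughly 4KB and 4MB
    ib = transfer_time_us(size, **INFINIBAND)
    ec2 = transfer_time_us(size, **EC2_10GBE)
    print(f"{size} bytes: InfiniBand {ib:.1f} us, EC2 {ec2:.1f} us, "
          f"ratio {ec2 / ib:.1f}x")
```

Under these assumptions, the EC2 link comes out roughly 16 times slower for the small message, where latency dominates, and roughly 6 times slower for the large one, where bandwidth dominates.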

DeConinck cautions that these tests were run without any network tuning for either cluster, but says they provide a pretty good look at the basic performance of the hardware.

InfiniBand on Amazon? Not Just Yet

Amazon hosted a conference titled “Big Data & HPC in the Cloud” last week in Boston, and at the show I caught up with Deepak Singh, Amazon’s principal product manager for EC2. He was reluctant to say whether Amazon is looking into adding InfiniBand support.

“We’re interested in figuring out from our customers what they want to run, and then deliver those capabilities to them,” Singh said.

Amazon has worked on optimizing its 10 Gigabit Ethernet connections, and that work showed in the Top 500 supercomputer run, he said. As for that 50,000-core cluster, Singh noted that getting 50,000 cores from a traditional supercomputing center is difficult, and that in many networks such a cluster could overwhelm the jobs being run by other customers.

Singh noted that Amazon can provide almost a teraflop of performance from a single compute node with the help of GPUs. But he acknowledged Amazon isn’t ready to replace a traditional supercomputer in every single instance.

“There are certain specialized applications that require very specialized hardware,” he said. “It’s like one person running it in some secret national laboratory.”

But tailoring cloud computing to meet the needs of the 1 percent isn’t the Amazon way. And for most customers, it doesn’t really matter.