Amazon’s cloud is the world’s 42nd fastest supercomputer

Amazon used its EC2 service to build one of the world's fastest HPC clusters

The list of the world's 500 fastest supercomputers came out yesterday with a top 10 that was unchanged from the previous ranking issued in June. But further down the list, a familiar name is making a charge: Amazon, with its Elastic Compute Cloud service, built a 17,024-core, 240-teraflop cluster that now ranks as the 42nd fastest supercomputer in the world.

Amazon previously built a 7,040-core, 41.8-teraflop cloud cluster that hit number 233 on the list, then fell to 451st. But Amazon submitted an updated Linpack benchmark test after adding a new type of high-performance computing instance known as "Cluster Compute Eight Extra Large," each of which has two Intel Xeon processors (16 cores total), 60GB of RAM, and 3.37TB of storage. The full cluster on the Top 500 list is Linux-based, with 17,024 cores, 66,000GB of memory, and a 10 Gigabit Ethernet interconnect.
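
For the curious, here's a quick back-of-the-envelope check on those figures; the per-core peak is an assumption (roughly 2.6 GHz with 8 double-precision FLOPs per cycle for this Xeon generation), since the article doesn't state a clock speed:

```python
# Back-of-the-envelope check on the cluster figures quoted above.
# Core count, Linpack score, and per-instance specs are from the article;
# the 2.6 GHz clock and 8 FLOPs/cycle peak are assumptions.
cores = 17024
linpack_tflops = 240.0

instances = cores // 16                          # 16 cores per instance
gflops_per_core = linpack_tflops * 1000 / cores  # ~14.1 GFLOPs/core sustained
peak_gflops_per_core = 2.6 * 8                   # assumed peak: 20.8 GFLOPs/core

print(f"instances: {instances}")                                             # -> 1064
print(f"Linpack efficiency: {gflops_per_core / peak_gflops_per_core:.0%}")   # -> ~68%
print(f"K computer vs EC2: {10510 / linpack_tflops:.0f}x faster")            # -> ~44x
```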

Despite Amazon EC2's advances in high-performance computing, it's still a long way from topping the world's fastest computers. In the most recent list, Japan's K Computer hit 10.51 petaflops, or more than 10 quadrillion calculations per second, to maintain its position as the fastest in the world; that's more than 40 times faster than the Amazon EC2 cluster. Amazon's achievement is less about raw power than about making supercomputing capacity available to the masses, an hour at a time.

I know this is off on a wild tangent, but it puts into perspective how fast the industry continues to move. The idea that you can order time on a supercomputer at relatively affordable prices is just the beginning; it isn't hard to imagine that within the next decade this sort of power will, to some degree, be commonly accessible. Amazon's 7,040-core (41.8 TFLOPs) cluster is impressive (Top 500), and while a GPGPU isn't going to handle the same workloads, we are already seeing single-chip GPUs above 2 TFLOPs (SP), and the forthcoming 28nm GPUs may see multi-SLI/CrossFire setups (e.g. SLI on a single board plus multiple boards in a single case) knocking at the door of 10 TFLOPs (and anywhere from 2-5 TFLOPs DP). Again, the workload capabilities are different, but the idea that the common user within the next decade could have, essentially, very low-end 2010 supercomputer performance (for some workloads) in a local system, not just in the cloud, seems not just possible but likely.
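
To make the arithmetic behind that GPU speculation explicit (all per-card figures below are my own illustrative assumptions, not measurements):

```python
# Rough arithmetic for the multi-GPU scenario sketched above.
# Per-card throughput and the DP ratio are illustrative assumptions.
sp_tflops_per_gpu = 2.6   # assumed SP peak for a near-future flagship GPU
dp_ratio = 0.25           # assumed double-precision rate at 1/4 of SP
gpus = 4                  # e.g. two dual-GPU boards in one case

sp_total = gpus * sp_tflops_per_gpu
print(f"{sp_total:.1f} TFLOPs SP, {sp_total * dp_ratio:.1f} TFLOPs DP")
# -> 10.4 TFLOPs SP, 2.6 TFLOPs DP: in the ballpark of the 10 TFLOPs figure
```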

So maybe I'm just lazy and didn't RTFM or RTF[appropriate initial(s) for documentation], but is there a test suite you can run on your code before you buy hourly time on EC2 to make sure you don't spend $1000/hr to debug your code?

This is truly amazing, but it'd be nice to know if I could maximize the money I spend. I'd even understand if you had to buy the testing suite since it takes developer time to put together and maintain (though I don't know how well I could stomach an hourly fee for use of the test suite...).

Yes, there isn't "one instance" you pay for; going by the article's numbers it's roughly 1,064 of the new HPC instances (17,024 cores at 16 cores per instance).

Test on 5 small instances.

Then test on 5 HPC instances.

Then test on 50 HPC instances.

Then test on 300 HPC instances.

The first test would cost you maybe a couple of bucks; the more instances you spin up, and the bigger the instances, the more you spend (see the rough cost sketch below).

One of the best things about EC2 is how easy it is to test different things - there's no real commitment or cost associated with spinning up new instances.
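
A rough sketch of what that staged plan might cost, assuming on-demand rates in the neighborhood of the 2011 price list ($0.085/hr for a small instance and $2.40/hr for a cc2.8xlarge; check the current price list before relying on these):

```python
# Illustrative cost of the staged test plan above, one hour per stage.
# Hourly rates are assumptions based on roughly 2011 on-demand pricing.
SMALL_RATE = 0.085  # assumed $/hr, Standard Small
HPC_RATE = 2.40     # assumed $/hr, Cluster Compute Eight Extra Large

stages = [("5 small", 5, SMALL_RATE),
          ("5 HPC", 5, HPC_RATE),
          ("50 HPC", 50, HPC_RATE),
          ("300 HPC", 300, HPC_RATE)]

for label, count, rate in stages:
    print(f"{label}: ${count * rate:,.2f}/hr")
# -> under a buck, then $12, $120, and $720 per hour: debug at the cheap end
```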

I think most would start with a much smaller cluster and scale it up, hopefully finding and fixing issues along the way. It would seem to me there would still be a large margin of error between just simulating your code and actually scaling it to tens of thousands of cores...

Well, it's not a lot of money to the multi-billion dollar outfits that'd use it.

You're also making the assumption that any company renting the time on the system would be hyper-optimizing for that exact platform, rather than just running more generic small work units at a time. The cost associated with paying a team to optimize the code for the platform would be greater than just writing it generically and relying on the pure speed boost to get it all done. There are more than enough generic techniques to use without having to resort to that level of optimization.

No offense, seeing as you enjoy the subject matter, but when I see 42nd fastest I have the uncontrollable urge to start snoring. I'm sure the tech is all fancy, but I imagine the top 10's tech is a bit fancier, which is why they're the top 10.

I've occasionally wondered, with Amazon's HPC-specific moves in web services, what sort of interconnect they provide.

Edit: I managed to miss the mention of 10 Gb Ethernet. While adequate for some applications, this would seem like a fairly large barrier to Amazon moving significantly further up the Top 500, would it not? Only last night I was reading over on The Register about the blazing new interconnects today's most powerful supercomputers are using.
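
As a rough illustration of why the interconnect matters, here's a toy latency-plus-bandwidth model for a single message; the latency and bandwidth figures are assumed ballpark values, not benchmarks:

```python
# Toy model: time to move one message = latency + size / bandwidth.
# The latency and bandwidth figures are rough assumptions.
def transfer_us(nbytes, latency_us, gbits_per_s):
    bits_per_us = gbits_per_s * 1e3  # Gb/s -> bits per microsecond
    return latency_us + nbytes * 8 / bits_per_us

msg = 8 * 1024  # an 8 KB halo-exchange message, typical of tightly coupled codes
for name, lat, bw in [("10GbE", 50.0, 10.0), ("QDR InfiniBand", 1.5, 32.0)]:
    print(f"{name}: {transfer_us(msg, lat, bw):.1f} us")
# -> ~56.6 us vs ~3.5 us: small messages are dominated by latency,
#    which is where commodity Ethernet loses to a dedicated interconnect
```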

Request: a writeup on what some of the top computers in the world are doing all day, every day. Obviously Amazon is offering cloud hosting services to many businesses, and I imagine running its own as well, but what does a pharmaceutical company do with that power 24/7? Governments? IBM (Watson)? Are these machines ever idle, or are they constantly producing something -- and what is that something?

Quote:

"You're also making the assumption that any company renting the time on the system would be hyper-optimizing for that exact platform, rather than just running more generic small work units at a time."

No, I'm not making that assumption. I was assuming actively developed research-level code, which is often full of bugs.

I'm also asking questions from the perspective of a smaller client than a pharmaceutical company. What if I happened to have $10,000 I could spend on some simulations and thought it was worth it? The other commenters answered those questions, though: just scale it up, and allow overhead for debugging on small sets of instances.

Quote:

"No offense, seeing as you enjoy the subject matter, but when I see 42nd fastest I have the uncontrollable urge to start snoring."

It's an interesting story because Amazon's core business isn't even building supercomputers. It would be like Walmart launching its own space program: it wouldn't be the biggest or the best, but it's still interesting because it demonstrates the rapid commoditization of these services.

Are the prices from before or after the flooding in Thailand? That disk space just got a lot more expensive!

Probably from before, but I haven't heard anything about Amazon raising its prices in response to the flooding. They've got plenty of disk in place to begin with, though perhaps it will eventually have an effect.

It will be interesting to see how the price per hour for this kind of computing will come down with time.

Also, I would love to see a list of the top supercomputers built and run purely for profit. I expect that if you exclude government, military, and university systems, Amazon's cloud gets bumped way up the list.

I didn't exactly do a detailed filtering, but according to top500.org, if you sort by segment using "industry" as the criterion, thereby excluding government, classified, academia, and research (non-industry from what I can see), Amazon Cloud comes in third. Number one (12th on the whole list) is located at the University of Stuttgart and is used by a number of companies for multiple industrial projects, and thus qualifies as industry. Because it is shared by multiple parties, I find #2 (40th) more interesting. I really want to know what the heck Airbus is doing all day with the most powerful single-company-owned (and single-company-used, I assume) supercomputer in the world.
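
For anybody who wants to repeat that filtering, a minimal sketch, assuming you've exported the list to a CSV with Rank, Site, and Segment columns (the file name and column names here are hypothetical; adjust them to the actual export):

```python
# Minimal sketch of filtering the Top500 list by segment.
# "top500.csv" and its column names are hypothetical placeholders.
import csv

with open("top500.csv", newline="") as f:
    industry = [row for row in csv.DictReader(f)
                if row["Segment"] == "Industry"]

for rank, row in enumerate(industry[:5], start=1):
    print(f"industry #{rank} (overall #{row['Rank']}): {row['Site']}")
```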

I really want to know what the heck Airbus is doing all day with the most powerful single-company-owned (and single-company-used, I assume) supercomputer in the world.

FEM and CFD modeling for new airliners, I'd presume, but I wouldn't be surprised if they rent out time to industry partners, especially those also under the EADS ownership umbrella, such as Eurofighter, Eurocopter, Astrium (Ariane rockets), etc.

Yes, I know, but do they use it all day, every day, for that? I can't imagine they have all that power and leave it idle. If they really do use it all day long for that, then it's awesome and I would like an article on it.

Quote:

Bad Monkey! wrote: FEM and CFD modeling for new airliners, I'd presume, but I wouldn't be surprised if they rent out time to industry partners, especially those also under the EADS ownership umbrella, such as Eurofighter, Eurocopter, Astrium (Ariane rockets), etc.

That makes sense; I guess I assumed wrong on the single-company-used part. I still find it impressive that Airbus has that computing power. On a quick scan of the industry computers on the list, I didn't find what Boeing or Lockheed Martin are using. Probably something without the company name on it, but I don't know. Interestingly, on that same quick scan I noticed there's something at #133 identified only as "Gaming Company," with no further info. Anybody know who that is?

First, they were hyped about the 70TB of space they used in total; granted, that's still a lot, but it's obviously small potatoes now. Second, I always wanted a follow-up done on that article to find out about failure rates and whether they kept expanding.

"Amazon, with its Elastic Compute Cloud service, built a 17,024-core, 240-teraflop cluster".

Presumably, though, they didn't take the /whole/ of EC2 to build the cluster. After all, what did all their paying customers do while they were running the benchmark?

To put it another way: Amazon's revenues are roughly $3.0e10 per year, or well over $3 million an hour (!!). If the whole of EC2, run flat out, makes them just a few thousand dollars an hour, I'm surprised they find it a worthwhile line of business.
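
Checking that arithmetic:

```python
# The revenue-per-hour arithmetic from the comment above.
annual_revenue = 3.0e10                   # ~$30B/year, as stated
hours_per_year = 365 * 24
print(f"${annual_revenue / hours_per_year:,.0f}/hour")  # -> ~$3,424,658/hour
```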

That makes sense; I guess I assumed wrong on the single-company-used part. I still find it impressive that Airbus has that computing power. On a quick scan of the industry computers on the list, I didn't find what Boeing or Lockheed Martin are using.

They probably buy time on .edu, .gov, and .mil machines, like the numerous DOE machines, NASA's Pleiades and Columbia supercomputers, or the USAF supercomputer at #30. Supercomputing time is ridiculously plentiful in the US, especially if you're a top-tier defense contractor.

Many, if not most, academic, research, and public government supercomputers are actually available to rent by anybody who wants to use them. They're often built with the express idea of supporting industry that can't build such capability itself, so excluding them from a list of commercially available systems is misleading. AFAIK, the K Computer, for instance, will be available to industry once it is fully operational.