Project Hyperion: A super testbed for HPC apps and hardware

SC08 When it comes to parallel supercomputing, and indeed any kind of parallel processing, the hardware is the easy part. The systems software is a bit trickier: a tuned software stack, middleware for managing data, and visualization applications for turning datasets into something human beings can use to make decisions or understand some phenomenon.

Tuning the applications that chew through mountains of data on parallel (and increasingly hybrid) computing architectures - and turning that data into something that can be visualized - is the hard bit. While parallel programming has gotten somewhat easier after decades of sustained effort, parallel supercomputers are still too hard to program and manage.

That is why Lawrence Livermore National Laboratory, which has some of the most powerful supercomputers in the world, put out a request for proposals to build a testbed system for trying out new hardware and software technologies and to give HPC application vendors a place to try out new stuff on a reasonably large cluster, not some toy in a closet that is all they can afford. The project got funding, and now LLNL has tapped chip maker Intel and server maker Dell as the primary contractors for the testbed cluster, which is code-named "Hyperion."

"Hyperion is basically an attempt to provide a testbed for scaling applications on a cluster and for new hardware as it becomes available," explains David Scott, petascale product line architect in Intel's HPC platform unit. "It is not just a point procurement, but an ongoing project."

To that end, Intel will be supplying a bunch of the current "Harpertown" quad-core Xeons for Dell to plunk into the servers it is providing as part of the procurement. Dell's Data Center Solution division, which sells custom-made servers for HPC and other hyperscale customers, is actually managing the manufacturing of the servers, and will refresh the Hyperion cluster with future "Nehalem" multicore chips as soon as they are available. The plan, according to Scott, is to put in place an initial cluster with at least 100 teraflops of computing power, and then have ISVs test the hardware and their own software on the machine. For now, the Dell iron will be equipped with Linux, but it is conceivable that some ISVs will want to test their apps on Windows - as well as the scalability of Windows HPC Server 2008 - so Microsoft could at some point get involved too.

As part of the deal, Intel is providing compilers and other tools to parallelize applications and doing consulting work to help ISVs tune their applications. Intel gets back 10 per cent of the cycles in the resulting cluster to use as it sees fit, according to Scott, and this is where the ISVs are getting access to cycles on the Hyperion cluster to do their tests. It would be very expensive for many of these ISVs to get access to a 100-teraflops or larger supercomputer on which to test codes and new ideas on how to better use parallel machines.

The initial Hyperion machine, which is being built now, will have 1,152 server nodes with a total of 9,216 Xeon cores. The boxes will have an aggregate of over 9 TB of main memory and deliver around 100 teraflops of number-crunching performance. They will be linked to each other using quad data rate InfiniBand interconnect and will have over 36 GB/sec of bandwidth to a set of RAID disk arrays for storage.
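As a rough sanity check, those headline figures hang together. A short Python sketch (using the article's numbers, not official LLNL specs) backs out the per-node and per-core ratios: eight cores per node is consistent with two quad-core Harpertown sockets, and roughly 11 gigaflops and 1 GB of memory per core are in line with Xeons of that generation.

```python
# Back-of-envelope check of the Hyperion figures quoted in the article
# (assumed values, not official LLNL specifications).
nodes = 1152
total_cores = 9216
peak_flops = 100e12        # ~100 teraflops aggregate
total_memory_tb = 9        # ~9 TB aggregate main memory

cores_per_node = total_cores // nodes                    # two quad-core sockets
flops_per_core = peak_flops / total_cores                # gigaflops per core
mem_per_core_gb = total_memory_tb * 1024 / total_cores   # GB per core

print(cores_per_node, round(flops_per_core / 1e9, 1), round(mem_per_core_gb, 2))
# → 8 10.9 1.0
```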

Intel's desire to donate so much time and effort to the Hyperion project is not entirely altruistic, but rather enlightened self-interest. With so many hybrid supercomputing architectures coming online - including graphics processors and field programmable gate arrays for boosting performance of systems that still use general-purpose x64 chips - Intel wants to protect its dominant share of HPC platforms.

The HPC sector comprises between 18 and 20 per cent of the server space, says Scott, citing data from market researcher IDC, and this part of the market is growing faster than the overall server space in terms of shipments and revenues. (If you use a broader definition of HPC, which includes data warehousing, massive scale-out data centers, and such, the share would be larger.) Scott says that Intel has about an 80 per cent share of the HPC space, counting Xeon and Itanium processors together. He says that while it is possible that Intel can get from 80 to 85 per cent share, the company is much more interested in expanding the market and maintaining share.

Hence the launch of products like the CX-1 baby blade supercomputer from Cray running Windows HPC Server 2008 that was jointly announced with Intel and Microsoft back in mid-September.

The general server market has been slowing in recent quarters and could be set for a big slowdown thanks to the financial meltdown, with single-digit revenue growth common. 2007 was certainly a good year for HPC, however. Big HPC boxes (which cost more than $500,000), according to IDC's estimates, saw sales rise by 16 per cent to $2.9bn, while smaller HPC systems (priced between $100,000 and $250,000) rose by 26 per cent to $4.2bn.

Projects like Hyperion are designed to get more apps to scale on big boxes - and more quickly. This is one factor that will drive sales of the latest Intel hardware and the HPC applications that are tuned for it. Without the software tuning, a core is just a core, and hardware sales will slacken or possibly drift to other platforms that have been tuned. And Intel surely cannot afford for that to happen. ®