Appro bridges Tesla GPUs, Nehalems

CUDA clusters

Supercomputer cluster vendor Appro is a niche player in the cut-throat HPC market, and it's bringing the nVidia Tesla GPU and the CUDA programming environment to the knife fight with its new HyperPower clusters.

The Tesla GPUs and their related CUDA tools went commercial last fall and they've been picked up by a number of players, including Penguin Computing and several other vendors selling what they call personal supercomputers. That's a tower x64 server with a few Tesla GPUs thrown in to deliver a pretty good floating point punch.

Appro has been holding back a little, waiting for the Tesla technology to mature, says John Lee, vice president of advanced technology solutions at the company. With the latest Tesla cards, nVidia is offering double-precision math, which is a must for a lot of workloads. The current cards deliver a lot less performance on double-precision math than they do on single-precision math, and they lack error correction on the memory used in the Tesla units. Even so, Lee says that customers want to start deploying hybrid x64-GPU systems using the CUDA environment today so they will be ready for the next generation of nVidia GPU co-processors.

The exact design of these future Tesla co-processors is not known, but Lee hinted that it would be a lot more elegant than the current Tesla PCI-Express cards and would have much better double-precision performance too. These new Tesla GPUs are expected some time in the first half of 2010.

The Appro HyperPower clusters start with a 1U chassis with two so-called "twin servers" inside. These are two-socket, half-width servers that support Intel's current "Nehalem EP" Xeon 5500 processors. Each pair of Nehalem servers is linked to a single Tesla S1070 server appliance. The Tesla S1070 packs four GPUs - each with 240 cores running at between 1.3 GHz and 1.44 GHz - into a single server chassis that also has 16 GB of its own memory.

This appliance links to the servers through two PCI-Express 2.0 x16 slots (one to each two-way half server). Depending on the clock speed, the Tesla S1070 appliance, which eats up 1U of rack space as well, delivers from 3.73 to 4.14 teraflops of floating point performance with single precision, but only between 311 and 345 gigaflops with double precision. (You can see that future Tesla cards have to do a better job on the double-precision front.)

The Appro HyperPower puts 19 of the twin Nehalem EP servers interleaved in a standard 42U rack with 19 of the Tesla appliances, which yields 304 x64 cores and 18,240 GPU cores. The peak performance of such a rack weighs in at just over 78 teraflops on single-precision codes and 6.56 teraflops on double-precision math. Here's the scary bit - and it's not surprising: One of these Tesla appliances burns 800 watts when it is working hard.
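For anyone who wants to check the rack arithmetic, here is a minimal Python sketch that rebuilds the totals from the per-box figures quoted in this story. The constant and function names are made up for illustration, and the four cores per socket is an assumption (it is what the 304-core total implies for the Xeon 5500). The performance numbers use the top-bin 1.44 GHz appliance.

```python
# Per-rack and per-box figures as quoted in the article.
CHASSIS_PER_RACK = 19          # 1U twin-server chassis
SERVERS_PER_CHASSIS = 2        # two half-width servers per chassis
SOCKETS_PER_SERVER = 2         # two-socket Nehalem EP servers
CORES_PER_SOCKET = 4           # assumption: quad-core Xeon 5500
APPLIANCES_PER_RACK = 19       # 1U Tesla S1070 appliances
GPUS_PER_APPLIANCE = 4
CORES_PER_GPU = 240
SP_GFLOPS_PER_APPLIANCE = 4140  # 4.14 teraflops single precision, top bin
DP_GFLOPS_PER_APPLIANCE = 345   # double precision, top bin


def rack_totals():
    """Return core counts and peak gigaflops for one full rack."""
    x64_cores = (CHASSIS_PER_RACK * SERVERS_PER_CHASSIS
                 * SOCKETS_PER_SERVER * CORES_PER_SOCKET)
    gpu_cores = APPLIANCES_PER_RACK * GPUS_PER_APPLIANCE * CORES_PER_GPU
    return {
        "x64_cores": x64_cores,
        "gpu_cores": gpu_cores,
        "sp_gflops": APPLIANCES_PER_RACK * SP_GFLOPS_PER_APPLIANCE,
        "dp_gflops": APPLIANCES_PER_RACK * DP_GFLOPS_PER_APPLIANCE,
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value:,}")
```

Under those assumptions the rack comes out at 304 x64 cores, 18,240 GPU cores, 78,660 gigaflops single precision, and 6,555 gigaflops double precision, which lines up with the rounded figures above.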

Lee says that Appro has been looking at using field programmable gate arrays (FPGAs) and other co-processor technologies for years. In each case, the technologies could yield significant performance improvements, but the programming models had to be changed and the technologies were often very expensive to implement, even if they did ultimately yield good results. These are big barriers to adoption for FPGAs and other custom supercomputers. But with the CUDA C++ and Fortran programming environment maturing and double-precision math available on the Teslas, customers want to play. And that holds even though a rack of the Appro HyperPower might run somewhere between $250,000 and $500,000, depending on the configuration.

That works out to between $3.18 and $6.36 per gigaflops on single-precision jobs. That's in the same ballpark as what Penguin Computing is charging for its 16 teraflops and 32 teraflops clusters using the same Tesla S1070 appliance servers. On double-precision math, it costs between $38 and $76 per gigaflops for a rack of HyperPower machinery, and the relative price of a gigaflops on the Penguin Computing machines also goes up.
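The price-per-gigaflops figures are simple division, and a short Python sketch makes the working explicit. It assumes a fully loaded rack of 19 top-bin appliances (4.14 teraflops single precision and 345 gigaflops double precision each) and the $250,000 to $500,000 price range quoted above. The names are illustrative only.

```python
# Peak rack performance, assuming 19 top-bin Tesla S1070 appliances.
RACK_SP_GFLOPS = 19 * 4140   # single precision: 78,660 gigaflops
RACK_DP_GFLOPS = 19 * 345    # double precision: 6,555 gigaflops

# Rack price range quoted in the article, in US dollars.
PRICE_LOW = 250_000
PRICE_HIGH = 500_000


def dollars_per_gigaflops(price, gflops):
    """Cost of one gigaflops of peak performance at a given rack price."""
    return price / gflops


if __name__ == "__main__":
    for label, gflops in (("single", RACK_SP_GFLOPS),
                          ("double", RACK_DP_GFLOPS)):
        low = dollars_per_gigaflops(PRICE_LOW, gflops)
        high = dollars_per_gigaflops(PRICE_HIGH, gflops)
        print(f"{label} precision: ${low:.2f} to ${high:.2f} per gigaflops")
```

These are peak numbers, so the cost per gigaflops actually delivered on a real application would be higher.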

Appro is supporting Red Hat's Enterprise Linux 5 Update 2 and Update 3 on the HyperPower clusters and will eventually support Novell's SUSE Linux Enterprise Server 10 and 11 for its European customers. Lee said that for marketing purposes, he should probably say that Microsoft's Windows HPC Server variant would also be supported, but he admitted that in the places where Appro is selling supercomputers, customers are not asking for Windows. (The Tesla GPUs and the CUDA programming environment do work on Windows machines, however, so it is an option should Appro get requests from customers.)

While Appro has a reseller agreement with Japanese server maker NEC to resell its gear in the Asian market, the HyperPowers are not part of that deal. ®