Four blades gang for 2 teraflops bang

SC10 If you want to sell supercomputers these days, you need to have a GPU story to tell. And Big Blue has been telling a story about a blade server for GPUs for the past several months, but unfortunately it doesn't actually have the product ready.

However, in about a month this BladeCenter GPU expansion blade is coming to market, initially as a special bid product and not with the full product name that other IBM blades get.

It looks like IBM is testing the GPU waters rather than jumping in, and for good reason.

The company is obviously eager to sell its BlueGene/P massively parallel clusters (which might be at the top of the Top 500 list if it weren't for these damned GPUs) or the "Blue Waters" Power7-based monsters to customers instead, and then fall back to the existing iDataPlex rack-ish blade-ish servers, which got their support for GPUs back in May.

This may seem odd for a company that spent so much time and effort designing, selling, and using the "Cell" Power-derived accelerators for game consoles and the first big hybrid x64-accelerator supercomputer, the petaflops-class "Roadrunner" super at Los Alamos National Laboratory. IBM has long since killed off Cell for its own internal use, but it has not exactly embraced any particular GPU or co-processor to replace it.

One more thing before diving into the new GPU blade. The iDataPlex machines are a hybrid that borrows concepts from both rack and blade servers, and with the dx360 M3 machine that came out in May, the two-tray server allowed two Nvidia M1050 or M2050 single-width fanless GPU co-processors to be slid into the upper tray in the unit and attached to a two-socket Xeon server in the lower tray. The non-standard iDataPlex cabinet is half as deep as a standard rack but twice as wide, and with 84U of space it offers 43.25 teraflops of GPU oomph at double-precision floating point using the M2050. By the way, all iDataPlex systems are sold on a special bid basis, like the GPU blade IBM is now announcing, and Big Blue did not provide pricing for the iDataPlex Xeon-GPU combo.
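For anyone who wants to check the cabinet math, here is a rough sketch in Python. The 84U and two-GPUs-per-server figures come from the text above; the 2U height of a dx360 M3 and the 515 gigaflops peak double-precision rating for an M2050 are assumptions added here, not numbers IBM supplied for this configuration.

```python
# Back-of-envelope GPU density for one iDataPlex cabinet.
RACK_UNITS = 84        # from the article
UNITS_PER_NODE = 2     # assumption: one two-tray dx360 M3 occupies 2U
GPUS_PER_NODE = 2      # from the article: two GPUs in the upper tray
GFLOPS_PER_GPU = 515   # assumption: M2050 peak double-precision rating


def rack_teraflops(rack_units=RACK_UNITS,
                   units_per_node=UNITS_PER_NODE,
                   gpus_per_node=GPUS_PER_NODE,
                   gflops_per_gpu=GFLOPS_PER_GPU):
    """Return peak GPU teraflops for a fully populated cabinet."""
    nodes = rack_units // units_per_node
    gpus = nodes * gpus_per_node
    return gpus * gflops_per_gpu / 1000.0


if __name__ == "__main__":
    print(f"{rack_teraflops():.2f} teraflops of GPU peak per cabinet")
```

Under those assumptions the result lands within a rounding error of the 43.25 teraflops quoted above, which suggests the figure counts 84 GPUs per cabinet and nothing from the Xeons.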

The BladeCenter GPU expansion blade does not interface with any and all of IBM's Xeon, Opteron, or Power blade servers, but only with the plain-vanilla HS22 two-socket Xeon 5500/5600 blade server. The GPU expansion blade snaps into a high-speed I/O slot normally used for riser expansion cards, and the way IBM has designed it, up to four of these GPU blades can be snapped together and then linked to a single HS22 blade server for a five-wide hybrid computing element.

Here's what the HS22 blade and the GPU blade look like:

IBM's HS22 blade server (left) and GPU expansion blade (right)

By the way, IBM's announcement letter for this GPU blade says only three can be stacked up, but prior reports from IBM as well as its tech documents say it is four, not three. The BladeCenter GPU blade seems to be a variant of the PCI-Express I/O expansion blade that Big Blue has been selling for some time, which also snaps up to four wide for PCI-Express 2.0 expansion on a BladeCenter blade server. This expansion blade is supported on the Xeon 5500/5600-based HS22 and the Xeon 7500-based HX5 blades. IBM's own Power7 blades (PS700, PS701, and PS702 by the IBM names) can't use this expansion blade, and Big Blue stopped making interesting Opteron-based blades several years ago after doing some actual engineering with the LS22/LS42 blades.

Each GPU blade can be equipped with a single co-processor from Nvidia, with only the double-wide M2070 and M2070Q GPUs supported. (IBM did not explain why it did not use two M2050s, but it probably has to do with air-flow and heating issues.) The M2070 and M2070Q are fanless models rated at 515 gigaflops of peak double-precision floating point number-crunching power; they have 6 GB of GDDR5 memory and are rated at 225 watts. The Q in the model name means that the card can not only use the CUDA environment to dispatch calculation work to the Fermi processor on the GPU, but can also load the Quadro drivers used by regular GPUs and serve as a visualization engine.

The reason why IBM hasn't put two M2050s into the blade is obvious when you look at this shot of the device:

IBM's GPU expansion blade for the HS22 blade server

Look at all that heat sinkage. If these GPUs were not so hot, you could pack two of them in there. And if you are only going to be able to put one in the box, you might as well put in the more expensive one with the fatter GDDR5 memory.

The BladeCenter GPU expansion blade will be available on December 13. It is supported by Red Hat Enterprise Linux 5, Novell SUSE Linux Enterprise Server 11, and Microsoft Windows HPC Server 2008 (both the initial and R2 releases) as well as the normal Windows Server 2008.

IBM's list price for this GPU expansion blade is $6,899. Because the M2050, M2070, and M2070Q GPU co-processors are designed for servers and to be OEMed and embedded into server products, Nvidia does not provide list prices. But the C2070 GPU that plugs into a PCI-Express 2.0 x16 slot and that has a fan runs only $3,999. That $2,900 price difference is a pretty hefty premium to charge for a big heat sink, some bent metal, and diagnostics to plug into the BladeCenter chassis. Then again, four of these blades ganged together come to $27,596 for 2.06 teraflops, which is about a quarter of what you have to pay per teraflops for a massively parallel x64 or Power machine with a custom interconnect.
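The arithmetic behind those numbers is easy to reproduce. The Python sketch below uses only the prices and ratings quoted in this story; the function names are made up for illustration, and the cost of the HS22 host blade is left out, just as it is in the figures above.

```python
# Price and peak-performance math for a gang of GPU expansion blades.
BLADE_LIST_PRICE = 6899   # dollars, IBM list price per GPU expansion blade
C2070_PRICE = 3999        # dollars, fan-cooled C2070 card
GFLOPS_PER_GPU = 515      # M2070/M2070Q peak rating
MAX_GPU_BLADES = 4        # blades that can hang off a single HS22


def gang_cost(blades=MAX_GPU_BLADES):
    """List price in dollars for a gang of GPU blades (host excluded)."""
    return blades * BLADE_LIST_PRICE


def gang_teraflops(blades=MAX_GPU_BLADES):
    """Peak teraflops for a gang of GPU blades, one GPU per blade."""
    return blades * GFLOPS_PER_GPU / 1000.0


def dollars_per_teraflops(blades=MAX_GPU_BLADES):
    """List-price dollars per peak teraflops for the gang."""
    return gang_cost(blades) / gang_teraflops(blades)


if __name__ == "__main__":
    print(f"Premium over a C2070: ${BLADE_LIST_PRICE - C2070_PRICE:,}")
    print(f"Four blades: ${gang_cost():,} for {gang_teraflops():.2f} teraflops")
    print(f"That is about ${dollars_per_teraflops():,.0f} per teraflops")
```

Because both cost and flops scale with the blade count, the dollars-per-teraflops figure is the same for one blade as for four; ganging them only raises the ceiling on what a single HS22 can drive.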

No word on when IBM might put FireStream GPU co-processors from Advanced Micro Devices into these expansion blades, but given the level of warmth IBM is currently showing AMD - with a single four-socket Opteron 6100 box coming out this year, and somewhat begrudgingly - you and lots of your IT friends will have to hold your breath a long time and actually turn Blue before IBM might consider it. ®