Cray's midrange line big on Xeons, GPUs

In launching the CX1000 midrange supercomputer lineup, it looks like Cray is finally getting tired of trying to peddle Lexuses and BMWs to people who can only afford Fords and Chevys.

As Cray's top brass hinted when they discussed the company's fourth quarter financial results back in late February, the company has put out a new product that shoots the gap between its entry CX1 baby super, based on Intel's Xeon processors, and its midrange and high-end massively parallel XT6m and XT6 supers, which are based on Advanced Micro Devices' Opterons and use Cray's own SeaStar2+ 2D and 3D torus interconnects to scale out.

The new machines are not based on Opteron processors and trimmed down SeaStar2+ interconnects, as El Reg speculated might be possible back in February, but on Intel's new "Westmere-EP" Xeon 5600 and impending "Nehalem-EX" Xeon 7500s, a possibility that was pondered as well. The new blade servers also deploy Nvidia's prior generation of Tesla graphics co-processors rather than the much-improved and still not shipping "Fermi" GPUs that were previewed at the SC09 supercomputing trade show last November.

There are three different models of the CX1000 machines, and only two of them are actually being announced today. The CX1000-C is a 7U chassis that holds 18 half-height two-socket blade servers based on the new six-core Xeon 5600 processors launched last week. There are ten blades along the bottom and eight blades along the top, with space for two fan blades for cooling in the center of the top part of the chassis. This chassis does not make use of the SeaStar2+ interconnect, but does have a 36-port quad data rate (40 Gb/sec) InfiniBand switch for lashing the blades together so they can run parallel computing workloads that use the ubiquitous Message Passing Interface (MPI) protocol for supercomputing.
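For those keeping score at home, the chassis arithmetic can be sketched in a few lines of Python. The blade, socket, and switch port counts come from the description above. The sketch assumes that every socket holds a six-core Xeon 5600 and that each blade takes exactly one port on the InfiniBand switch, neither of which Cray has confirmed.

```python
# Back-of-the-envelope totals for one CX1000-C chassis.
# From the article: 18 blades, 2 sockets per blade, 36-port InfiniBand switch.
# Assumptions (not confirmed by Cray): six-core parts in every socket,
# and one switch port consumed per blade.
BLADES = 18
SOCKETS_PER_BLADE = 2
CORES_PER_SOCKET = 6   # assumption: six-core Xeon 5600 throughout
SWITCH_PORTS = 36
PORTS_PER_BLADE = 1    # assumption: one InfiniBand link per blade


def chassis_totals(blades, sockets_per_blade, cores_per_socket,
                   switch_ports, ports_per_blade):
    """Return socket, core, and spare switch port counts for a chassis."""
    sockets = blades * sockets_per_blade
    cores = sockets * cores_per_socket
    ports_used = blades * ports_per_blade
    if ports_used > switch_ports:
        raise ValueError("blades need more ports than the switch has")
    return {
        "sockets": sockets,
        "cores": cores,
        "ports_free": switch_ports - ports_used,
    }


totals = chassis_totals(BLADES, SOCKETS_PER_BLADE, CORES_PER_SOCKET,
                        SWITCH_PORTS, PORTS_PER_BLADE)
print(totals)
```

Under those assumptions a full chassis works out to 216 cores, with half of the switch ports left over for linking to other chassis or storage.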

There is an optional 24-port Gigabit Ethernet switch to hook the machines into Ethernet backbones as well. No word on who is making the InfiniBand and Ethernet switches used in the Cray chassis.

The CX1000-C chassis has one chassis management module and room for four hot-swap N+1 power supplies that together burn about 6500 watts and peak out at 8200 watts. The two fan blades help cool the chassis, and so do the two fans on each blade server.

The chassis also has an ultracapacitor module that allows the mini-super to ride out power outages that are 250 ms or less in duration. (This may not seem like a big deal until your super crashes after running for three months on a simulation and you have to go back and redo some of the calculations from a checkpoint because of a power glitch the human eye can barely perceive.)

The CX1000-C blade server is based on Intel's S5520 "Tylersburg" chipset and could, in theory, support both last year's Xeon 5500 and this year's Xeon 5600 processors. (It is hard to imagine why anyone would want the Xeon 5500s, given that the Xeon 5600s offer roughly 50 per cent more oomph and cost about the same.)

Cray will not support the fastest Xeon 5600 parts in the CX1000-C blades - that would be the six-core 3.33 GHz Xeon X5680 and the four-core 3.46 GHz X5677 - because at 130 watts they are too hot for the blade chassis. But Cray is supporting the six-core 2.93 GHz Xeon X5670s as well as other 95 watt and 80 watt parts with lower clock speeds. The CX1000-C blade has a dozen DDR3 memory sockets and supports up to 48 GB using 4 GB DIMMs. (Cray knows midrange HPC shops are too cheap to spend the extra money on 8 GB DIMMs today.) The blade has a Mellanox ConnectX mezzanine adapter to link out to the InfiniBand switch, a dual-port Gigabit Ethernet controller, and room for one small form factor SATA or SSD drive.

The CX1000-G is a blade setup as well, but it marries Xeon 5600 blades with Nvidia's M1060 GPU co-processors to boost number crunching for certain kinds of workloads where GPUs make sense. The CX1000-G chassis is essentially the same 7U chassis, with the electronics and two fan blades at the top center. But the machine has nine double-wide, half-height, two-socket blade servers based on the Xeon 5600s, each carrying two of the M1060 GPUs.

The CX1000-G blades have only six DDR3 memory slots, so you have to use more expensive 8 GB modules to get up to 48 GB of memory per blade. The GPU blades have two ConnectX InfiniBand adapters to link out to the 36-port InfiniBand switch in the chassis, presumably double the pipes because there are four computing elements per blade (two CPUs and two GPUs) instead of two with the CX1000-C blades (two CPUs). The CX1000-G blades have room for one SATA or SSD drive, like their C series counterparts.
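The memory and interconnect trade-off between the two blade types is easy to check with a rough sketch. The slot counts, DIMM sizes, and adapter counts below are taken from the descriptions of the two blades. The `Blade` class and its field names are made up for illustration, and counting each CPU or GPU as one "computing element" simply follows the reasoning in the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blade:
    """Illustrative model of a CX1000 blade (names are hypothetical)."""
    name: str
    dimm_slots: int
    dimm_gb: int
    cpus: int
    gpus: int
    ib_adapters: int

    @property
    def max_memory_gb(self) -> int:
        # Maximum memory is simply slots times DIMM capacity.
        return self.dimm_slots * self.dimm_gb

    @property
    def elements_per_adapter(self) -> float:
        # Computing elements (CPUs plus GPUs) sharing each InfiniBand adapter.
        return (self.cpus + self.gpus) / self.ib_adapters


# Figures as reported in the article for each blade type.
cx1000_c = Blade("CX1000-C", dimm_slots=12, dimm_gb=4,
                 cpus=2, gpus=0, ib_adapters=1)
cx1000_g = Blade("CX1000-G", dimm_slots=6, dimm_gb=8,
                 cpus=2, gpus=2, ib_adapters=2)

for blade in (cx1000_c, cx1000_g):
    print(f"{blade.name}: {blade.max_memory_gb} GB max, "
          f"{blade.elements_per_adapter:.0f} computing elements per adapter")
```

Both blades land on the same 48 GB ceiling and the same ratio of two computing elements per InfiniBand adapter, which fits the guess that the second adapter is there to keep the GPUs fed.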

The last, and perhaps most interesting, of the new CX1000 midrange supers will be based on the Nehalem-EX Xeon 7500 processors, due from Intel on March 30. Cray is not at liberty to say much about these machines, but did offer some hints.

If the CX1000-C represents scale out supercomputing and the CX1000-G represents "scale through" computing (a new term as far as I know for using GPUs to augment CPUs), then the CX1000-S machines will deliver "scale up" HPC with a "fat memory node". The Xeon 5600 tops out at two-socket SMP, so that leaves the eight-core Xeon 7500s, their QuickPath Interconnect, and Intel's "Boxboro" chipset for its most recent Xeon MP and Itanium processors to make SMP nodes that will scale to 128 cores in a single system image. That would be a 16-socket box. As far as anyone knows, Intel is not offering such a chipset, but IBM and Bull have their respective eX5 and Fame 2G chipsets in the works.

Cray could have done its own chipset, of course, but it is equally likely that the company is licensing either the IBM or Bull chipsets. Considering Cray's intense competition with IBM (despite the fact that Cray chief executive officer Peter Ungaro used to run IBM's supercomputer business), using IBM's eX5 chipset seems possible but unlikely. IBM has not said anything about its plans for Nehalem-EX machines beyond four sockets, but according to information obtained by The Register last summer, Bull's Fame 2G chipset (anchored by the Bull Coherent Switch) and related Mesca blade servers were designed to scale up to 16 sockets and offer up to eight DDR3 memory slots per socket.

The Mesca blade servers have four sockets and up to 256 GB per blade, and four of these are lashed together to make a 16-socket, 128-core, 1 TB fat node. InfiniBand switches could be used to link multiple nodes together if necessary, but it seems like the CX1000-S is aimed at providing a single fat node for local and departmental HPC work where having a big memory space to play in is more important than having lots of cores.
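The fat node numbers hang together, as a quick sketch shows. The blade, socket, core, and memory figures are the ones reported for the Mesca design, and the eight memory slots per socket come from the same information. The DIMM size is not stated anywhere, so the code derives what it would have to be. Whether the CX1000-S actually uses this design remains speculation.

```python
# Sanity check on the reported Mesca fat node figures.
# From the article: 4 blades per node, 4 sockets and 256 GB per blade,
# eight-core Xeon 7500s, up to 8 DDR3 slots per socket.
BLADES_PER_NODE = 4
SOCKETS_PER_BLADE = 4
CORES_PER_SOCKET = 8
MEMORY_PER_BLADE_GB = 256
DIMM_SLOTS_PER_SOCKET = 8


def fat_node_totals():
    """Return totals for the node plus the DIMM size the figures imply."""
    sockets = BLADES_PER_NODE * SOCKETS_PER_BLADE
    cores = sockets * CORES_PER_SOCKET
    memory_gb = BLADES_PER_NODE * MEMORY_PER_BLADE_GB
    slots_per_blade = SOCKETS_PER_BLADE * DIMM_SLOTS_PER_SOCKET
    # Derived, not reported: DIMM capacity needed to fill a blade to 256 GB.
    implied_dimm_gb, remainder = divmod(MEMORY_PER_BLADE_GB, slots_per_blade)
    if remainder:
        raise ValueError("blade memory does not divide evenly across slots")
    return {
        "sockets": sockets,
        "cores": cores,
        "memory_gb": memory_gb,
        "implied_dimm_gb": implied_dimm_gb,
    }


node = fat_node_totals()
print(node)
```

The upshot is that hitting 1 TB (1,024 GB) requires every slot to be stuffed with 8 GB DIMMs, the pricey parts that midrange shops tend to avoid.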

Cray could easily make CX1000-C and CX1000-G equivalents using AMD's future eight-core Opteron 4100 and imminent twelve-core Opteron 6100 processors (due on March 29). But making a fat node system is more problematic, since AMD's own chipsets for the Opteron 6100s are topping out at four sockets and 384 GB of main memory using 8 GB DIMMs. This is a reasonably fat node, to be sure. But it is not 1 TB.
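To put the two fat node options side by side, here is a rough comparison in Python. The Opteron figures are the ones given above. The 16-socket Nehalem-EX figures are the speculated Bull-style configuration rather than anything Cray has announced, and the `FatNode` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FatNode:
    """Illustrative description of a large SMP node (hypothetical names)."""
    name: str
    sockets: int
    cores_per_socket: int
    memory_gb: int

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def gb_per_core(self) -> float:
        return self.memory_gb / self.cores


# Reported: four sockets of twelve-core Opteron 6100s, 384 GB with 8 GB DIMMs.
opteron = FatNode("Opteron 6100, 4 sockets", 4, 12, 384)
# Speculative: 16 sockets of eight-core Xeon 7500s with 1 TB (1,024 GB).
nehalem_ex = FatNode("Nehalem-EX, 16 sockets", 16, 8, 1024)

# DIMM count implied by the Opteron memory figure at 8 GB per DIMM.
OPTERON_DIMMS = opteron.memory_gb // 8

for node in (opteron, nehalem_ex):
    print(f"{node.name}: {node.cores} cores, {node.memory_gb} GB, "
          f"{node.gb_per_core:.1f} GB per core")
print(f"Opteron box needs {OPTERON_DIMMS} DIMMs, "
      f"{OPTERON_DIMMS // opteron.sockets} per socket")
```

On these numbers the memory per core is the same for both boxes. The difference is the size of the single system image: the Opteron node has less than half the cores and memory of the speculated Nehalem-EX node.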

The Cray CX1000-C and CX1000-G machines are available now, with entry configurations costing under $100,000. The feeds and speeds of entry configs were not available at press time. Cray has not said when it plans to put Fermi GPUs in the blades; those are the GPUs customers really want, because they have more oomph and error correction as well. ®