NextIO squeezes Nvidia GPUs into super-dense package

One of the problems with GPU co-processors is that they don't fit well into existing server designs. The devices run hot and they can affect the performance and reliability of the servers you try to cram them into.

NextIO, which has created a line of virtual I/O appliances, has jumped into the GPU racket and is now shipping enclosures that allow for GPUs to be housed outside of servers and then lashed back to them in the same racks.

The latest GPU chassis from NextIO is the vCore Express enclosure, which is a 1U box with room for four of Nvidia's fanless M2050 single-wide or M2070 double-wide GPU co-processors. The fanless models, which came out in May, are distinct from the Tesla C2050 and C2070 cards in that they do not have their own fans and they rely on a server's own fans to keep them cool.

NextIO's vCore Express GPU chassis

Using the Tesla M2070 co-processors, the NextIO vCore Express chassis can pack up to 2 teraflops of double-precision and 4.13 teraflops of single-precision number-crunching power into the box, with typical power consumption of around 900 watts, according to the company. The M2050 GPUs, which have only 3GB of GDDR5 memory, are single-wide cards; the M2070s have 6GB of GDDR5 memory and are double-wide cards. NextIO is an Nvidia reseller and is selling the vCore Express chassis as a configured unit. With four of the M2050s installed and four PCI-Express links back to the servers, the 1U chassis costs $12,995. Moving up to four of the M2070s raises the price to $18,995. Shipments begin in November. These are exactly the same prices that Nvidia is charging for its respective S2050 and S2070 appliances, and for good reason: they are the same machines, and NextIO is in fact building the boxes resold by Nvidia and its channel partners.
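For a rough sense of what those numbers mean per teraflop and per watt, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation using the figures quoted above; the variable names are made up for illustration, and the prices cover the whole configured chassis rather than the GPUs alone.

```python
# Figures as quoted in the article for the vCore Express chassis.
PRICE_WITH_M2050 = 12_995   # dollars, four M2050s installed
PRICE_WITH_M2070 = 18_995   # dollars, four M2070s installed
GPUS_PER_CHASSIS = 4
DP_TERAFLOPS = 2.0          # double precision, M2070 configuration
SP_TERAFLOPS = 4.13         # single precision, M2070 configuration
TYPICAL_WATTS = 900         # typical draw, per NextIO

# Cost of compute in the M2070 configuration (chassis price included).
dollars_per_dp_teraflop = PRICE_WITH_M2070 / DP_TERAFLOPS
dollars_per_sp_teraflop = PRICE_WITH_M2070 / SP_TERAFLOPS

# Energy efficiency at the typical power draw.
dp_gigaflops_per_watt = DP_TERAFLOPS * 1000 / TYPICAL_WATTS

# What the step up from M2050 to M2070 costs per card.
premium_per_card = (PRICE_WITH_M2070 - PRICE_WITH_M2050) / GPUS_PER_CHASSIS

print(f"${dollars_per_dp_teraflop:,.2f} per double-precision teraflop")
print(f"${dollars_per_sp_teraflop:,.2f} per single-precision teraflop")
print(f"{dp_gigaflops_per_watt:.2f} double-precision gigaflops per watt")
print(f"${premium_per_card:,.2f} extra per card for the M2070")
```

On these numbers, the M2070 configuration works out to a little under $9,500 per double-precision teraflop, and the fatter memory costs $1,500 extra per card.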

Because they are embedded products, Nvidia did not provide pricing on the M2050 and M2070 GPU co-processors, but as El Reg previously reported, supercomputer maker Appro International is shipping a combination CPU-GPU machine called the Tetra 1426G4 that packs a two-socket Xeon 5600 server with 96GB of main memory and four of the M2050 GPUs into a 1U rack for $13,000. As long as you don't need the fatter memory on the GPU and a two-socket box is enough to drive four GPUs, this seems like a pretty sweet deal. (So long as it doesn't catch fire.) If you want one GPU per CPU socket, then the NextIO vCore Express (with a few grand knocked off because it is not a server, too) plus four server nodes probably makes more sense.

It stands to reason that NextIO will eventually deliver a similar chassis supporting Advanced Micro Devices' fanless FireStream 9350 and 9370 GPU co-processors, which made their debut in June. The AMD cards have a tiny bit more double-precision oomph and twice as much single-precision performance, but lack ECC scrubbing on their GDDR5 memory - something many GPU customers see as a fatal flaw. Thus far, NextIO is not supporting the AMD GPUs and may not do so until they get ECC memory.

The NextIO vCore C200 GPU chassis

In May, as the fanless GPUs from Nvidia were being launched, NextIO launched a bigger 4U chassis for outboard GPU co-processors called the vCore C200, which you can see above. The C200 chassis has a special blade-like enclosure that wraps around the GPU co-processors, which slide into the chassis from the rear. The chassis can support eight of the Tesla M2050 or M2070 fanless cards or the older Tesla M1060 fanless model or the Quadro FX5800 graphics cards. The Quadro FX5800 costs around $3,000 out there on the Web, which is between the $2,499 price of the C2050 Tesla GPU and the $3,999 price of the C2070 Tesla GPU. But the FX5800 and M1060 are only rated at 78 gigaflops each at double precision, so these do not make much sense at all, unless you are getting them for free and space is not an issue.

The C200 chassis, like Dell's PowerEdge C410x GPU chassis announced in August, virtualizes the links between server hosts and GPU co-processors so the GPUs can be allocated and re-allocated on the fly as workloads dictate. The C200 chassis from NextIO can have from one to eight hosts, while the Dell chassis, which crams sixteen GPUs into a 3U chassis, has eight PCI-Express links and allows from one to four GPUs to be allocated to a single server.
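To make the allocation idea concrete, here is a toy Python model of a shared GPU chassis. It is purely illustrative and is not NextIO's or Dell's management interface: the `GpuPool` class, its method names, and the host names are all hypothetical, and the only numbers taken from the article are the Dell figures (sixteen GPUs, eight host links, one to four GPUs per server).

```python
class GpuPool:
    """Toy model of a chassis that hands GPUs out to attached hosts.

    Hypothetical sketch only; it does not reflect any vendor API.
    """

    def __init__(self, gpus, host_links, max_per_host):
        self.free = gpus                  # GPUs not assigned to any host
        self.host_links = host_links      # how many hosts can attach
        self.max_per_host = max_per_host  # cap on GPUs for a single host
        self.assigned = {}                # host name -> GPU count

    def allocate(self, host, count):
        """Give `count` more GPUs to `host`, or raise if limits are hit."""
        if count < 1:
            raise ValueError("count must be at least 1")
        current = self.assigned.get(host, 0)
        if host not in self.assigned and len(self.assigned) >= self.host_links:
            raise ValueError("no free host link")
        if current + count > self.max_per_host:
            raise ValueError("per-host GPU limit exceeded")
        if count > self.free:
            raise ValueError("not enough free GPUs")
        self.assigned[host] = current + count
        self.free -= count

    def release(self, host):
        """Return all of a host's GPUs to the pool and detach the host."""
        self.free += self.assigned.pop(host, 0)


# Dell C410x figures from the article: 16 GPUs, 8 links, up to 4 per server.
dell = GpuPool(gpus=16, host_links=8, max_per_host=4)
dell.allocate("node1", 4)
dell.allocate("node2", 2)

# Workload shifts: node1 gives up its GPUs, node2 grows to the four-GPU cap.
dell.release("node1")
dell.allocate("node2", 2)
print(dell.assigned, dell.free)
```

The same model would describe the C200 as a pool of eight GPUs with up to eight hosts, though the per-host limit for that box is not given here and would have to be assumed.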