EX-800 Blade Server Is the Buzz at Super Computing Conference

Do you recall my column from a couple of months ago in which I was introduced to Micron's next-generation universal memory technology, the hybrid memory cube (HMC)? It involves a chip package containing a 3D stack of DRAM memory die connected using through-silicon vias, as illustrated below.

Micron's hybrid memory cube (HMC).

A single HMC can produce an incredible data bandwidth of 160 GBytes/sec, thereby providing more than 15 times the performance of a state-of-the-art DDR3 memory module while consuming 70% less power. (Multiple HMCs can be chained together to appear as a single, mega-humongous memory.)
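As a quick back-of-envelope sanity check on that "more than 15 times" claim (the DDR3 figure here is my assumption — the theoretical peak of a DDR3-1333 module — not a number from Micron):

```python
# Rough comparison of HMC vs. a single DDR3 module's bandwidth.
# The DDR3 peak is an assumed reference point, not from Micron.
hmc_bandwidth = 160.0      # GB/s, per the article
ddr3_module_peak = 10.6    # GB/s, assumed DDR3-1333 (PC3-10600) module

speedup = hmc_bandwidth / ddr3_module_peak
print(f"HMC is roughly {speedup:.1f}x a single DDR3-1333 module")
```

That works out to roughly 15x, consistent with the claim (the exact multiple obviously depends on which DDR3 speed grade you pick as the baseline).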

The EX-800 is a PCI Express board that features the combination of an HMC and four Altera Stratix V FPGAs (providing 3.6M FPGA gates). The massively parallel computational capabilities of the FPGAs can be used to drive the HMC at full speed. Key features of the EX-800 include:

160 GB/s of memory bandwidth

16 full-duplex lane connections from the HMC to each of the four Stratix V FPGAs

A 4GB Micron DDR3L SODIMM dedicated to each of the four FPGAs (32 GB total)

PCI Express Gen 3 full duplex switch

x16 Gen3 PCI Express to the host

x8 Gen3 PCI Express link to each Stratix V FPGA
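It's worth putting those PCI Express numbers next to the HMC's 160 GB/s. Using the standard Gen 3 link parameters (8 GT/s per lane with 128b/130b encoding — these are spec values, not board-specific figures), a quick sketch:

```python
# Theoretical per-direction bandwidth of the board's PCIe Gen 3 links,
# computed from the standard Gen 3 parameters (8 GT/s, 128b/130b).
GEN3_RATE_GT_S = 8.0
ENCODING = 128.0 / 130.0

def pcie_gen3_gb_s(lanes):
    """Theoretical per-direction bandwidth in GB/s for a Gen 3 link."""
    return GEN3_RATE_GT_S * ENCODING * lanes / 8.0  # bits -> bytes

host_link = pcie_gen3_gb_s(16)  # x16 link to the host
fpga_link = pcie_gen3_gb_s(8)   # x8 link to each Stratix V FPGA

print(f"x16 to host: ~{host_link:.2f} GB/s; x8 per FPGA: ~{fpga_link:.2f} GB/s")
```

Even the x16 host link tops out around 15.75 GB/s per direction — an order of magnitude below the HMC's 160 GB/s — which is exactly why the heavy lifting has to happen in the on-board FPGAs rather than on the host.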

The amount of digital data in the world is increasing at a tremendous rate. Some say the world's data is doubling every two years, while others say it's doubling every year. A wide variety of big data applications -- including bioinformatics, financial trading, imaging, and surveillance -- require the ability to capture and process an ever-increasing volume of data at faster rates. Pico Computing says the extreme bandwidth and exceptionally low latency offered by the EX-800 will enable significant advances in these and other industries.

@alex_m1: Max, do you think the laptop processors can handle all this bandwidth?

No -- of course not -- and that's why the Pico Computing board uses four Altera FPGAs: the FPGA fabric can be used to perform computations and processing in a massively parallel way. So if we see laptops with hybrid memory cubes, I bet we will also see their processors augmented by FPGAs or FPGA fabric...

When I see FPGAs used in computing, I can't help but think back to transputers and the promise of CPUs that reconfigure themselves to adapt to the tasks requested of them. I supported the Parallella project by Adapteva because I had seen what interesting things can be done with new architectures (I tested ZiiLabs Stem Cell processors, later bought by Intel). But I am disappointed that Adapteva hasn't yet managed to get to mass production, and worse still, there seem to be few good tools to take advantage of their architecture.

I really think computing won't move on until we start to dissociate the hardware from the software. It is perhaps heresy for someone in the embedded software business to say that, but I think if (at the top end) we can abstract the problem away from the target, we can be more flexible about the ways in which the target devices are built. At the moment, attaching an FPGA to a Linux computer doesn't accelerate anything unless you write application-specific code as a one-shot process.

My solution? We need JITs that have a common input language and a means to adapt to the hardware. Could we have a genetic algorithm that adapts to the architecture to create an intermediate instruction set? Initially it might not run fast, or it might not seem to run at all, but over time it might optimize beyond human programming ability. There seems to be some research in this area, but I don't know how production-ready any of it is.
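A real architecture-adapting JIT is far beyond a code snippet, but the evolve-measure-select loop being described can be sketched with a toy genetic algorithm. Everything here is a placeholder of my choosing: genomes are bitstrings standing in for candidate encodings, and the fitness function is just a bit count standing in for "measured performance on the target hardware."

```python
import random

# Toy genetic algorithm sketching the evolve-and-select loop.
# Genomes and fitness are placeholders, not a real JIT or benchmark.
random.seed(42)
GENOME_LEN, POP_SIZE, GENERATIONS, MUTATION_RATE = 32, 40, 60, 0.02

def fitness(genome):
    return sum(genome)  # placeholder for "how fast this candidate runs"

def mutate(genome):
    # Flip each bit with probability MUTATION_RATE.
    return [b ^ (random.random() < MUTATION_RATE) for b in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]  # keep the fittest half
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

best = max(population, key=fitness)
print(f"best fitness after {GENERATIONS} generations: "
      f"{fitness(best)}/{GENOME_LEN}")
```

In the commenter's scenario, "fitness" would be actual measured runtime on the target (FPGA fabric, CPU, whatever is present), which is exactly why it might seem not to run at all at first and only pay off after many generations.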

FPGAs remind me of early computing, where we had to deal with individual chips rather than the SoCs we have today.