Layered Chips + New Programming Paradigms?

I went to an interesting talk by Joe Jeddeloh of Micron. He showed off an example of the 1M-IOPS PCI Flash storage device they are developing, and talked about some of the design tradeoffs. The major win for them seemed to be doing a pure HW design rather than using an embedded processor, which let them get a much greater degree of parallelism within the controller.

The first half of his talk, though, was on the "Hybrid Memory Cube". They've designed vertical interconnects, so they can stack multiple DRAM chips on top of a logic layer. The design of the chip is very HPC-oriented. He noted that many HPC data centers are actually power-limited more than anything else, and their product will significantly reduce power per bit (while also increasing performance).

Previous attempts at system-on-a-chip tended to fall over because the design and fabrication processes for DRAM and for CPU cores are actually not all that similar. The same line can't easily produce high-density DRAM and a highly sophisticated CPU on the same die. Vertical interconnects let you put two specialized parts together in one package, and the parts don't have to come off the same fab.

So, of course, he talked a bit about offloading more of the compute onto the logic layer of their package. For example, the host could send a request to perform an atomic operation over to the chip, rather than locking all the layers of the cache and doing a read-modify-write cycle. Here again the focus is on the HPC market, and they are working with partners to figure out which operations make sense to offload.
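To make the atomic-operation example concrete, here is a toy sketch of the difference. It is not Micron's interface; the MemoryCube class, its atomic_add command, and the transfer counter are all hypothetical names I made up for illustration. The only point is that a host-side read-modify-write crosses the link twice, while a memory-side atomic crosses it once.

```python
class MemoryCube:
    """Toy model of a memory package with a logic layer (hypothetical)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.link_transfers = 0  # requests crossing the host<->memory link

    def read(self, addr):
        self.link_transfers += 1
        return self.cells[addr]

    def write(self, addr, value):
        self.link_transfers += 1
        self.cells[addr] = value

    def atomic_add(self, addr, delta):
        # Assumed command: the logic layer does the read-modify-write
        # next to the DRAM, so the host sends a single request.
        self.link_transfers += 1
        old = self.cells[addr]
        self.cells[addr] = old + delta
        return old


def host_side_increment(mem, addr, delta):
    # Conventional model: pull the value to the host, modify it, push it back.
    # A real CPU would also have to hold the cache line exclusively meanwhile.
    value = mem.read(addr)
    mem.write(addr, value + delta)


def memory_side_increment(mem, addr, delta):
    # Offloaded model: one request, no data round trip.
    mem.atomic_add(addr, delta)


def run(increment, n):
    """Apply `increment` n times to cell 0; return (final value, transfers)."""
    mem = MemoryCube(size=1)
    for _ in range(n):
        increment(mem, 0, 1)
    return mem.cells[0], mem.link_transfers
```

Both versions end with the same counter value; the model only counts link traffic and says nothing about latency, power, or contention, which is where the real argument would be.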

It's not the first time somebody has noticed the memory bottleneck and made this suggestion. But, except for GPUs, the idea really hasn't caught on, and I don't think the latest iteration will either. It's simple to see why: suppose the talk had been given by a guy from Intel instead. He wouldn't talk about moving more smarts into the memory. He'd talk about how he could now create a single package with 8 cores, each with a few GB of memory. (There are some technical problems. DRAM really doesn't like being hot, and processors are very hot. Putting the CPU on top of the DRAM might help some.) System-on-a-chip (or multiple-system-on-a-chip) gets you much of the benefit at approximately 0% of the pain. Outside of specialized arenas, nobody is going to change their programming paradigm to 'smart DRAM' as long as we can keep pushing the conventional model, which we've gotten much better at scaling recently, thanks to virtualization and cloud-centric algorithms.