DesignCon Chip Stack Hits 10 Tbit/s

A paper at DesignCon caught Herb Reiter's attention because it described a whopping 10 Tbit/s connection for a 2.5-D processor and chip stack.

As a missionary for 2.5-D and 3-D chip stacking, I found the most interesting presentation at last month's DesignCon in Santa Clara, Calif., to be a talk by Professor Joungho Kim from the TeraLab at the Korea Advanced Institute of Science and Technology. He described a 10 Tbit/s low-noise design on an interposer.

By comparison, uncompressed ultra-high-definition video (7,680 x 4,320 pixels) running at 60 frames per second requires roughly 48 Gbit/s of bandwidth. The Wide I/O 2 standard for mobile systems targets a bandwidth of 25.6 GByte/s.
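A quick sanity check of the raw video figure (assuming 24 bits per pixel and no compression; the article does not state a bit depth):

```python
# Raw bandwidth of uncompressed UHD video at 60 frames/s.
# 24 bits per pixel is an assumption; the article gives no bit depth.
width, height, fps, bits_per_pixel = 7680, 4320, 60, 24

bits_per_s = width * height * fps * bits_per_pixel
gbit_per_s = bits_per_s / 1e9
print(f"Uncompressed UHD: {gbit_per_s:.1f} Gbit/s")  # ~47.8 Gbit/s
```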

Kim's design used an application processor die and a 3-D memory stack from Hynix with 16 GByte of DRAM mounted side-by-side on an interposer (see diagram below). The two blocks were connected by a thousand channels at 10 Gbit/s each, using high-speed differential signaling and equalizers to achieve this very high bandwidth across 4 mm on various interposers.
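The aggregate figure follows directly from the channel configuration described in the talk. A back-of-envelope check (assuming all thousand channels run at the full per-channel rate):

```python
# Aggregate interposer bandwidth from the stated channel configuration.
channels = 1000           # parallel channels between processor and DRAM stack
gbit_per_channel = 10     # Gbit/s per channel

total_tbit_per_s = channels * gbit_per_channel / 1000   # Tbit/s
total_tbyte_per_s = total_tbit_per_s / 8                # TByte/s
print(f"{total_tbit_per_s:.0f} Tbit/s ({total_tbyte_per_s:.2f} TByte/s)")
```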

KAIST demoed a 2.5-D stack using a thousand 10 Gbit/s channels.

Kim showed 10 Gbit/s eye diagrams, measuring results for three different interposer materials: silicon, glass, and organic. He varied the bit rate to show that a silicon interposer can still handle 10 Gbit/s per channel, while organic and especially glass interposers offer lower losses at these high frequencies and allow even higher data rates.

The presentation showed the impact of temperature, signal-line spacing, shielding of the metal lines against noise, different through-silicon via structures, varying liner capacitance, and equalizers. Kim explained several other factors shaping such high-speed signals. He even talked about how to embed passives in a silicon interposer.

It would be interesting to know the power dissipated, in picojoules per bit, when transmitting 10 Gbit/s in the demo, compared to the power consumed when routing the same high-speed signal on a printed circuit board. In a separate talk, Professor Paul Franzon from North Carolina State University gave some indication of what those differences might be.

Franzon showed differences of up to two orders of magnitude when comparing the picojoules per bit consumed for signaling in PCBs, PoPs, interposers, and 3-D chip stacks. You can now view all the DesignCon presentations online.

Alex, great comment. You are confirming that interposers (2.5D) and vertically stacked dice (3D) are ready to change the system-architecture paradigm. Until now, system architects have been limited by the MEMORY WALL, because current solutions either 1) don't allow putting enough memory into the SoC, right next to the CPU -- because of cost -- or 2) don't offer sufficient bandwidth or short enough latency to external memory -- because of power and cooling constraints.

As your comment highlights, architectures that use 2.5D or 3D technology will instead run into a CPU WALL, and designers can forget about the dreaded memory wall. Highly parallel architectures (I know of a design with a 10,000-bit-wide bus) will replace current architectures in high-performance and memory-intensive applications.

Alex, please let me know when and how I may talk to you more about this topic. I am looking for an experienced system designer to convey the above benefits clearly and compellingly to the system design community -- to trigger new architectures that utilize the strengths of 2.5D and 3D ICs.

Really an amazing interface. But this raises questions: optimized systems that use caches run around 10-50 floating-point instructions per memory word access. This means a matching CPU would have to run at around 100-500 teraflops!

This is clearly impossible with any of today's architectures, given the current state of Moore's Law.
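The commenter's estimate can be reproduced by taking the link as delivering roughly 10 tera word accesses per second (an assumption -- the comment does not state a word size) and multiplying by the 10-50 FLOPs-per-word ratio:

```python
# Balanced-machine estimate from the comment: word accesses per second
# times FLOPs per memory word. Treating the link as ~10 tera word
# accesses/s is an assumption; the comment gives no word size.
word_accesses_per_s = 10e12
flops_per_word_low, flops_per_word_high = 10, 50

low = word_accesses_per_s * flops_per_word_low / 1e12    # teraflops
high = word_accesses_per_s * flops_per_word_high / 1e12  # teraflops
print(f"Matching CPU throughput: {low:.0f}-{high:.0f} teraflops")
```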