Intel's Embedded DRAM: New Era of Cache Memory

With SRAM so costly to make, Intel's Embedded DRAM is an intriguing option. TechInsights examines eDRAM and its potential.

Two industry giants -- Intel and Samsung -- expressed their frustration with SRAM scaling at this year's International Solid-State Circuits Conference (ISSCC) in February.

In their paper, Song et al. from Samsung[1] argued that SRAM not only occupied too much real estate, but that its operating voltage did not scale in proportion with the logic devices on the same die. Cache sizes in megabytes were also growing, so an increasing share of the devices on the die required higher voltages than the main logic. Hamzaoglu, in a paper from Intel[2], revealed that SRAM scaling was not satisfying Intel's requirements and that the majority of the die area was taken by SRAM cells. The question arose: was it worth continuing to invest in SRAM, especially at the 22nm node and below, where manufacturing costs were already astronomical and could only continue to increase at future technology nodes?

Using fabrication cost and performance data, Intel concluded that an alternative configuration was needed, and opted for an external high-density, high-bandwidth cache memory in the same package. An external memory was easier to fabricate than embedded SRAM in an advanced technology process where real estate was becoming scarce on the die. The DRAM cell was also much smaller than a six-transistor SRAM cell laid out at the same lithography node. Moreover, having a separate DRAM die in the same package as the processor reduced chip-interface delay compared with external DRAM in a different package. The eDRAM also required 1/5 of the keep-alive power compared with an SRAM device. This analysis led Intel to release its Haswell processor with an external eDRAM.
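The cell-size gap can be sketched with a back-of-envelope calculation. The cell areas below are typical textbook figures expressed in F² (where F is the process's minimum feature size); they are assumptions for illustration, not Intel-specific numbers:

```python
# Assumed, typical cell footprints in units of F^2 (F = minimum feature size).
SRAM_6T_CELL_F2 = 140    # a 6T SRAM cell is commonly quoted at ~120-150 F^2
DRAM_1T1C_CELL_F2 = 6    # a 1T1C DRAM cell is commonly quoted at ~6-8 F^2

def cache_cell_area_mm2(megabytes, cell_f2, feature_nm):
    """Raw cell-array area for a cache of the given capacity, ignoring
    peripheral overhead (sense amps, decoders, ECC), which in practice
    adds considerably more area."""
    bits = megabytes * 8 * 1024 * 1024
    f_m = feature_nm * 1e-9
    area_m2 = bits * cell_f2 * f_m ** 2
    return area_m2 * 1e6  # m^2 -> mm^2

# 128 MB (the Crystal Well eDRAM capacity) at a nominal 22 nm feature size
sram_area = cache_cell_area_mm2(128, SRAM_6T_CELL_F2, 22)
dram_area = cache_cell_area_mm2(128, DRAM_1T1C_CELL_F2, 22)
print(f"6T SRAM array: {sram_area:.0f} mm^2, 1T1C DRAM array: {dram_area:.0f} mm^2")
print(f"area ratio ~ {sram_area / dram_area:.0f}x")
```

Under these assumptions the DRAM array is more than 20x denser, which is the core of the real-estate argument above.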

The Intel Haswell GT3e G82494 processor came to market in October 2013 and was analyzed in our laboratories as part of our TechInsights Award program[3]. Our analysis of the GT3e revealed the general philosophy behind this innovative product: to address the frustrations experienced with SRAM scaling.

Package

Figure 1 is the package cross-section, which shows the processor and the embedded DRAM side by side. The Intel GT3e graphics processing unit (GPU) and the eDRAM were packaged in a multi-chip package (MCP) process: two dice placed side by side and flip-chip bumped to an FR4-type package substrate. One die was the eDRAM and the other was the Haswell processor; the eDRAM die area was one third that of the processor die. The dice and the package substrate were connected together by Cu pillars. The same packaging process was used for Intel's 32nm and 22nm logic processes.

[3] The Insight Awards, presented by TechInsights, showcase advancements in engineering innovation in electronics and semiconductor technology. The 2014 winner for Logic was the Intel Haswell GT3e G82494.

Refresh seems to consume about 0.5 W. The L4 latency is about half that of the DDR channels, and the bandwidth is about the same as 4 DDR4 channels. The typical hit rate is 95% (that would be remarkably good in general, but maybe they are aiming at the gamer market, where locality might be better).
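Those numbers can be turned into a quick average-memory-access-time (AMAT) estimate. The absolute latencies below are illustrative assumptions (the comment above gives only a ratio): suppose DDR latency is ~100 ns and the L4 eDRAM is about half that:

```python
# AMAT sketch for an access that tries the L4 eDRAM first and falls
# through to main memory on a miss. Latencies are assumed round numbers,
# not measured values.
def amat_ns(hit_rate, l4_latency_ns=50.0, dram_latency_ns=100.0):
    """Effective latency: every access pays the L4 lookup; misses also
    pay the main-memory latency."""
    hit_cost = hit_rate * l4_latency_ns
    miss_cost = (1.0 - hit_rate) * (l4_latency_ns + dram_latency_ns)
    return hit_cost + miss_cost

print(f"95% hit rate: {amat_ns(0.95):.1f} ns")  # -> 55.0 ns, close to the L4 latency
print(f"50% hit rate: {amat_ns(0.50):.1f} ns")  # -> 100.0 ns, no better than plain DDR
```

This is why the 95% figure matters: at that hit rate the cache hides most of the DRAM latency, while at 50% the eDRAM would buy essentially nothing.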

It's not clear this is better than a Memory Cube, but it's probably much cheaper.

The DRAM will leak charge without a refresh, which requires power many times every second; SRAM doesn't. Maybe SRAM no longer sleeps well enough to conserve continuous standby power? What happened to near-threshold operation?
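A rough sketch of the refresh arithmetic, using common commodity-DRAM parameters (assumed here; Intel's eDRAM retention and row counts are not given in the article):

```python
# DRAM cells must typically be refreshed within a ~64 ms retention window
# (a common commodity-DRAM spec, assumed for illustration). Rows are
# refreshed round-robin, so refresh commands arrive continuously whether
# or not the cache is actually being used.
RETENTION_MS = 64.0   # assumed retention window per cell
ROWS = 8192           # assumed rows per bank needing refresh

refresh_interval_us = RETENTION_MS * 1000.0 / ROWS   # spacing between row refreshes
refreshes_per_second = ROWS * (1000.0 / RETENTION_MS)

print(f"one row refresh every {refresh_interval_us:.2f} us")      # ~7.81 us
print(f"{refreshes_per_second:,.0f} row refreshes per second per bank")  # 128,000
```

Each refresh activates and restores a whole row, so this is a standing power cost that scales with capacity, which is consistent with the ~0.5 W refresh figure mentioned above.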

Interesting. The article only mentions that "The eDRAM also required 1/5 of the keep-alive power compared with an SRAM device." Could you explain more about eDRAM's constant refresh? Is that something that might be fixed someday, do you think?