eDRAM: No Brainer…But No Takers?

By Steve Hamilton
Designers in the consumer electronics market—mobile in particular—are constantly looking for new ways to reduce cost and power while increasing performance. This is far from novel. With consumers’ unrelenting demand for more features at lower prices, you would think semiconductor companies would jump when confronted with a technology that gives them a real competitive edge. The technology I’m referring to is embedded DRAM (eDRAM). I know you may be thinking the cost is prohibitive and this would offset any potential benefit. Hold that thought.

Before we get to cost, let’s first look at the SoC architecture and how the memory subsystem plays a key role in the overall efficiency of the SoC. I say efficiency because with distributed heterogeneous architectures, if careful consideration is not given to the memory subsystem then it will impact many aspects of the chip, especially performance and power. A couple of brief examples may convince you:

The memory subsystem affects performance: This one should be the most obvious. If requests from the various processors are not serviced quickly because of high latency, the application's performance will suffer (i.e., you wait).

The memory subsystem affects power and cost: When requests are not serviced quickly, the processor cores have to spend more time in active mode, consuming more power. The problem only gets worse as you try to overcome performance issues with techniques such as increasing the chip frequency or increasing buffer sizes to allow more requests to be handled, because both add power and cost to the system. Adding a second CPU core may also solve the performance problem, but again this increases cost and power.
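To make the latency argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a simple in-order core that stalls for the full memory latency on every cache miss. The function name and every number in it (clock speed, miss rate, latencies) are hypothetical values chosen for illustration. They are not taken from the simulations discussed in this article or from any real part.

```python
def effective_mips(clock_mhz, base_cpi, mem_refs_per_instr, miss_rate, miss_penalty_ns):
    """Estimate MIPS for a core that stalls on every cache miss.

    All parameters are illustrative assumptions:
      clock_mhz          -- core clock frequency in MHz
      base_cpi           -- cycles per instruction with a perfect memory system
      mem_refs_per_instr -- average memory references per instruction
      miss_rate          -- fraction of references that miss the cache
      miss_penalty_ns    -- time to service one miss from DRAM, in nanoseconds
    """
    cycle_ns = 1000.0 / clock_mhz                    # length of one clock cycle
    penalty_cycles = miss_penalty_ns / cycle_ns      # miss latency in cycles
    stall_cpi = mem_refs_per_instr * miss_rate * penalty_cycles
    return clock_mhz / (base_cpi + stall_cpi)        # instructions per microsecond


# Hypothetical comparison: same core, same cache, only the DRAM latency differs.
slow = effective_mips(500, 1.0, 0.3, 0.05, miss_penalty_ns=100)  # assumed higher latency
fast = effective_mips(500, 1.0, 0.3, 0.05, miss_penalty_ns=20)   # assumed lower latency
print(f"higher-latency memory: {slow:.0f} MIPS")
print(f"lower-latency memory:  {fast:.0f} MIPS")
```

Even in this crude model, the core clock never changes, yet throughput rises as the miss penalty falls. Every cycle not spent stalled is also a cycle the core can spend in a low-power state sooner.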

So what do SoC designers need in order to improve the efficiency of DRAM subsystems? The answer is to minimize read latency and increase memory bandwidth (note that this is the primary function of the cache). The frustrating thing for embedded SoC developers, however, is that what they get from DRAM manufacturers is higher-density DRAMs with no improvements in latency. DRAM vendors keep marching down the path they understand, namely more density with each generation. They have reluctantly moved to newer specs that increase I/O bandwidth, but these specs are hard to use—relying for example on accesses in larger chunks than processors may need. As long as the server market, which requires density, provides the majority of the demand, there is insufficient motivation for the DRAM vendors to optimize for what the mobile market really needs—lower latency.

As an alternative to using external DRAM, developers should now consider using embedded DRAM. eDRAM can be used either in combination with, or sometimes as a total replacement for, external DRAM. In many cases, this is more effective than trying to make use of DRAM that was designed primarily for the computing market. eDRAM has the potential to radically improve latency and bandwidth, thereby improving overall SoC performance while reducing power. Simulations of embedded systems using eDRAM show about a 2.5x to 4x improvement in MIPS compared with the same processor using external DRAM. These simulations also show that using eDRAM delivers more system performance than adding a second processor! That's right, one processor plus eDRAM has higher performance than two processors with external DRAM. Power showed significant improvement too, with external DRAM consuming 2x to 5x more power than eDRAM (and that is without counting the power consumption of the PHY).

Given so much potential, what's the holdup? Unfortunately, eDRAM is largely misunderstood by many SoC developers. There are several different technologies used for eDRAM. The best known uses trench capacitors for storage. This requires a special and expensive process, and is most widely available in SOI, so most people think of eDRAM as expensive. But there are other eDRAM technologies that use metal-insulator-metal stack capacitors for storage. These have few or no special process steps, are almost as dense, work on bulk CMOS, and cost only a little more than plain vanilla CMOS. For these reasons, I really think developers are missing a big opportunity by passing on eDRAM.

I understand that simulation results and actual chip results are not the same. But with so much potential (one processor doing the job better than two, with the memory consuming as little as 1/5 of the power), it seems like a no-brainer to at least do some further investigation.