Architecting Memory For Next-Gen Data Centers

The industry’s insatiable appetite for increased bandwidth and ever-higher transfer rates is driven by a burgeoning Internet of Things (IoT), which has ushered in a new era of pervasive connectivity and generated a tsunami of data. In this context, data centers are currently evaluating a wide range of new memory initiatives. All seek to optimize efficiency by reducing data transport, thus significantly improving performance while reducing power consumption.

To this end, DDR4 memory was introduced into servers as an evolutionary step forward. DDR4 delivers up to a 1.5x performance improvement over the previous generation of memory while reducing power by 25% on the memory interface. Converting to DDR4 translates into almost 8% power savings for the overall data center.
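As a back-of-envelope sketch, the interface-level and data-center-level savings are related by the fraction of total power that the memory interface consumes. The ~32% memory-power fraction below is an assumed figure chosen to reproduce the article's numbers, not a value stated in the original:

```python
# Back-of-envelope: how an interface-level power reduction maps to
# data-center-level savings. The memory_fraction value is an assumption
# for illustration, not a figure from the article.

def overall_savings(interface_reduction: float, memory_fraction: float) -> float:
    """Fraction of total data-center power saved when the memory
    interface's power drops by interface_reduction and memory I/O
    accounts for memory_fraction of total power."""
    return interface_reduction * memory_fraction

# DDR4 cuts memory-interface power by ~25%; if memory I/O were ~32% of
# total data-center power, the overall saving would be ~8%.
print(f"{overall_savings(0.25, 0.32):.0%}")
```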

The current generation of DDR4 deployed in servers runs at 2.4Gbps, and the maximum speed grade, 3.2Gbps, is expected to start shipping this year. Supporting 3.2Gbps memory has introduced challenges in both system and SoC design. As memory speeds rise above 2.4Gbps, careful signal integrity analysis of the memory channel is needed to ensure the system can operate at 3.2Gbps with comfortable margin. Understanding the channel requirements is critical for DDR PHY developers to ensure the PHY can support these operating conditions. Today there are very few companies that actually have working 3.2Gbps prototype hardware that can support server requirements.

Another, more revolutionary approach to increasing server memory performance is the introduction of High Bandwidth Memory (HBM). HBM is designed to bolster locally available memory by placing low-latency DRAM closer to the CPU. In addition, HBM DRAM increases memory bandwidth by providing a very wide, 1024-bit interface to the SoC. With a maximum per-pin speed of 2Gbits/s, HBM2 delivers a total bandwidth of 256Gbytes/s per stack. Although the per-pin rate is similar to DDR3 at 2.1Gbps, the eight 128-bit channels give HBM roughly 15X more bandwidth.
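The bandwidth figures above follow directly from the bus width and per-pin rate. A quick check, using a single 64-bit DDR3-2133 channel as the comparison point:

```python
# Worked numbers for the HBM2 bandwidth figures quoted above.

bus_width_bits = 1024          # 8 channels x 128 bits
pin_rate_gbps = 2.0            # HBM2 maximum per-pin data rate (Gbit/s)

hbm2_gbytes_per_s = bus_width_bits * pin_rate_gbps / 8   # bits -> bytes
print(hbm2_gbytes_per_s)       # 256.0 GB/s per stack

# Comparison point: one 64-bit DDR3-2133 channel (~17 GB/s).
ddr3_gbytes_per_s = 64 * 2.133 / 8
print(round(hbm2_gbytes_per_s / ddr3_gbytes_per_s, 1))   # ~15x
```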

Perhaps not surprisingly, mass-market deployment of HBM will present the industry with a number of challenges. This is because 2.5D packaging technology, along with a silicon interposer, increases manufacturing complexity and cost. In addition, HBM routes thousands of signals (data + control + power/ground) through the interposer to the SoC for each HBM memory used. Clearly, maximizing yield will be critical to making HBM cost effective, especially since a number of expensive components are mounted on the interposer, including the SoC and multiple HBM die stacks.

Nevertheless, even with the above-mentioned challenges, the advantage of having – for example – four HBM memory stacks, each delivering 256Gbytes/sec in close proximity to the CPU, provides a significant increase in both memory density (up to 8GB per HBM stack) and bandwidth when compared with existing architectures.

As we look to server requirements over the next five years, it is estimated that the total memory bandwidth will need to increase approximately 33% per year to keep pace with processor improvements. Given this projection, DRAM of all variants should achieve speeds of over 12Gbps by 2020 for optimal performance. Although this figure represents a 4X speed increase over the current DDR4 standard, Rambus Beyond DDR4 silicon has demonstrated that even traditional DRAM signaling still has plenty of headroom for growth and that such speeds – within reasonable power envelopes – are possible. In addition, the first production-ready 3200 Mbps DDR4 PHY recently became available on GlobalFoundries’ 14nm Low Power Plus (LPP) process.
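The projection above is simple compound growth: roughly 33% per year for five years, starting from the 3.2Gbps DDR4 top speed, lands comfortably past the 12Gbps mark:

```python
# Compound-growth check for the bandwidth projection above:
# ~33% per year over five years, starting from DDR4's 3.2 Gbps top speed.

ddr4_gbps = 3.2
annual_growth = 1.33

projected = ddr4_gbps * annual_growth ** 5
print(round(projected, 1))   # over 12 Gbps, i.e. roughly a 4x increase
```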

We at Rambus look forward to continuing our collaboration with industry partners and customers on cutting-edge memory technologies and solutions for future servers and data centers.

Frank Ferro

2 comments

Mr. Ferro – it would have been great to compare the relative power of DDR and HBM interfaces going forward. At a given bandwidth per bit, HBM1 and 2 consume dramatically less power. What are Rambus and other vendors doing to control power even more aggressively?

Savings in power consumption claimed for Hynix HBM1 compared to, say, GDDR5 range from 50% and up. They come mostly from the reduced parasitics and lower clock rate of a very wide data bus made possible by TSVs. But etching those holes is expensive, and some real estate around them has to be wasted due to stress, etc. At this point only really serious or vanity products (like the AMD game module) can afford them. There are cheaper alternatives.