Forget Moore’s law: Hot and slow DRAM is a major roadblock to exascale and beyond

DRAM is pretty amazing stuff. The basic structure of the RAM we still use today was invented more than forty years ago and, just like its CPU cousin, it has continually benefited from huge improvements in fabrication technology and density. Less than ten years ago, 2GB of RAM was considered plenty for a typical desktop system; today, a high-end smartphone offers the same amount of memory at a fifth of the power consumption.

After decades of scaling, however, modern DRAM is starting to hit a brick wall. Much as the CPU gigahertz race ran out of steam, DRAM's high latency and power consumption have become some of the most significant bottlenecks in modern computing. As supercomputers move toward exascale, there are serious doubts about whether DRAM is actually up to the task, or whether a whole new memory technology is required. Clearly there are some profound challenges ahead, and there's disagreement about how to meet them.

What’s really wrong with DRAM?

A few days ago, Vice ran an article that actually does a pretty good job of talking about potential advances in the memory market, but includes a graph I think is fundamentally misleading. That’s not to sling mud at Vice — do a quick Google search, and you’ll find this picture has plenty of company:

The point of this image is ostensibly to demonstrate how DRAM performance has grown at a much slower rate than CPU performance, thereby creating an unbridgeable gap between the two. The problem is, this graph no longer properly illustrates CPU performance or its relationship to memory. Moore's law has stopped functioning at anything like its historic level for CPUs or DRAM, and "memory performance" is simply too vague a term to accurately describe the problem.

The first thing to understand is that modern systems have vastly improved the bandwidth-per-core ratio compared with where we sat 14 years ago. In 2000, a fast P3 or Athlon system had a 64-bit memory bus connected to an off-die memory controller clocked at 133MHz. Peak bandwidth was 1.06GB/s while CPU clocks were hitting 1GHz. Today, a modern processor from AMD or Intel is clocked between 3 and 4GHz, while modern RAM runs at 1066MHz (2133MT/s effective for DDR3), or around 17GB/s of peak bandwidth per channel. Meanwhile, we've long since started adding multiple memory channels, brought the memory controller on-die, and clocked it at full CPU speed as well.
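For the curious, the peak-bandwidth arithmetic above is simple enough to sketch in a few lines of Python (the dual-channel figure in the last line is my own illustrative addition, not a specific system from the article):

```python
# Peak theoretical DRAM bandwidth: bus width (bytes) x transfer rate x channels.

def peak_bandwidth_gbs(bus_bits, transfers_mts, channels=1):
    """Peak bandwidth in GB/s for a given bus width and transfer rate."""
    return (bus_bits / 8) * (transfers_mts * 1e6) * channels / 1e9

# Year-2000 system: 64-bit bus on a 133MHz off-die controller.
print(peak_bandwidth_gbs(64, 133))               # ~1.06 GB/s
# DDR3-2133: 64-bit bus at 2133 MT/s, single channel.
print(peak_bandwidth_gbs(64, 2133))              # ~17 GB/s
# The same DDR3-2133 in a dual-channel configuration (assumption).
print(peak_bandwidth_gbs(64, 2133, channels=2))  # ~34 GB/s
```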

The problem isn't memory bandwidth; it's memory latency and memory power consumption. As we've previously discussed, DDR4 actually moves the dial backwards as far as the former is concerned, while improving the latter only modestly. It now looks as though the first generation of DDR4 will have some profoundly bad latency characteristics; Micron is selling DDR4-2133 timed at 15-15-15-50. For comparison, DDR3-2133 can be bought at 11-11-11-27, and that's not even the highest-end premium RAM. This latency hit means DDR4 won't actually match DDR3's performance for quite some time.
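To see what those timings mean in wall-clock terms, here's a quick sketch converting CAS latency from cycles into nanoseconds; the DDR4-3200 CL18 entry is an illustrative assumption of mine, not a shipping part:

```python
# CAS latency in nanoseconds. DDR transfers data twice per I/O clock,
# so the I/O clock in MHz is half the data rate in MT/s.

def cas_latency_ns(cl_cycles, data_rate_mts):
    io_clock_mhz = data_rate_mts / 2
    return cl_cycles / io_clock_mhz * 1000  # cycles/MHz = microseconds; x1000 = ns

print(cas_latency_ns(15, 2133))  # DDR4-2133 CL15: ~14.1 ns
print(cas_latency_ns(11, 2133))  # DDR3-2133 CL11: ~10.3 ns
print(cas_latency_ns(9, 1600))   # DDR3-1600 CL9:  ~11.3 ns
print(cas_latency_ns(18, 3200))  # hypothetical DDR4-3200 CL18: ~11.3 ns
```

Note how DDR4-3200's higher clock, at those assumed timings, only just cancels out the looser cycle counts.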

This is where the original graph does have a point: latency has only improved modestly over the years, and we'll be using DDR4-3200 before we get back to DDR3-1600 latencies. That's an obvious issue, but it's not actually the problem that's holding exascale back. The problem for exascale is that DRAM power consumption is currently much too high.

The current goal is to build an exascale supercomputer within a 20MW power envelope, sometime between 2018 and 2020. Exascale describes a system with an exaflop (1,000 petaflops) of processing power and perhaps hundreds of petabytes of RAM (current systems max out at around 30 petaflops and only a couple of petabytes of RAM). If today's best DDR3 were used for the first exascale systems, the DRAM alone would consume 54MW of power. Clearly, massive improvements are needed. So how do we find them?
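As a back-of-the-envelope check, here's what the article's 54MW figure implies, assuming a round 100PB of DRAM (my own stand-in for "hundreds of petabytes"):

```python
# Implied DRAM wall power for an exascale machine, from the article's
# 54MW figure and an assumed 100PB capacity. 1 PB = 1e6 GB.

capacity_gb = 100 * 1e6            # 100PB of DRAM (assumption)
watts_per_gb = 54e6 / capacity_gb  # what 54MW implies: ~0.54 W/GB
budget_mw = 20                     # power envelope for the *entire* machine

print(watts_per_gb)                # ~0.54 W per GB of DDR3
print(54 / budget_mw)              # DRAM alone would be ~2.7x the whole budget
```

Even under generous assumptions, the memory alone would burn through the entire 20MW envelope several times over before a single FLOP was computed.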

“A few days ago, Vice ran an article that actually does a pretty god job of talking about potential advances…”

Vice is pretty cool, but I don't think they ever claimed to be god. Anyway, somebody will invent graphene-laser-optical DRAM and we'll be off to the races again!

Joel Hruska

Pretty much no. :P But it’s nice to dream.

jburt56

Time for different concepts.

Guest

Not if Micron can help it!

It’s a wonder with HMC and everything else on the horizon that they’re bothering with DDR4 at all.

Joel Hruska

You’ve got that the wrong way around. It’s a wonder that people keep predicting that HMC will suddenly appear and Change Everything, given that memory companies have repeatedly said that DDR4 is the mainstream memory solution of the next 3-4 years.

Patrick Proctor

Wrong. HMC is launching on servers built by IBM starting in Q1 2015.

Joel Hruska

[Citation Needed]

Show me where IBM has committed to using HMC for Power8. Power8 uses its own DRAM buffer and CAPI, as discussed here:

Wait, in the last year IBM added HMC support for Power8? And that’s going into Sierra? Is that what this says?

disqus_KBlVJmFRRm

I'm not impressed at all by DDR4, especially since I just got 48GB of low-profile, low-voltage DDR3 server memory working and overclocked to 13GB/s of bandwidth, which is competitive with DDR4 offerings. Yes, my memory is using slightly more power to get there, but the point is that HMC looks like a much better successor to me than this DDR4 "innovation," which isn't much of one when the industry pushing it is having such a hard time getting it adopted. Oh, and the 48GB of ECC DDR3 I'm running is on a micro-ATX board, so it's SFF too. DDR4: I'm not impressed at all.

Zunalter

If memory tech is anything like battery tech, there are 10 game-changing breakthroughs a day that we hear about once and never again.

Joel Hruska

There’s some of that in memory, true enough, but I mostly don’t write about it. I try to limit my coverage of technology to emerging trends that could actually *go* somewhere.

Rartemass

Just use a graphics card for memory; can't they do everything these days anyway? :P

samlebon23

CPU and GPU memory are separated by a huge wall: PCI Express. The latency between the two is horrible, which is why AMD's HSA is the best approach.

Hans van den Bogert

It depends (as always). HSA memory bandwidth is on the order of 10GB/s, whereas a dedicated GPU, once the data is in GPU memory, gets around 100GB/s. It really depends on the workload: whether you need sheer bandwidth and your computation per byte is high, or you have unpredictable access patterns and need low latency.
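One rough way to frame that tradeoff, using the ballpark bandwidth figures from the comment above (not measured numbers):

```python
# Crude lower bound: the time to stream a working set at a given bandwidth.
# High computation-per-byte workloads favor the ~100GB/s on-card memory;
# latency-bound, irregular-access workloads may favor shared CPU/GPU memory.

def streaming_time_s(bytes_moved, bandwidth_gbs):
    return bytes_moved / (bandwidth_gbs * 1e9)

working_set = 8e9                          # e.g. one billion doubles (assumption)
print(streaming_time_s(working_set, 10))   # ~0.8 s at HSA-like bandwidth
print(streaming_time_s(working_set, 100))  # ~0.08 s in dedicated GPU memory
```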

Simon

Why not use the same technology that's used for on-die L2/L3 cache in the chips on a DIMM? That would solve the RAM latency issue.

szatkus

Ask Wiki:

“SRAM is more expensive and less dense than DRAM and is therefore not used for high-capacity, low-cost applications such as the main memory in personal computers.”

Joel Hruska

Ahah! That’s actually an excellent question. There are several answers.

1). Cost and density — SRAM typically requires six transistors per bit, while DRAM needs just one transistor and a capacitor. That makes SRAM far less dense and far more expensive per gigabyte.

2). Distance — Remember, one reason L2 is fast is because L2 is literally on-die. DRAM has its own bus structure. It makes it cheaper, but it also makes it slower.

3). Power — Cache eats a lot more power than DRAM per KB.

samlebon23

Cost.

samlebon23

You need six transistors to build an SRAM bit, and just one transistor (plus a capacitor) for a DRAM bit.
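A crude sketch of what that difference means for density (my own simplification; treating each transistor or capacitor as one unit of area understates the real gap):

```python
# Rough area-per-bit comparison: a standard 6T SRAM cell vs a 1T1C DRAM cell.

sram_elements_per_bit = 6  # six transistors per SRAM bit
dram_elements_per_bit = 2  # one transistor plus one capacitor per DRAM bit

print(sram_elements_per_bit / dram_elements_per_bit)  # ~3x area per bit, minimum
```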

dc

Sounds like we need to worry more about building nuclear power plants to meet the demand.

Patrick Proctor

Hybrid Memory Cube is already the solution, and it's hitting servers in 2015.
