Covering Everything about Memory Chips

What Memory Will Intel’s Purley Platform Use?

There has been quite a lot of interest over the past few days in Intel's apparently inadvertent disclosure of its server platform roadmap. Detailed coverage in The Platform showed a couple of slides with key memory information for the upcoming Purley server platform, which will support the Xeon "Skylake" processor family. (A review of this post on 7/13/17 revealed that The Platform's website has disappeared. The above link and the next one no longer work.)

The Memory Guy puzzled a bit over what this mystery memory might be. The only memory chip technology today with a cost structure lower than DRAM's is NAND flash, and no technology within the leaked roadmap's 2015-2017 time span is likely to change that. MRAM, ReRAM, PCM, FRAM, and the other alternative technologies can't beat DRAM's cost, and will probably take close to a decade to get to that point.

Since that's the case, what is this mystery memory? If we think of memory systems, rather than memory chips, one very plausible answer emerges: Intel may be referring, very obliquely, to Diablo's Memory Channel Storage or some similar approach.

Diablo's approach, which places NAND flash on the memory bus, puts a larger amount of memory on a single DIMM than standard DRAM allows, at a lower cost per gigabyte than DRAM, and its architecture supports higher bandwidth than most NAND flash interfaces provide.
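As a rough illustration of the economics, here is a back-of-envelope comparison. Every price and capacity below is an assumption chosen for illustration, not a quoted figure:

```python
# Back-of-envelope DIMM economics: DRAM vs. NAND flash on the memory bus.
# All prices and capacities are illustrative assumptions, not quotes.

def module_cost(capacity_gb: float, price_per_gb: float) -> float:
    """Total module cost for a given capacity and per-gigabyte price."""
    return capacity_gb * price_per_gb

dram_price_per_gb = 8.00   # assumed dollars per GB of DRAM
nand_price_per_gb = 0.40   # assumed dollars per GB of NAND flash

dram_dimm_gb = 64          # a large DRAM RDIMM of the era
nand_dimm_gb = 400         # a 400GB ULLtraDIMM-class module

print(f"DRAM DIMM: {dram_dimm_gb} GB for "
      f"${module_cost(dram_dimm_gb, dram_price_per_gb):,.0f}")
print(f"NAND DIMM: {nand_dimm_gb} GB for "
      f"${module_cost(nand_dimm_gb, nand_price_per_gb):,.0f}")
ratio = dram_price_per_gb / nand_price_per_gb
print(f"NAND DIMM holds {nand_dimm_gb / dram_dimm_gb:.1f}x the gigabytes "
      f"per slot at 1/{ratio:.0f} the cost per GB")
```

With these assumed numbers, the NAND module delivers several times the capacity of the DRAM module at a fraction of its total cost, which is the whole appeal of the approach.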

Although Diablo’s current customers include only SanDisk, with its ULLtraDIMM, and IBM, which sells the SanDisk product rebranded as the eXFlash DIMM, Diablo has repeatedly asserted that it is working with other vendors to increase the adoption of its technology. The Purley slides appear to indicate that Intel has decided to promote the approach.

This would be quite an accomplishment for Diablo, but it fits with a trend that Intel has been supporting for a number of years: to bring flash into the platform in order to unleash the processor’s capabilities.

You can safely bet that I will be watching the Purley platform closely for further disclosures of Intel’s memory plans! As I learn more I will share what I can.

I'm just basing it on what Micron and Intel are saying. Will it be cheaper than DRAM in the first generation? Maybe not. But according to this slide from Micron, http://www.reram-forum.com/wp-content/uploads/2015/04/New-Memory-300×220.gif , the performance-focused memory is supposed to be faster than NAND, cheaper than DRAM, and persistent. If STT-MRAM lends itself to being built in 3D like NAND, but at an even larger node, since it needs only 4x the density of DRAM, then it could potentially be produced at a lower cost right out of the gate.

Derek, you could be right, but I am pretty jaded after having seen a lot of other hopeful memory technologies fail to displace the entrenched competition.

Note, too, that I added a couple of comments below the post that point to the likelihood that the “Apache Pass” memory is an 800GB module, which is likely to be the next-larger capacity for SanDisk’s ULLtraDIMMs (or IBM’s eXFlash DIMMs), since today they ship in 200GB and 400GB capacities.

But a difference of opinion makes anything more fun to watch. Let’s keep an eye on this and see who was right and who was wrong.

Interestingly enough, NAND on the memory bus is good for computing of all sorts, not only Big Data.

Objective Analysis ran a series of PC benchmarks some time back that found that, after the first gigabyte or two of DRAM was installed, you got a greater performance boost by adding a dollar’s worth of NAND or SSD than if you added a dollar’s worth of DRAM.

It sounds interesting, but it is hard to believe that just 1-2GB of DRAM backed by a large NAND capacity in a PC-class system greatly boosts application performance all the time. Surely it depends on the working-set size of the workload, doesn't it? Jim, can you tell me more about the results from Objective Analysis? I am wondering how they drew that conclusion, with which system configuration, and with which workloads. Does the report also include a cost-effectiveness model to justify the argument?

The report presents the benchmarks' findings in two ways: performance as a function of DRAM and NAND flash size, and performance as a function of cost.

The cost analysis basically says: “For a combined memory/storage cost of $X you can get this much performance with a flash-heavy approach, and that much performance with a DRAM-heavy approach.” Of course, the report puts real numbers around this, but I didn’t include them here.

In all cases, though, for a fixed-cost system, once the DRAM reaches a certain relatively small size, performance increases as you decrease DRAM and increase NAND.

Thank you for sharing an interesting article. It would be interesting to understand the load/store access latency of "Apache Pass". Does it make sense to introduce something slower (think random access!) than DRAM on the memory bus? What percentage of its time does a processor spend waiting for data to load from DRAM? Does Diablo's approach solve these issues?

Harish’s question is on point: Much has been made of the I/O performance benefits of putting (relatively slow) nonvolatile media on the fast, fixed-latency memory bus. I suspect Intel’s architects get a good chuckle out of this.

For any workload that doesn't fit in L3, modern server CPUs are badly memory-bound. This excellent paper (http://www.usc.edu/dept/ee/scip/assets/001/56439.pdf) shows that for TPC-E, a Xeon experiences an L3 miss rate of only 1.5%, but nonetheless spends 19 out of every 20 instruction slots stalled waiting for DRAM! The limit on the number of cores in a modern server CPU is simply that each additional core adds computational capacity but consumes memory bandwidth, starving all the others; at some point the marginal return is negative. Intermixing multi-microsecond accesses with DRAM traffic can only make this worse. It will doubtless produce great I/O benchmark scores, but for many workloads it will kill the overall performance of the machine, and Intel clearly knows this.
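The arithmetic behind that kind of stall figure is easy to sketch. Below is a toy in-order execution model; the base CPI, memory references per instruction, and DRAM latency are invented parameters for illustration, not the paper's measurements:

```python
# Rough model of how DRAM stalls dominate execution time, in the spirit of
# the TPC-E measurements cited above. All parameters are assumptions.

def stalled_fraction(base_cpi: float,
                     mem_refs_per_instr: float,
                     l3_miss_rate: float,
                     dram_latency_cycles: float) -> float:
    """Fraction of cycles spent stalled on DRAM in a simple in-order model."""
    stall_cpi = mem_refs_per_instr * l3_miss_rate * dram_latency_cycles
    return stall_cpi / (base_cpi + stall_cpi)

frac = stalled_fraction(base_cpi=0.25,           # ideal CPI with no misses
                        mem_refs_per_instr=0.5,  # loads+stores per instruction
                        l3_miss_rate=0.015,      # 1.5% of references miss L3
                        dram_latency_cycles=300)
print(f"Fraction of time stalled on DRAM: {frac:.0%}")
# prints: Fraction of time stalled on DRAM: 90%
```

Even with a tiny miss rate, a miss penalty of hundreds of cycles means the stall component of CPI swamps the compute component, which is the point Harish and the paper are making.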

If the load/store latency of Apache Pass technology is fixed (like DRAM) and relatively fast, then sharing the DRAM bus may make very good sense. If not, expect Intel to either enhance PCIe, attach it to QPI, or add some other port to access it without disrupting memory traffic. The key, as Harish implies, is that the latency requirement is less about making I/O fast than it is about staying out of the way of DRAM traffic.

Good point. The idea of adding NAND to the DRAM bus is to cut the latency of going to NAND through an HDD interface. Some have tried to do this by stalling the DRAM bus altogether to wait for the NAND access, but the Diablo approach (as I understand) is to DMA the NAND data into DRAM once a page fault has been determined.

As you point out, either of these will reduce DRAM bandwidth to the CPU, and this must be traded off against the alternative, which may hang up other parts of the process. There is no doubt that there will be certain applications for which a NAND DIMM will be a great choice, and others for which it won’t be useful. We’ll learn more about which is which over the next few years.
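A simple hit/miss model makes that workload dependence concrete. The latencies below are assumptions, and Diablo's DMA mechanics are reduced to a single miss penalty, but the model shows why a workload whose hot set fits in DRAM barely notices the NAND behind it while one that misses often does not:

```python
# Toy model: average access latency when a DRAM tier fronts NAND on the
# memory bus. Both latency figures are assumptions, not measurements.

def avg_latency_ns(dram_hit_rate: float,
                   dram_ns: float = 100.0,       # assumed DRAM access latency
                   nand_dimm_ns: float = 5000.0  # assumed NAND-DIMM miss penalty
                   ) -> float:
    """Average access latency when DRAM misses are served from NAND."""
    return dram_hit_rate * dram_ns + (1.0 - dram_hit_rate) * nand_dimm_ns

for hit_rate in (0.999, 0.99, 0.90):
    print(f"DRAM hit rate {hit_rate:.1%}: "
          f"average latency {avg_latency_ns(hit_rate):,.0f} ns")
```

Under these assumptions, dropping the DRAM hit rate from 99.9% to 90% multiplies the average latency several times over, which is why the same NAND DIMM can be a great choice for one application and a poor one for another.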

Hey guys, I came across this post after reading a rather lengthy and what seems like a well researched article on Seeking Alpha. I know very little about memory so I would greatly appreciate your input. The author claims that Intel could use Micron’s PCM memory in order to achieve what you guys are discussing above. If you have a chance, please take a look at the article and let me know if you think it makes sense, it’s possible but a longshot, or just pure nonsense. Thanks!

When I saw that the article was on Seeking Alpha I was apprehensive, and it unfortunately met my expectations. Most of what I read on that site consists of lengthy discourses, steeped in conspiracy theories, covering misunderstood technologies and markets in overly-prosaic language laced with spelling and grammatical errors.

In brief, I don't buy it. If you're looking for investment guidance, I would suggest you look to other sources.

Apache Pass on a client machine can improve the user experience compared to a PCIe SSD: faster boot times, and no more sleep mode; instead, instant-on from a zero-power, zero-time hibernate with supercap-backed RAM. Nothing is faster than DRAM, but Jim's premise is about price/performance, not raw performance.

It’s hard to tell whether or not 3D XPoint has anything to do with Purley & Apache Pass.

My guess is that this new chip is too new to play a big role in any pending platform. It usually takes at least 5 years to bring any new semiconductor technology to the point that it meets its cost goal. Most of the time it takes far longer.

My prediction is based on the cautious assumption that NAND will be abundant and cheap, but that 3D XPoint may take quite some time to actually become cheaper than DRAM.

I just wondered what you thought of RRAM as touted by Crossbar. It looks more cost-effective, less power-hungry, and able to be packed at higher density than any of its competition. I can see some drawbacks, but every system has at least one drawback (some more).

Crossbar isn’t alone in having a good technology. There are lots of very good alternative technologies out there, all poised to change computing as we know it.

The problem for any of them is getting into volume production. Until they reach volume they will be too expensive to make a difference, and if they are too expensive then they won’t get into volume production.

I like to say that “The road to hell is paved with technologically superior products.” Technology is not the issue here. The issue is cost.