New PC industry consortium to develop next-gen memory interconnect


The entire computing industry has a memory problem, and a new consortium of industry partners, dubbed Gen-Z, hopes to solve it. For decades, DRAM has driven virtually every segment of the computing market, from smartphones to supercomputers, but new classes of memory devices already threaten that dominance. What’s needed is a new memory interface that can tie these various components together, and that’s where Gen-Z comes in.

Some of the problems with DRAM performance scaling are long-standing, well-known issues. Generally speaking, the amount of bandwidth available per core has continued to decrease, despite the advent of DDR4. Consider the difference between Intel’s Core i7-6950X, with 10 CPU cores and a total bandwidth of 76.8GB/s when using DDR4-2400, versus the Core i7-4960X, with six cores and 59.7GB/s from DDR3-1866. The total bandwidth available to the Core i7-6950X is nearly 30% higher, but the 6950X also has 10 cores and 20 threads, compared with the 4960X’s six cores and 12 threads. Bandwidth per core has indeed gone down, from 9.95GB/s per core for the 4960X to 7.68GB/s per core for the 6950X.

This difference persists even if we assume the user steps outside Intel’s official specs and uses the highest-end RAM realistically available. A quad-channel Core i7-4960X with DDR3-3100 would offer 99.2GB/s of bandwidth (16.5GB/s of bandwidth per core) while a Core i7-6950X with DDR4-4266 offers 136.51GB/s of bandwidth, or 13.65GB/s per core. No matter which components you choose, the amount of bandwidth available per core is going down.
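
For readers who want to check the arithmetic, here is a quick sketch of how those figures are derived: peak bandwidth is the transfer rate times 8 bytes per transfer times the number of channels. The part names and core counts are the ones quoted above, and nothing here accounts for real-world efficiency losses.

```python
# Rough peak-bandwidth arithmetic for quad-channel DDR3/DDR4 configurations.
# Peak GB/s = transfer rate (MT/s) * 8 bytes per transfer * channels / 1000.

def peak_bandwidth_gbs(mt_per_s, channels=4, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s."""
    return mt_per_s * bytes_per_transfer * channels / 1000

configs = [
    ("Core i7-4960X, DDR3-1866", 1866, 6),
    ("Core i7-6950X, DDR4-2400", 2400, 10),
    ("Core i7-4960X, DDR3-3100", 3100, 6),
    ("Core i7-6950X, DDR4-4266", 4266, 10),
]

for name, rate, cores in configs:
    total = peak_bandwidth_gbs(rate)
    print(f"{name}: {total:.1f} GB/s total, {total / cores:.2f} GB/s per core")
```

Run it and the per-core figures quoted above fall straight out of the division.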

So instead of just beating our heads against that fundamental limit, Gen-Z wants to beef up the performance of next-generation interconnects that might be used to tap these emerging types of memory — some of which need to be connected in ways not covered by current standards.

Today, the majority of systems contain DRAM and some type of storage, be it HDD or SSD. That’s going to start changing in the not-too-distant future, as High Bandwidth Memory (HBM), Hybrid Memory Cube (HMC), and Managed DRAM are more widely adopted. Other technologies, like Resistive RAM (RRAM), 3D XPoint (aka Intel’s Optane), magnetic RAM (MRAM), and low-latency NAND, will also be deployed in various systems and components. Gen-Z’s stated goal is to build a “memory semantic fabric” that handles communications as memory operations with sub-microsecond latencies, measured from the time the CPU issues a load command to the time data actually lands in a register.
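
To make “memory semantics” concrete: instead of issuing block I/O requests through a storage stack, software would map fabric-attached memory into its address space and touch it with ordinary loads and stores. The sketch below is purely illustrative and assumes a hypothetical device node at /dev/genz0; it says nothing about the actual Gen-Z programming model.

```python
import mmap
import os

# Hypothetical: fabric-attached memory exposed by the OS at /dev/genz0.
# Once mapped, reads and writes are plain memory operations (loads and
# stores), not block I/O requests routed through a storage driver.
fd = os.open("/dev/genz0", os.O_RDWR)
region = mmap.mmap(fd, 4096)

region[0:4] = b"ping"       # a store into fabric-attached memory
value = bytes(region[0:4])  # a load back out of it

region.close()
os.close(fd)
```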

That Gen-Z is talking about sub-microsecond latencies leaves a lot of room for speculation as far as final performance is concerned. DRAM is technically a sub-microsecond memory, but DRAM latency is typically measured in tens to hundreds of nanoseconds (the exact figure depends on the type of operation being measured, the DRAM’s timings, and the speed of the memory controller integrated on the CPU). Gen-Z could therefore offer substantially faster performance for certain kinds of attached hardware than today’s interconnect standards, which are often much slower than DRAM.

The long-term goal of Gen-Z is to tie the entire attached ecosystem of products together on a single open standard that can support sub-100ns load-to-use memory latencies in at least some cases. How often that target is hit will depend at least partly on which type of memory is being discussed, which is probably one reason why the Gen-Z presentation doesn’t contain a lot of hard figures. A number of significant companies are backing the initiative, including AMD, ARM, Broadcom, Cray, Dell, HP, Huawei, Micron, Samsung, SK Hynix, and Xilinx.

The one major company missing is the one you might expect to lead such an endeavor: Intel. Intel controls upwards of 98% of the server and enterprise markets, so its hardware is what you would expect any new memory standard to be compatible with. Even if you think AMD and ARM are poised to seize significant chunks of the data center market, that kind of growth takes years to build. Enterprise giant Cisco is also nowhere to be found. More details and specification information are expected before the end of the year, implying this project has already been in the works for quite some time.

Very thin fiber could be used for interconnects such as PCI-Express and to replace various PCB traces, but not inside CPUs or GPUs; it is not dense enough. Yes, it could also be used to connect memory DIMMs at much lower latency (maybe even sub-nanosecond latency over very short distances, if both the memory and the CPU can handle it), perhaps with a modest power-efficiency gain.
When? When copper runs completely out of steam. They are going to drain every last bit of it.

gc9

Nice examples of falling bandwidth per core, but GEN-Z doesn’t directly address bandwidth per core, does it? On-package RAM such as HBM or HMC will improve bandwidth to RAM. Off-package and rack-scale GEN-Z might address falling delivered memory capacity per core (maybe due indirectly to bandwidth bottleneck). It might increase the memory that can be addressed by a processor, currently restricted to local memory sockets, to all memory in a rack enclosure. This off-package memory will be slower than RAM, but storage-class memory devices will be persistent.

Storage-class memory devices might greatly reduce the need to copy data from virtual memory-mapped files into RAM for databases and read-once files (assuming memory throughput can be maintained so buffering in RAM is not needed). So more RAM is available for uses other than buffering memory-mapped files.

Rack-scale storage-class memory devices will be more complex than current RAM, because they may be accessed from any processor, not just one processor. So the storage-class memory device must:
– receive and send more complex packets with routing info,
– provide more complex atomic operations (to avoid multiple transits across the rack network),
– create memory-management-unit-like protection zones, so that processes on any processor cannot touch addresses they do not have permission for, and possibly provide xor encryption and ECC (a toy sketch of such a check follows this list).
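
As a toy illustration of that last point (entirely hypothetical, and not based on any published Gen-Z mechanism), a device-side check might key access rights to a requester ID and an address range:

```python
# Toy device-side protection zones: each zone grants specific requester IDs
# read and/or write access to one address range. Hypothetical sketch only.
zones = [
    {"start": 0x0000, "end": 0x4000, "readers": {1, 2}, "writers": {1}},
    {"start": 0x4000, "end": 0x8000, "readers": {3},    "writers": {3}},
]

def access_allowed(requester_id, addr, is_write):
    """Return True if the requester may touch this address."""
    for zone in zones:
        if zone["start"] <= addr < zone["end"]:
            allowed = zone["writers"] if is_write else zone["readers"]
            return requester_id in allowed
    return False  # addresses outside any zone are off-limits

print(access_allowed(2, 0x1000, is_write=False))  # True: requester 2 may read zone 0
print(access_allowed(2, 0x1000, is_write=True))   # False: requester 2 may not write it
```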

Processors might become more threaded to better tolerate longer latencies to more remote storage-class memory.

The set of atomic semantic-memory operations might be extensible so that database, big-data cluster, and HPC software can optimize the operations they need, perhaps via FPGA if not a controller processor. (Start with read-modify-write operations like increment or add a number to a total, then imagine corresponding operations to query or add to a buffer, or a distinct set of keys, etc.)
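
To illustrate the kind of read-modify-write primitive being described (again, only a sketch of the idea, not any real interface), a handler that applies a registered operation next to the memory turns what would otherwise be a read, a compute, and a write, each crossing the rack fabric, into a single request:

```python
import threading

# Sketch only: an extensible table of read-modify-write operations executed
# near the memory, so one request replaces multiple transits across the
# rack network. This does not reflect any actual Gen-Z interface.
class RemoteMemorySketch:
    def __init__(self, size):
        self._cells = [0] * size
        self._lock = threading.Lock()
        # Built-in operations; software could register more (e.g. via FPGA).
        self._ops = {"add": lambda old, arg: old + arg,
                     "max": lambda old, arg: max(old, arg)}

    def register_op(self, name, fn):
        self._ops[name] = fn

    def atomic(self, op, addr, arg):
        """Atomically apply a registered op to one cell; return the old value."""
        with self._lock:
            old = self._cells[addr]
            self._cells[addr] = self._ops[op](old, arg)
            return old

mem = RemoteMemorySketch(16)
mem.atomic("add", 0, 5)         # add a number to a running total
print(mem.atomic("add", 0, 0))  # prints 5: the previous total
```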

David Swanson

Quantum holographic entangled memory (QHEM), which is a superluminal type of memory: each processor has instantaneous temporal access to all memory, wherever that memory is located in physical space as well as temporal space. Information created in the future will be accessible to the past once the system bus is first created. Well, that is how it’s done in the Tabby star system KIC 8462852 ;) LOL.

Lorfa

Read about ‘dust theory’ and also the book ‘Permutation City’ by Greg Egan :-)

Bert Tweetering

ARM has lots of catching up to do over IBM’s POWER (if that’s even possible with ARM architecture). x86_64 might be able to get close.

