New CPU Architectures Promise Performance Boost

Moore's Law is far from dead. While making transistors even smaller is becoming difficult, creative ways to speed up overall system performance are about to have a major impact on the design of CPU chips and the surrounding elements they control.

The impact of these new approaches will make it possible to sustain, and perhaps even accelerate, the growth of system performance, so that new demands such as the Internet of Things can be handled economically. At the same time, the boundaries of high-performance computing will expand dramatically.

Much faster computing will also bring mobile devices far better graphics and truly responsive speech recognition. Internet searches and website loading should speed up, too.

The art of CPU design is to pack as much functionality as possible as close together as feasible, with the fastest interconnections, without frying the chip. Until recently, this meant shrinking the die area each function occupies through die shrinks built on ever-smaller transistors. Meanwhile, the external interconnect electronics on the die slowly grew, since faster connections were needed to keep up. Consequently, chips have gotten hotter, and the balance between CPU cores and memory/IO has been drifting out of kilter for some time.

Within the CPU, we are seeing two major new initiatives to address this imbalance. The memory bottleneck is being attacked by a completely new serial-channel approach, the Hybrid Memory Cube (HMC), while interconnect is seeing changes both in external links and in the way that cores and CPU modules talk to each other.

As a result, CPUs can sustain more cores within a given power envelope, but the better news is that system-level performance will increase far more, thanks to the memory and interconnect acceleration.

Hybrid Memory Cube envisions a 3D stack of memory devices mounted on a management die. The stacking process relies on through-silicon vias (TSVs) -- vertical connections running through each layer of the stack.

Partnering with Micron, Intel recently announced some preliminary HMC products. The Knights Landing CPUs are 72-core versions of the Xeon Phi, aimed at high-performance computing (HPC). These promise as much as 15x the memory bandwidth of today's DDR3. For applications that take an in-memory approach, such as databases, this will be a massive performance boost.
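To put that multiplier in perspective, here is a rough back-of-envelope comparison. The DDR3 figures are illustrative (a common dual-channel DDR3-1600 configuration), not vendor specifications:

```python
# Back-of-envelope check of the "15x DDR3" claim.
# DDR3-1600 moves 1600 MT/s over a 64-bit (8-byte) channel.
ddr3_channel = 1600e6 * 8            # ~12.8 GB/s per channel
dual_channel = 2 * ddr3_channel      # a common desktop configuration
hmc_estimate = 15 * dual_channel     # applying the article's multiplier

print(f"Dual-channel DDR3-1600: {dual_channel / 1e9:.1f} GB/s")
print(f"15x that figure:        {hmc_estimate / 1e9:.1f} GB/s")
```

Even against a modest DDR3 baseline, the 15x figure lands in the hundreds of gigabytes per second, which is why in-memory workloads stand to gain so much.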

The first of these HMC products will offer relatively small memory configurations, so memory mapping will still be critical, but we can expect a rapid evolution to terabyte-sized memories with terabyte-per-second performance. This will revolutionize high-end processing.

However, Intel doesn't have a monopoly on HMC: a large industry group is working on the standards. Nvidia has talked up GPU solutions based on the same technology, and AMD is working on the issue as well. HMC technology may come in two other flavors. One design concept is that the memory, which uses up to 90% less power than traditional DIMMs, can sit directly on top of the CPU chip and connect to it with TSVs. This has obvious applications in the space- and power-sensitive mobile market.

Another flavor is the idea of hybrid memory. It will take a while, but we can expect stacks to incorporate flash modules, which would outperform today's NVDIMMs and PCIe flash by large factors. The gating technology for this is 3D flash, which increases packing density enormously.
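A hybrid DRAM-plus-flash stack implies a placement problem: hot pages belong in the fast DRAM layers, while cold pages are demoted to the flash layers. A minimal sketch of that decision, assuming a made-up two-page DRAM tier (the capacities and page IDs are purely illustrative):

```python
# Toy model of tiering inside a hybrid DRAM+flash memory stack:
# recently used pages stay in DRAM, the coldest page spills to flash.
from collections import OrderedDict

class HybridStack:
    def __init__(self, dram_pages):
        self.dram_pages = dram_pages
        self.dram = OrderedDict()   # page id -> data, kept in LRU order
        self.flash = {}             # slower overflow tier

    def access(self, page, data=None):
        if page in self.dram:                   # hit in the fast tier
            self.dram.move_to_end(page)
            return
        data = self.flash.pop(page, data)       # promote from flash if present
        if len(self.dram) >= self.dram_pages:   # fast tier full:
            victim, vdata = self.dram.popitem(last=False)
            self.flash[victim] = vdata          # demote the coldest page
        self.dram[page] = data

stack = HybridStack(dram_pages=2)
for p in [1, 2, 3, 1]:
    stack.access(p)
print(sorted(stack.dram), sorted(stack.flash))  # [1, 3] [2]
```

Real stacks would make this decision in the management die's logic, not in software, but the trade-off -- a small fast tier backed by a large slow one -- is the same.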

On the interconnection front, the internal links between memory and cores are getting a makeover. Intel is planning an on-chip silicon-photonic "Omni-Scale" fabric, which will speed I/O while radically lowering power on Knights Landing and more traditional Xeon processors.

Nvidia, which also plans extensive use of HMC technology to overcome GPU memory bottlenecks, has announced its own inter-GPU connection, NVLink. Though it is not as ambitious as Intel's Omni-Scale fabric, it still promises a major performance boost, and implementation is likely to be easier.

Radical new approaches don't happen overnight, but the evolution of systems to much higher performance clearly will happen over the next few years. There will be major software impacts, both in the way operating systems tackle the hardware layer and in the way applications are tuned to take advantage of the new modes of operation.
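One concrete example of the application tuning this implies: with a small, fast on-package memory tier, software gains by processing data in blocks sized to fit that tier rather than streaming the whole working set through it. A sketch with illustrative sizes (the 16 GB tier and 40 GB dataset are assumptions, not product figures):

```python
# Blocking a large dataset to fit a small, fast on-package memory tier.
FAST_TIER_BYTES = 16 * 2**30      # assume a 16 GB stacked-DRAM tier
ITEM_BYTES = 8                    # one double-precision value per item

def blocks(total_items, fast_bytes=FAST_TIER_BYTES, item_bytes=ITEM_BYTES):
    """Yield (start, end) item ranges that each fit in the fast tier."""
    per_block = fast_bytes // item_bytes
    for start in range(0, total_items, per_block):
        yield start, min(start + per_block, total_items)

# A 40 GB dataset (5 Gi doubles) splits into three passes through a 16 GB tier.
ranges = list(blocks(total_items=5 * 2**30))
print(len(ranges))  # 3
```

Operating systems face the mirror-image problem: deciding which pages of which processes deserve the fast tier at any moment.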

Beyond these changes, research labs are looking at graphene interconnects and transistors, which would allow 3D processor dies with much faster clock speeds than silicon. Flash will migrate to a far faster base technology, too. System innovation is back in high gear.

Jim O'Reilly was Vice President of Engineering at Germane Systems, where he created ruggedized servers and storage for the US submarine fleet. He has also held senior management positions at SGI/Rackable and Verari; was CEO at startups Scalant and CDS; and headed operations at PC ...

