Breaking The AI Memory Bottleneck

In the long unfolding arc of technology innovation, artificial intelligence (AI) looms as immense. In its quest to mimic human behavior, the technology touches energy, agriculture, manufacturing, logistics, healthcare, construction, transportation and nearly every other imaginable industry – a defining role that promises to fast track the fourth Industrial Revolution. And if the industry oracles have it right, AI growth will be nothing shy of explosive.

“The gains these days are not incremental,” Ajit Manocha, SEMI president and CEO, said to a gathering in July of the Chinese American Semiconductor Professional Association (CASPA) for its Summer Symposium at SEMI’s headquarters in Milpitas. “They are hockey stick – exponential – with AI semiconductors growing in market size from $4 billion this year to $70 billion in 2025.”

Manocha left little doubt that AI is remaking the semiconductor industry and, in the process, the world at large. Internet of Things (IoT) and 4G/5G, both key AI enablers, will account for more than 75 percent of device connections by 2025.

“Today, 30 billion devices worldwide are connected,” Manocha said, citing an Applied Materials prediction that the number of connected devices globally will grow to between 500 billion and 1 trillion by 2030. Those devices will generate stunning amounts of data collected, interpreted and used to reason, solve problems, learn and plan, leading to the holy grail of autonomous machine behavior.

To process this colossal amount of data central to the promise of AI, the industry must break through the limits of a key technology: memory.

Memory a critical AI bottleneck
The challenge for memory starts with performance. Historically, every decade gains in compute performance have outpaced improvements in memory speed by 100 times, and over the past 20 years that gap has grown, said Steven Woo, a fellow and distinguished inventor at Rambus, presenting at the symposium. The upshot is that memory has bottlenecked compute and, in turn, AI performance.

The industry has responded by developing several ways to implement memory systems on AI chips – each suited to different performance needs and each requiring different trade-offs. Among the frontrunners:

On-chip memory delivers the highest bandwidth and power efficiency but is limited in capacity.

HBM (High Bandwidth Memory) offers both very high memory bandwidth and density.

GDDR offers good tradeoffs between bandwidth, power efficiency, cost and reliability.

Since 2012, AI training capability has grown 300,000 times, besting Moore’s law by 25,000 times in doubling every 3.5 months, a blistering pace compared to the 18-month doubling cycle of Moore’s law, Woo said. The staggering improvements have been driven by parallel computing capacity and new application-specific silicon like Google’s Tensor Processing Unit (TPU).

These specialized silicon architectures and parallel engines are key to sustaining future gains in compute performance and combatting the slowing of Moore’s Law and the end of power scaling, Woo said. By rethinking the way processors are architected for certain markets, chipmakers can develop dedicated hardware capable of operating with 100 to 1,000 times greater energy-efficiency than general purpose processors to overcome another big limiter to scaling compute performance – power.

For its part, the memory industry can improve performance by signaling at higher data rates and using stacked architectures like HBM for greater power efficiency and performance, and by bringing compute closer to the data.

Memory scaling for AI
A key challenge is scaling memory for AI. Demand for better voice, gesture and facial recognition experiences and more immersive virtual reality and augmented reality interactions is tremendous, said Bill En, senior director at AMD, speaking at the symposium. Offering these experiences requires more computing capability – high-performance computing (HPC) to make big data analytics possible, the analytics themselves, and machine learning that generates meaningful insights using AI and machine intelligence.

Emerging machine learning applications include classification and security, medicine, advanced driver assistance, human-aided design, real-time analytics and industrial automation. And with 75 billion IoT-connected devices – all generating data – expected by 2025, there will be no shortage of data to analyze, En said. The wings alone of a new Airbus A380-1000 feature some 10,000 sensors.

Mountains of this data are stored in massive data centers on magnetic hard drives, then transferred to DRAM before moving to SRAM within the CPU for the handoff to the compute hardware for analysis.

With data growing at an exponential clip, the question is how to make sure all other memory systems can handle the flood of data. AMD’s answer is a chiplet architecture featuring eight smaller chips around the edge that drive the compute and large chip in center that doubles the IO interface and memory capability to double chip bandwidth.

AMD has also moved from a legacy GDDR5 memory chip configuration to HBM (high-bandwidth memory) to bring memory bandwidth closer to the GPU to process AI applications more efficiently. The HBM provides much higher bandwidth while reducing power consumption. Compared to DRAM, AMD’s HBM delivers a much faster data rate and far greater memory density than DRAM to put memory closer to the GPU so AI applications are processed more efficiently, En said.

Over the next decade, look for more performance improvements from multi-chip architectures, innovations in memory technology and integration, aggressive 3D stacking and streamlined system-level interconnects, he said. The industry will also continue to drive performance gains in devices, compute density and power through technology scaling.