Why memory is the weak link in AMD’s latest Fusion chip

Llano, AMD's second entry in its Fusion family of processors that combine a CPU and GPU on the same die, launched earlier this month to moderately positive reviews. Until now, however, little was known about exactly how AMD had integrated the CPU and GPU on Llano's die.

David Kanter at RealWorldTech has done some digging and put together an in-depth look at Llano, comparing its CPU/GPU integration to that of Intel's Sandy Bridge. Kanter's piece answers some questions about Llano that were raised by the reviews.

Aside from its weak CPU cores, the main shortcoming the reviews highlighted is that Llano's GPU is severely constrained by memory bandwidth. The Evergreen-generation Radeon GPU design used in Llano was originally built for discrete graphics cards, where it would have access to a gigabyte or two of high-bandwidth, dedicated GDDR memory. On Llano, in contrast, the GPU shares main memory with the CPU, and its performance is severely bottlenecked as a result. Kanter's article gives some insight into why.

Rather than linking Llano's CPU and GPU with a high-bandwidth ring bus and letting them share an L3 cache (the Sandy Bridge approach), AMD left the two parts relatively unconnected internally. Instead, the CPU and GPU communicate through main memory, which lets them exchange data without copying it from one location to another. On boot, the GPU is given 512MB of main memory as its own separate memory space; the CPU gets the rest of the RAM.

Internally, a small bidirectional bus connects the GPU to a set of coherent memory queues, and another bus connects the GPU to the DDR controller, but that's it. The CPU talks to the GPU through the graphics driver and main memory, while the GPU can talk to the CPU by issuing coherent requests to special regions of memory; that latter path, however, is fairly slow.

In all, then, the lack of a high-bandwidth internal link between the CPU and GPU, combined with their dependence on main memory for communication, means that Llano's graphics performance is largely choked by the chip's dual-channel DDR3 controller.
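To put rough numbers on that bottleneck, the short calculation below computes peak theoretical DRAM bandwidth as channels × bus width × transfer rate. The specific figures are illustrative assumptions, not numbers from Kanter's article: dual-channel DDR3-1600 (two 64-bit channels) for Llano, and a single hypothetical 128-bit GDDR5 interface at 4000 MT/s standing in for a midrange discrete card. It is a back-of-the-envelope sketch, not a measurement.

```python
def peak_bandwidth_gbs(channels: int, bus_width_bits: int, mega_transfers_per_s: int) -> float:
    """Peak theoretical bandwidth in GB/s (10^9 bytes/s) = channels * bytes per transfer * transfers/s."""
    bytes_per_transfer = channels * bus_width_bits // 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

# Assumption: Llano paired with dual-channel DDR3-1600 (two 64-bit channels).
# This bandwidth is shared between the CPU cores and the GPU.
llano_ddr3 = peak_bandwidth_gbs(channels=2, bus_width_bits=64, mega_transfers_per_s=1600)

# Assumption (hypothetical comparison point): a 128-bit GDDR5 interface at
# 4000 MT/s effective, dedicated entirely to a discrete GPU.
discrete_gddr5 = peak_bandwidth_gbs(channels=1, bus_width_bits=128, mega_transfers_per_s=4000)

print(f"Shared DDR3 (CPU + GPU):    {llano_ddr3:.1f} GB/s")
print(f"Dedicated GDDR5 (GPU only): {discrete_gddr5:.1f} GB/s")
```

Under these assumptions the integrated GPU gets at most 25.6 GB/s, and only when the CPU isn't using any of it, versus 64 GB/s that a discrete GPU would have to itself.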

As for future Fusion designs, I had suggested that AMD might add a pool of eDRAM that the CPU and GPU could use for shared memory and on-die communication, but Kanter offers a more feasible alternative for boosting a future Fusion processor's graphics performance: use 3D chip-stacking techniques to put a small amount of memory in the same package as the processor. The amount wouldn't have to be large; even 256MB of high-bandwidth, low-latency memory would dramatically boost Llano's graphics performance.

All of this, once again, shows just how big a bind NVIDIA is now in, and why the company has to make a push into the desktop space with Project Denver. Sandy Bridge and Fusion spell the beginning of the end for the discrete GPU market, which is still NVIDIA's bread and butter.