AMD Fusion Architecture and Llano

Llano Analysis

The performance of the Llano CPU cores is unimpressive by any standard. The microarchitecture dates back to 2003 and frankly shows its age in comparison to a more modern design like Intel’s Nehalem – let alone Sandy Bridge. This is not particularly surprising though, as AMD spent less die area on the CPUs than Intel.

Rather than invest die area in an aging CPU, AMD focused on their integrated graphics. Llano has the highest performance integrated GPU to date, handily exceeding Sandy Bridge’s GT2 graphics, in some cases by a factor of 2 or more. In part, this is because AMD chose to spend roughly twice as much die area on the GPU: 80mm² compared to 38mm² for GT2. AMD also has the advantage of programmability, as the Sandy Bridge GPU is not DirectX 11 or OpenCL compatible. At present though, it is unclear how many applications can truly take advantage of the GPU for anything aside from graphics.
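To put the area comparison in perspective, the short Python sketch below works through the arithmetic. The two die areas come from the paragraph above; the speedup values are hypothetical placeholders rather than measured results, and the function names are purely illustrative.

```python
# Die areas quoted in the text, in mm^2.
LLANO_GPU_AREA_MM2 = 80.0
SANDY_BRIDGE_GT2_AREA_MM2 = 38.0


def area_ratio(a_mm2, b_mm2):
    """How many times larger area a is than area b."""
    return a_mm2 / b_mm2


def relative_perf_per_area(speedup, ratio):
    """Performance per unit area relative to the baseline (1.0 = parity)."""
    return speedup / ratio


ratio = area_ratio(LLANO_GPU_AREA_MM2, SANDY_BRIDGE_GT2_AREA_MM2)
print(f"Llano GPU area vs. GT2: {ratio:.2f}x")

# Hypothetical speedups, chosen only to bracket the "factor of 2" in the text.
for speedup in (1.5, 2.0, 2.5):
    ppa = relative_perf_per_area(speedup, ratio)
    print(f"speedup {speedup:.1f}x -> relative perf/area {ppa:.2f}")
```

Under these assumptions, a 2x performance lead on roughly 2.1x the area works out to about the same performance per unit area, which is consistent with the point that the larger area budget explains part of Llano's advantage.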

The only area where the Llano GPU lags is in media encoding. While AMD integrated the new UVD3 hardware and has excellent decoding performance, Intel has a much more complete and high performance multi-media encoder.

Surprisingly, Intel’s integration of the CPU and GPU is more efficient than AMD’s approach, since it relies on the on-die L3 cache rather than memory for communication. Additionally, Intel’s power management is unified across both processors so that the clock frequency can be more aggressive.

However, Llano is a huge step forward for AMD from an integration standpoint. It is the first high performance processor for AMD with power gating, which can dramatically reduce energy consumption at idle or near idle workloads. The dynamic voltage and frequency scaling (DVFS) is also tremendously improved, although it still lags behind Intel’s more sophisticated approach that also accounts for the temperature of the heat sink and silicon. PCI-Express is now on-die as well, which reduces power consumption for directly attached I/O devices. The Fusion I/O chip also has native USB3, which is a nice bonus.
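To illustrate why DVFS and power gating matter so much, here is a minimal Python sketch based on the standard first-order approximation that switching power scales with capacitance, the square of voltage, and frequency. Every number in it (effective capacitance, operating points, leakage) is a hypothetical placeholder and not a Llano specification, and power gating is idealized as removing leakage entirely.

```python
def dynamic_power_watts(c_eff_farads, voltage_v, freq_hz, activity=1.0):
    """First-order switching power: activity * C * V^2 * f."""
    return activity * c_eff_farads * voltage_v ** 2 * freq_hz


def idle_power_watts(leakage_watts, power_gated):
    """Idealized model: a power-gated core is cut off and leaks nothing."""
    return 0.0 if power_gated else leakage_watts


# Hypothetical values for illustration only (not Llano's actual figures).
C_EFF = 1.0e-9          # effective switched capacitance, farads
HIGH = (1.2, 2.4e9)     # (volts, hertz) at a high-performance state
LOW = (0.9, 1.2e9)      # (volts, hertz) at a low-power state
LEAKAGE_W = 0.5         # leakage of an idle, ungated core

p_high = dynamic_power_watts(C_EFF, *HIGH)
p_low = dynamic_power_watts(C_EFF, *LOW)

print(f"high state: {p_high:.3f} W, low state: {p_low:.3f} W")
print(f"low/high power ratio: {p_low / p_high:.3f}")
print(f"idle, ungated: {idle_power_watts(LEAKAGE_W, False):.1f} W")
print(f"idle, gated:   {idle_power_watts(LEAKAGE_W, True):.1f} W")
```

In this toy model, halving the frequency while dropping the voltage from 1.2V to 0.9V cuts switching power to roughly 28% of the original, since the voltage term is squared. That is why lowering voltage along with frequency saves far more than frequency scaling alone.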

From a product standpoint, Llano has the right mix of features necessary for commercial success. Most importantly, AMD finally has a notebook offering with competitive power consumption. While Llano’s CPU and media encoding performance lags behind Intel’s Sandy Bridge, it is sufficient, and the industry leading 3D graphics is a compelling benefit for gamers. Looking at the impact on the market, Llano should substantially improve AMD’s position in mid-range consumer notebooks and help them move up the value chain. The performance and power consumption are still problematic for high-end notebooks, which will continue to favor Intel’s Sandy Bridge, but AMD will be able to move out of the low-end. It is also possible that Llano will be popular for mid-range commercial notebooks, but it’s entirely unclear if a GPU is actually useful for business notebooks. If AMD can make a persuasive argument for the value of GPU computing, there could be some significant upside.

AMD is already well represented in the desktop market, particularly for value products, and Llano will be attractive there. However, the benefits of the excellent AMD GPU are muted – buyers that really care about good graphics (principally gamers) will get a discrete card and are more likely to favor Intel’s faster CPUs. So Llano represents a nice move forward there, but is not a profound shift in the same way that it is for mid-range notebooks.

Future Fusion Evolution

Trinity is the codename for the 32nm successor to Llano that is due in 2012. Given that the first Fusion products emphasized time to market and reducing risk rather than maximizing performance, AMD has considerable room for innovation in their next generation of Fusion products.

The most obvious changes slated for Trinity are upgrades to the key components – the CPU and GPU. Trinity will use Bulldozer CPU cores, possibly updated with native 256-bit AVX support. The GPU will be based on the newer Cayman, which uses a VLIW4 shader pipeline and has some enhancements for programmability. The media encoding will be improved as well, and should reach parity with Intel.

AMD will continue enhancing their integration – with unified power management across the CPU and GPU and more aggressive DVFS that accounts for temperature. It is likely that they will begin to move more CPU/GPU communication on-die and take advantage of caching for lower power and better performance. Moving to a unified memory model is also possible, but seems more likely for the 3rd generation of Fusion. AMD will also upgrade to PCI-E gen 3, which runs at 8GT/s, for external I/O, possibly with extensions for coherency with discrete GPUs.
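For a sense of what the 8GT/s figure means in practice, the Python sketch below converts the transfer rate into approximate usable bandwidth. It assumes the 128b/130b line encoding used by PCI-E gen 3 and one bit per lane per transfer; the lane counts are illustrative, and packet and protocol overhead is ignored, so real throughput would be somewhat lower.

```python
def pcie_gen3_bandwidth_gbytes(lanes, transfer_rate_gt=8.0,
                               payload_bits=128, total_bits=130):
    """Approximate per-direction bandwidth in GB/s for a PCI-E gen 3 link.

    Each transfer carries one bit per lane, and 128b/130b encoding means
    128 of every 130 bits on the wire are payload.
    """
    usable_gbits_per_lane = transfer_rate_gt * payload_bits / total_bits
    return lanes * usable_gbits_per_lane / 8.0  # bits -> bytes


# Illustrative link widths; the text does not say which widths Trinity uses.
for lanes in (1, 4, 8, 16):
    bw = pcie_gen3_bandwidth_gbytes(lanes)
    print(f"x{lanes:<2} link: {bw:.2f} GB/s per direction")
```

Under these assumptions a single lane delivers just under 1GB/s in each direction, and a x16 link, the width typically used for a discrete GPU, delivers a little under 16GB/s.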

If AMD were particularly aggressive, they might use 3D packaging to attach high bandwidth memory to Trinity to improve graphics performance. One of the last real advantages of a discrete GPU is high bandwidth and dedicated memory. Even as little as 256MB of attached DRAM using WideIO or LP-DDR3 could bring GPU performance to a new level – at a time when programmable graphics will begin hitting its stride. However, there are still a number of thermal challenges to 3D integration, so 2013 or 2014 may be more realistic.

AMD has clearly articulated a vision for heterogeneous computing at their Fusion Developer’s Summit that has all the right pieces. Llano and Zacate are the first integration steps and focus on reducing overhead by sharing – rather than copying – data between the CPU and GPU. Llano is a serious improvement for AMD’s notebooks, mostly due to power management, which should be well received by the market. The trick for AMD going forward is to execute on their vision and consistently deliver compelling products in a timely fashion.