AMD talks energy with 'Llano' cores

Ceepie-geepie cold Fusion

While Intel is talking up its "Westmere" CPUs and their graphics co-processing, which puts a 45 nanometer graphics chip and memory controller inside the same chip package as a two-core Core processor implemented using 32 nanometer processes, rival AMD wants to change the subject to a truly integrated, single-chip CPU/GPU combination - and at the same time make you think about the future, not the present.

At the International Solid State Circuits Conference today in San Francisco, AMD is going to raise the curtain a little on the cores used in its "Fusion" Accelerated Processing Unit, or APU. However, it's not going to reveal all the feeds and speeds of the entire "Llano" chip, the first of the Fusion devices now expected in 2011.

As El Reg has previously reported, the Llano chip takes four tweaked Phenom-II cores, implements them in 32 nanometer wafer baking processes, and delivers what AMD calls "gigaflops-class" compute performance and native DirectX graphics processing on the same die. The Llano chip is not based on the forthcoming "Bulldozer" cores for servers and workstations or the "Bobcat" cores for mobile platforms.

The original plan for AMD's APU scheme was to have the first Fusion processor out the door around the middle of 2009, but that's life in the graphics and CPU business. If this were easy, you and I would have designed our own chips on the dining room table and we'd be blogging somewhere about whose was bigger and faster - fast being a good thing in this context, of course. But making CPUs and GPUs is not easy, and integrating the two is tricky.

AMD's first plan, of which very little was known, was to do something similar to what Intel has done with the new Westmere family of mobile and desktop chips, which is to integrate a CPU and a separate GPU into a single package and connect them together with a dedicated bus. AMD scrapped those plans, and Llano is a true single-chip CPU+GPU design. (Let's call them ceepie-geepies - just for fun.)

According to Sam Naffziger, a senior fellow at AMD, the Llano chip will use a 32 nanometer silicon-on-insulator process and will have an on-chip DDR3 main memory controller as well as four cores and a DirectX 11 compatible GPU on the die. Naffziger would not divulge the feeds and speeds of the graphics unit, but he said that the GPU is a derivative of the current Radeon HD 5000 series and that it will not link to the cores through a HyperTransport link. Instead, it will use a more direct link on the die. The architectural specifics of that link are not being discussed today at ISSCC.

Since the GPU is not just being used to render video and other graphics but to perform complex compute tasks that are getting more common in both desktop and server applications, "this is not a low-end AGP-style GPU we are integrating," Naffziger said with a chuckle.

AMD wants to talk about the Llano core and its power efficiency at ISSCC, as if that were the most interesting bit. The Llano core - not the entire chip, but the core area - has a 9.69 square millimeter surface area and will include about 35 million transistors (not including the 1 MB of L2 cache on the die, the GPU, or the interconnect between the CPU and the GPU).

The chip will have clock speeds in excess of 3 GHz and the cores will operate at between 0.8 and 1.3 volts and have an operating power dissipation of between 2.5 watts and 25 watts for a single core. The voltage will scale dynamically as workloads need, and clock speeds will also scale dynamically, giving different performance levels and that wide thermal range.
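For a rough feel for how voltage and clock scaling could span that tenfold power range, here is a back-of-the-envelope sketch in Python. It assumes the textbook rule that dynamic power scales with voltage squared times frequency, and it ignores leakage entirely. Neither assumption comes from AMD, and the function names are made up for illustration.

```python
# Back-of-the-envelope sketch, NOT AMD's model. Assumes power is purely
# dynamic and follows the textbook relation P ~ C * V^2 * f (leakage ignored).

def dynamic_power_ratio(v_hi, v_lo, f_hi, f_lo):
    """Ratio of dynamic power between two operating points."""
    return (v_hi / v_lo) ** 2 * (f_hi / f_lo)

def frequency_ratio_for_power_ratio(power_ratio, v_hi, v_lo):
    """Clock ratio needed to reach a given power ratio once voltage is accounted for."""
    return power_ratio / (v_hi / v_lo) ** 2

# Figures quoted in the article for a single core.
V_HI, V_LO = 1.3, 0.8      # volts
P_HI, P_LO = 25.0, 2.5     # watts

# How much of the 10x power range the voltage swing alone explains.
voltage_factor = (V_HI / V_LO) ** 2

# The remainder would have to come from clock scaling under this model.
freq_factor = frequency_ratio_for_power_ratio(P_HI / P_LO, V_HI, V_LO)

if __name__ == "__main__":
    print(f"voltage alone: {voltage_factor:.2f}x")
    print(f"clock scaling needed: {freq_factor:.2f}x")
```

Under these simplifying assumptions the voltage swing accounts for a bit over 2.6x of the range, leaving something under 4x for the clocks to cover. Real silicon has leakage and other effects, so treat this as arithmetic, not a prediction.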

Current AMD chips can scale clock speeds as low as 800 MHz to reach a low-power state, and with the Llano chip, Naffziger says "our intent is to do significantly better." Existing AMD chips can already drop down to sub-volt ranges.

Naffziger says AMD's wafer baker, GlobalFoundries, and its process partner, IBM, have cooked up a very efficient power gating transistor that is faster and smaller than the bulk process transistors that have been used in the past to do power gating on the cores. This results in a tenfold reduction in power leakage on the chips.

This is another big reason why the Llano core is going to be a lot more energy efficient than the current Phenom-II cores. This power gating will allow each core to be shut down individually on the Llano chip, which incidentally will have a kind of overclocking capability like Intel's Turbo Boost and IBM's TurboCore features. The respective Xeon and Power7 chips allow the clock speeds on some cores to be bumped up as other cores are quiesced.

The Llano chip will have an array of digital monitors for keeping track of power consumption on elements of the core (and presumably other parts of the entire chip). Naffziger says that these digital power monitors have some big advantages over analog means of getting power usage stats, such as on-die temperature gauges or ammeters on chip voltage regulators. The digital monitors provide fine-grained temperature data in a form that firmware can make use of to control power consumption on the chip, basically by taking power leakage information and switching activity on the chip and combining it with voltage data to calculate power consumption.
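AMD has not said how its firmware does the sums, but the general idea of combining switching activity, leakage, and voltage into a power figure can be sketched in a few lines of Python. Everything here is hypothetical: the `BlockSample` fields, the per-block capacitance and leakage numbers, and the simple model of dynamic power as activity times capacitance times voltage squared times frequency, plus leakage current times voltage.

```python
# Illustrative sketch only. The data layout and all numbers are invented;
# AMD has not disclosed how the Llano monitors or firmware work.
from dataclasses import dataclass

@dataclass
class BlockSample:
    activity: float       # fraction of cycles with switching, 0..1 (hypothetical counter)
    c_eff_farads: float   # effective switched capacitance of the block (assumed known)
    leak_amps: float      # leakage current at the present voltage and temperature (assumed)

def estimate_power_watts(samples, volts, freq_hz):
    """Sum dynamic (activity * C * V^2 * f) and leakage (I * V) power over all blocks."""
    total = 0.0
    for s in samples:
        dynamic = s.activity * s.c_eff_farads * volts ** 2 * freq_hz
        leakage = s.leak_amps * volts
        total += dynamic + leakage
    return total

# Two made-up blocks in a core running at 1.0 V and 3 GHz.
samples = [
    BlockSample(activity=0.5, c_eff_farads=1e-9, leak_amps=0.5),
    BlockSample(activity=0.2, c_eff_farads=2e-9, leak_amps=0.25),
]
core_watts = estimate_power_watts(samples, volts=1.0, freq_hz=3e9)

if __name__ == "__main__":
    print(f"estimated core power: {core_watts:.2f} W")
```

The appeal of doing this digitally is that every input is already a number firmware can read, so the estimate can be recomputed quickly and per block rather than waiting on an analog sensor.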

The Llano ceepie-geepie will start sampling to OEM partners sometime in the first half of 2010, and it will be available in 2011. Based on that early sampling schedule, it would be reasonable to assume that Llano could come to market early in 2011 - unless, of course, it has some big bugs. That's always a risk, of course, which is why AMD is being vague about timing and features here in early 2010. ®