AMD to crank up performance with Bulldozer Opterons

Shared components, more clocks

Although AMD won't release its "Bulldozer" PC and server processors and "Bobcat" mobile processors until sometime next year, the company is expected to be the star of the Hot Chips conference at Stanford University later this month.

After all, the IT market is always looking ahead, so AMD has to keep talking about its future to keep OEM partners, corporate customers, and consumers interested in what it has coming down the pike.

AMD started discussing the Bulldozer and Bobcat cores back in November 2009, then raised the veil on the modular, cookie-cutter design of the Bulldozer cores and their first implementations in the "Interlagos" and "Valencia" Opteron processors last December.

At Hot Chips on August 22 through 24, as El Reg has already reported, AMD will no doubt provide some more feeds and speeds on these chips, which are the foundation of the next several years of AMD's microprocessor business.

When going over its financial results last month, AMD president and CEO Dirk Meyer said that the first Bulldozer design had taped out in the second quarter, with samples going into the hands of OEM customers sometime in the second half of 2010 and the chips appearing in machines in 2011.

Somewhat ominously, however, the "Llano" Fusion chip, which marries a tweaked quad-core Phenom II chip to a modified HD5000 series GPU (and which AMD calls an Accelerated Processing Unit, or APU), was pushed out into the first half of 2011. The Llano APU uses the 32 nanometer wafer-baking process from AMD's Opteron foundry partner, GlobalFoundries.

This is only interesting to server and workstation buyers inasmuch as the "Interlagos" G34 socket and "Valencia" C32 socket processors will use the same 32nm process at the GlobalFoundries fabs in Dresden, Germany. Right now, the yields on the 32nm process are not as high as AMD and GlobalFoundries had anticipated. That could mean delays for the Bulldozer systems, but considering that AMD has given itself a 12-month target to hit, it will be hard to say if the Interlagos and Valencia chips will be late.

The good news for AMD is that the "Ontario" APU, which pairs a Bobcat core with an on-chip GPU, was pulled ahead into the fourth quarter and will ship for revenue then; products based on the Ontario chip will ship in the new year. The Ontario APU is being manufactured by Taiwan Semiconductor Manufacturing Co using its bulk 40nm process. TSMC does not, as yet, make Opteron chips, but it does make AMD's graphics processors, and if GlobalFoundries continues to have issues, TSMC may get at least some of the Opteron jobs, too.

The stakes are high, and if GlobalFoundries drops the 32nm ball, AMD will be hurt by delays. But AMD is confident enough in its former chip fabs, which it spun off early last year to create GlobalFoundries, that it is starting to make performance claims about its future Bulldozer-based chips.

In a blog posting, John Fruehe, director of product marketing for server/workstation products at AMD, laid out what El Reg assumes is a conservative teaser.

"We release benchmarks at launch, so don't expect too much detail there anytime soon," wrote Fruehe. "From a performance standpoint, if you compare our 16-core Interlagos to our current 12-core AMD Opteron 6100 series processors (code-named "Magny-Cours"), we estimate that customers will see up to 50 per cent more performance from 33 per cent more cores. This means we expect the per core performance to go in the right direction — up. That is all I will say until launch."

This is an interesting statement, and it says more than it appears to at first glance.

With the past couple of generations of Opteron processors, AMD took the whole core (processing elements, memory controllers, caches, and so forth) and plunked as many as its wafer-baking processes allowed onto a single die: two, then four, then six. With the Magny-Cours chips, the 45nm process didn't allow AMD to make a twelve-core processor on a single die, so it took two six-core chips and put them in the same package, sharing the same G34 socket.

With the Bulldozer chips, as El Reg explained in detail late last year, the basic building block is what AMD calls a "module", built around a single-threaded, four-pipeline integer unit with its own L1 cache. Two of these integer units are in the module, and so are two 128-bit floating-point math units; all four components share a single (but wide) set of instruction fetch and decode units, as well as shared L2 caches, shared L3 caches, and shared northbridges to link out to peripherals. The two Bulldozer quasi-cores in a module have a shared floating-point scheduler and two integer schedulers; if the integer units are doing nothing, a single quasi-core can execute four double-precision or eight single-precision floating point operations in a single clock.
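Those per-clock floating-point figures fall straight out of the unit widths. Here is a minimal Python sketch of the arithmetic, assuming (this is our simplification, not an AMD statement) that each 128-bit unit completes one full-width operation per clock; the constant and function names are ours, purely for illustration.

```python
# Back-of-the-envelope check of the per-clock floating-point figures.
# Assumption (ours): each 128-bit unit completes one full-width
# operation per clock, and both units serve a single quasi-core.

FP_UNITS_PER_MODULE = 2    # from the article
FP_UNIT_WIDTH_BITS = 128   # from the article
DOUBLE_BITS = 64           # IEEE 754 double precision
SINGLE_BITS = 32           # IEEE 754 single precision


def ops_per_clock(element_bits: int) -> int:
    """Number of elements that fit across both FP units in one clock."""
    total_bits = FP_UNITS_PER_MODULE * FP_UNIT_WIDTH_BITS
    return total_bits // element_bits


double_ops = ops_per_clock(DOUBLE_BITS)
single_ops = ops_per_clock(SINGLE_BITS)

print(f"double precision: {double_ops} ops per clock")
print(f"single precision: {single_ops} ops per clock")
```

Two 128-bit units give 256 bits of datapath, which is where the four-doubles-or-eight-singles figure comes from.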

The Valencia Opteron chip for single-socket and dual-socket machines, presumably to be called the Opteron 4200, will put four of these modules on a single die, with a shared L3 cache spanning all the cores plus an integrated memory controller and integrated northbridge on the chip. Valencia chips are expected to ship with six or eight cores active. Presumably, the Interlagos chip is two of these slapped onto a single die, sharing a doubled-up L3 cache and two memory controllers and one or two northbridges on the single chip. Interlagos is expected to be available in versions with 12 or 16 cores and for two-socket and four-socket boxes.

AMD has said that Bulldozer's shared-component approach results in a module with two quasi-cores that yields about 1.8 times the performance of a single current Magny-Cours core, where two full cores would yield 2.0 times. That's a 10 per cent performance hit, clock for clock, for every pair of cores, but it buys much lower power consumption because of the shared nature of the Bulldozer modules.

If AMD didn't change anything and just moved to a 16-core design from its dual-six design, that alone would yield a 33.3 per cent boost in performance. But adjusting by that 1.8 factor for the shared components (1.333 times 0.9) means the core count only gives you about 20 per cent more oomph, clock for clock. So the other 30 percentage points, which works out to a further 25 per cent on top of that 20 per cent, have to come from changes in the core and clock-speed increases.
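For anyone who wants to check the sums, here is a rough Python sketch of that reasoning. The inputs are the figures quoted above (12 and 16 cores, the 1.8 module factor, the "up to 50 per cent" claim); the variable names are ours, and this is our arithmetic rather than anything AMD has published.

```python
# Rough sketch of the scaling arithmetic, using the article's figures.
# Variable names are illustrative only.

MAGNY_COURS_CORES = 12
INTERLAGOS_CORES = 16
MODULE_EFFICIENCY = 1.8 / 2.0   # one module versus two full cores
CLAIMED_SPEEDUP = 1.50          # "up to 50 per cent more performance"

# Gain from core count alone, ignoring the sharing penalty.
core_ratio = INTERLAGOS_CORES / MAGNY_COURS_CORES

# Gain from core count after the shared-component penalty.
core_count_gain = core_ratio * MODULE_EFFICIENCY

# What is left to explain, additively and multiplicatively.
gap_in_points = CLAIMED_SPEEDUP - core_count_gain
remaining_factor = CLAIMED_SPEEDUP / core_count_gain

print(f"core ratio:           {core_ratio:.3f}")
print(f"after sharing:        {core_count_gain:.2f}")
print(f"gap (points):         {gap_in_points * 100:.0f}")
print(f"remaining multiplier: {remaining_factor:.2f}")
```

The gap is 30 percentage points if you subtract, but because the gains compound, the core tweaks and clocks together only need to deliver a factor of about 1.25.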

My guess is that Interlagos Opteron clock speeds will hit something like 2.75GHz with standard 80 watt parts, and possibly higher if AMD can push it; Special Edition Interlagos chips will push clocks even higher. That's quite a bit faster than the disappointing 2.2GHz that AMD got with the 45nm twelve-core Opteron 6100s in standard wattages.
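As a sanity check on that guess, the sketch below assumes performance scales linearly with clock speed, which is a simplification of ours and not something AMD has claimed. It combines the guessed clock with the core-count and module figures quoted earlier; the names are illustrative.

```python
# Sanity check on the 2.75GHz guess.
# Assumption (ours): performance scales linearly with clock speed.

MAGNY_COURS_GHZ = 2.2            # twelve-core Opteron 6100, standard wattage
GUESSED_INTERLAGOS_GHZ = 2.75    # the article's guess, not an AMD figure
CORE_RATIO = 16 / 12             # Interlagos versus Magny-Cours core counts
MODULE_EFFICIENCY = 1.8 / 2.0    # one module versus two full cores

# Uplift from clock speed alone.
clock_uplift = GUESSED_INTERLAGOS_GHZ / MAGNY_COURS_GHZ

# Combined estimate: more cores, sharing penalty, faster clock.
estimated_speedup = CORE_RATIO * MODULE_EFFICIENCY * clock_uplift

print(f"clock uplift:      {clock_uplift:.2f}x")
print(f"estimated speedup: {estimated_speedup:.2f}x")
```

Under that linear assumption, the guessed clock alone would be enough to reach the 50 per cent figure without any per-clock improvement in the core, so any such improvement would leave AMD room to ship at lower clocks.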

And that potential 2.75GHz clock speed for Interlagos chips is also a whole lot closer to the 2.93GHz or so that Intel is able to dance around with its Xeon 5600s for two-socket boxes (using standard 95 watt parts), and a bit more than the clocks Intel can manage with its Xeon 7500s, which peak at 2.26GHz in eight-core parts (and burn 130 watts) for two-socket, four-socket, and eight-socket boxes. ®