AMD finishes its 'Piledriver' Opteron server chip rollout

Looking ahead to 'Steamroller' and 'Excavator', presumably

AMD has had a wrenching couple of years, and its executives are wrestling with so many transitions in the processor market and inside AMD that they are just punting out the new Opteron 4300 and 3300 CPUs for entry servers without making a fuss with the press or analyst communities.

No briefings, no fuss, no muss, here's the press release and the die shot, and we'll see ya next year.

It's an unconventional approach, but El Reg will get you the information about the new chips and do a little analysis, just the same.

The original plan before Rory Read took over as CEO was to push the Opteron 4300 processor up to ten cores with a chip known as "Sepang" and fitting in a new processor socket that was a successor to the C32 socket used with the Opteron 4000s, 4100s, and 4200s. An Opteron 6300 was then going to be two of these Sepang chips jammed into a single ceramic package, side by side, for a total of 20 cores in a single socket.

The problem was that the Opteron 6300 socket was not going to be compatible with the G34 socket. With so few server makers standing behind the Opterons (relative to their peak enthusiasm five or six years ago), a socket change would have been bad for AMD – particularly coming at a time when GlobalFoundries was having trouble ramping its 32 nanometer processes.

And so in November 2011, AMD scrapped that plan and decided instead to just focus on making the Piledriver cores do more work and get modest clock speed increases. In February of this year, AMD publicly copped to this changed plan, giving us the eight-core "Seoul" Opteron 4300 and the "Delhi" Opteron 3300, as well as the aforementioned Opteron 6300 that is already out there.

The Piledriver cores have four new instruction set extensions and a bunch of tweaks to goose the performance of the dual-core module compared to the first-generation "Bulldozer" module. The new extensions are FMA3 (floating point fused multiply add), BMI (bit manipulation instructions), TBM (trailing bit manipulation), and F16C (for half-precision 16-bit floating point math).

As we discussed at length with the Opteron 6300 launch, the branch predictors, schedulers, load/store units, and data prefetchers have all been tweaked to run better, and the memory controller had its top memory speed goosed from 1.6GHz to 1.87GHz.

Add up all of the changes with the Piledriver cores, and you get a 7 to 8 per cent improvement in instructions per cycle, plus slightly faster memory and slightly higher clock speeds.

Here's a graphic that lays it all out:

New features in the Opteron 6300 processors

The Opteron 4300 has eight cores on a die, and depending on the model, it has either six or eight of those cores activated. It is aimed at both unisocket and dual-socket machines, somewhere between a high-end Xeon E3 and a low-end Xeon E5 in the Intel x86 server chip lineup.

The memory controller on the Opteron 4300 supports ultra-low-voltage DDR3 main memory that runs at 1.25 volts, as well as regular 1.5-volt and low-voltage 1.35-volt memory sticks. The processor supports up to six memory slots per C32 socket, spread across two memory channels, for a maximum capacity of 192GB of memory per socket.

The Opteron 4300 has two x16 HyperTransport 3 (HT3) point-to-point links running at 6.4GT/sec linking the two processors in a dual-socket machine together as well as to the chipset and peripherals in the server.

Die shot of the eight-core Opteron 4300

AMD didn't just do a global replace of the Bulldozer cores with the Piledriver cores to make the Opteron 4300s. It made a few changes to the lineup.

For many years, AMD has been shipping four different styles of Opterons. The plain vanilla ones run at the standard voltage and have the standard thermal profiles. The Special Editions, or SEs, run hotter and clock higher and deliver the highest performance, but they are also wickedly expensive and impossible to put into dense servers. The Highly Efficient parts, or HEs, are a bin sort to find chips that run at significantly lower voltages with slightly lower clock speeds compared to the standard parts, and the Extremely Efficient, or EE, parts are a deep bin sort to find chips that run at even lower voltages and lower clock speeds, but which have very low thermals.

The Opteron 6000 series comes in SE, regular, and HE variants, while the Opteron 4000s and now Opteron 3000s come in standard, EE, and HE variants.

The interesting thing about the Opteron 4300s is the EE part. With the Opteron 4200, AMD offered an eight-core processor that ran at 1.6GHz with a Turbo Core boost speed of 2.8GHz in a 35-watt thermal envelope; it cost $377. With the Opteron 4300, AMD has downshifted the EE part to only four active cores, but goosed the base clock speed to 2.2GHz and the Turbo Core speed to 3.0GHz while still staying in that 35-watt thermal envelope.

The Opteron 4300s and their Opteron 4200 predecessors

The other thing to notice is that there are only six Opteron 4300s compared to ten Opteron 4200s, but that doesn't mean much. There are two eight-core parts and three six-core variants, and there are fewer standard and HE SKUs. AMD could add more SKUs in spring 2013, and probably will in February or March when Intel is readying its "Haswell" Xeon E3 chips.

SKU for SKU, the new Opteron 4300s offer the same or around 3 per cent more clock speed than the Opteron 4200s they are most like in the product line, except for that radically different EE part, and the new chips cost around 10 per cent more. When you add the instruction-level performance to the clock speed gains, you get a chip that has about 10 per cent more oomph for the same increase in cost. Add in compiler tweaks and you can push performance gains up by as much as 15 per cent, says AMD.
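As a rough check on that arithmetic, the sketch below compounds the 7 to 8 per cent instructions-per-cycle gain quoted earlier with a clock speed bump of zero to 3 per cent. It assumes throughput scales as IPC times clock, which ignores memory and I/O effects, and the function name is ours, not AMD's.

```python
def combined_speedup(ipc_gain, clock_gain):
    """Compound two fractional gains, assuming performance ~ IPC x clock."""
    return (1 + ipc_gain) * (1 + clock_gain) - 1

# IPC gain of 7 to 8 per cent, clock gain of 0 to 3 per cent (figures from the article)
low = combined_speedup(0.07, 0.00)
high = combined_speedup(0.08, 0.03)
print(f"Estimated gain: {low:.1%} to {high:.1%}")
```

Under those assumptions the range works out to roughly 7 to 11 per cent, which brackets the 10 per cent figure.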

This is not the kind of thing that will cause companies to ditch Xeons for Opterons, but by the same token this is probably sufficient to keep Opteron customers who have invested in particular servers adding the new chips to their machines. AMD needs something more dramatic than this to shake up the x86 server biz, and for now it looks like the company is content to have us all wondering what its Opteron ARM plans are.

Made with microservers in mind

The Opteron 3300s, like their predecessors the Opteron 3200s, fit in the AM3+ socket, which is a 942-pin socket that is not precisely compatible with the prior 941-pin AM3 socket and even less so with 940-pin AM2 and AM2+ sockets. (This is not to be confused with the original Socket 940 socket for the Opteron chips from way back when.) What matters to microserver customers is that any machine they have that used an Opteron 3200 can use an Opteron 3300.

The Opteron 3300 and 3200 processors for single-socket boxes

There are three models of the Opteron 3300s, one with eight cores and two with four cores, just like with the Opteron 3200s. In general, the base and turbo clock speeds are up by 100MHz to 200MHz and the prices are the same for the top two parts. The Opteron 3300s are aimed at single-socket servers only, and support up to four memory sticks for that socket, with two memory channels. The Opteron 3300 has one x16 HT3 link running at 5.2GT/sec.

The big difference this time around is that there is an HE and an EE part instead of two HE parts. The low-end Opteron 3320 EE is quite a bit different from its predecessor. For one thing, its clock speed is a lot lower, down to 1.9GHz from 2.5GHz, and its thermals have similarly taken a big dive, down to 25 watts peak from 45 watts with the Opteron 3250 HE part. That is what happens when you drop the voltage and the clocks at the same time.
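To put rough numbers on that, a common first-order model holds that dynamic CPU power scales with voltage squared times frequency. Under that assumption (ours, not AMD's, and one that ignores leakage and the fact that a thermal envelope is a rated ceiling rather than measured draw), the sketch below backs out the voltage reduction implied by the drop from 45 watts to 25 watts.

```python
import math

def implied_voltage_ratio(power_old, power_new, freq_old, freq_new):
    """Solve P ~ V^2 * f for V_new / V_old, given the power and clock ratios."""
    power_ratio = power_new / power_old
    freq_ratio = freq_new / freq_old
    return math.sqrt(power_ratio / freq_ratio)

# Opteron 3250 HE: 45 W at 2.5 GHz; Opteron 3320 EE: 25 W at 1.9 GHz
ratio = implied_voltage_ratio(45, 25, 2.5, 1.9)
print(f"Implied voltage ratio: {ratio:.2f}")
```

The clock cut alone is a 24 per cent reduction, so in this model the rest of the savings would have to come from a core voltage roughly 15 per cent lower. AMD has not published the voltages, so treat this as an illustration only.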

The low-end Opteron 3300 is perhaps being positioned to compete with the forthcoming "Centerton" Atom processor, as well, which is expected before year's end. Intel is shooting for that to be a 6-watt part, and that means a four-core Opteron 3320 EE has to do four times the work of a Centerton Atom at the same price to compete. We'll see in a few weeks.
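The "four times" figure follows from the ratio of the two thermal envelopes. A quick sketch, assuming the comparison is made per watt of rated power and at equal price (the 6-watt Centerton number is Intel's reported target, not a measured figure):

```python
def required_work_ratio(power_a_watts, power_b_watts):
    """How many times more work chip A must do than chip B to match B's work per watt."""
    return power_a_watts / power_b_watts

# Opteron 3320 EE at 25 W versus a Centerton Atom targeted at 6 W
ratio = required_work_ratio(25, 6)
print(f"Opteron 3320 EE must do about {ratio:.1f}x the work")
```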

What is also interesting is that AMD was pitching the $99 Opteron 3250 HE chip at low-end microservers with the goal of helping service providers put together cheap minimalist boxes for hosting (not virtualization, but bare-metal hosting). The Opteron 3320 EE is going to have less performance than its predecessor – probably somewhere around 20 per cent less is our guess – and yet it costs nearly twice as much.
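That 20 per cent guess can be sanity-checked by combining the base clock drop with Piledriver's per-clock gain. The sketch assumes performance scales with base clock times IPC and takes the midpoint of the 7 to 8 per cent IPC range; Turbo Core behaviour and memory effects are ignored.

```python
def relative_performance(clock_old_ghz, clock_new_ghz, ipc_gain):
    """Estimate new-vs-old performance, assuming performance ~ clock x IPC."""
    return (clock_new_ghz / clock_old_ghz) * (1 + ipc_gain)

# Opteron 3250 HE at 2.5 GHz versus Opteron 3320 EE at 1.9 GHz, 7.5 per cent IPC gain
rel = relative_performance(2.5, 1.9, 0.075)
print(f"Estimated change: {rel - 1:.0%}")
```

On those assumptions the new part lands a bit under a fifth slower than the old one, in line with the estimate.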

The thing is, you can get down into a 25-watt power envelope, and AMD clearly thinks it can charge a premium for a "real" x86 processor down in that range. The hosters will still be able to get the older Opteron 3250 HE if they want it, of course.

Now would be a good time for AMD to start telling people about its real plans for future "Steamroller" and "Excavator" Opterons. The engineers had better come up with something good that GlobalFoundries can actually make on time.