Servers are where thread-friendly new architecture shines the brightest

Today, Advanced Micro Devices, Inc. (AMD) announced the Opteron 6300 Series, code-named Abu Dhabi. Built on the Piledriver core, the new chips serve as an enhanced replacement for the Bulldozer-equipped Opteron 6200 Series (which was code-named Interlagos).

As we mentioned in our piece on the consumer Piledriver launch, the new core is impressive, featuring a number of performance enhancements. However, consumer workloads tend to be lightly threaded, so the consumer chips tend to fall short on the price-versus-performance scale.

In the server market it's a very different tale. Many workloads here are heavily threaded, such as virtualized infrastructure, web hosting, and mobile device data serving.

AMD's numbers may be a bit biased, but it's claiming to essentially match rival Intel Corp.'s (INTC) performance in HPC (high-performance computing) applications, such as chemical simulations, with a chip that's only half the cost.

Intel will likely respond with some aggressive price cuts to stay competitive, but for now it's faced with the puzzle of how to compete with a foe that offers twice as many cores at half the cost.

[Image: Pricing]

Unlike in the consumer market, Intel and AMD are largely competing on the same node -- 32 nm -- for server chips. This is because Intel has yet to announce E5 series chips based on Ivy Bridge, having only announced E3 series dual- and quad-core offerings.

We spoke with Michael Detwiler, AMD’s Server Product Marketing Manager, who says that AMD's focus is to be "real targeted, instead of trying to be everything to everybody."

The chips feature a number of subtle improvements, including better branch prediction, new instruction set support, and a more efficient cache. In other words, everything is looking good, although third-party benchmarks weren't available at launch.

Intel will likely respond with some aggressive price cuts to stay competitive, but for now it's faced with the puzzle of how to compete with a foe that offers twice as many cores at half the cost.

The solution to the puzzle is simple: half as many cores, twice the performance per core. The overall chip offers similar performance, and in single-threaded tasks you can get double the performance per task. In multi-threaded environments like VMs, Hyper-Threading helps keep multithreaded performance on a similar level.

Basically, all Intel has to do is reduce their price to match and they're golden.
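
To put rough numbers on the "half the cores, twice the per-core performance" argument, here is a minimal sketch using Amdahl's law. Everything in it is an assumption made for illustration: the two chips are hypothetical (16 cores at relative speed 1.0 versus 8 cores at relative speed 2.0), the function name `amdahl_speed` is made up, and the model ignores Hyper-Threading, shared module resources, memory bandwidth, and clock speeds. It is not a benchmark of any real Opteron or Xeon.

```python
def amdahl_speed(cores: int, per_core_perf: float, parallel_fraction: float) -> float:
    """Relative speed of one job under Amdahl's law.

    per_core_perf scales both the serial and the parallel part of the job;
    parallel_fraction is the share of the work that can be spread across cores.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    # Time on a single baseline core is 1.0 by definition.
    time = (serial_fraction + parallel_fraction / cores) / per_core_perf
    return 1.0 / time


# Hypothetical chips, chosen only to mirror the argument in the comment above.
MANY_SLOW = {"cores": 16, "per_core_perf": 1.0}
FEW_FAST = {"cores": 8, "per_core_perf": 2.0}

if __name__ == "__main__":
    for p in (0.0, 0.5, 0.9, 0.99, 1.0):
        slow = amdahl_speed(parallel_fraction=p, **MANY_SLOW)
        fast = amdahl_speed(parallel_fraction=p, **FEW_FAST)
        print(f"parallel={p:4.2f}  16 slow cores: {slow:5.2f}  8 fast cores: {fast:5.2f}")
```

Under these assumptions the two chips only tie when the workload is perfectly parallel. With any serial portion the chip with fewer, faster cores pulls ahead, and for a purely single-threaded job it is twice as fast. That is why the price gap, not raw throughput, carries the argument for the many-core part.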

You're exactly right. The thing is that this higher performing core that Intel needs has already been designed. Haswell finally sees the jump from 3 ALU pipelines (since the Core 2 days) to 4. But lo and behold, 4 ALU pipelines is just the number that AMD has per module (2 per core). For those tasks with more ILP to extract, Haswell will be a monster. With HT, programs can take advantage of that new pipeline even with lower ILP. That's not even counting the performance difference that TSX makes in high core-count systems.

Exactly. It's not rocket science. If you are looking for better performance per dollar, there are better offerings from Intel than the 2690. If you are looking for better performance in a single socket, there are better offerings from Intel.

Twice the cores and similar performance is somehow a bragging right for AMD? It performs about the same in perfectly multithreaded applications, and delivers half the performance in single-threaded or unevenly balanced applications. Price, sure, but still a very lackluster design without much of a future from AMD.

There are certainly tasks that need high single-threaded performance, too. Even in every-day applications.

Take the recently released game Natural-Selection 2. It's an indie title, so the game isn't the most optimized, which isn't helped by the fact that the game (from a compute perspective) is monstrously complex; it drives entity counts that would make other games run away in fear.

The game's dedicated server is only mildly multithreaded; the core work is done in a single thread. Current high-end Xeon processors can't do much more than 16-18 players on a server. As a result, server hosts have resorted to taking consumer processors and overclocking them (4.4 GHz is a favourite target) to get better single-threaded performance, enabling 20-24 player servers.

This is an every-day kind of problem, albeit on a small scale of ~140k players needing game servers to play on.

In this sort of scenario, where single threaded performance is king, and a high-end Xeon can just barely keep up, what use is an Opteron that offers half the single-threaded performance?

You will never see Natural-Selection 2 dedicated servers running on AMD hardware unless either AMD or the game developers make massive improvements.

True. These are also AMD's slides and not independent testing, and vendor slides tend to exaggerate real-world figures. I don't expect real-world numbers to be significantly worse, but it is telling that they haven't shipped chips out for testing when they normally send chips out for Opteron launches.

Still, it's going to be hard to match them in value considering each Opteron module shares assets. It'll probably come down to the peak power consumption.