AMD’s Bulldozer server benchmarks are here, and they’re a catastrophe

Bad decisions on the desktop have ramifications in the server room.

The desktop benchmark scores for AMD's new Bulldozer architecture didn't make happy reading for fans of the chip company, with the new design sometimes failing to beat AMD's own predecessor architecture, let alone Intel's comparable offerings. Hope still persisted, however, that the processor's architecture might fare better when tasked with server workloads. With the release last week of AMD's first Bulldozer server processors, branded the Opteron 6200 series and codenamed "Interlagos," a host of such benchmarks have arrived from AMD and others.

One reason for the underwhelming performance on the desktop is that the Bulldozer architecture emphasizes multithreaded performance over single-threaded performance. For desktop applications, where single-threaded performance is still king, this is a problem. Server workloads, in contrast, typically have to handle multiple users, network connections, and virtual machines concurrently. This makes them a much better fit for processors that support lots of concurrent threads. Some commentators have even suggested that Bulldozer was, first and foremost, a server processor; relatively weak desktop performance was to be expected, but it would all come good in the server room.

Unfortunately for AMD, it looks as though the decisions that hurt Bulldozer on the desktop continue to hurt it in the server room. Although the server benchmarks don't show the same regressions as were found on the desktop, they do little to justify the design of the new architecture.

The problems with server benchmarking

AMD's basic CPU design is called Orochi. Desktop variants are named Zambezi; server variants are named Valencia. All Orochis have a four-module, eight-thread design. Zambezi disables some HyperTransport links, whereas in Valencia they're fully enabled. As such, the performance of each core is the same on desktop and server; what varies is connectivity. The extra HyperTransport links allow AMD to put two Valencia dies into a single package to create Interlagos. Interlagos also supports multi-socket (SMP) systems.

AMD has announced ten models so far, offering a range of trade-offs between power usage, module and thread count, and clock speed. The top-end part, and the most widely benchmarked, is the Opteron 6282 SE. This is an eight-module, 16-core part with a base clock speed of 2.6 GHz, a maximum turbo speed of 3.3 GHz, a power rating of 140 W, and a price of about $1,019 per chip. At the opposite end of the scale is the Opteron 6212, a four-module, eight-core part with a base clock speed of 2.6 GHz, a peak of 3.2 GHz, a 115 W power rating, and a price of just $266.
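The per-chip figures above are easier to compare once they are normalized per core. The short Python sketch below does that arithmetic using only the prices and power ratings quoted in this article. The `Opteron` class and its method names are illustrative, not any vendor tool, and the sketch assumes AMD's convention of counting each Bulldozer module as two cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opteron:
    """Figures quoted in the article for one Opteron 6200 model (illustrative only)."""
    name: str
    modules: int
    base_ghz: float
    turbo_ghz: float
    tdp_watts: int
    price_usd: int

    @property
    def cores(self) -> int:
        # Assumption: AMD counts each Bulldozer module as two integer cores.
        return self.modules * 2

    def price_per_core(self) -> float:
        # List price divided evenly across cores.
        return self.price_usd / self.cores

    def watts_per_core(self) -> float:
        # Rated power divided evenly across cores.
        return self.tdp_watts / self.cores


PARTS = [
    Opteron("Opteron 6282 SE", modules=8, base_ghz=2.6, turbo_ghz=3.3, tdp_watts=140, price_usd=1019),
    Opteron("Opteron 6212", modules=4, base_ghz=2.6, turbo_ghz=3.2, tdp_watts=115, price_usd=266),
]

for part in PARTS:
    print(f"{part.name}: {part.cores} cores, "
          f"${part.price_per_core():.2f}/core, {part.watts_per_core():.2f} W/core")
```

On these numbers the 6212 costs roughly half as much per core as the 6282 SE ($33.25 versus about $63.69), but its power rating per core is higher (about 14.4 W versus 8.75 W), because its 115 W figure is spread over half as many cores.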

Common industry benchmarks, such as the TPC-C transaction-processing database benchmark or the SPECjbb2005 Java benchmark, are always a little tricky to compare. Typical desktop processor benchmarks try to use as many common system components as possible (the same video cards, hard disks, operating systems, memory amounts, and often even motherboards), but the major server benchmarks show much more diversity. Processors are enormously important, but they're only one part of a system; memory and I/O subsystems matter too.

Some benchmarks, such as TPC-C, report performance-per-dollar and performance-per-watt figures; others, such as SPECjbb2005, do not. These measures make direct comparisons easier: most buyers are constrained by budget and, increasingly, by server room power usage. Bang-per-buck and bang-per-watt figures make it relatively easy to see how much performance you can buy within your financial, electrical, and HVAC constraints.
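The constraint-driven comparison described above can be sketched in a few lines of Python. Everything here is hypothetical: `SystemResult`, `max_throughput`, and the two example systems are invented placeholders, not published benchmark results. The price and power metrics follow the TPC-style convention of expressing cost per unit of throughput, so lower is better.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemResult:
    # Hypothetical benchmark entry; all figures below are placeholders, not real results.
    name: str
    throughput: float  # benchmark score, e.g. transactions per minute
    price_usd: float   # total system price
    watts: float       # power draw under load

    def price_per_unit(self) -> float:
        # Dollars per unit of throughput (lower is better).
        return self.price_usd / self.throughput

    def watts_per_unit(self) -> float:
        # Watts per unit of throughput (lower is better).
        return self.watts / self.throughput


def max_throughput(system: SystemResult, budget_usd: float, power_cap_w: float) -> float:
    """Total throughput from buying as many identical systems as both budget and power cap allow."""
    count = min(math.floor(budget_usd / system.price_usd),
                math.floor(power_cap_w / system.watts))
    return count * system.throughput


# Two made-up systems for illustration only.
system_a = SystemResult("A", throughput=1000.0, price_usd=10_000.0, watts=500.0)
system_b = SystemResult("B", throughput=1500.0, price_usd=20_000.0, watts=400.0)

for s in (system_a, system_b):
    print(s.name, s.price_per_unit(), s.watts_per_unit(),
          max_throughput(s, budget_usd=100_000, power_cap_w=4_000))
```

With these placeholder numbers, system A has the better price/performance but the worse power/performance. Under a $100,000 budget and a 4,000 W cap, the power limit binds for A (eight systems instead of ten), yet A still delivers more total throughput than B (8,000 versus 7,500). Which metric matters most depends on which constraint binds first.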

Price sensitivities are different, too. In desktop systems and low-end servers, the processor is often the most expensive component (or the second most expensive, behind the video card) and can easily account for 25 to 30 percent of the total system cost. But as the server becomes bigger and more expensive, with hundreds of gigabytes of RAM and terabytes of storage, the processor becomes a smaller part of the total cost. For example, Dell's PowerEdge R910, a 4-socket Xeon server, lets you spend up to about $22,000 on processors if you choose four of the most expensive parts offered (the Xeon E7-4870). That's a lot, but it's nothing compared to the $185,000 it would cost to equip the machine with 2 TB of RAM.
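The R910 figures make the point with simple arithmetic. This sketch uses only the two costs quoted above and deliberately ignores the chassis, storage, and everything else, so the real processor share of such a machine would be lower still. The `processor_share` function is a hypothetical helper.

```python
def processor_share(cpu_cost_usd: float, other_costs_usd: float) -> float:
    """Fraction of total spend that goes to processors."""
    total = cpu_cost_usd + other_costs_usd
    if total <= 0:
        raise ValueError("total cost must be positive")
    return cpu_cost_usd / total


# Approximate figures quoted for a fully loaded Dell PowerEdge R910 (CPUs and RAM only).
CPU_COST_USD = 22_000   # four Xeon E7-4870 processors
RAM_COST_USD = 185_000  # 2 TB of RAM

share = processor_share(CPU_COST_USD, RAM_COST_USD)
print(f"Processors: {share:.1%} of CPU+RAM spend")
```

That works out to roughly 10.6 percent, well below the 25 to 30 percent typical of desktops and low-end servers.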

As a result, while direct comparisons between the Opteron 6200 series and Intel's similarly priced 2-socket Xeon 5600 series are compelling, they're not the only relevant ones. Systems built around more expensive processors (such as Intel's previous-generation 4-socket Xeon 7500 series or AMD's previous-generation 4-socket Opteron 8000 series) might also fare well on overall price/performance.

The other thing about server benchmarks is that they're almost all sales pitches. Enthusiast sites are more than willing to run desktop-type benchmarks that show products in a bad light, because they want to help people make informed buying decisions. Big server benchmarks, however, are almost exclusively the domain of hardware vendors. If processor X performs horribly at benchmark Y, the vendors are under no obligation to publish the results, and often won't.

With this in mind, let's see how Bulldozer fares in its server benchmarks.