AMD vs. Intel: power efficiency in the server room rests on RAM

Late last year, AMD responded to Intel's "Clovertown" Xeon launch by touting the differences between the power efficiencies of platforms based on its Opteron versus those based on Intel's dual-core "Woodcrest" Xeons. AMD did not (and still doesn't) have a quad-core answer to Clovertown, so the company played to its strengths by (rightly) emphasizing the performance per watt of Opteron-based systems viewed as a whole. Tech Report was one site that looked into AMD's claims and found that in spite of Intel's clear lead over AMD in CPU performance per watt, AMD's server platform has a substantial lead over Intel's when it comes to system-level power efficiency under certain significant types of load conditions. What this means is that when power is measured at the wall socket instead of at the processor socket, Opteron systems can give better performance per watt than Woodcrest systems.
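To make the wall-socket versus processor-socket distinction concrete, here is a minimal sketch of the arithmetic. Every number and name in it is invented for illustration (the two systems are deliberately labeled generically and are not drawn from Tech Report's or anyone else's measurements). It only shows how a CPU with better performance per watt can still lose once the rest of the platform's power draw is counted.

```python
def perf_per_watt(throughput, watts):
    """Return throughput divided by power draw."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return throughput / watts


# Hypothetical figures, invented purely for illustration.
# "rest_watts" stands for everything other than the CPU: memory,
# chipset, drives, fans, and power-supply losses.
systems = {
    "system_a": {"throughput": 1000.0, "cpu_watts": 100.0, "rest_watts": 200.0},
    "system_b": {"throughput": 1000.0, "cpu_watts": 125.0, "rest_watts": 125.0},
}


def cpu_level(name):
    """Performance per watt counting only the processor's power."""
    s = systems[name]
    return perf_per_watt(s["throughput"], s["cpu_watts"])


def wall_level(name):
    """Performance per watt counting the whole system's power."""
    s = systems[name]
    return perf_per_watt(s["throughput"], s["cpu_watts"] + s["rest_watts"])


for name in systems:
    print(f"{name}: CPU-level {cpu_level(name):.2f}, wall-level {wall_level(name):.2f}")
```

With these made-up inputs, system_a has the more efficient processor but system_b comes out ahead at the wall, which is the shape of the result described above.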

More recently, a new study by Neal Nelson & Associates pitting Opteron systems against Woodcrest systems was linked in an InfoWorld article, where the results—which were similar to Tech Report's findings—caused quite a stir. Anandtech also published a similar Opteron vs. Woodcrest comparison earlier this month, again with similar results. Because the issue of Opteron systems versus Woodcrest systems is now a hot topic again, let's take a look at the latest results and see what's going on.

The results in brief

Very generally speaking, all three reports found that, when idle, Opteron-based systems tend to consume around 40 percent less power than similarly equipped Woodcrest-based systems. When transaction processing and database workloads are run on the systems, the power gap between the two platforms steadily decreases as the load increases. Under very high load conditions, the Xeon-based systems were able to pull ahead in a few of Anandtech's benchmarks.

So to sum up, on integer-intensive transaction-processing workloads with high levels of concurrency, AMD has a solid power efficiency lead under lighter loads, but this lead diminishes as the load gets heavier (i.e., as concurrency increases).

On my reading of these results, a major factor in Intel's power consumption problems seems to be the fact that Intel uses FB-DIMMs and the Opteron uses vanilla DDR2. Simply put, the FB-DIMMs are much less able to naturally scale their power consumption to fit the bandwidth needs of a given workload. So the higher bandwidth, greater capacity per pin-out, and RAS capabilities that FB-DIMMs offer come at the price of higher power consumption and reduced power efficiency when the system is under less than a full load.

Memory technology and power consumption

Anandtech's attempt to isolate the power consumption of each component in the two systems is instructive, and it indicates that the FB-DIMMs in the Intel systems do draw significantly more power at idle than the Opteron's DDR2 DIMMs. When you combine this with the fact that the Intel system in the Neal Nelson & Assoc. report had a full complement of 8 FB-DIMMs and never bested the AMD system on even the most intense workloads, it becomes even clearer that the FB-DIMMs are the main culprit.

The FB-DIMMs provide a lot of performance, and they even provide a lot of performance per watt at high usage levels. However, the problem is that the FB-DIMM's power consumption starts out fairly high at idle, though it doesn't appear to go up very much under load conditions. This problem of high initial power consumption gets worse the more FB-DIMMs you add to the system.

Regular DDR2, on the other hand, doesn't consume nearly as much power at idle. Furthermore, its power usage scales with the demand that's placed on the memory subsystem. The relatively abstract graphs below are my attempt to illustrate this general relationship. (Please note that these graphs are not drawn from any particular dataset, but are my quick-and-dirty attempt to make a point.)

The end result is that the power efficiency gap between FB-DIMMs and DDR2 closes as the workload begins to stress the memory subsystem more. This is why the overall performance per watt gap closed on the transaction processing benchmarks as concurrency increased.
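The same relationship can be sketched as a toy model. This is only an illustration under loud assumptions: the linear power curve, the `memory_power_watts` helper, and every wattage figure are hypothetical and are not taken from Anandtech, Neal Nelson & Associates, or any other measurement. The point is just the shape: one memory type starts high and stays nearly flat, the other starts low and climbs with load.

```python
def memory_power_watts(load, idle_watts, full_load_watts):
    """Linearly interpolate memory power between idle and full load.

    `load` is the fraction of peak memory demand, from 0.0 to 1.0.
    The linear shape is an assumption made for simplicity.
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    return idle_watts + (full_load_watts - idle_watts) * load


# Hypothetical wattages, invented purely for illustration.
FBDIMM = {"idle_watts": 40.0, "full_load_watts": 44.0}  # high at idle, nearly flat
DDR2 = {"idle_watts": 10.0, "full_load_watts": 30.0}    # low at idle, scales with load


def power_gap(load):
    """Extra watts drawn by the FB-DIMM model over the DDR2 model at a given load."""
    return memory_power_watts(load, **FBDIMM) - memory_power_watts(load, **DDR2)


for load in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"load {load:.2f}: gap {power_gap(load):.1f} W")
```

In this sketch the gap shrinks steadily as load rises but never reaches zero, which is one way to picture a power disadvantage that narrows under heavy memory traffic without disappearing.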

Now, there are certainly factors other than the memory technology that contribute to these results—indeed, aspects of the two reports' testbed systems may have contributed as well (e.g., the difference in PSU ratings between the Opteron and Woodcrest systems in the analyst report). But all things being equal, the memory issue seems to have played a decisive role in shaping the results. So while the Intel CPUs themselves enjoy a performance per watt advantage, that advantage seems to be for naught at the system level under certain (probably common) workload conditions.

The FB-DIMM's inability to downscale its power consumption with decreases in memory subsystem loads is a bigger problem than you might think at first. If you read my recent coverage of Intel's power-efficiency research initiatives, then you know that the ability to dynamically match power consumption to workload demands is the very essence of dynamic power optimization. Until the FB-DIMM gets better at this, it's going to have a strike against it for power-sensitive datacenters.

Lower-power FB-DIMMs on the horizon?

FB-DIMMs are implemented using standard DDR2 memory chips, but the addition of an advanced memory buffer (AMB) chip to the module is what gives FB-DIMMs their superior bandwidth, granularity, and RAS abilities. The AMB is also what accounts for the FB-DIMMs' extra power consumption, and it's the reason why FB-DIMMs require a heat spreader.

IDT has announced a low-power AMB, called AMB+, that supposedly cuts the power consumption of the FB-DIMM by up to 40 percent. This is a good start, but if Anandtech's numbers are anywhere near reliable it won't be enough to fix the problem. The AMB+ parts have also not hit the market yet, and rumor has it that there are some yield issues there that are delaying them.

Even when IDT does introduce AMB+ for its customers, there's no indication that Intel has any plans to make similar changes to the AMB that its own FB-DIMMs use. So one wonders what Intel has up its sleeve in the memory department, because with the company's massive commitment to power management it's certain that something is brewing.

Ultimately, the FB-DIMM is a more primitive move down a road that Intel has already been down once in the past: RDRAM. The serial interface, high per-pin bandwidth, and high granularity are all reminiscent of RDRAM's features. It's doubtful that Intel will return to Rambus after the falling out and recommit to a proprietary memory technology with a royalty-based licensing model, but I wouldn't completely rule it out.

I haven't been fully briefed on FB-DIMM technology, but from what I do know about it, the current generation of RDRAM that's used in the PlayStation 3 is technically superior to FB-DIMMs on any number of counts. It's also the case that Rambus is really keen on getting back into the commodity consumer market (their most recent design win is a DLP projector), and they'd no doubt love to be in an Intel platform in any segment: server, desktop, or mobile. But whatever happens, the industry is aware of the power consumption problems with FB-DIMMs and is working to mitigate them.