Like MPG for car buyers, server energy efficiency is becoming an increasingly important selling point for datacenter operators, who are facing soaring power bills, shrinking electricity supplies, and in some cases, the need to reduce CO2 emissions. As evidenced by a recent round of lab tests performed by independent analyst Neal Nelson and Associates, the differences in energy efficiency among servers can be striking, potentially varying significantly based on CPU, memory, workload, and other factors.

Nelson released the results of a lab test this week in which he pitted AMD's low-power 45nm quad-core Opteron Shanghai HE processor (model 2376) against Intel's low-power 45nm quad-core Xeon processor (model L5420). Nelson didn't just measure the overall raw performance (throughput) of the chips; he also assessed their energy efficiency. In other words, he determined which CPU delivered the highest performance per watt.

Cutting to the chase, the Opteron server tended to deliver marginally better performance (measured in transactions per minute) at both lower and higher numbers of simulated users (which ranged from 100 to 500 in 50-user increments). However, when it came to power efficiency, the Opteron was the decisive victor across the board.

Making the test bed

Nelson's test bed comprised two virtually identical servers, one configured with a pair of the low-power, 45nm quad-core Opteron CPUs and the other with a pair of Xeons. There were a couple of differences, however. First, the AMD server's CPUs had a clock speed of 2.31GHz, whereas the Xeons had a slightly higher clock speed of 2.5GHz. Second, whereas both servers used DDR2 (Double Data Rate 2) memory modules, the Intel server used FB-DIMMs (Fully Buffered Dual In-line Memory Modules) and the AMD server did not. (According to Nelson, Intel requires that all current-generation Xeon servers use FB-DIMMs.)

Both servers were configured with identical software and system components and set to run Web-based transactions against a MySQL database. Transactions were fed to the servers from a cluster of 32 Linux-based computers that were executing Remote Terminal Emulation (RTE) software. Millions of transactions were fed to each system, during which Nelson measured both throughput and power consumption.

The servers were presented with two types of workloads: calculation-intensive and disk I/O-intensive. The calculation-intensive workload presented transactions that repeatedly accessed a small area of the MySQL database, performing only tiny amounts of physical disk I/O. The I/O-intensive workload offered a mix of transactions that spanned an area of the database that was much larger than the server's largest possible size of disk cache memory. "This large database footprint ensured that virtually every transaction would cause a read-from and/or write-to the physical disk drives," according to Nelson's report.

Nelson ran the entire test sequence twice. The first time, both servers were configured with 4GB of main memory. The second time, they were both configured with 16GB of memory.

Performance just a starting point

The testing yielded some interesting results. The Opteron (despite its slightly slower clock speed) delivered higher performance at 100 simulated users: When the servers were loaded with 4GB of memory, the Opteron managed 32,314 TPM (transactions per minute) whereas the Xeon completed 30,406 TPM, a 6.3 percent difference. At 16GB of memory, the Opteron also fared better at 100 users, but the Xeon closed the gap: 30,989 TPM versus 30,667 TPM, a 1 percent difference.

Similarly at the maximum number of simulated users, 500, the Opteron delivered more TPM. When the servers were equipped with 4GB of memory, the Opteron performed 5.2 percent more TPM: 5,081 versus the Xeon's 4,831. At 16GB of memory, the Opteron squeezed out 0.6 percent more TPM: 5,744 compared to 5,711 for the Xeon.

However, when the number of users ranged from 250 to 400, the Xeon was the consistent winner, delivering as much as 3.9 percent more TPM.
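The percentage gaps at 100 and 500 users follow from simple arithmetic on the reported TPM figures. Here is a minimal Python sketch that uses only numbers quoted in this article; the function and variable names are mine, not Nelson's.

```python
def pct_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0

# (simulated users, memory) -> (Opteron TPM, Xeon TPM), as reported in the article
tpm = {
    (100, "4GB"): (32_314, 30_406),
    (100, "16GB"): (30_989, 30_667),
    (500, "4GB"): (5_081, 4_831),
    (500, "16GB"): (5_744, 5_711),
}

# Opteron's throughput lead over the Xeon, in percent
opteron_lead = {key: pct_more(o, x) for key, (o, x) in tpm.items()}

for (users, mem), lead in opteron_lead.items():
    print(f"{users} users, {mem}: Opteron ahead by {lead:.1f}%")
```

The article does not list the raw TPM figures for the 250-to-400-user range, so the Xeon's 3.9 percent lead there cannot be recomputed the same way.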

These differences in overall raw performance between the two servers are, in my view, fairly unremarkable. The Opteron arguably has a slight edge. That makes the differences in power consumption all the more remarkable. Here, the Opteron was the consistent winner. When the machines were equipped with 4GB of main memory, the Opteron machine drew between 13.1 and 14.4 percent less power than the Xeon. For example, at 500 users, the Xeon server drew an average of 209.6 watts, whereas the Opteron drew 180.8.

The margin widened at 16GB of memory: The Opteron drew between 20.3 and 21.3 percent less power than the Xeon. For example, at 500 users, the Opteron drew an average of 188.2 watts, whereas the Xeon drew 239.
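The size of a percentage gap depends on which machine serves as the baseline. The short Python sketch below runs the 500-user power figures from this article both ways. The values computed with the Xeon as the baseline are the ones that fall inside the ranges quoted above. The helper names are illustrative only.

```python
def pct_of_baseline(delta: float, baseline: float) -> float:
    """Express a difference as a percentage of the chosen baseline."""
    return delta / baseline * 100.0

# memory -> (Opteron power, Xeon power) at 500 users, as reported in the article
power_at_500_users = {
    "4GB": (180.8, 209.6),
    "16GB": (188.2, 239.0),
}

gaps = {}
for mem, (opteron, xeon) in power_at_500_users.items():
    delta = xeon - opteron
    gaps[mem] = {
        # Opteron's saving, relative to the Xeon's consumption
        "opteron_less_than_xeon": pct_of_baseline(delta, xeon),
        # Xeon's excess, relative to the Opteron's consumption
        "xeon_more_than_opteron": pct_of_baseline(delta, opteron),
    }
    print(
        f"{mem}: Opteron used {gaps[mem]['opteron_less_than_xeon']:.1f}% less; "
        f"Xeon used {gaps[mem]['xeon_more_than_opteron']:.1f}% more"
    )
```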

MPG for CPUs

While these differences may be striking, the power-consumption figure in and of itself isn't all that meaningful. I could tell you that my car consumed two gallons of gas in an hour, and you would want to know how far I actually drove in that time frame. In other words, how many miles did I get to the gallon? (You might also wonder about other factors that might affect MPG, such as whether I was driving a hybrid or an SUV, what the driving conditions were, and whether I was hauling my dry cleaning or a ton of bricks.)

In his test, Nelson calculated the equivalent to MPG for servers by dividing their throughput by how much power they consumed to yield "transactions processed per watt hour," or TWH; in other words, Nelson calculated their power efficiency. Here, Nelson found that the Opteron-based server delivered between 12.6 and 26.8 percent better power efficiency than the Xeon-based machine. (I should once again note that the Xeon clock speed was also slightly higher than that of the Opteron, which could be a factor here.)

For example, at 500 simulated users and with 4GB of memory, the Opteron machine had a TWH rate of 1,686.2; the Xeon machine's TWH rate was 1,382.9. The difference here was 21.9 percent. At 16GB of memory and 500 simulated users, the Opteron's TWH rate jumped to 1,831.2; the Xeon's was 1,433.7. The difference here was 27.7 percent.
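The TWH figures can be reproduced from numbers already quoted in this article. The sketch below assumes TWH equals transactions per minute times 60, divided by the watt-hours each server consumed in one hour (the per-hour power figures reported earlier). That conversion is my reading rather than a formula taken from Nelson's report, but it matches the published 500-user rates to one decimal place.

```python
def twh(tpm: float, watt_hours_per_hour: float) -> float:
    """Transactions per watt-hour: one hour of transactions over one hour of energy."""
    transactions_per_hour = tpm * 60
    return transactions_per_hour / watt_hours_per_hour

# memory -> {server: (TPM, power)} at 500 simulated users, as reported in the article
runs = {
    "4GB": {"Opteron": (5_081, 180.8), "Xeon": (4_831, 209.6)},
    "16GB": {"Opteron": (5_744, 188.2), "Xeon": (5_711, 239.0)},
}

efficiency = {
    mem: {server: twh(tpm, watts) for server, (tpm, watts) in servers.items()}
    for mem, servers in runs.items()
}

for mem, result in efficiency.items():
    # Opteron's efficiency advantage, as a percentage of the Xeon's TWH rate
    advantage = (result["Opteron"] / result["Xeon"] - 1) * 100
    print(
        f"{mem}: Opteron {result['Opteron']:.1f} TWH, "
        f"Xeon {result['Xeon']:.1f} TWH, Opteron ahead by {advantage:.1f}%"
    )
```

The efficiency gap exceeds the power gap because the Opteron both completed slightly more transactions and used less energy, and the two advantages compound.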

There's one other notable finding in Nelson's test: The Opteron-based server consumed substantially less power than the Xeon-based server when the systems were idle. At 4GB of memory, the Opteron drew 164 watts at idle; the Xeon drew 198.8 watts. That's 17.5 percent less for the Opteron. At 16GB of memory, the Opteron drew 169.6 watts at idle whereas the Xeon machine drew 222.8 watts, or 23.9 percent less for the Opteron.

Why's this important? As Nelson puts it, "Most servers spend the vast majority of their time in a powered-up, but idle, state. File servers, Web servers, and e-mail servers are normally left powered on 24/7, even though most offices are closed and empty 75 percent of every week." If a company must waste money powering and cooling servers that aren't in use, better to reduce that expense as much as possible.

So what's the takeaway here? I'm certainly not going to blare out that AMD chips are, hands down, more energy efficient than Intel's -- though I will acknowledge that in Nelson's previous tests, AMD CPUs have had an edge in this department. Will this picture change with the arrival of Intel's Nehalem Xeon? Nelson is eager to find out, just as soon as he can get his hands on Intel's new CPU.

Regardless of which CPU wins the green prize, clearly there can be significant differences among servers, not just in terms of raw performance but also energy consumption and efficiency. For large datacenters, a difference of 50 watts per server will add up to real money. Any organization that is concerned about its utility bills, its power budget, or its carbon footprint would benefit from testing servers before investing heavily in them. Make sure they meet not only your raw performance needs but also your energy efficiency needs. Because the last thing you want is a datacenter full of Hummers when a fleet of hybrids would work just as well -- for far less.
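To put a rough number on "real money," the Python sketch below scales a 50-watt gap to a full year of round-the-clock operation. The fleet size and the electricity price are hypothetical placeholders, not figures from Nelson's report. The estimate also leaves out cooling overhead, which would raise the total.

```python
HOURS_PER_YEAR = 24 * 365  # servers left powered on around the clock

def annual_kwh(watts_saved_per_server: float, servers: int) -> float:
    """Energy saved per year, in kilowatt-hours, for a fleet running 24/7."""
    return watts_saved_per_server * HOURS_PER_YEAR * servers / 1000.0

# Assumptions for illustration only (neither value comes from the test report)
FLEET_SIZE = 1_000          # hypothetical number of servers
PRICE_PER_KWH_USD = 0.10    # hypothetical electricity price

saved_kwh = annual_kwh(50, FLEET_SIZE)
saved_usd = saved_kwh * PRICE_PER_KWH_USD

print(f"Energy saved: {saved_kwh:,.0f} kWh per year")
print(f"Cost saved at ${PRICE_PER_KWH_USD:.2f}/kWh: ${saved_usd:,.0f} per year")
```

Swap in your own server count and utility rate to see what the gap means for your datacenter.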