
Intel takes wraps off of Woodcrest

Earlier this month, Intel held a "reviewer's workshop" event where they invited a number of representatives from hardware review sites to spend a few days benchmarking and learning about their new Core 2 (a.k.a. Merom/Conroe) microarchitecture. The star of the show was Woodcrest, which is the top end of the Core 2 lineup and will be replacing the last Netburst-based Xeon processor (Dempsey) in June. The participating reviewers got to benchmark both Dempsey and Woodcrest, and the results of those benchmark runs are now available.

Among the participating sites that have posted reviews of the Woodcrest and Dempsey pre-release hardware are Tech Report, Realworld Tech, and Hexus. (Regarding the last of these, is "botty" some kind of Internet misspelling for "booty"? Seriously, this is bothering me.)

In all, Woodcrest looks like a stellar performer that massively improves on its predecessors in both raw horsepower and power efficiency.

A closer look at the results

David Kanter at RWT will be publishing a series of articles based on his time at the workshop, and in the first installment he focuses on pitting Dempsey against Woodcrest. The results are as expected: Woodcrest beats Dempsey handily by every conceivable metric. I hope David doesn't mind my posting part of his conclusions here, though you really should read the whole thing:

We measured 15-40% speed ups [from Dempsey to Woodcrest] across a wide range of Windows applications, including ray tracing, Monte Carlo risk analysis, XML processing and Java based mid-tier OLTP workloads. While our numbers are lower than Intel's and our Dempsey system runs at a slightly lower clockspeed (3.4GHz versus 3.73GHz), there is still a lot of performance left on the table for Woodcrest. Current compilers, JVMs and applications have not been optimized for Woodcrest, and our system is using slower FBD-533 memory, rather than FBD-667.

David's observation about the amount of headroom left for Woodcrest to improve is important, because I got quizzed in the Ars IRC channel earlier today about why Tech Report's benchmark bakeoff, which pitted Woodcrest against Opteron, didn't show Intel's new architecture blowing the doors off of AMD's barn.

As background, TR's review pitted a 3.0GHz Woodcrest system against a 2.6GHz Opteron system. Woodcrest's lead in a few of the tests was spectacular, but for the most part it ranged from "solid" to "impressive." The results were perhaps a little less awe-inspiring than some would expect, given Woodcrest's 400MHz clockspeed advantage over Opteron and the fact that the Woodcrest system sported faster FB-DIMMs (compared with the Opteron system's 400MHz DDR memory).
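To put that 400MHz gap in perspective, here is a quick back-of-the-envelope sketch in Python. The two clock speeds come from the paragraph above. The 25% speedup passed to `per_clock_ratio` is a made-up figure for illustration only, not one of TR's results, and both function names are my own.

```python
# Clock speeds quoted in the text, in GHz.
WOODCREST_GHZ = 3.0
OPTERON_GHZ = 2.6

def clock_advantage(fast_ghz, slow_ghz):
    """Relative clockspeed advantage of the faster part, as a fraction."""
    return fast_ghz / slow_ghz - 1.0

def per_clock_ratio(measured_speedup, fast_ghz, slow_ghz):
    """Divide a measured speedup (1.25 means 25% faster) by the clock ratio.

    A result above 1.0 means the faster-clocked chip is also doing more
    work per clock in that test. A result below 1.0 means it is not.
    """
    return measured_speedup / (fast_ghz / slow_ghz)

advantage = clock_advantage(WOODCREST_GHZ, OPTERON_GHZ)
print(f"Clock advantage: {advantage:.1%}")

# HYPOTHETICAL benchmark result, for illustration only.
example = per_clock_ratio(1.25, WOODCREST_GHZ, OPTERON_GHZ)
print(f"Per-clock ratio at a 25% lead: {example:.2f}")
```

In other words, Woodcrest's clock is roughly 15% faster, so a benchmark lead of about that size could come from clockspeed alone, while anything well beyond it points to the architecture (or the memory subsystem) doing real work.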

In addition to the mitigating points raised by David in the quote above, there are a few other factors to consider when evaluating TR's Woodcrest vs. Opteron benches.

First, TR's tests are heavily weighted toward floating-point math. Going by the SPEC scores that Intel has released, Woodcrest beats Opteron in floating-point by a respectable margin, but it absolutely slaughters Opteron in integer. Socket AM2, with its DDR2 support, will help a little with Opteron's floating-point numbers. AM2 may help even more with vector code, which tends to stream more data, but Opteron has even more ground to make up there, so realistically it won't change the outcome. As for integer code, which tends to have more branches and loads, I wouldn't expect AM2 to add as much to the big picture. (For more on why DDR2's higher peak theoretical bandwidth, which helps with data streaming, doesn't help as much with read latency, see this older article of mine.)
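The bandwidth-versus-latency point can be illustrated with a toy model in Python. Everything here is a simplifying assumption on my part: a single 64-bit memory channel, a DDR2-667 speed grade for the AM2 side, an identical and entirely hypothetical 50ns access latency for both memory types, and a fetch cost of "latency plus transfer time at peak bandwidth." Real memory systems are far messier, so treat this as a sketch of the reasoning, not a prediction.

```python
def peak_bandwidth_gbs(mega_transfers_per_s, bus_bytes=8):
    """Peak theoretical bandwidth of one 64-bit channel, in GB/s."""
    return mega_transfers_per_s * 1e6 * bus_bytes / 1e9

def fetch_time_ns(size_bytes, latency_ns, bandwidth_gbs):
    """Toy model: fixed access latency plus transfer time at peak bandwidth."""
    # bytes divided by GB/s conveniently comes out in nanoseconds.
    return latency_ns + size_bytes / bandwidth_gbs

ddr_400 = peak_bandwidth_gbs(400)    # the Opteron test system's DDR
ddr2_667 = peak_bandwidth_gbs(667)   # ASSUMED DDR2 speed grade

LATENCY_NS = 50.0        # HYPOTHETICAL, and assumed equal for both
CACHE_LINE = 64          # one isolated load, as in branchy integer code
STREAM = 1024 * 1024     # 1MB read back to back, as in streaming code

def speedup(size_bytes):
    """How much faster the DDR2 case is than the DDR case in this model."""
    old = fetch_time_ns(size_bytes, LATENCY_NS, ddr_400)
    new = fetch_time_ns(size_bytes, LATENCY_NS, ddr2_667)
    return old / new

print(f"Peak bandwidth ratio:    {ddr2_667 / ddr_400:.2f}x")
print(f"Streaming 1MB speedup:   {speedup(STREAM):.2f}x")
print(f"Single cache-line fetch: {speedup(CACHE_LINE):.2f}x")
```

In this model the streaming case gets nearly the full benefit of the higher peak bandwidth, while the isolated cache-line fetch barely improves because the fixed latency dominates. That is the shape of the argument for why streaming-heavy code stands to gain more from DDR2 than load-heavy integer code does.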

Opteron will also gain further on Woodcrest's floating-point performance when AMD moves to a 65nm process and bumps the clockspeed. The combination of higher clockspeed and on-die DDR2 support may bring Opteron to parity with Woodcrest, although I wouldn't bet on it.

In sum, when the more integer-intensive benchmarks start coming out, you should expect to see Woodcrest's lead over Opteron widen, AM2 or no AM2. Also, Woodcrest has much stronger SSE hardware, so its vector performance should be significantly higher than Opteron's for properly optimized SSE code.