WinRAR and Fritz Chess probably matter less to most people than the other
benchmarks we ran. We include them in this article because their profiles are so
different from those of the other applications, which gives us more insight into
the new architectures.

Fritz Chess Profile                        Total

Average IPC (on AMD 2350)                  0.99

Instruction mix
  Floating Point                           4%
  SSE                                      0%
  Branches                                 17%

Performance indicators on Opteron 2350
  Branch misprediction                     12%
  L1 datacache ratio                       0.65
  L1 Instruction ratio                     0.37
  L1 datacache miss                        1%
  L1 Instruction cache miss                1%
  L2 cache miss                            0%

We have said this before but it warrants
repeating: you'll find a decent number of complex branches in a chess program.
17% branches is not that extraordinary, but the fact that 12% of those branches
are mispredicted is. If we compare a 2GHz Opteron 22xx with an Opteron 23xx, we
should see whether the improvements in branch prediction pay off.
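To put those two percentages in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures from the profile above, plus one loud assumption: the 12-cycle misprediction penalty is a placeholder chosen for illustration, not a measured value for this core. The function names are ours.

```python
# Figures taken from the Fritz Chess profile (Opteron 2350).
BRANCH_FRACTION = 0.17   # share of instructions that are branches
MISPREDICT_RATE = 0.12   # share of those branches that are mispredicted
MEASURED_IPC = 0.99      # average instructions per cycle

# ASSUMPTION: illustrative penalty only, not a measured figure for this CPU.
ASSUMED_PENALTY_CYCLES = 12


def mispredicts_per_1000(branch_fraction, mispredict_rate):
    """Mispredicted branches per 1000 executed instructions."""
    return 1000 * branch_fraction * mispredict_rate


def mispredict_cpi(branch_fraction, mispredict_rate, penalty_cycles):
    """Cycles per instruction lost to mispredictions under the assumed penalty."""
    return branch_fraction * mispredict_rate * penalty_cycles


per_1000 = mispredicts_per_1000(BRANCH_FRACTION, MISPREDICT_RATE)
lost_cpi = mispredict_cpi(BRANCH_FRACTION, MISPREDICT_RATE, ASSUMED_PENALTY_CYCLES)
measured_cpi = 1 / MEASURED_IPC

print(f"{per_1000:.1f} mispredictions per 1000 instructions")
print(f"{lost_cpi:.3f} of {measured_cpi:.3f} cycles per instruction "
      f"(with an assumed {ASSUMED_PENALTY_CYCLES}-cycle penalty)")
```

Under that assumed penalty, roughly a quarter of the cycles per instruction would go to recovering from mispredicted branches, which is why a better branch predictor could matter in this benchmark. A different penalty changes the share proportionally.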

The Opteron 2350 is about 3% faster than the Opteron
22xx, core for core, clock for clock. Because the Fritz Chess benchmark runs in the
L1 and L2 cache, this small gain reflects the core itself rather than the memory
subsystem, so we believe we can assume that the branch prediction improvements
are minor.

HPC

Several of the HPC benchmarks are too
expensive for us to test, but we can get some information from AMD's and Intel's
own benchmarking. According to Intel, the new Intel Xeon E5472 (1.89 score) is about
26% faster than the Xeon 5365 (1.5 score) when running the Fluent benchmark. According to AMD, the
Opteron 2350 is about 10% to 60% faster than a 2.33GHz Xeon E5345. That doesn't
give us much comparison data, but at first sight it seems that AMD will be competitive
in Fluent even at lower clock speeds (2.5GHz versus 3GHz).
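The arithmetic behind these vendor figures is simple enough to sketch. The snippet below recomputes Intel's 26% from the two quoted scores and then normalizes AMD's 10% to 60% claim per clock. Two assumptions are ours: the 2.0GHz clock for the Opteron 2350, and the idea that performance scales linearly with clock speed, which is only a rough approximation. The helper names are hypothetical.

```python
def speedup_percent(new_score, old_score):
    """Percentage by which new_score exceeds old_score (higher is better)."""
    return (new_score / old_score - 1) * 100


def per_clock_ratio(speedup_pct, own_clock_ghz, rival_clock_ghz):
    """Performance ratio per GHz, ASSUMING linear scaling with clock speed."""
    return (1 + speedup_pct / 100) * (rival_clock_ghz / own_clock_ghz)


# Intel's Fluent scores as quoted in the text.
intel_gain = speedup_percent(1.89, 1.5)
print(f"Xeon E5472 vs Xeon 5365: {intel_gain:.0f}% faster")

# AMD's claim: 10% to 60% faster than a 2.33GHz Xeon E5345.
OPTERON_2350_GHZ = 2.0   # ASSUMPTION: clock speed of the Opteron 2350
XEON_E5345_GHZ = 2.33

low = per_clock_ratio(10, OPTERON_2350_GHZ, XEON_E5345_GHZ)
high = per_clock_ratio(60, OPTERON_2350_GHZ, XEON_E5345_GHZ)
print(f"Per-clock advantage: {low:.2f}x to {high:.2f}x")
```

These numbers come from two different vendors running different configurations, so the per-clock range is best read as a sanity check rather than a comparison.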

Intel's own marketing material seems to admit
that a Xeon E5472 with 800MHz memory is just as fast as AMD's quad-core at
2GHz. AMD's 2.5GHz model will surely take the lead in LS-DYNA. Looking at the Fluent
and LS-DYNA benchmarks it appears that AMD will remain very competitive in the HPC
market.

One benchmark where Intel's newest chip really
shines is the Black-Scholes algorithm: as most of the calculations involve divisions,
the new Xeon 54xx chips are about 50% faster than their older Xeon 53xx siblings,
clock for clock. Unfortunately, our compilation of Black-Scholes failed on the quad-core
AMD, so we have to postpone those results for now.
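For readers unfamiliar with the algorithm, here is a minimal Python sketch of the textbook Black-Scholes closed-form price for a European call option. It is not the benchmark code used in this article, whose implementation we do not reproduce here. The function and parameter names are ours, and the sample inputs are arbitrary.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, volatility, years):
    """Textbook Black-Scholes price of a European call option."""
    vol_sqrt_t = volatility * math.sqrt(years)
    # Two divisions per option: the spot/strike ratio and the d1 normalization.
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * years) / vol_sqrt_t
    d2 = d1 - vol_sqrt_t
    discounted_strike = strike * math.exp(-rate * years)
    return spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)


# Arbitrary sample inputs: at-the-money option, 5% rate, 20% volatility, 1 year.
price = black_scholes_call(spot=100.0, strike=100.0, rate=0.05,
                           volatility=0.2, years=1.0)
print(f"Call price: {price:.4f}")
```

A benchmark typically prices a large batch of options in a tight loop, so the per-option divisions, square roots, and transcendental functions dominate the run time.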

43 Comments

ok .. getting tired of this! Intel loving Anandtech employs very unfair & unreasonable tactics to show AMD processors in bad light every single time. And most readers have no clue about the jargon Anandtech uses every time.

1 - HPL needs to be compiled with appropriate flags to optimize code for the processor. Anandtech always uses the code that is optimized for Intel processors to measure performance on AMD processors. As much as AMD and Intel are binary compatible, when measuring performance even a college grad who studies HPC knows the code has to be recompiled with the appropriate flags

3- "The Math Kernel Libraries are so well optimized that the effect of memory speed is minimized." - So ... MKL use is justified because Intel processors need optimized libraries for good performance. However, they don't want to use ACML for AMD processors. Instead they want to use MKL optimized for Intel on AMD processors. What's more ... Intel codes optimize only for Intel processors and disable everything for every other processor. They have corrected it now but who knows!! read here http://techreport.com/discussions.x/8547

I am not saying anything bad about either processor but an independent site that claims to be fair and objective in bringing facts to the readers is anything but fair and just!!! what a load!

I think a lot of us are intrigued by AMD's memory architecture, its ability to support NUMA, etc. A lot of benchmarks test how fast a small application runs with a high cache-hit rate, and that's not necessarily interesting to everyone.

The MySQL test is the right direction, but I'd rather see numbers for a more sophisticated application that utilizes multiple cores -- Oracle or MS SQL Server, for example. These are products designed to run on big iron like Unisys multi-proc servers, so what happens when they are running on the more economical Harpertown or Barcelona?

On the steppings note you made, it's not the B2 stepping that is supposed to perform better, it's the BA stepping...
The BA stepping was the improved form for B1s, and the B3 stepping is the improved form of the B2. BA and B2 came out at the same time in Sept (though BA was the one launched, B1 was what was reviewed), B2 for Phenom and performance clockspeeds, BA for standard and low power chips.
Do you happen to have a BA chip to test (those are the production chips)?

Despite K10's rather extensive architectural improvements, it looks like its core performance isn't too different from K8's. In fact, the gains we've seen so far could easily be attributable to the improved memory controller and increased cache bandwidth. It seems that introducing load reordering, a dedicated stack, improved branch prediction, 32B instruction fetch, and improved prefetching has had little impact, certainly far less than expected. The question is, why?

Well, we are still seeing 5-10% better integer performance on applications that are running in the L2, so it is more than just a K8 with a better IMC. But you are right, I expected more too.

However, the MySQL benchmark deserves more attention. In this case the Barcelona core is considerably faster than the previous generation (+25%). This might be a case where 32-byte fetch and load reordering are helping big time. But unfortunately our CodeAnalyst failed to give all the numbers we needed.

At any rate, it was the most in-depth review I've seen, especially with the code analysis. I too thought it would be higher, but remember that Barcelona is NOT HT3 and doesn't have the advantage of "ganging/unganging." There was an interesting article recently that showed perf CAN be improved by unganging (maybe it was ganging, can't find it) the HT3 links.

I really hate that OEMs decided to stand up to the big, bad AMD and DEMAND that Barcelona NOT have HT3 with ALL OF ITS BENEFITS.

I mean people complain that Barcelona uses more power, but HT3 would cut that somewhat. At least in idle mode, and even in cases where IMC is used more than the CPU or vice versa.

I also may as well use this to CONDEMN all of these "analysts" who insist on crapping on the underdog that keeps prices reasonable and technology advancing.

INSERT SEVERAL EXPLETIVES. REPEATEDLY. FOR A FEW DAYS. A WEEK. FOR A YEAR.

Conjecture regarding why AMD went quad core on the same die... and this has nothing to do with performance. I think one place where Intel is way ahead of AMD is package technology. Remember they were doing a type of Multichip module with the P6. Having 2 dice instead of a single die allows them to have an overall lower defect rate, higher yield, and higher GHz. This is vs. AMD's lower GHz but (it was hoped) greater data efficiency using an L3 die and lower latency of on-die communications amongst cores vs. Intel's solution of die to die communication.