Well done on an excellent review using as many real-world tests as possible. The VMware test is a real eye-opener and shows how the 55xx can match double the number of CPUs from the last generation of Xeons *AND* crucially save $$$$ on licensing for Windows, MS SQL and other per-socket licensed software, plus the power saving, which is again a financial saving if you hire rack space in a datacentre.

I eagerly await your own in-house VM tests. Please consider also testing with Windows Server 2008 Hyper-V, which I think doesn't have the 55xx optimisations that the latest release of VMware has (and might not have until R2?).

Thanks for the time you put in to running the endless tests. The results make a brilliant business case for anyone wanting to upgrade their servers. You must have had the chips a good week before Intel officially launched them. :-) I do feel sorry for AMD though. I'm sure they have plenty of motivation to come back with a vengeance like they did a few years ago. Reply

You mention octal servers from Sun and HP for VMs, but does anybody really use these systems for VMs? I can't imagine why anybody would, since you are paying a serious premium for 8 sockets vs. 2 x 4-socket servers, or even 4 x 2-socket servers. Then the redundancy options are much lower when running only a few 8-socket servers vs. many 2- or 4-socket servers when utilizing VMotion, and the expansion options are obviously far fewer with NICs and HBAs. From what I've seen, most 8-socket systems are for DBs. Reply

What I noticed after reading the review is there are very few benches on benchmarks a little bit favored by AMD.

For example, there is only one 3ds Max test (so not very useful); at least two are needed.
Only 1 virtualization benchmark, which is really a shame....
Virtualization is becoming so important and you guys only throw in one test?

Besides that, the review feels a bit biased towards Intel, but I will check some other reviews of the Xeon 5570. Reply

So let's see: all the pre-release builds of ESX 3.5 Update 4 get a really high score of 16 tiles, almost as much as a 4S Shanghai, while the VMware performance team themselves stated that we should never treat an HT core as a real CPU in VMware (even with the new code for HT). Yet the benchmark shows a large performance increase, and no, it is not, as AnandTech states, due to the extra available memory and its bandwidth; those VMmark runs are not memory starved. Now look at the official Intel benchmark with ESX Update 4: it provides 10 tiles and a healthy increase, which from a technical point of view seems much more realistic. All the other marketing stuff like switching time is nice, but then again it is in line with the current Shanghai. Reply

What kind of tests are you looking for? The Tech Report guys have a lot of HPC tests; we are focusing on the business apps.

"very few benches on benchmarks a little bit favored by AMD."

That is a really weird statement. First of all, what is a test favored by AMD?

Secondly, this new kind of testing with OLTP/OLAP workloads was introduced in the Shanghai review. And it really showed, IMHO, that there was a completely wrong perception about Harpertown vs. Shanghai, because Shanghai won in the tests that mattered the most to the market. Many tests (including Intel's) emphasized purely CPU-intensive stuff like Black-Scholes, rendering and HPC tests. But that is a very small percentage of the market, and it created the impression that Intel was on average faster, which was absolutely not the case.

"Only 1 virtualization benchmark, which is really a shame..."

Repeat that again in a few weeks :-). We have just successfully concluded our testing on Nehalem.

Personally I am a bit shocked about the "not enough tests" :-), because any professional knows how hard these OLTP/OLAP tests are to set up and how much time they take. But they might not appeal to the enthusiast, I am not sure.

I didn't mean to offend you, because I can imagine how much time it takes to test hardware properly. And I personally think that OLTP/OLAP testing is very innovative and needed, because otherwise people would have no idea what to buy for servers. You cannot let your server purchase be influenced by simple benchmarks that are meaningless for servers, like 3DMark 2006/Vantage, FPS tests etc.
You guys always do a great job at testing any piece of hardware, but it just feels too biased towards Intel. For example, on the last page of this review you get a link to Intel resource Center (in the same place as the next button). If you have things like that, you are not (trying to be) objective IMO. Reply

"the last page of this review you get a link to Intel resource Center"

I can't say I am happy with that link, as it creates the wrong impression. But the deal is: editors don't get involved in ad management, and ad sales people don't get involved when it comes to content.

So all I can say is to judge our content, not our ads. And like I said, it didn't stop us from claiming that Shanghai was by far the best server CPU a few months ago. And that conclusion was not on many sites. Reply

But ad sales people should know this creates the wrong impression. A review site (for me at least) is all about objectivity and credibility. When you place a link to Intel's Resource Center at the end of every review, it feels weird. People on forums already call AnandTech "Inteltech". And I don't think this is what you guys want.

I always liked Anandtech since when I was a kid, and I still do. You guys always have one of the most in-depth reviews (especially on the very technical side) and I like that. But you guys are gaining some very negative publicity on the net. Reply

AMDZone is the biggest joke on the internet. I just went there to see how the zealots like abinstein are still doing their damage control; just like before he went on rambling how the Penryn is still weak against Shanghai, and the old and tired excuses like how if people all bought AMD they can drop in upgrades etc etc. ZootyGray...he's the biggest joke on AMDZone. None of them had the mental capacity to accept AMD has been DEFEATED, which is disappointing but funny to say the least Reply

It's not just AMDZone; you are just the opposite. It's like in the Woodcrest and Conroe days: just because the high-end CPU is the best of all doesn't mean the rest of the CPUs in the line are better by default. It's all about the price/performance ratio. Many who bought the low end thought they had bought the better system; well, wrong bet.

As mentioned before, why not test the midrange? That is where the sales will be. Time to test the 5520-5530 against the 2380-82; after all, those have the same price. Reply

Your argument is valid. However, it just so happens that for low-end 1S systems the Penryns are doing just fine against the Shanghais; for higher-end 2S systems they used to be limited by memory bandwidth, and AMD pulled ahead. That is no longer the case: Intel now beats AMD on AMD's own territory. Reply

There's more to HPC applications than you indicate: environmental modeling apps, particularly, tend to be dominated by memory access patterns rather than by I/O or pure computation. Give me a ring if you'd like some help with that -- I'm local for you, in fact... Reply

Thanks for the extremely informative and interesting review Johan. I am definitely looking forward to more server reviews; are the 4-way CPUs out later this year? That will be interesting as well. Reply

Forgot to mention that I was surprised HT had as much impact as it did in some of the benches. It made some huge differences in certain applications, and slightly hindered performance in others. Overall, I can see why Intel wanted to bring back SMT for the Nehalem architecture. Reply

Awesome performance, but I would like to see how the Intel 5510/20/30 fare against the AMD 2378/80/82; after all, that is the same price range.

It was the same with the Woodcrest and Conroe launch: everybody saw a huge performance lead but then only bought the very slow versions... so the question is what is still the best value in performance/price/power.

Istanbul had better come quickly for AMD; as it looks now, with decent 45nm power consumption, it should be able to give the high-end 55xx versions some battle. Reply

Very informative article... I would also be interested in seeing how any of the midrange 5520/30 Xeons compare to the 2382/84 Opterons. Especially now that some vendors are giving discounts on the AMD-based servers, the premium for a server with X5550/60/70s is even bigger. It would be interesting to see how the performance scales for the Nehalem Xeons, and how it compares to Shanghai Opterons in the same price range. We're looking to acquire some new servers and we can afford 2P systems with 2384s, but on the Intel side we can only go as far as E5530s. Unfortunately there's no performance data for Xeons in the midrange anywhere online that would let us make a comparison. Reply

I only skimmed the graphs, but how about some consistency? Some of the graphs feature only dual-core Opterons, some have a mix of dual- and quad-core... the pricing chart also features only dual-core Opterons...

Part of the problem with the 54xx CPUs is not the CPUs themselves, but the FB-DIMMs. Part of the big improvement for the Nehalem in the server world is because Intel sodomized their 54xx platform, for reasons that escape most people, with the FB-DIMMs. But it's really not mentioned except with regard to power. If the IMC (which is not an AMD innovation, by the way; it's been done many times before they did it, even on x86 by NexGen, a company they later bought) is so important, then surely the FB-DIMMs are too. They are both related to the same issue: memory latency.

It's not really important though, since that's what you'd get if you bought the Intel 54xx; it's more of an academic complaint. But, I'd like to see the Nehalem tested with dual channel memory, which is a real issue. The reason being, it has lower latency while only using two channels, and for some benchmarks, certainly not all or even the majority, you might see better performance by using two (or maybe it never happens). If you're running a specific application that runs better using dual channel, it would be good to know.

Overall, though, a very good article. The first thing I mention is a nitpick, the second may not even matter if three channel performance is always better. Reply

I was wondering if you got any feeling whether Hyperthreading scaled better on Nehalem than on Netburst? And if so, do you think this is due to improvements made to HT itself in Nehalem, just due to Nehalem's 4+1 instruction decoders and more execution units, or because software is better optimized for multithreading/hyperthreading now? Maybe I'm thinking mostly desktop, but HT had kind of a hit-or-miss reputation in Netburst, and it'd be interesting to see if it just came before its time. Reply

Well, for one, the Nehalem is wider than the Pentium 4, so that's a big issue there. On the negative side (with respect to HT increase, but really a positive) you have better scheduling with Nehalem, in particular, memory disambiguation. The weaker the scheduler, the better the performance increase from HT, in general.

I'd say it's both. Clearly, the width of Nehalem would help a lot more than the minor tweaks. Also, you have better memory bandwidth, and in particular, a large L1 cache. I have to believe it was fairly difficult for the Pentium 4 to keep feeding two threads with such a small L1 cache, and then you have the additional L2 latency vis-a-vis the Nehalem.

So, clearly the Nehalem is much better designed for it, and I think it's equally clear software has adjusted to the reality of more computers having multiple processors.

On top of this, these are server applications they are running, not mainstream desktop apps, which might show a different profile with regards to Hyper-threading improvements.

The L1 cache and the way that the Pentium 4 decoded were an important (maybe even the most important) factor in the mediocre SMT performance. Whenever the trace cache missed (and it was quite small, something like the equivalent of 16 KB), the Pentium 4 had only one real decoder. This means that you have to feed two threads with one decoder. In other words, whenever you got a miss in the trace cache, HT did more harm than good in the Pentium 4. That is clearly not the case in Nehalem, with its excellent decoding capabilities and larger L1.

And I fully agree with your comments, although I don't think memory disambiguation has a huge impact on the "usefulness" of SMT. After all, there are lots of reasons why the ample execution resources are not fully used: branches, L2 cache misses etc. Reply

Not only that, Pentium 4 had the Replay feature to try to make up for having such a long pipeline stage architecture. When Replay went wrong, it would use resources that would be hindering the 2nd thread.

Wow...that's just ridiculous how much improvement was made, gg Intel. Can't wait to see how the 8-core EX's do, if this launch is any indication that will change the server landscape overnight.

However, one thing I would like to see compared, or slightly modified, is the power consumption figures. Instead of an average amount of power used at idle or load, how about a total consumption figure over the length of a fixed benchmark (i.e. how much energy was used while running SPECint)? I think that would be a good metric to illustrate very plainly how much power is saved from the greater performance with a given load. I saw the chart in the power/performance improvement on the Bottom Line page, but it's not quite as digestible or as easy to compare as a straight kWh-per-benchmark figure would be. Perhaps give it the same time range as the slowest competing part needs to complete the benchmark. This would give you the ability to make a conclusion like "In the same amount of time the Opteron 8384 used to complete this benchmark, the 5570 used x watt-hours less, and spent x seconds in idle". Since servers are rarely at 100% load at all times, it would be nice to see how much faster it is and how much power it is using once it does get something to chew on.

Anyway, as usual that was an extremely well done write up, covered mostly everything I wanted to see.
Reply
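
For anyone who wants to try the fixed-window idea above on their own measurements, here is a rough Python sketch. It assumes a simple model (full load power while the benchmark runs, idle power for the rest of the window), and every number and name in it is a hypothetical placeholder, not a figure from the review.

```python
def energy_wh(load_watts, idle_watts, run_seconds, window_seconds):
    """Energy in watt-hours over a fixed window.

    Assumed model: the system draws load_watts while the benchmark runs,
    then idle_watts for the remainder of the window.
    """
    if run_seconds > window_seconds:
        raise ValueError("window must be at least as long as the run")
    idle_seconds = window_seconds - run_seconds
    joules = load_watts * run_seconds + idle_watts * idle_seconds
    return joules / 3600.0  # 1 Wh = 3600 J


# Hypothetical measurements, for illustration only.
systems = {
    "slower_system": {"load_watts": 300.0, "idle_watts": 150.0, "run_seconds": 600.0},
    "faster_system": {"load_watts": 320.0, "idle_watts": 120.0, "run_seconds": 400.0},
}

# The window is set by the slowest system, as the comment suggests.
window = max(s["run_seconds"] for s in systems.values())

report = {
    name: energy_wh(window_seconds=window, **s) for name, s in systems.items()
}

for name, wh in report.items():
    idle = window - systems[name]["run_seconds"]
    print(f"{name}: {wh:.1f} Wh over {window:.0f} s ({idle:.0f} s idle)")
```

With real data you would swap the two-level model for an integral over logged power samples, but the comparison stays the same: one window, one energy figure per system.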

I am trying hard, but I do not see the difference with our power numbers. This is the average power consumption of one CPU during 10 minutes of DVD Store OLTP activity. As readers have the performance numbers, you can perfectly calculate performance/watt or per kWh. Per server would be even better (instead of per CPU), but our servers were too different.
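
As a minimal sketch of that calculation, assuming you take a throughput figure and an average power figure for the same run: the function names and sample values below are made up for illustration and are not results from the review.

```python
def perf_per_watt(throughput, avg_watts):
    """Throughput units per watt of average power draw."""
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    return throughput / avg_watts


def work_per_kwh(throughput_per_minute, avg_watts):
    """Units of work completed per kWh consumed at a steady load."""
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    work_per_hour = throughput_per_minute * 60.0
    kwh_per_hour = avg_watts / 1000.0
    return work_per_hour / kwh_per_hour


# Hypothetical numbers: (work units per minute, average CPU power in watts).
samples = {
    "cpu_a": (12000.0, 80.0),
    "cpu_b": (9000.0, 75.0),
}

results = {
    name: (perf_per_watt(tp, watts), work_per_kwh(tp, watts))
    for name, (tp, watts) in samples.items()
}

for name, (ppw, wpk) in results.items():
    print(f"{name}: {ppw:.1f} per watt, {wpk:,.0f} per kWh")
```

Both ratios rank the parts the same way; the per-kWh form is just easier to plug into an electricity bill.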

Is it me or is page 2 of this article missing some information? The title of that 2nd page is "What Intel and AMD are Offering," but in the body of the text there are only descriptions of Intel's Xeon chips? Perhaps a new title to reflect the body, or add AMD info? Reply

Very nice to see a comparison over some generations of Xeon platform, including the new one (yet to be released).

I would like to see a new article with Core i7 vs Xeon 5500... to check out whether my Core i7 @ 3.7GHz is good enough in Maya 2009 (Windows XP 64-bit, 12GB DDR3), or whether Xeon 5500s (each at 2.4GHz, for instance) in a dual-processor configuration would be a much better buy. Reply