Nice article. I'd also be interested in PostgreSQL, being the "other" major open source database... specifically, whether it's any better at scaling with multiple CPUs. (Not that I have any practical use for this information, I'm just curious.)

Seriously, mickyb and elmo may be correct about the Intel compilers (I frankly don't have a clue what's used in most shops)...

The real problem is that it's a virtual impossibility to create a "level playing field", but I have to say to the critics of the article that Johan has done a stellar job of coming as close as possible!

mickyb - Thanks for the input! Fair enough...maybe Johan could use both the Pathscale compiler (which is optimized highly for Opteron) and the Intel optimized compiler on his next series of tests?

I disagree with the comment that a large number of people don't use the Intel compiler. I (and other developers and IT shops) only use Intel compilers for Linux. It is the fastest one out there for x86 and Itanium.

If you are running a large database that requires a large server (compared with a desktop loaded with RAM to run a personal blog site) like this article is testing, you will be setting up the environment with a trained IT professional that will use the compiler that is fast and stable.

When we build our product for all the UNIX platforms, we always use the vendor compiler instead of gnu. gnu works great and is free, but it is not optimized nearly as much.

This is like saying the same audience won't recompile Linux on the platform they are going to install it on. This is the first thing you should do....and with an Intel compiler. There should be no real reason why one vendor Linux is faster than the others except for compile options and loaded modules. You cannot run Linux out of the box, it doesn't come in a box where I get it. :)

C'mon mate...anybody who has read your posts knows you're heavily biased towards Intel, just as people who have read mine know that I am biased towards AMD. The important thing is to try and set aside the bias to look at things from both sides...I do try, but admittedly don't ALWAYS succeed. :-)

I imagine you probably posted before you read the explanation of what a query cache is...understandable.

As to not using an Intel specific compiler, I suppose that if it HAD been used I would be complaining as well. We have to rely on Johan and Anand (who frankly know a Hell of a lot more about this than either of us) to choose based on what the market actually uses...if you can cite impartial industry sources that show otherwise, I'm sure we all (especially the AT staff) would love to see them.
I do know that over the years, Johan and Anand have shown themselves to be quite unbiased in their articles (you should go read some of them on Aces as well)!

There are certainly things that I could pick apart as well..e.g. when he states
"In the second half of 2004, already one million EM64T Xeons were shipped"
Yes they were shipped, but that doesn't mean they were sold. The majority of those shipments were probably to OEMs for inventory buildup. Remember that Intel had a huge inventory write-off at the same time, and this was most likely a shift in inventory.

Regardless, none of this has to do with the validity of the article, which is excellent and makes sense. If you think about it, it should have been expected...the only way for AMD to have increased their market share in servers is by performance. They certainly don't have the budget or marketing clout that Intel has!

About ISAM and DB/2... ISAM (Indexed Sequential Access Method) is NOT a database! It has no referential integrity nor rollback/commit features (although those can be activated on mainframe). ISAM was popular on mainframe when there wasn't any database (or rather when a database was too massive an application to run!) and even there it was superseded by VSAM. ISAM files are not much different from DOS random access files (an index file pointing to the relative record number on the main file).
And it's no surprise that DB/2 scales well: mainframes rarely feature a single CPU, at least as far as I know.... IBM have had some 20 years to practice on multi-CPU machines!

Mino, thanks for pointing that out. Query cache enabling has nothing to do with "stressful". It has to do with accelerating a few queries that are run over and over again. That is very interesting for reducing the response time of a website serving up the latest article, but such a workload is not limited by CPU power at all.

To the people who make a fuss about disabling the query cache: this has nothing to do with the Opteron not performing well in that situation. Single Xeon: 980 queries/s. Dual Xeon: 985 queries/s. Opteron 250: 1020 queries/s. Get it now why I say "other bottlenecks started to kick in"?
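For anyone who wants the scaling spelled out, here is a quick back-of-the-envelope calculation in Python. It uses only the three query-cache figures quoted above; the function name is my own and nothing here comes from the actual benchmark setup.

```python
# Queries per second with the query cache on, as quoted in the comment above.
single_xeon = 980
dual_xeon = 985
opteron_250 = 1020


def gain_percent(baseline, other):
    """Relative throughput gain of `other` over `baseline`, in percent."""
    return (other - baseline) / baseline * 100.0


# If the CPU were the limiting factor, a second Xeon should add far more
# than a fraction of a percent.
dual_gain = gain_percent(single_xeon, dual_xeon)
opteron_gain = gain_percent(single_xeon, opteron_250)

print(f"Dual Xeon vs single Xeon:   +{dual_gain:.1f}%")
print(f"Opteron 250 vs single Xeon: +{opteron_gain:.1f}%")
```

A second CPU buying roughly half a percent is what you would expect when something other than the CPU is the limit.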

It is impossible that a dual Xeon can't outperform a single one in these tests. We tried to find the bottleneck and even used a quad Opteron 850 as client. The client was not the problem. The disk was not the problem, we tested that. Network bandwidth neither. My bet is on the network latency, or even the OS, as the bottleneck kicked in a lot sooner with kernel 2.4, but I have no knowledge of tools to profile the complete machine.

#32 try to think for a moment
"Because the Opteron can't perform that well in stressful situations you won't post the scores?"

If the CPU is not the bottleneck in the query cache scenario then why test the effect of CPU at all !!!

You remind me of a friend of mine who "tested" the effect the "FSB" has on an A64 system that does NOT have an FSB at all !!! ;-)
Funny guy indeed.

And about an Intel compiler not being used.
Like it or not, it IS a fact that it is not widely adopted, especially among the target audience of this site and article.

BTW, given past experience, the Intel compiler would produce better code even on AMD systems, so don't be so sure! The best code for K7 is made by intelcc set to PIII config, albeit it does not use 3DNow! functionality at all.

I think I have to agree with #20. As much as I am unbiased, I feel this test was doctored by AMD... it resembles the tests we often see released by Apple...

"We didn't use the Intel compiler version as we have reason to believe that this version is not used a lot in the real world. We might try it out in a future article."

Translation, "with the Intel compiler AMD lost, so being a marketing force for AMD we opted not to post those scores".

and also as was mentioned before...
"The " query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%."

Why not?
Because the Opteron can't perform that well in stressful situations you won't post the scores?

Seriously.. this test is the biggest load of BS I have ever read... and I'm a current AMD adopter.

Viditor, it is possible that the IOMMU might have to do something with it.

The IOMMU is a memory mapping unit sitting between the I/O bus and physical memory.

Memory mapping is AFAIK only necessary if a certain device (PCI devices come to mind) can not do a 64 bit DMA. Now it seems that almost everything inside the newest Intel southbridges can do 64 bit DMA.

So the IOMMU can only play a role when the driver is 32 bit only and the memory mapping has to happen. Now I would think that Intel would have an advantage here with their ultra modern southbridges. There might be a device that I am overlooking of course. Maybe our SCSI controller... But I don't think so.

Johan, if you're still reading (great article BTW)...
A question I have had for quite awhile now is what effect the IOMMU has on these tests.
The reasons I'm asking are
1. I noticed that there was quite a disparity between the AMD and Intel 64bit performance (which you mentioned).
2. I know that one difference between the 2 platforms is that AMD has a hardware IOMMU (of sorts) and Intel (at present) does not.
3. I saw a thread last year with Linus T mentioning this quite a bit. He seemed to think that this would impair the EM64T substantially...

Viditor: thanks for the helpful comment. Indeed, if you turn on the query cache, your CPU is doing very little.
Everybody else: note the word "identical" in viditor's quote. If your database is running many identical queries, then you are not going to spend time reading this kind of article: you simply buy the cheapest decent server. Any CPU today can run 1000s of queries if everything comes out of the query cache.

Running benchmarks with the query cache on is simply not interesting. The query cache is all about accelerating the IDENTICAL queries that are run from time to time. You might reserve a bit of RAM to make sure that the most common queries (getting the latest article of a website for example) are run faster.
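To make the "identical" point concrete, here is a toy sketch in Python of a cache keyed on the exact text of a query. It is only an illustration of the idea, not MySQL's implementation: the class, the `fake_execute` stand-in, and the example table are all made up, and a real cache also has to throw entries away when the underlying tables change.

```python
class QueryCache:
    """Toy result cache keyed on the exact text of a SELECT statement."""

    def __init__(self, execute):
        self._execute = execute  # hypothetical callable that really runs the query
        self._results = {}       # query text -> stored result
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        # Only a character-for-character identical query is a hit.
        if sql in self._results:
            self.hits += 1
            return self._results[sql]
        # Anything else goes through the full parse-and-execute path.
        self.misses += 1
        result = self._execute(sql)
        self._results[sql] = result
        return result


def fake_execute(sql):
    """Stand-in for the expensive part (parsing, planning, reading rows)."""
    return f"rows for: {sql}"


cache = QueryCache(fake_execute)
cache.query("SELECT title FROM articles ORDER BY posted DESC LIMIT 1")
cache.query("SELECT title FROM articles ORDER BY posted DESC LIMIT 1")  # hit
cache.query("SELECT title FROM articles WHERE id = 42")                 # miss

print(cache.hits, cache.misses)
```

On a hit the CPU does little more than a lookup, which is why numbers measured that way say so little about how much real query work a server can handle.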

But those numbers don't tell you anything about the load that your server is going to be able to take. You want worst case performance numbers!

Questar - the reason the query cache was turned off (guessing here) is to more reasonably simulate a real-world test. Obviously in this test, the same queries are repeated quite often. But that is not usually the case in the real world...
For those who don't know what the heck a "query cache" is:

"the query cache stores the text of a SELECT query together with the corresponding result that was sent to the client. If the identical query is received later, the server retrieves the results from the query cache rather than parsing and executing the query again"

Why are there no graphs like other Anandtech articles? Why is everything in hard to read tables with broken formatting? This one seems a bit rough around the edges compared to the usual Anandtech quality.

I find it quite odd that you claim to be testing with a 2.6.12 Linux kernel despite the fact that that kernel has not yet been released in a final version.
If you are using one of the pre-release kernels you should explicitly say so, and tell us which one.
The latest stable kernel at the time I write this is 2.6.11.12, the latest development kernels are 2.6.12-rc6, 2.6.12-rc6-git8 & 2.6.12-rc6-mm1. There's also the question of whether or not you used a stock kernel.org kernel or a "patched to hell-and-back with crap" Gentoo kernel...

Correct me if I'm wrong, but would not the query cache positively affect the scores of both vendors' chips?

I suppose I don't have a pair of database machines just sitting around to test it out, but I'd imagine that if query cache was enabled the Opteron would experience similar performance boosts to the Xeon- if not more of a boost thanks to the higher-performing memory subsystem.

#20 Ah yes the conspiracy theories begin. Just like AMD with Tomshardware. The server results here appear pretty consistent with every other server test I have seen on review sites but who knows.
# 19. Intel is only at 90nm but do have 300mm wafers. That is why Fab36 is so important for AMD. 300mm wafers and 65nm by Q2 of 2006 should put them pretty equal with Intel's fabrication level. Production level is still way, way in favor of Intel though.

Pricing, as I said before the Opteron dualcore chips are way cheaper than Intel dualcore server chips because Intel doesn't have any.

"The " query cache" was off, as we wanted to test worst case performance. In some cases, the query cache was able to push a single Xeon to 1000 queries per second, and the CPU was still capable of doing more, as the CPU load was at 50% - 70%. "

Translation: We didn't want our beloved AMD to lose, so we doctored the test.

BTW guys, one reason why AMD may be pricing its chips much higher is the MFG process. Unless I am mistaken (and someone correct me if I am wrong), they are still using 200mm wafers on a 90 or 110 process. Intel is using 300mm at 65 nm...this results in a huge difference in throughput. Since AMD is already pricing its CPUs very aggressively to gain market share, and the die of those dual-cores is much bigger (anybody know the real %?), it is to be anticipated that their dual-cores are much more expensive. They are probably gambling on selling dual-core Opterons at high margins via Sun and other OEMs first, which will probably take most of their wafers. This is why their desktop parts are coming later, I would bet...
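As a rough illustration of the wafer-size part of that argument, here is a small Python sketch that compares raw wafer area only. It deliberately ignores edge loss, yield, die size, and the process node, and the function name is made up for the example.

```python
import math


def wafer_area_mm2(diameter_mm):
    """Raw area of a circular wafer, ignoring edge exclusion and yield."""
    radius = diameter_mm / 2
    return math.pi * radius ** 2


area_200 = wafer_area_mm2(200)
area_300 = wafer_area_mm2(300)

# Area grows with the square of the diameter, so 1.5x the diameter
# gives 1.5 * 1.5 = 2.25x the raw silicon per wafer.
area_ratio = area_300 / area_200

print(f"A 300mm wafer has {area_ratio:.2f}x the raw area of a 200mm wafer")
```

How much of that turns into extra sellable chips depends on everything the sketch leaves out.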

#17 Good answers but, "Depends on the applications you run. On single threaded code, the faster single core will run your code faster."

That doesn't explain what I meant. So for you it is OK to pay more for a single core processor because it runs faster sometimes (and may run slower other times)? How will you know what will happen? (Supposing that you don't know shit about your software requirements.)

AMD doesn't have this problem, so why would AMD, for example, release a 4200+ processor at the same price as the 3500+ if the performance is equal or superior?
I think AMD has made the right decisions, like Intel has made its own.
They all play with what they have, and not with what they haven't.

"Explain me something:
- how do you explain or how Intel will explain that their single core processor cost more than the dual core ones? "

Because Intel sets the prices of their chips. They want to push dual-core to the masses so they price them accordingly.

"- Why should you buy a single core over a dual core if it cost more? "

Depends on the applications you run. On single threaded code, the faster single core will run your code faster.

"- How good is this Intel market decision (marketing). "

Probably pretty good. Considering you have to buy a new motherboard to use the dual-core Intel parts, they dropped the price so that the CPU + motherboard cost is about the same as (or still less than) the cost of just the dual-core CPU from AMD. Sounds like a good strategy to me.

I also think it is safe to say that AMD's dualcore Opterons will be cheaper than any Intel dualcore server chip for the next six to eight months since there aren't any Intel dualcore server chips. IDC just released market research that showed AMD with 30% of the 4way server sales in Q1 '05. That is what AMD is after. The 64bit performance difference is surprising to say the least.

Interesting benchmark, though I would have preferred you used PostgreSQL instead of DB2 since it's also open source and is the most likely alternative to MySQL... a DB2/Oracle bench would be kinda cool...

#6 "but very important: AMD, OPEN YOUR EYES AND SEE WHAT INTEL IS DOING WITH PRICING ON DUAL CORES! don't get cought with pants down."

Explain me something:

- how do you explain or how Intel will explain that their single core processor cost more than the dual core ones?
- Why should you buy a single core over a dual core if it cost more?
- How good is this Intel market decision (marketing).

I think it is pretty logical that a dual core costs more than a single core processor, so a 4800+ being more expensive than a 4000+ is OK! Stop blaming AMD and their marketing team!

Calin - "Unfortunately, it seems that their dual core processors will be more expensive than Intel's"

Actually, that's not true at all!
The fact is that AMD haven't released a "Value Line" of dual core yet because they don't see a large enough market for it.
By comparing the two companies' offerings, it's apparent that the 820D should match up equivalently to a dual core Sempron when it's released, and the ($1000) 840EE matches up to the ($500) 4200+ rather nicely.
The 4400+ and the 4800+ are in a class by themselves without competition at the moment, hence the prices are high.

I would guess that the problem is the netburst architecture's fast integer units. As I recall, the P4 integer units are split into two 16-bit stages and run at double the main clock using complicated differential signaling logic. I seriously doubt it would be feasible for Intel to add another two stages or to double the width of the units, as they're already power and chip area hungry and pretty much integral to the design. AMD on the other hand designed the Opteron to be 64-bit from the ground up and is running at a lower clock speed, making 64-bit wide single stage logic much easier.

Thus the author's speculation that the P4 is taking twice as many cycles to process 64-bit simple integer operations (while the Opteron needs no additional cycles) seems highly likely to me. I'm one of those (apparently) rare programmers that needs to use 64-bit integers a lot, so it's not surprising that all our compute servers are Opteron powered.
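To show what "twice as many stages" means in practice, here is a small Python sketch that adds two unsigned integers in 16-bit slices with a carry passed between them. It only illustrates the arithmetic behind the recollection above; it is not a model of the actual P4 hardware, and the function name and test values are invented for the example.

```python
MASK16 = 0xFFFF


def staged_add(a, b, width):
    """Add two unsigned `width`-bit integers in 16-bit slices.

    Returns the wrapped sum and the number of 16-bit stages used.
    """
    result = 0
    carry = 0
    stages = 0
    for shift in range(0, width, 16):
        # Add one 16-bit slice of each operand plus the incoming carry.
        total = ((a >> shift) & MASK16) + ((b >> shift) & MASK16) + carry
        result |= (total & MASK16) << shift
        carry = total >> 16  # carry into the next slice
        stages += 1
    return result, stages


value32, stages32 = staged_add(0xFFFF_FFFF, 1, 32)
value64, stages64 = staged_add(0x0000_0001_FFFF_FFFF, 1, 64)

print(f"32-bit add: {stages32} stages, result {value32:#x}")
print(f"64-bit add: {stages64} stages, result {value64:#x}")
```

With fixed 16-bit slices, a 64-bit add simply needs twice as many passes as a 32-bit one, which fits the doubling of cycles described in the article.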

#3: The thing is, that the effectiveness of optimization flags is dependent on the application being used (specifically, what the application is doing and how it is designed). Activating the wrong optimization can have adverse effects on performance.

I would say that -march=xxx is always helpful, -O and -Os are always helpful, -O2 is almost always helpful, -ffast-math is usually helpful, and you should hold your breath on most anything else. You can also try Acovea (http://www.coyotegulch.com/products/acovea/index.h...), which applies a genetic algorithm to compiler flags. Just don't expect to come out ahead, given the number of compiles you have to perform for such a small amount of performance.