Posted
by
Unknown Lamer
on Tuesday November 22, 2011 @08:06AM
from the cool-kids-jump-ship-to-fleet dept.

New submitter RobinEggs writes "Some reviews of Bulldozer's server performance have arrived. Ars Technica has the breakdown, and the results are pretty ugly. Apparently Bulldozer fares just as poorly with servers as with desktops. From the article: 'One reason for the underwhelming performance on the desktop is that the Bulldozer architecture emphasizes multithreaded performance over single-threaded performance. For desktop applications, where single-threaded performance is still king, this is a problem. Server workloads, in contrast, typically have to handle multiple users, network connections, and virtual machines concurrently. This makes them a much better fit for processors that support lots of concurrent threads. ... It looks as though the decisions that hurt Bulldozer on the desktop continue to hurt it in the server room. Although the server benchmarks don't show the same regressions as were found on the desktop, they do little to justify the design of the new architecture.' It's probably much too early to start editorializing about the end of AMD, or even to say with certainty that Bulldozer has failed, but my untrained eye can't yet see any possible silver lining in these new processors."

And yet, three supercomputers with those Opterons were ordered in the last four weeks? And in a month, one of them - which is being revamped from the #3 supercomputer position in the world - will be the #1 supercomputer in the world when complete? Were the people at Lockheed Martin also morons for choosing an Opteron-based supercomputer?

Why was an article apparently written to bash AMD included on Slashdot despite its apparent bias?

Recall the Itanium from Intel and HP. It started out with great hype more than ten years ago. When the first benchmarks came, no one wanted to believe them. Still, that particular architecture is about to die.

Unfortunately, Bulldozer may end up with a similar fate. The big difference is that Intel had its regular desktop CPU line-up to finance the Itanium disaster. If nothing much can be improved on the AMD CPU side, can the shrinking graphics card business save AMD?

1. Nobody with a sig advertising knock-off PHP plugins even has the right to use the word "supercomputer" in a sentence.

2. Supercomputers are NOT built based on processor speed. If you took the SPARC CPUs used in the K computer (the world's fastest, and *not* running Opterons) and put them into a regular server or desktop, then you'd have a pretty underwhelming computer. Most of the $$$ going into supercomputers goes to the interconnects, not the CPUs. So sure, use the Opterons in the supercomputer, where AMD sells them at fire-sale prices and doesn't make any money. The rest of us will use Xeons and be very happy with the results.

3. You are a well known AMD fanboi and your repetitive posts are becoming less and less amusing.

Bulldozer is faster than the Xeon chip on all CPU benchmarks which can generate enough threads to fill all cores.

Each Bulldozer core is as fast as a core on an Opteron 6100.

It looks exactly like the cpu I want in my web/db server, and my supercomputer.

Do the majority of real-world uses 'fill all cores'? Are you arguing that the vast majority of these benchmarks are useless? I can't distinguish between which tests use all of the cores and which don't, but it's not my field.

However, the results fall far short of a resounding success for AMD. The results are broadly split between "tied with Opteron 6100" and "33 percent or less faster than Opteron 6100." For a processor with 33 percent more cores, running highly scalable multithreaded workloads, that's a poor show. Best-case, AMD has stood still in terms of per-thread performance. Worst case, the Bulldozer architecture is so much slower than AMD's old design that the new design needs four more threads just to match the old design. AMD compromised single-threaded performance in order to allow Bulldozer to run more threads concurrently, and that trade-off simply hasn't been worth it.
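The article's per-thread arithmetic can be sanity-checked in a few lines. This is only an illustrative sketch: the core counts (16-core Opteron 6200 vs 12-core Opteron 6100) and the "33 percent or less faster" figure come from the quote above; everything else is derived.

```python
# Sanity check of the per-core arithmetic in the quoted article.
# Core counts are real; the speedup figures are the article's bounds.
cores_6100 = 12
cores_6200 = 16

# Best case quoted: the 6200 is 33% faster overall.
speedup_total = 1.33
per_core_ratio = speedup_total * cores_6100 / cores_6200
print(f"best-case per-core throughput ratio: {per_core_ratio:.2f}")  # ~1.00, i.e. stood still

# Worst case quoted: the two chips tie despite the extra cores.
speedup_total = 1.00
per_core_ratio = speedup_total * cores_6100 / cores_6200
print(f"worst-case per-core throughput ratio: {per_core_ratio:.2f}")  # 0.75
```

So even taking AMD's best result, each Bulldozer thread does no more work than an old Magny-Cours core, which is exactly the article's complaint.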

That's the problem. There are several instances in which AMD isn't even beating itself. Almost none of the tests show it doing better than the old 6100 Opterons on a per-core basis. And the Xeons the 6200 only sometimes beats are 18 months old; new Xeons ship next quarter. I suppose if I accept your statement about "filling all cores" at face value, given my general ignorance of the server market, then I have to admit that Bulldozer could be superior in situations that fill all of the cores most or all of the time. Is that a significant potential market share? Does it justify an entirely new architecture?

I completely agree. You have to hunt down which link is the correct one to find the specs they eventually skewed to make an inflammatory point. They're writing articles to fill pages with advertisements, based on a headline that's sure to piss off someone.

That's simply not true. On the server side, the quad 6100 1U servers are very competitive, supplying as much (sometimes more) power as Intel boxes for considerably less money. At this point they're a bit of a no-brainer in the server room.

On the desktop, it is different. More of the benchmarks show that the Core i5 is faster than the Phenom II X6 and the 8150. But some benchmarks show that the AMD parts can be considerably faster. The choice is really simple: if your workload is dominated by the kind of things that Intel does well, then buy Intel; otherwise buy AMD.

I don't go there for the tech articles, but the part on page 2 where they pull AMD's TPC-C numbers apart is pretty damn good.

AMD claims 1.2 million tpmC for a two-socket Opteron 6282 SE system. The company compares this to a score for a two-socket Opteron 6176 SE system (each socket having 12 cores), (...) AMD also claims that this beats "competing solutions" by "as much as" 18 percent. (...) the reference AMD uses is another official result: dual Xeon X5690s (6 core, 12 thread, 3.46 GHz) with 384GB RAM. (...) looking just at the servers and their storage, and assuming similar discounts, we get prices of around $260,000 for the Opteron 6100 system, $879,000 for the Opteron 6200 system, and $511,000 for the Xeon system.

Basically their figures are doped with a massive SSD storage solution to make a slow CPU look good. And they show that if you wanted to spend $879,000 on a system, there are much faster Intel solutions (even though the CPUs cost more). So they're doing pretty well on the economics end, at least.
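The economics are easy to work out from the figures quoted above. Note one assumption: AMD's "as much as 18 percent" claim is used to back-derive the Xeon system's tpmC score, so that number is an estimate, not an official result.

```python
# Rough cost-per-tpmC comparison from the quoted article's figures.
tpmc_6200 = 1_200_000              # AMD's claimed score for the 6282 SE system
tpmc_xeon = tpmc_6200 / 1.18       # inferred from the "18% faster" claim (estimate)

price_6200 = 879_000               # server + storage, assuming similar discounts
price_xeon = 511_000

print(f"Opteron 6200: ${price_6200 / tpmc_6200:.2f} per tpmC")
print(f"Xeon X5690:   ${price_xeon / tpmc_xeon:.2f} per tpmC")
# Under these assumptions the Xeon box delivers ~85% of the throughput
# at ~58% of the price.
```

That's the sense in which the SSD-heavy configuration pads the headline number: the throughput win evaporates once price is on the table.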

Do the majority of real-world uses 'fill all cores'? Are you arguing that the vast majority of these benchmarks are useless? I can't distinguish between which tests use all of the cores and which don't, but it's not my field.

Obviously. The high-performance server market these days doesn't really include web and mail servers. Most are being deployed for one of two purposes: (1) large database servers, and (2) virtual server hosts. Both of those uses will take advantage of this architecture, unlike the contrived "benchmarks" used to test these chips.

I haven't deployed a single server NOT used in a virtual environment in over two years. We're even deploying database servers as virtual these days, because the backup and fault-tolerance features are so good. These new Bulldozers look like they'll be on the list for the next set of hardware I need.

I suspect Bulldozer is going down a road remarkably like NetBurst's (NetBurst made design compromises for marketable massive clock gains; Bulldozer similarly makes compromises to boost the now-marketable core count), and time may prove me wrong, but this article was crap.

It looked like they cherry-picked some benchmarks from the world at large, with no controls. As pointed out in the article, the tpmC benchmark had massive storage differences, and the cost delta means there were probably node-count differences too. There are so many things in play that it's impossible to derive any statement specifically about the processors. The article, however, uses the cost as a point to make AMD look bad, but in the same breath says the better SSDs probably drove the benefit, to steal AMD's thunder. He can't have it both ways. Given the nature of the test, I'm inclined to believe the storage architecture was the key factor in both cost and performance.

Later, the article says AMD should have just done a 16-core Magny-Cours. Clearly AMD should hire him, since he's a genius who *must* have considered all the complexities and figured out a way to achieve that core density when no one else in the industry has. No one pretends for a second that a Bulldozer module matches two 'real' cores, but AMD can't just wave a wand and make a 16-core package of the old architecture. Bulldozer is all about trying to ascertain the 'important' bits of a core and share the other bits, in the hope that the added resources give most of the benefit of an additional core without the downsides that make it impossible to fit that many full cores on a socket.

It doesn't seem to be any more power-efficient than AMD's last generation, despite being built on a smaller process node (32nm vs 45nm).

At what point does AMD simply admit Bulldozer is a failure, pull the plug, and write off the sunk costs? Putting good money after bad is a classic business mistake that has killed many companies.

AMD should continue improving their existing cores on the 32nm process (they already have some of the work done with Llano) and forget about their "revolutionary new" architecture which is basically this decade's Prescott.

Or, heck, see if it's possible to scale up the Bobcat cores for mainstream desktop use. Don't forget, Intel's very successful Core 2 Duo came from a previous design (Pentium M) that had been reserved to laptops. AMD will probably have more luck increasing performance (both raw clock and IPC) on Bobcat than trying to tame the heat, insane transistor count, and long pipeline of Bulldozer.

So if performance/watt is your first priority, we think the current Xeons are your best option.

Who, other than NASA (et al.), has performance per watt as a high priority, much less their first priority? And for NASA, performance per watt only matters for space-bound vehicles. I doubt they're sending supercomputers into space.

Bogus.

Anyone who pays for their own power?

Power is a significant portion of the operating cost of a server - a server that's 25% more energy efficient with the same performance is a sizable savings. You don't just pay for watts to the server, every watt that goes into the server has to be taken away by cooling, and has to be supplied by an expensive redundant power infrastructure.
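The point about power and cooling is easy to put numbers on. This is a back-of-the-envelope sketch with hypothetical figures (a 400 W server, $0.10/kWh, and a PUE of 1.8 to cover cooling and power-delivery overhead), not data from the article:

```python
# Back-of-the-envelope annual power cost for one server.
# All figures below are hypothetical, for illustration only.
watts = 400                 # server draw at the wall
price_per_kwh = 0.10        # electricity price, $/kWh
pue = 1.8                   # total facility power / IT power (cooling, UPS, etc.)

hours_per_year = 24 * 365
annual_cost = watts / 1000 * hours_per_year * price_per_kwh * pue
print(f"annual power + cooling cost: ${annual_cost:.0f}")

# A server that's 25% more efficient at the same performance:
savings = annual_cost * 0.25
print(f"annual savings per server: ${savings:.0f}")
```

Multiply that per-server saving across a few racks and a multi-year service life, and the efficiency delta can rival the price difference between the CPUs themselves.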

Don't forget, Intel's very successful Core 2 Duo came from a previous design (Pentium M) that had been reserved to laptops

That was a bit of a special case. It's not a testament to how fundamentally awesome low-power processors are so much as an illustration of *just* how bad NetBurst was. The Pentium M skipped NetBurst entirely because Intel *couldn't* make it work acceptably in a mobile device.

*Usually* the low-power parts optimize for overall wattage and *not* performance per watt. If they can get 25% more performance for 10% more power, a desktop part may elect to take that trade while a mobile part may not.
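That trade is worth spelling out, because it's the distinction between the two metrics. Using the hypothetical 25%/10% figures from the comment above:

```python
# The hypothetical trade from the comment: 25% more performance
# for 10% more power. Good for perf/watt, bad for a wattage budget.
perf_gain = 1.25
power_gain = 1.10

perf_per_watt = perf_gain / power_gain
print(f"performance per watt changes by {(perf_per_watt - 1) * 100:+.1f}%")  # ~ +13.6%
print(f"absolute power changes by {(power_gain - 1) * 100:+.0f}%")           # +10%
```

A desktop part cares about the first number; a mobile part with a fixed thermal envelope may be forced to care about the second.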

Hang on, "typical focus on embarrassingly parallel problems"? That's just plainly not true. Pick a classical HPC problem: weather forecasting. You break up the atmosphere into a bunch of cubes and distribute those cubes in a sensible way between your nodes. You model the flows within the cubes on the local machine and pass the edge information to neighbouring nodes. If it were embarrassingly parallel you wouldn't be passing edge information, but that would mean weather couldn't move from one area to another...

CFD for modelling heat or air flow, or pathogen propagation. Modelling population trends with microsimulation, or even parallel simulation of software systems. None of that is embarrassingly parallel. You wouldn't spend all your money on low-latency, high-bandwidth interconnects if all the nodes spent their days playing with themselves.
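The edge-passing pattern described above can be sketched minimally. This toy example splits a 1D heat-diffusion problem across two "nodes" (just two lists here; a real HPC code would use MPI) and exchanges boundary cells each step, which is exactly why weather can move between subdomains:

```python
# Minimal sketch of halo exchange: two subdomains each step must
# (1) swap edge cells with their neighbour, then (2) update locally.
# Pure-Python stand-in for what MPI ranks would do; illustration only.

def step(domain, left_halo, right_halo):
    """One explicit diffusion update; halos stand in for neighbour data."""
    padded = [left_halo] + domain + [right_halo]
    return [padded[i - 1] * 0.25 + padded[i] * 0.5 + padded[i + 1] * 0.25
            for i in range(1, len(padded) - 1)]

# Two subdomains with a hot spot at the right edge of node A.
node_a = [0.0, 0.0, 0.0, 1.0]
node_b = [0.0, 0.0, 0.0, 0.0]

for _ in range(3):
    # "Communication" phase: each node sends its edge cell to its neighbour.
    a_to_b, b_to_a = node_a[-1], node_b[0]
    # Update phase: outer boundaries held at 0.0 (fixed temperature).
    node_a = step(node_a, 0.0, b_to_a)
    node_b = step(node_b, a_to_b, 0.0)

print(node_a, node_b)  # the heat has leaked across the node boundary
```

Drop the communication phase and node B stays at zero forever: an embarrassingly parallel version of the same loop simply can't model anything crossing a subdomain boundary.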

Something like raytracing *can* be embarrassingly parallel, but I'd say most of what runs on HPC isn't.