Posted
by
CmdrTaco
on Monday September 10, 2007 @08:11AM
from the well-isn't-that-special dept.

Justin Oblehelm writes "AMD has finally unveiled its first set of quad-core processors, three months after its original launch date due to its "complicated" design. Barcelona comes in three categories: high-performance, standard-performance and energy-efficient server models, but only the standard (up to 2.0 GHz) and energy-efficient (up to 1.9 GHz) categories will be available at launch. The high-performance Opterons, together with higher frequencies of the standard and energy-efficient chips, are expected out in the fourth quarter of this year.
But it's far from clear that this is the product that will help right AMD's ship."

Barcelona is a different architecture from K8 (the architecture of the current X2s). Its overclocking performance is currently unknown. Just as Intel's overclocking potential improved as it went from Pentium -> Core 2 Duo, Barcelona may increase or decrease AMD's overclocking potential.

I'm not sure they beat Intel's price; the article lists the 2.0 GHz model at US$389. I got my Core 2 Quad at $300, and they're cheaper than that now too. Unless you're thinking of some bulk-sale-to-retailers price I don't know of.

Since Barcelona is one of the bigger architectural changes from AMD in the past few years, the 32-bit benchmarks are relevant because they are good predictors of what's to come for the entire product line, including the desktop processors, where 32-bit code dominates. Also, if they used exclusively 64-bit code, they would be accused of using unrealistic benchmarks to highlight the fact that AMD has better 64-bit performance than Intel.

It could be argued, however, that these are server and workstation chips and so would be expected to run mainly 64-bit tasks to get full use of the performance, so 64-bit benchmarks would make more sense. Once the Phenom chips are out, both 32-bit and 64-bit benchmarks will be useful, since over the next few years most software will move to 64-bit and drop 32-bit.

The 2.0GHz Barcelona [ibm.com] beats the 3.0GHz Xeon X5365 (Cloverton) [spec.org] on floating point. Barcelona's specfp_rate2006 score is 73.0 to Cloverton's 66.9. Things can only get better as AMD cranks up the clock in the coming months.

If you scale the benchmarks to the same GHz rating you will see that clock for clock Barcelona is at worst on par with Intel's best chip, and at best 80% faster on floating point. This is really quite amazing when you consider it's using the same amount of power as the previous 2-core AMD Opteron.
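As one data point for the clock-for-clock claim, here's a quick back-of-the-envelope sketch (mine, not from the article) using the specfp_rate2006 scores quoted earlier in the thread. It assumes naive linear scaling with clock, which real chips don't achieve since memory bandwidth stays fixed, so treat it as a rough upper bound:

```python
# Rough per-GHz comparison of the SPECfp_rate2006 scores quoted above.
# Assumes naive linear scaling with clock frequency (a simplification).

barcelona = {"clock_ghz": 2.0, "specfp_rate": 73.0}
cloverton = {"clock_ghz": 3.0, "specfp_rate": 66.9}

def per_ghz(chip):
    return chip["specfp_rate"] / chip["clock_ghz"]

b, c = per_ghz(barcelona), per_ghz(cloverton)
print(f"Barcelona: {b:.1f} points/GHz")  # 36.5
print(f"Cloverton: {c:.1f} points/GHz")  # 22.3
print(f"Clock-for-clock advantage: {b / c - 1:.0%}")  # 64%
```

On this one memory-bound benchmark pair the naive per-GHz gap comes out around 64%, in the same ballpark as the "at best 80%" figure above.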

specfp_rate was running faster on pre-Barcelona dual-core Opterons than on Intel's dual-core Woodcrest. The reason is no big secret: specfp is memory-bandwidth limited, and specfp_rate is multiple copies of specfp running in parallel. Here is a good anandtech article [anandtech.com] on the subject.

We already know that AMD has superior memory performance. If you are doing bandwidth-limited floating point, Barcelona is the clear winner.

If you're making a general statement about floating point performance, you're wrong.

I simply want to use the chip that gives me the greatest floating point throughput I can get.

Define throughput. At some point you need to decide if you are solving equations like LinPack or equations like spec_fp. One causes lots of cache misses and benefits from memory bandwidth, the other does not.

Right now that chip appears to be Barcelona.

Well that's a hypothetical statement based on perception of your needs and their marketing.

I'm not interested in hypothetical arguments

That explains why you're making them (???)

I am looking forward to using Barcelona processors because they will get my mathematical computations done faster.

Hypothetically. Are you going to hypothetically switch when Intel's Penryn with SSE4 comes out? What about Intel's Nehalem?

By the way, check out number 2 and 3 on your top 500 supercomputer list - they're Opterons.

And?? They were designed and built before Core 2 was released. Do you think I'm going to argue they should have used Pentium 4's? Those systems also make solid use of NUMA through a custom Cray crossbar (Seastar), and Intel doesn't have that. If they made them today I see no reason for them not to use Opterons. Do you have a computer with lots of Opterons and a Cray Seastar router on order?

The performance of those systems is measured using LinPack. As I mentioned at the beginning, declaring a 2.0 GHz Barcelona as having faster fp throughput than a 3.2 GHz Core 2 depends wholly on which types of calculations you are doing. spec_fp does calculations that are memory bound; LinPack does not (at least not as much). Barcelona's faster fp throughput is not due to a markedly superior fp unit (though it may be marginally better) but to its onboard memory controller. If you need that sort of thing, great, go with Barcelona. If you need raw speed on smaller units (under a couple of megabytes), chances are good that the higher-clocked Core 2 with huge cache will win.

However, this will only occur if AMD's newest quad-core is able to outperform the Intel alternatives clock for clock by a decent margin.

I really dislike this whole "tuner" mentality from most reviewers. This is a server chip, so not just clock for clock, but also dollar for dollar, and watt for watt will be big issues. Plus, Intel still generally releases larger caches, so that weighs in.

"The delay puts the chip maker a full generation behind its archrival in terms of chip manufacturing processes. Intel's quad-core processor, which was launched in November last year, melds two of its duo-core processors into a single package."

Heh, shouldn't that be "full generation ahead" since AMD manages to put four cores on a single die?

"The delay puts the chip maker a full generation behind its archrival in terms of chip manufacturing processes.

Emphasis mine. Reading comprehension 101: Read the whole sentence. AMD is at 65nm, Intel is at 45nm, just as when AMD was at 90nm, Intel hit 65nm. This qualifies them as being "a generation behind" in chip making processes.

Whether or not their architecture or their core design is better is completely irrelevant to that sentence (but relevant to the next, which is why it's so odd they'd put tho

This is a direct reference to 65nm vs. 45nm geometry. If AMD brings their quad core to a 45nm process, that should help yield, power and performance. If nothing else, it puts them on a level playing field with Intel (who already have product at 45nm [intel.com]) so that it's down to "design vs. design." Being stuck one silicon technology generation back, they need to resort to other tricks to "keep up."

In other words, to be at overall performance parity with Intel, they have to have a more advanced design in 65nm to keep up with Intel's 45nm work.

Another thing worth noting: By being 1 generation back, the quad core setup is a double whammy. The die area of a given chip roughly halves with each technology node. Not only is AMD putting twice as much on one chip, it's also making chips that are twice the size per transistor. (Remember, to double square area, you only increase your linear feature size by sqrt(2). 65/45 = 1.444... which is about sqrt(2).) Each additional sq mm of die area causes greater yield loss than the one before it (driven by defect density in the source silicon). Doubling die size has a huge impact on yield. So, AMD will potentially suffer significantly higher yield loss, and correspondingly higher costs. Even if it can keep its ASP (average selling price) up, the profit margins will suck.
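The yield argument above can be sketched with the classic first-order Poisson yield model, where the fraction of defect-free dice falls exponentially with die area. The defect density number below is invented purely for illustration:

```python
import math

# Toy Poisson yield model: Y = exp(-A * D0), where A is die area and
# D0 is defect density. D0 = 0.5/cm^2 is a made-up illustrative value.

def poisson_yield(area_cm2, d0_per_cm2=0.5):
    """Fraction of dice with zero defects."""
    return math.exp(-area_cm2 * d0_per_cm2)

small = poisson_yield(1.0)  # hypothetical 1 cm^2 die
big   = poisson_yield(2.0)  # same design at twice the area
print(f"1 cm^2 die yield: {small:.1%}")  # 60.7%
print(f"2 cm^2 die yield: {big:.1%}")    # 36.8%
# Doubling the area squares the yield fraction, so yield falls faster
# than linearly -- the "huge impact" the parent describes.
```

In this model the big die's yield is exactly the small die's yield squared, which is why staying a process node behind while also going monolithic is a double whammy for costs.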

It'll be interesting to see if AMD can quickly shrink this design to 45nm and get closer to parity. The benefits of the quad core design probably become much more apparent at 45nm.

Intel and AMD are using different production technologies for their dies.
From what I know, AMD is using IBM's SOI (Silicon On Insulator) process, which has much lower leakage current and is therefore much better at the same feature size. But it also seems more complicated to shrink this technology to 45nm.

The die area of a given chip roughly halves with each technology node.

This is not entirely true. Although I agree overall with what you're saying, core logic transistors scale much worse than cache as the manufacturing process decreases in size. I'm not sure if AMD factors this process disadvantage into their chip design, but it is an interesting design choice that they choose to stuff their chip real estate with logic transistors instead of cache. I'm sure that I'm oversimplifying, but I have a gut feeling that they possibly might be choosing to use less cache and more logic.

Don't knock "easier to manufacture". The Cray3 and many other interesting designs failed because yields of some critical part never reached commercial viability. My first opteron servers (right out of the gate from a major vendor) had several failures, all due to the onboard memory controller frying. A little slower but fewer defects results in fewer recalls and less bad press.

This is not entirely true. Although I agree overall with what you're saying, core logic transistors scale much worse than cache as the manufacturing process decreases in size.

Fair enough. That said, it's not the transistors so much as it is the wires that don't scale well. I'll warn you: I'm not a physical designer, I'm just an architect. The one and only cookie I directly designed and baked was in 2 micron. [spatula-city.org] That said, I'm aware of the trends.

It's a little more complicated than that because with smaller features, you are susceptible to smaller defects. That is, a defect that is likely not to affect AMD's chip at 65 nm can obliterate Intel's 45 nm chip. So the likelihood of a defect isn't linearly related to area as you suggest.

Fair enough, but in terms of dollars of revenue per wafer, the relative cost of a given defect is generally smaller on a 45nm wafer than on a 65nm wafer if the 45nm design is roughly half the size of the 65nm design. You've taken out a smaller percentage of the devices on the wafer. Note that I say "relative cost." 45nm wafers are still more expensive than 65nm wafers. :-)

Also, with a RAM-heavy design, you can build significant redundancy into your RAM arrays and perform RAM repair (remapping columns to spares).

Heh, shouldn't that be "full generation ahead" since AMD manages to put four cores on a single die?

No... AMD's arrogance costs them dearly. Intel has superior fab/process technology and could build monolithic quad-core but it is more expensive than MCM because of decreased yield in monolithic quad-core per wafer.

AMD already has a decent infrastructure to support MCM quad-core very well but refuses to use it to increase their yields. Only arrogance and pride keep AMD from releasing MCM parts, which would significantly improve their yields.

You're missing a key point: AMD has to have something to differentiate themselves from Intel. If they release an MCM quad-core chip, they're just following Intel. On the other hand, the press is eating up the monolithic quad-core chip, precisely because of the perception that the monolithic design is special and different from the existing quad-core chips.

The Techreport also has a review up: http://techreport.com/articles.x/13176/1 [techreport.com]. Barcelona is similar to Core2, clock for clock. It has better energy efficiency and SMP scaling. But the clock frequencies will need to come up in order to beat Intel's highest clocking chips in absolute performance.

AMD's had I/O performance and memory latency advantages over Intel even before Barcelona, though. I suppose Intel will be in even more serious trouble than before in the server space, until it can get its next-generation bus thingy (CSI they called it?) up and running in a year or three. Until then, Intel's stuck in an SMP scaling black hole... and I don't really see Intel coming out with integrated memory controllers and native NUMA like AMD did with their whiz-bang DEC Alpha engineers. Once Barcelona ramps up,

Good link. Glad you RTFA. However, since this is a server chip, why don't they do benchmarking with something other than Windows? Does anyone know of decent benchmarking apps or tech sites that use them for, say, Linux?

As someone who is interested in quiet desktops, this article has an interesting comment on CPU power specs: AMD seems to have gotten tired of direct wattage comparisons between different measurement methods, i.e. Intel's "typical" rating versus AMD's "maximum" rating. So AMD is now introducing its ACP rating, "average CPU power".

I wonder how much these things will go for... I know they won't be cheap (in traditional terms), but since AMD has a history of comparable performance for less, I'm really curious how affordable these things will be. If the price is right, one of these may be in my near future... One thing of note is that motherboards already exist for this processor in fair number. The Barcelona uses Socket F (1207), which the current dual-core Opterons already use. That should give this processor a decent jumpstart in the market.

AMD says it won't use the ACP number to compare the power consumption of its processors against Intel's.

Before everyone slams them for coming up with yet another cheesy marketing gimmick, I would point out that Intel has done this ever since the first of the power-sucking P4 line. They did it a bit less up-front, however, choosing to redefine "TDP" in their specs rather than give their numbers a new term (such as "ACP").

This still won't make for a completely fair direct comparison, because Intel's TDP

Better to lose some 'x' amount of money vs. losing all credibility, mindshare, etc., by releasing a flawed product in haste. We'll see if their strategy pays off in the near future. I noted from the benchmarks that these 1.7-2GHz chips keep up with Intel's in many benchmarks, but Intel's best are still slightly ahead.

It'll be good to see what comes up in the next 2-3 months as production ramps up.

Really what they need to do is pull off a big merger with just about any other very large company. Samsung would be a good one. Sun wouldn't be half bad, considering Sun's vested interest in Opteron. Although I'd say Sun isn't big enough, even. IBM?

Literally. I can't wait to get in our first DL585 G2 with 4 of these beasties and 64GB of RAM. The only regret I have is that we probably won't use 'em for DB servers because of Oracle's asinine policy of charging per core; sometimes I wish we had gone SQL 2005 for more stuff, as it is going to scale better with improving hardware. Then again, maybe the proliferation of quad-core (and above) server CPUs will make Oracle rethink their pricing policy again. I hope they go to what the rest of the industry is doing and license per socket.

The only regret I have is that we probably won't use 'em for DB servers because of Oracle's asinine policy of charging per core; sometimes I wish we had gone SQL 2005 for more stuff, as it is going to scale better with improving hardware.

That is the most draconian pricing policy I have ever heard. You actually have to pay Oracle for increasing your processing power?

And an honest question: was there a reason why you didn't look at MySQL or PostgreSQL? I'm not a database expert but my work with them has made me believe they are robust solutions--I certainly prefer them to MSSQL, which is about as pleasant to use as a suppository.

That is the most draconian pricing policy I have ever heard. You actually have to pay Oracle for increasing your processing power?

Um, yeah. Charging per processor (or machine) is par for the course for large "enterprise" software packages. Oracle, Rational, all the hardcore rendering software, etc. they all do it. Welcome to real life.

And an honest question: was there a reason why you didn't look at MySQL or PostgreSQL? I'm not a database expert but my work with them has made me believe they are rob

You are charged per core and can only go below the number of physical cores in the machine if the architecture has hard partitioning of resources, for instance a zone with hard resource limits is acceptable but a container with soft limits is not (well, it is but you need licenses for the max possible resources the container has access to).

I think what the grandparent is distressed about is that they charge per core, rather than per physically discrete processor.

I don't get it. A 'core' is simply a CPU that happens to share a piece of silicon with other CPUs. Charging per a piece of silicon doesn't really reflect the computing resources available for the application.

There's a large gap between charging per processor or per machine. A machine could feasibly be used to independently run the software alongside other instances of it on the network, so charging for another license isn't unreasonable. But charging per core doesn't make any sense to me: unless each core is running a separate and independent instance of Oracle (can it be programmed to do so? Does virtualization play a role here?) then it just seems like you're being penalized for attempting to increase your efficiency.

There's a large gap between charging per processor or per machine. A machine could feasibly be used to independently run the software alongside other instances of it on the network, so charging for another license isn't unreasonable. But charging per core doesn't make any sense to me: unless each core is running a separate and independent instance of Oracle (can it be programmed to do so? Does virtualization play a role here?) then it just seems like you're being penalized for attempting to increase your efficiency.

vs. the old days. Until not too long ago, they charged based on "power units". What's a power unit, you ask? 1 MHz on x86 was 1 PU, 1 MHz on SPARC was 1.5 PU, etc. (So for example your departmental e450 with four 400 MHz CPUs would be 4x400x1.5 = 2400 PUs.) How much did a PU cost in licensing? Well, you see, said the Oracle salesman with a gleam in his eye, that all depends... That they've shifted to a flat rate per core is actually a big win over the old model for their customers.

I've never used it but Oracle is either one hell of a database, or one hell of a brand for people to put up with tactics like that. That shouldn't even be legal.

Oracle is an amazingly powerful brand, and managers think that "scalability" is something you buy rather than an engineering problem for programmers and system architects to solve. That's really the whole story. Given what servers cost and the actual performance differences between different database software given appropriately written client software

Uh, MSSQL 2005 is a serious enterprise DB, this isn't SQL 7 anymore. Also none of our enterprise software supports PostgreSQL so invalidating our 6 or 7 figure support contracts just isn't an option even if it WOULD work.

fanboi, hardly. I'm quite the MS critic if you bother to read my post history, but MSSQL 2005 is something they got right, and I give credit where credit is due. Actually, we are having our issues with Oracle at the moment, and that combined with their outrageous licensing policies has me a bit peeved. They have an admitted bug that affects all 10gR2 versions prior to 10.2.0.4 which sends random dates at times to the Oracle Enterprise Management DB, causing different OEM tools like DB replication to fail ye

Seems to me you're looking at old Oracle pricing w.r.t. cores - back in February this year, Oracle revised it because they were getting hammered by competitors' per-socket licensing.

If you have 1 or 2 CPUs (i.e. you typically run "Standard Edition One" Oracle which limits you to 2 CPU sockets anyway), you are only charged per CPU *socket*, regardless of the number of cores per CPU - here's the details [oracle.com] from the horse's mouth.

So a quad core single CPU server will set you back about 3,000 pounds + VAT in

Is that sometimes, you wind up losing the race. I think AMD tried to take a risky approach of putting all four cores on one die and they shot themselves in the foot. Now Intel has some sort of quad core out there. Even if it wasn't as good, it was still better than two completely separate 2-way chips, and now Intel is circling the wagons to do its own native quad-core implementation. I fear that Barcelona might well wind up as the Great Eastern [wikipedia.org] of chip making - an impressive technological first, but,

Well, developing MCM capabilities isn't exactly risk-free either. Since in any event the MCM version would be a temporary holdover, the question is: does it make sense to spend the R&D time to develop that technology for a product that would, ideally, only exist in the market for a short period? Remember, AMD isn't Intel. They don't have the same resources as Intel. Spending the time on an MCM design would necessarily mean having fewer resources devoted to the native quad-core version, and for whatever

Yeah, it reminds me of what happened to Sega in the home console market. They were "firsts" in a variety of changes, like the 32X (the 32-bit add-on for the Genesis), the Sega CD, and the Dreamcast, which had the first built-in console networking capabilities (at, ahem, modem speed). Soon after, the competitors learned what worked and what didn't, and innovated around them. Sometimes being first with a new technology isn't better.

If this chip has four cores, how much faster does this actually make something happen? Doesn't the software have to be optimised for multiprocessors? If someone could 'splain this, it would be a public service.

It depends on three things:
1: Whether the software CAN use multiple cores.
2: How efficiently it uses the extra cores.
3: Whether the program is currently limited by CPU power or by something else.

For "1:", if the program can't use the extra cores, then you'll only see a speed improvement from the fact that the cores are 15% more efficient. i.e. A 2GHz one of these quads performs the same as a 2.3GHz (+15%) dual core from the previous generation for applications in this category.

For "2:", if the program can use the extra cores, but not as efficiently as the first, then you'll see a speed increase equivalent to this. e.g., if the program does two tasks at once, one that takes 70 seconds and one that takes 30, then on one core it'll take 100 seconds. On two cores it would do the 70 second task on one core and the 30 second task on the other, reducing the total time to 70 seconds, a ~40% speed improvement.

For "3:", if the application is limited by something other than the cpu, e.g. "how quickly it can pull data from the hard-disk", you will likely see no improvement whatsoever.

In conclusion, depending on what applications you use, you will see anywhere from no improvement up to 2.3x the previous speed (x2 for double the cores and +15% from the improved efficiency).

Note: As these cpus also have an extra instruction set extension, applications that make use of this could exceed the speed improvements I noted above.
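The three cases above can be sketched numerically. This combines the ~15% per-core efficiency gain assumed above with how well the workload splits across cores (expressed as the ratio of multicore wall time to single-core wall time, as in the 70s/30s example):

```python
# Sketch of the speedup reasoning above: a ~15% per-core gain times
# how well the workload parallelizes. "time_ratio" is multicore wall
# time divided by single-core wall time at equal per-core performance.

def speedup(time_ratio, core_gain=1.15):
    return core_gain / time_ratio

# Case 1: program can't use the extra cores at all.
print(f"{speedup(1.0):.2f}x")       # 1.15x (only the efficiency gain)

# Case 2: the 70s + 30s example -- 100s serial becomes 70s on 2 cores.
print(f"{speedup(70 / 100):.2f}x")  # 1.64x

# Case 3: perfect scaling onto twice the cores.
print(f"{speedup(0.5):.2f}x")       # 2.30x
```

Case 3 matches the "up to 2.3x" ceiling given above; real workloads land somewhere between cases 1 and 3 (or see no gain at all when I/O-bound).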

It doesn't make "some" (i.e. one) thing faster, it makes four concurrent things faster. These are server processors. Servers do lots of things at the same time. Even on your desktop, a dual core chip is useful; you can ask stupid questions on Slashdot without slowing down encoding your goat porn home movies.

Well, it has to be multithreaded [google.com]. Thing is, a lot of software is multithreaded already; even on a single-core system, it makes sense to distribute functionality among multiple threads so that resources are used efficiently. On server systems (which is where Opterons are mostly used) software pretty much has to be multithreaded — you don't want all your other clients hanging when one client is waiting on a resource. A web server is a classic example.

When you move a multithreaded program to a system with more cores, then any given thread is more likely to get a core to run on when it needs it. Assuming, of course, that you have enough threads that it's an issue.

Shameless plug: I'm the docs lead for this Opteron-based server [sun.com], which can have up to 8 CPUs, for a total of 16 cores. When the Barcelona-based CPU modules are ready, customers will be able to upgrade their systems to a maximum of 32 cores. (Don't ask me when this will happen; Marketing would have me killed.) Obviously any software running on such a system has already dealt with the multicore optimization issue.

I don't understand why everyone is so concerned with multithreaded programs that take advantage of multicore CPUs. I'm interested in it simply so I can multitask better between discrete (probably single-threaded) applications. But I run Gentoo, so I guess I need all the extra simultaneous horsepower I can get.

But AMD customers who relied on the company's previous power metric of TDP (thermal design power) were putting too many resources into cooling and electrical supply, said Bruce Shaw, director of server and workstation marketing for AMD. That's because TDP was developed so server manufacturers would know how much power the chip consumes in worst-case maximum-power situations that very rarely occur, and design their systems accordingly, he said.
So now AMD will advise customers of an Opteron processor's average CPU (central processing unit) power, or ACP. "ACP is meant to be the best real-world end-user estimate of what they are likely to see from the power consumption on the processor," Shaw said.

Oh great, first they used the "+" speed numbers (which I think Cyrix actually started, but they jumped right in there). I can see that the core speed really meant nothing, so I was OK with that, but TDP is a real number. Obviously their marketing folks decided it drew too much power, so they opted to make up a lower power-usage number. Frankly, even a home user wants to know the top power usage of the CPU and video card to properly size their power supply.

Uh, they are doing this to come closer to Intel's TDP numbers, which have been average high-use numbers instead of worst case for at least the last couple of generations of chips. AMD is actually being much more upfront here by offering both worst-case and average-case numbers; I hope Intel follows their lead and offers both.

Ah, my bad, thanks for clearing this up... so that explains Intel's ability to suddenly have lower-power chips... so it is they that are playing with the numbers this time, interesting :)

To some extent. The Pentium 4 is where this started. The Netburst architecture was very power hungry normally, but its maximum power was insane. The graph of power consumption vs. benchmark had a long "tail", which Intel sought to chop off. See, TDP is a real-life number, since it's used by OEMs and others to design thermal solutions for the parts. If the thermal solution is insufficient, then the parts fail. So it's not actually possible to fudge TDP numbers.

What Intel decided to do was implement an on-chip thermal diode and some logic that halved the effective clock cycle* if the temperature went above a certain threshold. What this meant is that based on how they programmed this logic, they could guarantee that the chip's power consumption would never go above a certain level no matter what code you were running. They had effectively lopped off the long tail. The downside is that if your application does draw more power than the limit, then you'll see vastly reduced performance because of the clock throttling. Most of the time this is transient, so it's not that noticeable, but there were benchmarks out there that showed this effect very clearly. For example, a certain game benchmark would get lower scores at 640x480 than at 1600x1200, because at the lower res the game was CPU bound and was crossing the thermal threshold.

So theoretically with this feature Intel could fudge the numbers however they wanted and claim whatever TDP they desired. In practice they don't have that much flexibility because if they set the bar too low then their effective performance would suck, and their TDP numbers are set at average power + several standard deviations.

The main reason why Intel was able to suddenly have low power chips is because they ditched the Netburst architecture and went back to a design that was more balanced between high clock speeds and high IPC.

They kept the clock throttling logic, though, since it does still give them some benefit in reporting lower TDP numbers. AMD doesn't have this feature, so their TDP is truly the maximum power (as determined by running a "power virus") that you would ever see, even though it's unlikely. Since power has become ever more important as a marketing feature even outside of mobile, I'm not surprised that AMD would decide to start touting expected numbers vs maximum.

* Actually a 50% duty cycle of full speed for some number of microseconds followed by completely off.
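The throttling scheme described above can be illustrated with a toy simulation: run at full speed until an on-die sensor crosses a threshold, then drop to a 50% duty cycle until the part cools. Every constant here is invented for illustration, not taken from any Intel datasheet:

```python
# Toy simulation of threshold-triggered clock throttling as described
# above. All constants (threshold, cooling rate, power draws) are
# made up purely to show the mechanism.

def simulate(power_draw, steps=20, threshold=80.0, ambient=40.0):
    temp, work = ambient, 0.0
    for _ in range(steps):
        duty = 0.5 if temp > threshold else 1.0  # throttle = 50% duty cycle
        work += duty                             # units of work this step
        # crude thermal model: heating scales with duty * power draw,
        # cooling with temperature above ambient
        temp += duty * power_draw - 0.3 * (temp - ambient)
    return work, temp

light_work, _ = simulate(power_draw=10.0)  # modest load: never throttles
heavy_work, _ = simulate(power_draw=25.0)  # "power virus": hits the cap
print(light_work, heavy_work)  # 20.0 11.0
```

The heavy workload completes noticeably less work in the same wall time, which is the "vastly reduced performance" effect, while its power draw stays capped, which is how the TDP tail gets lopped off.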

Intel releases OSS drivers for their hardware. AMD never really made much that needed drivers until it bought ATI, and ATI doesn't have the best Linux track record. So Intel really is the better company to buy from for Linux compatibility.

I have owned a PC since early '95, and primarily for financial reasons, and then because the Athlon range were, in my opinion, the best processors available, I had not owned an Intel CPU for 12 years. Now for the first time I have an Intel chip, a Core 2 Duo, in my laptop, as AMD just can't compete on price or performance.
My new desktop will have an Intel Quad core as Barcelona can barely compete with the 65nm chips and the new 45nm chips will just blow it away.
Although Intel has its fingers in many

I completely disagree.
AMD's memory architecture is superior to Intel's and that makes for a much more pleasant computing experience -- in my opinion.
I ran an Intel P100 as my first 32-bit machine and was happy with it until I got an AMD K6-2 300. That AMD chip gave snappier performance than a PIII-500.
I fell for the Intel Core 2 Duo hype and bought a laptop loaded with a 2 GHz CPU. Running Vista, it underperforms the AMD Turion X2 with half the memory.
AMD's memory architecture is superior. There is no doubt

What can I say? I'm disappointed that they stuck with a 3-issue architecture - while it is true that Intel's 4-issue setup is often data-starved, even with exceptional I/O performance AMD can only hope to match the Core platform in most situations. The lack of progress in their cache technology means AMD gets as much burden as benefit out of the L3 cache (over 20 ns access time!). In the I/O arena, AMD potentially has the edge, and for HPC there's no question Barcelona will do well: this architecture is built