New server platform and 12-core Opteron keep AMD in the game

The x86 server wars heated up significantly in March, with the end of the month seeing a major processor launch from each vendor: AMD launched its 12-core Opteron 6100 processor, codenamed Magny-Cours, on the 29th, and Intel then finished off the month with the launch of the 8-core Nehalem EX Xeons.

These were pretty major launches, but I've covered Nehalem EX previously so I want to focus on AMD this time around.

AMD actually launched ten different processors at a range of clockspeeds (1.7 to 2.3GHz) and core counts (8 and 12); all of these parts make up the Opteron 6000 series, which is aimed at two- and four-socket configurations. These two configs represent the bulk of the server market, and AMD is aiming to be the value player here.

In terms of microarchitecture, the new Opterons don't differ significantly from their predecessors, or indeed from the previous few generations. The addition of support for virtualized I/O is the main change at the core level, a change that brings AMD up to par with Intel's Nehalem parts in their level of virtualization support.

At the level of what I'd like to call "macroarchitecture"—meaning the arrangement of cores, I/O, cache, and other resources on the processor die—there are some significant improvements.

The shared cache for the 8-core parts is 17.1MB, while the 12-core weighs in at 19.6MB.

On the memory front, the new Opterons boast support for four channels of DDR3—that's a lot of aggregate memory bandwidth across two or four sockets. For I/O, each package has four HT 3.0 (x16) links; this amount of I/O bandwidth is needed because there are so many cores per socket. In fact, moving out to the system level, you can see where AMD put most of its engineering effort.
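To put a rough number on "a lot of aggregate memory bandwidth," here is a minimal sketch of the peak theoretical figure. The four channels per socket come from the text above; the DDR3-1333 transfer rate and the 64-bit channel width are my assumptions for illustration, and real sustained bandwidth will be lower.

```python
# Peak theoretical DDR3 bandwidth per system (a back-of-envelope sketch).
# ASSUMPTION: DDR3-1333 (1333 MT/s) on a 64-bit (8-byte) channel. The text
# above only gives the channel count, not the memory speed.

CHANNELS_PER_SOCKET = 4      # from the article
BYTES_PER_TRANSFER = 8       # assumed 64-bit DDR3 channel
TRANSFERS_PER_SEC = 1333e6   # assumed DDR3-1333


def peak_bandwidth_gbs(sockets: int) -> float:
    """Return peak theoretical memory bandwidth in GB/s for `sockets` sockets."""
    per_channel = BYTES_PER_TRANSFER * TRANSFERS_PER_SEC / 1e9
    return sockets * CHANNELS_PER_SOCKET * per_channel


for sockets in (1, 2, 4):
    print(f"{sockets} socket(s): {peak_bandwidth_gbs(sockets):.1f} GB/s peak")
```

Swap in a different transfer rate to see how the aggregate scales; the point is only that the figure grows linearly with both channel count and socket count.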

Direct Connect 2.0

One of the key ways that AMD is amping up the bang-per-buck is by taking a route that it had previously made fun of Intel for: sandwiching two n-core dies into a single package (a multichip module, or MCM) and calling the resulting part a 2n-core processor. The 12-core is two six-core dies, and the 8-core part is two four-core dies (actually, it's two six-core dies with some cores disabled, an approach that helps get yields up and costs down).

Back when Intel started doing this MCM-based multicore approach in the waning days of the Pentium 4 era, its impact on system architecture was a lot more straightforward. But AMD's NUMA system architecture, where the on-die memory controller means that the memory pool is distributed among all the sockets in a multisocket system, complicates the MCM approach. This is because the number of NUMA nodes no longer equals the number of sockets. AMD's answer to this problem is called Direct Connect 2.0.
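The socket-versus-node mismatch is easy to see with a little arithmetic. The sketch below is illustrative only: the `Package` class and its field names are hypothetical, and the per-die core counts are the ones given above (six-core dies, with two cores per die disabled on the 8-core part).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """A hypothetical description of one processor package (socket)."""
    name: str
    dies: int            # each die has its own memory controllers, so it is a NUMA node
    active_cores_per_die: int

    @property
    def cores(self) -> int:
        return self.dies * self.active_cores_per_die


def numa_nodes(pkg: Package, sockets: int) -> int:
    """With an MCM, every die is a NUMA node, so nodes = sockets * dies."""
    return sockets * pkg.dies


single_die = Package("six-core, single die", dies=1, active_cores_per_die=6)
mc12 = Package("12-core Magny-Cours", dies=2, active_cores_per_die=6)
mc8 = Package("8-core Magny-Cours", dies=2, active_cores_per_die=4)

for pkg in (single_die, mc12, mc8):
    for sockets in (2, 4):
        print(f"{pkg.name}: {sockets} sockets -> "
              f"{numa_nodes(pkg, sockets)} NUMA nodes, {sockets * pkg.cores} cores")
```

A four-socket Magny-Cours box therefore looks like an eight-node NUMA system to the operating system, which is exactly the complication the interconnect has to handle.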

Take a look at the diagram below, which shows the I/O and memory buses in the Magny-Cours part. You can see that each individual Magny-Cours die (or "node," from the perspective of NUMA topology) has two memory controllers and four HT 3.0 controllers.

A Magny-Cours socket. Source: AMD

The two memory controllers on each die connect directly to the pool of DDR3 that hangs off of each socket, which gives each socket its four total DDR3 channels.

The way the HT link bandwidth is divided up in a two-socket config is a little non-obvious, but you can see what's going on from the diagram. The controllers split the link bandwidth for each die/node into three x16 links and two x8 links. One of the x8 and one of the x16 are then combined to make what's essentially an x24 link, which is used for directly connecting the two dies that are in the same package.

Another x16 link goes out to connect to the first die in the other socket, and the remaining x8 link connects diagonally to the second die in the other socket. That leaves one x16 link per die: on one of the dies it is not connected to anything, and on the other die it's used for I/O. The diagram at right attempts to illustrate how this works; it's not great, but if you stare at it for a minute it makes sense.
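If the diagram doesn't do it for you, the lane accounting can be checked in a few lines. This is a sketch of the two-socket layout as described in the text, not anything from AMD; the `(socket, die)` tuples and the function name are my own labels.

```python
from itertools import combinations

LANES_PER_DIE = 4 * 16  # four HT 3.0 x16 controllers per die

# How each die's lanes are carved up in a two-socket system, per the text.
PER_DIE_LINKS = {
    "other die, same package (x16 + x8)": 16 + 8,
    "first die, other socket": 16,
    "second die, other socket (diagonal)": 8,
    "I/O on one die, unused on the other": 16,
}

# The split must use exactly the lanes the four controllers provide.
assert sum(PER_DIE_LINKS.values()) == LANES_PER_DIE


def build_links() -> dict:
    """Return {(die_a, die_b): link_width} for a two-socket, two-die-per-socket box."""
    dies = [(socket, die) for socket in range(2) for die in range(2)]
    links = {}
    for a, b in combinations(dies, 2):
        if a[0] == b[0]:
            width = 24   # same package: x16 and x8 ganged together
        elif a[1] == b[1]:
            width = 16   # straight across to the matching die
        else:
            width = 8    # diagonal
        links[(a, b)] = width
    return links


links = build_links()
cross_socket = sum(w for (a, b), w in links.items() if a[0] != b[0])
print(f"{len(links)} die-to-die links, {cross_socket} lanes between the two sockets")
```

Every pair of dies ends up with a direct link, and the two sockets are joined by two x16 and two x8 links in total.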

What's new about Direct Connect 2.0 (as opposed to Istanbul's 1.0 version) are the diagonal links, which let each node connect to two other nodes. Direct Connect 1.0 was missing the diagonal links, so if memory was in the wrong pool a node might have to make two hops to get it, instead of just one. Of course, the diagonal links are half the bandwidth of the regular links, but you can't have everything.
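The one-hop-versus-two-hops point can be illustrated with a toy four-node model. This is only a sketch: the node labels are hypothetical, and I'm assuming a simple square of links for the "no diagonals" case rather than reproducing any specific shipping topology.

```python
from collections import deque
from itertools import combinations

NODES = ["A0", "A1", "B0", "B1"]  # hypothetical labels: socket letter + die index

# Links around the edge of the square (no diagonals).
SQUARE = [("A0", "A1"), ("B0", "B1"), ("A0", "B0"), ("A1", "B1")]
# The extra diagonal links.
DIAGONALS = [("A0", "B1"), ("A1", "B0")]


def hops(edges, src, dst):
    """Breadth-first search: fewest links to traverse from src to dst."""
    adjacency = {node: set() for node in NODES}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adjacency[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    raise ValueError(f"{dst} is unreachable from {src}")


def worst_case_hops(edges):
    """Largest hop count over all pairs of nodes."""
    return max(hops(edges, a, b) for a, b in combinations(NODES, 2))


print("without diagonals:", worst_case_hops(SQUARE), "hops worst case")
print("with diagonals:   ", worst_case_hops(SQUARE + DIAGONALS), "hop worst case")
```

In this model the diagonals cut the worst case from two hops to one, at the cost of those links being narrower than the straight ones.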

With so many cores per socket, congestion is still going to be a problem, despite the four HT 3.0 links per node. To mitigate it, AMD uses a technology called HT Assist, which reduces inter-core traffic by cutting back on cache snoops among the sockets.
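To see why filtering snoops matters, here is a deliberately simplified toy model that counts probe messages with and without a directory tracking which node last touched a cache line. It is an assumption-laden illustration of the general idea, not a description of how AMD's hardware actually implements HT Assist; the function names and the access trace are made up, and every access is treated as a cache miss.

```python
def broadcast_probes(num_nodes, accesses):
    """Without a filter, every miss probes all of the other nodes."""
    return len(accesses) * (num_nodes - 1)


def filtered_probes(accesses):
    """With a directory, a miss probes only the node recorded as holding the line.

    `accesses` is a list of (node, cache_line_address) pairs. A line nobody
    has touched yet is served from memory with no probes at all.
    """
    directory = {}  # cache line address -> node that last touched it
    probes = 0
    for node, line in accesses:
        owner = directory.get(line)
        if owner is not None and owner != node:
            probes += 1  # one targeted probe instead of a broadcast
        directory[line] = node
    return probes


# A made-up trace on a four-node system.
trace = [(0, 0x100), (1, 0x100), (2, 0x200), (3, 0x300), (0, 0x200)]

print("broadcast probes:", broadcast_probes(4, trace))
print("filtered probes: ", filtered_probes(trace))
```

The gap between the two counts widens as the node count grows, which is why this kind of filtering becomes more valuable once each socket holds two nodes.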

Despite the drawbacks of the MCM approach, Intel proved with its own dual-die products that the strategy works, especially if you're targeting cost and not just raw performance. MCMs are also great for when you want to pack a lot of compute power into a smaller, cheaper system, and you're willing to compromise a bit on the memory performance for certain kinds of latency-sensitive, write-intensive workloads. Specifically, Magny-Cours should make for a great HPC platform, because it offers plenty of hardware per socket, per dollar, and per watt, and that's just what you need to put together a cluster of machines that can grind through heavily data-parallel workloads.

Databases are probably a different story, especially when you compare Magny-Cours to Nehalem EX's buffer-enabled memory subsystem, which lets you cheaply cram loads of memory onto each socket. It's also the case that these types of workloads tend to have more coherency traffic, because different nodes may be accessing the same region of memory. In this case, the balance may tip in Intel's favor.

In all, though, the Magny-Cours launch is a huge one for AMD, and its platform-level innovations like Direct Connect 2.0, support for virtualized I/O, power efficiency, and relatively low cost should keep AMD in the server game. And staying in the server game has been AMD's number one survival priority in the past two years. I pointed out at the end of 2009 just how much other business AMD has thrown overboard as the company shrank back into its core x86 server and GPU businesses, and this new server platform reflects that single-minded focus. AMD's processors may not have the raw per-core performance that Intel's Nehalem boasts, but the company is doing a lot at the macroarchitecture and system architecture levels to narrow that gap.

Nice overview. Probably worth emphasising the cost aspect here with relation to actual pricing. Unless Intel cuts EX costs then it really should be beating MC. Approx $2000 difference between the respective top SKUs, per processor, makes Magny-Cours very attractive indeed. Will be interesting to see a like priced EX box vs a like priced MC, I'd assume the EX would have a low clockspeed and probably artificial segmentation such as no HT meaning it may not even beat MC.

good article, from looking at this and other articles it looks like amd is focusing on making more affordable cpus while intel is making high performance. i dont think they will be able to grow without making high performance chips.

amd is pretty much dead, everyone says "thank god amd is around to give intel competition". no one says "thank god amd is around to make great cpus"

good article, from looking at this and other articles it looks like amd is focusing on making more affordable cpus while intel is making high performance. i dont think they will be able to grow without making high performance chips. amd is pretty much dead, everyone says "thank god amd is around to give intel competition". no one says "thank god amd is around to make great cpus"

Please stop this sort of commentary. You can find "AMD IS DEAD FOR SURE LOL" commentary going clear back to the late 90s. It's just not happening, especially with releases like this.

Contribute something intelligent or leave, please.

Anywho...

This is a great announcement. I'd like to see some figures for total real-world bandwidth on various workloads, though, if you guys get a chance (and sample parts), pretty please. We're very pleased with our 6-core Istanbuls. 4 sockets of that makes ESX very happy indeed.

good article, from looking at this and other articles it looks like amd is focusing on making more affordable cpus while intel is making high performance. i dont think they will be able to grow without making high performance chips.

Depends upon how you measure performance. AMD does very well in some regards.

Quote:

amd is pretty much dead, everyone says "thank god amd is around to give intel competition". no one says "thank god amd is around to make great cpus"

Maybe great isn't the word but they have competitive CPU's. They are certainly worth looking into for the right apps. Just like the article suggested here these processors will have their performance niche.

Besides in a world where (corporate) computers are bought based on the fact that Intel is inside and no thought with respect to performance is given how can anyone have a valid opinion of AMD. Lets face it the majority of corporate computers are tied to Integrated Intel GPU's and then people wonder why Windows sucks so much on them.

The only thing killing AMD right now is the mind set that they are dead. The end result on most corporate desktops would be exactly the same if AMD hardware was used. In many cases it would be cheaper.

From the diagram I am getting that in the 2 socket system, the sockets are communicating with each other via 48 HT lanes: two x16 links and two x8 links. This is a big jump from the single x16 link that the previous socket used for dual socket configuration and puts the diagonal x8 links in a more favorable light.

So, new Opteron = old Core2? And new Nehalem = old Opteron (maybe?)? Got it.

We know who won that round, anyway :-)

The last time I bought a motherboard/CPU to build a new machine was about 1.5-2 years ago. At the time, there were great E2200 deals from Frys. Today, I see an X3 CPU + mboard for $52.xx from Frys. Almost makes me wish I needed to upgrade something.. Or, the X4 + mboard for $90-100. Of course, I'm also kicking myself I didn't buy more DDR2 when 2x2 packs were going for $20-25 after rebate!

I dont think they [AMD] will be able to grow without making high performance chips.

amd is pretty much dead, everyone says "thank god amd is around to give intel competition". no one says "thank god amd is around to make great cpus"

I'd like to know who you mean by "everyone" and "no one"... I haven't bought Intel at home since 1999 and have yet to see a reason to do so. If it hadn't been for AMD's Opteron and the A64, there'd never have been a Core 2--or maybe you missed the part about Intel licensing x86-64 from AMD in order that Intel could dump Netburst and get back into a competitive position with an x86 Core 2...?

Lots of people have said and said it often: "Thank Goodness for the AMD Opteron and the A64 and x86-64...!" You might want to reflect on the fact that "high performance" doesn't equate to a few frames-per-second advantage in some 3D game benchmarks published by AnandTech and pals... And of course, just because the Phenom II may look a bit "slower" in terms of FPS when running game benchmarks, that certainly doesn't mean that Phenom II is not a "high-performance" chip in its own right. Basically, many of the day-to-day tasks that people do won't seem "faster" on Intel than they do on AMD--and in fact you would often never know the Intel cpus were "faster" *unless* you used a benchmark program to compare the metrics because very often the differences are so slight that they're not perceivable in the absence of running a benchmark as a metric. But the one thing people usually have no trouble perceiving is "bang-for-buck" and the general consensus is that on that score AMD is still doing quite well.

I remember back in the A64's Heyday when the best Intel could bring to the "high-performance" party was the P4 in the form of the short-lived Prescott architecture. Most gaming websites at that time used nothing but AMD cpus for their reviews because the A64 was deemed "fastest" by a long shot. These things vacillate back and forth as a result of competition. Intel surely didn't go belly up just because the A64 was deemed faster for a couple of years--likewise the reverse perception won't send AMD belly-up, either.

Posted in HPC, 2nd April 2010 17:24 GMT

The Los Alamos National Laboratory has inked a $45m contract with supercomputer maker Cray to supply the nuke researchers with one of the first Baker series of Opteron-based supers that will ship later this year - and to provide some company for the petaflopping RoadRunner Opteron-Cell massively parallel super, custom-built by rival IBM for Los Alamos.

The Baker machines will couple the latest Opteron processors from Advanced Micro Devices with Cray's new 3D torus interconnect, code-named Gemini. The name suggests that the kicker to the current SeaStar+ interconnect will double up the performance of the crossbar and therefore also double the number of nodes that can be clustered together - although Cray has thus far been pretty vague about the capabilities of the Gemini interconnect.

The HPC supplier has also said very little about what will make Baker machines distinct from its impending XT6 supers, which use the twelve-core Magny-Cours Opteron 6100 processors that were announced earlier this week. With twelve-core Opterons and twice the interconnect bandwidth, the Baker machines should be able to deliver around 3.5 petaflops of sustained performance, according to my back-of-envelope calculations.

The machine that is going into Los Alamos is to be nicknamed Cielo, presumably because the sky is the limit but also because the US National Nuclear Security Administration is frustrated by current performance ceilings as it manages the country's arsenal of 6,000 nuclear weapons.

The NNSA is a semi-autonomous agency within the Department of Energy, which makes and manages nukes for Uncle Sam. The major super labs in the States - Los Alamos, Sandia National Laboratories, and Lawrence Livermore National Laboratory - are all affiliated with the NNSA effort, which also has the goal of designing, completely within a supercomputer, new nuclear weapons and simulating their explosions. (You can't do this with real nukes because of the Nuclear Test Ban Treaty, of course.)

The article states AMD is now doing MCMs when previously they mocked Intel for doing so. That isn't entirely true, because AMD was actually against unconnected MCMs (On Core 2, inter-core traffic was routed through the slow FSB and the memory controller on the motherboard). With this MCM, the traffic stays on package via fast HT links. This means that scaling from 6 to 12 cores is very good, whereas 2 to 4 on Core 2 wasn't so good (though they had a large IPC lead over Barcelona anyway so it made little difference).

Magny-Cours seems to be offering similar performance to equivalently-priced Gulftown Xeons, at similar power consumption. Look at Anandtech's review, and when comparing the Xeon scores remember the price competitor of the benched Opteron is actually the 2.66GHz Xeon, not the 2.93GHz Xeon they tested it against. That puts them in a better position vs. Intel on the server than they have been since 2006 before Conroe launched. They don't have the performance lead but they are a viable choice on cost, performance and power. Interestingly, MC destroys Nehalem EX (that's the 8-core Nehalem, not the 6-core Gulftown) at the price points it competes at. Intel is selling a 4-core, 1.86GHz (no Turbo) Nehalem EX at the same price as a 12-core, 1.9GHz MC Opteron. Obviously the high-end EXs have no competition but it's like they don't think MC exists.

Bulldozer is the one to watch for though. Since it's their first truly new architecture since K8, it has the potential to be great (or a huge flop). Whereas its competitor Sandy Bridge is clearly based on Nehalem so should have a predictable 10-20% boost over that. AMD is saying the performance jump from 12-core MC to 16 core Interlagos should be greater than 6 core Istanbul to 12 core MC, so they are optimistic about its prospects.

Slight error in the article, clockspeeds go up to 2.4GHz on the 8-core parts.