AMD sets out its plans for 2013, hints at a possible ARM future

AMD has presented its plans for this year and next, setting out a new …

AMD today laid out its plans for the next couple of years at its Financial Analyst Day. The plans are a mix of familiar and logical extensions of the company's current products, but contain some more surprising elements: specifically, AMD opened the door to future processors that include ARM CPUs.

The underlying themes to AMD's plans are faster iteration—a GPU-like 18-24 months between CPU designs, compared to the current 3 or more years—achieved by moving away from custom designs and depending more heavily on synthesized chip layouts, and lower power usage. This in turn will give AMD more flexibility to integrate CPUs and GPUs—and potentially other co-processors too—into what the company calls APUs (accelerated processing units).

Client processors

On the client side, this year AMD will release three APU lines: Trinity, aimed at the performance mobile and mainstream desktop segment, the low power desktop and mobile Brazos 2.0, and the ultra low power tablet-oriented Hondo. Trinity's CPU portion will be based on Piledriver, the successor to AMD's Bulldozer architecture. Each Trinity will pair one or two Piledriver modules (offering two or four threads) with an AMD 7000-series second generation Direct3D 11 GPU (though this may be marketing-ese, and it could be a 6000-series part branded as 7000-series).

The 9-18 W Brazos 2.0 will use AMD's low-power Bobcat core, pairing two of those with a 6000-series Direct3D GPU (this too might be marketing magic; current Brazos uses a 6000-branded 5000-series GPU, and the same may be true of Brazos 2.0 and Hondo). Hondo will have one or two Bobcat cores and a similar GPU, with a power draw of just 4.5 W.

AMD's desktop processors will also include a line of regular CPUs: the second generation FX processors, codenamed Vishera, with two to four Piledriver modules (4-8 threads).

In 2013, the company will make some bigger changes. On the mobile side, Trinity will be replaced by Kaveri, Brazos 2.0 by Kabini, and Hondo by Tamesh. Kaveri will continue to be Bulldozer-derived, using the third-generation Steamroller cores, along with AMD's "Graphics Core Next" (GCN) GPU. Kabini and Tamesh will similarly continue to be Bobcat-derived, using the revised "Jaguar" design. Kabini and Tamesh will also use the GCN GPU. Both will include integrated I/O functionality, including SATA and USB, making them single-chip solutions.

Kaveri and Kabini will also be used on the desktop. However, the company announced no plans to replace the Piledriver-powered Vishera with an equivalent Steamroller model. Vishera will continue to be sold into 2013.

Kaveri and Kabini will also be the first processors to support what AMD is calling Heterogeneous Systems Architecture (HSA). AMD's goal with HSA is to make mixed workloads that use both the CPU and the GPU easier. The 2013 HSA processors will give the GPU and CPU a unified, coherent address space, with the GPU able to use the same demand-paged virtual memory that the CPU uses. This means that data will no longer have to be moved from CPU to GPU to allow the GPU to work on it, and that both processors will be able to operate on the same data simultaneously, making mixed CPU/GPU computation seamless.
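The difference between the two models can be sketched in a few lines of Python. This is only an analogy: ordinary byte buffers stand in for CPU and GPU memory, and the function names are made up for illustration rather than taken from any AMD or HSA API. The first function stages data into a separate buffer and copies the result back; the second works through a view of the same memory, so nothing is moved.

```python
# Illustrative analogy only: plain Python buffers stand in for CPU and GPU memory.
# None of these names correspond to a real AMD or HSA API.

def run_with_copies(host_data):
    """Copy-based model: data is staged into a separate 'device' buffer."""
    device_buffer = bytearray(host_data)       # copy 1: host -> device
    for i in range(len(device_buffer)):        # the 'GPU' works on its private copy
        device_buffer[i] = (device_buffer[i] * 2) % 256
    host_data[:] = device_buffer               # copy 2: device -> host
    return 2 * len(host_data)                  # bytes moved between the two memories


def run_unified(shared_data):
    """Unified model: the 'GPU' works through a view of the same memory."""
    view = memoryview(shared_data)             # no copy; same underlying bytes
    for i in range(len(view)):
        view[i] = (view[i] * 2) % 256
    return 0                                   # nothing had to be moved


a = bytearray([1, 2, 3, 4])
b = bytearray([1, 2, 3, 4])
copied_bytes = run_with_copies(a)
unified_bytes = run_unified(b)
```

Both calls leave the buffer holding the same doubled values; the only difference is how many bytes had to cross between the two "memories" to get there.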

Server processors

The company's server roadmap is simpler. This year, 2- and 4-socket machines will continue to use Interlagos: 2-8 Bulldozer modules (4-16 threads), four HyperTransport links, four channels of DDR3 memory. 1- and 2-socket machines will use Valencia: 3 or 4 Bulldozer modules (6 or 8 threads), two HyperTransport links, two channels of DDR3 memory. High-density 1-socket machines will use Zurich: 2 or 4 Bulldozer modules (4 or 8 threads), one HyperTransport link, two channels of DDR3 memory.

Interlagos, Valencia, and Zurich will be replaced with, respectively, Abu Dhabi, Seoul, and Delhi. Module/thread counts will stay the same, as will the numbers of memory channels and HyperTransport links. The new processors will, however, use the Piledriver architecture. AMD says that Piledriver will improve performance in the same power envelope by about 10-15 percent.
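The thread counts quoted for these parts follow directly from the module counts, since each Bulldozer-family module runs two threads. A small Python sketch of that arithmetic, using only the module ranges listed above (the dictionary and function names are just for illustration):

```python
THREADS_PER_MODULE = 2  # each Bulldozer-family module runs two threads

# (minimum modules, maximum modules) per server part, as listed in the roadmap
SERVER_PARTS = {
    "Interlagos": (2, 8),
    "Valencia": (3, 4),
    "Zurich": (2, 4),
}


def thread_range(modules):
    """Convert a (min, max) module count into a (min, max) thread count."""
    low, high = modules
    return (low * THREADS_PER_MODULE, high * THREADS_PER_MODULE)


thread_ranges = {name: thread_range(m) for name, m in SERVER_PARTS.items()}
```

Because the successors keep the same module counts, the same figures apply to Abu Dhabi, Seoul, and Delhi respectively.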

Older server roadmaps from AMD included plans for a new socket for even larger server CPUs, a 10-thread part called Sepang and a 20-thread part Terramar. These have been cancelled in favor of the current plan. The decision to keep the module/thread counts the same and instead improve per-thread performance is a welcome one: in many workloads, Bulldozer's increased thread count was not enough to offset its weaker per-thread performance relative to AMD's previous generation processors. Piledriver should go some way toward redressing those weaknesses.

A mixed ISA future?

With APUs, more easily iterated synthesized designs, and HSA, AMD is taking some big steps toward producing flexible, heterogeneous processors: processors that pack together many different cores, each with their own strengths and weaknesses. AMD wants to extend the APU concept with other processor units, both AMD-originated and third party, to offer customers tailored solutions suitable for different applications. For example, motion video codec accelerators (such as those found in Intel's Medfield system-on-chip) would be attractive in tablets and, if AMD can get power usage down enough, even smartphones (though this is not a market the company is targeting at present).

One particularly intriguing third-party unit would be an ARM processor. AMD mentioned ARM several times during its presentations, and a number of its slides stated that the company wanted to produce SoCs that are "ambidextrous... across ISAs," stating also that the company was "flexible around ISA."

AMD spoke of these mixed-ISA processors in the context of servers and datacenters, so the immediate utility of an ARM processor is not clear. However, ARM Ltd wants to move ARM into the server space, having recently extended the ISA to support 64-bit systems. If ARM were to become a significant force in this market, the ability to natively run both ARM and x86 workloads on a single chip might become attractive. AMD could potentially even scrap the x86 core entirely, pairing ARM CPUs with its own high-performance GPUs.

Guess it would probably be more like a single system that could run binaries for multiple architectures (eg x86 or ARM). But it seems like clever operating system software would be needed to take advantage of such a feature.

That would be revolutionary if it could work seamlessly without any modification of the OS. Thanks to ever-shrinking transistors, instruction sets take up increasingly small portions of overall die space.

So this unified, demand paged address space. Is that only for openCL, or are we potentially looking at page faulting textures?

I don't know what Microsoft's current plans actually are, but the original plans for WDDM (Windows Display Driver Model) 2.0 (Windows Vista: 1.0, Windows 7: 1.1, Windows 8: 1.2, so 2.0 is still "distant future") were to have full demand-paged virtual memory for GPUs and, I believe, faulting in textures. Though I'm sure there would also be ways of forcing them to be resident. Page-faulting seems to me like it would be nice for Megatexture-like systems.

AMD had a MIPS division until 2006 which made a line of 64-bit chips called Alchemy. In 2006 AMD sold the division (I can't find whether or not the license went with the deal, but normally licenses of that kind aren't transferable and the buyer already had a MIPS license, so I assume that AMD still has a MIPS license unless the contract expired) to Raza Microelectronics, which was in turn bought out by NetLogic Microsystems.

In 2008, AMD sold its mobile Xilleon GPU division to Broadcom (though parts of the IP still show up in the form of AMD's unified video decoder). The same year, the Imageon mobile GPU division was sold to Qualcomm, which now markets it as the Adreno series.

With AMD having sold off its mobile IP, it's small wonder that they say they don't want back in the game; all their former advantages are now gone.

My gut feeling is that AMD has too many models in its product line. They need to focus on one chip for each major market segment and make that chip kick ass. Even now, who has need for 4 different F1 procs when most just get the unlocked one for a little more. While the F1s are a great value, the same cannot be said for the vast array of bulldozer procs that performed more poorly than phenoms.

I also want to add that the diagram above is the worst type of desperate marketing crap you can find. What do any of the statements actually mean? "Innovation driven engineering"? It's good to know that "time to market" is part of their "plan", that they plan to "get shit done fast". I hope so.

I don't think the term "ambidextrous" is catchy enough. I'm thinking "transsexual" or "bISA-curious"

I'm not convinced AMD either a) had any idea how prevalent ARM would become (and how quickly it would do it; remember, the iPhone was only released in 2007), or b) had any choice. They were bleeding cash and had to sell their assets out of necessity, not out of choice.

I would also not assume that just because they mention ARM that it means they are targeting the conventional mobile sector.

Seems like AMD have acknowledged they can't keep up with Intel in the traditional "make the fastest x86 CPU you can" market, and are going to do something a bit different. Remains to be seen if it is successful of course, but I think in the long term, this may be what keeps AMD afloat. We'll see...

It's the "there would also be ways of forcing them to be resident" part that has me somewhat scared and excited. To implement an id Tech 5 "Megatexture-like system" that supported page-faulting, and ideally persistence of memory state, we're really talking about implementing an application- or rendering-specific "graphics-filesystem." This has me rather excited about the possibilities of streaming very large data sets for rendering applications. The scary part is that I can just see several years of multiple, incompatible "graphics-filesystems" vying to become the de facto standard; one can only imagine the headaches...

Does anybody know if there is anyone working on or discussing a standard for "graphics-filesystems?" Khronos Group?

I like AMD's current direction. It is the best the company could have chosen, in my opinion. They can't beat Intel very well in raw processing power, so they've decided they're going to beat them where it counts. Most people have far too much processor power on their computers; they need more graphics power. Intel's chips don't matter to most people, and even a lot of mid to high end gamers don't realize they don't need that kind of performance. This is the way to go. And if they can offload some processor workloads to the GPU, they've most likely defeated Intel in performance in both areas. But we'll see if they succeed here.

Peter, I'm a little confused by your use of "threads" when referring to the CPU parts in this article - when you refer to a CPU module as "4-8 threads" do you mean 4-8 actual cores? I ask because right now I'm not sure if you'd refer to a Sandy Bridge CPU as "4 threads" or "8 threads", and AFAICT the performance impact of Hyperthreading is not equivalent to actually doubling the number of cores present.

As far as the direction goes... it seems like everybody wants to hate on AMD now, but I'm not so sure that ultimately the direction they're taking is the wrong one. It may, however, be a classic case of "before their time". The market is pretty clearly heading towards parallelization on higher and higher scales, and power efficiency over per-core performance falls along with that. So the real question, in addition to the obvious "can they do a good job with that?" is "is their timing to market going to be right?"

Each module has a single front-end, two narrow integer pipelines, and a shared floating point and SIMD pipeline. Each module can simultaneously run two threads. I'd call a hyperthreaded Sandy Bridge, such as a 2600, 4 core, 8 thread. I'd call a Bulldozer FX 4 module, 8 thread.

I wouldn't use "cores" to describe Bulldozer at all, just as I wouldn't use "cores" to label hyperthreaded threads.

AMD's direction is not so good for today's gamers. It's pretty good though for servers -- especially if you need many servers focusing on many high utilization threads like virtualization.

For example, you can get a 1U dual 4284 (16-cores) w/ 64GB of RAM for $2000. Intel's alternative at that price range is 8-cores w/ hyperthreading. A Sandy Bridge Xeon might be faster per core (although the gap is much closer running an open source stack) but it won't overcome an extra 8 cores if you actually utilize all cores.

As for the arguments whether AMD's modules have 2 cores or just 1+1, it definitely acts like 2 full cores on our integer-only server tests using CentOS + KVM + PostgreSQL. We have a test FX-8120 and we can fully allocate VMs to all 8-cores with minimal slowdown (compared to 4 cores at non-turbo speed). By comparison, we tried to over-allocate VMs on both Nehalem i7 and Sandy Bridge i5 -- Intel hyperthreading does not support even a single extra core w/o losing significant speed. Obviously, FPU/SIMD would be maxed out at 4 cores but we don't have many FPU/SIMD workloads on our servers. (The trend anyways is to use GPU servers for these calculations.)

AMD's biggest problem is they had to sell off their fab for cash. Now they're beholden to an outside vendor to make their chips and it's probable they missed their frequency targets for Bulldozer because of this.

In many ways this makes sense. x86 is not a home grown architecture and neither is ARM. Anything to help AMD cover a changing field is important. As power limitations, based either on limitations of batteries or greening of data centres, become more important factors, then any chip that can squeeze more performance while consuming less electricity will become more important.

++

Thank you. I was cringing reading that slide. It smacks of motivational "follow the cheese" business leadership bullshit execs spout off. The type of thing the director or VP would stand up and talk to before letting the lead tech person get up and really outline what's going to happen in product development...which makes the other execs snore, b/c they don't want the details.

Can Ars do an article on, or at least mention, which sockets these chips are designed for? I find that info hard to come by. I love AMD chips because of their history of upgradability, but without knowing which chips will be compatible with which motherboards, I can't buy for the future.

I heard that the current FM1 sockets would be replaced with FM2 in 2012 for the Trinity chips. Is this just for the low power mobile class processors or will they work with future FX chips as well?

Thanks for this. I've seen similar results on AMD-powered servers running KVM with various guests, but it's nice to see some confirmation elsewhere - I haven't really had the time to test hyperthreaded Intel CPUs as heavily.

Why all the goofy damn names for architecture and chip revisions, and even OS versions lately? What the hell was wrong with 80386, 80486DX, 68000, 68040 etc. It was an easy to understand measure of where and why they fit in the hardware scheme with a logical set of modifying numbers and letters where it's not so much goddamn alphabet soup to keep track of, and you don't get looked at like some kind of a freak when you are overheard talking about your bobcat not liking ice-cream sandwiches or something.

I honestly hate this paradigm and I do not understand why it has taken over the last 10 years or so.

AMD's ambiguous plans indicate that they (a) don't have a clue what they're supposed to be doing or (b) have something so massively earth shattering that they are keeping it secret to spring on Intel and mortally wound that beast.

Lol ... I had exactly the same thought. I can see an email circulating among AMD executives talking about how their "ambidextrous ecosystem" is part of their "blue ocean strategy" to find their "new cheese".

AMD could potentially even scrap the x86 core entirely, pairing ARM CPUs with its own high-performance GPUs.

That's a stretch. AMD is a bit like ARM Holdings plc (not ARM Ltd, as in the article!), since they are both fabless. X86 is AMD's core competency; they are not going to throw it away easily, especially with the arrival of Win8 x86 tablets and WP8 x86 phones. AMD's only strength compared to Intel is a decent GPU (ATI's legacy) - that's why they want to leverage it as much as possible.