New chip families will feature the Puma/Steamroller cores and Graphics Core Next GPU compute units

Advanced Micro Devices, Inc. (AMD) announced a pair of sporty upcoming x86-based, 28 nanometer (nm) processors that will front its mobile processing efforts. It also announced a new HTPC-geared CPU+GPU combo chip that's sure to please budget shoppers. Officially these chips won't launch until the 2014 Consumer Electronics Show (CES). But AMD has revealed enough to paint a very interesting picture of the 2014 chip market as it tries to capitalize on its sales strengths and improve upon its struggling lines.

Since day one AMD's APU line competed for very specific market niches -- budget laptops with no discrete mobile GPU card -- and leaned heavily on price as a selling point. But AMD's graphical leadership made this formula not only work, but flourish as AMD's chips rivalled even low-end discrete mobile GPUs at a reduced bill of materials net cost for the GPU+CPU.

Last year's follow-up to Llano, the 32 nm Trinity line of "accelerated processing units" (APUs), added an improved on-die GPU (sometimes referred to as a "dGPU"), which fell somewhere between a Radeon 6000 HD and 7000 HD in architecture. Trinity also ditched the aging K10 architecture for a leaner, enhanced Bulldozer core, code-named Piledriver. Power fell to between 65 and 100 watts.

Graphics-wise, Richland featured a new Neptune on-die graphics processing unit (dGPU), but despite being branded as the Radeon 8000 Series, this GPU was based on the aging VLIW-4 core design introduced with the Radeon HD 6000 series and replaced in the Radeon 7000 series by Graphics Core Next (GCN). This isn't to say that Richland's GPU didn't evolve on a separate parallel path (it did), but the lack of GCN had some decided downsides. Most notably, GCN is a SIMD architecture, fundamentally different from the MIMD-style approach used by VLIW-4.

Ultimately, SIMD would seem to hold some advantages in terms of GPU computing, given that both AMD and NVIDIA Corp. (NVDA) have adopted this approach in their high-end GPUs. That said, AnandTech's tests showed Richland performing quite well in compute, indicating that while its branch of the Radeon 8000 tree was decidedly different, it was not grossly inferior.

While it was announced in January, Richland didn't become available until this past June, a launch window that put it head-to-head with Intel's 22 nm Haswell chips.

Given the larger node size, Richland chips were cheaper ($105-150 USD versus $182-339 USD). However, Intel's Haswell SoCs were much faster CPU-wise. Haswell U series Core i5 chips also offered aggressive power efficiency, consuming as little as 11.5 watts on the mobile end and 28 watts on the desktop end. By contrast, Richland's top performing chips only achieved a 17-watt envelope in mobile chips and a 45-watt envelope on the desktop side, due largely to AMD's larger transistors (32 nm versus 22 nm).

In terms of graphics, Intel and AMD took somewhat different roads -- AMD's dGPU featured an external MXM memory package (2 GB), which was larger than the "Crystalwell" embedded memory found in new Intel "Iris" dGPUs, but also slower. Ultimately, Intel's eDRAM+DRAM approach delivered roughly comparable bandwidth to AMD's pure MXM GDDR5 solution, with Intel's solution being slightly superior from a reliability and power standpoint.

The key area where Richland has done well is in finding price-sensitive niches in which to compete, including the budget PC market.

Intel disappointed many when it decreed that its high-end Iris Pro 5200 GPUs (the only Haswell dGPUs to feature Crystalwell) would be available only as a ball-grid array (BGA) packaged design for select ultrabooks. Haswell desktop chips were stuck with the lower-end HD 4600 graphics solution, which Richland handily beat in terms of graphical performance [source]. For buyers of machines with a discrete GPU, this wasn't a major concern, but for those building or buying more leanly outfitted rigs (e.g. a cheap home theater PC) this was a major letdown.

By July a handful of laptops with Iris Pro (HD 5200) graphics (found in "R" and "HQ" series Core i5 and i7 processors) had popped up. Intel's Iris Pro GPU turned the tables, outperforming Richland by anywhere from 25 to 40 percent in gaming benchmarks [source; source] -- albeit at a higher unit cost that ultimately translated to higher laptop prices.

Operating income was $22 million, compared with operating income of $2 million in Q2 2013 and an operating loss of $114 million in Q3 2012. The Q3 2012 operating loss included an inventory write-down of approximately $100 million primarily consisting of first generation A-Series accelerated processing units (APUs).

II. Meet Kaveri, the Fourth Generation Chip in the A8/A10 Line

At this point you might be wondering -- didn't AMD promise to launch 28 nm A8/A10 chips this year? Indeed, AMD had hoped to ship Kaveri -- the successor to Richland (and first AMD 28 nm APU) -- in H2 2013.

Ultimately that release date slipped to January 2014. Kaveri features AMD's new Steamroller core design, which packs 3 more ALUs per core along with other changes meant to boost both parallelism and single core performance.

After a back-and-forth debate rumor-wise about whether it was feasible to pool GDDR5 and DDR3, the believers won out, as AMD has indeed adopted this novel technology for Kaveri. The 28 nm chip features a new "hUMA" (heterogeneous Uniform Memory Access) memory architecture, as outlined in slides which leaked in April 2013.

[Image Source: AMD via Bit-Tech]

hUMA supports up to 32 GB of pooled memory in total (including support for four DDR3 DIMMs), which in practice will likely often mean 16 GB of DDR3 and 2-4 GB of GDDR5.

[Image Source: AMD via Bit-Tech]

A footnote reveals more data on the upcoming Kaveri A10-7850K APU. The slide points to a CPU clock speed of 3.7 GHz and a GPU clock speed of 720 MHz. This A10 is clearly a desktop part, consuming 95 watts. A leak from July indicates AMD will also release a mobile chip with a 1.8 GHz CPU clock (2.3 GHz turbo clock) and a 500 MHz GPU clock.

[Image Source: AMD via Bit-Tech]

Also interesting is the inclusion of a Cortex-A5 processor, a tweaked licensed intellectual property (IP) core from ARM Holdings plc (LON:ARM). While the x86 Steamroller CPU and SIMD GCN 1.1 GPU cores are still doing most of the heavy lifting, the added ARM coprocessor gives AMD a low-power tool for "console class gaming sound and movie theater surround processing".

This becomes very interesting when you consider the sales commentary in AMD's earnings report. Clearly AMD has recognized that the power-hungry A8/A10 chips have struggled to find acceptance in laptops, but have been embraced by HTPC users on a budget (as evidenced by the increase in desktop sales). This is a very smart move, as AMD is adding value for one of the product line's biggest target audiences.

Overall, Kaveri is a major jump in terms of core design for AMD. Not only does it unify the until-now disparate GPU/APU graphics core trees, but it goes a step further, putting forth in essence a single common platform for game console chips, a broad spectrum of APUs, and GPUs. This is the first generation that's featured this unification, so it's important not to judge the gains too hastily. But it's definitely the start of big things.

IV. AMD's Mobile Successors, "Beema" and "Mullins"

In related news, AMD also announced two new processor designs, Beema and Mullins, which will fill in the A4/A6 product line, targeting laptops and tablets.

Mullins notably packs a razor-thin 2 watt TDP, with an onboard GPU. That's a major improvement over the most efficient Temash chips, which consumed 3.9 watts. Given AMD's reputation for aggressive pricing, these gains may transform it into a serious competitor for tablets and perhaps even phablet-style smartphones. Versus Temash, which saw very weak adoption, Mullins gives AMD a fighting chance against Intel and ARM chipmakers (who, it's worth noting, are stuck on the same node).

Beema is slightly more power hungry, slotting into a 10-25 watt envelope, a modest improvement over the 17-35 watt envelope of Trinity.
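To put those power figures in perspective, the short Python sketch below works out the percentage reductions implied by the wattages quoted above. The helper name `reduction_pct` and the variable names are illustrative only, and the comparison assumes the quoted envelopes are directly comparable, which vendor TDP figures often are not.

```python
def reduction_pct(old_watts: float, new_watts: float) -> float:
    """Percentage drop in power when going from old_watts to new_watts."""
    return (old_watts - new_watts) / old_watts * 100.0


# Wattages as quoted in the article (not independently measured).
temash_watts = 3.9             # most efficient Temash chip
mullins_watts = 2.0            # Mullins TDP
trinity_range = (17.0, 35.0)   # low and high end of the Trinity envelope
beema_range = (10.0, 25.0)     # low and high end of the Beema envelope

mullins_cut = reduction_pct(temash_watts, mullins_watts)
beema_low_cut = reduction_pct(trinity_range[0], beema_range[0])
beema_high_cut = reduction_pct(trinity_range[1], beema_range[1])

print(f"Mullins vs. Temash:           {mullins_cut:.1f}% lower")
print(f"Beema vs. Trinity (low end):  {beema_low_cut:.1f}% lower")
print(f"Beema vs. Trinity (high end): {beema_high_cut:.1f}% lower")
```

On the quoted numbers, Mullins comes in roughly 49 percent below Temash, while Beema trims about 41 percent off the bottom of Trinity's envelope and about 29 percent off the top.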

Both processors are anchored by the fresh Puma core design, which replaces the Jaguar cores found in AMD's Temash and Kabini platforms -- the chips that comprise AMD's current E1/E2 branded lineup, as well as much of the A4/A6 branded lineup. Like Kaveri, both mobile-minded chips pack a licensed ARM Cortex-A5 processor. In this case the coprocessor is used to enhance mobile security via ARM's TrustZone technology, which allows for secured financial transactions online via hardware-level encryption.

Mark Papermaster, AMD’s chief technology officer and senior vice president, voices AMD's determination to turn around its mobile APU offerings after a rocky 2013 slump. He comments, "AMD is establishing excellent momentum this year in the low-power, mobile computing market and with ‘Mullins’ and ‘Beema’ coming in 2014 we are not standing still. AMD aims to deliver a set of platforms in the first half of next year that will outperform the competition in graphics and total compute performance in fanless tablets, 2-in-1s and ultrathin notebooks."

Fiscally AMD has performed well in 2013, and the year has been a turning point for AMD. Now with the new Steamroller, Puma, and GCN 1.1 CPU/GPU cores anchoring AMD's entire lineup, and with a consistent approach that offers the same kinds of computing IP (from ARM processors down to the processor layout) in a diverse range of product families, AMD's vision is coalescing as well.

It should be a very interesting 2014 for AMD. Even if it can't live up to its ambitious mobile hopes, it should at least see some small gains, and Kaveri should make a big splash in the budget desktop space, sustaining growth, assuming AMD continues to deliver come shipping time in H1.