Introducing the AMD FSA

At AMD’s Fusion 11 conference, we were treated to a nice overview of AMD’s next generation graphics architecture. With the recent change in their lineup going from the previous VLIW-5 setup (which powered their graphics chips from the Radeon HD 2900 through the latest “Barts” chip running the HD 6800 series) to the new VLIW-4 (HD 6900), many were not expecting much from AMD in terms of new and unique designs. The upcoming “Southern Islands” chips were thought to be based on the current VLIW-4 architecture, and were expected to offer more performance and a few new features thanks to the die shrink to 28 nm. It turns out that speculation was wrong.

In late Q4 of this year we should see the first iteration of this new architecture, which was detailed today by Eric Demers. The overview covered some features that will not make it into this upcoming product, but all of them should eventually be added over the next three years or so. Historically speaking, AMD has placed graphics first, with GPGPU/compute as the secondary functionality of their GPUs. While we have had compute abilities since the Radeon X1800/X1900 series of products, AMD has not been as aggressive with compute as its primary competition. From the G80 GPUs and beyond, NVIDIA has pushed compute harder and farther than AMD has. With its mature CUDA development tools and the compute heavy Fermi architecture, NVIDIA has been a driving force in this particular market. Now that AMD has released two APU based products (Llano and Brazos), they are starting to really push OpenCL, DirectCompute, and the recently announced C++ AMP.

The AMD Fusion Developer Summit 2011 is set to begin at 11:30am ET / 8:30am PT and promises to bring some interesting and forward looking news about the future of AMD's APU technology. We are going to cover the keynotes LIVE right here throughout the week so if you want to know what is happening AS IT HAPPENS, stick around!!

During this morning's keynote at the AMD Fusion Developer Summit, Microsoft's Herb Sutter went on stage to discuss the problems and solutions involved in programming and developing for multi-processing systems, and heterogeneous computing systems in particular. While the problems are definitely something we have discussed before at PC Perspective, the new solution that was showcased was significant.

C++ AMP (Accelerated Massive Parallelism) was announced as a new extension to Visual Studio and the C++ programming language to help developers take advantage of the highly parallel and heterogeneous computing environments of today and the future. The new programming model uses C++ syntax and will be available in the next version of Visual Studio with "bits of it coming later this year." Sorry, no hard release date was given when we probed.

Perhaps just as significant is the fact that Microsoft announced the C++ AMP standard would be an open specification and that they are going to allow other compilers to integrate support for it. Unlike C# then, C++ AMP has a chance to be a new dominant standard in the programming world as the need for parallel computing expands. While OpenCL was the only option for developers that promised to allow easy utilization of ALL computing power in a computing device, C++ AMP gives users another option with the full weight of Microsoft behind it.

To demonstrate the capability of C++ AMP, Microsoft showed a rigid body simulation program that ran on multiple computers and devices from a single executable file. It was able to scale in performance from 3 GFLOPS on the x86 cores of Llano to 650 GFLOPS on the combined APU power and to 830 GFLOPS with a pair of discrete Radeon HD 5800 GPUs. The same executable file was run on an AMD E-series APU powered tablet and ran at 16 GFLOPS with 16,000 particles. This is the promise of heterogeneous programming languages and is the gateway necessary for consumers and businesses to truly take advantage of the processors that AMD (and other companies) are building today.

If you want programs other than video transcoding apps to really push the promise of heterogeneous computing, then the announcement of C++ AMP is very, very big news.

AMD is in the spotlight this week and Intel has yet to find a way to distract the techies, something the two companies tend to do whenever one takes the limelight. Llano is here and is quite an impressive low power APU. AMD is taking advantage of the space savings of placing the GPU on the same die as the CPU and is basking in the success of the graphics portion proving much better than Sandy Bridge when it comes to gaming. [H]ard|OCP handed a Gold Award to the $700 notebook they were given to test; find out why by clicking this link.

"While we have seen previous Fusion APUs, today AMD releases its code named "Llano" Fusion A Series APU processor on the world. The first one of these we get to see is in a notebook and a mere 228 square millimeter of silicon that AMD is counting on changing its balance sheet."

Well that was an interesting twist... During a talk on the next generation of GPU technology at the AMD Fusion Developer Summit, one of the engineers was asked about Trinity, the next APU to be released in 2012 (and shown running today for the very first time). It was offered that Trinity in fact used a VLIW4 architecture rather than the VLIW5 design found in the just released Llano A-series APU.

A shader unit from the VLIW4-based Cayman architecture

That means that Trinity APUs will ship with Cayman-based GPU technology (6900 series) rather than Evergreen-based technology (5000 series). While that doesn't tell us much in terms of performance, simply because we have so many variables including shader counts and clocks, it does put to rest the rumor that Trinity was going to keep basically the same class of GPU technology that Llano had.

Trinity notebook shown for the first time today at AFDS. Inside is an APU with Cayman-class graphics.

AMD is definitely pushing the capabilities of APUs forward and if they can stay on schedule with Trinity, Intel might find the GPU portion of its Ivy Bridge architecture well behind again.

Before the AMD Fusion Developer Summit started this week in Bellevue, WA, the most controversial speaker on the agenda was Jem Davies, the VP of Technology at ARM. Why would AMD and ARM get together on a stage with dozens of media and hundreds of developers in attendance? There is no partnership between them in terms of hardware or software, but would there be some kind of major announcement made about the two companies' future together?

In that regard, the keynote was a bit of a letdown, and if you thought there was going to be a merger between them or a new AMD APU being announced with an ARM processor in it, you left a bit disappointed. Instead we got a bit of background on ARM, a look at how the race of processing architectures has slowly dwindled to just x86 and ARM, and a few jibes at the competition NOT named AMD.

As is usually the case, Davies described the state of processor technology with an emphasis on power efficiency and the importance of designing with that future in mind. One of the interesting points was shown in regard to the "bitter reality" of core-type performance and the projected DECREASE we will see from 2012 onward due to leakage concerns as we progress to 10nm and even 7nm technologies.

The idea of dark silicon "refers to the huge swaths of silicon transistors on future chips that will be underused because there is not enough power to utilize all the transistors at the same time" according to this article over at physorg.com. As process technology gets smaller, the areas of dark silicon increase until the area of the die that can be utilized at any one time might fall as low as 10% in 2020. Because of this, the need to design chips with many task-specific heterogeneous portions is crucial, and both AMD and ARM are on that track.

Those companies not on that path today, NVIDIA specifically and Intel as well, were addressed on the slide below when discussing GPU computing. Davies pointed out that if a company has a financial interest in the immediate success of only the CPU or the GPU, then benchmarks will be built and shown in a way that makes it appear that THAT portion is the most important. We have seen this from both NVIDIA and Intel in the past couple of years, while AMD has consistently stated they are going to be using the best processor for the job.

Amdahl's Law is used in parallel computing to predict the theoretical maximum speedup from using multiple processors. Davies reiterated what we have been told for some time: if only 50% of your application can actually BE parallelized, then no matter how many processing cores you throw at it, it will never run more than twice as fast, because the serial half still takes just as long. The heterogeneous computing products of today and the future can address both the parallel computing and serial computing tasks with improvements in performance and efficiency and should result in better computing in the long run.
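
For readers who want to see the numbers, here is a minimal Python sketch of Amdahl's Law. The function names `amdahl_speedup` and `amdahl_limit` are our own for illustration and were not part of the keynote. The speedup for a parallel fraction p on n cores is 1 / ((1 - p) + p / n), so with the 50% figure used above the ceiling works out to 2x, which means the run can never take less than half its original time.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup when `parallel_fraction` of the work is split across `cores`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always runs at full cost; only the parallel part shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the core count grows without limit."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if parallel_fraction == 1.0:
        return float("inf")  # fully parallel work has no ceiling in this model
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Illustrative core counts only; they are not tied to any specific product.
    for cores in (1, 2, 4, 16, 1024):
        print(f"{cores:>5} cores: {amdahl_speedup(0.5, cores):.3f}x")
    print(f"ceiling at 50% parallel: {amdahl_limit(0.5):.1f}x")
```

Even at 1,024 cores the half-parallel program stays just under 2x, which is why pairing fast serial cores with wide parallel hardware matters in this argument.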

So while we didn't get the major announcement from ARM and AMD that we might have been expecting, the fact that ARM would come up and share a stage with AMD reiterates the message of the Fusion Developer Summit quite clearly: a combined and balanced approach to processing might not be the sexiest but it is very much the correct one for consumers.

On stage during the opening keynote at the AMD Fusion Developer Summit 2011, Rick Bergman showed off a notebook that was being powered not by the recently released AMD Llano A-series APUs, but rather the Trinity core due in 2012.

Trinity is the desktop APU for next year that will combine Bulldozer-based x86 CPU cores with an updated DX11 GPU architecture built on the current 32nm process. Not much else is known about the chip yet but hopefully we'll get some more details this week at the show.

AMD lines up Llano

Introduction

2006. That was the year when the product we are reviewing today was first conceived, and the year that AMD and ATI merged in a $5.4 billion deal that many read about while scratching their heads. At the time the pairing of the 2nd place microprocessor company with the 2nd place graphics technology vendor might have seemed like an odd arrangement, even with the immediate benefit of a unified platform of chipset, integrated graphics and processor to offer to mobile and desktop OEMs. In truth though, that was a temporary solution to a more long term problem that we now know as heterogeneous computing: the merging not just of these companies but of all the computing workloads of CPUs and GPUs.

Five years later, and by most accounts more than a couple of years late, the new AMD, now sans manufacturing facilities, is ready to release the first mainstream APU, or Accelerated Processing Unit. While the APU name is something that the competition hasn't adopted, the premise of a CPU/GPU combination processing unit is not just the future, it is the present as well. Intel has been shipping Sandy Bridge, the first mainstream silicon with a CPU and GPU truly integrated together on a single die, since January 2011, and AMD no longer has the timing advantage that we thought it would when the merger was announced.

For sanity's sake, I should mention the Zacate platform, released in November of 2010, which combines an ATI-based GPU with a custom low power x86 core called Bobcat for the netbook and nettop market. As much as we like that technology, it doesn't have the performance characteristics to address the mainstream market, and that is exactly where Llano comes in.

AMD Llano Architecture

Llano's architecture has been no secret over the last two years as AMD has let details and specifications leak at a slow pace in order to build interest and excitement over the pending transition. That information release has actually slowed this year though, likely to reduce expectations for the first generation APU now that the Sandy Bridge processor has proved to be more potent than perhaps AMD expected. And in truth, while the Llano design as a whole is brand new, all of the components that make it up have been seen before - both the x86 Stars core and the Radeon 5000 series-class GPU have been tested and digested on PC Perspective for many years.

For today's launch we were given a notebook reference platform for the Llano architecture called "Sabine". While the specifications we are looking at here are specific to this mainstream notebook platform, nearly all will apply to the desktop release later in the year (perhaps later in the month actually).

The platform diagram above gives us an overview of what components will make up a system built on the Llano Fusion APU design. The APU itself is made up of 2 or 4 x86 CPU cores that come from the Stars family released with the Phenom / Phenom II processors. They do introduce a new Turbo Core feature, which we will discuss later, that is somewhat analogous to what Intel has done with Turbo Boost on its processors.


ECS, aka Elitegroup, had a large booth at Computex that focused more on its ODM aspects than consumer aspects, but there were still a couple of interesting designs to look at.

The board we spotted was the new A990FXM-A motherboard that is of course based on the latest 990FX chipset from AMD. Supporting the AM3+ processor socket and thus the pending AMD Bulldozer processors, the 990FX is going to be a long term product rather than a short term one. One interesting addition to the board is found on the chipset heatsink, which has a temperature reactive plastic on it that will turn from grey to orange-ish as the ambient case temperature increases. This could be a great feature to easily gauge the heat level inside a windowed case.

In another interesting move, ECS has elongated the receptacle on the 8-pin CPU power connection to make it easier to plug in and to remove. If you have ever experienced a pinched finger or sliced fingernail from trying to reach down and unplug an ATX connector, you will see this as a nice addition.

ECS also had its X79 motherboard variant on display, showing the company's readiness for the pending Sandy Bridge-E release.

Also on the motherboard wall was the upcoming A75F-A with support for the AMD Llano Fusion-based processors that should be ready later in the summer.

Finally, a motherboard that we have just recently received for review purposes, the HDC-I, is a mini-ITX board built on the AMD E-350 or E-240 Zacate platform. This solution might be a great option for users looking to build an HTPC box, so be sure you check out our full review coming shortly.

While talking up the new 900-series chipsets and the branding for the upcoming AMD Llano APU launch, AMD did surprise us by showing off a bit more of the future than is typical. Rick Bergman, general manager of the AMD Product Group, pulled a Trinity-based APU out of his pocket to demonstrate the company's conviction in staying on a "one-APU-per-year" cycle in the years to come.

While it looks just like any other AMD processor from a distance, this Trinity APU is based on the Bulldozer x86 architecture (which will see its first release as a CPU only later this year) and combines it with some number of SIMD units (aka Radeon cores) for a CPU/GPU combo. This will be the part that succeeds Llano, which is due out in a few short days.

This roadmap shows that a once-a-year cadence will be the norm for AMD going forward and that AMD plans to introduce an APU for the tablet market sometime in 2012. It will be interesting to see how late to the game AMD is in this arena and if they can compete with what ARM is doing or even what Intel will be doing with Medfield.