We have been talking about the HSA Foundation since 2013, a cooperative effort by AMD, ARM, Imagination, Samsung, Qualcomm, MediaTek and TI to design a heterogeneous memory architecture that allows GPUs, DSPs and CPUs to all directly access the same physical memory. The release of the official specifications today is a huge step forward for these companies, especially for garnering future mobile market share as physical hardware apart from Carrizo becomes available.
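To make the idea of different processors directly accessing the same memory a little more concrete, here is a rough analogy in plain Python. It contrasts a copy-based hand-off with a shared-buffer hand-off. It does not use HSA, HSAIL, or any real device, and the function names (`double_in_place`, `run_with_copies`, `run_shared`) are invented for this sketch.

```python
from array import array


def double_in_place(view):
    """Stand-in for a 'device kernel': doubles every element it is handed."""
    for i in range(len(view)):
        view[i] *= 2


def run_with_copies(host_data):
    """Copy model: stage data into a separate buffer, compute, copy back."""
    staging = array("d", host_data)      # copy host -> separate "device" buffer
    double_in_place(staging)
    host_data[:] = staging               # copy "device" buffer -> host
    return 2 * len(host_data)            # elements moved across the boundary


def run_shared(host_data):
    """Shared model: hand over a pointer-like view of the same memory."""
    with memoryview(host_data) as view:  # no element is copied
        double_in_place(view)
    return 0                             # nothing had to be moved


copied = array("d", [1.0, 2.0, 3.0])
shared = array("d", [1.0, 2.0, 3.0])
moved_copy = run_with_copies(copied)
moved_shared = run_shared(shared)
```

Both paths end with the same doubled values, but only the copy-based path had to move every element twice. That round trip is the overhead a shared physical memory space is meant to remove.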

Programmers will be able to use C, C++, Fortran, Java, and Python to write HSA-compliant code, which is then compiled into HSAIL (Heterogeneous System Architecture Intermediate Language) and from there to the actual binary executables which will run on your devices. HSA currently supports x86 and x64, and there are Linux kernel patches available for those who develop on that OS. Intel and NVIDIA are not involved in this project at all; they have chosen their own solutions for mobile devices, and while Intel certainly has pockets deep enough to experiment, NVIDIA might not. We shall soon see if Pascal, and the improvements to Maxwell's performance and efficiency through future generations, can compete with the benefits of HSA.

The current problem is of course hardware: Bald Eagle and Carrizo are scheduled to arrive on the market soon, but currently they are not available. Sea Islands GPUs and Kaveri have some HSA enhancements, but with limited hardware to work with it will be hard to convince developers to focus on programming HSA-optimized applications. The release of the official specs today is a great first step; if you prefer an overview to reading through the official documents, The Register has a good article right here.

"The HSA Foundation today officially published version 1.0 of its Heterogeneous System Architecture specification, which (if we were being flippant) describes how GPUs, DSPs and CPUs can share the same physical memory and pass pointers between each other. (A provisional 1.0 version went live in August 2014.)"

Filling the Product Gaps

In the first several years of my PCPer employment, I typically handled most of the AMD CPU refreshes. These were rather standard affairs that involved small jumps in clock speed and performance. These happened every 6 to 8 months, with the bigger architectural shifts happening some years apart. We are finally seeing a new refresh of the AMD APU parts after the initial release of Kaveri to the world at the beginning of this year. This update is different. Unlike previous years, there are no faster parts than the already available A10-7850K.

This refresh deals with fleshing out the rest of the Kaveri lineup with products that address different TDPs, markets, and prices. The A10-7850K is still the king when it comes to performance on the FM2+ socket (as long as users do not pay attention to the faster CPU performance of the A10-6800K). The initial launch in January also featured another part that never became available until now; the A8-7600 was supposed to be available some months ago, but is only making it to market now. The 7600 part was unique in that it had a configurable TDP that went from 65 watts down to 45 watts. The 7850K on the other hand was configurable from 95 watts down to 65 watts.

So what are we seeing today? AMD is releasing three parts to address the lower power markets that AMD hopes to expand their reach into. The A8-7600 was again detailed back in January, but never released until recently. The other two parts are brand new. The A10-7800 is a 65 watt TDP part with a cTDP that goes down to 45 watts. The other new chip is the A6-7400K which is unlocked, has a configurable TDP, and looks to compete directly with Intel’s recently released 20th Anniversary Pentium G3258.

Open source HSA has arrived for the Linux kernel with a newly released set of patches which will allow Sea Islands and newer GPUs to share hardware resources. These patches are both for a sample driver for any HSA-compatible hardware and the driver for Radeon GPUs. As the debut of the Linux 3.16 kernel is so close, you shouldn't expect to see these patches included until 3.17, which should be released in the not too distant future. Phoronix and Linux users everywhere give a big shout of thanks to AMD's John Bridgman for his work on this project.

"AMD has just published a massive patch-set for the Linux kernel that finally implements a HSA (Heterogeneous System Architecture) in open-source. The set of 83 patches implement a Linux HSA driver for Radeon family GPUs and serves too as a sample driver for other HSA-compatible devices. This big driver in part is what well known Phoronix contributor John Bridgman has been working on at AMD."

When I first read many of the initial AMD A10 7850K reviews, my primary question was how the APU would act if a different GPU were installed in the system and it did not utilize the CrossFire X functionality that AMD talked about. Typically when a user installs a standalone graphics card on the AMD FM2/FM2+ platform, they disable the graphics portion of the APU. They also have to uninstall the AMD Catalyst driver suite. So this then leaves the APU as a CPU only, and all of that graphics silicon is left silent and dark.

Who in their right mind would pair a high end graphics card with the A10-7850K? This guy!

Does this need to be the case? Absolutely not! The GCN based graphics unit on the latest Kaveri APUs is pretty powerful when used in GPGPU/OpenCL applications. The 4 cores/2 modules and 8 GCN cores can push out around 856 GFlops when fully utilized. We also must consider that the APU is the first fully compliant HSA (Heterogeneous System Architecture) chip, and it handles memory accesses much more efficiently than standalone GPUs. The shared memory space with the CPU gets rid of a lot of the workarounds typically needed for GPGPU type applications. It makes sense that users would want to leverage the performance potential of a fully functioning APU while upgrading their overall graphics performance with a higher end standalone GPU.
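The 856 GFlops figure can be roughly reconstructed from unit counts and clock speeds. The sketch below is a back-of-the-envelope estimate, not AMD's published methodology. It assumes 512 GPU shader units (8 GCN compute units of 64 each), a 720 MHz GPU clock, a 3.7 GHz CPU base clock, one fused multiply-add (2 FLOPs) per shader per clock, and 8 single precision FLOPs per CPU core per clock. The FLOPs-per-clock values and the helper name `peak_gflops` are assumptions made for the illustration.

```python
def peak_gflops(units, flops_per_clock, clock_ghz):
    """Theoretical peak = units x FLOPs per unit per clock x clock in GHz."""
    return units * flops_per_clock * clock_ghz


# GPU side: 8 compute units x 64 shaders, 2 FLOPs per clock (assumed), 0.72 GHz
gpu = peak_gflops(units=8 * 64, flops_per_clock=2, clock_ghz=0.72)

# CPU side: 4 cores, 8 FLOPs per clock per core (assumed), 3.7 GHz
cpu = peak_gflops(units=4, flops_per_clock=8, clock_ghz=3.7)

total = gpu + cpu
print(f"GPU {gpu:.1f} + CPU {cpu:.1f} = {total:.1f} GFLOPS")
```

Under those assumptions the GPU contributes the bulk of the total, with the CPU cores adding only about a seventh as much, and the sum rounds to 856.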

Getting this to work is very simple. Assuming that the user has been using the APU as their primary graphics controller, they should update to the latest Catalyst drivers. If the user is going to use an AMD card, then it would behoove them to totally uninstall the Catalyst driver and re-install only after the new card is installed. After this is completed, restart the machine, go into the UEFI, and change the primary video boot device to PEG (PCI-Express Graphics) from the integrated unit. Save the setting and shut down the machine. Insert the new video card and attach the monitor cable(s) to it. Boot the machine and either re-install the Catalyst suite if an AMD card is used, or install the latest NVIDIA drivers if that is the graphics choice.

Windows 7 and Windows 8 allow users to install multiple graphics drivers from different vendors. In my case I utilized a last generation GTX 580 (the MSI N580GTX Lightning) along with the AMD A10 7850K. These products coexist happily together on the MSI A88X-G45 Gaming motherboard. The monitor is attached to the NVIDIA card and all games are routed through that since it is the primary graphics adapter. Performance seems unaffected with both drivers active.

I find it interesting that the GPU portion of the APU is named "Spectre". Who owns those 3dfx trademarks anymore?

When I load up Luxmark I see three entries: the APU (CPU and GPU portions), the GPU portion of the APU, and then the GTX 580. Luxmark defaults to the GPUs. We see these GPUs listed as “Spectre”, which is the GCN portion of the APU, and the NVIDIA GTX 580. Spectre supports OpenCL 1.2 while the GTX 580 is an OpenCL 1.1 compliant part.

With both GPUs active I can successfully run the Luxmark “Sala” test. The two units perform better together than when they are run separately. Adding in the CPU does increase the score, but not by very much (my guess here is that the APU is going to be very memory bandwidth bound in such a situation). Below we can see the results of the different units separate and together.

These results make me hopeful about the potential of AMD’s latest APU. It can run side by side with a standalone card, and applications can leverage the performance of this unit. Now all we need is more HSA aware software. More time and more testing are needed for setups such as this, and we need to see if HSA enabled software really does see a boost from using the GPU portion of the APU as compared to a pure CPU piece of software or code that will run on the standalone GPU.

Personally I find the idea of a heterogeneous solution such as this appealing. The standalone graphics card handles the actual graphics portions, the CPU handles that code, and the HSA software can then fully utilize the graphics portion of the APU in a very efficient manner. Unfortunately, we do not have hard numbers on the handful of HSA aware applications out there, especially when used in conjunction with standalone graphics. We know in theory that this can work (and should work), but until developers get out there and really optimize their code for such a solution, we simply do not know if having an APU will really net the user big gains as compared to something like the i7 4770 or 4790 running pure x86 code.

In the meantime, at least we know that these products work together without issue. The mixed mode OpenCL results make a nice case for improving overall performance in such a system. I would imagine with more time and more effort from developers, we could see some really interesting implementations that will fully utilize a system such as this one. Until then, happy experimenting!

AMD has just introduced their powerful new embedded chip called Bald Eagle. Depending on the model of processor you purchase you get two or four Steamroller CPU cores, and up to eight GCN GPU cores based on the HD 9000 series. That gives the higher end chips enough juice to power up to four independent 3D, 4K, or HD displays, which you can bump up to nine if you include an embedded Radeon E8860 discrete GPU in your system. The cores are all fully HSA compliant and will support ECC and non-ECC DDR3 at speeds of up to 2133MHz, along with PCIe Gen3 x16, PCIe Gen2 2x4, USB and SATA. Check out more at The Inquirer.

"Bald Eagle also enables heterogeneous system architecture (HSA), which first appeared in AMD chippery in its desktop Kaveri APUs this January, and which allows the CPU and GPU to share the same system memory, vastly simpifying the programming challenge of getting GPUs to shoulder the parallel-processing chores that they excel at far better then CPUs."

Next Wednesday we will get our first look at the HSA enabled Opteron X Series, otherwise known as Berlin. AMD will be unveiling the processor at the Red Hat Summit in San Francisco with an X2100 Opteron running on a Linux environment that is based on the Fedora Project. We have very recently had a chance to see the desktop equivalent, Kaveri, in action, but this will be the first example of AMD's heterogeneous computing on a server. Keep your eyes peeled for our coverage; in the meantime you can get a preview at The Register.

"AMD will give the first public demo of its second-generation Opteron X-Series server processor, code-named "Berlin", at the Red Hat Summit in San Francisco on Wednesday."

Not only are the first Kaveri reviews arriving today, the A10-7850K is up for sale on both NewEgg and Amazon and the A10-7700K is available on NewEgg. This new part, at 45W, competes favourably with the previous 100W Trinity APU in most tests, and when Ryan boosted it to 65W it gained a little more. The Steamroller cores have been updated but not in a way that has a huge effect on CPU performance. On the other hand, the 384 SIMD units composing the GPU portion of this chip are quite impressive: 1080p gaming of current generation titles is possible on this chip, and we haven't seen its big brother with 512 SIMD units yet. In the Tech Report's review you can see that BF4 is playable on this chip, and this is not the Mantle version optimized for AMD's new architecture. It is also a pity that Thief was unavailable to see just what TrueAudio is capable of. Unfortunately this chip will not find its home in gamers' dream machines; that is simply not where AMD is targeting its CPUs. However, for SFF systems that need to be energy efficient and where a discrete GPU is too big to fit, Kaveri will usher in a new level of performance.

"AMD's next-generation APU packs in a ton of innovation, including updated "Steamroller" CPU cores, GCN graphics, and advanced HSA features. But is it enough to restore AMD's competitiveness in desktop processors?"

The AMD Kaveri Architecture

Kaveri: AMD’s New Flagship Processor

How big is Kaveri? We already know the die size of it, but what kind of impact will it have on the marketplace? Has AMD chosen the right path by focusing on power consumption and HSA? Starting out an article with three questions in a row is a questionable tactic for any writer, but these are the things that first come to mind when considering a product the likes of Kaveri. I am hoping we can answer a few of these questions by the end of this article, but alas it seems as though the market will have the final say as to how successful this new architecture is.

AMD has been pursuing the “Future is Fusion” line for several years, but it can be argued that Kaveri is truly the first “Fusion” product that completes the overall vision for where AMD wants to go. The previous several generations of APUs were initially not all that integrated in a functional sense, but the complexity and completeness of that integration has been improved upon with each iteration. Kaveri takes this integration to the next step, and one which fulfills the promise of a truly heterogeneous computing solution. While AMD has the hardware available, we have yet to see if the software companies are willing to leverage the compute power afforded by a robust and programmable graphics unit powered by AMD’s GCN architecture.

(Editor's Note: The following two pages were written by our own Josh Walrath, discussing the technology and architecture of AMD Kaveri. Testing and performance analysis by Ryan Shrout starts on page 3.)

Process Decisions

The first step in understanding Kaveri is taking a look at the process technology that AMD is using for this particular product. Since AMD divested itself of their manufacturing arm, they have had to rely on GLOBALFOUNDRIES to produce nearly all of their current CPUs and APUs. Bulldozer, Piledriver, Llano, Trinity, and Richland based parts were all produced on GF’s 32 nm PD-SOI process. The lower power APUs such as Brazos and Kabini have been produced by TSMC on their 40 nm and 28 nm processes respectively.

Kaveri will take a slightly different approach here. It will be produced by GLOBALFOUNDRIES, but it will forgo the SOI and utilize a bulk silicon process. 28 nm HKMG is very common around the industry, but few pure play foundries were willing to tailor their process to the direct needs of AMD and the Kaveri product. GF was able to do such a thing. APUs are a different kind of animal when it comes to fabrication, primarily because the two disparate units require different characteristics to perform at the highest efficiency. As such, compromises had to be made.

This year’s AMD CES event was actually more interesting than I was expecting. The details of the event were well known ahead of time, as most Kaveri details have been revealed over the past few months, so I was unsure what Lisa Su and the gang would go over.

This past year has been a big one for AMD. They seem to be doing a lot better than others expected them to, especially with all of the delayed product launches on the CPU side for quite a few years. This year saw the APU take a pretty prominent place in the industry with the launch of the latest generation consoles from Sony and Microsoft. AMD made inroads with mobile form factors with a variety of APUs. The HSA Foundation's membership has grown, and HSA members ship two out of every three connected, smart devices. Apple also includes FirePro graphics cards with all of their new Mac Pros.

Kaveri is of course the big news here. AMD feels that this is the best APU yet. The combination of Steamroller CPU cores, GCN graphics compute cores, HSA, hUMA, hQ, TrueAudio, Mantle support, PCI-E 3.0 support, and a configurable TDP makes for a pretty compelling product. AMD has shuffled some nomenclature about by saying that Kaveri, at the top end, is comprised of 12 compute cores. These include 4 Steamroller cores and 8 GCN compute clusters. Each compute cluster matches the historical definition of a core, but of course it looks quite a bit different than a traditional x86 core.

We have gone over Kaveri pretty extensively in the past. The CPU is clocked at 3.7 GHz with a 4 GHz boost. The graphics portion clocks in at 720 MHz. It can support up to DDR3-2400 memory, which is really needed to extract as much performance as possible out of this new APU. Benchmarks provided by AMD show this product to be a big jump from the previous Richland, and in these particular benchmarks it is quite a bit faster than the competing i5 4670K.

Gaming performance is also improved. This APU can run most current applications at 1080P resolutions with low to medium quality settings. Older titles can be run at 1080P with Medium to High/Extreme settings. While this processor is rated at around 867 GFLOPS, which is around 110 GFLOPS greater than the previous top end Richland, it is more efficient at delivering that theoretical performance. It looks to be a significant improvement all around.

Software support is improving with applications from companies like Adobe, The Document Foundation, and Nuance. These cover HSA applications and, in Nuance’s case, use of the TrueAudio portion to clean up and accelerate voice recognition. TrueAudio is also being supported in five upcoming games. This is not a huge amount, but it is a decent start for this new technology.

Mantle is gaining a lot more momentum with support from 3 engines, 5 developers, and 20+ games in development. They showed off Battlefield 4 running Mantle on a Kaveri APU for the first time publicly. They mentioned that it ran 45% faster than Direct3D at the same quality levels on the same hardware. The display showed frame rates up in the low 50 fps area.

AMD is continuing to move forward on their low power offerings based on Beema and Mullins. Lisa claims that these parts are outperforming the Intel Bay Trail offerings in both CPU performance and graphics. Unfortunately, she mentioned nothing about the power consumption associated with these results. They showed off the Discovery tablet as well as a fully functional PC that was the size of a large cellphone.

They closed out the event by talking about the Surround House 2. This demo looks significantly better than the previous iteration we saw last year. It features something like a 34.2 speaker setup in a projected dome. It is much more complex than the House from last year, but the hardware running it all is rather common: a single high end FirePro card running on a single A10 7850K. The demo is also one of the first showings of a 360 degree gesture recognition setup.

AMD has come a long way since hitting rock bottom a few years back. They continue to claw their way back to relevance, and they hope that Kaveri will help them regain a foothold in the computing market. They are certainly doing well in the graphics market, but the introduction of Kaveri should help them gain more momentum in the CPU/APU market. We have yet to test Kaveri on our own, but initial results look promising. It is a better APU, but we just don’t know how much better so far.

Valve may very well produce one of the near future's most popular non-mobile, consumer, Linux distributions. SteamOS will be marketed for gaming PCs (some very compelling ones at that) starting next year. CES will definitely be interesting. With such a popular distribution, and as an existing member of the Khronos Group, it makes sense for Valve to join the Linux Foundation... and they just did.

Another addition is the HSA Foundation. AMD is already a Gold member (y'know... HSA's faja) and ARM is Silver so I cannot see HSA being much more than that. Still, Linux will be an important focus for the heterogeneous computing architectures to endorse: both in terms of back-end server optimization and customer-facing devices.

Of course I am not belittling any contribution. Still, there is that desire to see Valve lead the pack. Ultimately, though, it is not the size of the badge: it is how you wear it.