Setting HSAIL: AMD explains the future of CPU/GPU cooperation

AMD’s heterogeneous system architecture (HSA) initiative has been a topic of steady interest since the company first started talking about “Fusion” processors in 2007. Today, at the international Hot Chips computing technology conference, the company gave a talk that laid out details behind what its HSA Foundation has designed and the language that powers the technology, dubbed HSAIL (HSA Intermediate Language).

It’s best to start with a basic overview of the problem. Despite the popularity of OpenCL and Nvidia’s investment of hundreds of millions of dollars into its Tesla products and CUDA software, the actual task of moving work from the CPU to the GPU, performing it, and bringing the results back again is still a giant headache. The simplest explanation is this: for most of computing history, the trend was to move tasks to the central processing unit, which would then perform them. Gaming is virtually the only workload that has resisted this tendency (moving a GPU on-die is not the same thing as programming a game to run on the CPU).

After decades of moving workloads towards the CPU, designing a system that moves them back out again and makes the GPU an equal partner is a complex undertaking. The hardware side of HSA compatibility addresses this problem by specifying a number of capabilities that a combined CPU-GPU system must have in order to leverage heterogeneous compute. The CPU and GPU must share a common set of page table entries, they must allow both CPU and GPU to page fault (and use the same address space), the system must be able to queue commands for execution on the GPU without requiring the OS kernel to perform the task, the GPU must be capable of switching tasks independently, and both devices must be capable of addressing the same coherent block of memory.
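One of the requirements above, queuing commands for the GPU without involving the OS kernel, can be pictured as the application writing packets straight into a queue the GPU consumes. The sketch below is purely illustrative Python (the names `Packet` and `UserQueue` are hypothetical, not part of any real HSA runtime), standing in for what is, in hardware, a shared-memory ring buffer plus a doorbell write:

```python
from collections import deque

class Packet:
    """One unit of work the application hands to the 'GPU'."""
    def __init__(self, kernel, args):
        self.kernel = kernel   # function the accelerator should run
        self.args = args

class UserQueue:
    """Stand-in for an HSA-style user-mode queue in shared memory."""
    def __init__(self):
        self.ring = deque()    # models the shared ring buffer

    def submit(self, packet):
        # In real hardware this is a store to shared memory plus a
        # doorbell write; no kernel-mode driver transition is needed.
        self.ring.append(packet)

    def drain(self):
        # Models the GPU's packet processor consuming the ring.
        results = []
        while self.ring:
            p = self.ring.popleft()
            results.append(p.kernel(*p.args))
        return results

q = UserQueue()
q.submit(Packet(lambda a, b: a + b, (2, 3)))
q.submit(Packet(lambda xs: [x * x for x in xs], ([1, 2, 3],)))
print(q.drain())  # [5, [1, 4, 9]]
```

The point of the real mechanism is latency: because submission is just a memory write, the CPU never pays for a system call per dispatch.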

HSAIL is designed to address the software side of the equation.

HSAIL: It’s not an API

This is a big enough point of confusion that I want to address it early on. HSAIL is an intermediate language that’s translated at runtime into the hardware vendor’s own ISA. It’s the secret sauce that allows vendors as different as Imagination, ARM, AMD, and Qualcomm to benefit from the technology, even though they each have very different GPU hardware. The idea is that you write code in your language of choice (C++ AMP, OpenCL, Java, and Python are all listed), and that code is then compiled to target HSAIL and run on whatever GPU is integrated into the system.
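The flow described above can be sketched very roughly: a generic intermediate form is lowered at runtime to whichever vendor back-end is present. Everything below is a toy (the IL ops and the mnemonic strings are invented for illustration, not real HSAIL or GCN encodings):

```python
# A tiny "intermediate language": generic ops, no vendor specifics.
IL_PROGRAM = [("load", "a"), ("load", "b"), ("add",), ("store", "c")]

def finalize(il, backend):
    """Map generic IL ops to a vendor-specific instruction list."""
    return [backend[op[0]].format(*op[1:]) for op in il]

# Each vendor ships its own mapping; the programmer never sees it.
VENDOR_A = {"load": "v_load {0}", "add": "v_add_f32", "store": "v_store {0}"}
VENDOR_B = {"load": "ld.r {0}",   "add": "fadd",      "store": "st.r {0}"}

print(finalize(IL_PROGRAM, VENDOR_A))
print(finalize(IL_PROGRAM, VENDOR_B))
```

The same `IL_PROGRAM` produces two different native instruction streams, which is the essence of why the IL can be vendor-neutral while the hardware is not.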

The advantage to HSAIL, according to AMD, is that it won’t require programmers to learn whole new languages. If you’re familiar with OpenCL, use OpenCL. There may still be some overlap between HSAIL capabilities and some of what OpenCL 2.0 supports, but HSAIL is explicitly designed to simplify programming for GPUs in some critical ways. It also opens up the possibility of accelerating languages like Java on the GPU, though again, this does require that Java itself be capable of mapping well to a graphics card. This hydra has a number of heads.

The central idea, as shown above, is that an HSAIL-capable hardware block doesn’t need to be x86-compatible, based on GCN, or tied to any other specific architecture. That means Imagination can run code just as well as Qualcomm, provided each company writes its own drivers. The burden isn’t on the programmer here, and that’s a major advantage.

What about gaming?

This is a rather complex question. What we call “gaming” is actually an incredibly complex flow of data between CPU, GPU, main memory, and attached storage. CPUs and GPUs have always had to communicate, but for most of their history, that communication has been asynchronous and fast in only one direction. Historically, CPU-GPU communication has been rather lopsided, as the chart below demonstrates.

That chart shows Llano’s bandwidth for CPU and GPU when accessing various types of memory. It’s lopsided because it reflects the imbalance between typical CPU and GPU bandwidths to various parts of main memory. These connections can be beefed up and latency can be reduced; the point of showing them here is that they broadly illustrate the status quo developers have worked with for decades. Games have historically been designed to run well in a particular type of configuration. HSA has the potential to change that, but software development always lags hardware.

That doesn’t mean HSA won’t be important or that it can’t boost game performance. One of the areas AMD highlights is that historically, while GPUs have been used to accelerate and improve game physics, most of that processing was strictly cosmetic. Nvidia’s PhysX allows for gorgeous displays of additional eye candy, but that eye candy didn’t impact the actual game. One of the things AMD highlights in its presentation is that in-game physics is a compute problem — and HSA can conceivably be leveraged to create much stronger experiences.
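Why is in-game physics a compute problem? Because a physics step is typically the same small update applied independently to thousands of objects, exactly the shape a GPU kernel handles well. A minimal, purely illustrative sketch (simple Euler integration, not any real physics engine):

```python
def step(particles, dt=0.1, gravity=-9.8):
    """Advance (position, velocity) pairs by one timestep.

    Every particle's update is independent of every other's, so each
    iteration of this loop could be one GPU work-item; the whole list
    maps naturally onto a data-parallel compute kernel.
    """
    return [(p + v * dt, v + gravity * dt) for p, v in particles]

print(step([(0.0, 0.0), (1.0, 2.0)]))
```

The hard part for gameplay-affecting physics isn't this kernel; it's getting the results back to the CPU cheaply every frame, which is where HSA's shared coherent memory matters.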

There are going to be benefits to HSA in gaming, but data suggests that these benefits may take time to emerge — physics engines have to be designed to pass data back and forth, HSAIL needs to ship, and moving to a new programming model is going to take time. For now, the focus is on using HSAIL for compute tasks, which is why most of the companies that have announced HSA support are focused on high performance computing.

Making GPUs more programmable and usable has implications for mobile and for complicated tasks like facial recognition and natural language processing. AMD’s goal with HSAIL and HSA is to provide a common framework that can accelerate a number of tasks, but the road to gaming use may be more complex than for other areas, where CPUs and GPUs have virtually no history of data sharing and the goal is to allow the GPU to be leveraged in the first place.


Reminds me of the Java Virtual Machine. I worry how much of a performance impact HSAIL would have, as well as how much of a security concern it could present.

massau

Even if there is a major performance impact (>20% compared to OpenCL), it would still be better than not using the GPU at all.

Dozerman

Oh, I’m not arguing that point. It’ll definitely be faster.

Techutante

It would be nice if Java could be accelerated. On the flip-side, it would be nice if gaming companies just didn’t make games in Java. It’s really not that good.

Dozerman

I think I remember something about Oracle working with AMD to add automatic GPU acceleration to Java, although they were really vague about how it works. As for games, what games use Java other than web-based stuff?

Techutante

A bunch of the newer Indie generation of games run on Java. Minecraft would be the largest notable example.

Dozerman

I see.

massau

Isn’t the main problem with Java that it hasn’t got a good native high-level 3D lib, so you have to learn low-level OpenGL?

campdude

The main problem with Java is that it doesn’t fully support Linux/Ubuntu anymore, and/or I can’t get it to work properly.

massau

That seems like a real problem. Maybe you should try a previous JVM, or the open JVM?

I don’t know that much about Linux. (I had a bad teacher for it.)

pTmd

In Java, you run your application (in JVM bytecode) on top of the software JVM, and it does not operate on the underlying hardware directly. Everything is controlled by and done through the JVM.

In HSA, the application operates on the underlying machine in machine code. However, the parts of the code that should run on accelerators are shipped in HSA IL bytecode form. They are then JIT-compiled from HSA IL bytecode to the native machine ISA at runtime by the HSA finalizers.

So HSA itself doesn’t provide the same “write-once, run-everywhere” capability of the JVM (you can get that capability by combining JVM/LLVM with HSA in your project, though), but it is cross-platform as long as the CPU ISA and OS environment are kept constant. For example, assuming ImgTec had a desktop graphics card, you could have the same piece of HSA binary running on every HSA-enabled Windows machine (say, an AMD CPU with either an ImgTec GPU or an AMD GPU) without recompilation.

In addition, as the HSA specification requires the compilation stack to provide a fallback path, the binary should also work on aged and/or non-HSA platforms; there is just no GPU acceleration, and everything runs solely on the CPU.
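That finalize-or-fallback flow can be sketched roughly like this. Every name below is hypothetical (this is not a real HSA runtime API), and the "finalizer" is simulated by a plain Python closure standing in for the IL-to-ISA JIT step:

```python
def cpu_fallback(kernel, data):
    """Fallback path: run the kernel element-by-element on the CPU."""
    return [kernel(x) for x in data]

def run(kernel, data, finalizer=None):
    """Dispatch: finalize for the GPU if a finalizer exists, else fall back."""
    if finalizer is not None:
        native = finalizer(kernel)   # models JIT: IL -> native GPU ISA
        return native(data)
    return cpu_fallback(kernel, data)  # non-HSA platform: CPU only

# A stand-in "finalizer" that just wraps the kernel in a batch executor.
fake_gpu_finalizer = lambda k: (lambda data: [k(x) for x in data])

square = lambda x: x * x
# Same binary, same answer, with or without the accelerator present.
assert run(square, [1, 2, 3], fake_gpu_finalizer) == run(square, [1, 2, 3])
```

The key property the spec mandates is exactly the one the final assert checks: both paths must produce the same result, so shipping IL never strands a binary on older hardware.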

Dozerman

Thanks. That really cleared up some details for me.

Phobos

When do we expect to see all of this? This might be a noobish question, but will older hardware benefit from HSAIL (will the first generation of APUs take advantage?), or just the newer hardware?

pelov lov

It depends on the hardware and implementation, but in some respects we’re already benefiting from it now. There are plenty of applications that utilize OpenCL to different extents (some better than others), and GPU acceleration isn’t uncommon, particularly among workstation/professional applications. There’s CUDA too, but recently CUDA has fallen back in the consumer space (it’s still king in HPC).

I imagine hardware made by HSA-compliant vendors will likely benefit the most. AMD’s Kaveri will be the first available chip with unified memory architecture support (though you can make an argument here for Kabini). ARM vendors will likely support it extensively, as the monstrous GPUs in their modern SoCs offer fantastic compute potential. I imagine Intel and nVidia have the most to lose here, but for different reasons:

Intel is still relentlessly pushing x86 everywhere, including co-processors like MIC/Knights Corner. Pushing compute to the GPU in a manner that veers away from the x86 ISA isn’t to their benefit. They’re instead pushing wider SIMD units to stave off any potential gains.

nVidia has its own proprietary CUDA approach, and a non-vendor-specific GPU compute standard means lower prices on Tesla/Quadro, and that’s no good.

zapper

I think the best way to do it is to make CPU cores part of the GPU, i.e. the GPU should become the superset and the CPU the subset. The next thing to do is add specialized cores: implement cores for numerical methods, optimization, operations research, fuzzy logic, etc. I would love to see a processor that does every numerical method and can solve engineering maths in hardware. Just add memory.

Dozerman

What would making the GPU primary accomplish?

massau

GPUs are only faster at some things, like loopy and vectorizable code; it just needs to be massively parallel without branching (examples: sorting, searching, image editing, 3D).
The CPU is good at doing complex things (lots of branches) really fast, but it can’t take much advantage of the parallel parts.
So for standard use the CPU will be best.

The other part, using specialized cores, is a good idea, but it only works for SoC systems. Also, keep it simple: a special core that isn’t being used is a waste of power and die space.
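A rough illustration of the distinction massau is drawing: the same kind of work expressed as a branch-free, element-independent map (the shape GPUs like) versus a branch-heavy sequential loop with carried state (the shape CPUs handle well). Both snippets are trivial and purely illustrative:

```python
data = list(range(8))

# GPU-friendly shape: one independent, branchless operation per element.
doubled = [x * 2 for x in data]

# CPU-friendly shape: data-dependent branching and sequential state.
total = 0
for x in data:
    if x % 3 == 0:
        total += x
    elif x % 2 == 0:
        total -= x

print(doubled, total)
```

The first loop parallelizes trivially because no iteration depends on another; the second is cheap for a CPU's branch predictor but would leave most GPU lanes idle.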

Master Troll

PS4!!!
