AMD reveals potent parallel processing breakthrough

AMD has released details on its implementation of The Next Big Thing in processor evolution, and in the process has unleashed the TNBT of acronyms: the AMD APU (CPU+GPU) HSA hUMA.

Before your eyes glaze over and you click away from this page, know that if this scheme is widely adopted, it could be of great benefit to both processor performance and developer convenience – and to you.

Simply put, what AMD's heterogeneous Uniform Memory Access (hUMA) does is allow central processing units (CPUs) and graphics processing units (GPUs) – which AMD places on a single die in its accelerated processing units (APUs) – to seamlessly share the same memory in a heterogeneous system architecture (HSA). And that's a very big deal, indeed.

Why? Simple. CPUs are quite clever, speedy, and versatile when performing complex tasks with myriad branches, but are less well-suited for the massively parallel tasks at which GPUs excel. Unfortunately, CPUs and GPUs can't currently share the same data in memory.

In today's CPU-GPU computing schemes, when a CPU senses that a process upon which it is working might benefit from a GPU's muscle, it has to copy the relevant data from its own reservoir of memory into the GPU's – and when the GPU is finished with its tasks, the results need to be copied back into the CPU's memory stash before the CPU can complete its work.

Needless to say, that back-and-forthing can consume a wasteful number of clock cycles – and that's the limitation that AMD's upcoming Kaveri APU, scheduled to appear in the second half of this year, will overcome.
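To make that round trip concrete, here is a toy Python model of the copy-based workflow. It is only a sketch, not real GPU code: the `DiscreteGpu` class, its `upload`, `run_kernel`, and `download` methods, and the 4-bytes-per-value figure are all hypothetical, invented purely to count how much data gets shuffled back and forth.

```python
# Toy model of the copy-based CPU-GPU workflow; not real GPU code.
# DiscreteGpu and its methods are hypothetical names used for illustration.

class DiscreteGpu:
    """Stands in for a GPU that has its own private memory pool."""

    def __init__(self):
        self.memory = []        # the GPU's separate memory
        self.bytes_copied = 0   # running total of bytes moved in either direction

    def upload(self, host_data):
        # Copy the data from CPU memory into GPU memory.
        self.memory = list(host_data)
        self.bytes_copied += len(host_data) * 4  # assumption: 4-byte values

    def run_kernel(self):
        # The "massively parallel" work: square every element.
        self.memory = [x * x for x in self.memory]

    def download(self):
        # Copy the results back into CPU memory.
        self.bytes_copied += len(self.memory) * 4
        return list(self.memory)


host_data = [1, 2, 3, 4]
gpu = DiscreteGpu()
gpu.upload(host_data)       # CPU -> GPU copy
gpu.run_kernel()
result = gpu.download()     # GPU -> CPU copy
print(result, gpu.bytes_copied)  # [1, 4, 9, 16] 32
```

Every value crosses the CPU-GPU boundary twice, so the copy traffic in this model is double the size of the data set, before any useful work is counted.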

With hUMA, CPU and GPU memory is united in one cache-coherent space

The secret sauce that Kaveri will bring to the computing party is hUMA, a scheme in which both CPU and GPU can share the same memory stash and the data within it, saving all those nasty copying cycles. hUMA is cache-coherent, as well – both CPU and GPU have identical pictures of what's what in both physical memory and cache, so if the CPU changes something, the GPU knows it's been changed.
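The shared-memory idea can be sketched with a similar toy Python model. Again, this is an illustration under stated assumptions rather than actual hUMA code: `SharedMemory`, `cpu_write`, and `gpu_square_in_place` are hypothetical names, and on real hardware coherence is maintained by the silicon rather than by a shared Python list.

```python
# Toy model of a single memory pool visible to both CPU and GPU.
# SharedMemory, cpu_write, and gpu_square_in_place are hypothetical names.

class SharedMemory:
    """One pool of memory that both processors address directly."""

    def __init__(self, values):
        self.values = list(values)
        self.bytes_copied = 0  # nothing is ever copied between pools


def cpu_write(mem, index, value):
    # The CPU changes a value in the shared pool.
    mem.values[index] = value


def gpu_square_in_place(mem):
    # The GPU works on the same pool, so it sees the CPU's latest write.
    for i, x in enumerate(mem.values):
        mem.values[i] = x * x


mem = SharedMemory([1, 2, 3, 4])
cpu_write(mem, 0, 10)        # CPU updates the first element
gpu_square_in_place(mem)     # GPU squares everything, including the update
print(mem.values, mem.bytes_copied)  # [100, 4, 9, 16] 0
```

The GPU's result reflects the CPU's change without any copy step, which is the behavior the cache-coherent shared space is meant to guarantee.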

Importantly, hUMA's shared memory pool extends to virtual memory, as well, including data that has been paged out – far away, relatively speaking – to a system's hard drive or SSD. The GPU does need to ask the CPU to tell the system's operating system to fetch the required data back into physical memory, but at least it can get what it wants, when it wants.

In a hUMA system, the GPU can access the entire memory space, virtual memory included

At this point, you might well be asking, "All well and good, but what's in it for me?" Glad you asked.

From a user's point of view, hUMA will make CPU-GPU mashups – in AMD parlance, APUs – more efficient and snappier. Better efficiency should improve battery life and make hUMA-compliant processors more amenable to tablets and handsets. Snappier performance means, well, snappier performance.

From a developer's point of view, hUMA should make it significantly easier to create apps that can exploit the individual powers of CPUs and GPUs – and, for that matter, other specialized cores such as video accelerators and DSPs, since there's no compelling reason that they should be forever locked out of hUMA's heterogeneous system architecture party.

Developers shouldn't have much trouble – if any – exploiting hUMA, since AMD says it will be compatible with "mainstream programming languages," meaning Python, C++, and Java, "with no need for special APIs."

Also, it's important to note that although AMD was the company to make the hUMA announcement and will be the first to release a hUMA-compatible chip with Kaveri, the specification will be published by the HSA Foundation, of which AMD is merely one of many members along with fellow cofounders ARM, Imagination Technologies, Samsung, Texas Instruments, Qualcomm, and MediaTek. Should some – all? – of these HSA Foundation members adopt the shared-memory architecture, hUMA goodness could spread far and wide.

In fact, hUMAfication already appears to be on the way – and not necessarily where you might have first expected. AMD is supplying a custom processor for Sony's upcoming PlayStation 4, and in an interview this week with Gamasutra, PS4 chief architect Mark Cerny said that the console would have a "supercharged PC architecture," and that "a lot of that comes from the use of the single unified pool of high-speed memory" available to both the CPU and GPU.