The future of AMD’s Fusion APUs: Kaveri will fully share memory between CPU and GPU

This site may earn affiliate commissions from the links on this page. Terms of use.

The company’s second HSA demo drilled further into the cost of time-intensive data copying. To show how shared memory cuts down on execution time, the company presented a server application called Memcached (pronounced "mem-cache-dee") as an example. Memcached keeps a table of cached data in system memory (i.e. ECC DDR3) and uses store() and get() functions on it to serve up components of web pages without needing to pull the data from (much slower) disk storage.
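To make the store()/get() idea concrete, here is a minimal sketch of an in-memory table sitting in front of a slow backing store. It is only an illustration of the pattern described above: `TinyCache`, `load_from_disk`, and `fetch_fragment` are hypothetical names, and none of this is the real Memcached protocol or client API.

```python
class TinyCache:
    """Toy in-memory key-value table; NOT the real Memcached API."""

    def __init__(self):
        self._table = {}  # lives entirely in system memory

    def store(self, key, value):
        self._table[key] = value

    def get(self, key, default=None):
        return self._table.get(key, default)


def load_from_disk(key):
    # Hypothetical stand-in for a slow disk read of a page component.
    return f"<div>{key}</div>"


def fetch_fragment(cache, key):
    """Serve from memory when possible; fall back to the slow loader on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_disk(key)
        cache.store(key, value)  # later requests skip the disk entirely
    return value
```

The first request for a given key pays the disk cost; every later request is answered from memory, which is the lookup path that the demo moved onto the GPU.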

When the get() function is ported to the GPU, the application's performance improves greatly because the GPU excels at parallel work. However, the program then runs into a bottleneck: the data and instructions must first be copied from the CPU to the GPU for processing.

Interestingly, the discrete GPU is the fastest at processing the data, but it ends up the slowest overall because it spends the majority of its execution time moving data between the GPU and CPU memory areas. The hardware to accelerate programs that use both the CPU and GPU is already available, but a great deal of execution time is spent moving data from the memory the CPU uses to the GPU memory (especially for discrete GPUs).
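The effect can be sketched with a simple model in which total time is compute time plus copy time. The numbers below are placeholders invented purely to illustrate the shape of the result; they are not AMD's measurements, and the `Device` class is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    compute_ms: float  # time spent actually processing the batch
    copy_ms: float     # time spent moving data to and from the device

    @property
    def total_ms(self):
        return self.compute_ms + self.copy_ms

    @property
    def copy_fraction(self):
        return self.copy_ms / self.total_ms


# Placeholder values chosen only to illustrate the effect; NOT measurements.
devices = [
    Device("CPU only", compute_ms=10.0, copy_ms=0.0),
    Device("integrated GPU (copy required)", compute_ms=4.0, copy_ms=5.0),
    Device("discrete GPU", compute_ms=1.0, copy_ms=12.0),
]

fastest_compute = min(devices, key=lambda d: d.compute_ms)
slowest_overall = max(devices, key=lambda d: d.total_ms)

for d in devices:
    print(f"{d.name}: total {d.total_ms:.1f} ms, "
          f"{d.copy_fraction:.0%} spent copying")
```

Under these assumed values the same device is both `fastest_compute` and `slowest_overall`, which is the situation the demo described: raw processing speed stops mattering once copying dominates the total.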

Trinity improves upon this by having the GPU on the same die as the CPU and providing a fast bus with direct access to system memory (the same system memory the CPU uses, though not necessarily the same address spaces). Kaveri will further improve upon this by giving both types of processors fast access to the same (single) set of data in memory. Cutting out the most time-intensive task will let programs like Memcached reach their performance potential and run as fast as the hardware will allow. In that way, unified and shared memory is a good thing, and will open up avenues to performance gains beyond what Moore’s law and additional CPU cores can achieve alone. Allowing the GPU and CPU to simultaneously work from the same data set opens a lot of interesting doors for programmers to speed up workloads and manipulate data.
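As a loose analogy for the difference between copying data and sharing it, the sketch below contrasts the two models using ordinary Python buffers. It runs entirely on the CPU and does not model HSA hardware; the function names are hypothetical, and the "GPU" is just a second piece of code touching the bytes.

```python
def process_with_copies(cpu_data: bytearray) -> bytearray:
    """Copy model: the 'GPU' works on a private copy, then results are copied back."""
    gpu_data = bytearray(cpu_data)      # explicit copy in
    gpu_data[:] = gpu_data.upper()      # stand-in for the 'GPU' doing its work
    cpu_data[:] = gpu_data              # explicit copy out
    return cpu_data


def process_shared(cpu_data: bytearray) -> bytearray:
    """Shared model: both sides see the same bytes, so no copy in or out."""
    gpu_view = memoryview(cpu_data)     # a view of the same memory, not a copy
    for i in range(len(gpu_view)):
        if 97 <= gpu_view[i] <= 122:    # ASCII 'a'..'z'
            gpu_view[i] -= 32           # the write is immediately visible to the 'CPU'
    gpu_view.release()
    return cpu_data
```

Both functions produce the same result, but only the first one pays for two full passes over the data just to move it. That data movement is the step a single shared set of data is meant to remove.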

AMD Trinity APU die shot. Piledriver modules and caches are on the left.

While AMD and the newly formed HSA Foundation (currently AMD, ARM, Imagination Technologies, MediaTek, and Texas Instruments) are pushing heterogeneous computing the hardest, it is a technology that will be beneficial to everyone. The industry is definitely moving towards a more blended processing environment, something that began with the rise of specialty GPGPU workstation programs and is now starting to reach consumer applications. Programming platforms like C++ AMP, OpenCL, and Nvidia’s CUDA harness the graphics card for certain tasks. More and more programs are using the GPU (even if it’s just for drawing and managing the UI), and as developers jump on board, software should move even faster towards using every component to its fullest. On the hardware side of things, we are already seeing integration of GPUs into the CPU die and specialty application processors (at least in mobile SoCs). Such varied configurations are becoming common and are continuing to evolve in the direction of a combined architecture.

The mobile industry is a good example of heterogeneous computing catching on, with new system-on-a-chip processors coming out continuously and mobile operating systems that harness GPU horsepower to assist the ARM CPU cores. AMD isn’t just looking at low-power devices, however: it’s pushing for “one (HSA) chip to rule them all” solutions that combine GPU cores with CPU cores (and even ARM cores!), with each processing what it is best at to give the best user experience.

The overall transition to hardware and software that fully take advantage of both processing types is still a ways off, but we are getting closer every day. Heterogeneous computing is the future, and assuming most software developers can be made to recognize the benefits and program to take advantage of the new chips, I’m all for it. When additional CPU cores and smaller process nodes stop making the cut, heterogeneous computing is where the industry will look for performance gains.
