Since the announcement of ARM Mali-T604 in 2010, ARM has explained that GPGPU (General Purpose computing on GPU), aka GPU Compute, would be one of the key features of their new Mali graphics processor, and the company now expects GPGPU to become mainstream in embedded and mobile devices in 2014 and beyond. I’ve just come across a presentation by Roberto Mijat, technical marketing manager at ARM, entitled “Unleashing the benefits of GPU Computing with ARM Mali”, which shows practical applications and use cases where RenderScript or OpenCL can deliver massive performance improvements, at much lower power consumption, over the same parallel tasks processed by the CPU only. Let’s have a look at some of the most interesting slides.

GPU compute can be used for multiple applications in mobile, multimedia, and automotive sectors.

GPU Compute for H.265 / HEVC

HEVC, aka H.265, is the next generation codec, requiring roughly half the bitrate of H.264 for the same quality. The problem is that most SoCs today don’t have VPUs supporting this new standard, CPUs are not quite powerful enough for 1080p decoding, and software decoding on the CPU requires a lot of energy and quickly drains the battery.

HEVC Processing Blocks

Luckily, many of the tasks in HEVC decoding involve data-parallel processing, and these can be partially offloaded from the CPU to the newer GPUs supporting OpenCL or RenderScript. Several companies, including Ittiam, have developed HEVC implementations leveraging the GPU in ARM SoCs with very good results.
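To illustrate what "data-parallel" means here, below is a minimal sketch in plain Python that mimics the work-item model used by OpenCL and RenderScript. It uses sample reconstruction (prediction plus residual, clipped to the valid range), a step found in any block-based video decoder, as the example task. The presentation does not say which decoding stages were actually offloaded, so this choice, as well as the names `reconstruct_sample` and `run_kernel`, are assumptions made purely for illustration.

```python
def reconstruct_sample(gid, pred, resid, out, max_val=255):
    """One 'work-item': reconstruct a single sample.

    Each call reads and writes only index `gid`, so no call depends
    on the result of any other call.
    """
    out[gid] = min(max(pred[gid] + resid[gid], 0), max_val)


def run_kernel(kernel, global_size, *args):
    """Hypothetical stand-in for a GPU dispatch.

    Here the kernel is simply called once per index, sequentially.
    On a GPU, these independent calls could run concurrently.
    """
    for gid in range(global_size):
        kernel(gid, *args)


pred = [120, 250, 3, 64]    # predicted samples (made-up values)
resid = [10, 20, -8, 0]     # decoded residuals (made-up values)
recon = [0] * len(pred)

run_kernel(reconstruct_sample, len(pred), pred, resid, recon)
print(recon)  # [130, 255, 0, 64]
```

Because every output sample is computed independently, the loop in `run_kernel` can be replaced by thousands of GPU threads without changing the result, which is what makes this kind of task a good candidate for offloading.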

CPU usage has been reduced by 50%, the frame rate has doubled, and energy consumption has been reduced by 20 to 30%.

GPU Compute for Image and Video Processing

Nvidia already touted the GPU compute capabilities of the Tegra 4 for computational photography, and in the ARM slides, we can see improvements of up to an order of magnitude over CPU processing.

High Dynamic Range (HDR) imaging is a technique that takes two shots (foreground/background) to generate a better image. This is computationally intensive, and GPU compute (OpenGL) can provide a speedup of about 16x over a CPU-only implementation on an Arndale board with a Mali-T604 GPU.
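As a rough idea of why HDR maps well to a GPU, here is a minimal sketch in plain Python of merging two exposures pixel by pixel. It is a simplified weighted blend in which each shot contributes more where it is better exposed. It is not the algorithm benchmarked in the slides, and the function names and the weighting formula are assumptions chosen for illustration.

```python
def well_exposedness(v):
    """Weight for an 8-bit pixel value: close to 1 near mid-gray,
    close to (but never exactly) 0 near black or white."""
    return 1.0 - abs(v - 127.5) / 128.0


def fuse_pixel(a, b):
    """Blend one pixel from each shot, favoring the better exposed one."""
    wa, wb = well_exposedness(a), well_exposedness(b)
    return round((wa * a + wb * b) / (wa + wb))


def fuse_images(shot_a, shot_b):
    """Each output pixel depends only on the two input pixels at the
    same position, so all pixels could be processed in parallel."""
    return [fuse_pixel(a, b) for a, b in zip(shot_a, shot_b)]


dark_shot = [10, 5, 120]       # underexposed shot (made-up values)
bright_shot = [130, 110, 250]  # overexposed shot (made-up values)
fused = fuse_images(dark_shot, bright_shot)
print(fused)
```

In this sketch, a pixel that is nearly black in one shot and mid-gray in the other ends up close to the mid-gray value, since the well-exposed sample gets almost all of the weight. A real HDR pipeline does considerably more work per pixel, but the per-pixel independence is the same.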

Other image processing algorithms are also greatly sped up, by between 3.5x and 15.7x, as shown in the table on the right. This time the tests were performed on a Nexus 10 tablet (Exynos 5250 with Mali-T604) in Android using RenderScript, with software implemented by MulticoreWare.

GPGPU can also be used for super-resolution techniques, which aim to increase the resolution of imaging systems, as well as for video pre- and post-processing, leading to performance improvements of at least 3x and power consumption reduced by up to 80%.

GPU Compute for Computer Vision

Computer vision is about processing and analyzing images in order to derive information that enables decisions to be made. It seems particularly suited to GPU compute, as an OpenCV face detection algorithm accelerated with OpenCL is able to achieve 8.7x more detections per second and consume 83% less energy, both on average, compared to the CPU-only implementation.