Reducing display power to extend mobile battery life

One thing all of us miss when remembering the old days of mobile phones is their battery life. As a happy owner of a Nokia 6210, I could afford to forget my charger for a week-long holiday. Of course, the roaming charges back then helped save battery, but no one will dispute that the power consumption of today's mobile devices has grown to become their greatest drawback.

There are many factors that affect a device's power efficiency, which can be expressed as the number of hours between battery charges. Today, in the era of HD mobile screens, two major issues contribute to high battery drain: display brightness and power dissipation in the video and graphics subsystem. In this paper we will discuss the latter, namely a smart video and display pipeline in the system-on-chip. Smart here means providing performance similar to competitive solutions while requiring much less power.

Challenges for video and graphics subsystems

A modern graphics SoC is required to render high resolutions at high frame rates and, on top of that, perform multiple image post-processing tasks such as scaling, rotation and pixel format conversion. The typical approach to this challenge is to employ a Graphics Processing Unit (GPU); however, due to its general-purpose architecture, its power efficiency during display-specific processing operations is not optimal.
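To make the per-pixel cost concrete, one such task, YUV-to-RGB pixel format conversion, can be sketched in C as below. This is an illustrative fixed-point approximation of the BT.601 full-range conversion, not any vendor's actual implementation; coefficient scaling by 256 is an assumption chosen to keep the arithmetic integer-only, as hardware converters typically do.

```c
#include <stdint.h>

/* Clamp an intermediate value to the 0..255 pixel range. */
static uint8_t clamp8(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Fixed-point BT.601 full-range YUV -> RGB conversion for one pixel.
   Coefficients are scaled by 256 (e.g. 1.402 * 256 ~ 359) to avoid
   floating point. Assumes arithmetic right shift for negative values,
   as on common compilers. */
static void yuv_to_rgb(uint8_t y, uint8_t u, uint8_t v,
                       uint8_t *r, uint8_t *g, uint8_t *b) {
    int c = y;
    int d = u - 128;                              /* chroma offsets   */
    int e = v - 128;
    *r = clamp8(c + ((359 * e) >> 8));            /* 1.402            */
    *g = clamp8(c - ((88 * d + 183 * e) >> 8));   /* 0.344, 0.714     */
    *b = clamp8(c + ((454 * d) >> 8));            /* 1.772            */
}
```

Running this for every pixel of a Full HD video frame, 60 times a second, is exactly the kind of repetitive, regular workload that fixed-function hardware handles far more efficiently than a programmable core.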

For such cases, Evatronix developed the PANTA DP IP cores, a family of display processors that take over these display-specific tasks from the GPU and thus reduce power dissipation significantly. The PANTA processors are optimized to execute tasks such as multi-layer composition, YUV-to-RGB conversion, rotation, alpha blending and gamma correction before presenting the frame buffer for display. This enables a significant reduction of overall SoC dynamic power consumption through partial or complete offload of the GPU. Further power reduction in a PANTA DP-aided SoC is achieved by keeping system memory bandwidth to a minimum, i.e. by reducing the number of accesses to the video and graphics frame buffers.

Enhancing existing architectures

Let’s consider an example of a GPU-processed display pipeline handling multiple display outputs. The system displays graphics frames on two panels with different resolutions: an external Full HD (1920 x 1080 pixels) display and a local HD (1280 x 720 pixels) display. Each frame is composed of three layers. The first is decoded Full HD video previously recorded by the device’s camera, stored in the frame buffer in YUV 4:2:0 format. The other two layers, an audio volume control and the recording date, are generated in RGB formats by the GPU. A number of operations must be executed before the composed layers can be displayed: YUV-to-RGB conversion of the video layer, alpha blending of the three layers, scaling and rotation. In the system presented in Figure 1, the display controllers merely transfer the final data prepared by the GPU in the frame buffer.
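The alpha-blending step in this pipeline, when performed on a programmable core, amounts to per-channel arithmetic like the following minimal straight-alpha "over" sketch in C. The pixel layout and layer ordering here are assumptions for illustration, not the actual pipeline's formats.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba_t;

/* Integer approximation of out = top*a + bottom*(1 - a), with alpha
   in 0..255, rounded; straight (non-premultiplied) alpha. */
static uint8_t blend_channel(uint8_t top, uint8_t bottom, uint8_t alpha) {
    return (uint8_t)((top * alpha + bottom * (255 - alpha) + 127) / 255);
}

/* Composite one pixel of an overlay layer (e.g. the volume control)
   over one pixel of the base video layer. */
static rgba_t blend_over(rgba_t top, rgba_t bottom) {
    rgba_t out;
    out.r = blend_channel(top.r, bottom.r, top.a);
    out.g = blend_channel(top.g, bottom.g, top.a);
    out.b = blend_channel(top.b, bottom.b, top.a);
    out.a = 255;  /* the final composed frame is opaque */
    return out;
}
```

Composing three layers means running this for every overlay pixel of every frame; a display processor performs the same arithmetic in fixed-function hardware on the way to the panel, sparing both the GPU and the memory bus.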

Figure 1: Typical multi-display system


In this case, energy is wasted because display-specific tasks are executed in the GPU, which is optimized for different graphics computing operations, in this case 2D graphics rendering.
