Using Hardware Video Decode on Mobile Internet Devices

By Philippe Michelon, October 15, 2009

Good video playback is one of the promises of Mobile Internet Devices

MID Multimedia Frameworks

As an application vendor, you may want to take advantage of hardware video decode so that your application gets the best out of the MID platform. You may not need to use the VA API in your own code. Several multimedia frameworks have already been optimized to use this capability, and any application built on top of them gets the benefits of the platform without requiring any knowledge of the VA API.

The Helix framework can use hardware-accelerated video decode. As a result, every player built on top of the Helix framework, such as RealPlayer for MID, benefits from this feature.

GStreamer is a popular multimedia framework in the open source community. Many open source media players in the Linux world are based on it: Totem, Rhythmbox, and Banshee, to name a few. Fluendo provides codecs for the GStreamer framework that are optimized for the Intel GMA 500 chipset. By using the Fluendo codecs, all these applications can benefit seamlessly from hardware video acceleration.

Splitted-Desktop Systems has developed an implementation of the FFmpeg codecs that uses the VA API. It resulted in dramatic performance improvements for video playback in MPlayer on current Intel processor-based MIDs using the Intel GMA 500 chipset. For reference, the sources are available here.

Typical Code Structure

Code that implements video decoding with the VA API must follow a certain structure.

After an initialization phase, the client negotiates a mutually acceptable configuration with the server. It locks down the profile, the entry point, and the other attributes that do not vary while the stream is decoded. Once the configuration is set and accepted by the server, the client creates a decode context. This decode context can be seen as a virtualized hardware decode pipeline. The decode pipeline must be configured by passing it a number of datasets.
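The negotiation step can be pictured with a small self-contained model. This is only a sketch: the types and the function `negotiate_config` are hypothetical stand-ins invented for illustration, not libva declarations, and the "server" is just an array of supported pairs. It shows one plausible policy, in which the client walks its preference list and locks down the first profile and entry point pair that the server also supports.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT libva types. */
typedef enum { PROFILE_MPEG2_MAIN, PROFILE_H264_HIGH, PROFILE_VC1_ADVANCED } profile_t;
typedef enum { ENTRY_VLD, ENTRY_IDCT, ENTRY_MOCOMP } entrypoint_t;

typedef struct {
    profile_t    profile;
    entrypoint_t entrypoint;
} decode_config_t;

/* Picks the first client preference that the (mock) server also supports.
 * Returns 1 and fills *chosen on success, 0 if there is no common pair. */
int negotiate_config(const decode_config_t *wanted, size_t n_wanted,
                     const decode_config_t *supported, size_t n_supported,
                     decode_config_t *chosen)
{
    size_t i, j;

    for (i = 0; i < n_wanted; i++) {
        for (j = 0; j < n_supported; j++) {
            if (wanted[i].profile == supported[j].profile &&
                wanted[i].entrypoint == supported[j].entrypoint) {
                *chosen = wanted[i];   /* configuration is now locked down */
                return 1;
            }
        }
    }
    return 0;   /* nothing mutually acceptable: fall back to software decode */
}
```

A client that prefers H.264 but can also handle MPEG-2 would pass both pairs in that order and create its decode context only when the function reports a match.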

The program is now ready to start decoding the stream. The client gets decode buffers and fills them with slice and macroblock level data. The decode buffers are sent to the server until the server is able to decode and render the frame. The client then repeats the operation with the decode buffers over and over to decode the bit stream. See below the typical flowchart of a decoder using the VA API.

Once a decode configuration has been created, the next step is to create a decode context, which represents a virtual hardware decode pipeline. This virtual decode pipeline outputs decoded pixels to a render target called a "Surface". The decoded frames are stored in Surfaces and can subsequently be rendered to the X drawables defined in the first phase.

The client creates two objects. It first creates a Surface object, which gathers the parameters of the render target to be created by the driver, such as picture width, height, and format. The second object is a "Context" object. The Context object is bound to a Surface object when it is created. Once a surface is bound to a given context, it cannot be used to create another context. The association is removed when the context is destroyed. Both contexts and surfaces are identified by unique IDs, and their implementation-specific internals are kept opaque to the client. Any operation, whether it is a data transfer or a frame decode, is given this context ID as a parameter to determine which virtual decode pipeline is used. See below a code sample showing how to set the decode context.
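The binding rule between surfaces and contexts can be illustrated with a toy model. Everything in it is an assumption made for illustration: `mock_driver_t`, `mock_create_context`, and `mock_destroy_context` are hypothetical names and do not exist in libva, and surfaces are represented by plain array indices. The sketch only captures the rule stated above: a surface already bound to a context cannot be used to create another one until the first context is destroyed.

```c
#define MAX_SURFACES 8
#define UNBOUND      0   /* context IDs start at 1, so 0 marks a free surface */

/* Hypothetical mock of the driver-side bookkeeping; not a libva type. */
typedef struct {
    int owner[MAX_SURFACES];   /* owner[s] = ID of the context bound to surface s */
    int next_context_id;
} mock_driver_t;

void mock_driver_init(mock_driver_t *d)
{
    int s;

    for (s = 0; s < MAX_SURFACES; s++)
        d->owner[s] = UNBOUND;
    d->next_context_id = 1;
}

/* Binds all listed surfaces to a new context.
 * Returns the new context ID, or -1 if any surface is invalid or already
 * bound (in which case nothing is bound at all). */
int mock_create_context(mock_driver_t *d, const int *surfaces, int n)
{
    int i, ctx;

    for (i = 0; i < n; i++) {
        int s = surfaces[i];
        if (s < 0 || s >= MAX_SURFACES || d->owner[s] != UNBOUND)
            return -1;
    }
    ctx = d->next_context_id++;
    for (i = 0; i < n; i++)
        d->owner[surfaces[i]] = ctx;
    return ctx;
}

/* Destroying a context releases every surface that was bound to it. */
void mock_destroy_context(mock_driver_t *d, int ctx)
{
    int s;

    for (s = 0; s < MAX_SURFACES; s++)
        if (d->owner[s] == ctx)
            d->owner[s] = UNBOUND;
}
```

In this model the client only ever holds integer IDs, which mirrors the point that the internals of contexts and surfaces stay opaque.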

For decoding frames, we need to feed the virtual pipeline with parameter and bit stream data so that it can decode the compressed video frames. There are several types of data to send:

Configuration data, such as the inverse quantization matrix buffer, the picture parameter buffer, the slice parameter buffer, or any other data structure required by the supported formats. This data parameterizes the virtual pipeline before the actual data stream is sent for decoding.

The bitstream data. It needs to be sent in a structured way so that the driver can interpret it and decode it correctly.

A single data transfer mechanism allows the client to pass both types of data to the driver.

Creating Buffers

Parameter and bit stream data are sent to the driver through "Buffers". The buffer data store is managed by the library, while the client identifies each buffer by a unique ID assigned by the driver.

There are two methods to set the contents of the buffers that hold either parameters or bit stream data. The first one actually copies the data to the driver data store. To do this, you need to invoke vaCreateBuffer with a non-null "data" parameter. In that case, a memory space is allocated in the data store on the server side and the data is copied into this memory space. This is the way it is used in the sample code provided:
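As a rough illustration of the copy semantics, the sketch below models the first method without calling libva. `mock_buffer_t`, `mock_create_buffer_copy`, and `mock_destroy_buffer` are hypothetical names invented for this example, and a heap allocation stands in for the driver data store. The point it makes is that the buffer holds its own copy, so the client may reuse or overwrite its source data once the buffer has been created.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a driver-side buffer; not a libva type. */
typedef struct {
    unsigned char *store;   /* memory in the (mock) driver data store */
    size_t         size;
} mock_buffer_t;

/* Models buffer creation with a non-null "data" argument: space is
 * allocated on the driver side and the client data is copied into it.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int mock_create_buffer_copy(mock_buffer_t *buf, const void *data, size_t size)
{
    if (data == NULL || size == 0)
        return -1;
    buf->store = malloc(size);
    if (buf->store == NULL)
        return -1;
    memcpy(buf->store, data, size);   /* the one copy this method costs */
    buf->size = size;
    return 0;
}

/* Releases the driver-side storage. */
void mock_destroy_buffer(mock_buffer_t *buf)
{
    free(buf->store);
    buf->store = NULL;
    buf->size = 0;
}
```

This method is convenient for small parameter structures, where the cost of one extra copy is negligible.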

If you call it with a null "data" parameter, the buffer object is created but no data is copied into the data store. By invoking vaMapBuffer(), the client gets access to the buffer address space in the data store. This avoids copying data from the client address space to the server address space. The client can then fill the buffer with data. After the buffer is filled and before it is actually transferred to the virtual pipeline, it must be unmapped by calling vaUnmapBuffer(). Find here a code example:
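The map, fill, unmap discipline can be modeled in a few lines. This is a sketch under stated assumptions, not libva code: `mock_mappable_buffer_t` and the `mock_*` functions are hypothetical names, the data store is an ordinary heap allocation, and the rule that a mapped buffer cannot be submitted is enforced by a simple flag.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a driver-side buffer; not a libva type. */
typedef struct {
    unsigned char *store;    /* memory in the (mock) driver data store */
    size_t         size;
    int            mapped;   /* nonzero while the client holds a mapping */
} mock_mappable_buffer_t;

/* Models creation with a null "data" argument: no client data is copied. */
int mock_create_buffer_empty(mock_mappable_buffer_t *buf, size_t size)
{
    if (size == 0)
        return -1;
    buf->store = calloc(1, size);
    if (buf->store == NULL)
        return -1;
    buf->size = size;
    buf->mapped = 0;
    return 0;
}

/* Gives the client direct access to the store, so it can write in place. */
void *mock_map_buffer(mock_mappable_buffer_t *buf)
{
    if (buf->mapped)
        return NULL;          /* already mapped */
    buf->mapped = 1;
    return buf->store;
}

/* Ends the client's access; returns -1 if the buffer was not mapped. */
int mock_unmap_buffer(mock_mappable_buffer_t *buf)
{
    if (!buf->mapped)
        return -1;
    buf->mapped = 0;
    return 0;
}

/* A buffer may be handed to the pipeline only once it is unmapped. */
int mock_submit_buffer(const mock_mappable_buffer_t *buf)
{
    return buf->mapped ? -1 : 0;
}

void mock_free_buffer(mock_mappable_buffer_t *buf)
{
    free(buf->store);
    buf->store = NULL;
    buf->size = 0;
    buf->mapped = 0;
}
```

Compared with the copying method, the client writes straight into the store, which matters most for large bit stream buffers.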

To decode frames, we first need to send the stream parameters: the inverse quantization matrix buffer, the picture parameter buffer, the slice parameter buffer, or any other data structure required for the given format. Then the data stream can be sent to the virtual pipeline. This data is passed using the data transfer mechanism described in the previous section. The transfer is invoked through the vaRenderPicture call.

For each frame to render, you need to go through a vaBeginPicture/vaRenderPicture/vaEndPicture sequence. In this sequence, once the necessary parameters are set (the inverse quantization matrix, the picture parameter buffer, or any other parameter the format requires), the data stream can be sent to the driver for decoding. The decode buffers are sent to the virtual pipeline through vaRenderPicture calls. When all the data related to the frame has been sent, the vaEndPicture() call marks the end of rendering for the picture. This is a non-blocking call, so the client can start another vaBeginPicture/vaRenderPicture/vaEndPicture sequence while the hardware is decoding the frame that has just been submitted. The vaPutSurface call sends the decode output surface to the X drawable. It performs de-interlacing (if needed), color space conversion, and scaling to the destination rectangle. Find here a code sample describing the decode sequence:
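The ordering constraints of the per-frame sequence can be sketched as a small state machine. This is a toy model, not libva: `mock_pipeline_t` and the `mock_*` functions are hypothetical names that merely mirror the begin, render, and end steps, and buffers are reduced to counts. Actual decoding and the display step are left out.

```c
/* Hypothetical model of the virtual decode pipeline; not a libva type. */
typedef struct {
    int picture_open;         /* nonzero between begin and end */
    int buffers_queued;       /* buffers received for the current picture */
    int pictures_submitted;   /* pictures handed over for decoding */
} mock_pipeline_t;

void mock_pipeline_init(mock_pipeline_t *p)
{
    p->picture_open = 0;
    p->buffers_queued = 0;
    p->pictures_submitted = 0;
}

/* Opens a picture; fails if the previous one was never ended. */
int mock_begin_picture(mock_pipeline_t *p)
{
    if (p->picture_open)
        return -1;
    p->picture_open = 1;
    p->buffers_queued = 0;
    return 0;
}

/* Queues buffers; only legal while a picture is open. */
int mock_render_picture(mock_pipeline_t *p, int num_buffers)
{
    if (!p->picture_open || num_buffers <= 0)
        return -1;
    p->buffers_queued += num_buffers;
    return 0;
}

/* Closes the picture and hands it over; the model returns immediately. */
int mock_end_picture(mock_pipeline_t *p)
{
    if (!p->picture_open)
        return -1;
    p->picture_open = 0;
    p->pictures_submitted++;
    return 0;
}

/* One frame: parameter buffers first, then slice data, then end. */
int mock_decode_frame(mock_pipeline_t *p, int num_param_buffers,
                      int num_slice_buffers)
{
    if (mock_begin_picture(p) != 0)
        return -1;
    if (mock_render_picture(p, num_param_buffers) != 0 ||
        mock_render_picture(p, num_slice_buffers) != 0) {
        p->picture_open = 0;   /* abandon the picture on error */
        return -1;
    }
    return mock_end_picture(p);
}
```

Calling `mock_decode_frame` in a loop, once per frame, corresponds to the repeated sequence described above.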

The VA API also provides capabilities beyond decode acceleration. It provides functions for:

Client and library synchronization

Subpicture blending in the decoded video stream

Host-based post-processing by retrieving image data from decoded surfaces.

You can get more details on these capabilities by reading the VA API specification. The API, which is currently at version 0.29, will evolve over time, incrementally adding functionality supported by future chipset versions.
