A VP8 hardware decoding stack

Hi! Today let’s talk about the different components needed to provide a working VP8 hardware decoding stack.

The goal of my Google Summer of Code project is to provide a generic way of decoding VP8 videos using hardware acceleration. This “hardware acceleration” is going to be provided by modern graphics cards and their “streaming processors”, “shader units”, “CUDA cores” or whatever name they are given by each vendor’s marketing department. Basically, they are small processors built inside GPUs that can process a lot of data simultaneously.

When I say a “generic way to use hardware acceleration”, I mean that a lot of media players should be able to benefit from it, and a lot of graphics cards should be able to provide it.

Decoding back-end

First, a hardware-accelerated decoder has to be built. Because writing even a purely CPU-based video decoder is a heavy task (in itself longer than the time allocated by the Google Summer of Code program), an existing decoder is going to be used first. Then, the heaviest computational tasks are going to be progressively rewritten to use shaders.
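To make the “progressively rewritten” idea concrete, here is a minimal sketch of porting one stage at a time while keeping the original CPU code as the reference. Every name in it is hypothetical (nothing comes from libvpx or Mesa), and the “ported” function is only a stand-in for a real shader: it applies the same per-pixel formula with no dependency between pixels, which is the property that makes a stage a good candidate for the GPU.

```python
# Toy illustration: all names are hypothetical, none come from libvpx or Mesa.

def reconstruct_cpu(prediction, residual):
    """Reference stage: add the residual to the prediction, clamped to 8 bits."""
    out = []
    for p, r in zip(prediction, residual):
        out.append(max(0, min(255, p + r)))
    return out

def reconstruct_ported(prediction, residual):
    """Stand-in for a shader port: same formula, each pixel computed independently."""
    return [max(0, min(255, p + r)) for p, r in zip(prediction, residual)]

# One entry per decoding stage; stages are swapped one at a time.
STAGES = {"reconstruct": reconstruct_cpu}

def port_stage(name, candidate, test_inputs):
    """Replace a stage only if the candidate matches the reference on every input."""
    reference = STAGES[name]
    for args in test_inputs:
        if candidate(*args) != reference(*args):
            return False
    STAGES[name] = candidate
    return True

samples = [([10, 200, 255], [-20, 100, 0]), ([0, 128], [5, -5])]
ported = port_stage("reconstruct", reconstruct_ported, samples)
```

Keeping the CPU version around as a reference means each rewritten stage can be checked for identical output before the next one is touched.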

The libvpx library is going to be used and built inside the Mesa 3D Graphics Library. That should allow video card drivers using Mesa (r300g, r600g and nouveau are targeted) to be shipped with VP8 hardware decoding support. libvpx is the VP8 reference implementation, supported by Google, and has a BSD-style license compatible with Mesa’s MIT license, which makes it a great candidate for inclusion.

In order to be used, the decoder located within Mesa must advertise its capabilities through a Gallium3D state tracker supporting the VDPAU API.

API

So an API is needed to link the video decoder and the media player. When a media player starts decoding a video, it must check whether your system is capable of hardware decoding and, if so, bypass its regular CPU-based decoder to use the GPU-based decoder. Several APIs can help with that; let’s review some of them:

XvMC (X-Video Motion Compensation)

Built as an X.Org extension, and based on the even older X video extension, XvMC allows media players to offload a limited number of operations (motion compensation/inter-frame prediction and iDCT/inverse transformation) to capable GPUs. The XvMC design is quite old and was not conceived with recent video formats in mind.
Its primary targets are MPEG-1 and MPEG-2 videos.

VA API (Video Acceleration API)

Originally designed by Intel, VA API’s main motivation was to supersede XvMC with a new design and much extended capabilities. In addition to motion compensation and inverse transform, VA API can also handle the deblocking filter, intra-frame prediction and bitstream processing. Like XvMC, VA API exposes only particular chunks of data and their associated processing to the GPU, so a lot of the video decoding logic stays inside the regular CPU-based decoder. Another particularity of VA API is that it handles video encoding as well as video decoding.
Its primary targets are H.264, VC-1, MPEG-2 and MPEG-4 videos. VA API is currently implemented by the Intel Linux driver.

VDPAU (Video Decode and Presentation API for Unix)

VDPAU was designed by NVIDIA to offload video decoding and post-processing effects from the CPU. A media player has to start the decoding, but then passes large portions of the bitstream to VDPAU and gets back fully decoded frames. As almost all of the decoding process can be offloaded, VDPAU allows great flexibility in the implementation of a GPU based decoder.
Its primary targets are H.264, VC-1, MPEG-2 and MPEG-4 videos. The libvdpau library needs to be slightly patched in order to support VP8 decoding. VDPAU is currently implemented by the NVIDIA closed source driver available for Linux, FreeBSD and Solaris operating systems. VDPAU is well supported by media players, and this is why it has been chosen.
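The check-then-bypass behaviour described at the start of this section can be sketched in a few lines. This is only an illustration of the decision a player has to make: the function name and the capability sets are assumptions of mine, not the VDPAU or ffmpeg API.

```python
# Hypothetical names throughout; this is not the VDPAU or ffmpeg API.

def pick_decoder(codec, hw_supported_codecs):
    """Return which decoder a player would use for the given codec."""
    if codec in hw_supported_codecs:
        return "gpu"   # bypass the regular CPU-based decoder
    return "cpu"       # fall back to software decoding

# Assumed capability lists, for illustration only.
stock_driver = {"H.264", "VC-1", "MPEG-2", "MPEG-4"}
driver_with_vp8 = stock_driver | {"VP8"}

print(pick_decoder("VP8", stock_driver))      # cpu
print(pick_decoder("VP8", driver_with_vp8))   # gpu
```

The point is that the player never needs to know how the GPU decoder is built: it only asks whether the codec is advertised, and keeps its CPU decoder as the fallback.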

Media player

Last but not least, you’ll need a media player. The media player loads a given video file, parses its container to gather some information about the file’s content (video length, resolution, audio and video codecs, etc.), then launches the decoding process and finally draws the decoded pictures onto the screen.

Today most of the media players available don’t do media decoding themselves, but instead use libraries dedicated to these tasks. The most widely used library is ffmpeg, along with its recent fork libav. We can also mention GStreamer and xine-lib. These libraries are available on a wide range of operating systems and architectures, and can decode pretty much every video or audio file format available in the wild (and that is a lot).

I made the choice to add VP8 VDPAU support inside ffmpeg/libav. They are well-known libraries, already support VDPAU for other video formats, and are used by, among others, the famous VLC media player and MPlayer.

2 Responses to A VP8 hardware decoding stack

A word of caution: make sure to benchmark your results, and benchmark often.

Although the GPU itself is fast, its communication channels (and especially passing data from the GPU back to the CPU) are much slower. So make sure you choose to accelerate operations where the data is small, and avoid (as much as you can) returning any data to the CPU.
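A rough back-of-the-envelope model of that trade-off, as a sketch only: the frame size is plain arithmetic for 8-bit 4:2:0 video, but the bandwidth and timing figures are made-up placeholders, so measure your own before drawing any conclusion.

```python
# All bandwidth and timing figures below are made-up placeholders.

def frame_bytes_yuv420(width, height):
    """One 8-bit 4:2:0 frame: full-size luma plus two quarter-size chroma planes."""
    return width * height * 3 // 2

def readback_ms(num_bytes, bytes_per_second):
    """Time to copy num_bytes from GPU to CPU at a given sustained bandwidth."""
    return 1000.0 * num_bytes / bytes_per_second

def offload_pays_off(cpu_ms, gpu_ms, transfer_ms):
    """Offloading wins only if GPU compute plus transfers beats the CPU time."""
    return gpu_ms + transfer_ms < cpu_ms

size = frame_bytes_yuv420(1920, 1080)      # 3110400 bytes
cost = readback_ms(size, 1_000_000_000)    # assumed 1 GB/s readback
print(size, round(cost, 2))                # 3110400 3.11
print(offload_pays_off(cpu_ms=5.0, gpu_ms=1.0, transfer_ms=cost))   # True
print(offload_pays_off(cpu_ms=3.0, gpu_ms=1.0, transfer_ms=cost))   # False
```

With these assumed numbers, reading a single 1080p frame back costs more than the GPU computation itself, which is why keeping decoded frames on the GPU matters.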