Platform Support

Intel / i965

AMD / Mesa

H.264, MPEG-2, MPEG-4 part 2 and VC-1 decode are supported by all GCN GPUs (since Southern Islands). H.265 support was added with GCN 3 (Volcanic Islands) and H.265 10-bit with GCN 4 (Arctic Islands). Older GPUs may or may not be supported.

MPEG-4 part 2 is disabled by default due to VAAPI limitations (the main Intel driver never implemented it, so it didn't get much testing). Set the environment variable VAAPI_MPEG4_ENABLED to 1 to try to use it anyway.

H.264 encode is working on GCN GPUs, but is still incomplete. No other codecs are supported by Mesa for encode yet.

Encoding and interlacing support in Mesa are incompatible because of the data layout in GPU memory. By default, frames are separated into fields and interlaced video is supported but encoding is not. Set the environment variable VAAPI_DISABLE_INTERLACE to 1 to be able to use the encoder (but without any interlaced video support).
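As a minimal sketch of the encoder case, the variable can be set for a single ffmpeg invocation. The helper name encode_progressive_mesa is hypothetical, and the device path /dev/dri/renderD128 and the format=nv12,hwupload upload chain are assumptions borrowed from the rest of this page:

```shell
# Hypothetical helper: software decode, upload, then H.264 encode on Mesa
# with interlaced-surface support switched off for this one process only.
# $1 = input file, $2 = output file.
encode_progressive_mesa() {
    VAAPI_DISABLE_INTERLACE=1 ffmpeg -vaapi_device /dev/dri/renderD128 \
        -i "$1" -vf 'format=nv12,hwupload' -c:v h264_vaapi "$2"
}
```

Calling encode_progressive_mesa input.mp4 output.mp4 leaves the variable unset in the surrounding shell, so later decode-only runs keep interlaced support.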

Device Selection

The libva driver needs to be attached to a DRM device to work. The connection can be made either directly or via a running X server. When working standalone, it is generally best to use a DRM render node (/dev/dri/render*) - only use a connection via X if you actually want to deal with surfaces inside X (with DRI2, for example).
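To see which render nodes exist on a machine, something like the following sketch may help. The helper name list_render_nodes is hypothetical, and it assumes the usual renderD<number> naming under /dev/dri:

```shell
# Print every DRM render node found in a directory (default: /dev/dri).
# $1 = directory to scan (optional).
list_render_nodes() {
    dir=${1:-/dev/dri}
    for node in "$dir"/renderD*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$node" ] && printf '%s\n' "$node"
    done
    return 0
}
```

Running list_render_nodes with no argument prints one path per line; any of those paths can then be given to ffmpeg as the device.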

In ffmpeg, a named global device can be created using the -init_hw_device option:

ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD128

With the decode hwaccel, each input stream can then be given a previously initialised device with the -hwaccel_device option:
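A sketch of such an invocation, reusing the device name foo from above. The helper name decode_on_named_device and the choice of a null output are illustrative assumptions:

```shell
# Hypothetical helper: create a named VAAPI device and decode on it.
# $1 = DRM render node, $2 = input file. The decoded frames are discarded.
decode_on_named_device() {
    ffmpeg -init_hw_device vaapi=foo:"$1" \
        -hwaccel vaapi -hwaccel_device foo -i "$2" -f null -
}
```

For instance, decode_on_named_device /dev/dri/renderD128 input.mp4 decodes on the first render node.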

Surface Formats

The hardware codecs used by VAAPI are not able to access frame data in arbitrary memory. Therefore, all frame data needs to be uploaded to hardware surfaces connected to the appropriate device before being used. All VAAPI hardware surfaces in ffmpeg are represented by the vaapi pixfmt (the internal layout is not visible here, though).

The hwaccel decoders normally output frames in the associated hardware format, but by default the ffmpeg utility downloads the output frames to normal memory before passing them to the next component. This allows the decoder to work standalone to make decoding faster without any additional options:

ffmpeg -hwaccel vaapi ... -i input.mp4 -c:v libx264 ... output.mp4

For other outputs, the option -hwaccel_output_format can be used to specify the format to be used. This can be a software format (which formats are usable depends on the driver), or it can be the vaapi hardware format to indicate that the surface should not be downloaded.
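For example, to decode only and do nothing with the result, the frames can be left in VAAPI surfaces and sent to the null muxer. This is a sketch; the helper name decode_only_vaapi and its arguments are hypothetical:

```shell
# Hypothetical helper: decode on the GPU and keep frames in VAAPI surfaces.
# $1 = DRM render node, $2 = input file. Output goes to the null muxer.
decode_only_vaapi() {
    ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi \
        -hwaccel_device "$1" -i "$2" -f null -
}
```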

This can be used to test the speed / CPU use of the decoder only (the download operation typically adds a large amount of additional overhead).

When decoder output is in hardware surfaces, the frames will be given to following filters or encoders in that form. The scale_vaapi and deinterlace_vaapi filters act on vaapi format frames to scale and deinterlace them respectively. There are also some generic filters - hwdownload, hwupload and hwmap - which support all hardware formats, including VAAPI (see <http://www.ffmpeg.org/ffmpeg-filters.html#hwdownload>).

For example, take an interlaced input, decode, deinterlace, scale to 720p, download to normal memory and encode with libx264:
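One way to write that chain is sketched below. The helper name deinterlace_scale_x264, the device path /dev/dri/renderD128 and the -crf 20 setting are assumptions chosen for illustration:

```shell
# Hypothetical helper: VAAPI decode -> deinterlace -> scale to 1280x720
# -> download to normal memory as nv12 -> software encode with libx264.
# $1 = interlaced input file, $2 = output file.
deinterlace_scale_x264() {
    ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
        -hwaccel_output_format vaapi -i "$1" \
        -vf 'deinterlace_vaapi,scale_vaapi=w=1280:h=720,hwdownload,format=nv12' \
        -c:v libx264 -crf 20 "$2"
}
```

The format=nv12 entry after hwdownload names the software layout to download into; which layouts are accepted depends on the driver.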

Encoding

The encoders only accept input as VAAPI surfaces. If the input is in normal memory, it will need to be uploaded before giving the frames to the encoder - in the ffmpeg utility, the hwupload filter can be used for this. It will upload to a surface with the same layout as the software frame, so it may be necessary to add a format filter immediately before to get the input into the right format (hardware generally wants the nv12 layout, but most software functions use the yuv420p layout). The hwupload filter also requires a device to upload to, which needs to be defined before the filter graph is created.

So, to use the default decoder for some input, then upload frames to VAAPI and encode with H.264 and default settings:
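A sketch of that command follows, together with a variant that tries the decode hwaccel first and accepts whichever kind of frame comes out. Both helper names are hypothetical, and the device path /dev/dri/renderD128 and device name foo are assumptions:

```shell
# Hypothetical helper: software decode, convert to nv12, upload to VAAPI,
# then encode with h264_vaapi at default settings.
# $1 = input file, $2 = output file.
upload_and_encode() {
    ffmpeg -vaapi_device /dev/dri/renderD128 -i "$1" \
        -vf 'format=nv12,hwupload' -c:v h264_vaapi "$2"
}

# Hypothetical helper: use the hwaccel when the input allows it, and let the
# filter chain accept either nv12 software frames or vaapi surfaces.
# $1 = input file, $2 = output file.
decode_any_and_encode() {
    ffmpeg -init_hw_device vaapi=foo:/dev/dri/renderD128 \
        -hwaccel vaapi -hwaccel_output_format vaapi -hwaccel_device foo \
        -i "$1" -filter_hw_device foo \
        -vf 'format=nv12|vaapi,hwupload' -c:v h264_vaapi "$2"
}
```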

A filter chain such as format=nv12|vaapi,hwupload after a hwaccel decode also copes with input that may or may not be hardware decodable. This works because the decoder will output either vaapi surfaces (if the hwaccel is usable) or software frames (if it isn't). In the first case, the format filter matches the vaapi format and hwupload does nothing (it passes hardware frames through unchanged). In the second case, it matches the nv12 format and converts whatever the input is to that, then uploads. Performance will likely vary by a large amount depending on which path is chosen, though.

Mapping options from libx264

No CRF-like mode is currently supported. The only constant-quality mode is CQP (constant quantisation parameter), which has no adaptivity to scene content. It does, however, allow different quality settings for different frame types, to improve compression by spending fewer bits on unreferenced B-frames - see the (i|b)_q(factor|offset) options. CQP mode cannot be combined with a maximum bitrate or buffer size.

CBR and VBR modes are supported, though their output varies significantly by driver and device (the default is VBR; set -maxrate equal to -b:v for CBR). HRD buffering options (rc_max_rate, rc_buffer_size) are functional, and the encoder will generate buffering_period and pic_timing SEI when appropriate.
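The two modes might be selected as in the following sketch. The helper names are hypothetical, the device path and upload chain are assumptions carried over from the Encoding section, and the quantiser and bitrate values are left to the caller:

```shell
# Hypothetical helper: constant-QP encode.
# $1 = input file, $2 = output file, $3 = quantiser (for example 20).
encode_cqp() {
    ffmpeg -vaapi_device /dev/dri/renderD128 -i "$1" \
        -vf 'format=nv12,hwupload' -c:v h264_vaapi -qp "$3" "$2"
}

# Hypothetical helper: CBR encode, requested by setting -maxrate equal to -b:v.
# $1 = input file, $2 = output file, $3 = bitrate (for example 5M).
encode_cbr() {
    ffmpeg -vaapi_device /dev/dri/renderD128 -i "$1" \
        -vf 'format=nv12,hwupload' -c:v h264_vaapi \
        -b:v "$3" -maxrate "$3" "$2"
}
```

Dropping -maxrate from encode_cbr and keeping only -b:v would give the default VBR behaviour described above.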

There is no complete analogue of the -preset option. The -quality option controls local encode quality (that is, the amount of effort expended on trying to get the best results from local choices like motion estimation and mode decision), using a nebulous per-device scale. The argument is a small integer, from 1 up to some limit dependent on the device (not more than 8) - higher values are faster but give lower stream quality. Separately, some hardware (Intel gen9) supports a low-power mode with more restricted features. It is accessible via the -low_power option.
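A sketch of passing both options is shown below. The helper name encode_fast_low_power is hypothetical, and whether a given device accepts a particular -quality value or the low-power mode at all is hardware dependent. Option names may also differ between ffmpeg releases, so checking the output of ffmpeg -h encoder=h264_vaapi first is advisable:

```shell
# Hypothetical helper: trade stream quality for speed and request the
# low-power encode path where the hardware offers one.
# $1 = input file, $2 = output file, $3 = quality level (1 up to at most 8).
encode_fast_low_power() {
    ffmpeg -vaapi_device /dev/dri/renderD128 -i "$1" \
        -vf 'format=nv12,hwupload' -c:v h264_vaapi \
        -quality "$3" -low_power 1 "$2"
}
```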

Neither two-pass encoding nor lookahead is supported at all - only local rate control is possible. VBR mode should do a reasonably good job at getting close to an overall bitrate target, but quality will vary significantly through a stream if the complexity varies.

Full Examples

All of these examples assume the input and output files will contain one video stream (audio will need to be considered separately). It is assumed that VAAPI is usable via the DRM device node /dev/dri/renderD128.

Decode-only

Decode an input with hardware if possible, output in normal memory to encode with libx264:
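A sketch of this case, using the device node assumed above. The helper name hw_decode_x264 and the -crf 20 setting are illustrative assumptions:

```shell
# Hypothetical helper: decode with VAAPI when the input allows it, download
# the frames to normal memory (the default), and encode with libx264.
# $1 = input file, $2 = output file.
hw_decode_x264() {
    ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
        -i "$1" -c:v libx264 -crf 20 "$2"
}
```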