Video Encoding Tips

Every so often I see a poorly encoded video on the internet that has black bars, interlacing artifacts, too low a bitrate, too large a file size, an incorrect aspect ratio, … and in the worst cases, a mix of all these things. Here are some short tips to reduce the risk of making these mistakes.

The tips give concrete instructions for the program HandBrake, which is a freely available, popular, and good tool for encoding videos.

Ensure your aspect ratio is right

In short: ensure that a circle in the original video is still a circle in your encoded video. For anamorphic material (non-square pixels), you should only convert it to square pixels when downscaling it, otherwise you should preserve the pixels as they are.

The aspect ratio is the width of the displayed video image divided by its height. For instance, classic TV shows had a ratio of 4:3, modern widescreen video typically is 16:9. Make sure that no matter how you rescale your video (e.g. if you want to reduce the resolution to make it smaller), the proportion of width divided by height remains the same as the original.

There is one caveat here regarding standard-definition TV material. Suppose you have a recording on DVD in standard 480p widescreen format. This means the video frames stored on the DVD will have a resolution of 720×480 pixels. Note that this does not correspond to a 16:9 ratio even though your TV displays it as such: it is actually a 3:2 ratio. What gives? Well, your TV knows that the video is supposed to be displayed as 16:9 and will therefore stretch it horizontally while rendering the image. This means the pixels are not square, unlike in recent formats such as 720p and 1080p. The cinematographic term for this is ‘anamorphic’ video.
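
To make the arithmetic concrete, here is a small Python sketch of the relation between stored resolution, display aspect ratio and pixel shape. It uses the simple textbook calculation and ignores broadcast conventions such as overscan, so real tools may report a slightly different pixel aspect ratio. The function name is my own invention.

```python
from fractions import Fraction

def pixel_aspect_ratio(width, height, display_aspect):
    """Return how much wider than tall each stored pixel must be displayed."""
    storage_aspect = Fraction(width, height)
    return display_aspect / storage_aspect

par = pixel_aspect_ratio(720, 480, Fraction(16, 9))
print(Fraction(720, 480))  # 3/2: the stored frame is 3:2, not 16:9
print(par)                 # 32/27: each pixel is shown about 1.19 times as wide as tall
print(float(720 * par))    # about 853.3: the width in square pixels at 480 lines
```

The last figure shows why converting to square pixels at the original height is awkward: the width is not even a whole number of pixels, so the image has to be resampled.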

When converting such a DVD to a video file on a hard disk, you may be tempted to make the pixels square. My advice is not to do this unless you are downscaling the video. If you want to simply store the video with minimal loss of image quality, you should keep the original resolution (in this example: 720×480), and ensure the aspect ratio information is preserved, such that the media player knows how to stretch the image. In HandBrake, ensure ‘Keep Aspect Ratio’ remains checked, and ‘Anamorphic’ is set to Auto.

Apply proper deinterlacing

In short: always deinterlace if you know the video material is interlaced: do not rely on ‘smart’ methods unless the material is a mix of interlaced and progressive. Make sure that the output has the same framerate as the original source material (e.g. 24 FPS for a feature film, 50 FPS for a PAL TV show). If the input is certain to have no interlacing, then disable everything related to deinterlacing.
If the input is a movie that was converted to NTSC by a hard telecine process, enable detelecine and force the framerate to 23.976 FPS.

Interlacing is the practice of embedding two video images in one frame, with each odd line of the frame belonging to the first image, and each even line to the second (or vice versa). This stems from the era of cathode ray tube (CRT) televisions. The tube would first print the first set of lines (called the first ‘field’) and then the second. This way, one could transport e.g. 50 fields-per-second material through a 25 frames-per-second transmission. (In the rest of this page, ‘FPS’ always means frames, not fields.) Of course, the vertical resolution of each field was halved, but in case of a static image, each frame retains the full resolution. This was a clever trick to exchange temporal and spatial resolution. Mind that even if a 25 FPS interlaced video stream only has 25 unique images per second, it will still be interlaced: every frame will contain a field with half the previous image, and the other field will contain half the current image.
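
The way two images end up in one frame can be sketched in a few lines of Python. This is only a toy model: a ‘frame’ here is a list of labelled scan lines, and the helper names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its two fields."""
    return frame[0::2], frame[1::2]

def weave(first_field, second_field):
    """Interleave two fields back into one full-height frame."""
    frame = []
    for first_line, second_line in zip(first_field, second_field):
        frame.extend([first_line, second_line])
    return frame

# Two consecutive images of a moving scene, four scan lines each.
image_1 = ["1a", "1b", "1c", "1d"]
image_2 = ["2a", "2b", "2c", "2d"]

# The interlaced frame takes one field from each image.
interlaced = weave(split_fields(image_1)[0], split_fields(image_2)[1])
print(interlaced)  # ['1a', '2b', '1c', '2d']
```

If both images are identical (a static scene), the woven frame is simply the full-resolution image, which is why static material loses nothing.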

Interlacing does not play well however with panel displays, which lack the memory effect of a typical CRT that provides a natural smoothing of interlaced material. Moreover, your average computer monitor is totally unaware of whether you are playing something interlaced or not. Many media players simply dump the interlaced frames on-screen, resulting in typical ‘combing’ artefacts as shown in the image. Some media players can perform deinterlacing on-the-fly. If you are re-encoding an interlaced video anyway, it is better to apply high-quality deinterlacing to obtain a progressive video that can be readily played. Removing the interlacing artefacts also makes the video much easier to compress.

Yadif (or deinterlace) vs. decomb

There are two main approaches here: either always deinterlace unconditionally, or try to be smart and only deinterlace for frames (or even parts of each frame) that appear to be interlaced. My experience with software that tries to be smart, is that it rarely manages to get beyond the trying part. If interlace detection fails, you'll end up with stuttering video, as well as remaining artefacts (especially visible in animated material). If you know the source is interlaced, it makes no sense to rely on automatic detection because it will only incur a risk of making the wrong decisions. Just always deinterlace an interlaced source video! The only time when it is warranted to enable interlace detection, is if you know the source contains a mix of interlaced and progressive video that is not easy to separate by manual cutting. (This is usually very rare.) If you know the material is entirely progressive, disable both interlace detection and any form of deinterlacing: you will gain a bit of encoding speed, and eliminate the risk of destroying details that happen to look like interlacing.

The deinterlace settings in HandBrake have changed names quite a few times. In the most recent version at the time of this writing, you can choose between ‘Yadif’ and ‘decomb’. My advice is to only use the ‘Off’ and ‘Yadif’ settings in the deinterlace menu. Avoid decomb, unless you are dealing with the rare case of mixed interlaced and progressive material. I'm not sure if HandBrake's interlace detection setting will disable decomb's attempt at being smart, so it is better to err on the safe side.

(Why don't I like decomb? Well, the makers of decomb claim that automatic interlacing detection is useful even if the source is interlaced from start to end, because it would allow preserving details in frames where there is so little motion that artefacts are barely visible. This is wrong. A good deinterlacing filter will automatically preserve all detail in a static interlaced video scene, if it can accurately simulate the effect of a CRT combined with persistence-of-vision. If your filter blurs details in a video fragment showing a still image, it simply means the filter is poorly implemented.)

Inverse telecine

Now that we have explained the above, the second question is what the framerate of the result should be. This is not a trivial question because a 30 FPS video might for instance represent a 24 FPS feature film, 60 fields-per-second video, or even a 25 FPS European TV show.

When converting a 24 FPS film to NTSC format, the so-called telecine process is used. This duplicates frames and interlaces them in a specific way to obtain 30 FPS. (For the nitpickers: the actual framerate will be 29.97, with the film equivalent being at 23.976.) The good news is that many DVDs do not use a ‘hard telecine’ where the process is applied to the encoded video, but instead contain progressive video with markers that tell the playback device to perform telecine. When encoding such material, it requires neither inverse telecine nor deinterlacing. If you encounter a hard telecined video however, you need to enable an inverse telecine filter (‘detelecine’ in HandBrake).
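
The ‘specific way’ of duplicating and interlacing is commonly known as 2:3 pulldown. Below is a minimal Python sketch of that pattern, assuming the usual cadence where film frames alternately contribute two and three fields. The function name is hypothetical and field parity is glossed over.

```python
def telecine_2_3(film_frames):
    """Spread film frames over fields (2, 3, 2, 3, ...) and pair fields into video frames."""
    repeats = [2, 3]
    fields = []
    for index, frame in enumerate(film_frames):
        fields.extend([frame] * repeats[index % 2])
    # Every two consecutive fields form one interlaced video frame.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]

video = telecine_2_3("ABCD")
print(["".join(pair) for pair in video])  # ['AA', 'BB', 'BC', 'CD', 'DD']
```

Four film frames become five video frames, two of which (‘BC’ and ‘CD’) mix fields from different film frames. Those are the frames that show combing, and an inverse telecine filter has to take them apart again.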

Very important: when using detelecine, set the framerate of the output to 23.976 FPS. If you would keep the original 29.97 FPS, you would end up with one duplicate frame every 5 frames, which looks jittery. If you encounter a poorly made encoding that suffers from this, it can be fixed but not reliably in HandBrake. Only if you are extremely lucky you might be able to re-encode it at 23.976, and the framerate reduction might happen to remove exactly the duplicate frame every time. In general this is utopian and you'll end up with both dropped and duplicate frames, which looks awful. The proper way to fix this is to use FFmpeg with its ‘decimate’ filter combined with ‘setpts’ to reliably delete the duplicate frames.
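
To illustrate why a plain framerate reduction rarely fixes such a file, here is a toy Python comparison between evenly spaced frame picking and a decimation step that looks for the actual duplicate. This is not FFmpeg's algorithm, only a sketch of the idea. Frames are represented by letters, and a duplicate is assumed to be exactly equal to its predecessor, which real lossy video does not guarantee.

```python
def naive_resample(frames):
    """Reduce every 5 frames to 4 by picking frames at evenly spaced positions."""
    count = len(frames) * 4 // 5
    return [frames[k * 5 // 4] for k in range(count)]

def decimate(frames, cycle=5):
    """Drop the first repeated frame found in every group of `cycle` frames."""
    output = []
    for start in range(0, len(frames), cycle):
        group = frames[start:start + cycle]
        drop = next((i for i in range(1, len(group)) if group[i] == group[i - 1]), None)
        output.extend(frame for i, frame in enumerate(group) if i != drop)
    return output

broken = list("ABBCDEFFGH")  # one duplicate in every 5 frames
print("".join(naive_resample(broken)))  # ABBCEFFG: duplicates kept, D and H lost
print("".join(decimate(broken)))        # ABCDEFGH: the original film frames
```

The naive version only works if the frame it happens to skip is always the duplicate, which is the ‘extremely lucky’ case described above.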

(Note: in Europe the discrepancy between film and TV framerates is handled differently: because the 25 FPS framerate from PAL TV is close enough to 24, the film is simply sped up by 4.2%. The audio is sped up as well, which is why you would hear for instance Walter White on a European TV speak at a pitch roughly 0.7 of a semitone higher than in the USA.)
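
The numbers behind this speed-up are easy to verify. The sketch below assumes the audio is simply played faster without any pitch correction, which is not true for every release.

```python
import math

speedup = 25 / 24
percent_faster = (speedup - 1) * 100
semitones = 12 * math.log2(speedup)  # 12 equal-tempered semitones per octave

print(f"{percent_faster:.1f}% faster")      # 4.2% faster
print(f"{semitones:.2f} semitones higher")  # 0.71 semitones higher
print(f"{120 / speedup:.1f} minutes")       # 115.2 minutes for a 120-minute film
```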

Regular deinterlace vs. ‘Bob’

The first implementations of deinterlace methods only tried to produce one deinterlaced output frame per input frame. As I explained above, video may contain two unique fields per frame and in that case one would want two progressive deinterlaced frames per interlaced frame. In HandBrake the deinterlace filters have a ‘Bob’ setting (no idea where the name comes from). This is the one you need if you want to get both fields out of a single interlaced frame. Only enable this if the video really has two fields per frame, otherwise you'll just get each frame repeated twice.
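
In its crudest form, getting both fields out of a frame means separating the fields and stretching each one back to full height. The Python sketch below does this by plain line doubling. Real filters such as Yadif interpolate far more cleverly, so this only illustrates the principle and is not what HandBrake does internally.

```python
def line_double(field):
    """Stretch a field to full height by repeating every line."""
    return [line for line in field for _ in range(2)]

def bob(frame):
    """Turn one interlaced frame into two progressive frames."""
    first_field, second_field = frame[0::2], frame[1::2]
    return line_double(first_field), line_double(second_field)

# A frame woven from two different images ('1' and '2').
interlaced = ["1a", "2b", "1c", "2d"]
early, late = bob(interlaced)
print(early)  # ['1a', '1a', '1c', '1c']
print(late)   # ['2b', '2b', '2d', '2d']
```

One input frame yields two output frames, hence the doubled framerate. If both fields came from the same image, the two outputs would show the same moment in time, which is the ‘repeated twice’ case.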

It is unfortunately difficult to give a set of rules for determining correct deinterlace and framerate settings for your output video. You won't find video with two unique fields per frame too often, because a lot of TV material was shot with film cameras running at 24 FPS. Only material directly recorded with electronic video cameras will typically have true 50 or 60 fields per second. When in doubt, my advice is to try the ‘Bob’ filter first, force the framerate to twice that of the source, and do a short test encoding on a fragment of the video. In a player that can advance frame-by-frame, check the output of a scene with a lot of movement. If every frame is unique then you should stick with ‘Bob’ and double FPS for the entire video. If every other frame repeats, you can use regular deinterlacing and stick with the framerate of the source.
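
The decision rule for the test encode can be written down as a small Python sketch. It assumes you somehow have a comparable fingerprint per output frame (here just letters). After lossy encoding, repeated frames are only nearly identical, so a real check would need a similarity threshold instead of exact equality. The function name and return values are my own.

```python
def check_test_encode(frames):
    """Classify a double-framerate 'Bob' test encode from per-frame fingerprints."""
    pairs = list(zip(frames[0::2], frames[1::2]))
    if all(first == second for first, second in pairs):
        return "regular"  # every other frame repeats: regular deinterlace, source framerate
    if all(a != b for a, b in zip(frames, frames[1:])):
        return "bob"      # every frame is unique: keep Bob at double framerate
    return "mixed"        # inconclusive: inspect more scenes

print(check_test_encode(list("AABBCC")))  # regular
print(check_test_encode(list("ABCDEF")))  # bob
print(check_test_encode(list("AABCDD")))  # mixed
```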

If you apply proper deinterlacing or detelecine and set the correct output framerate, the result must have neither repeated frames nor ‘ghosting’ due to mixed fields or frames.

As an extreme example of framerate misery: at one time I (nearly) managed to recover the original 25 FPS video frames from an NTSC conversion of a PAL TV show. Although this conversion is similar to telecine, it is not the same and an inverse telecine filter cannot be used. In HandBrake, I enabled the ‘Bob’ deinterlace filter and forced the framerate to 25, and then I hoped that the offset of the frame resampler was in sync with the process that had duplicated frames for the NTSC conversion. This was the case for many of the episodes, but for quite a few of them I had to manually add or repeat the first few frames to get the sync right. It did not take long before I gave up and simply bought the PAL DVDs. This scenario only makes sense if the PAL source would be unavailable, and again FFmpeg would be a better tool for this.

Remove letterboxing (‘black bars’)

In short: your output file should not have any black bars above or below the image, except in a film with mixed aspect ratios. Always verify whether automatic cropping did not cut away too much!

Remember the part about aspect ratio? What happens when a film has been made in e.g. a 21:9 ratio, and needs to be stored in a format that is forced to always have a 16:9 ratio? The image is not tall enough to fill the entire height. The solution is simple: add black bars or so-called ‘mattes’ above and below the image. This is called ‘letterboxing’. In principle you could just re-encode a letterboxed video as-is, but there are a few potential problems. First, if the black areas are not perfectly black but contain a bit of noise, you will be wasting precious bits on encoding this useless noise. Second, the sharp edges between the black areas and the image will require more bits to encode and can cause compression artefacts at lower bitrates. Third, the black bars may prevent you from displaying the video in full screen, depending on its aspect ratio. Therefore it is highly recommended to crop away any letterboxing when there is no strict requirement on the aspect ratio of the encoded file.
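
To get a feeling for how much of the frame the bars occupy, here is a quick Python calculation for the 21:9 example, assuming a 1920×1080 frame. The function name is hypothetical.

```python
def letterbox_bars(frame_width, frame_height, film_aspect):
    """Return (image_height, bar_height) for a film letterboxed into a frame."""
    image_height = round(frame_width / film_aspect)
    bar_height = (frame_height - image_height) / 2
    return image_height, bar_height

image_height, bar = letterbox_bars(1920, 1080, 21 / 9)
print(image_height, bar)                              # 823 128.5
print(f"{2 * bar / 1080:.0%} of the frame is black")  # 24% of the frame is black
```

Almost a quarter of every frame carries no picture at all, which is why cropping is worth the effort.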

The good news is that programs like HandBrake have a pretty good automatic crop feature. In HandBrake it is enabled by default. You must always have a quick look at the image preview window however, to check that the program did not, by some freak chance, pick only dark frames for detecting the auto-crop and therefore cut away too much of the image. You still need to ensure the aspect ratio of the result is right and matches that of the true source material, not of the letterboxed image.
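
Why dark frames can fool an automatic crop is easiest to see in a toy model. The Python sketch below is not HandBrake's actual algorithm: it simply counts fully dark rows at the top and bottom of a few sampled frames and keeps the smallest crop. The threshold of 16 and both function names are assumptions of mine.

```python
def detect_bars(frame, threshold=16):
    """Count fully dark rows at the top and bottom of a frame of luma values."""
    def is_dark(row):
        return all(value <= threshold for value in row)

    top = 0
    while top < len(frame) and is_dark(frame[top]):
        top += 1
    bottom = 0
    while bottom < len(frame) - top and is_dark(frame[-1 - bottom]):
        bottom += 1
    return top, bottom

def autocrop(frames, threshold=16):
    """Crop only the rows that are dark in every sampled frame."""
    crops = [detect_bars(frame, threshold) for frame in frames]
    return min(c[0] for c in crops), min(c[1] for c in crops)

bright_scene = [[0] * 4, [200] * 4, [180] * 4, [190] * 4, [170] * 4, [0] * 4]
night_scene = [[0] * 4, [5] * 4, [10] * 4, [120] * 4, [8] * 4, [0] * 4]

print(autocrop([bright_scene, night_scene]))  # (1, 1): the true one-row bars
print(autocrop([night_scene]))                # (3, 2): real image rows would be cut away
```

With only the night scene sampled, dark picture content is mistaken for letterbox, which is exactly the situation the preview check is meant to catch.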

There is one caveat: some films can have a variable letterbox. Sometimes aspect ratios are being switched to achieve a certain effect. Such films, like The Dark Knight, are recorded in a mix of IMAX (1.43:1 aspect ratio, usually cropped to 16:9 or 1.78:1 for home video) and regular widescreen film format (about 2.39:1). If the autocrop would only look at the regular segments, part of the IMAX image would be cut away. In this case there is no way around leaving the letterbox around the segments with the wider aspect ratio, unless you would cut the film into the pieces with different ARs, encode them separately, and then merge them in a container format that supports multiple ARs (however, hardware players are likely to have problems with this).

Use an appropriate bitrate, or a constant quality

In short: use quality-based encoding whenever you can. Otherwise, use two-pass encoding.

How much bitrate a certain video requires to be encoded with sufficient fidelity that there is no visible quality loss or no obvious degradation, depends on two main factors: the efficiency of the codec and the content of the video. It is obvious that a video where every frame is a perfectly black area requires almost no information to describe (in fact, I just gave the description and it only took 51 bytes). The opposite extreme would be a video where every frame is perfectly random noise that cannot be predicted. Real videos are anywhere in between. Generally, the more things move, the more noise there is in the image, and the harder the visual content is to describe, the more bits are needed to represent it adequately.
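
The two extremes are easy to demonstrate with any compressor. The Python sketch below uses zlib as a stand-in. It is a general-purpose compressor and not a video codec, so only the order of magnitude matters, and the 320×240 frame size is an arbitrary choice.

```python
import random
import zlib

def compressed_size(data):
    """Return the number of bytes zlib needs for the given data."""
    return len(zlib.compress(data, 9))

pixels = 320 * 240                 # one small greyscale frame, one byte per pixel
rng = random.Random(0)             # fixed seed so the run is repeatable
black_frame = bytes(pixels)        # all zeros
noise_frame = bytes(rng.getrandbits(8) for _ in range(pixels))

print(compressed_size(black_frame))  # a tiny fraction of the 76800 input bytes
print(compressed_size(noise_frame))  # roughly the full 76800 bytes: noise does not compress
```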

The efficiency of the codec depends on its capabilities and how it is configured. Generally the longer encoding takes, the more efficient the end result will be because slower encoding usually means a deeper search for the optimal way to encode the frames. In recent versions of HandBrake, the tab with advanced codec settings is gone and is replaced by a single ‘preset’ slider that can be set between ‘ultrafast’ and ‘placebo’ for x264. Drag this slider as far to the right as you can bear before encodes start taking too long, because the slower the encode, the fewer bits the file will require to offer the same visual quality.

There are two main ways to encode a video file:

Enforce a certain bitrate. Either the bitrate is enforced every second or as an average across the entire file.

Enforce a certain minimum quality. The encoder will dynamically vary the bitrate to ensure the quality meets the minimum.

For some reason the fixed/average bitrate has become the typical way in which most people encode their videos although it makes little sense for modern use cases. The only case where enforcing a certain bitrate per second makes sense, is when streaming the file over a limited capacity channel like digital TV broadcast. The only case where enforcing an average bitrate across the entire file makes sense, is when the video file must fit in a medium of limited size like a DVD. The average user nowadays stores films on hard drives and has a network bandwidth that far exceeds the capacity of HDTV, therefore should not really care about the bitrate as long as it stays within bounds of the limits of their playback device.

If you have no strict requirements on how large the video file must be, using quality-based encoding makes a lot more sense. It has the additional advantage that it only requires a single pass. In HandBrake, select ‘Constant Quality’. The ‘Rate Factor’ (RF) slider for H.264 is an exponential slider: for every notch you move it to the right, the output size of your video file will be roughly multiplied by the same factor. The RF 0 value actually means lossless encoding. The lowest sensible RF value is widely considered to be 18 (beyond that, you're wasting bits on encoding invisible details). The highest sensible RF value is about 28. For high-definition movies, 22 is generally a good value, for DVD material it is better to stay around 20. You can use higher RF values if you want to stuff more movies in the same disk space and don't mind that it will be a little more blurry.
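
What ‘exponential’ means in practice can be sketched as follows. The exact factor per notch depends on the content. The Python snippet below assumes the often-quoted rule of thumb for x264 that six RF steps roughly halve or double the size, which is my assumption and not a guarantee.

```python
def relative_size(rf, reference_rf=22, steps_per_doubling=6):
    """Estimate file size relative to an encode at reference_rf (rule of thumb only)."""
    return 2 ** ((reference_rf - rf) / steps_per_doubling)

print(f"{relative_size(18):.2f}")  # 1.59: RF 18 is about 1.6 times the size of RF 22
print(f"{relative_size(28):.2f}")  # 0.50: RF 28 is about half the size of RF 22
```

Treat these figures as a rough guide for planning disk space. A short test encode of your own material will tell you far more.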

Nevertheless, if you need to ensure a video file has a certain size, a fixed average bitrate is the way to go. My video bitrate calculator can be of use to determine the required bitrate. Very important: when encoding this way, you should always enable two-pass encoding. This will first do a quick run over your video file to determine where to spend the most bits to keep the overall quality as constant as possible, then it will do the actual encoding. It takes more time but the end result will be much closer to the desired file size while having the best possible quality within those limitations, than if you would only do a single pass.
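
The underlying calculation is simple enough to do by hand. The Python sketch below is not the calculator mentioned above, only the basic arithmetic. It ignores container overhead, so leave a small safety margin in practice.

```python
def video_bitrate_kbps(target_bytes, duration_seconds, audio_kbps):
    """Return the average video bitrate (kbit/s) that fits the target size."""
    total_kbps = target_bytes * 8 / duration_seconds / 1000
    return total_kbps - audio_kbps

# A two-hour film with 192 kbps audio on a 4.7 GB (4.7e9 bytes) disc.
print(round(video_bitrate_kbps(4.7e9, 2 * 3600, 192)))  # 5030
```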

Audio bitrate

When making a file that is intended to be squeezed onto a small medium or streamed over a limited bandwidth, it may be tempting to scrounge bits off the audio bitrate in order to keep a somewhat higher video bitrate, but this is generally a bad idea. Audio is usually much smaller to begin with, therefore skimping on audio bitrate won't gain much. It will only cause both video and audio to become bad. Viewers are generally forgiving when the image isn't great if audio is good, but not the other way round. Only if your source has, for instance, a 5.1 soundtrack and you're encoding for a portable device can you (and should you) gain a lot by downmixing the surround track to plain stereo.

Also a note about DTS: please never ever transcode something from 1.5 Mbps DTS or any other format to 768 kbps DTS, which should never be used. DTS is excellent at full bitrate but quality degrades very quickly when going below that. At half-bitrate it has very audible quality loss, especially on music. If you cannot afford to preserve the full DTS bitrate, re-encode at 640 kbps AC3 instead, which offers much better quality than 768 kbps DTS.

Profiles and Levels

Regardless of whether you encode quality-based or with a target bitrate, you should take care not to exceed limitations of the device you'll be playing your videos on. H.264 has two main ways of specifying playback device capabilities: profiles and levels. Profiles can be ‘baseline,’ ‘main,’ and ‘high,’ and levels are indicated by a number like 4.1. The efficiency of the codec increases with increasing profile. The bitrate, frame size, and frame rate limitations of the playback device increase with increasing level. For the average smart HDTV that can play videos from a USB stick or a network stream, a safe bet for Full HD videos is High profile at level 4.1. If you want to use higher levels, check whether the player can handle them.
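
Whether a given resolution and framerate fit within a level can be checked with a little arithmetic on macroblocks (16×16 pixel blocks). The Python sketch below uses the frame-size and macroblock-rate limits for level 4.1 as listed in the level table of the H.264 specification. Double-check them before relying on this, and note that bitrate limits are left out entirely.

```python
import math

# Level 4.1 limits (macroblocks per frame and per second); verify against the spec.
LEVEL_4_1 = {"max_frame_mbs": 8192, "max_mbs_per_second": 245760}

def fits_level(width, height, fps, level=LEVEL_4_1):
    """Check frame size and macroblock rate against a level's limits."""
    macroblocks = math.ceil(width / 16) * math.ceil(height / 16)
    return (macroblocks <= level["max_frame_mbs"]
            and macroblocks * fps <= level["max_mbs_per_second"])

print(fits_level(1920, 1080, 30))  # True: Full HD at 30 FPS fits
print(fits_level(1920, 1080, 60))  # False: Full HD at 60 FPS needs a higher level
print(fits_level(1280, 720, 60))   # True
```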

Denoise if necessary

In short: if you want to preserve film grain, you will need a very high bitrate. If you want a small file, apply denoising to get good image quality at a low bitrate. NLMeans works best.

Modern codecs like H.264 are pretty good at keeping quality acceptable even at lower bitrates. However, although these codecs do have a kind of denoising effect at low bitrates, below a certain point this breaks down and the codec makes a mess of it. If you have a noisy video source (e.g. low-quality VHS tapes, a DVD of an old TV show, a film with a lot of ‘grain’), and you cannot afford encoding it at the extremely high bitrate that will correctly preserve all the noise, then it is a better idea to filter out as much of the noise as possible before the actual encoding starts. The codec will then have a much easier job at producing a good image at a low bitrate.

Recent versions of HandBrake have two types of denoise filters: the old ‘HQDN3D’ (has nothing to do with Duke Nukem 3D by the way), and the new ‘NLMeans’. The HQDN3D filter is fast and appropriate for material with mild high-frequency noise like minor film grain. If it still does not give good results at the ‘medium’ setting, try the NLMeans filter instead: it is much slower, however it performs much, much better in general. When properly configured, NLMeans can remove most of the noise while preserving details in the image, while HQDN3D will inevitably degrade the entire image. You should first do a few test runs on a fragment of the video to see what kind of result you obtain with various settings, and how much of a reduction in bitrate they offer compared to an encode without denoising.

Above is an example of denoising on a fragment from a film with quite a bit of grain. The bitrate figures shown are for a fixed-quality encoding at RF 21. This shows several things: first, the bitrate requirement is excessive without denoising. Second, although the ‘strong’ HQDN3D setting achieves similar denoising performance as ‘medium’ NLMeans, it destroys most of the fine details while the latter does not. Third, even though NLMeans does preserve more detail, overall it still removes more noise and results in the lowest bitrate at this RF setting.