How to Encode Video for HLS Delivery

HTTP Live Streaming (HLS) is a simple and elegant architecture created by Apple for delivering adaptive bitrate streams to iOS devices and compatible browsers, essentially Safari. Since its release, HLS has been incorporated into technologies that enable desktop computers to play HLS streams with Flash installed (JW Player) or within HTML5 browsers (THEOplayer from OpenTelly). HLS has also been (poorly) adopted by Google for Android and incorporated into most (if not all) OTT platforms like Roku. Though Dynamic Adaptive Streaming over HTTP (DASH) gets all the press, HLS gets all the eyeballs, and is as close to a “one-spec-fits-all” technology as is available in the adaptive streaming space.

If you’re submitting an app to the Apple App Store that incorporates video playback over cellular networks, you must use HTTP Live Streaming if the video exceeds either 10 minutes in duration or 5MB of data in a five-minute period, which works out to a stream with a data rate of roughly 133Kbps. In these cases, you must also incorporate at least one audio stream at 64Kbps or lower bandwidth, either with or without a still image.
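As a quick check of that threshold arithmetic, the sketch below converts 5MB over five minutes into an average data rate. It assumes decimal units (1MB = 1,000,000 bytes), and the function name is purely illustrative.

```python
def average_bitrate_kbps(megabytes: float, seconds: float) -> float:
    """Average data rate in kilobits per second, assuming 1 MB = 1,000,000 bytes."""
    bits = megabytes * 1_000_000 * 8
    return bits / seconds / 1000


# 5 MB delivered over a five-minute (300-second) period
print(round(average_bitrate_kbps(5, 300)))  # prints 133
```

In other words, the 5MB limit corresponds to an average of about 133 kilobits per second, a very modest stream.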

For all these reasons, understanding how to produce for HLS is a critical skill for most streaming producers. After describing how HLS works, I’ll cover the four phases of HLS production: configuring the variants, encoding the variants, creating the segmented data and metadata files, and validating the streams.

More About HLS

Though the name implies only live streaming, HLS can also distribute on-demand videos. Beyond simple playback, the architecture includes features like AES-128-bit encryption, CEA-608 closed captions, and timed metadata capabilities like opening a web page automatically when the stream is played.

Figure 1. How HLS works.

The HLS encoding and playback schema is shown in Figure 1. Like all HTTP-based adaptive streaming technologies, HLS encodes the original video into multiple variants at various resolutions and bitrates. It then divides each variant into multiple segments.

The location of each segment is defined in an index file with a .M3U8 extension, which you can see off to the right of each variant. A master .M3U8 file, on the extreme right of the figure, describes the data rate, resolution and other characteristics of each variant, and the location of the index file for that variant (Figure 2). All these are uploaded to a standard HTTP web server.
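To make the master file more concrete, here is a minimal sketch that writes one from a list of variants. The #EXTM3U and #EXT-X-STREAM-INF tags, with their BANDWIDTH and RESOLUTION attributes, come from the HLS playlist format; the Variant class, the bitrates, and the file paths are illustrative assumptions, not values from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    bandwidth_bps: int  # total data rate (video + audio) in bits per second
    width: int
    height: int
    uri: str            # location of this variant's own index (.M3U8) file


def build_master_playlist(variants):
    """Return the text of a master .M3U8 file listing each variant."""
    lines = ["#EXTM3U"]
    for v in variants:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        lines.append(v.uri)
    return "\n".join(lines) + "\n"


# Hypothetical ladder; bitrates and paths are made up for illustration.
ladder = [
    Variant(664_000, 640, 360, "mid/index.m3u8"),
    Variant(264_000, 416, 234, "low/index.m3u8"),
    Variant(2_564_000, 1280, 720, "high/index.m3u8"),
]
print(build_master_playlist(ladder))
```

The order of the entries matters, because playback begins with the first variant listed in the master file.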

To trigger playback, you create a link to the master index file. During playback, the HLS-compatible device checks the master .M3U8 file and retrieves the first segment (segment 1) from the first variant listed in that file (the red arrow). Then it monitors bandwidth conditions. If bandwidth is plentiful, the device will check the master .M3U8 file, find the location of a higher-quality stream, check that stream’s .M3U8 file for the location of the next segment (segment 2), and retrieve and play that segment. If bandwidth is constrained, the device will perform the same basic procedure, but find and retrieve the next segment from a lower-quality stream. During playback, the device continuously monitors bandwidth conditions, changing streams as necessary to continue playing the highest-quality stream that the connection can sustain.
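The switching behavior described above can be sketched as a simple selection rule. This is a simplified illustration rather than the logic of any actual player: the headroom factor, the bitrate ladder, and the throughput samples are all assumptions made up for the example.

```python
def choose_variant(variant_bitrates_bps, measured_bps, headroom=0.8):
    """Pick the highest-bitrate variant that fits within the measured bandwidth.

    headroom is a hypothetical safety margin: only 80% of the measured
    throughput is treated as usable. If nothing fits, fall back to the
    lowest-bitrate variant.
    """
    budget = measured_bps * headroom
    affordable = [b for b in variant_bitrates_bps if b <= budget]
    return max(affordable) if affordable else min(variant_bitrates_bps)


ladder = [264_000, 864_000, 1_864_000, 4_564_000]          # hypothetical variants
throughput_samples = [900_000, 3_000_000, 6_000_000, 400_000]  # one per segment

for segment, measured in enumerate(throughput_samples, start=2):
    chosen = choose_variant(ladder, measured)
    print(f"segment {segment}: measured {measured} bps -> play {chosen} bps variant")
```

Each decision is made per segment, which is why the segment boundaries need to line up across variants.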

Job one when producing for HLS is to choose the number of variants and their configuration. So let’s start there.

Configuring the Variants

Anyone producing for HLS should start with a look at Apple Technical Note TN2224, a sampling of which is shown in Table 1. What’s important is not so much the precise configurations recommended, but the recognition that you’re producing for three different scenarios: low bitrate for cellular connections, moderate bitrate for cellular and Wi-Fi connections on older devices, and very high bitrates for exceptional quality on newer and high-end devices. This segmentation is particularly important when creating a single set of streams for mobile, computer and OTT playback, such as when you might be using the JW Player to deliver HLS streams to Flash-enabled desktops.

Table 1. Apple’s recommendations for variants in TN2224.

When configuring your streams, you should consider each of these three tiers individually. For cellular, ask the question, “What’s the lowest-speed/quality configuration we want to distribute?” Besides the audio-only file, TN2224 recommends a 416×234 stream at 200Kbps video/64Kbps audio, but many producers provide a lower quality stream, say at 100Kbps video/64Kbps audio, for those watching on very slow cellular connections.

Then consider the middle tier. For full screen playback on iPhones, 640×360 is a reasonable configuration, but iPads (and desktops) will play the video in the playback window on your web site. Since it’s most efficient encoding- and playback-wise to encode/playback video at the same size as the display window, you should also have at least one variant for each video playback window on your website.

The 960 and higher-resolution screens are all for full-screen or OTT playback. Here, the question is “How much can we afford?” In other words, send the highest quality stream you can within the fiscal constraints of your monetization program.

How many streams do you need? That depends upon a number of interrelated factors, including the following:

The original resolution of the video–you need more for HD than for SD.

Whether the customer is paying for the video–usually you need more for subscription services than for free Internet video.

The configuration of your lowest and highest quality streams–you need sufficient streams to provide a good quality stream at all relevant connection speeds.

There is no magic number, but Apple recommends that bitrates be a factor of 1.5 to 2x apart; otherwise, the streams are very similar in quality and you’re wasting encoding resources and storage space.
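One way to apply the spacing guideline is to compute the ratio between each pair of adjacent data rates and flag any pair that falls below 1.5x. The sketch below does that; the ladder values are hypothetical and the function names are illustrative.

```python
def adjacent_ratios(bitrates_kbps):
    """Return (lower, higher, ratio) for each pair of neighboring data rates."""
    ordered = sorted(bitrates_kbps)
    return [(lo, hi, hi / lo) for lo, hi in zip(ordered, ordered[1:])]


def too_close(bitrates_kbps, min_ratio=1.5):
    """List neighboring pairs whose data rates are less than min_ratio apart."""
    return [(lo, hi) for lo, hi, ratio in adjacent_ratios(bitrates_kbps)
            if ratio < min_ratio]


# Hypothetical ladder of video data rates in Kbps
ladder = [200, 400, 600, 1200, 1400, 2500]
for lo, hi, ratio in adjacent_ratios(ladder):
    print(f"{lo} -> {hi} Kbps: {ratio:.2f}x")
print("Too close:", too_close(ladder))  # prints Too close: [(1200, 1400)]
```

Here the 1200Kbps and 1400Kbps streams are only about 1.17x apart, so one of them is a candidate for removal.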

Beyond that, when choosing your variants, you must use the same aspect ratio for all of them; you can’t switch from 4:3 to 16:9 or vice versa. Note that if you’re encoding 4:3 source videos, there’s a separate table in TN2224 for those files.

In addition, as Table 1 suggests, don’t worry about mod-16, or a file resolution with the width and height both divisible by 16. Many compressionists recommend mod-16 because H.264 uses 16×16 blocks to encode the video file, and mod-16 files are the most efficient to encode. Typically, however, the playback windows chosen by website developers dictate the resolution, which is why 640×360 is the most widely used resolution on the planet. No worries; at this resolution, the inefficiency of using a non-mod-16 resolution is very small, well into the lower single digits.
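For a rough sense of scale, the sketch below rounds each dimension up to the next multiple of 16 and reports how many extra pixels the encoder has to cover. Treating those extra pixels as a proxy for encoding inefficiency is an assumption made for illustration; it is not a measurement of the actual bitrate cost.

```python
import math


def padded_dimension(pixels: int, block: int = 16) -> int:
    """Round a dimension up to the next multiple of the block size."""
    return math.ceil(pixels / block) * block


def padding_overhead(width: int, height: int, block: int = 16) -> float:
    """Fraction of extra pixels coded when a frame is padded to whole blocks."""
    coded = padded_dimension(width, block) * padded_dimension(height, block)
    return coded / (width * height) - 1


print(padded_dimension(360))                 # prints 368
print(f"{padding_overhead(640, 360):.1%}")   # prints 2.2%
```

At 640×360, the width is already divisible by 16 and the height is padded from 360 to 368 lines, which adds about 2.2% more pixels.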

Finally, some data rates used by Apple are unnecessarily high, particularly at the upper end. For example, YouTube and ESPN deliver 720p video at 2.5Mbps, almost half the lowest data rate recommended by Apple. Similarly, 1.8Mbps for 640×360 is quite generous. So you might adjust the data rates downward to save bandwidth dollars, but the overall schema is quite sound.

Once you’ve chosen the number and configuration of your variants, it’s time to encode the files.

Encoding the Variants

At some point, I should mention that HLS is only compatible with H.264; now is as good a time as any. Note that Apple changes the profile used for each variant to maintain compatibility with older devices. This is essential, or the video files won’t play, so this is one area where I wouldn’t diverge from Apple’s recommendations.

There are several other critical areas of focus; let’s take those one at a time.

Keyframe Settings

TN2224 directs that each segment have at least one IDR keyframe, preferably at the beginning of the segment. Complying with this will involve multiple configurations that will vary by encoding tool.

First, if the encoding tool gives you the option, set all keyframes to be IDR frames. If this option isn’t provided, don’t worry; invariably, the encoding tool is making each keyframe an IDR frame. Next, make sure the keyframe interval used to encode the file is consistent for all variants, and divides evenly into the segment size. At the recommended segment size of ten seconds, you should use a keyframe interval of one, five or ten seconds.
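Both rules, a consistent interval across variants and an interval that divides evenly into the segment size, are easy to check before encoding. The sketch below assumes whole-second intervals and a 30 fps source; the variant names and function names are hypothetical.

```python
def check_keyframe_settings(segment_seconds, variant_intervals):
    """Return a list of problems found in the planned keyframe intervals.

    variant_intervals maps a variant name to its keyframe interval in seconds.
    """
    problems = []
    if len(set(variant_intervals.values())) > 1:
        problems.append("keyframe interval differs between variants")
    for name, interval in variant_intervals.items():
        if segment_seconds % interval != 0:
            problems.append(
                f"{name}: {interval}s does not divide evenly into {segment_seconds}s"
            )
    return problems


def interval_in_frames(interval_seconds, frames_per_second):
    """Convert a keyframe interval in seconds to frames, for encoders that ask for frames."""
    return interval_seconds * frames_per_second


print(check_keyframe_settings(10, {"low": 5, "mid": 5, "high": 5}))  # prints []
print(check_keyframe_settings(10, {"low": 5, "high": 3}))
print(interval_in_frames(5, 30))  # prints 150
```

Many encoding tools express the keyframe interval in frames rather than seconds, which is why the conversion is included.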

Most encoders have an option to insert keyframes at scene changes, which can improve stream quality. When available, don’t enable this option unless you’re certain that this won’t reset the keyframe interval, which could result in the first frame of a segment not being a keyframe. For example, some encoders, like Sorenson Squeeze, offer a control to enable “Fixed I-Frames Distance,” which ensures that there’s a keyframe at the specified interval, even if another intervening keyframe was inserted at a scene change. When this is available, you should always enable it.

Bitrate Control

The HLS schema works best when the data rate of each variant is consistent. For this reason, you should encode your streams using either constant bitrate (CBR) encoding, or constrained variable bit rate (VBR) encoding, with a maximum data rate of 125-150% of the target data rate.
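As a small worked example, the sketch below derives the maximum data rate for a constrained VBR encode from the target rate and rejects caps outside the 125-150% range. The function name and the returned keys are illustrative; they do not correspond to the settings of any particular encoder.

```python
def constrained_vbr_settings(target_kbps, cap_ratio=1.5):
    """Return target and maximum data rates for a constrained VBR encode.

    cap_ratio must stay within the 1.25 to 1.5 range discussed above.
    """
    if not 1.25 <= cap_ratio <= 1.5:
        raise ValueError("cap_ratio should be between 1.25 and 1.5")
    return {"target_kbps": target_kbps, "max_kbps": round(target_kbps * cap_ratio)}


print(constrained_vbr_settings(1800))        # max_kbps is 2700
print(constrained_vbr_settings(1800, 1.25))  # max_kbps is 2250
```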

As we’ll discuss below, data rate consistency is one of the file characteristics checked by Apple’s Media Stream Validator tool. If the actual data rate of the file exceeds the listed data rate by more than 10%, you’ll see an error message like that shown in Figure 3. In the figure, segment 16 was off-target by 54%. Interestingly, I produced that error by encoding the file in Sorenson Squeeze using VBR constrained to 300% of the target. Since the final segment contained the most motion, that’s where Squeeze packed the most data, resulting in the error. Note that when I encoded using CBR, which is Squeeze’s default for HLS video, the files passed Media Stream Validator’s scrutiny without any problems.

Figure 3. This file failed in Media Stream Validator because the segment bandwidth exceeded the target bandwidth.
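You can run a similar sanity check yourself before uploading anything. The sketch below computes each segment's data rate from its size and duration and flags any segment more than 10% over the listed rate. It only mirrors the 10% rule described above; it is not Apple's tool, how that tool measures bandwidth internally is not covered here, and the segment sizes are made-up sample data.

```python
def segment_bitrate_bps(size_bytes, duration_seconds):
    """Average data rate of one segment in bits per second."""
    return size_bytes * 8 / duration_seconds


def find_overshoots(segments, declared_bps, tolerance=0.10):
    """Return (segment_number, overshoot_fraction) for segments over the limit.

    segments is a list of (size_bytes, duration_seconds) tuples in playback order.
    """
    limit = declared_bps * (1 + tolerance)
    overshoots = []
    for number, (size_bytes, duration) in enumerate(segments, start=1):
        rate = segment_bitrate_bps(size_bytes, duration)
        if rate > limit:
            overshoots.append((number, rate / declared_bps - 1))
    return overshoots


# Hypothetical ten-second segments for a variant listed at 1,000,000 bps
sample = [(1_250_000, 10), (1_300_000, 10), (1_925_000, 10)]
for number, over in find_overshoots(sample, 1_000_000):
    print(f"segment {number} exceeds the listed data rate by {over:.0%}")
```

With this sample data, only the third segment is flagged, at 54% over the listed rate.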

Audio

As Table 1 reflects, Apple recommends that you encode all variants using the same audio parameters. Though not stated in TN2224, this is because switching audio parameters during playback can cause popping or other audible artifacts. Because the recommended audio data rate is rather low, some authorities recommend using High Efficiency AAC (HE-AAC), rather than the Low-Complexity profile (AAC-LC), because HE-AAC delivers superior quality at lower bitrates.

If you decide to use different parameters to reward your high end viewers with a superior audio experience, use the same sample rate and change the data rate or number of channels (mono or stereo) in the higher-end streams.

Segmenting Your File

After encoding your files, you need to create the segments and index files, for which there are many options. For example, once you become an iOS developer ($99/year), you can download Apple’s HTTP Live Streaming Tools, which include the aforementioned Media Stream Validator as well as the Media Stream Segmenter and Media File Segmenter. Both segmenters are command line tools that create the segments and index files. The Media Stream Segmenter works with live and disk-based MPEG-2 transport stream files, while the Media File Segmenter works with disk-based MP4 files. For more information on using these tools, check out Apple’s HTTP Live Streaming Overview.

In terms of segment duration, the most confusing aspect of TN2224 is the recommendation of a segment size of ten seconds, and a keyframe interval of three seconds, as this wouldn’t seem to produce a keyframe at the start of each segment. Interestingly, the new default settings in Apple Compressor 4.1 follow these recommendations, creating a segment duration of ten seconds, but using a keyframe interval of three seconds.

In contrast, most authorities recommend making sure that the keyframe interval divides evenly into the segment size. For example, cloud encoder Zencoder’s well-written Best Practices for Encoding HLS Video states, “keyframe rate should be an even interval of the segment size.” As you see in Figure 4, all of the HLS templates implemented in Sorenson Squeeze ensure that the keyframe interval divides evenly into the segment duration, called a fragment in the application UI. As discussed above, that’s what I recommend as well, though I’ve asked Apple to explain this inconsistency, and await word back.
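The mismatch is easy to see on a timeline. The sketch below lists the segment start times that would not coincide with a keyframe, assuming keyframes fall at exact multiples of the interval starting at zero, whole-second values, no extra scene-change keyframes, and segments cut at exact multiples of the segment duration. The function name is illustrative.

```python
def boundaries_without_keyframe(segment_seconds, keyframe_interval_seconds,
                                duration_seconds):
    """Return segment start times (in seconds) that do not land on a keyframe."""
    keyframe_times = set(range(0, duration_seconds, keyframe_interval_seconds))
    segment_starts = range(0, duration_seconds, segment_seconds)
    return [t for t in segment_starts if t not in keyframe_times]


# Ten-second segments over a 60-second file
print(boundaries_without_keyframe(10, 3, 60))  # prints [10, 20, 40, 50]
print(boundaries_without_keyframe(10, 5, 60))  # prints []
```

With a three-second interval, only the segments starting at 0 and 30 seconds begin on a keyframe; with a five-second interval, every segment does.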

Encode and Segment

Beyond Apple’s tools, there are a number of programs that can encode your files and create the segments and metadata files in one operation, which is the simplest workflow. One of the least expensive is Apple Compressor 4.0 ($49.99), which only runs on the Mac. Sorenson Squeeze is cross-platform and starts at $549, and offers one of the most elegant interfaces for producing HLS output, with all variants shown within a single interface (Figure 4) that ensures a common keyframe interval and consistent audio parameters.

Figure 4. Sorenson Squeeze has a very elegant interface for HLS output.

Telestream Episode, which starts at $594, can also output segments and index files, but only via command line arguments, not through the user interface. Most enterprise and cloud encoders can output HLS compatible segments and index files via the user interface or application programming interface (API).

Testing Your Files

Media Stream Validator is a command line tool included with Apple’s HTTP Live Streaming Tools. In essence, during testing, it simulates HLS playback, so you have to upload your files to a web server (or install a web server on your computer) to run the tool.

Figure 5. Running Apple’s Media Stream Validator.

Operation is simple and the command line argument is shown in Figure 5. Run time will depend upon the length of your video and the number of segments. For reference, it took the program 65 seconds to analyze the nine streams shown in Figure 5 for a 90-second test file.

The tool can reveal a range of problems in your files, including the bandwidth issue discussed above. It will also send an error message if your segments don’t have IDR keyframes. Apple explains all error messages produced by Media Stream Validator in Technical Note TN2235. Lack of compliance with all HLS requirements may cause playback to fail, so testing your files before making them available is recommended. If you’re submitting an application with media files to the App Store for approval, you should definitely test first with Media Stream Validator. For more on this, check Apple’s Technical Q&A QA1767, entitled Resolving App Store Approval issues for HTTP Live Streaming.

Author: Jan Ozer

Jan Ozer is a leading expert on H.264 encoding for live and on-demand production. As a contributing editor to Streaming Media Magazine, Ozer has tested most cloud, enterprise and desktop encoding tools, and has worked with most online video platforms (OVPs) and live streaming services, as well as many webcast platforms.