The h.264 Sequence Parameter Set

This is a follow-up to my World’s Smallest h.264 Encoder post. I’ve received several emails asking about the precise details of two entities in the h.264 bitstream: the Sequence Parameter Set (SPS) and the Picture Parameter Set (PPS). Both entities contain information that an h.264 decoder needs to decode the video data, for example the resolution and frame rate of the video.

Recall that an h.264 bitstream contains a sequence of Network Abstraction Layer (NAL) units. The SPS and PPS are both types of NAL units. The SPS NAL unit contains parameters that apply to a series of consecutive coded video pictures, referred to as a “coded video sequence” in the h.264 standard. The PPS NAL unit contains parameters that apply to the decoding of one or more individual pictures inside a coded video sequence.

In my simple encoder, we emitted a single SPS and PPS at the start of the video data stream. A more complex encoder will often insert them periodically in the data, for two reasons: first, a decoder will often need to start decoding mid-stream; second, the encoder may wish to vary parameters for different parts of the stream in order to achieve better compression or quality.
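To make the mid-stream case concrete, here is a minimal sketch of how a decoder might scan a byte-stream buffer for NAL units and wait until it has seen both an SPS (nal_unit_type 7) and a PPS (nal_unit_type 8). It assumes the NAL units are separated by 00 00 01 start codes, as in the arrays later in this post. The function names are my own for illustration and are not taken from any particular decoder.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the NAL header byte that follows the next
 * 00 00 01 start code at or after `pos`, or `len` if none is found. */
static size_t next_nal(const uint8_t *buf, size_t len, size_t pos)
{
    for (size_t i = pos; i + 3 < len; i++) {
        if (buf[i] == 0x00 && buf[i + 1] == 0x00 && buf[i + 2] == 0x01)
            return i + 3;
    }
    return len;
}

/* nal_unit_type is the low 5 bits of the NAL header byte. */
static int nal_unit_type(uint8_t header)
{
    return header & 0x1f;
}

/* Returns 1 if the buffer contains both an SPS (type 7) and a PPS
 * (type 8), i.e. a decoder joining here has the parameters it needs. */
static int has_parameter_sets(const uint8_t *buf, size_t len)
{
    int seen_sps = 0, seen_pps = 0;
    for (size_t p = next_nal(buf, len, 0); p < len; p = next_nal(buf, len, p)) {
        int type = nal_unit_type(buf[p]);
        if (type == 7) seen_sps = 1;
        if (type == 8) seen_pps = 1;
    }
    return seen_sps && seen_pps;
}
```

A real decoder would go on to parse the parameter sets rather than just note their presence, but the scanning step looks much like this.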

In my trivial encoder, the h.264 SPS and PPS were hardcoded in hex as:

/* h.264 bitstreams */
const uint8_t sps[] =
    {0x00, 0x00, 0x00, 0x01, 0x67, 0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2};
const uint8_t pps[] =
    {0x00, 0x00, 0x00, 0x01, 0x68, 0xce, 0x38, 0x80};

Let’s decode the SPS bytes into the field names used by the spec. The first thing I did was to look at section 7 of the h.264 specification. I saw that at a minimum I had to choose how to fill in the SPS parameters in the table below. In the table, as in the standard, the type u(n) indicates an unsigned integer of n bits, and ue(v) indicates an unsigned Exp-Golomb coded value of a variable number of bits. The spec doesn’t seem to define the maximum number of bits anywhere, but the reference encoder software uses 32. (People wishing to explore the security of decoder software may find it interesting to violate this assumption!)

| Parameter Name | Type | Value | Comments |
|---|---|---|---|
| forbidden_zero_bit | u(1) | 0 | Despite being forbidden, it must be set to 0! |
| nal_ref_idc | u(2) | 3 | 3 means it is “important” (this is an SPS) |
| nal_unit_type | u(5) | 7 | Indicates this is a sequence parameter set |
| profile_idc | u(8) | 66 | Baseline profile |
| constraint_set0_flag | u(1) | 0 | We’re not going to honor constraints |
| constraint_set1_flag | u(1) | 0 | We’re not going to honor constraints |
| constraint_set2_flag | u(1) | 0 | We’re not going to honor constraints |
| constraint_set3_flag | u(1) | 0 | We’re not going to honor constraints |
| reserved_zero_4bits | u(4) | 0 | Better set them to zero |
| level_idc | u(8) | 10 | Level 1, sec A.3.1 |
| seq_parameter_set_id | ue(v) | 0 | We’ll just use id 0. |
| log2_max_frame_num_minus4 | ue(v) | 0 | Let’s have as few frame numbers as possible |
| pic_order_cnt_type | ue(v) | 0 | Keep things simple |
| log2_max_pic_order_cnt_lsb_minus4 | ue(v) | 0 | Fewer is better. |
| num_ref_frames | ue(v) | 0 | We will only send I slices |
| gaps_in_frame_num_value_allowed_flag | u(1) | 0 | We will have no gaps |
| pic_width_in_mbs_minus1 | ue(v) | 7 | SQCIF is 8 macroblocks wide |
| pic_height_in_map_units_minus1 | ue(v) | 5 | SQCIF is 6 macroblocks high |
| frame_mbs_only_flag | u(1) | 1 | We will not do field/frame encoding |
| direct_8x8_inference_flag | u(1) | 0 | Used for B slices. We will not send B slices |
| frame_cropping_flag | u(1) | 0 | We will not do frame cropping |
| vui_parameters_present_flag | u(1) | 0 | We will not send VUI data |
| rbsp_stop_one_bit | u(1) | 1 | Stop bit. I missed this at first and it caused me much trouble. |

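Some key things here are the profile (profile_idc) and level (level_idc) that I chose, and the picture width and height. If you write the fields in the above table out in order as a string of bits and express the result in hex, you will get the bytes that follow the four-byte start code (0x00, 0x00, 0x00, 0x01) in the SPS array declared above.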
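To check the table against the hex by machine rather than by hand, here is a minimal sketch of a bit writer that emits the fields in order. The bitwriter type and the put_u, put_ue and write_sps names are hypothetical helpers written for this post, not code from my original encoder. The sketch also ignores emulation prevention bytes, which this particular SPS happens not to need.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal MSB-first bit writer. */
typedef struct {
    uint8_t *buf;    /* output buffer, must be zero-initialized */
    size_t bitpos;   /* number of bits written so far */
} bitwriter;

/* u(n): write the low n bits of v, most significant bit first. */
static void put_u(bitwriter *w, int n, uint32_t v)
{
    for (int i = n - 1; i >= 0; i--) {
        if ((v >> i) & 1)
            w->buf[w->bitpos / 8] |= (uint8_t)(0x80 >> (w->bitpos % 8));
        w->bitpos++;
    }
}

/* ue(v): k zero bits followed by the (k+1)-bit binary value of v + 1,
 * where k = floor(log2(v + 1)). Assumes v < 0xffffffff. */
static void put_ue(bitwriter *w, uint32_t v)
{
    uint32_t code = v + 1;
    int bits = 0;
    for (uint32_t t = code; t > 1; t >>= 1)
        bits++;
    put_u(w, bits, 0);
    put_u(w, bits + 1, code);
}

/* Writes the SPS from the table (without the start code) into out,
 * which must hold at least 7 zeroed bytes. Returns the byte count. */
static size_t write_sps(uint8_t *out)
{
    bitwriter w = { out, 0 };
    put_u(&w, 1, 0);   /* forbidden_zero_bit */
    put_u(&w, 2, 3);   /* nal_ref_idc */
    put_u(&w, 5, 7);   /* nal_unit_type */
    put_u(&w, 8, 66);  /* profile_idc */
    put_u(&w, 4, 0);   /* constraint_set0_flag .. constraint_set3_flag */
    put_u(&w, 4, 0);   /* reserved_zero_4bits */
    put_u(&w, 8, 10);  /* level_idc */
    put_ue(&w, 0);     /* seq_parameter_set_id */
    put_ue(&w, 0);     /* log2_max_frame_num_minus4 */
    put_ue(&w, 0);     /* pic_order_cnt_type */
    put_ue(&w, 0);     /* log2_max_pic_order_cnt_lsb_minus4 */
    put_ue(&w, 0);     /* num_ref_frames */
    put_u(&w, 1, 0);   /* gaps_in_frame_num_value_allowed_flag */
    put_ue(&w, 7);     /* pic_width_in_mbs_minus1 */
    put_ue(&w, 5);     /* pic_height_in_map_units_minus1 */
    put_u(&w, 1, 1);   /* frame_mbs_only_flag */
    put_u(&w, 1, 0);   /* direct_8x8_inference_flag */
    put_u(&w, 1, 0);   /* frame_cropping_flag */
    put_u(&w, 1, 0);   /* vui_parameters_present_flag */
    put_u(&w, 1, 1);   /* rbsp_stop_one_bit */
    return (w.bitpos + 7) / 8;  /* unused trailing bits stay zero */
}
```

Calling write_sps on a zeroed buffer produces 0x67, 0x42, 0x00, 0x0a, 0xf8, 0x41, 0xa2, which are the seven bytes after the start code in the sps array.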

A question I got a couple of times in email was about the width and height parameters: specifically, what to do if the picture width or height is not an integer multiple of the macroblock size. Recall that, for the 4:2:0 sampling scheme in my encoder, a macroblock consists of 16×16 luma samples. In this case, you would round the coded width and height up to a whole number of macroblocks, set the frame_cropping_flag to 1, and tell the decoder how much to trim from each edge with the frame_crop_left_offset, frame_crop_right_offset, frame_crop_top_offset, and frame_crop_bottom_offset parameters, which are present in the bitstream only if the frame_cropping_flag is set to one.
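As a rough illustration of the arithmetic, the sketch below computes the size and cropping fields for a given picture size. It assumes 4:2:0 sampling with frame_mbs_only_flag set to 1, in which case the crop offsets are expressed in units of two luma samples, and it assumes even dimensions and crops only at the right and bottom edges. The sps_size struct and sps_size_for function are hypothetical names for this example.

```c
#include <stdint.h>

/* Hypothetical container for the size-related SPS fields. */
typedef struct {
    uint32_t pic_width_in_mbs_minus1;
    uint32_t pic_height_in_map_units_minus1;
    uint32_t frame_cropping_flag;
    uint32_t frame_crop_right_offset;
    uint32_t frame_crop_bottom_offset;
} sps_size;

/* Assumes 4:2:0, frame_mbs_only_flag = 1, and even width and height. */
static sps_size sps_size_for(uint32_t width, uint32_t height)
{
    sps_size s;
    uint32_t mbs_w = (width + 15) / 16;    /* round up to whole macroblocks */
    uint32_t mbs_h = (height + 15) / 16;
    uint32_t pad_w = mbs_w * 16 - width;   /* extra luma columns coded */
    uint32_t pad_h = mbs_h * 16 - height;  /* extra luma rows coded */

    s.pic_width_in_mbs_minus1 = mbs_w - 1;
    s.pic_height_in_map_units_minus1 = mbs_h - 1;
    s.frame_cropping_flag = (pad_w || pad_h) ? 1 : 0;
    s.frame_crop_right_offset = pad_w / 2;   /* crop unit is 2 luma samples */
    s.frame_crop_bottom_offset = pad_h / 2;
    return s;
}
```

For example, 1920×1080 video is coded as 120×68 macroblocks (1920×1088 samples), so this gives frame_cropping_flag = 1 and frame_crop_bottom_offset = 4. The 128×96 SQCIF picture in my encoder needs no cropping.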

One interesting problem that we see fairly often with h.264 is when the container format (MP4, MOV, etc.) contains different values for some of these parameters than the SPS and PPS. In this case, we find different video players handle the streams differently.

A handy tool for decoding h.264 bitstreams, including the SPS, is the h264bitstream tool. It comes with a command line program that decodes a bitstream to the parameter names defined in the h.264 specification. Let’s look at its output for a sample MP4 file I downloaded from YouTube. First, I extract the h.264 NAL units from the file using ffmpeg:

The NAL units now reside in the file of.h264. I then run the h264_analyze command from the h264bitstream package to produce the following output:

h264_analyze of.h264

!! Found NAL at offset 4 (0x0004), size 25 (0x0019)
==================== NAL ====================
forbidden_zero_bit : 0
nal_ref_idc : 3
nal_unit_type : 7 ( Sequence parameter set )
======= SPS =======
profile_idc : 100
constraint_set0_flag : 0
constraint_set1_flag : 0
constraint_set2_flag : 0
constraint_set3_flag : 0
reserved_zero_4bits : 0
level_idc : 31
seq_parameter_set_id : 0
chroma_format_idc : 1
residual_colour_transform_flag : 0
bit_depth_luma_minus8 : 0
bit_depth_chroma_minus8 : 0
qpprime_y_zero_transform_bypass_flag : 0
seq_scaling_matrix_present_flag : 0
log2_max_frame_num_minus4 : 3
pic_order_cnt_type : 0
log2_max_pic_order_cnt_lsb_minus4 : 3
delta_pic_order_always_zero_flag : 0
offset_for_non_ref_pic : 0
offset_for_top_to_bottom_field : 0
num_ref_frames_in_pic_order_cnt_cycle : 0
num_ref_frames : 1
gaps_in_frame_num_value_allowed_flag : 0
pic_width_in_mbs_minus1 : 79
pic_height_in_map_units_minus1 : 44
frame_mbs_only_flag : 1
mb_adaptive_frame_field_flag : 0
direct_8x8_inference_flag : 1
frame_cropping_flag : 0
frame_crop_left_offset : 0
frame_crop_right_offset : 0
frame_crop_top_offset : 0
frame_crop_bottom_offset : 0
vui_parameters_present_flag : 1
=== VUI ===
aspect_ratio_info_present_flag : 1
aspect_ratio_idc : 1
sar_width : 0
sar_height : 0
overscan_info_present_flag : 0
overscan_appropriate_flag : 0
video_signal_type_present_flag : 0
video_signal_type_present_flag : 0
video_format : 0
video_full_range_flag : 0
colour_description_present_flag : 0
colour_primaries : 0
transfer_characteristics : 0
matrix_coefficients : 0
chroma_loc_info_present_flag : 0
chroma_sample_loc_type_top_field : 0
chroma_sample_loc_type_bottom_field : 0
timing_info_present_flag : 1
num_units_in_tick : 100
time_scale : 5994
fixed_frame_rate_flag : 1
nal_hrd_parameters_present_flag : 0
vcl_hrd_parameters_present_flag : 0
low_delay_hrd_flag : 0
pic_struct_present_flag : 0
bitstream_restriction_flag : 1
motion_vectors_over_pic_boundaries_flag : 1
max_bytes_per_pic_denom : 0
max_bits_per_mb_denom : 0
log2_max_mv_length_horizontal : 11
log2_max_mv_length_vertical : 11
num_reorder_frames : 0
max_dec_frame_buffering : 1
=== HRD ===
cpb_cnt_minus1 : 0
bit_rate_scale : 0
cpb_size_scale : 0
initial_cpb_removal_delay_length_minus1 : 0
cpb_removal_delay_length_minus1 : 0
dpb_output_delay_length_minus1 : 0
time_offset_length : 0

The only additional thing I’d like to point out here is that this particular SPS also contains information about the frame rate of the video (see timing_info_present_flag). These parameters must be closely checked when you generate bitstreams to ensure they agree with the container format that the h.264 will eventually be muxed into. Even a small error, such as 29.97 fps in one place and 30 fps in another, can result in severe audio/video synchronization problems.
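As a sketch of such a check: with fixed_frame_rate_flag set and progressive frames, the frame rate is commonly derived as time_scale / (2 × num_units_in_tick), since each frame spans two clock ticks. For the values above that is 5994 / 200 = 29.97 fps. The helper names and the tolerance below are my own assumptions for illustration, not part of any muxer's API.

```c
#include <stdint.h>

/* Frame rate implied by the VUI timing info, assuming each frame spans
 * two clock ticks (frame pictures with fixed_frame_rate_flag = 1). */
static double vui_frame_rate(uint32_t num_units_in_tick, uint32_t time_scale)
{
    if (num_units_in_tick == 0)
        return 0.0;  /* timing info absent or invalid */
    return (double)time_scale / (2.0 * (double)num_units_in_tick);
}

/* Hypothetical check: does the SPS frame rate agree with the container's
 * rate (given as a rational number) to within a small tolerance? */
static int frame_rates_agree(double sps_fps, uint32_t container_num,
                             uint32_t container_den)
{
    if (container_den == 0)
        return 0;
    double diff = sps_fps - (double)container_num / (double)container_den;
    if (diff < 0)
        diff = -diff;
    return diff < 0.001;  /* tolerance chosen arbitrarily for this sketch */
}
```

With this sketch, a container rate of 30000/1001 agrees with the SPS above, while a container rate of 30/1 is flagged as a mismatch.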
