Decoder specification

Conventions

Headers are described in tables. Each row of a table describes a value that may be read from the frame. Tables and rows are presented in the same order as the corresponding data appears in the frame.

The columns have the following meanings:

size: The size of this value in bits. Bits are counted in LSB-to-MSB order. For example, given the byte 01110000b, reading 3 bits and then 5 bits returns 000b and then 01110b. Consequently, a read of more than 8 bits yields a little-endian value. Think of the get_bits function as filling up the return value from its LSB, taking the bits of each byte starting from that byte's LSB.

name: A variable-like name used to reference the value. A value named valueX generally means its purpose is unknown. Rows named alignmentX mean that the bit reader must skip bits until the next byte boundary.

condition: The value is present in the frame only if this condition is met. No condition means that the value is always present.
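The bit order described for the size column, together with the alignmentX rows, can be captured in a small reader. This is a minimal Python sketch. The class name BitReader and its methods are illustrative and are not taken from any existing decoder. get_bits fills the result from its LSB using each byte's bits LSB first, and align skips to the next byte boundary.

```python
class BitReader:
    """Hypothetical LSB-first bit reader matching the conventions above."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in the buffer

    def get_bits(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1  # take bits from the byte's LSB upwards
            value |= bit << i                   # fill the result from its LSB upwards
            self.pos += 1
        return value

    def align(self) -> None:
        """Skip to the next byte boundary (what the alignmentX rows require)."""
        self.pos = (self.pos + 7) & ~7
```

With the byte 01110000b, `get_bits(3)` returns 0 and a following `get_bits(5)` returns 01110b. With the bytes 0x34 0x12, `get_bits(16)` returns 0x1234, which is the little-endian behaviour described above.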

Picture header

The first two bytes of a frame tell the decoder how the following data should be interpreted. These include three fields:

| size in bits | name | value(s) | comments |
|---|---|---|---|
| 5 | PSC | always = 0x1F | indeo5 picture start code |
| 3 | frame_type | 0 => INTRA (key) frame; 1 => INTER frame; 2 => droppable INTER frame (scalability mode only); 3 => droppable INTER frame; 4 => NULL frame; 5...7 are illegal | frame type |
| 8 | frame_number | 0...0xFF | frame number in GOP (0 for I frame) |

NULL frames contain nothing beyond this header.
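Because bits are read LSB first, the 16-bit picture header splits cleanly across the two bytes. PSC occupies bits 0-4 of the first byte, frame_type occupies bits 5-7, and frame_number is the whole second byte. A minimal Python sketch follows. The function name and the string labels for the frame types are hypothetical.

```python
# Hypothetical labels for the legal frame_type values listed in the table.
FRAME_TYPES = {
    0: "INTRA",
    1: "INTER",
    2: "INTER_DROPPABLE_SCAL",  # droppable INTER frame, scalability mode only
    3: "INTER_DROPPABLE",
    4: "NULL",
}

def parse_picture_header(data: bytes):
    """Return (frame type label, frame_number) from the first two bytes of a frame."""
    if len(data) < 2:
        raise ValueError("picture header needs 2 bytes")
    psc = data[0] & 0x1F       # 5 bits: bits 0-4 of byte 0
    frame_type = data[0] >> 5  # 3 bits: bits 5-7 of byte 0
    frame_number = data[1]     # 8 bits: all of byte 1
    if psc != 0x1F:
        raise ValueError("bad picture start code")
    if frame_type not in FRAME_TYPES:
        raise ValueError("illegal frame_type")
    return FRAME_TYPES[frame_type], frame_number
```

For example, the bytes 0x3F 0x05 decode as an INTER frame with frame_number 5.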

GOP header

This header is present in INTRA (key) frames only. It is used to transfer general information (e.g. the picture layout) that changes rarely or never during a video sequence. The values in this header are valid for all frames in the GOP.

| size in bits | name | value(s) | comments |
|---|---|---|---|
| | | | ID of slice size. Only present if "local decoding mode" is enabled (indicated by bit 6 of gop_flags). |
| 2 | luma_levels | 0 => no decomposition; 1 => 1 level; 2 => 2 levels; 3 => forbidden | Number of wavelet decomposition levels for the luma plane. The number of resulting wavelet subbands is num_bands = luma_levels * 3 + 1. |
| 1 | chroma_levels | 0 => no decomposition; 1 => forbidden | Number of wavelet decomposition levels for the chrominance planes. The value 1 is forbidden because no known Indeo5 software performs any decomposition of the chrominance planes. |
| 4 | pic_size_id | | Index into the table of standard picture sizes. If the picture dimensions are not listed in that table, this field contains the value 15 and the actual picture size is coded in the pic_height and pic_width fields. |
| | | | Array of Band_info structures describing each luminance band. The number of luminance bands follows from luma_levels. |
| 6-8 | band_info_chroma | | Array of Band_info structures describing each chrominance band. Because existing Indeo5 software never decomposes the chrominance planes, there is only one band per chrominance plane and therefore only one descriptor of this type. |
| | | | Unknown band header extension. Known Indeo5 decoders ignore its content. Only present in the bitstream if indicated by bit 5 of band_flags. |
| ?? | alignment6 | | Align the bit reader on the next byte boundary. |
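The luma_levels and chroma_levels rules determine how many Band_info descriptors follow. There is one per luminance band, given by num_bands = luma_levels * 3 + 1, plus the single band_info_chroma descriptor. This is a minimal Python sketch of that bookkeeping. The function name is hypothetical.

```python
def gop_band_layout(luma_levels: int, chroma_levels: int):
    """Return (number of luma bands, total number of Band_info descriptors)."""
    if luma_levels not in (0, 1, 2):
        raise ValueError("luma_levels = 3 is forbidden")
    if chroma_levels != 0:
        raise ValueError("chroma_levels = 1 is forbidden")
    num_luma_bands = luma_levels * 3 + 1
    # one Band_info per luminance band plus the single band_info_chroma descriptor
    return num_luma_bands, num_luma_bands + 1
```

With one decomposition level there are 4 luma bands and 5 descriptors in total. With two levels there are 7 luma bands and 8 descriptors.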

Scalability mode

This special feature of Indeo5 allows the decoder to adapt playback to the processing power of the machine being used for playback. Indeo5 offers both spatial and temporal scalability. See also: Scalable Video Coding.

Spatial scalability

Spatial scalability works by dividing the image into a number of frequency bands using wavelet decomposition. These bands represent the image at different levels of detail. All bands are needed to recreate the original image exactly. However, if not enough processor power is available, the decoder can decode fewer bands of each frame instead of simply dropping frames. This produces blurrier images but preserves the motion.

The scalability mode is controlled by the user during encoding. If this mode is disabled, the encoder acts like a conventional block-based transform codec: each of the three color planes is processed using the Slant transform, quantization and Huffman coding.

If the scalability mode is enabled, the encoder first performs a subband decomposition using the Discrete Wavelet Transform (DWT). Although each color plane could theoretically be decomposed, Indeo5 decomposes only the luminance plane. One level of this decomposition results in four wavelet bands, each one quarter the size of the original picture. These bands are then compressed using the Slant transform (band 3 is left untransformed), quantization and Huffman coding.

Wavelet transform

The wavelet used in Indeo5 for decomposition and recomposition is referred to as the CDF 5/3 or LeGall wavelet. It is also used, in slightly different forms, in other compression algorithms such as JPEG 2000 and Snow. The coefficients of the analysis filters (encoder) are:

h0 = {-1, 2, 6, 2, -1} * 1/8
h1 = {1, -2, 1} * 1/4

where "h0" is the low-pass filter and "h1" is the high-pass filter.

The coefficients for the synthesis filters (decoder) are:

h0 = {1, 2, 1} * 1/2
h1 = {1, 2, -6, 2, 1} * 1/4

where "h0" is the low-pass filter and "h1" is the high-pass filter.
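The two filter pairs above can be checked against the standard two-channel perfect-reconstruction conditions. G0(z)H0(z) + G1(z)H1(z) must be a pure delay with gain 2, because downsampling and upsampling contribute a factor of 1/2. G0(z)H0(-z) + G1(z)H1(-z) must vanish, which cancels aliasing. This Python sketch uses exact fractions and the coefficients exactly as listed. The helper names are illustrative.

```python
from fractions import Fraction as F

def conv(a, b):
    """Polynomial product (full convolution) of two coefficient lists."""
    out = [F(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def modulate(h):
    """H(z) -> H(-z): flip the sign of every odd-indexed tap."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(h)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

# Filters as given in the text.
H0 = [F(c, 8) for c in (-1, 2, 6, 2, -1)]  # analysis low-pass
H1 = [F(c, 4) for c in (1, -2, 1)]         # analysis high-pass
G0 = [F(c, 2) for c in (1, 2, 1)]          # synthesis low-pass
G1 = [F(c, 4) for c in (1, 2, -6, 2, 1)]   # synthesis high-pass

distortion = add(conv(G0, H0), conv(G1, H1))
aliasing = add(conv(G0, modulate(H0)), conv(G1, modulate(H1)))
print([str(c) for c in distortion])
print([str(c) for c in aliasing])
```

The distortion term comes out as 2 at index 3 and zero elsewhere, which is 2*z^-3. Every aliasing coefficient is zero, so these normalizations are mutually consistent.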

This wavelet transform has the following advantages:

- it allows an integer implementation

- a fast algorithm (lifting) exists

- it produces better quality images than the Haar wavelet used in Indeo 4 for the same purpose

- it allows the perfect reconstruction of the input signal
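The first, second and last points can be illustrated with an integer lifting implementation of the 5/3 wavelet. This Python sketch uses the JPEG 2000-style reversible lifting steps with symmetric extension at the signal edges. It is a stand-in chosen for illustration. The exact rounding and scaling Indeo5 uses are not described here, and the function names are hypothetical.

```python
def forward_53(x):
    """One level of reversible 5/3 lifting on an even-length list of ints."""
    if len(x) % 2:
        raise ValueError("even length required")
    n = len(x) // 2
    d = []  # predict step: high-pass (detail) coefficients
    for i in range(n):
        right = x[2 * i + 2] if i + 1 < n else x[2 * i]  # mirror at the right edge
        d.append(x[2 * i + 1] - (x[2 * i] + right) // 2)
    s = []  # update step: low-pass (smooth) coefficients
    for i in range(n):
        left = d[i - 1] if i > 0 else d[0]  # mirror at the left edge
        s.append(x[2 * i] + (left + d[i] + 2) // 4)
    return s, d

def inverse_53(s, d):
    """Undo forward_53 exactly by running the lifting steps backwards."""
    n = len(s)
    x = [0] * (2 * n)
    for i in range(n):  # undo the update step
        left = d[i - 1] if i > 0 else d[0]
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(n):  # undo the predict step
        right = x[2 * i + 2] if i + 1 < n else x[2 * i]
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x
```

Because each inverse step recomputes the same integer expression that the forward step added or subtracted, reconstruction is bit-exact despite the floor divisions. A constant signal produces all-zero detail coefficients.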

Wavelet bands

One level of the wavelet transform produces four wavelet bands, whose properties are summarized in the table below:

| band | name | dimensions | frequency components | transform |
|---|---|---|---|---|
| 0 | LL | width = pic_width/2, height = pic_height/2 | Low freqs in both horizontal and vertical directions | 2D Slant 8x8 |
| 1 | HL | width = pic_width/2, height = pic_height/2 | Low freqs in the horizontal direction, high freqs in the vertical direction | 1D Row Slant |
| 2 | LH | width = pic_width/2, height = pic_height/2 | High freqs in the horizontal direction, low freqs in the vertical direction | 1D Column Slant |
| 3 | HH | width = pic_width/2, height = pic_height/2 | High freqs in both horizontal and vertical directions | No transform |

The transform used for a particular band is chosen according to its frequency content. Low-frequency image components matter most for visual perception, so the transform is selected to process them more efficiently than the high-frequency ones. For example, band 0 is processed with the two-dimensional Slant transform because it contains low-frequency components in both the horizontal and vertical directions. Band 1 contains low-frequency components only in the horizontal direction, so the one-dimensional Slant transform is applied to each of the 8 rows of an 8x8 block. Similarly, band 2 applies the one-dimensional Slant transform to each of the 8 columns of an 8x8 block. Band 3 contains only high-frequency components in both directions, so no transform is applied to its data. This band is coded using quantization and entropy coding only.
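The band-to-transform mapping amounts to a dispatch over 8x8 blocks: 2D for band 0, rows only for band 1, columns only for band 2 and nothing for band 3. This Python sketch does not implement Indeo's Slant transform. An unnormalized 8-point Walsh-Hadamard transform (wht8) stands in for it, so only the row, column, 2D or no-transform structure is meaningful. All function names are hypothetical.

```python
def wht8(v):
    """Stand-in 1-D transform: unnormalized 8-point Walsh-Hadamard (NOT the Slant transform)."""
    v = list(v)
    h = 1
    while h < 8:
        for i in range(0, 8, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b  # butterfly
        h *= 2
    return v

def apply_rows(block, f):
    """Apply the 1-D transform f to each of the 8 rows."""
    return [f(r) for r in block]

def apply_cols(block, f):
    """Apply the 1-D transform f to each of the 8 columns."""
    cols = [f(list(c)) for c in zip(*block)]
    return [list(r) for r in zip(*cols)]

def transform_block(band, block, f=wht8):
    """Transform one 8x8 block according to its wavelet band."""
    if band == 0:
        return apply_cols(apply_rows(block, f), f)  # 2-D: rows, then columns
    if band == 1:
        return apply_rows(block, f)                 # 1-D row transform
    if band == 2:
        return apply_cols(block, f)                 # 1-D column transform
    if band == 3:
        return [list(r) for r in block]             # no transform
    raise ValueError("band must be 0..3")
```

For a flat block of ones, band 0 concentrates all energy in the top-left coefficient (64). Band 1 leaves one nonzero value at the start of every row, and band 2 leaves nonzero values only in the first row. Band 3 returns the block unchanged.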

Wavelet recomposition

The following section describes wavelet recomposition, the last stage of the Indeo5 decoder, which reconstructs an image from the wavelet bands. It receives up to four separate bands (labeled b0-b3) and generates recomposed plane data by performing a two-dimensional wavelet synthesis.
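A two-dimensional synthesis can be built from a one-dimensional inverse applied first along columns and then along rows. This Python sketch follows the band table. b0 (LL) and b1 (HL) are both horizontally low, so they are combined vertically into the horizontally-low half. b2 (LH) and b3 (HH) form the horizontally-high half. Indeo5's own recomposition arithmetic is not given in this section, so JPEG 2000-style reversible 5/3 lifting is used as a stand-in. All names are hypothetical.

```python
def inv53(s, d):
    """Inverse reversible 5/3 lifting (JPEG 2000 style) of one 1-D signal."""
    n = len(s)
    x = [0] * (2 * n)
    for i in range(n):
        left = d[i - 1] if i > 0 else d[0]  # symmetric extension on the left
        x[2 * i] = s[i] - (left + d[i] + 2) // 4
    for i in range(n):
        right = x[2 * i + 2] if i + 1 < n else x[2 * i]  # symmetric extension on the right
        x[2 * i + 1] = d[i] + (x[2 * i] + right) // 2
    return x

def synth_vertical(low, high):
    """Combine two h x w arrays (vertically low/high) into a 2h x w array."""
    w = len(low[0])
    cols = [inv53([r[c] for r in low], [r[c] for r in high]) for c in range(w)]
    return [list(row) for row in zip(*cols)]

def synth_horizontal(low, high):
    """Combine two H x w arrays (horizontally low/high) into an H x 2w array."""
    return [inv53(l, h) for l, h in zip(low, high)]

def recompose(b0, b1, b2, b3):
    """Rebuild a plane from the four bands b0 (LL), b1 (HL), b2 (LH), b3 (HH)."""
    low_h = synth_vertical(b0, b1)   # LL + HL: horizontally-low half
    high_h = synth_vertical(b2, b3)  # LH + HH: horizontally-high half
    return synth_horizontal(low_h, high_h)
```

The output is twice the band size in each direction. A constant LL band with empty detail bands reconstructs to a constant plane, which is the expected behaviour when only band 0 is decoded in a region without detail.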

Temporal scalability

To achieve temporal scalability, Indeo5 introduces special droppable frames. Their main advantage is that they can be skipped without corrupting the rest of the video sequence. If not enough processor power is available, the decoder can decode fewer frames and thus display the video at a reduced frame rate.
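A decoder can use frame_type from the picture header to decide what to skip when it falls behind. This Python sketch uses a hypothetical policy that drops every droppable frame (types 2 and 3) while behind schedule and decodes everything else. How a real player measures "behind schedule" is outside the scope of this document.

```python
# frame_type values marked as droppable in the picture header table
DROPPABLE = {2, 3}

def select_frames(frame_types, behind_schedule):
    """Return the indices of frames to decode under a simple drop policy."""
    return [
        i for i, t in enumerate(frame_types)
        if not (behind_schedule and t in DROPPABLE)
    ]
```

For the sequence INTRA, droppable, INTER, droppable, INTER, the overloaded case keeps only the INTRA and INTER frames.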

The 'plane_state' values come from plane parsing; how they relate to the data described above has not been worked out yet.

block_state4 is too complicated to explain here.

Block data

Follows the block header. One of these is present for each plane that has 'plane_flags & 1' set. The variable 'run' starts at -1 and carries over from one coded plane to the next. The VLC names below are tentative, but their functional description is accurate.

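| size | name | condition | nb times | comments |
|---|---|---|---|---|
| vlc | vlc | | while (vlc != vlcEnd) | |
| vlc | run_add | vlc == vlcEsc | | run += run_add + 1 |
| vlc | lindex_lo | vlc == vlcEsc | | lindex = lindex_lo \| (lindex_hi<<6) |
| vlc | lindex_hi | vlc == vlcEsc | | |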

If vlc != vlcEsc, then run_add = run_table[vlc] and lindex = lindex_table[vlc].

The values of vlcEnd and vlcEsc are variable, as is the VLC table itself. However, they are fixed for all planes in the same block. run_table, lindex_table and scan_table are also fixed per block, while level_tables is per plane.
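The loop described by the table can be sketched as follows. read_vlc is a hypothetical callable that returns the next decoded VLC symbol. How the resulting (run, lindex) pairs are mapped through scan_table and level_tables is not described here, so this Python sketch only produces the pairs. run is passed in and returned so it can carry over from one coded plane to the next.

```python
def decode_block_plane(read_vlc, vlc_end, vlc_esc, run_table, lindex_table, run=-1):
    """Return ([(run, lindex), ...], updated run) for one coded plane."""
    pairs = []
    vlc = read_vlc()
    while vlc != vlc_end:
        if vlc == vlc_esc:
            # escape: run_add and the two halves of lindex are coded explicitly
            run_add = read_vlc()
            lindex_lo = read_vlc()
            lindex_hi = read_vlc()
            lindex = lindex_lo | (lindex_hi << 6)
        else:
            run_add = run_table[vlc]
            lindex = lindex_table[vlc]
        run += run_add + 1
        pairs.append((run, lindex))
        vlc = read_vlc()
    return pairs, run
```

For example, with vlcEnd = 8, vlcEsc = 9, run_table = [0, 3, 1] and lindex_table = [10, 20, 30], the symbols 2, 9, 0, 5, 1, 8 yield (1, 30), then the escaped pair (2, 69), and leave run = 2 for the next plane.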

Annexes

Standard picture sizes

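| pic_size_id | width | height |
|---|---|---|
| 0 | 640 | 480 |
| 1 | 320 | 240 |
| 2 | 160 | 120 |
| 3 | 704 | 224 |
| 4 | 352 | 240 |
| 5 | 352 | 288 |
| 6 | 176 | 144 |
| 7 | 240 | 180 |
| 8 | 640 | 240 |
| 9 | 704 | 240 |
| 10 | 80 | 60 |
| 11 | 88 | 72 |
| 12 | 0 | 0 |
| 13 | 0 | 0 |
| 14 | 0 | 0 |
| 15 | custom | custom |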
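Resolving pic_size_id is a table lookup with one special case: 15 means the size is coded explicitly in pic_width and pic_height. This Python sketch copies the table above. Treating ids 12-14, which are listed as 0x0, as having no usable standard size is an assumption. The function name and the custom parameter are hypothetical.

```python
# (width, height) for pic_size_id 0..14, copied from the table above
STD_SIZES = [
    (640, 480), (320, 240), (160, 120), (704, 224), (352, 240),
    (352, 288), (176, 144), (240, 180), (640, 240), (704, 240),
    (80, 60), (88, 72), (0, 0), (0, 0), (0, 0),
]
CUSTOM_SIZE_ID = 15

def picture_size(pic_size_id, custom=None):
    """Return (width, height); custom holds (pic_width, pic_height) when id is 15."""
    if pic_size_id == CUSTOM_SIZE_ID:
        if custom is None:
            raise ValueError("pic_size_id 15 requires explicit pic_width/pic_height")
        return custom
    width, height = STD_SIZES[pic_size_id]
    if width == 0:
        raise ValueError("pic_size_id has no standard size")  # assumption for ids 12-14
    return width, height
```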

Band_info structure

This structure is part of the GOP header and describes a wavelet band. Its size is usually 6 bits but can be extended to 8 bits if the ext_trans field is present. The same structure is used to describe both luminance and chrominance bands.

| size | name | condition | value(s) | comments |
|---|---|---|---|---|
| 1 | mv_res | | 0 => fullpel; 1 => halfpel | Motion vector resolution. |
| 1 | mb_size_id | | 0 => double; 1 => single | Macroblock size factor. The actual macroblock size is mb_size = blk_size << !mb_size_id, where blk_size is the block size selected by blk_size_id (8 or 4). |
| 1 | blk_size_id | | 0 => 8x8; 1 => 4x4 | Block size ID. |
| 1 | trans_flg | | 0 => standard; 1 => non-standard | If this flag is set, the ext_trans field explicitly specifies the transform used to code this band. Otherwise the default transform is used. |