Appendix 1. Subjective Comparison and Psycho-visual Enhancements

Brief Description

In this work, we analyzed video codecs not only with the help of
objective metrics but also through subjective comparisons based on the
SAMVIQ (Subjective Assessment Method for Video Quality evaluation)
methodology. This new method was created by the EBU (European
Broadcasting Union).

Another task in this part of the comparison was to analyze the quality
of psycho-visual enhancement during encoding. We tested five video
codecs, and one codec was tested with two almost identical presets: one
without psycho-visual enhancement and another with psycho-visual
enhancement.

Methodology: SAMVIQ

Number of experts: 42

Number of sequences: 5

Number of codecs (presets): 6

SAMVIQ Description

During testing, each expert can play any sequence from the test set and
give it a mark, and can also play a reference video. Marks are in the
range of 0 to 100. More information about the methodology can be found
in the publication “SAMVIQ: A new EBU methodology for video quality
evaluations in multimedia” (Kozamernik, F., Steinmann, V., Sunna, P. and
Wyckens, E., SMPTE Journal, April 2005).

Subjective Assessment Description

Forty-two experts participated in this subjective assessment. The
experts represented a variety of individuals: males and females of ages
18 to 40 who are PC users or video and graphics experts. Each user
watched five video groups, with eight videos in each group. Each video
group contained six encoded videos and one hidden reference video (the
viewers were unaware of which type they were viewing), as well as a
reference video identified as such. Users gave each video a mark of 0 to
100, and the marks for all users and sequences were averaged. Some
extreme results were discarded before obtaining the final result.

This sequence is a fragment from the beginning of the movie Terminator 2.
The compression of this sequence is the most difficult among all of
the sequences in the analysis. This difficulty is mainly due to three
reasons: continual brightness variation (resulting from explosions and
laser flashes as seen in the picture above), very fast motion and
frequent scene changes. These characteristics often cause codecs to
compress frames as I-frames.

Results

For the “Football” sequence, x264 provides the highest quality. x264
with psycho-visual enhancement provides very similar (almost
indistinguishable) results. The codecs can be rated by visual quality as
follows:

Appendix 2. x264 Over-years Comparison

The quality of an H.264 codec, over several years, can be compared for a
given video sequence. The x264 encoder was chosen for this task because
it is present in almost every MSU VIDEO MPEG-4 AVC/H.264 codec
comparison, and it produces good results compared with other encoders.
The next figures show the position of the x264 codec compared with other
codecs for the “Battle” sequence. For all years except 2005, x264 shows
the best results. For years 2006–2009, we have shown results using
Y-SSIM as the quality metric; for 2005, we did not use this as the main
metric. In light of these results, x264 could be a good reference
encoder for analyzing the overall progress of H.264 encoders over time.

The next figure shows the RD curve for the “Battle” sequence using x264
encoders from different years. The best encoder is this year’s x264; the
worst is the 2005 version.

Using SSIM, the codecs can be ranked as follows:

x264 (2010)

x264 (2009)

x264 (2007)

x264 (2006)

x264 (2005)

These results are shown in the next figure. This figure indicates that
the overall progress is very good, and that the x264 encoder has
increased in speed and quality over recent years. However, the older
x264 versions do not use multithreading, so encoding speed varies
considerably between versions.

Interestingly, if Y-PSNR is used as the quality metric, x264 does not
exhibit the kind of progress shown in the previous figure. The reason is
that, in past years, the x264 developers supplied presets for testing
that were optimized for the SSIM metric. Also, note that no single x264
version produces the best results simultaneously for both SSIM and PSNR.

The bitrate handling mechanism of the x264 encoder is quite good in
every version, as the following figure indicates. The results for
previous x264 versions (0.98 of the target bitrate) could be explained
by a different interpretation of kbps (1,024 versus 1,000 bits per
second).
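The arithmetic behind this explanation can be sketched as follows. This
is only an illustration: the function name and the assumption about
which side uses which unit (encoder targets 1,000 bits per second per
kbps, measurement divides by 1,024) are hypothetical and are not taken
from the x264 sources or from the measurement tool.

```python
def measured_to_target_ratio(target_kbps, encoder_unit=1000, measurement_unit=1024):
    """Ratio of measured to target bitrate when the encoder and the
    measurement disagree on how many bits per second make up one kbps.

    Both unit values are assumptions made for this illustration.
    """
    produced_bps = target_kbps * encoder_unit          # what the encoder aims for
    measured_kbps = produced_bps / measurement_unit    # how the result is reported
    return measured_kbps / target_kbps


ratio = measured_to_target_ratio(1000)
print(round(ratio, 2))  # 0.98
```

The ratio is 1000/1024 regardless of the target bitrate, which rounds to
the 0.98 mentioned above.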

The per-frame analysis presented in the following figure shows that the
main encoding mechanism has not changed significantly.

Appendix 3. Theora vs. x264 Single-thread Comparison

Multithreaded encoding is not supported by the Theora encoder, but CPUs
with four cores are used for the tests. This situation leads to an
incorrect speed comparison between Theora and other codecs. To eliminate
this disparity, we have turned on the x264 encoder’s single-thread mode
and compared the results with those of Theora. The overall speed/quality
trade-off for the “Movies” and “HDTV” use cases is depicted in the next
figures.

The main conclusion is that even in single-thread mode, Theora is
slightly slower and produces much poorer quality than x264 does.

Appendix 4. Test Set of Video Sequences

Movie Sequences

This sequence is a fragment from the Ice Age 3 animated movie. This
movie has low-contrast portions and high-contrast portions, and it has
many types of motion: camera panning, slow motion and very fast motion.
Also, it has a scene with colors that differ completely from those of
other scenes. Small black letterboxes appear at the top and bottom of
the video.

This sequence is a fragment from the Indiana Jones movie. Compression of
this sequence is difficult for two main reasons: the presence of
low-contrast scenes and the high level of motion in different scenes.
Also, several scenes have very different types of motion, ranging from
almost static scenes with talking people to scenes with strong motion
(for example, the scene where stones fall).

This sequence is a fragment from the Enemy of the State movie. This
sequence includes outdoor scenes with strong motion at the beginning
when the bicyclist runs, as well as scenes with low motion and indoor
scenes with normal motion. This sequence has scenes with different
lighting conditions.

This sequence is a fragment from the Up animated movie. The sequence
contains low-contrast scenes with almost static brightness as well as
highly colorful scenes, and it contains many scenes that last only a few
frames, which results in quick scene changes.

HDTV Sequences

This sequence contains scenes with the camera panning over a landscape
view of the Amazon; also, some scene changes take place with the camera
panning again. The video frames consist of many edges and high-contrast,
sharp details, such as leaves and branches. This sequence was downloaded
from the Microsoft website:
http://www.microsoft.com/windows/windowsmedia/musicandvideo/hdvideo/contentshowcase.aspx

This sequence is a part of the Iron Man 2 movie trailer. It has
low-brightness scenes at the beginning, followed by a scene with very
fast motion and scenes with slow camera panning. Some frames are very
blurry. Also, the sequence contains scenes with flashing lights, which
could be a big problem for some encoders. Black letterboxes appear at
the top and bottom of the video.

This close-up sequence is similar to “Mobile&Calendar” and includes a
moving calendar with text and a detailed photo of the Vasa ship. It also
includes a moving train with colorful toys. The background has two types
of wallpaper: one is brown with details, and the other is yellow with
drawn figures. The sequence is very detailed and is normally demanding.
The main potential compression difficulty is the many small, sharp
details on the calendar and on the background.

This sequence is a fragment of the “Troy” movie and contains three parts
with sharp scene changes. The video includes medium scene motion and
slow camera motion. This sequence is difficult to compress because of
its many small details.

Appendix 5. Tested Codecs and Presets

Codecs

DivX AVC/H.264 Video Encoder

Console encoding program version 1.1.1.6

We chose the presets ourselves to meet the comparison requirements

Remarks: Owing to our choice of presets, the results for the DivX
H.264 encoder could be slightly diminished compared with the case where
the developers provide the presets.

Elecard AVC Video Encoder 8-bit edition

Console encoding program version 2.1.022202.091207

Codec and presets were provided by Elecard Ltd Company specifically for
this test

Intel® MediaSDK AVC/H.264

Console encoding program, version 1.10.1.15

Codec and presets were provided by Intel Corp. specifically for this
test

MainConcept AVC/H.264 Video Encoder Console Application

Console encoding version 8.5.0

Codec and presets were provided by MainConcept AG Company specifically
for this test

Microsoft Expression Encoder 3

GUI encoding program

We chose the presets ourselves for the analysis

Remarks: Owing to a long initial GUI loading time, the encoding time
for Microsoft Expression Encoder is significantly higher than for other
encoders. Unfortunately, we were unable to encode multiple files without
the GUI loading each time. This is likely one of the reasons for the
encoder’s poor time results.

Codec and presets were provided by developers specifically for this test

Remarks: The presets provided by the developers for this comparison
were specifically chosen for the SSIM metric

Presets

The table below lists the settings used in this comparison for all of
the codecs.

Appendix 6. Figures Explanation

The main charts in this comparison are classical RD curves
(quality/bitrate graphs) and relative bitrate/relative time charts.
Bitrate handling charts (ratio of real to target bitrates) and per-frame
quality charts were also used.

RD curves

These charts show the variation in codec quality by bitrate or file
size. For this metric, a higher curve presumably indicates better
quality.

Relative Bitrate/Relative Time Charts

Relative bitrate/relative time charts show how the average bitrate for a
fixed output quality depends on the relative encoding time. The Y-axis
shows the ratio of the bitrate of the codec under test to that of the
reference codec for a fixed quality. A lower value (which is plotted
higher on the graph) indicates a better-performing codec. For example, a
value of 0.7 means that the codec under test can encode the sequence
under test in a file that is 30% smaller than that encoded by the
reference codec.

The X-axis shows the relative encoding time for the codec under test.
Larger values indicate a slower codec. For example, a value of 2.5 means
that the codec under test works 2.5 times slower, on average, than the
reference codec.

Graph Example

The following figures show a case where these graphs can be useful. In
the top left graph, it is apparent that the “Green” codec encodes with
significantly better quality than the “Black” codec. On the other hand,
the top right graph shows that the “Green” codec is slightly slower.
Relative bitrate/relative time graphs can be useful in precisely these
situations: it is clearly visible in the bottom graph that one of the
codecs is slower, but yields higher visual quality, and that the other
codec is faster, but yields lower visual quality.

As a result of these advantages, relative bitrate/relative time graphs
are used frequently in this report since they assist in the evaluation
of the codecs in the test set, especially when the number of codecs is
large.

A more detailed description of the preparation of these graphs is given
below.

Bitrates Ratio with the Same Quality

The first step in computing the average bitrate ratio for a fixed
quality is inversion of the axes of the bitrate/quality graph (see next
figure). All further computations are performed using the inverted
graph.

The second step involves choosing the interval on the quality axis over
which averaging is performed. Averaging is performed only over those
segments for which there are results for both codecs. This limitation is
due to the difficulty of developing extrapolation methods for classic RD
curves; nevertheless, for interpolation of RD curves, even linear
methods are acceptable.

The final step is calculation of the area under the curves in the chosen
interpolation segment and determination of their ratio (see next
figure). This result is an average bitrate ratio for a fixed quality for
the two codecs. If more than two codecs are considered, then one of them
is defined as a reference codec and each of the others is compared to
the reference.
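The three steps above can be sketched in Python as follows. This is a
minimal illustration under stated assumptions, not the code used to
produce the report: the function name, the (bitrate, quality) input
format, the number of sampling points and the use of NumPy are all
choices made here for the example. Linear interpolation is used because
the text describes it as acceptable.

```python
import numpy as np


def average_bitrate_ratio(rd_test, rd_ref, samples=1000):
    """Average bitrate ratio (test / reference) for a fixed quality.

    rd_test, rd_ref: lists of (bitrate, quality) points of one RD curve.
    The input format and sample count are assumptions for this sketch.
    """
    def inverted(rd):
        # Step 1: invert the axes, so bitrate becomes a function of quality.
        points = sorted(rd, key=lambda p: p[1])
        quality = np.array([p[1] for p in points], dtype=float)
        bitrate = np.array([p[0] for p in points], dtype=float)
        return quality, bitrate

    q_test, b_test = inverted(rd_test)
    q_ref, b_ref = inverted(rd_ref)

    # Step 2: keep only the quality interval covered by both codecs
    # (no extrapolation beyond measured points).
    low = max(q_test[0], q_ref[0])
    high = min(q_test[-1], q_ref[-1])
    if low >= high:
        raise ValueError("RD curves have no common quality interval")

    # Step 3: areas under the linearly interpolated curves, then their ratio.
    q = np.linspace(low, high, samples)

    def area(quality, bitrate):
        y = np.interp(q, quality, bitrate)
        return float(np.sum((y[1:] + y[:-1]) * np.diff(q)) / 2.0)

    return area(q_test, b_test) / area(q_ref, b_ref)


# Hypothetical data: the test codec needs 30% less bitrate at every quality.
reference = [(1000, 0.90), (2000, 0.95), (3000, 0.97)]
test = [(700, 0.90), (1400, 0.95), (2100, 0.97)]
print(round(average_bitrate_ratio(test, reference), 3))  # 0.7
```

A result below 1 means the codec under test needs less bitrate than the
reference codec for the same quality.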

Relative Codec Encoding Time Computation

To compute the relative processing time of two codecs for a particular
video sequence, the encoding time is calculated for both codecs (the
encoding times are summed for all bitrates) and the ratio is taken. For
three or more codecs, one codec is chosen as the reference, and the
encoding time of each other codec is expressed relative to it.

For multiple sequences, each codec is assigned the arithmetic mean of
its relative encoding times over all sequences.
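The computation can be sketched as follows. The function names, the
dictionary layout and the timing values are hypothetical. The ratio is
taken as test time divided by reference time, which is an assumption
chosen to match the X-axis description above (a value of 2.5 means the
codec under test is 2.5 times slower).

```python
def relative_time(times_test, times_ref):
    """Relative encoding time for one sequence: encoding times are summed
    over all bitrates, then the ratio test / reference is taken."""
    return sum(times_test) / sum(times_ref)


def mean_relative_time(times_by_sequence, codec, reference):
    """Arithmetic mean of the per-sequence relative encoding times.

    times_by_sequence maps a sequence name to a dict that maps a codec
    name to its list of encoding times (one per bitrate). This layout is
    an assumption made for the example.
    """
    ratios = [
        relative_time(times[codec], times[reference])
        for times in times_by_sequence.values()
    ]
    return sum(ratios) / len(ratios)


# Hypothetical encoding times in seconds, two bitrates per sequence.
times = {
    "sequence A": {"reference": [10.0, 20.0], "test": [25.0, 50.0]},
    "sequence B": {"reference": [10.0, 10.0], "test": [15.0, 15.0]},
}
print(mean_relative_time(times, "test", "reference"))  # 2.0
```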

The scheme of SSIM calculation can be presented as follows. The main
idea that underlies the structural similarity (SSIM) index is comparison
of the distortion of three image components:

Luminance

Contrast

Structure

The final formula, after combining these comparisons, is the following:

L is the dynamic range of the pixel values (255 for 8-bit grayscale
images), and K1, K2 << 1.
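For reference, the SSIM index as it is usually written in the literature
(Wang, Bovik, Sheikh and Simoncelli) is given below. The symbols for the
local means, variances and covariance, and the constants C1 and C2, are
the conventional notation and are an assumption about the notation used
in this report; only L, K1 and K2 appear in the text here.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)},
\qquad
C_1 = (K_1 L)^2, \quad C_2 = (K_2 L)^2
```

Here \mu_x and \mu_y are the local means of the two images, \sigma_x^2
and \sigma_y^2 their local variances, and \sigma_{xy} their local
covariance.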

The values K1 = 0.01 and K2 = 0.03 were used for
the comparison presented in this report, and a matrix filled with the
value “1” in each position was used as the filter for the result map.

For the implementation used in this comparison, one SSIM value
corresponds to two sequences. The value is in the range [-1, 1], with
higher values being more desirable (a value of 1 corresponds to
identical frames). One of the advantages of the SSIM metric is that it
better represents human visual perception than does PSNR. SSIM is more
complex, however, and takes more time to calculate.
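A minimal sketch of such a calculation for one pair of Y-plane frames is
shown below. It uses the standard SSIM definition with K1 = 0.01,
K2 = 0.03 and a uniform filter (all weights equal), as described above.
The 8x8 window size, the function name and the use of NumPy are
assumptions made for this example; it is not the implementation used in
the MSU Video Quality Measurement Tool.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim_index(x, y, window=8, K1=0.01, K2=0.03, L=255.0):
    """Mean SSIM of two equally sized grayscale frames.

    The window size of 8 is an assumption; the text only states that the
    filter is uniform. Local statistics are computed over every
    window x window block, and the resulting SSIM map is averaged.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.ndim != 2 or x.shape != y.shape:
        raise ValueError("frames must be 2-D arrays of the same shape")

    C1 = (K1 * L) ** 2
    C2 = (K2 * L) ** 2

    wx = sliding_window_view(x, (window, window))
    wy = sliding_window_view(y, (window, window))

    # Uniform filter: plain averages over each window.
    mu_x = wx.mean(axis=(-2, -1))
    mu_y = wy.mean(axis=(-2, -1))
    var_x = (wx * wx).mean(axis=(-2, -1)) - mu_x ** 2
    var_y = (wy * wy).mean(axis=(-2, -1)) - mu_y ** 2
    cov_xy = (wx * wy).mean(axis=(-2, -1)) - mu_x * mu_y

    ssim_map = ((2 * mu_x * mu_y + C1) * (2 * cov_xy + C2)) / (
        (mu_x ** 2 + mu_y ** 2 + C1) * (var_x + var_y + C2)
    )
    return float(ssim_map.mean())


# Synthetic frames, used only to exercise the function.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
noisy = np.clip(frame + rng.normal(0.0, 20.0, frame.shape), 0.0, 255.0)
print(ssim_index(frame, frame))  # identical frames give 1.0 (up to rounding)
print(ssim_index(frame, noisy))  # distorted frame gives a value below 1
```

The sketch favors clarity over speed and memory use; a production
implementation would typically compute the local averages with a
separable or integral-image filter.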

Examples

The following is an example of an SSIM result for an original and
processed (compressed with lossy compression) image. The resulting value
of 0.9 demonstrates that the two images are very similar.

The following are more examples of how various types of distortion
influence the SSIM value.

The SSIM values for the Y-plane for these images are given below.

PSNR (Peak Signal-to-Noise Ratio)

Brief Description

This metric, which is often used in practice, is called the peak
signal-to-noise ratio, or PSNR.

Generally, this metric has the same form as the mean square error (MSE),
but it is more convenient to use because of the logarithmic scale. It
still has the same disadvantages as the MSE metric, however.

In the MSU Video Quality Measurement Tool, the PSNR can be calculated
for all YUV and RGB components and for the L component of the LUV color
space. The PSNR value is quick and easy to calculate, but it sometimes
corresponds poorly to human visual perception.

A maximum deviation of 255 is used for the PSNR for the RGB and YUV
color components because, in YUV files, there is 1 byte for each color
component. The maximum possible difference, therefore, is 255. For the
LUV color space, the maximum deviation is 100.

The values of the PSNR in the LUV color space are in the range [0,
100]; the value 100 means that the frames are identical.
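A minimal sketch of the usual PSNR calculation is given below, using the
common definition 10 * log10(MaxErr^2 / MSE) with a maximum deviation of
255 for 8-bit components. The function name and the use of NumPy are
assumptions made for this example. Returning 100 for identical frames
follows the convention stated above for the LUV color space; applying
the same convention to other components is an assumption.

```python
import numpy as np


def psnr(reference, processed, max_value=255.0, identical_value=100.0):
    """PSNR in dB between two equally sized frames (one color component).

    max_value is the maximum possible deviation (255 for 8-bit
    components). identical_value is returned when the frames are
    identical, because the logarithm is undefined for a zero error; the
    value 100 is taken from the text and is otherwise an assumption.
    """
    ref = np.asarray(reference, dtype=np.float64)
    proc = np.asarray(processed, dtype=np.float64)
    if ref.shape != proc.shape:
        raise ValueError("frames must have the same shape")

    mse = float(np.mean((ref - proc) ** 2))  # mean square error
    if mse == 0.0:
        return identical_value
    return float(10.0 * np.log10(max_value ** 2 / mse))


frame = np.zeros((4, 4))
print(psnr(frame, frame))                 # 100.0 (identical frames)
print(round(psnr(frame, frame + 1), 2))   # 48.13 (error of 1 in every pixel)
```

Because the PSNR is a logarithmic function of the MSE, it ranks
distortions in the same order as the MSE does, which is why it shares
the disadvantages of that metric.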

The following are further examples demonstrating how various distortions
can influence the PSNR value.

Next are the PSNR values for the Y-plane for these images.

Appendix 8. VP8, x264 and XviD Comparison

This appendix includes the results of comparing the VP8 encoder with
x264 and XviD. The VP8 encoder does not meet our speed requirements for
this comparison, but we have included this codec because of many user
requests. We would like to express our gratitude to the VP8 developers
for providing the codec and presets and for their help with preset
tuning.

RD-curves

Movies

HDTV

Bitrate handling

Movies

Bitrate handling for the VP8 encoder for movies is very good. Here are
the results for Movies.

HDTV

Bitrate handling for the VP8 encoder for HDTV is quite good, except for
the Troy sequence at low bitrates. Here are the results for HDTV.

Speed/Quality trade-off

Movies

Compared with XviD, VP8 is 5-25 times slower with 10-30% better
quality (lower bitrate for the same quality). Compared with x264, VP8
also shows 5-25 times lower encoding speed, with 20-30% lower quality
on average. For example, the x264 High-Speed preset is faster and has
higher quality than any of the VP8 presets on average.

HDTV

Compared with XviD, VP8 is 5-20 times slower with 10-20% better
quality (lower bitrate for the same quality). Compared with x264, VP8
shows 5-20 times lower encoding speed with almost the same quality,
excluding the x264 High-Quality preset. Here are the results for HDTV.

The same as good1, but --best instead of --good and without --cpu-used

Comments from VP8 Developers

We’ve been following the MSU tests since they began and respect the
group’s work. One issue we noticed in the test is that most input
sequences were previously compressed using other codecs. These sequences
have an inherent bias against VP8 in recompression tests. As pointed out
by other developers, H.264 and MPEG-like encoders have slight advantages
in reproducing some of their own typical artifacts, which helps their
objective measurement numbers but not necessarily visual quality. This
is reflected by relatively better results for VP8 on the only
uncompressed input sequence, “mobile calendar.”

Even with this limitation, VP8 delivered respectable results against
other encoders, especially considering this is the first time VP8 has
been included in the test and VP8 has not been specifically optimized
for SSIM as some other codecs have.

To date, WebM developers have focused on the VP8 decoder performance and
are only starting to optimize the encoder for speed. The WebM project
has only been underway for three weeks, and we believe that our encoder
speed will improve significantly in the near future.