Revision as of 11:02, 25 March 2009

This page is part of the XiphWiki, and is aimed at people developing file
formats and associated software for Ambisonics. For a general introduction
to Ambisonics, please go to the
Wikipedia page on Ambisonics.

Ambisonics is a surround sound system first developed in the 1970s.
Its main difference from other surround techniques is that it separates
transmission channels from speaker feeds, the speaker feeds being derived
using a decoder situated in the living room. Decoders can be implemented
in either hardware or software. Typically more speakers are used than
transmission channels, and the more speakers used, the more stable the
resulting soundfield. Speakers can be arranged in a number of configurations,
regular polygons being the most popular.

Ambisonic files can come in a number of different formats. The main one is
called B-Format, the other formats being derived from this. UHJ format is
mono- and stereo-compatible. G-Format is a set of speaker feeds, so it can be
enjoyed in surround sound without the need for a decoder in the living room.

Ambisonics and 5.1

Ambisonics and conventional 5.1 surround sound are very different. 5.1 is a
set of speaker feeds, the signal only being fully defined for sounds coming
from a speaker. Phantom images between speakers can be created, but the
technique to do so is left unspecified. Many 5.1 releases use pair-wise
mixing to create phantom images. This is understandable as almost all
stereo recordings are mixed using pair-wise mixing.

Pair-wise mixing is also called "pan-potting", "amplitude mixing" and
"intensity stereophony". It mixes signals into the feeds for a pair of
speakers to create the illusion that a sound is coming from a point
somewhere between the speakers. During mixing, the apparent location of
each sound is determined only by the relative amplitude of that sound in
the two speakers.
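
As a minimal sketch of pair-wise mixing, the function below pans a mono sample between two speaker feeds. A constant-power (sine/cosine) pan law is assumed here purely for illustration, and the name `pan_pair` is hypothetical; the law actually used varies from one mix to another.

```python
import math

def pan_pair(sample, position):
    """Pan a mono sample between a pair of speakers.

    position runs from 0.0 (entirely in the first speaker) to 1.0
    (entirely in the second). A constant-power law is assumed.
    """
    angle = position * math.pi / 2.0
    return sample * math.cos(angle), sample * math.sin(angle)

# A centred phantom image: equal amplitude in both feeds.
left, right = pan_pair(1.0, 0.5)
```

Note that the only thing distinguishing one position from another is the ratio of the two amplitudes; no other speaker takes part.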

Unfortunately, pair-wise mixing works poorly when the speakers are to the
rear of the listener and not at all when they are to one side. You can
demonstrate this for yourself by performing a very simple experiment.
Pair-wise mixing did not work in the quadraphonic era and it will not work
now. Such an absolute statement can be made because the way that humans
localise sound has not changed.

Ambisonics is fundamentally different from 5.1. What is encoded in
Ambisonics is not speaker feeds, but direction. When mixing in
Ambisonics, the positions of the speakers are unknown
and are of no interest. Further, when Ambisonics is decoded to speaker
feeds, all of the speakers cooperate to localise a sound in its correct
position. So, for example, when the speakers on the left push, those on the
right pull. The speakers all contribute to the creation of a single coherent
soundfield.

Ambisonics to 5.1

Converting Ambisonics to 5.1 is straightforward, and is discussed below
(see G-Format).

5.1 to Ambisonics

Converting 5.1 to Ambisonics is more difficult. It is easy to make the
five speaker feeds phantom images, called "virtual speakers". (The ".1"
channel can be folded into W.) The problem with this is that even if the
Ambisonic rendering is perfect, the result will only be as good as the
original 5.1 played through real speakers. It will not be an
improvement. Nobody has yet come up with a way for Ambisonics to improve
5.1; 5.1 is simply too broken.

B-Format

B-Format is a single coherent soundfield composed of a set of related
channels. The number of channels used depends on whether the soundfield
is horizontal-only or full-sphere, and on the order. These B-Format
channels are transmission channels, not speaker feeds. Listening to
B-Format requires a decoder in your living room. Some numbers of
channels are tabulated below.

Channel correlation

The correlation between B-Format channels depends on the content.
Four-channel B-Format consists of an
omni-directional component, called W, and three figure-of-eight
components pointing forward, left and up, called X, Y, Z.
(Pictures are available.)
Three-channel, horizontal-only B-Format simply omits the Z channel. Because
W is omni-directional, everything appears in W: anything in X also appears in
W, and the same holds for Y and Z. Also, if content comes from
Front-Left then it appears equally in X and Y. Same for content from
Front-Right, Back-Left, Back-Right; only the relative polarities change.
So there can be a lot of correlation between B-Format channels, but it is
content dependent.
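
The channel relationships described above can be illustrated by encoding a single mono sample into first-order B-Format. This is a sketch under stated assumptions: the function name `encode_b_format` is hypothetical, azimuth is taken as anticlockwise from straight ahead (so that Y points left), and W is given a gain of 1/sqrt(2), which is the convention implied by Mono = W*sqrt(2).

```python
import math

def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample into first-order B-Format (W, X, Y, Z).

    Assumed conventions: azimuth anticlockwise from front (+90 is left),
    elevation upwards, and a 1/sqrt(2) gain on W.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z

# The four diagonal directions: |X| equals |Y|, only the polarities differ.
diagonals = {"front-left": 45, "front-right": -45,
             "back-left": 135, "back-right": -135}
for name, azimuth in diagonals.items():
    w, x, y, z = encode_b_format(1.0, azimuth)
    print(f"{name:12s} W={w:+.3f} X={x:+.3f} Y={y:+.3f}")
```

W is the same in every row, while the signs of X and Y identify the quadrant.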

One problem with B-Format is that it relies heavily on low-frequency phase. The
phase relationships between the different B-Format channels are important
if the resulting soundfield is to correctly "gel". This may be a problem
when B-Format channels are compressed using lossy compression.

There is a file specification in use for downloadable B-Format files
called the
".amb" specification.

Limitations of the ".amb" specification

The ".amb" specification
for downloadable B-Format files is based on the WAVE-EX format. There are
currently over 100 pieces available in this format
for free download. Most of these are
first-order full-sphere soundfields. (The same website also has details of
ad hoc software decoders.)
Some of the limitations of the specification are:

It is limited to 4 GByte files (2 GBytes if somebody screwed up).

It is limited to third-order soundfields and below. While third-order looks like a lot (16 channels), there already exists a prototype mic that can record up to fourth-order (25 channels).

No compression (particularly lossless).

The ".amb" file specification is limited to third-order and below because
it uses the number of channels to uniquely define the soundfield order.
Unfortunately this simple and elegant scheme does not
work above third-order as ambiguities creep in. (One ambiguity is
illustrated in the table below.)

A more general file format will have to use something else, such as
Malham notation, or storing both the horizontal-order and
height-order. There is a one-to-one correspondence between Malham notation
and the pair of orders, and either can generate the number of channels.

Malham notation

Malham notation specifies the order of a B-Format soundfield using a
string of characters, each character being either f (for full-sphere)
or h (for horizontal). The first character in the string specifies
the type of the first-order components, the second character the type of
the second-order components, etc.

Horizontal order  Height order  Soundfield type  Malham notation  Number of channels  Channels
1                 0             horizontal       h                3                   WXY
1                 1             full-sphere      f                4                   WXYZ
2                 0             horizontal       hh               5                   WXYUV
2                 1             mixed-order      fh               6                   WXYZUV
2                 2             full-sphere      ff               9                   WXYZRSTUV
3                 0             horizontal       hhh              7                   WXYUVPQ
3                 1             mixed-order      fhh              8                   WXYZUVPQ
3                 2             mixed-order      ffh              11                  WXYZRSTUVPQ
3                 3             full-sphere      fff              16                  WXYZRSTUVKLMNOPQ
4                 0             horizontal       hhhh             9                   (extra channels unlabeled)
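
The correspondence between Malham notation, the pair of orders and the number of channels can be sketched as follows. The function names are hypothetical, and the code assumes that full-sphere orders (f) always precede horizontal-only orders (h), as they do in every row of the table. The channel count relies on a full-sphere order m adding 2m+1 components and a horizontal-only order adding 2.

```python
import re

def parse_malham(notation):
    """Return (horizontal_order, height_order, channel_count)."""
    # Assumption: all f characters come before all h characters.
    if not notation or not re.fullmatch(r"f*h*", notation):
        raise ValueError(f"not a valid Malham string: {notation!r}")
    horizontal_order = len(notation)
    height_order = notation.count("f")
    channels = 1  # W
    for order, kind in enumerate(notation, start=1):
        channels += (2 * order + 1) if kind == "f" else 2
    return horizontal_order, height_order, channels

def to_malham(horizontal_order, height_order):
    """Build the Malham string from the pair of orders."""
    return "f" * height_order + "h" * (horizontal_order - height_order)

# The ambiguity that rules out identifying the order from the channel count alone:
print(parse_malham("ff"))    # second-order full-sphere
print(parse_malham("hhhh"))  # fourth-order horizontal
```

Both calls report nine channels, which is why a more general format needs to store the notation or the pair of orders explicitly.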

Default channel conversions from B-Format

Converting a B-Format file to a mono file is straightforward. Use Mono =
W*sqrt(2).

Converting a B-Format file to a stereo file is more difficult. The "proper"
way to do this is to convert the W,X,Y channels to two-channel UHJ.
Unfortunately this requires the use of wide-band 90-degree phase shifters.
In the digital domain these are usually implemented as convolution filters.

Assuming 90-degree phase shifters are unavailable then the problem is one of
choice. Starting from B-Format, it is possible to synthesize any mic
response pointing in any direction. Hence, it is possible to synthesize
all coincident stereo mic techniques. Two popular stereo techniques are
Blumlein Mid-Side and Blumlein Crossed Pair.

Blumlein Crossed Pair

Which conversion to stereo is better depends on the material and how it was
recorded. A good suggestion is to not specify a particular default
channel conversion; instead, simply specify that there must be one. If one
has to be specified then Blumlein Crossed Pair is the simpler.
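
As a hedged sketch of these default conversions, the code below derives mono from W and synthesizes a Blumlein Crossed Pair from X and Y. It assumes the usual description of a crossed pair as two figure-of-eight microphones at +45 and -45 degrees, with azimuth measured anticlockwise from the front; the function names are hypothetical and no gain normalisation beyond that is attempted.

```python
import math

SQRT2 = math.sqrt(2.0)

def to_mono(w):
    """Mono = W*sqrt(2), as given in the text."""
    return w * SQRT2

def virtual_figure_eight(x, y, azimuth_deg):
    """Figure-of-eight microphone pointing at azimuth_deg in the horizontal plane."""
    az = math.radians(azimuth_deg)
    return x * math.cos(az) + y * math.sin(az)

def to_crossed_pair(x, y):
    """Stereo from two figure-of-eights at +45 (left) and -45 (right) degrees."""
    left = virtual_figure_eight(x, y, 45.0)    # equals (X + Y) / sqrt(2)
    right = virtual_figure_eight(x, y, -45.0)  # equals (X - Y) / sqrt(2)
    return left, right

# A unit source at front-left (45 degrees) has X = Y = cos(45 degrees).
left, right = to_crossed_pair(math.cos(math.radians(45)), math.sin(math.radians(45)))
```

Here the source lies on the axis of the left microphone and in the null of the right one, so it appears only in the left channel. Neither conversion needs a phase shifter, only a weighted sum of channels per sample.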

UHJ format

B-Format is the main format for Ambisonic files. However, B-Format is
not mono- or stereo-compatible. This is why the UHJ hierarchical system
was developed. Depending on the number of channels available, the UHJ
system can carry more or less information, but at all times it is fully
mono- and stereo-compatible. Up to four channels (Left, Right, T, Q) may
be used. The T-channel can also be band-limited but, as this
"2½-channel UHJ" was only ever used for FM radio transmission, it
will not be discussed further.

To listen to UHJ files in surround requires a decoder in your living room.
Also, UHJ is restricted to first-order soundfields, either horizontal (two-
and three-channel UHJ) or full-sphere (four-channel UHJ).

Converting B-Format channels to UHJ channels, and vice versa, requires the
use of wide-band 90-degree phase shifters. In the digital domain these
are usually implemented as convolution filters. Conversion between
four-channel B-Format (W, X, Y, Z) and four-channel UHJ (Left, Right, T,
Q) can be accomplished without loss of information. The same with
three-channel to three-channel (W, X, Y) <=> (Left, Right, T). It is
possible to recover three-channel B-Format (W, X, Y) from two-channel UHJ
(Left, Right), but not without loss. It is also important for the Ambisonic
decoder to be aware that the B-Format channels were recovered from
two-channel UHJ (because of the need to apply different shelf filters).
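
To illustrate what a convolution-based 90-degree phase shifter looks like, here is a minimal sketch of a windowed FIR Hilbert transformer. The filter length, the Hamming window and the function names are all assumptions chosen for brevity; this is not the filter used by any particular UHJ encoder or decoder, and the UHJ matrix coefficients themselves are not shown.

```python
import math

def hilbert_taps(half_len=31):
    """Hamming-windowed FIR taps approximating a 90-degree phase shifter."""
    taps = []
    for k in range(-half_len, half_len + 1):
        # The ideal impulse response is 2/(pi*k) for odd k and zero for even k.
        ideal = 2.0 / (math.pi * k) if k % 2 else 0.0
        window = 0.54 + 0.46 * math.cos(math.pi * k / half_len)
        taps.append(ideal * window)
    return taps

def phase_shift_90(signal, taps):
    """Convolve signal with taps, centred so the output is not delayed."""
    half_len = len(taps) // 2
    shifted = []
    for n in range(len(signal)):
        acc = 0.0
        for i, tap in enumerate(taps):
            m = n - (i - half_len)
            if 0 <= m < len(signal):
                acc += tap * signal[m]
        shifted.append(acc)
    return shifted

# A cosine goes in; away from the edges, approximately a sine comes out.
omega = math.pi / 2.0
cosine = [math.cos(omega * n) for n in range(200)]
shifted = phase_shift_90(cosine, hilbert_taps())
```

A short filter like this one is only accurate in the middle of the band; covering the low frequencies that matter for Ambisonics takes a much longer filter, which is one reason the conversion is costly.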

Several hundred
two-channel UHJ LPs and CDs
have been released. Three- and four-channel UHJ recordings have never been
commercially released.

There is a file specification for downloadable two-channel UHJ files
called the
".uhj" specification, but it is not currently in use.

Limitations of the ".uhj" specification

The ".uhj" specification
for downloadable two-channel UHJ files is based on the WAVE or WAVE-EX
format. A UHJ chunk is added to the file to indicate it is UHJ. As
unrecognized chunks are always skipped, use of this chunk maintains stereo
compatibility. Some of the limitations of the specification are:

It is limited to 4 GByte files (2 GBytes if somebody screwed up).

It is limited to two-channel UHJ files. Three- and four-channel UHJ are not accommodated.

No compression.

The ".uhj" specification is only defined for two-channel UHJ to maintain
stereo compatibility. While it would be possible to add the UHJ chunk to
three- and four-channel WAVE-EX files, the recommendation from Microsoft
for playing such files is that the audio device should render the extra
channels to output ports not in use. This can happen even when the extra
channels are masked off. (Put simply, in WAVE-EX files the channel mask
does not mask channels.) Because of this, three- and four-channel
WAVE-EX files cannot be made stereo compatible.

In the Xiph world, it should be possible to use default channel conversions
to ensure that three- and four-channel UHJ files remain stereo compatible.

Default channel conversions from UHJ

Converting a UHJ file to a stereo file is even easier. Use Left = Left, Right = Right, and discard T and Q if present.

G-Format

A G-Format file is a common multi-channel surround file containing an
Ambisonic soundfield pre-decoded to its speaker feeds. This allows
listeners who do not own an Ambisonic decoder to enjoy Ambisonics.

The sound engineer creates a set of speaker feeds for a particular number
and arrangement of speakers. This is typically four speakers arranged in
a square. Other speaker arrangements are also possible.

In Ambisonics, all speakers cooperate to localise sounds in any particular
direction; there are no "surround speakers" as such. Because of this, best
results when playing G-Format recordings (and Ambisonics in general) are
obtained when the speakers are matched. The easiest way to accomplish this
is to use identical speakers. Unfortunately, many home theatre systems
include a centre-front speaker which is different from the other speakers.

An easy way to cope with this is adopted on G-Format recordings commercially
released on DVD-A by Nimbus Records.
They use four speakers in a square, the centre-front speaker being unused.
If a centre-front speaker is used, it should be fed at a very low level;
centre-front = 0.1*X has been used successfully for movies.

G-Format files can also contain conversion coefficients to recover the
original B-Format channels. The recovered B-Format channels can then be
fed to a decoder in the listener's living room, and so accommodate a
speaker arrangement different from the one used when the G-Format file
was produced. Each B-Format channel is recovered using a weighted
combination of the speaker feeds in the G-Format file. Obviously, if a
B-Format version of the file exists then it can be fed to the decoder
directly without the need for G-Format.
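
The round trip from B-Format to speaker feeds and back can be sketched for a square layout. The decode below is a basic horizontal decode with no shelf filters, assumed for illustration only; it is not the decode used on any commercial release, and the recovery weights are not taken from any file specification. Azimuths are measured anticlockwise from the front, and all names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)
# Square layout: left-front, left-back, right-back, right-front.
SQUARE = [45.0, 135.0, -135.0, -45.0]

def decode_to_feeds(w, x, y, azimuths=SQUARE):
    """Basic horizontal decode of (W, X, Y) to one feed per speaker."""
    n = len(azimuths)
    feeds = []
    for az_deg in azimuths:
        az = math.radians(az_deg)
        directional = x * math.cos(az) + y * math.sin(az)
        feeds.append((SQRT2 * w + 2.0 * directional) / n)
    return feeds

def recover_b_format(feeds, azimuths=SQUARE):
    """Recover (W, X, Y) as weighted sums of the speaker feeds."""
    w = sum(feeds) / SQRT2
    x = sum(f * math.cos(math.radians(a)) for f, a in zip(feeds, azimuths))
    y = sum(f * math.sin(math.radians(a)) for f, a in zip(feeds, azimuths))
    return w, x, y

# A unit source hard left: W = 1/sqrt(2), X = 0, Y = 1.
feeds = decode_to_feeds(1.0 / SQRT2, 0.0, 1.0)
recovered = recover_b_format(feeds)
```

With this decode the two left-hand feeds are positive and the two right-hand feeds are negative, and the recovered channels match the originals to within rounding.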

File formats for G-Format include all multi-channel formats that contain
speaker feeds. However, these will not contain information to allow the
B-Format channels to be recovered. A ".amg" file format
(based on WAVE-EX) for downloadable G-Format files, which will allow
the B-Format channels to be recovered, has been proposed.

Default channel conversions from G-Format

Converting a G-Format file to a mono or stereo file is straightforward.
First, recover the B-Format channels using the conversion coefficients
contained in the file. Second, follow the advice given above for
Default channel conversions from B-Format.