
IMMERSIVE SOUND / OBJECT-BASED AUDIO – AND MICROPHONES

Audio formats have developed over time, starting with narrow-bandwidth mono, moving on to various versions of two-channel stereo and finally arriving at full-band, multi-channel immersive audio. The sound is reproduced in many ways, ranging from personal headphones to multi-channel systems in cinemas or other big venues.
Immersive audio can be described as a group of recording and reproduction formats that involve more than basic two-channel stereo.

*) the .1 indicates an individual sound channel that only contains a fraction of the full frequency range, namely the range from 20 Hz to 120 Hz.

There are many ways to record immersive audio. In this article, you will find descriptions of microphone setups for the majority of immersive audio formats. It is important to define the listening setup before selecting the recording setup. In broadcast and in music production, the starting point is the ITU-R BS.775 standard listening configuration.

Coincident arrays vs. spaced arrays

A microphone array is just a physical arrangement of microphones. The array may consist of individual microphones mounted on one single microphone stand or perhaps on several stands or holders. In some cases, the microphones are built into one single unit (like the 5100 Surround Microphone).

In a coincident array, the microphones are mounted extremely close to each other. In principle, all microphones in this type of array receive sound simultaneously.

In the coincident technique, localization cues are based only on level differences between signals. This technique can create proper localization accuracy but will, to some degree, lack envelopment and have a small sweet spot (in two dimensions: left/right and front/rear). The advantage of a coincident array, however, is that it is compact, portable and mono compatible. It is easy to downmix the channels to one single mono channel without coloration from comb filtering and other artefacts.

A spaced array creates a three-dimensional enveloping audio sensation by providing an adequate amount of decorrelation between the signals (localization cues are based on time-of-arrival differences). When adapting the microphone placement (distance and angle) to the sound field, spaced arrays still provide proper localization accuracy.

The spaced techniques, in general, give a nice, large sweet spot and give listeners the sense of an enlarged and enveloping sound stage in a larger listening field. The disadvantage is their size and, in some situations, setup time. In addition, it is not advisable to collapse the signals to a mono signal – instead, one signal can be used.
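To illustrate why collapsing spaced microphones to mono is risky, the following minimal sketch lists the first comb-filter notch frequencies that appear when two equal-level copies of a signal are summed. The speed of sound (343 m/s), the 0.5 m path difference and the function name are assumptions chosen for illustration only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C


def comb_notch_frequencies(path_difference_m, count=3):
    """Return the first `count` notch frequencies (Hz) when two equal-level
    copies of a signal, offset by the given path difference, are summed."""
    delay_s = path_difference_m / SPEED_OF_SOUND_M_S
    # Cancellation occurs where the delay equals an odd number of half periods.
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]


notches = comb_notch_frequencies(0.5)  # hypothetical 0.5 m path difference
print([round(f) for f in notches])
```

With coincident capsules the path difference approaches zero, which pushes the first notch far above the audio band. This is one way to read the mono compatibility of coincident arrays.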

                  | Envelopment | Size of listening area | Size and portability | Localization accuracy
Coincident arrays |      -      |           -            |          +           |           +
Spaced arrays     |      +      |           +            |          -           |           -

5.x

The basic and simple setup for channel-based 5.x (5.0/5.1/5.2) surround sound is the application of five microphones in a spaced array. There are different ways to select and arrange the microphones; the choice depends on many factors, such as the acoustic qualities of the recording room (e.g. a concert hall, jazz club or church), the layout of the sound sources present, the directivity of the microphones applied, or simply taste. The setups may vary from strictly mathematically calculated, psycho-acoustically verified configurations to more "feel-like" ones.

One way of thinking about the coverage of a 360° circle around the listening position is to consider each pair of neighboring microphones as a stereo pair. Each pair covers a specific segment of the circle. Sometimes the segments overlap, sometimes they "underlap". Another way to look at it is to consider the frontal microphones as providing the main soundstage and the rear microphones as establishing a sense of surround/atmosphere.

The following setups are not exhaustive but can be seen as inspiration and are examples of best practice:

The omni-based surround array

Five omnidirectional microphones arranged in a spaced array provide a good tonal balance. The low-frequency content is reproduced very convincingly. This setup also provides excellent envelopment: when reproduced, the listener is surrounded by sound. The drawback of this setup can be the lack of isolation between channels.

The three frontal microphones, often called the frontal triplet, are arranged as a Decca Tree. The positions are chosen in accordance with the optimum recording angle of the given sound source.

The position of the rear microphones is chosen independent of the surrounding soundfield. Normally, the rear microphones should not be placed too far from the front microphones. If the distance is too large, the delay may become audible. Furthermore, some directivity might be preferred for the surround pickup. This can be provided by acoustic pressure equalizers (APEs), which ensure directivity at higher frequencies but keep the advantages of the omnis, for good low-frequency response.

Distance between the outer frontal microphones: 60-120 cm (24-47 in). The wider the source, the narrower the spacing of the mics should be. The center microphone is approximately 15-45 cm (6-18 in) in front of the L/R pair.

The two rear microphones are placed 2-5 m (80-200 in) behind the frontal triplet. The distance between the rear microphones should be in the range of 2-3 m (80-118 in). As mentioned, APEs can be used to avoid frontal impulsive sounds being reproduced by the rear channels.
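As a rough check on when the delay between front and rear microphones might become audible, the spacing can be converted to arrival time. This is a minimal sketch that assumes a speed of sound of 343 m/s and sound arriving from the front along the array axis; the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 °C


def arrival_delay_ms(distance_m):
    """Extra travel time (ms) for frontal sound to reach a microphone
    placed `distance_m` further back along the direction of travel."""
    return 1000.0 * distance_m / SPEED_OF_SOUND_M_S


for distance in (2.0, 5.0):  # the spacing range given above
    print(f"{distance:.0f} m -> {arrival_delay_ms(distance):.1f} ms")
```

Under these assumptions, the 2-5 m range corresponds to roughly 6-15 ms of delay for frontal sound in the rear channels.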

The Scottish sound engineer, recording specialist and lecturer Michael Williams has done intensive studies on Multichannel Microphone Array Design (MMAD). Consult his publications to find a precise setup for any given situation. Two publications are mentioned below and further references can be found there.

The cardioid-based surround array

The five cardioid (directional) mic array has the advantage of higher channel separation compared to the omni-based array. To provide the correct coverage in the spaced array, the microphones can be placed closer to each other, creating a smaller array. Of course, this can be taken to the extreme by arranging the microphones in a coincident configuration.

Example: A cardioid-based, 5-channel setup, providing equal coverage of all segments on the circle.

The Wide Cardioid Surround Array

The Wide Cardioid Surround Array (WCSA), introduced by Mikkel Nymand, provides equal timbral qualities, a high degree of envelopment and good low-frequency properties.

To obtain the desired sound character (and to enhance the listening position from a sweet spot to a sweet area), the five signals should be decorrelated. This means the microphones must be placed at an adequate distance from each other. On the other hand, the signals should not be too different (distant) from each other. If this happens, the resulting sound will not be coherent.

Omnidirectional microphones are often preferred for spaced arrays. This is due to their natural sound color and their ability to blend direct signals with room timbre. Wide cardioids (also named sub-cardioids) have a slightly more directional quality, which gives more ambience control and improved front imaging and localization accuracy.

DPA Microphones adapted this array to use five identical wide cardioid microphones (matched within a very narrow tolerance of ±1 dB on frequency response and sensitivity). Choosing five identical microphones instead of just a specific microphone type keeps the blend natural and leads to a more authentic and uniform reproduction of all channels.

After intense listening sessions and numerous practical tryouts in different recording applications (symphonic music, modern jazz, PA/Live, pop concerts and ambience recording), it has been found that this adaptation tends to work best with a larger spacing, especially of the rear channels. This array creates an intense, dynamic and enveloping sound character.

For wide ensembles (or large array-to-source distances), try expanding this array with two left/right omnidirectional outriggers to benefit from the pressure transducers' low-frequency pickup. These microphones are blended with L/R from the array at an appropriate level, offering a beautifully coherent, precise and rich surround sound image.

Soundfield / Ambisonics

In the early 1970s, the British engineers Peter Fellgett and Michael Gerzon invented the soundfield principle later known as Ambisonics (today known as "First Order Ambisonics"). The format is based on a coincident array of microphones. The aim is to allow a microphone to be oriented arbitrarily in any direction: left/right, front/back, up/down. Basically, the soundfield principle works like MS, by addition and subtraction of the available signals. Two configurations are associated with Ambisonics: A-format and B-format.

The A-format is the physical arrangement of four cardioid microphone capsules and their output: FU (front upper), RU (rear upper), LD (left lower) and RD (right lower). The angles between the capsules are congruent with a tetrahedron, a triangular pyramid.

The B-format is a converted version of the A-format, resulting in a virtual format consisting of three orthogonally-oriented figure-of-eight "capsules"; X (front/back), Y (side), Z (up/down) and one omni (W).

By addition and subtraction, the individual signals can be converted to a directional microphone pointing in any direction. For instance, one omni (W) and one figure-of-eight (X) together create a cardioid pointing in the X-direction.
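The addition-and-subtraction idea can be sketched as a weighted sum of the B-format components. The example below is an illustration under stated assumptions, not a reference decoder: W is taken at the same reference gain as X, Y and Z (some B-format conventions attenuate W by 3 dB), azimuth is measured from the X axis toward the Y axis, and the function names are hypothetical.

```python
import math


def virtual_mic(w, x, y, z, azimuth_deg, elevation_deg=0.0, pattern=0.5):
    """Steer a first-order virtual microphone from one B-format sample.

    pattern = 1.0 gives an omni, 0.5 a cardioid, 0.0 a figure-of-eight.
    Assumption: W is at the same reference gain as X, Y and Z.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    figure_of_eight = (x * math.cos(az) * math.cos(el)
                       + y * math.sin(az) * math.cos(el)
                       + z * math.sin(el))
    return pattern * w + (1.0 - pattern) * figure_of_eight


def encode_plane_wave(source_azimuth_deg):
    """Ideal (W, X, Y, Z) sample for a unit source in the horizontal plane."""
    az = math.radians(source_azimuth_deg)
    return 1.0, math.cos(az), math.sin(az), 0.0


# Cardioid pointing along X: full pickup on-axis, a null directly behind.
front = virtual_mic(*encode_plane_wave(0.0), azimuth_deg=0.0)
rear = virtual_mic(*encode_plane_wave(180.0), azimuth_deg=0.0)
print(front, rear)
```

Changing `pattern` moves the virtual microphone continuously between omni and figure-of-eight, which is what makes the format adjustable after the recording.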

DPA Microphones formerly produced microphones for the format but does not at present.

Example: B-format components

Optimized Cardioid Triangle (OCT)

OCT is an array designed for the three front channels only. The system offers high separation between left-center and right-center. An additional configuration for the surround channels should be chosen carefully.

A cardioid microphone is used for the center channel, placed only 8 cm (3.1 in) in front of two supercardioids for the left and right channels, pointing outwards. The spacing between the left and right microphones is the key to the desired recording angle. Distances between 40 cm (15.7 in) and 90 cm (35.4 in) are recommended by the designers, resulting in recording angles from 160° to 90°.

One or more pressure (omnidirectional) microphones can be added to the system to compensate for the missing low frequency from the pressure gradient capsules of the cardioids.

Example: The OCT2 variation suggests that the center microphone should be placed 40 cm (15.7 in) in front of the left/right microphone base line, giving larger time differences and spaciousness more like the Decca Tree.

Double MS

A time coincident, compact and adjustable surround configuration.

The Double MS setup is a time coincident, compact and adjustable configuration for surround sound/immersive sound. Two cardioid microphones and one figure-of-eight microphone are used. Alternatively, the setup can be created from four cardioid microphones.

The principle of the Double MS technique is a forward-pointing and a backward-pointing MS set sharing the same side microphone, so only three microphones are needed. As in a standard MS setup, the side microphone is positioned with its in-phase side pointing left. In this setup, processing/mixing is necessary to create the final format. As always with MS setups, two different transducer types are applied to provide the mid information (cardioid microphones) and the side information (bi-directional microphone). This carries the risk that sound from the sides is reproduced with a different frequency and phase response than sound from the front.

The amount of each signal is adjusted for correct spatial distribution, especially regarding the frontal image. Typically, the L/R width is produced a little wider compared to standard MS for two-channel stereo.
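The processing step can be sketched as a basic sum-and-difference matrix, in the same way as standard MS decoding. This is a simplified illustration: the `side_gain` width control and the channel assignment are assumptions, and practical decoders typically apply further weighting.

```python
def decode_double_ms(mid_front, mid_rear, side, side_gain=1.0):
    """Matrix one sample of Double MS into five channels.

    Assumptions: the side microphone's in-phase lobe points left, and
    `side_gain` is a hypothetical width control (larger = wider image).
    """
    s = side_gain * side
    return {
        "L": mid_front + s,
        "R": mid_front - s,
        "C": mid_front,
        "Ls": mid_rear + s,
        "Rs": mid_rear - s,
    }


# A source to the front left: strong front mid, positive side signal.
channels = decode_double_ms(mid_front=0.5, mid_rear=0.1, side=0.4)
print(channels)
```

Raising `side_gain` above 1.0 widens the L/R image, in line with the slightly wider setting mentioned above.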

The Double MS technique can be attained by using four identical – evenly matched – 4011A or 4011C Cardioid Microphones angled on the horizontal plane at 0°, 90°, 180° and 270° respectively. The membranes should be arranged above each other for best time alignment in the horizontal plane.

*) In practical recording using a mixer, just pan "cardioid left" to the left and "cardioid right" to the right, then invert the polarity of "cardioid right" (swap pins 2 and 3). The "dirty" way to do this is to use a Y-summing cable with the XLR connector for the cardioid right inverted.
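Why two opposed cardioids can stand in for a figure-of-eight can be checked numerically with ideal polar patterns. The sketch below assumes angles measured counter-clockwise from the front, so that 90° is left and 270° is right; that orientation and the function names are assumptions for illustration.

```python
import math


def cardioid(angle_deg, axis_deg):
    """Ideal cardioid sensitivity for sound arriving from angle_deg."""
    return 0.5 * (1.0 + math.cos(math.radians(angle_deg - axis_deg)))


def side_from_cardioids(angle_deg):
    """Left-facing cardioid (90°) minus polarity-inverted right-facing one (270°)."""
    return cardioid(angle_deg, 90.0) - cardioid(angle_deg, 270.0)


for angle in (0, 90, 180, 270):
    print(angle, round(side_from_cardioids(angle), 3))
```

The difference signal has nulls to the front and rear and opposite polarity left and right, which is the pattern of a sideways figure-of-eight.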

Fukada Tree

The Fukada Tree is a Decca Tree array, but with five cardioid microphones and two additional omnidirectional microphones as outriggers to blend in between the front and rear channels. This setup was designed by Akira Fukada in 1997.

The choice of cardioid microphones improves the channel separation, and the backward-oriented rear cardioids also minimize leakage of direct frontal sound to the rear speakers.

Omnidirectional microphones are often preferred in Decca Tree configurations for music recordings due to their natural sound color and full frequency bandwidth. In the Fukada Tree array, the two omni outriggers supply this important component.

Since first announcing the Fukada Tree arrangement, Akira Fukada has designed a number of positioning modifications to improve front localization, but his choice of microphones remains constant and he continues to use DPA mics for their transparent feel.

Hamasaki Square

The Hamasaki Square consists of four bi-directional microphones arranged in a square.

The Hamasaki Square is designed for capturing the ambient/diffuse part of a surround sound recording. It is a four-mic square with 1.8-2 m (5.9-6.6 ft) between the figure-of-eight microphones, which are routed to left, right, left surround and right surround at an appropriate level compared to the front array. The figure-of-eight microphones are oriented with their in-phase sensitive directions toward the sides and with their nulls toward the direct sound.

Compared to other systems for ambiance recording, this system is the least sensitive regarding the distance between the main array and the ambiance array.

The setup is defined by the Japanese sound engineer Kimio Hamasaki.

Immersive audio with height

Setups developed for traditional surround recordings (like 5.1) have proven to work very well. However, adding height to these recordings is interesting as it may also add new dimensions to the perceived experience.

The challenge is, however, how to add upward-directed sound images, without changing the perceived localization of horizontally positioned sound sources, meaning minimizing vertical inter-channel crosstalk. This leads to considerations regarding vertical time and level differences. The spacing of vertical microphones needed for decorrelation must also be considered. Finally, how can we avoid comb filtering in the unavoidable downmix?

When height information is added in the right way, the perceived envelopment created by the sound is enhanced. More than that, good practice has demonstrated enhancement of the perceived precision when localizing the sound sources, even in the horizontal plane!

Example: A standard reproduction setup for immersive audio containing height information is 9.1, which is a standard 5.1 ITU-R BS.775 layout with four additional upper-layer speakers above the left, right, left surround and right surround speakers. The height of the additional four speakers should provide a vertical listening angle of approximately 30°.

Dr. Hyunkook Lee of the University of Huddersfield (UK) and his research group have provided a lot of theoretical and practical information on the perceived sound imaging.
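The 30° vertical listening angle translates into a mounting height by simple trigonometry. The sketch below assumes the horizontal distance is measured from the listener to the point directly below the upper speaker and that height is given relative to ear level; the 2 m distance is a hypothetical example.

```python
import math


def speaker_height_above_ear(horizontal_distance_m, elevation_deg=30.0):
    """Height above ear level (m) that yields the given elevation angle."""
    return horizontal_distance_m * math.tan(math.radians(elevation_deg))


height = speaker_height_above_ear(2.0)  # hypothetical 2 m horizontal distance
print(round(height, 2))
```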

One important factor he found is that the precedence effect (the effect that the first arriving sound determines the direction) does not work in the vertical plane. Hence, it is worth looking at level differences. When playing back the same sound in the lower and the upper loudspeaker, it was found that the presence of higher frequencies and transient signals pulls the localization towards the upper loudspeaker [2,3].

Example: To keep the localization in the horizontal plane, it was found the upper signal should be attenuated by at least 7 dB.
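For setting fader or trim values, the 7 dB attenuation can be converted to a linear amplitude factor. This is a minimal sketch using the standard 20·log10 amplitude convention; the function name is hypothetical.

```python
def db_to_gain(level_db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (level_db / 20.0)


upper_layer_gain = db_to_gain(-7.0)
print(round(upper_layer_gain, 3))
```

An attenuation of 7 dB therefore means the upper-layer signal is scaled to a little under half of its original amplitude.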

These findings have led to the microphone setup shown below. It consists of eight cardioid microphones and two supercardioid microphones.

The microphones are oriented so that a minimum of frontal sound enters the upper layer of microphones. In general, any upper-layer microphone should pick up as little sound as possible from the primary horizontal sources and from sources below the horizontal plane.

IRT Cross

The IRT Cross is designed for ambiance pickup. The setup consists of four cardioid microphones.

The IRT Cross is designed for capturing the ambient/diffuse part of a surround-sound recording. It is a four-microphone square with 20–25 cm (7.9–9.8 in) between the cardioid microphones, which are routed to left, right, left surround and right surround at an appropriate level compared to a front array.

The IRT Cross is normally positioned a couple of meters behind the main array. However, it should not be placed too far away, as timing problems (like an echo) may occur in the reproduced signal. The optimal placement of the IRT Cross is a balance between getting enough ambiance and avoiding echo.

Object-Based audio

For years, the most enveloping loudspeaker-reproduced sound has been channel based. One channel is for mono, two channels are for stereo and six channels are for 5.1 surround-sound (or 24 channels for NHK 22.2).
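The channel counts quoted here follow directly from the layout labels: the fields separated by dots are simply added. A minimal sketch, with a hypothetical helper name:

```python
def channel_count(layout):
    """Sum the dot-separated fields of a layout label such as '5.1' or '22.2'."""
    return sum(int(part) for part in layout.split("."))


for label in ("5.1", "22.2"):
    print(label, channel_count(label))
```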

Conventions regarding the placement of the loudspeakers for each format have been the backbone of the sound design. Inter-channel panning by the aid of delay- or level adjustments has been the tool for the placement of sources of the sound scene. The finished product would be contained in a fixed number of channels; even though the program material originally was recorded on a huge number of audio tracks, the final product would fit into a specific number of channels, one for mono, two for stereo, etc.

Object-Based Audio (OBA) is somewhat different. A "sound object" can be recorded on one or more tracks. Along with the audio goes the metadata that tells where to position the sound in the soundstage.

An object could be a voice recorded in mono. If the producers intend the voice to come from the right of the soundstage, the metadata of the voice recording contains the coordinates of this position. The voice could also be recorded as a stereo track, in which case the metadata of the stereo tracks provides the positioning data.

In principle, an object may also stem from an ambisonic recording or any other format. Therefore, an AV program with OBA is built up from a string of objects, like voice recordings, music, ambient sounds, special sound effects, etc. Each object will contain metadata on when and where to be reproduced.
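The idea of audio plus positioning metadata can be sketched as a small data structure and a renderer that adapts the object to the available playback channels. Everything below is hypothetical and for illustration only: the field names, the ±45° azimuth range and the constant-power stereo pan are assumptions, not part of any specific OBA format.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """Hypothetical sound object: mono samples plus positioning metadata."""
    name: str
    samples: list          # mono audio samples
    azimuth_deg: float     # -45 = hard left ... +45 = hard right (assumed range)
    start_s: float = 0.0   # when to play; carried as metadata, unused below
    gain: float = 1.0      # listener-adjustable level, e.g. to raise dialog


def render_to_stereo(obj):
    """Constant-power pan of one object into (left, right) sample lists."""
    clamped = max(-45.0, min(45.0, obj.azimuth_deg))
    pan = math.radians(clamped + 45.0)  # maps to 0..90 degrees
    left_gain = obj.gain * math.cos(pan)
    right_gain = obj.gain * math.sin(pan)
    return ([left_gain * s for s in obj.samples],
            [right_gain * s for s in obj.samples])


voice = AudioObject("voice", samples=[1.0, 0.5], azimuth_deg=45.0)
left, right = render_to_stereo(voice)
print(left, right)
```

Because the position lives in the metadata rather than in the audio, a different renderer could place the same object in a 5.1 or binaural playback without re-recording it.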

OBA has already found its way to the cinema (Dolby Atmos and the like). However, it is the intention to bring it into broadcast as well, and many experiments have been carried out. In addition, virtual reality (VR) is an obvious target for OBA.

Why?

The general idea is to leave a higher degree of freedom to the listener, especially in broadcast. It becomes possible to emphasize a single object. If a hearing-impaired listener wants to raise the level of the dialog, this is possible, provided the dialog is recorded as an object. The language of the commentary can also be changed if each language is allocated to a separate object.

From TV productions like Formula 1 races, we know that special onboard cameras can be selected if the viewer wants to follow a specific car. The sound of that specific car is an object in conjunction with the image. Specific musical instruments in an orchestra can be regarded as objects. Alternatively, the sound from a concert recorded at different listening positions can be treated as objects.

Another argument for OBA is that almost any reproduction format is valid. The downmix is optimized depending on the number of channels and their positions available for the playback (as long as the number of channels is at least two). Binaural reproduction is also allowed for.

Microphones?

The basic idea is that the sound engineer can use whatever microphones they prefer. There is not necessarily a demand for specific microphones, microphone configurations or microphone brands. The special requirements fall on the production equipment, which must generate the metadata, and of course on the formats that carry the complete information.

3506A Kit of two matched 4006A microphones, clips and windscreens in Peli™ case

S5 Surround/Decca Tree Mount

Optimized cardioid triangle (OCT)

4011A Cardioid Microphone

4011C Cardioid Microphone, Compact

4018A Supercardioid Microphone

S5 Surround/Decca Tree mount

Double MS

DPA does not provide any figure-of-eight microphones. We suggest the Schoeps MK8 with CMC6 preamp. However, if you want to try this setup with DPA microphones, we suggest you substitute each figure-of-eight microphone with two cardioid microphones:

ST4011A Stereo Pair with 4011A Cardioids

SB0400 Modular Stereo Boom

UA0836 Stereo Boom

DUA0019 Spacer for Stereo Boom, 19 mm (0.75 in)

Fukada tree

4011A Cardioid Microphone

4011C Cardioid Microphone, Compact

4006A Omni Microphone

3506A Kit of two matched 4006A microphones, clips and windscreens in Peli™ case

S5 Surround/Decca Tree Mount

ST4011A Stereo Pair with 4011A Cardioids

SB0400 Modular Stereo Boom

Hamasaki square

DPA does not provide any figure-of-eight microphones. We suggest the Schoeps MK8 with CMC6 preamp. However, if you want to try this setup with DPA microphones, we suggest you substitute each figure-of-eight microphone with two cardioid microphones:

ST4011A Stereo Pair with 4011A Cardioids

S5 Surround/Decca Tree Mount

Immersive audio with height

8 x 4011A Cardioid Microphone

2 x 4018 Supercardioid Microphone

IRT Cross

4011A Cardioid Microphone

4011C Cardioid Microphone, Compact

ST4011A Stereo Pair with 4011A Cardioids

MMC4011 Cardioid Microphone Capsule

MMP ER/ES Modular Active Cable

SB0400 Modular Stereo Boom

UA0837 Stereo Boom

DPA 5100 Surround Microphone

The 5100 Mobile Surround Microphone is a plug-and-play solution.

One unit contains three directional (DIP-MIC, directional pressure microphones), coincidently arranged frontal microphones. The rear channels are recorded by a spaced pair of two omnidirectional microphones. The unit also provides an LFE-output. All channels are calibrated to unity gain. The LFE is reduced by 10 dB, according to the standard.