This document describes the procedures for producing multichannel surround sound, based on NHK's experience in producing HDTV programs and radio dramas. Specific methods are reviewed in the following sections:
1) Acoustics of the control room;
2) Surround sound design methodology;
3) Mixing considerations, and
4) Drama and music production examples.

1 Design of control room acoustics and monitoring environment

For optimal mixing of surround sound and its precise reproduction in the home, more appropriate guidelines must be drawn up for the design of the control room environment, and its acoustics in particular. Table 1 shows the guideline specifications proposed by the HDTV Multichannel Sound Study Group (HDTV-MSSG), which operated from 1992 to 1995. These specifications were intended for sound mixing in small studios, an environment quite different from that of motion picture sound production. Table 2 lists the test items used at that time. Table 3 shows the specifications of NHK's surround sound post-production studios, mix-down studios, music recording studios and TV studios built and used since 1987. Starting from rather vague ideas, we tried various design approaches, which led in 1999 to the following primary design methodology:

(1) Each channel must carry an equal share of the work, however many channels the multichannel system has. This requires greater sound field homogeneity in the control room.
To achieve such homogeneity, NHK has designed the control room to absorb sound in its front section and diffuse it evenly at the rear ceiling and side walls. This approach secures a broad, homogeneous sound field over the entire frequency band in the optimal listening area, unlike the conventional method of combining different acoustic conditions, such as a live end and a dead end. The control room's interior is an irregular heptagon with left-right symmetry.

(2) The front section of the floor must be made of sound-absorbing material to prevent primary reflections of the L-C-R speakers from the floor. In conventional construction, monitor speakers are built into high-rigidity walls, but this tends to generate rear reflections in multichannel surround production and disturb the frequency response near the mixing position. The front surface of our rooms is therefore acoustically transparent.

(3) Diffusers must be used for homogeneous diffusion of mid and high frequencies. Their location depends on the studio's size and layout.
In studios built so far, the diffusers are installed on the ceiling, side walls and back wall.

(4) Because the control room reproduces low frequencies more strongly than conventional rooms, it must have its own means of controlling low frequencies and suppressing unwanted resonances. Sound traps are installed in the air space behind the speakers to control low frequencies near the front. The walls behind the monitor speakers are made of fired brick to increase the articulation of low-frequency sound.

(5) Fire-resistant stone blocks and Acoustone™ porous heavy bricks are used for low- and mid-frequency control in the studio's production area.
This improves low-frequency absorption and eliminates unwanted resonance with the studio's base structure.

(6) The studios share common parameters: RT = 0.2 to 0.3 s, with a 'dead' tendency, and NC = 15 or less. The front L and R speakers are installed 3.2 to 3.6 m apart, taking into account an L-C-R alignment compatible with stereophonic production. The effective acoustic center lies at the apex of an equilateral triangle whose base is the line connecting the L and R speakers; its distance from the speakers is 3.4 to 3.5 m, which places it slightly behind the mixer seat. This arrangement is intended to enlarge the listening area. When more than one pair of speakers is installed in the rear section, one pair is placed at 60° and the other at 120°. When a single pair is installed, it is set somewhat laterally, at 130°.
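The geometry in item (6) can be checked numerically. The minimal Python sketch below places speakers on a circle around the acoustic center, with azimuth measured from the front axis. The ±30° front angles follow from the equilateral triangle; the 3.4 m radius and the single rear pair at ±130° are picked from the ranges above for illustration, and the function name `speaker_positions` is hypothetical.

```python
import math


def speaker_positions(radius_m, azimuths_deg):
    """Place speakers on a circle of radius_m around the acoustic center.

    Azimuth 0 deg is straight ahead and positive angles are to the right.
    Returns {name: (x, y)} in metres, x to the right and y toward the front.
    """
    return {
        name: (radius_m * math.sin(math.radians(az)),
               radius_m * math.cos(math.radians(az)))
        for name, az in azimuths_deg.items()
    }


# L and R at -/+30 deg form an equilateral triangle with the acoustic center,
# so the L-R spacing equals the listening distance.
layout = speaker_positions(3.4, {"L": -30, "C": 0, "R": 30, "SL": -130, "SR": 130})
spacing = math.dist(layout["L"], layout["R"])
depth = layout["L"][1]  # distance from the L-R baseline back to the center

print(f"L-R spacing: {spacing:.2f} m, baseline to center: {depth:.2f} m")
```

With a 3.4 m spacing, the acoustic center sits about 2.94 m (3.4 × √3/2) behind the line joining L and R.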

(7) In practice, it is difficult in an actual production studio to install all speakers at the same height, as specified in the ITU-R standards, which place greater emphasis on the reproduction environment. Allowances must therefore be made, and an electrical correction method should be used, particularly where the rear section cannot offer the same listening area as the front. Figure 1 shows an example of the time-base correction of the monitor speakers at the mixer seat in NHK's HVD-520 post-production studio. In this studio, the same distance is maintained on a plan view between each channel's speaker and the acoustic center of the control room, while a cross-sectional view shows that the speakers are some 30% higher than the control room. This configuration is nevertheless a practical way to ensure a broader listening area, because production staff also work in the rear of the room. Since the monitor balance of each channel may drift over time, a trim function should be provided for fine adjustment.
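Time-base correction amounts to delaying the nearer speakers so that all arrivals coincide at the mixer seat. A minimal Python sketch, assuming a speed of sound of 343 m/s, a 48 kHz sample rate and hypothetical seat-to-speaker distances. The inverse-distance level trim is an added assumption about how the trim function might be preset, not a documented NHK procedure.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for a room at about 20 deg C


def time_alignment(distances_m, fs=48000):
    """Per-speaker (delay_ms, delay_samples, trim_db) for one listening seat.

    Nearer speakers are delayed so that every arrival coincides with the
    farthest speaker; trim_db offsets the inverse-distance level difference
    relative to that farthest speaker (negative values attenuate).
    """
    d_max = max(distances_m.values())
    result = {}
    for name, d in distances_m.items():
        delay_s = (d_max - d) / SPEED_OF_SOUND
        trim_db = 20 * math.log10(d / d_max)
        result[name] = (delay_s * 1000, round(delay_s * fs), trim_db)
    return result


# Hypothetical seat-to-speaker distances in metres (not NHK measurements).
seat = {"L": 3.4, "C": 3.3, "R": 3.4, "SL": 2.6, "SR": 2.6}
for name, (ms, samples, trim) in time_alignment(seat).items():
    print(f"{name:2s} delay {ms:5.2f} ms ({samples:3d} samples)  trim {trim:+.2f} dB")
```

Delaying the nearer speakers, rather than advancing the farther ones, keeps the correction causal.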

Figure 1: Speaker time alignment

(8) No dedicated LFE speaker is installed; the CA-421 screening room is intended only for reproduction. When LFE components are used, their output is routed chiefly to the L and R speakers. On the reproduction side, the LFE components must be extracted as necessary and matched with the main-channel components.

(9) All the front speakers must be of the same model, while smaller versions of the same model must be used for the surround channels because of physical restrictions.
This unifies the sound quality across all channels.

2 Primary positioning for sound design for TV dramas

Figure 2 shows the positioning for dialogue, music and sound effects for HDTV dramas. Typical patterns of surround sound design are shown in Figure 3.

Figure 2: Surround components plan

Figure 3: Surround sound designs

2.1 Recording of dialogue

Since dialogue accompanying the picture is in principle positioned in the center channel, it may be recorded in mono even on location. Dialogue spoken in a crowd or party scene, where the voice needs to convey a sense of spaciousness, is recorded in stereo as material for processing in post-production.
If a portable multichannel recorder is developed in the future, it will become possible to record 3-2 surround material on location. Sound collages and dialogue used as special effects may be positioned according to the producer's ideas.

2.2 Foley and sound effect materials

Foley may in principle be recorded in mono, because it is positioned in the center channel. If a sense of spaciousness is required, as with doors opening across the whole screen, it is recommended that the material be recorded on three L-C-R channels and processed. When the material is used for special effects, it can be positioned so that all the channels are used effectively. Basically, the front speaker configuration for sound effects may be any of the L-C, C-R or L-R combinations, while the rear may use SL-SR, SL only or SR only. The size of the sound image and the sense of spaciousness should match the picture shots.

2.3 Music

For scoring music, the front L-C-R speakers are usually used for the main music and the rear speakers for ambient components. The center channel is primarily used for source music in dramas, and all the channels are used effectively for ME (music and effects).

This is the most fundamental surround sound design for both music and dramas. For music, it creates an ambient space behind the audience so that they perceive a stronger sense of reality or atmosphere. In dramas, ambient sounds help the audience follow how the story is progressing. The difference between ambient sounds in drama and ambience in music is that the surround components used in drama are not necessarily recorded at the same time as the rest.

(2) Fly-over

As the name suggests, a sound moves longitudinally between the front and rear of the room. A sharp snap of a sound effect adds strong impact to the scene.

(3) Whirlpool

The audience is drawn into a spiral whirlpool of sound and feels as if the place is swinging in every direction.

(4) Proceeding sound field

Rather than merely generating a sense of reality or unity, as with method (1) above, this method reproduces sounds that anticipate what is about to happen in subsequent scenes. The sounds must be short, have punch, and let the audience guess what is coming next.

(5) Sound shower from above

A shower of sound comes from above the audience. In theory, vertical relationships cannot be reproduced with any configuration of the current horizontal six-channel speaker layout. This method, however, exploits the physical placement of the surround speakers, which are typically installed higher than the audience seats.

(6) Big, closer sound feeling

Most of the sound arrives horizontally rather than from above: the main components are reproduced by the front C channel, and supplementary components by the L-R and SL-SR channels. This method is useful for emphasizing a specific voice in dialogue or monologue, or for big sound effects such as gunshots and explosions. Spreading the sound over several channels makes it louder than a single channel could: the overall level can be raised beyond what one channel allows while each channel keeps its peak margin.
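The level gained by spreading a sound over several channels can be estimated with a power sum. The Python sketch below assumes that the channel signals add incoherently (in power) at the listening position, which is a simplification; at low frequencies, where the channels add more coherently, the gain is larger. The channel gains in the example are hypothetical.

```python
import math


def combined_level_db(channel_gains_db):
    """Overall level relative to a single channel at 0 dB, assuming the
    channel signals add in power (incoherently) at the listener."""
    return 10 * math.log10(sum(10 ** (g / 10) for g in channel_gains_db))


# Hypothetical split: main component in C at 0 dB, supplementary
# components in L, R, SL and SR, each 6 dB lower.
big = combined_level_db([0, -6, -6, -6, -6])
print(f"{big:+.1f} dB relative to C alone")
```

No channel exceeds the level C would have on its own, so each keeps its peak margin, yet the overall level rises by about 3 dB.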

The main music components are positioned in the front section, while spatial information, such as hall reverberation and indirect acoustic components, is reproduced in the rear.

(2) Discrete layout

This layout is not intended to reproduce theatrical performances; it suits musical representations of something unrealistic, actively using more of the assigned channels. The front remains the main target, but sound can be laid out freely all around the audience.

(3) Omni-directional layout

The audience's front axis is not fixed, so sound can reach them from all directions. Musicians such as Japan's Isao Tomita and Britain's Alan Parsons have created 'sound walls' by making good use of such omnidirectional acoustic space.

Figure 4: Three music surround designs

4 Considerations for surround sound mixing

In this section, we discuss the following subjects and review rule-of-thumb usage examples:

Since broadcast program production overlaps with several other fields, including motion pictures, large monitors are calibrated to 85 dB per channel with pink noise for compatibility with film sound. Medium and small speakers are also calibrated to monitor levels such as 82 dB, 80 dB and 78 to 76 dB per channel, so that monitor levels can be switched where necessary to optimize the balance of sound reproduced in the home. NHK's CR-602 studio, remodeled in March 1998 for radio drama production, is equipped with various digital oscillators that facilitate the calibration of speaker levels. Figure 5 shows monitor speakers in operation and typical monitor balances by speaker size.
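Per-channel calibration reduces to simple dB arithmetic. The Python sketch below computes the trim that brings a measured pink-noise reading to a target level, and the level reached when several channels play decorrelated pink noise at once. The meter readings are hypothetical and the function names are illustrative.

```python
import math


def calibration_trim_db(measured_db, target_db=85.0):
    """Gain change needed to bring one channel's pink-noise level to target."""
    return target_db - measured_db


def summed_level_db(per_channel_db, n_channels):
    """Level when n channels play decorrelated pink noise at the same level."""
    return per_channel_db + 10 * math.log10(n_channels)


# Hypothetical meter readings at the mixer seat.
readings = {"L": 84.0, "C": 85.5, "R": 83.5, "SL": 86.0, "SR": 85.0}
for name, spl in readings.items():
    print(f"{name:2s}: trim {calibration_trim_db(spl):+.1f} dB")

print(f"5 channels at 85 dB: {summed_level_db(85.0, 5):.1f} dB")
print(f"switch to 82 dB/channel: {calibration_trim_db(85.0, 82.0):+.1f} dB")
```

Switching between the 85, 82, 80 and 78 to 76 dB settings is then a fixed offset applied to all channels together.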

Figure 5: Reference monitor levels

4.2 Mix-down and dynamic range control

A monitoring function compatible with both surround mixing and 2-channel stereo mixing is important for maintaining compatibility in mix-down: it must allow the 2-channel stereo balance to be checked during surround mixing. Figure 6 shows the mix-down circuitry installed in the CR-602 studio. There are large differences in listening environment and reproduction level between 3-2 surround mixing and 2-0 stereo mixing. To resolve these problems, a processor capable of loudness compensation and appropriate dynamic range compression will be necessary. Such processors will be increasingly required in the future, more for multichannel music production than for drama production.
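The text does not give the coefficients of the CR-602 mix-down circuit. For illustration only, the Python sketch below folds a 3-2 frame down to 2-0 using the widely cited ITU-R BS.775 default of 1/√2 (about -3 dB) for the center and surround contributions; the coefficient values and the function name are assumptions.

```python
import math

K = 1 / math.sqrt(2)  # about -3 dB; assumed default, not the CR-602 setting


def downmix_32_to_20(L, C, R, SL, SR, c_gain=K, s_gain=K):
    """Fold one 3-2 sample frame down to a 2-0 stereo pair (Lo, Ro)."""
    lo = L + c_gain * C + s_gain * SL
    ro = R + c_gain * C + s_gain * SR
    return lo, ro


# A hard-center source lands equally in both stereo channels at -3 dB.
print(downmix_32_to_20(0.0, 1.0, 0.0, 0.0, 0.0))

# Full-scale L, C and SL together overshoot full scale in Lo.
lo, _ = downmix_32_to_20(1.0, 1.0, 0.0, 1.0, 0.0)
print(f"Lo peak: {20 * math.log10(lo):+.1f} dBFS")
```

The overshoot of about 7.7 dB in the second case is one reason a mix-down path needs dynamic range control.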

4.3 Representation of the center channel (See Figure 7)

Figure 7: How to use the center channel

a) Positioning for the phantom center

As with the conventional 2-channel stereo method, this position is used to emphasize high-quality blending of sound between the front L and R speakers, or when no articulate sound image is necessary.
It is sufficient when there is no pressing need to use the center channel, or when the speakers in the reproduction area are closely spaced. Note, however, that a phantom center forms a diffuse sound image and sounds out of balance if the screen is 50 inches or larger or the L and R speakers are 2.5 m or more apart.

b) Positioning for the hard center

This is used to clearly distinguish a real sound image from the images of other channels, or when an articulate center image is needed instead of a phantom center image. An advantage is that it allows mix-down with coefficients that approximately match the theoretical values. This method helps stabilize overall positioning when used for lead vocals or a specific solo instrument, as well as for narration and monologues. Even outside the sweet spot, the sound stays largely balanced.

c) Mixture

Combining the two methods above, this approach places the specific hard-center component in the center channel and a supplementary phantom center between the L and R speakers. It is useful for blending the sound images across the whole front smoothly while still placing center components articulately. For this purpose, crosstalk between the L, C and R channels must be controlled using what is called the divergence function. Typical examples: for monologues, position the main sound in the hard center channel and also in the L-R channels at a level 3 to 4 dB lower; or, to avoid the risk of overload, concentrate the bass and kick drum parts in the hard center channel only. The latter example requires special care in mix-down to prevent any difference in balance between the surround and 2-channel stereo versions.
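The monologue example of the divergence function can be modeled as a fixed L/R offset below the hard center. The Python sketch below converts that offset into linear gains. Real consoles implement divergence with their own laws, so this constant-offset model and the function name are assumptions.

```python
def divergence_gains(side_offset_db):
    """Linear (L, C, R) gains: hard center at unity plus a supplementary
    phantom component in L and R, side_offset_db relative to C."""
    side = 10 ** (side_offset_db / 20)
    return side, 1.0, side


for offset in (-3.0, -4.0):
    l, c, r = divergence_gains(offset)
    print(f"{offset:+.0f} dB -> L={l:.3f}  C={c:.3f}  R={r:.3f}")
```

A 3 dB offset corresponds to an L/R gain of about 0.708, and 4 dB to about 0.631.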

4.4 LFE control

Heavy bass components at 120 Hz and below are a useful means of expression in motion picture and drama sound. In general, music performed on acoustic instruments contains little heavy bass unless bass drums, 'cannons' or contrabasses are played. Adding a slight flavor with a limited amount of heavy bass may suffice, except when it is deliberately added in bulk for some purpose.
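Isolating the heavy bass band below 120 Hz is a low-pass filtering task. The Python sketch below uses a second-order Butterworth low-pass with coefficients from Robert Bristow-Johnson's Audio EQ Cookbook. The 120 Hz corner comes from the text; the filter order, Q and 48 kHz sample rate are assumptions, not a description of NHK's processing.

```python
import math


def lowpass_biquad(cutoff_hz, fs, q=1 / math.sqrt(2)):
    """Second-order low-pass coefficients (Audio EQ Cookbook form),
    normalized so that a0 = 1. Returns (b0, b1, b2, a1, a2)."""
    w0 = 2 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b0 = (1 - cos_w0) / 2 / a0
    b1 = (1 - cos_w0) / a0
    b2 = b0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0
    return b0, b1, b2, a1, a2


def filter_samples(samples, coeffs):
    """Direct Form I filtering of a list of samples."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


fs = 48000
coeffs = lowpass_biquad(120.0, fs)
# A constant (DC) input should pass at unity gain once the filter settles...
dc = filter_samples([1.0] * fs, coeffs)[-1]
# ...while a tone at the Nyquist frequency should be rejected.
nyq = filter_samples([(-1.0) ** n for n in range(fs)], coeffs)[-1]
print(f"DC gain ~ {dc:.4f}, Nyquist gain ~ {abs(nyq):.4f}")
```

A steeper filter, or a higher order cascade of such sections, would keep more of the 120 Hz to 200 Hz range out of the LFE band.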

These combinations can be freely summed for monitoring, so a well-balanced mix on all three systems is possible by switching A-D during pre-mixing or final mixing. As shown in Figure 9, material is captured on a DAW and stored on magneto-optical discs.

Figure 9: Mixing flow for the 1995 HDTV drama 'Last Bullet'

The Logic-2 console incorporates a 16-channel AudioFile, which handles sound rectification, ADR and base noise, as well as dialogue, including monologues and narration. If there is sufficient track space, it is recommended that music also be stored there to facilitate delicate timing work.

For the surround mixing of sound effects, the material stored on the Fairlight recorder is pre-mixed and the finished sound effects are recorded on the PCM-3348. In the final mix, dialogue, music and sound effects are mixed, in that order, using the Logic-2's automation mode. If everything is satisfactory, the mix is recorded segment by segment, and individual scenes are then checked and retouched as needed while monitoring them block by block.

Finally, three formats are recorded simultaneously on tracks 9 to 16 of the PCM-3348, as shown in Figure 10. These are laid back to the master VTR to complete the mixing procedure. The appropriate tracks can then be taken from the master VTR according to the medium and format used.

What is the best way to record spatial ambience components so that a natural sound space can be reproduced from them? In recent years, omnidirectional microphones have often been used as the main microphones in stereo recording because they capture rich reverberation. However, if surround microphones are simply added to this stereo method for surround recording, the excess reverberation may overemphasize the environment. For reverberant energy carrying spatial information to sound natural to the audience, the total sound level must be the same in stereo and surround recording. Recording reverberation with omnidirectional microphones does produce rich reverberation; however, because they also pick up delayed direct sound, the rear sound becomes unnatural and the front positioning becomes unstable.
Placing the surround microphones farther from the sound source would reduce direct sound energy, but would also lower the reverberant energy and thus degrade the reverberation texture.

To resolve these problems, Fukada devised a new microphone arrangement. To clarify positioning and capture rich ambience, it places unidirectional microphones for both the main and ambient sound at around the critical distance, where the energy densities of direct and indirect sound are equal. The Fukada arrangement, which basically consists of seven microphones plus supplementary ambience microphones, can be drawn as a tree, as shown in Figure 11. The microphones marked LL/RR are omnidirectional; all the others are unidirectional. When recording an orchestra, for example, the main and ambience microphones are set within 2 m of each other. If they must be placed farther apart, a delay is inserted in the front main microphones to align the time axis.
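Two quantities in this arrangement can be estimated in advance: the critical distance, and the time offset between microphone rows. The Python sketch below uses the common Sabine-based approximation r_c ≈ 0.057·√(Q·V/RT60), with V in m³, RT60 in seconds and Q the directivity factor, and assumes a speed of sound of 343 m/s. The hall figures are hypothetical, and treating the main-microphone delay as equal to the path difference is an assumption, since the text does not give the exact rule.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed


def critical_distance_m(volume_m3, rt60_s, directivity_q=1.0):
    """Sabine-based estimate: r_c ~= 0.057 * sqrt(Q * V / RT60)."""
    return 0.057 * math.sqrt(directivity_q * volume_m3 / rt60_s)


def path_delay_ms(path_difference_m):
    """Arrival-time difference for an extra acoustic path, in milliseconds."""
    return 1000 * path_difference_m / SPEED_OF_SOUND


# Hypothetical hall: 20,000 m^3 with RT60 = 2.0 s.
print(f"omni:     {critical_distance_m(20000, 2.0):.1f} m")
print(f"cardioid: {critical_distance_m(20000, 2.0, 3.0):.1f} m")  # Q = 3
# Ambience row placed 3 m farther from the stage than the main microphones.
print(f"delay:    {path_delay_ms(3.0):.2f} ms")
```

A directional microphone's critical distance is √Q times that of an omni, which is why unidirectional microphones can sit farther out in the hall.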

Figure 12 shows the sound images that this microphone arrangement can cover. The LL/RR microphones on both sides are intended to capture the spread of the orchestral sound and a smooth sound envelope covering the front and rear of the hall. The unidirectional microphones are B&K 4011s and the omnidirectional microphones B&K 4006s. The configuration of the tree can vary with the hall's acoustics, and the microphone spacing can be changed to suit the orchestra's size and layout. The horizontal directivity angles of the microphones are important for capturing the orchestra precisely; the microphones are usually mounted horizontally. This arrangement is chiefly intended to record the sense of spaciousness of the hall. The random energy efficiency of the unidirectional microphones is 4.8 dB lower than that of the omnidirectional ones, which gives appropriate separation between direct and indirect sound. Since the unidirectional microphones have a distance factor of 1.7, they will sound close and 'dry' if placed at the same positions as omnidirectional microphones; the microphones are therefore placed at correspondingly adjusted distances from the source. Because unidirectional microphones come in many types with different characteristics, their positions vary accordingly. Examples include the Neumann M-50, which is omnidirectional but actually becomes directional from 5 kHz upward, the Schoeps MK21 capsule with its wide directivity, and the B&K 4006, which becomes directional when used with an acoustic pressure equalizer. All these microphones are compatible with the tree system. The center microphone's balance is set 6 dB and 4 dB lower than the L and R microphones, respectively. Figure 13 shows an actual example.
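The 4.8 dB and 1.7 figures follow from the directivity factor of a cardioid, Q = 3. The Python sketch below reproduces them, assuming the unidirectional microphones behave as ideal cardioids; real capsules such as the B&K 4011 deviate from this ideal, especially at high frequencies.

```python
import math


def random_energy_efficiency_db(q):
    """Random energy efficiency relative to an omnidirectional microphone."""
    return -10 * math.log10(q)


def distance_factor(q):
    """How much farther than an omni a microphone with directivity factor q
    can be placed for the same direct-to-reverberant ratio."""
    return math.sqrt(q)


Q_CARDIOID = 3.0  # directivity factor of an ideal cardioid (assumed model)
print(f"REE: {random_energy_efficiency_db(Q_CARDIOID):.1f} dB")
print(f"DF:  {distance_factor(Q_CARDIOID):.2f}")
```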

Figure 12: Fukada Tree

Figure 13: Fukada Tree, '97 Super Concert

5.3 Radio drama: concept of sound design for 'Yume-no-Hitsugi'

This section introduces the sound design of 'Yume-no-Hitsugi' (Coffin of Dreams), a 1998 radio drama. The design concept is itemized below:
* Two different worlds are depicted: a warped jungle where God lives, and sober everyday life. Each piece of information in the scenario was to be converted into full-range sound, from delicate to dynamic. One issue was how to make effective use of the heavy bass range, or LFE, at frequencies of 120 Hz and below.
* Another challenge was how to represent completely different environments, such as the inside of a closet or an elevator and a vast jungle, in a 3-2 surround sound field.
* The 3-2 surround sound field also requires more sophisticated music presentation than usual. Mr. Fukada, the NHK mixer who was verifying his own Fukada Tree microphone arrangement, therefore undertook the mixing of the music score.

To realize the images envisaged by the composer, we first carefully considered which sounds should resound, the desired directions of the sound sources and the timbre balance of the instruments, and then determined the positions of the instruments. We built a basic sound field with the Fukada Tree and created a sense of spaciousness by positioning a harp and vibraphones at the rear. The percussion was separated from the other parts by an acoustic barrier to prevent its sound from spilling into the Tree. This was particularly necessary to avoid confusion with the Tree when producing effect-like sounds in the rear.
A prepared piano was fine-tuned to its optimal condition in cooperation with the composer. Because the music would be brought down to background level, we mixed it carefully so as not to lose its reverberant energy and bass power.

* How to handle drama dialogue in the L-C-R front in 3-2 surround recording is still an open problem. How to represent center components, as with 3-2 music, and which microphone arrangements to use for recording dialogue must also be clarified. In 'Yume-no-Hitsugi', we positioned ordinary conversation as a phantom image between the L and R channels, and monologue and narration as a hard image in the center C channel.

The following sections describe the production flow, from the recording of dialogue to mix-down.

5.3.1 Recording of dialogue

The drama's dialogue was recorded in the CR-601 studio, built next to the CR-602 studio in 1997. This studio, dedicated to dialogue recording, has three booths: a main area for reproducing virtual space, a semi-dead room and a fully dead room. An X-Y pair of U-87Ai microphones, which have a versatile sound, is usually used to record dialogue. This time, however, we used a U-87Ai for monologues and a pair of new M-149 tube microphones for dialogue. Telephone conversations were recorded with an SM-58 microphone, with the other party in the semi-dead room. The lines of the God of Forests were recorded with an MKH-40P microphone placed in the fully dead room. The recorder was a Fairlight MFX-3 Plus using M.O. discs, which were then edited in an editing room.

5.3.2 Foley recording

We took special care over two points: recording weak sounds, such as a teacup being set down on a saucer or the rustle of a dress, at a good S/N ratio; and recording material that would add effective texture in the later sound-effect pre-mix. For these procedures we mainly used an expander and the Program 8-1 pitch shift of the Lexicon 480. Material to which heavy LFE bass would be added later was somewhat reinforced in the low-frequency range during recording; typical examples were a door or closet opening and closing, a monster bird flying up into the sky, and a man running up a steel staircase. Sounds that could be represented across the whole front L-C-R, such as a door opening or shutting, were recorded with an MKH-60P microphone into the mono center. The expanding, resounding door sounds were recorded with a pair of U-87Ai microphones and positioned in the L-R channels. The monster bird that appears at the start of the drama was created by recording artificial sounds and synthesizing them in the pre-mix, and a joystick panning method made it seem to emerge from behind the audience. The delicate rustle of bedclothes was recorded in close-up with the very sensitive MKH-60P. The alarm clock ringing in the hero's bedroom was made from five alarm clocks with different tones, so that the sounds could be spread across all the surround channels in the pre-mix.
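The text does not specify the panning law of the joystick used for the monster bird. As an assumption, the Python sketch below implements pairwise constant-power (sine/cosine) panning between adjacent speakers, a common approach, with L-C-R at ±30° and a single rear pair at ±130° as in item (6). Sweeping the azimuth from behind the listener toward the front moves the sound as described; `SPEAKERS` and `pan_gains` are hypothetical names.

```python
import math

# Assumed azimuths in degrees, clockwise from the front (SL = -130 = 230,
# L = -30 = 330).
SPEAKERS = {"C": 0.0, "R": 30.0, "SR": 130.0, "SL": 230.0, "L": 330.0}


def pan_gains(azimuth_deg):
    """Constant-power gains for a source at azimuth_deg, shared between
    the two adjacent speakers that bracket it."""
    az = azimuth_deg % 360.0
    ring = sorted(SPEAKERS.items(), key=lambda kv: kv[1])
    gains = {name: 0.0 for name in SPEAKERS}
    for i, (name_a, ang_a) in enumerate(ring):
        name_b, ang_b = ring[(i + 1) % len(ring)]
        span = (ang_b - ang_a) % 360.0
        offset = (az - ang_a) % 360.0
        if offset <= span:
            p = offset / span  # 0 at speaker a, 1 at speaker b
            gains[name_a] = math.cos(p * math.pi / 2)
            gains[name_b] = math.sin(p * math.pi / 2)
            break
    return gains


# Sweep from directly behind the listener to front left.
for az in (180, 130, 30, 0, 330):
    active = {k: round(v, 3) for k, v in pan_gains(az).items() if v > 1e-9}
    print(az, active)
```

Because the squared gains always sum to 1, the perceived loudness stays roughly constant as the source moves between speakers.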

5.3.3 Recording of music in CR-509 studio

Music, mixed down to 3-2, was then recorded on the Fairlight MFX-3 Plus using magneto-optical discs and downloaded in the CR-602 studio.
Figure 14 shows the recording of the scoring music.

Figure 14: Fukada Tree, '97 Super Concert

5.3.4 Pre-mix

Pre-mixing and the subsequent processes were performed in the CR-602 studio. Table 1 shows the track assignments for dialogue, music and sound effects. The introduction, lasting about 8 minutes from the start, incorporates a variety of sound designs, which are explained below.

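Track   FAME No. 1       FAME No. 2
 1      DIALOGUE-A L     MUSIC L
 2      DIALOGUE-A C     MUSIC C
 3      DIALOGUE-A R     MUSIC R
 4      DIALOGUE-B L     MUSIC SL
 5      DIALOGUE-B C     MUSIC SR
 6      DIALOGUE-B R
 7      DIALOGUE SL
 8      DIALOGUE SR      TEMPMIX-1 L
 9      SFX-A L          TEMPMIX-1 C
10      SFX-A C          TEMPMIX-1 R
11      SFX-A R          TEMPMIX-1 SL
12      SFX-B L          TEMPMIX-1 SR
13      SFX-B C          TEMPMIX-2 L
14      SFX-B R          TEMPMIX-2 C
15      SFX-B SL         TEMPMIX-2 R
16      SFX-B SR         TEMPMIX-2 SL
17      SFX-C L          TEMPMIX-2 SR
18      SFX-C C          FINAL-MIX L
19      SFX-C R          FINAL-MIX C
20      SFX-C SL         FINAL-MIX R
21      SFX-C SR         FINAL-MIX SL
22                       FINAL-MIX SR
23                       FINAL-MIX Lt
24                       FINAL-MIX Rt

Table 1: Track assignments for dialogue, music and sound effects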

5.3.5 Sound design for the jungle

We now describe how these acoustic scenes were assembled, following the illustrated continuity. The scenes at the start of the drama had to make an impression on the audience. We imagined the following visual scenes:

A close-up of a big parrot perched on a tree somewhere; it is not yet clear that this is a jungle. The parrot flies up, moving from SR to FL.

The camera sinks as a jungle unfolds in a long shot. The camera on a dolly sinks further until it catches a girl walking through the bush with a cat in her arms. The camera sinks to the level of the girl's feet and follows them as she walks, with the FL-SL and FR-SR lateral sounds moving along too. The camera rises and pans to catch the God of Forests, who spreads his dark wings and says in a loud, resounding voice, 'Buenas tardes!' Fascinated by the voice, the girl sighs.

The sound material used for this consisted of the flapping of wings; the girl's footsteps pattering through the bush, trotting and then halting; the big bird flapping up into the sky; and the rustling of grass and leaves.

To blend these sounds into the jungle, we prepared four types of stereo sound as the jungle base, collecting close-up sounds of shouting monkeys, twittering birds, running water and rustling grass from the Imai Library. God's resounding voice was made by placing the original monologue in the center channel, then lowering its pitch by 5% for the L-channel side and 3% for the R-channel side with the pitch change program of the Lexicon 480, and positioning the results. To create a sense of overall vastness, another effect component made with the Big Voice program of the DSP-4000 was positioned in the SL and SR channels. We also added subwoofer components to the FL and FR channels with a sub-harmonic synthesizer to produce a heavy bass feeling in the front.
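Reading the 5% and 3% changes as frequency ratios of 0.95 and 0.97 (an assumption, since the text does not state exactly what the percentages refer to), the detuning can be expressed in cents, as this short Python sketch shows.

```python
import math


def ratio_to_cents(ratio):
    """Pitch interval in cents for a frequency ratio (100 cents = 1 semitone)."""
    return 1200 * math.log2(ratio)


for side, drop in (("L", 0.05), ("R", 0.03)):
    print(f"{side}: -{drop:.0%} -> {ratio_to_cents(1 - drop):.1f} cents")
```

Both shifts are smaller than a semitone, enough to thicken the voice without turning it into two distinct pitches.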

5.3.6 Advantages of the 3-2 mixing procedure

Production of the radio drama revealed the following features of 3-2 mixing:

1. The presentation capabilities of the 3-2 discrete layout are also attractive for radio dramas, because all the channels have equal frequency response.

2. The center channel should be used for parts of the dialogue such as narration and monologue, while ordinary conversation should be positioned as a phantom image.

3. The production-monitoring environment is quite good if the subwoofer components, called LFE, are distributed to the front L-R channels.

4. Switching the monitoring environment between 3-2 and Dolby 3-1 during pre-mixing maintains an approximate balance. The shift of the sound stage toward the center caused by Dolby steering logic, and the bleeding of the sound field caused by crosstalk, must be tolerated.

5. During production, monitor levels were set to 85 dB/channel in pre-mixing and 82 dB/channel in mix-down. These levels are unlikely to cause problems for reproduction in the home.

Conclusion and Pending Issues

We have discussed the application of multichannel surround sound recording methods to drama and music production. Development of such recording and production methods has only just begun, both in Japan and overseas, and the following issues remain:

1) The methodology has evolved from 2-channel stereo to 3-channel stereo and then to 3-2 surround. How many channels will eventually be needed for ideal spatial reproduction?

4) Does the sound field of multichannel music reproduction have a broad sweet spot?

5) How much deviation from the appropriate arrangement should be allowed in the home music reproduction environment?

6) Is there any adequate down-mix method to secure compatibility with stereo?

7) How can an appropriate dynamic range be ensured for different listening levels?

8) Development of a practical method of using the center channel for music

9) Is there an effective way to use the subwoofer band, called the LFE channel, for music?

We believe that multichannel surround sound offers tremendous potential as a means of presentation, since broadcasting is not only an entertainment medium but can also be used to create new art forms and other possibilities. In conclusion, we encourage software producers to continue their work.