There are many questions in the medium of 360 film. For a long time, the main concern was the image – is it realistic enough? Now, sound in 360 film is becoming more and more important, as it aids in the immersion of the viewers. It is also a big challenge. For 360/VR to be truly immersive, it needs convincing sound to match. The role of the sound editor is to create the illusion of space.

In 360 film, edits are tricky because a cut can feel jarring and easily disrupt the immersion. Another problem with audio for 360 film is its directionality. What if the spectator doesn’t look where you want them to? On the other hand, if directional sounds are persuasive enough, the viewer could be convinced to change their position.

Another concern is the placement of recording equipment. How do we conceal it from the viewer?

Main Techniques for Recording for 360 Film

How can all the different sound recordings be compiled together, resulting in a ‘believable’ mix? These are the main techniques I discovered.

Binaural Audio

Binaural audio would be great for VR if you could track head movement. In this technique, recordings are made on a dummy head with microphones in place of ears. Using the resulting audio can be problematic: you would feel as if you were inside the sound space, but the effect would break if you moved your head too much. This is because a typical binaural recording is static and captures sound from one exact position in space.

This is a quote with an analogy of why this happens: “A stereo pair of videos gives us the illusion of a 3D view, but doesn’t have information about how the world looks if we move our head. With binaural audio we do not have the information of how the sound would be if your ears were in other locations.(3)”

There are alternatives. Some use a setup in which multiple binaural recorders run simultaneously. This is a picture of an Omni binaural recording setup by 3Dio.

Ambisonics.

Ambisonics uses a tetrahedral mic array. The recorded signals are not what should come out of the speakers directly; instead, they are decoded into audio tracks that can be played through the speakers. A head-tracking function determines a transform that is applied to the recorded sound. The disadvantages of this technique are that it is complex and finicky. It is also impractical: you would need to settle on a specific speaker setup for playback and decode the data for that setup. I don’t believe this is a candidate because it requires speakers.
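
To make the head-tracking transform more concrete, here is a minimal sketch of one common case: rotating the sound field around the vertical axis when the listener turns their head. It assumes the recording has already been converted to first-order B-format (W, X, Y, Z) with X pointing forward, Y pointing left and Z pointing up. These axis conventions, the function name rotate_yaw and the sample values are my assumptions for illustration, not taken from any particular tool.

```python
import math

def rotate_yaw(w, x, y, z, head_yaw_rad):
    """Counter-rotate one first-order B-format sample for a head turn.

    Assumed convention (not from any specific tool): X points forward,
    Y points left, Z points up, and a positive yaw means the head turns left.
    """
    c = math.cos(head_yaw_rad)
    s = math.sin(head_yaw_rad)
    # Relative to the head, the field rotates by -yaw. W (omni) and Z (height)
    # are unaffected by a rotation around the vertical axis.
    x_rot = x * c + y * s
    y_rot = y * c - x * s
    return w, x_rot, y_rot, z

# A source straight ahead (all directional energy on X)...
w, x, y, z = rotate_yaw(1.0, 1.0, 0.0, 0.0, math.radians(90))
# ...should land on the listener's right (negative Y) after a 90 degree left turn.
print(round(x, 6), round(y, 6))
```

In a real player this rotation would be applied to every sample (or as a matrix over blocks of samples) using the current headset orientation, before the signal is decoded for playback.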

Non-Head Tracking Strategies.

Another option is to not worry about head-tracked audio. An example presented in the article (3) is the author’s recording of a rock concert. He placed a clay-modeled dummy head above the audience to capture the ‘feel’ of things happening all around. The author also suggests that using voice-over lets you avoid worrying about head tracking. Specifically, he used (locationless) voice-over supported by ambient sound.

Another strategy he used is to ‘read’ the viewer’s mind and not get too fancy with the sound. For example, if you know where they will probably look, you don’t need head tracking.

Go for more advanced plugins.

Some game development platforms, like Unity, provide spatialization tools.

Dolby Surround (Not really 360 – but interesting)

Surround sound, as implemented by Dolby, was based on fixed audio channels.

Each channel would be sent to specific speakers in the room.

This did not always work in larger rooms, like cinemas.

Dolby Atmos appeared in response. It allows more control over where sounds are placed, without having to route the audio to specific speakers.

Psychoacoustics Principles Used

How do humans perceive sounds? The following ideas are taken advantage of to ‘fool’ the brain into believing the localization of sound.

The cues that the human brain needs to localize sound are challenging to recreate. One of these is proximity: if a sound arrives at one ear before the other, it appears to come from that side (the interaural time difference). The level also differs between the two ears (the interaural level difference), which further helps determine the direction of the sound.
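
To get a feel for how small the arrival-time cue is, here is a rough sketch using the Woodworth spherical-head approximation, ITD = (r / c) * (theta + sin(theta)). The head radius, the speed of sound and the function name are my assumptions, and real heads are not spheres, so treat the numbers as orders of magnitude only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at roughly 20 degrees C
HEAD_RADIUS_M = 0.0875       # assumed: a commonly used average head radius

def interaural_time_difference(azimuth_deg):
    """Approximate ITD in seconds for a distant source (Woodworth model).

    azimuth_deg is measured from straight ahead, 0 to 90 degrees to one side.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers 0 to 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))

for angle in (0, 30, 60, 90):
    itd_us = interaural_time_difference(angle) * 1e6
    print(f"{angle:>2} degrees: {itd_us:.0f} microseconds")
```

Even for a source directly to one side, the difference stays well under a millisecond, which is part of why these cues are so delicate to reproduce.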

Some directions are hard to recreate: front and back are ambiguous. HRTFs (Head-Related Transfer Functions) describe how sound interacts with a person’s body (head, shoulders, ears and neck), and they help the brain resolve this ambiguity. (As shown below, the level of a sound changes slightly depending on the direction it arrives from.) But how can that be recreated in 360 film?

Every person’s HRTFs are different. This is why dummy-head binaural recordings don’t work for everything.

Recommendations/Strategies for Recording/Mixing

Aki Silventoinen provides good insight into his experience with 360 audio. He shares several recommendations, compiled here.

Comment: Recording dialogue with lavalier microphones – syncing the audio later could be complicated and there might be disparity between dialogue and lip movement.

Back up the backups – he keeps copies of everything in at least three locations, and in the cloud.

To create the illusion of sound localization when the soundscape is fixed in stereo with no adaptive head tracking, the best you can do is to keep all the dialogue and sound FX in the middle of the stereo field as a mono mix. This is not perfect, but stereo is limiting.

Keep everything that should be spatialized in mono format. This takes advantage of psychoacoustics: the brain tries to ‘reason’ out the most likely source of the sound. ‘If, for example, there is a person speaking in front of you in the 360 VR film, then your brain presumes that’s where the sound comes from.’ There is still a problem with this when a person turns their head too much during dialogue lines.

Although it is possible to create panning automations in Pro Tools, don’t do that in stereo mixes in 360 videos. These automations don’t make sense when viewers turn their heads.

Ambience tracks and other audio tracks with less significance can be presented in stereo.

Use equalization to reduce the level of the ambient tracks so that there’s room for dialogue. This also helps un-localize these tracks. Specifically, use a wide parametric EQ (low Q) to reduce the 1-2 kHz area and the 6-10 kHz area. Why those frequencies? That is where human hearing is fine-tuned to listen to voice.
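
As an illustration of what such a wide, gentle cut looks like, here is a small sketch that builds a peaking-EQ biquad using the well-known Audio EQ Cookbook formulas and prints its response at a few frequencies. The cut depth (-3 dB), the Q (0.7) and the centre frequencies (1.5 kHz and 8 kHz) are hypothetical values I picked to sit inside the two ranges; the recommendation above names the ranges but not the exact settings.

```python
import cmath
import math

def peaking_eq_coefficients(center_hz, gain_db, q, sample_rate_hz):
    """Biquad coefficients (b, a) for a peaking EQ, Audio EQ Cookbook style."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a

def gain_db_at(freq_hz, b, a, sample_rate_hz):
    """Magnitude response of the biquad at one frequency, in dB."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(num / den))

SAMPLE_RATE_HZ = 48000
# Hypothetical settings: (centre frequency in Hz, gain in dB, Q).
bands = [(1500.0, -3.0, 0.7), (8000.0, -3.0, 0.7)]
for center, gain, q in bands:
    b, a = peaking_eq_coefficients(center, gain, q, SAMPLE_RATE_HZ)
    for f in (200.0, center, 15000.0):
        response = gain_db_at(f, b, a, SAMPLE_RATE_HZ)
        print(f"{center:.0f} Hz band, response at {f:.0f} Hz: {response:+.2f} dB")
```

A low Q spreads the cut over a broad range, which is the point here: the ambience is thinned out where the voice lives instead of being notched at a single frequency. Using a positive gain in the same bands gives the opposite effect.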

Do the opposite when you wish to increase the localization of a certain sound. Use the EQ to increase the level in those same frequency ranges. Keep the resulting level at or below -6 dBFS (decibels relative to full scale, where 0 dBFS is the maximum level achievable).

Mix using headphones. Since they isolate you from the external soundscape, all the nuances in the sound are very noticeable. Using a high-resolution format is a good idea: 24-bit/48 kHz. Also, check the mix on different models of headphones.

It’s a very bad idea to let the mix peak near 0 dBFS. -6 dBFS is the maximum. Always check the level of the final mix. The level shouldn’t be above -3 dB.
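
For reference, a peak level in dBFS is 20 * log10 of the largest absolute sample value, with full scale taken as 1.0. The sketch below measures the sample peak of a test signal and computes the gain needed to bring it down to a target. The function names and the test tone are hypothetical, and this measures the sample peak only, not the inter-sample (true) peak that some meters report.

```python
import math

def peak_dbfs(samples):
    """Peak level in dBFS for floating-point samples where full scale is 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return float("-inf")  # digital silence has no finite dB value
    return 20.0 * math.log10(peak)

def gain_to_reach(samples, target_dbfs):
    """Linear gain factor that moves the sample peak to target_dbfs."""
    return 10.0 ** ((target_dbfs - peak_dbfs(samples)) / 20.0)

# Hypothetical test tone: one cycle of a sine peaking at 0.9 of full scale.
mix = [0.9 * math.sin(2.0 * math.pi * n / 64) for n in range(64)]
print(f"peak before: {peak_dbfs(mix):.2f} dBFS")

gain = gain_to_reach(mix, -6.0)
trimmed = [s * gain for s in mix]
print(f"peak after:  {peak_dbfs(trimmed):.2f} dBFS")
```

A handy number to remember: halving the amplitude lowers the level by about 6 dB, so a peak at -6 dBFS sits at roughly half of full scale.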

It is hard to justify the use of music in a 360 video. It might break the illusion if it’s presented as a regular stereo mix. He recommends that if music is used, it should be world-generated/diegetic, that is, inside the virtual world.

When syncing the video to the audio:

Have a reference track from one or two of the VR cameras. Otherwise it’s too hard to place the audio where it should go. (Take it out or mute it at the end.)

Perform phase-checking for all audio signals when syncing. If there is a delay between signals, they can interfere constructively or destructively (sounding louder or quieter than they should).
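
One way to check for such a delay is cross-correlation: slide one track against the other and look for the offset where they match best. The sketch below is a brute-force version on a short synthetic signal; the function name, the signals and the 5-sample delay are all hypothetical, and a real session would use a DAW’s alignment tools or an FFT-based correlation on much longer clips.

```python
import math

def estimate_delay(reference, other, max_lag):
    """Return the lag (in samples) at which `other` best matches `reference`.

    A positive result means `other` is late relative to `reference`.
    Brute-force cross-correlation: fine for short clips, slow for long ones.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += ref_sample * other[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# Hypothetical signals: a short decaying click, and a copy delayed by 5 samples.
click = [math.exp(-n / 4.0) * math.cos(n) for n in range(32)]
camera_track = click + [0.0] * 16
recorder_track = [0.0] * 5 + click + [0.0] * 11

lag = estimate_delay(camera_track, recorder_track, max_lag=12)
print("recorder is late by", lag, "samples")
```

Once the lag is known, shifting the late track by that many samples lines the two signals up, so summing them no longer causes the cancellations described above.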

Conclusions

The sound for a 360 film should be of high quality. For best results, it should be directional. The sound should be recorded and played back with two or more channels (stereo/binaural), and synchronized with the pictures. There are specific frequency ranges that are optimal for hearing the human voice, and this can be exploited. There are psychoacoustic principles that can be used to make the viewer more immersed in the virtual reality/360 experience.

Going Further – Existing Tools Worth Exploring

Pro Tools alone won’t save the day if you’re going for a truly immersive experience. You could use third-party plugins in a game engine (Unity, for example) to create a binaural mix. ‘Two Big Ears’ is a virtual reality audio developer group that provides good tools for virtual reality 3D mixing. The software recommended by Aki Silventoinen is 3Dception Spatial Workstation (by Two Big Ears), now Facebook 360 Spatial Workstation. It can be found at: