3D Audio on Headphones: How Does It Work?

June 23, 2016

Learn why listening to three-dimensional sound in the real world and listening on headphones are two different experiences – and how you can bridge the gap.

The perception of spatial sound in the real world is a complex phenomenon. It combines the interaction between the acoustic sound waves and the room or space, the interaction with our head and ears, the response of our middle and inner ear and the auditory nerve, and finally our brain’s cognition and interpretation of the acoustic scene.

The perception of sound over headphones is a completely different experience. If we take listening to stereo headphones vs. listening to speakers as our example, there are several major differences:

1. Channel Crosstalk: The Stereo Magic
When we listen to a left/right speaker configuration, the signal from the left speaker arrives at both our left and right ears, where it is summed with the signal from the right speaker. When we listen to the same content on headphones, the left ear receives only the left channel and the right ear receives only the right channel.

Channel crosstalk
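
The crosstalk described above can be illustrated with a minimal sketch. It assumes a deliberately crude model in which each speaker reaches the opposite ear as an attenuated, delayed copy of itself; the names `crosstalk_gain` and `crosstalk_delay` and their default values are hypothetical and are chosen for illustration only.

```python
def speaker_feed_to_ears(left, right, crosstalk_gain=0.5, crosstalk_delay=8):
    """Approximate what each ear receives from a two-speaker setup.

    left, right: equal-length lists of samples for the two speaker channels.
    crosstalk_gain, crosstalk_delay: hypothetical attenuation and delay
    (in samples) of the path from a speaker to the opposite ear.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")

    def delayed(signal, i):
        # Sample of `signal` that reaches the far ear at time index i.
        j = i - crosstalk_delay
        return signal[j] if j >= 0 else 0.0

    n = len(left)
    # Each ear hears its own speaker plus a weaker, later copy of the other.
    left_ear = [left[i] + crosstalk_gain * delayed(right, i) for i in range(n)]
    right_ear = [right[i] + crosstalk_gain * delayed(left, i) for i in range(n)]
    return left_ear, right_ear


def headphone_feed_to_ears(left, right):
    """On headphones each ear receives only its own channel."""
    return list(left), list(right)
```

Feeding a single click to the left channel only, the speaker model delivers a delayed, quieter copy to the right ear, while the headphone model leaves the right ear silent.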

2. HEADS UP! Head and Ear Filtering and Delays
After propagating through the air and before arriving at the eardrum, the sound wave is filtered and delayed by the size and shape of our head and ears. The wavefront arrives at the two ears at different times and with different frequency content. The delays and filters depend on the angle from which the sound originates. When listening on headphones, this filtering and delaying effect is essentially bypassed and the signal is delivered almost directly to our eardrums (depending on the headphone type).

Ear filtering: Direction-dependent frequency change
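
To give a sense of the size of the direction-dependent delay, here is a minimal sketch using Woodworth's spherical-head approximation, a common textbook model. It is not taken from this article and says nothing about how any particular product computes delays. The head radius of 0.0875 m and the speed of sound of 343 m/s are assumed typical values, and the sketch covers only the delay, not the frequency filtering.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature
HEAD_RADIUS = 0.0875    # m, assumed typical adult head radius


def interaural_time_difference(azimuth_deg, head_radius=HEAD_RADIUS):
    """Estimate the arrival-time difference between the ears, in seconds.

    azimuth_deg: source angle from straight ahead, -90 (left) to +90 (right).
    A positive result means the sound reaches the right ear first.
    Uses the spherical-head approximation ITD = (r / c) * (theta + sin(theta)).
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch covers frontal azimuths only")
    theta = math.radians(abs(azimuth_deg))
    itd = (head_radius / SPEED_OF_SOUND) * (theta + math.sin(theta))
    # Restore the sign so that left and right sources are distinguishable.
    return math.copysign(itd, azimuth_deg)
```

Under these assumptions a source directly in front produces no delay, while a source directly to one side produces a delay of roughly 0.66 milliseconds.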

3. Early Reflections: We Are Not Alone
In the real world, and even in the driest studio, the direct sound from the speakers is not the only thing that arrives at the ears. The sound wave interacts with the room by bouncing off the walls and other physical objects, creating multiple highly correlated signals coming from numerous directions. These are referred to as early reflections, and they also undergo filtering and delaying based on the direction from which they arrive. Our brain uses the gains, times of arrival, and directions of these early reflections relative to the direct source to estimate the distance of the source and the dimensions and acoustic characteristics of the listening space. Again, on headphones, none of this happens; only the dry signal is introduced to the ear, and there is no indication of how it would interact with a physical environment.

Early reflections
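
The timing and level of a single early reflection relative to the direct sound can be estimated with the image-source idea: mirror the source across the reflecting surface and measure the straight-line distance from that mirror image to the listener. The sketch below assumes a flat floor, a two-dimensional geometry, simple 1/distance attenuation, and a hypothetical `reflection_coeff`; it is meant only to show where the arrival time and gain cues come from.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def floor_reflection(source, listener, reflection_coeff=0.7):
    """Return (extra_delay_seconds, relative_gain) of the floor reflection.

    source, listener: (x, height) positions in metres above a floor at
    height 0; the two positions must not coincide.
    reflection_coeff: hypothetical fraction of amplitude the floor reflects.
    """
    sx, sy = source
    lx, ly = listener
    direct = math.hypot(lx - sx, ly - sy)
    # Mirror the source below the floor: the reflected path is as long as
    # the straight line from this image source to the listener.
    reflected = math.hypot(lx - sx, ly + sy)
    # The reflection arrives later because its path is longer.
    extra_delay = (reflected - direct) / SPEED_OF_SOUND
    # Amplitude falls off as 1/distance, then is scaled by the floor.
    relative_gain = reflection_coeff * direct / reflected
    return extra_delay, relative_gain
```

With the source and listener both 1.5 m above the floor and 4 m apart, the direct path is 4 m and the reflected path is 5 m, so the reflection arrives about 2.9 milliseconds after the direct sound.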

4. Head Motion: “And Yet It Moves!”
Since all of the above-described phenomena depend on the direction of sound, even the slightest nudge to our head causes the complete audio scene to shift in the opposite direction because the external world is not moving with the head. Now, this cue is as crucial as any of the others—perhaps even more so. Our brain, being highly sensitive to change, remembers where the sound used to be and where it is now, combines this with its knowledge that it itself (and not the source) has moved, and uses this information to locate the static external source. When we listen on headphones, the audio scene constantly moves with the head, contradicting any previous supposition the brain has retained regarding the location of sound sources.

Head movement
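
The geometric core of this cue is small: a source that is fixed in the room appears, relative to the head, at its room angle minus the current head rotation. The sketch below assumes angles in degrees measured clockwise from straight ahead and considers rotation about the vertical axis (yaw) only; the function name is hypothetical.

```python
def relative_azimuth(source_azimuth_deg, head_yaw_deg):
    """Angle of a room-fixed source as heard from the rotated head.

    Both angles are in degrees, clockwise from the room's forward
    direction. The result is wrapped into the range [-180, 180).
    """
    diff = source_azimuth_deg - head_yaw_deg
    # Wrap so that, for example, 190 degrees becomes -170 degrees.
    return (diff + 180.0) % 360.0 - 180.0
```

For a source 30 degrees to the right, turning the head 10 degrees to the right moves the source to 20 degrees, which is the opposite-direction shift described above. Headphones without head tracking behave as if `head_yaw_deg` were always zero, so the source stays at the same head-relative angle regardless of how the listener moves.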

All of the above are essential cues that the brain uses while continually making decisions about the location of sound sources. Now, our brain is not a rash decision maker, and it is not easy to fool. It has developed the ability to localize sound through millions of years of evolution. Knowing by listening where a predator might be lurking or where prey can be caught is obviously crucial for survival. When sound cues are missing or contradictory, the brain becomes confused until it ultimately gives up the attempt to locate the sound, and the scene collapses into our head.

This is the experience of headphone listening. Since the cues that help us locate sound sources in space are missing, we hear the sounds as if they are nested within the head. All the elements we are hearing are crowded along a one-dimensional line stretching through the head from ear to ear instead of occupying the three-dimensional space outside of our head.

Three-dimensional stereo image on speakers

Flat stereo image on ordinary headphones

The flatness of the ordinary headphone experience can have several negative consequences:

♦ Wrong or missing spatial image: When listening on headphones, we either fail to perceive or misunderstand the spatial intentions of the mix that the producer or recording artist wanted to convey.

For example, in the Beatles’ song “A Day in the Life,” the vocals start on the right channel and the piano on the left. Then, in the course of the song’s first verse, they gradually move towards each other, until by the second verse they’ve completely traded places. This is an essential part of the listening experience, and we can hear the transition properly occurring in the auditory space when we experience it through speakers. Listening to the song on headphones with flat headphone sound will not reproduce this auditory scene properly, and the experience will be greatly reduced.

♦ Mixing decisions: When you mix audio on headphones, contradictory spatial cues can be perceived as frequency or phase differences. This might cause the sound engineer to make incorrect or exaggerated EQ and/or mixing decisions that will become apparent only once the track is played on speakers.

♦ Listening fatigue: The inner-head experience created on headphones can cause listening fatigue since the brain is not used to this type of sensation. The brain is continuously trying to comprehend the spatial audio scene, but the cues are either contradictory or missing, leaving the brain in a constant state of confusion.

♦ Surround Sound: It is practically impossible to create surround sound on ordinary headphones, primarily because they cannot convey the surround image of sources located behind the listener.

Waves Nx technology has been developed in order to bridge the gap between listening to sound from external sources and listening on headphones. The Nx algorithm inserts all of the above-described missing cues into the signal in order to convince the brain that sound is coming from virtual speaker positions in space, with options for both stereo and surround.

3D stereo image on headphones with Nx

3D surround image on headphones with Nx

Nx does all this in a subtle manner, adding only the critical and global cues required in order to recreate the spatial 3D audio image, without otherwise modifying or coloring the sound. The filters and ambience are optimized to create a transparent-sounding room in order to minimize the frequency alteration, such that all changes are perceived as relating to space rather than equalization. By adjusting the sound to the user’s head movements, the 3D perception is created without dramatic changes to the frequency response.