Sound system lets listeners move

By Kimberly Patch, Technology Research News

Real-life audio is complicated -- our brains
can keep track of sounds as they move around us even when we are moving
around as well.

Researchers from the University of California at Davis have designed
a relatively inexpensive spatial sound system that accounts for the user's
movement as it creates the impression of sounds in the space around him.

The headphone-based system could be used in remote listening situations
like teleconferencing, surveillance and teleoperation to allow people
to hear events as they happen, said Richard Duda, a research engineer
at the University of California at Davis. It could also be used to make
spatial audio and video recordings, and for immersive interactive multimedia
like computer games, augmented reality systems, and industrial and military
training, he said.

People locate sounds in three dimensions -- azimuth, or left versus
right, elevation, or up versus down, and range, or near versus far, said
Duda.

The primary cues for azimuth are the loudness and timing differences
of a sound arriving at the right versus left ear. The primary cue for
elevation is how the outer ears change the sound. And the primary cue
for range is loudness -- the difference in energy of the sound that arrives
directly versus sound that arrives later reflected from environmental
surfaces. More subtle cues include torso reflections, familiarity with
a sound source, and visual cues.
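
As a rough illustration of the timing cue for azimuth, the sketch below uses Woodworth's classic spherical-head approximation, which is a standard textbook formula and not something taken from the researchers' work. The head radius of 8.75 centimeters and the function name are assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND = 343.0   # meters per second in air at roughly 20 degrees C
HEAD_RADIUS = 0.0875     # meters; an assumed average head radius

def interaural_time_difference(azimuth_rad, head_radius=HEAD_RADIUS):
    """Estimate the arrival-time difference between the two ears, in seconds.

    Uses Woodworth's spherical-head approximation for a distant source.
    azimuth_rad is the source angle from straight ahead, in [-pi/2, pi/2].
    """
    return (head_radius / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))

if __name__ == "__main__":
    for degrees in (0, 30, 90):
        itd = interaural_time_difference(math.radians(degrees))
        print(f"{degrees:3d} degrees: {itd * 1e6:.0f} microseconds")
```

Under these assumptions a source directly to one side reaches the far ear about 0.66 milliseconds after the near ear, and a source straight ahead produces no difference at all.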

Binaural recordings, which reproduce sound in space, are routinely
made by placing microphones in the ears of a dummy head. This reproduces
sound at the location where it is heard and accounts for the changes in
sound produced by the shape of the head.

The trouble with this method, however, is that it is only accurate
when the user is facing the same direction as the dummy.

People are sensitive to how sound cues change when they turn their
heads, said Duda. "When the cues all change consistently, the perception
of a well-defined spatial location for the source is strong; if the cues
are inconsistent, the perception can be so vague that the listener has
no idea where the source is," he said.

One way to address this is to use a head tracker to determine
the location of the user's ears and a servo mechanism that rotates the
dummy-mounted microphone array in real-time to match the user's head movements.

The researchers' system overcomes the need to move the dummy head
by embedding a series of microphones into the dummy head and sampling
from the nearest microphone as the listener rotates her head, said Duda.
"It... occurred to us that the dummy head was really only sampling the
sound field in space, and that instead of turning the dummy head physically
it should be sufficient to sample the sound field with multiple microphones
around the head," he said.
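
The nearest-microphone idea can be sketched in a few lines. The layout here is an assumption made for illustration, not a description of the researchers' hardware: microphones are evenly spaced around a circle, microphone 0 points straight ahead, indices increase counter-clockwise, and the listener's ears sit 90 degrees to either side of the direction the head is facing. The function names are hypothetical.

```python
def ear_azimuths(head_yaw_deg):
    """Return (left, right) ear directions in degrees for a given head yaw.

    Yaw is measured counter-clockwise from straight ahead (an assumed
    convention), so the left ear sits at yaw + 90 and the right at yaw - 90.
    """
    return (head_yaw_deg + 90.0) % 360.0, (head_yaw_deg - 90.0) % 360.0

def nearest_microphone(ear_azimuth_deg, n_mics=8):
    """Index of the evenly spaced microphone closest to the ear direction."""
    spacing = 360.0 / n_mics
    return round((ear_azimuth_deg % 360.0) / spacing) % n_mics

if __name__ == "__main__":
    left, right = ear_azimuths(30.0)
    print(nearest_microphone(left), nearest_microphone(right))
```

With eight microphones and the head turned 30 degrees to the left, this sketch picks microphone 3 for the left ear and microphone 7 for the right.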

In effect, this puts virtual copies of the listener's ears in
the sound field and moves them as the user moves his head in order to
capture the dynamic cues that are missing in conventional binaural recordings,
said Duda.

Key to the method was finding a way to blend, or interpolate, signals
from two microphones when an ear was between microphones, said Duda. "The
key technical question is how to interpolate the signals without introducing
spectral distortion or requiring an unaffordable number of microphones."

The most obvious method would have required at least 128 microphones
to provide CD-quality sound. The researchers found a way to use eight
microphones for speech and 16 microphones for music. "Our breakthrough
came from the recognition that because humans are not sensitive to differences
in arrival time above about 1.5 kHz, we only had to interpolate the low-frequency
components."
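
One way to picture interpolating only the low-frequency components is the minimal sketch below. It rests on several assumptions that the article does not spell out: the two bracketing microphone signals are blended with a simple linear weight, the band above the 1.5 kHz crossover is taken unblended from the nearer microphone, and the band split uses a basic one-pole filter chosen for brevity. The researchers' actual filters and weighting may well differ, and all names here are hypothetical.

```python
import math

def one_pole_lowpass(signal, cutoff_hz, sample_rate_hz):
    """Very simple low-pass filter (an assumed stand-in for a real crossover)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out

def virtual_ear_signal(mic_a, mic_b, weight_b,
                       cutoff_hz=1500.0, sample_rate_hz=44100.0):
    """Approximate the signal at an ear located between two microphones.

    mic_a, mic_b: equal-length lists of samples from the bracketing mics.
    weight_b: 0.0 when the ear is at mic A, 1.0 when it is at mic B.
    """
    # Low band: linearly blend the two microphones, then low-pass the blend.
    blended = [(1.0 - weight_b) * a + weight_b * b for a, b in zip(mic_a, mic_b)]
    low = one_pole_lowpass(blended, cutoff_hz, sample_rate_hz)

    # High band: whatever the low-pass removes from the nearer microphone.
    nearest = mic_a if weight_b < 0.5 else mic_b
    nearest_low = one_pole_lowpass(nearest, cutoff_hz, sample_rate_hz)
    high = [x - l for x, l in zip(nearest, nearest_low)]

    return [l + h for l, h in zip(low, high)]
```

When the ear sits exactly at one of the microphones, the sketch returns that microphone's signal unchanged, which is the behavior one would want from any interpolation scheme of this kind.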

The researchers' recording device also has a torso. Removing the
torso not only changes the perceived elevation of a sound source but also,
surprisingly, the perceived distance of a source, said Duda.

The researchers' prototype doesn't possess outer ears, which for
some people causes a significant shift in the apparent elevation of sounds,
particularly for sounds in front, said Duda. "Our near-term goals are to
understand why some people experience greater shifts than others, and
to develop procedures for correcting these problems."

The researchers' ultimate goal is to create a mixture of real
and computer-generated sounds, images and other stimuli to provide a compelling
experience of being actually present in a remote or synthetic environment,
said Duda.

Although sound is of central importance in human communication
in general, including telephones, radio and television, it is woefully
underused in human-computer interaction, said Duda. Especially with the
advent of portable systems, sound is likely to become more important in
human-computer interaction, he said.

The basic audio system is ready now, said Duda. The researchers
have a rudimentary audio/video demonstration working in the lab, and could
produce a low-to-moderate-quality practical audio/video application within
a year, he said. High-quality audio/video applications could be practical
within three years, he added.

Duda's research colleagues were V. Ralph Algazi and Dennis M.
Thompson. The researchers presented the work at the 116th Audio Engineering
Society Convention in Berlin, Germany, May 8 to 11, 2004. The research
was funded by the National Science Foundation (NSF).