Evaluating the applications of spatial
audio in telephony

ENGLISH ABSTRACT: Telephony has developed substantially over the years, but the fundamental auditory model
of mixing all the audio from different sources together into a single monaural stream has not
changed since the telephone was first invented. Monaural audio is very difficult to follow in
a multiple-source situation such as a conference call.
Sound originating from a specific point in space will travel along a slightly different path to
each ear. Although we are not consciously aware of it, our brain processes these spatial cues
to help us to locate sounds in space. It is this spatial information that allows us to focus
our attention and listen to a single speaker in an environment where many different sources
may be active at the same time, a phenomenon known as the "cocktail party effect". It is
possible to reproduce these spatial cues in a sound recording, using Head-Related Transfer
Functions (HRTFs) to allow a listener to experience localised audio, even when sound is
reproduced through a headset.
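As a rough illustration of how such cues can be imposed on a recording, the sketch below convolves a mono signal with a left/right pair of head-related impulse responses (the time-domain form of an HRTF) to produce a two-channel signal for headset playback. It is a minimal sketch under stated assumptions, not the implementation used in this thesis: the function names `spatialise` and `toy_hrir_pair` are hypothetical, and the impulse responses are crude placeholders (a delay and an attenuation at the far ear) standing in for measured HRTF data.

```python
import numpy as np


def spatialise(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right impulse-response pair.

    Returns an array of shape (samples, 2) holding the left and right channels.
    """
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)


def toy_hrir_pair(delay_samples, far_ear_gain, length=64):
    """Placeholder impulse responses (an assumption, not measured HRTFs).

    The near ear receives the sound directly; the far ear receives it
    later and quieter, mimicking the slightly different path to each ear.
    """
    near = np.zeros(length)
    far = np.zeros(length)
    near[0] = 1.0
    far[delay_samples] = far_ear_gain
    return near, far


# A 0.1 s, 440 Hz test tone at a telephony-style 8 kHz sample rate.
sample_rate = 8000
t = np.arange(sample_rate // 10) / sample_rate
mono = np.sin(2 * np.pi * 440 * t)

# Source placed to the listener's left: the left ear is the near ear.
hrir_left, hrir_right = toy_hrir_pair(delay_samples=5, far_ear_gain=0.6)
stereo = spatialise(mono, hrir_left, hrir_right)
```

With measured impulse responses in place of the placeholders, the same convolution step would also carry the spectral filtering of the head and outer ear, which a plain delay and gain cannot reproduce.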
In this thesis, spatial audio is implemented in a telephony application as well as in a virtual
world. Experiments were conducted which demonstrated that spatial audio increases the intelligibility
of speech in a multiple-source environment and aids active speaker identification.
Resource usage measurements show that these benefits are, however, not without a cost. In
conclusion, spatial audio was shown to be an improvement over the monaural audio model
traditionally implemented in telephony.