The auditory system of living creatures provides useful information about the world, such as
the location and interpretation of sound sources. For humans, this means being able to focus one's
attention on events such as a phone ringing, a vehicle honking or a person talking. For those
who do not suffer from hearing impairments, it is hard to imagine a day without being able to
hear, especially in a very dynamic and unpredictable world. Mobile robots would also benefit
greatly from having auditory capabilities.
In this thesis, we propose an artificial auditory system that gives a robot the ability to locate
and track sounds, to separate simultaneous sound sources and to recognise simultaneous
speech. We demonstrate that it is possible to implement these capabilities using an array
of microphones, without trying to imitate the human auditory system. The sound source localisation
and tracking algorithm uses a steered beamformer to locate sources, which are then
tracked using a multi-source particle filter. Separation of simultaneous sound sources is achieved
using a variant of the Geometric Source Separation (GSS) algorithm, combined with a multi-source
post-filter that further reduces noise, interference and reverberation. Speech recognition
is performed on separated sources, either directly or by using Missing Feature Theory (MFT) to
estimate the reliability of the speech features.
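
As an illustration of the localisation step only, the following minimal sketch scans a generic frequency-domain delay-and-sum steered beamformer over candidate directions and reports the direction of maximum output power. It is not the implementation used in this work: the planar far-field model, the circular eight-microphone geometry, the sampling rate, the speed of sound and all function names are assumptions made for the example.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def arrival_advance(mic_xy, theta):
    """Seconds by which a far-field plane wave arriving from angle `theta`
    reaches each microphone earlier than it reaches the array origin."""
    direction = np.array([np.cos(theta), np.sin(theta)])
    return mic_xy @ direction / SPEED_OF_SOUND


def steered_power(signals, mic_xy, fs, angles):
    """Output power of a delay-and-sum beamformer for each candidate angle.

    signals: (n_mics, n_samples) array, mic_xy: (n_mics, 2) positions in metres.
    """
    n_samples = signals.shape[1]
    spectra = np.fft.rfft(signals, axis=1)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    powers = np.empty(len(angles))
    for i, theta in enumerate(angles):
        advance = arrival_advance(mic_xy, theta)
        # Delay each channel by its advance so that a wave from `theta` adds coherently.
        align = np.exp(-2j * np.pi * freqs[None, :] * advance[:, None])
        beam = np.sum(spectra * align, axis=0)
        powers[i] = np.sum(np.abs(beam) ** 2)
    return powers


def simulate_plane_wave(source, mic_xy, fs, theta):
    """Hypothetical test helper: the source as seen by each microphone,
    using circular fractional shifts and no noise or reverberation."""
    spectrum = np.fft.rfft(source)
    freqs = np.fft.rfftfreq(len(source), d=1.0 / fs)
    advance = arrival_advance(mic_xy, theta)
    shifted = spectrum[None, :] * np.exp(2j * np.pi * freqs[None, :] * advance[:, None])
    return np.fft.irfft(shifted, n=len(source), axis=1)


if __name__ == "__main__":
    fs = 16000                                   # assumed sampling rate in Hz
    mic_angles = 2 * np.pi * np.arange(8) / 8    # assumed 8-microphone circular array
    mic_xy = 0.15 * np.column_stack([np.cos(mic_angles), np.sin(mic_angles)])
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1024)
    signals = simulate_plane_wave(source, mic_xy, fs, np.deg2rad(60))
    candidates = np.deg2rad(np.arange(360))
    powers = steered_power(signals, mic_xy, fs, candidates)
    print("estimated direction (degrees):", int(np.argmax(powers)))
```

In this noise-free simulation the power peaks at the simulated direction. With several sources, noise and reverberation, a single scan of this kind yields noisy candidate directions, which is why a tracking stage such as the particle filter mentioned above is applied to them over time.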
The results obtained show that it is possible to track up to four simultaneous sound sources,
even in noisy and reverberant environments. Real-time control of the robot following a sound
source is also demonstrated. The sound source separation approach we propose is able to
achieve a 13.7 dB improvement in signal-to-noise ratio compared to a single microphone when
three speakers are present. In these conditions, the system demonstrates more than 80% accuracy
on digit recognition, which is higher than most human listeners achieved in our small case study
when asked to recognise only one of these sources. All of these new capabilities will allow humans to
interact more naturally with a mobile robot in real-life settings.
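
To put the 13.7 dB figure in perspective, and assuming the usual definition of a decibel gain as ten times the base-10 logarithm of a power ratio, the improvement corresponds to roughly a 23-fold increase in the ratio of signal power to noise power. The symbols below are introduced for this illustration only.

```latex
\[
G_{\mathrm{dB}} = 10 \log_{10} \frac{\mathrm{SNR}_{\mathrm{out}}}{\mathrm{SNR}_{\mathrm{in}}}
\quad\Longrightarrow\quad
\frac{\mathrm{SNR}_{\mathrm{out}}}{\mathrm{SNR}_{\mathrm{in}}} = 10^{13.7/10} \approx 23.4
\]
```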