Reverberation: Why two ears help and why you cannot talk easily to your computer

Why is it so hard to understand someone who is talking on a hands-free
telephone? The most common reason is reverberation: the sound waves
bounce around the room before being picked up by the microphone. The
same phenomenon prevents automatic speech recognition systems from
working unless the user holds a microphone very close to their mouth.

Many textbooks on signal processing include, as exercises or examples,
the fact that the process of reverberation can be modelled as a
so-called ``linear time-invariant filter'', which is conceptually the
simplest sort of filtering a signal can undergo. It is also often
suggested that an ``inverse'' filter can be used to remove the effect
of reverberation (a process known as inverse filtering or
equalisation).
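As a minimal sketch of that textbook model (the signal and impulse response below are invented toy values, not measurements): modelling reverberation as a linear time-invariant filter means the microphone signal is the convolution of the dry source with a room impulse response.

```python
# Toy illustration: reverberation as convolution with a room impulse
# response (all values made up for the sketch).
import numpy as np

rng = np.random.default_rng(0)

dry = rng.standard_normal(100)   # toy "dry" source signal

rir = np.zeros(50)               # toy room impulse response
rir[0] = 1.0                     # direct path
rir[20] = 0.6                    # an early reflection, delayed and attenuated
rir[45] = 0.3                    # a later, weaker reflection

# What the microphone picks up: the convolution of the two.
wet = np.convolve(dry, rir)
print(wet.shape)                 # length 100 + 50 - 1 = 149
```

Before the first echo arrives (the first 20 samples here), the reverberant signal is identical to the dry one; after that, delayed copies pile up.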

This is where there is a large gap between what is conceptually
possible and what is practically possible. Inverse filtering of
reverberation is possible only if 1) one can determine the exact
filter that the room effectively applies to the signal, and 2) that
filter does not change.

Neither of these conditions is satisfied in practice. Even if one
could determine the filter (which would require exact knowledge of the
original voice signal), it turns out that the slightest change in the
position of the source changes the filter drastically. So much so that
in a typical room, if the room filter were inverted exactly and the
source then moved by just a few centimetres, the result would be worse
than if no inverse filtering were applied at all.
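This fragility can be seen even in a toy numerical sketch (all values invented): invert the room filter in the frequency domain, which recovers the dry signal perfectly, then shift a single echo by one sample, standing in for a small movement of the source, and apply the now-stale inverse filter.

```python
# Toy sketch of why exact inverse filtering is fragile.
import numpy as np

rng = np.random.default_rng(0)
dry = rng.standard_normal(100)

def make_rir(echo_delay):
    """Toy room impulse response: direct path plus two reflections."""
    rir = np.zeros(50)
    rir[0] = 1.0
    rir[echo_delay] = 0.6
    rir[45] = 0.3
    return rir

rir_true = make_rir(20)
wet = np.convolve(dry, rir_true)
n = len(wet)   # zero-padding to this length makes circular convolution linear

def inverse_filter(recording, rir):
    """Divide out the room filter in the frequency domain."""
    est = np.fft.ifft(np.fft.fft(recording) / np.fft.fft(rir, n)).real
    return est[:len(dry)]

# With the exact filter, deconvolution recovers the dry signal...
err_exact = np.linalg.norm(inverse_filter(wet, rir_true) - dry)

# ...but move one echo by a single sample (the source shifted slightly)
# and the previously exact inverse filter produces a large error.
rir_moved = make_rir(21)
err_moved = np.linalg.norm(inverse_filter(wet, rir_moved) - dry)
print(err_exact, err_moved)
```

The inversion only behaves because this toy filter was built to be safely invertible (its frequency response never comes close to zero); a real room offers no such guarantee, which makes the practical situation even worse than the sketch suggests.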

It is easy to see why this is so. To sound waves the walls of a room
behave like dirty mirrors: they reflect most of the sound waves hitting
them. Imagine yourself in a room whose walls are optical mirrors
reflecting, say, 70% of the light that impinges on them, lit by a
single light bulb. You will see many reflections: a finite but large
number, because although the light loses intensity at each reflection,
the number of reflections grows rapidly with the order of the reflection.
Exactly equalising the room essentially involves exactly cancelling
out all of the reflections by adding them together with the appropriate
phase (destructive interference of the waves). Now imagine moving the
light bulb slightly. The pattern of reflections changes in a very complex
manner. And even if one knew the geometry of the room exactly, the
slightest deviation from that geometry would mean that the pattern of
reflections would be very different.

So how should one pick up sound in a reverberant room? Use more than one
sensor, which is exactly what our heads do, and is why, if you block one
ear whilst listening to someone in a reverberant room, they become so hard
to understand. (Try it next time you are in a boring lecture!) How can it
be done technologically? By exploiting constructive interference rather
than destructive interference. If one uses several microphones and delays
their outputs such that the signal coming directly from the desired source
is phase (time) aligned exactly, then the direct signal is reinforced. The
reverberation is not removed entirely, but the resulting device is far
more robust to movements of the source.
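The delay-and-align idea can be sketched in a few lines (toy data, and the direct-path delays are assumed known rather than estimated): once the microphones are aligned on the direct path, the source adds coherently while the uncorrelated reverberant clutter, modelled here simply as noise, averages down.

```python
# Toy delay-and-sum sketch: align several microphones on the direct
# path and average. Delays are assumed known; clutter stands in for
# reverberation.
import numpy as np

rng = np.random.default_rng(1)
source = rng.standard_normal(200)   # toy source signal

delays = [0, 3, 7]                  # direct-path delay to each mic, in samples
mics = []
for d in delays:
    direct = np.concatenate([np.zeros(d), source, np.zeros(10 - d)])
    clutter = 0.5 * rng.standard_normal(len(direct))  # uncorrelated per mic
    mics.append(direct + clutter)

# Undo each direct-path delay, then average: the aligned-and-summed output.
aligned = [m[d:d + len(source)] for m, d in zip(mics, delays)]
beam = np.mean(aligned, axis=0)

# The averaged output is closer to the source than any single microphone.
print(np.linalg.norm(beam - source), np.linalg.norm(aligned[0] - source))
```

With three microphones the incoherent clutter power drops by roughly a factor of three, while the aligned direct signal is unchanged, which is the modest but robust improvement the text describes.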

This illustrates a rather general effect: if one tries to solve a
problem perfectly, one often gets a very non-robust solution, whereas
if one aims only at a modest improvement, the solution can be
intrinsically robust.