Quantum math models speech

By Eric Smalley, Technology Research News

It is easy to tell whether a voice heard over the phone is that of a person
or a computer. This is a good indication that scientists still don't fully
understand how the human voice works.

Researchers at King's College London and Phonologica Ltd. are using
mathematical tools from quantum physics to address the problem. They have
found that the vocal tract shapes sound waves in a way that is more complicated
than the conventional wisdom, which is based on science from more than a
century ago, suggests.

The researchers' concise model of the physics of speech could play
a significant role in improving telecommunications, speech recognition and
speech synthesis technologies.

The researchers modeled the way the frequencies that make up sound
waves spread out when the waves encounter the dents and bumps that appear
in the human vocal tract during speech. Although wave dispersion is widely
studied in optics, in part because wave dispersion degrades optical communications,
the standard model of vocal acoustics does not take wave dispersion into
account.

The researchers' model can be understood in terms of sound waves
in a straight pipe, like an organ pipe, said Barbara Forbes, founder of
Phonologica Ltd. and a visiting research associate in physics at King's
College London. "We find that the natural resonance frequencies of a straight
pipe can be shifted up or down in a precise and controllable way by the
introduction of [dents and bumps] at particular places," she said.
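
For readers who want a feel for the starting point, the resonances of an
undisturbed straight pipe follow a textbook formula. The sketch below is not
the researchers' model. It assumes a uniform pipe closed at one end and open
at the other, a length of 17.5 centimeters and a sound speed of 350 meters
per second, none of which come from the research, and the function name
pipe_resonances is made up for illustration.

```python
def pipe_resonances(length_m, speed_of_sound=350.0, count=3):
    """Resonance frequencies (Hz) of a uniform pipe closed at one end, open at the other.

    Textbook quarter-wavelength formula: f_n = (2n - 1) * c / (4 * L).
    The default sound speed is an illustrative assumption, not a value
    taken from the research described in the article.
    """
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


# Assumed pipe length of 17.5 cm, a common rough figure for an adult vocal tract.
for n, freq in enumerate(pipe_resonances(0.175), start=1):
    print(f"resonance {n}: about {freq:.0f} Hz")
```

With those assumed numbers the first three resonances fall near 500, 1,500
and 2,500 hertz. These are the evenly spaced values that, according to
Forbes, dents and bumps in the pipe wall shift up or down.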

The traditional view of sound wave physics holds that the maximum
shift in resonance occurs at the point where a sound wave's pressure node, or
point of minimum pressure, meets a change in the shape of the pipe wall. The researchers'
results, however, showed that the wave does not spread out at that point.
Instead, they found that complex effects near the pressure node are responsible
for the shifts, said Forbes.

The researchers are able to shift multiple resonance frequencies
independently, which is a key aspect of how humans produce speech. They
found that specific degrees of change in curvature at only six places in
the vocal tract are sufficient to reproduce 30 vowel sounds, said Forbes.
"[This is] enough to describe the basic systems of all the world's languages,"
she said.

The researchers' model is a step toward providing researchers with
a simple method of analyzing and reproducing speech. Keeping things simple
is key to advancing speech-related technologies. "The search for a minimal
number of parameters to describe speech acoustics and the speech signal
has been going on since the 1950s," said Forbes.

Having a small number of parameters accurately represent speech
makes it possible to compress the relatively large amount of acoustical
information that makes up speech into a much smaller amount of digital information,
making it easier to transmit and store. "Mapping the full-bandwidth speech
signal onto a sparse representation or code is necessary for ultra-low-bit-rate
technologies such as mobile telephony," she said.
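
A rough back-of-the-envelope calculation shows why a handful of parameters
matters for bit rate. The sketch below compares standard digital telephone
audio, which uses 8,000 samples per second at 8 bits each, with a
hypothetical code that sends six parameters per frame. The frame rate and
bits per parameter are illustrative assumptions rather than figures from the
researchers, and the estimate ignores anything else a real codec would have
to send, such as pitch and loudness.

```python
# Standard telephone-quality PCM: 8,000 samples per second, 8 bits per sample.
pcm_rate = 8000 * 8  # bits per second

# Hypothetical parametric code (all three values are assumptions for illustration).
frames_per_second = 50     # assumed: one parameter update every 20 milliseconds
params_per_frame = 6       # assumed: one value per vocal tract location
bits_per_param = 8         # assumed quantization

parametric_rate = frames_per_second * params_per_frame * bits_per_param
ratio = pcm_rate / parametric_rate

print(f"PCM telephone audio: {pcm_rate} bits per second")
print(f"Hypothetical parametric code: {parametric_rate} bits per second")
print(f"Reduction: roughly {ratio:.0f} to 1")
```

Under these assumptions the parametric code needs 2,400 bits per second
against 64,000 for the sampled waveform.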

The model could be used in new approaches to speech recognition.
"Current systems work by statistical modeling alone, and make no use of
knowledge about either vocal tract physics or linguistics," said Forbes.
Current systems use statistical probabilities to match sound wave patterns
to phonemes. "This is why they have such problems in adapting to natural
human... speech in normal levels of background noise," she said. "Our system
is based on [a] parameterization of vocal tract physics, and we believe
this will eventually lead to a more natural speech interface."

Speech synthesis has improved in recent years but still has a long
way to go to produce natural-sounding voices. The researchers have used
their model to generate vowel sounds, but the results are preliminary. "Really
natural speech synthesis will require incorporation of finer physiological
detail than we are currently considering," said Forbes. "For example, our
current simulations assume a rather simple model of excitation at the larynx,"
she said.

The researchers are extending their wave-mechanical model to consonant
sounds, and are using quantum mathematics to determine the parameters of
speech acoustics, said Forbes.

The researchers are aiming to have a prototype recognition system
ready for demonstration within two years, said Forbes. "The modeling of
connected speech processes will take a bit longer, say around 5 years,"
she said.

Forbes' research colleague was E. Roy Pike of King's College London.
They published the research in the July 30, 2004 issue of Physical Review
Letters. The research was funded by the UK Engineering and Physical
Sciences Research Council and IP2IPO PLC.