Sine Wave Synthesis

Question: Which acoustic elements are essential for the perception of speech?

Answer: None.

How can we be so sure? Studies using sinewave replicas of natural utterances support this
conclusion.

What is the evidence? This page summarizes the findings of research by Robert Remez,
Philip Rubin, and their colleagues, and provides examples of sinewave
synthesis for you to hear, along with information about the technique.

Introduction

Most familiar synthetic speech aims to copy natural acoustic elements
meticulously. That is why synthetic speech sounds voicelike, despite the
mechanical quality of its articulation. In contrast, sinewave replication
discards all of the acoustic attributes of natural speech except one: the
changing pattern of vocal resonances. Fitting 3 or 4 sinusoids to the
pattern of resonance changes yields signals that preserve the dynamic
properties of utterances without replicating the short-term acoustic products
of vocalization.
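To make the idea concrete, here is a minimal sketch of the synthesis half of the technique: each tone is a sinusoid whose frequency and amplitude follow one resonance track, and the replica is the sum of the tones. The analysis half (estimating the tracks from a natural recording) is not shown. The function name `synthesize_sinewave_speech`, the 100 frames-per-second track rate, the 8000 Hz sample rate, and all track values are hypothetical choices for illustration, not the settings used in the original studies.

```python
import math


def synthesize_sinewave_speech(tracks, frame_rate=100.0, sample_rate=8000):
    """Sum phase-continuous sinusoids that follow per-frame resonance tracks.

    tracks: list of tones; each tone is a list of (freq_hz, amplitude) pairs,
    one pair per analysis frame. Frame rate and sample rate are assumptions.
    """
    n_frames = len(tracks[0])
    if n_frames < 2 or any(len(t) != n_frames for t in tracks):
        raise ValueError("all tracks need the same number of frames (>= 2)")
    samples_per_frame = sample_rate / frame_rate
    n_samples = int(round((n_frames - 1) * samples_per_frame)) + 1
    out = [0.0] * n_samples
    for tone in tracks:
        phase = 0.0
        for n in range(n_samples):
            # Locate this sample between two frames and interpolate linearly.
            pos = n / samples_per_frame
            i = min(int(pos), n_frames - 2)
            frac = min(pos - i, 1.0)
            f0, a0 = tone[i]
            f1, a1 = tone[i + 1]
            freq = f0 + frac * (f1 - f0)
            amp = a0 + frac * (a1 - a0)
            out[n] += amp * math.sin(phase)
            # Accumulate phase so frequency glides do not produce clicks.
            phase += 2.0 * math.pi * freq / sample_rate
    return out


# Hypothetical three-tone replica spanning three frames (20 ms at 100 frames/s).
tracks = [
    [(300, 0.5), (500, 0.5), (700, 0.4)],      # lowest tone
    [(2200, 0.3), (1800, 0.3), (1200, 0.3)],   # second tone
    [(3000, 0.2), (2700, 0.2), (2600, 0.15)],  # third tone
]
signal = synthesize_sinewave_speech(tracks)
print(len(signal))  # 161 samples: (3 - 1) frames * 80 samples/frame + 1
```

Accumulating phase sample by sample, rather than computing `sin(2 * pi * f(t) * t)` directly, keeps each tone continuous as its frequency changes; the direct form would jump in phase whenever the track moves. Note that the output contains nothing but these few tones: no bursts, no frication noise, and no voicing buzz.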

If speech perception depended upon the particular sounds produced by
talkers (the pop of the "p", the hiss of the "s", the hum of the "m", the click
of the "k", or the buzz of the "z"), then sinusoidal signals lacking these
attributes should not evoke impressions of consonants, vowels, words, etc. In
fact, listeners who were asked to identify sinewave signals reported "bad
electronic music," "radio interference," etc., and no speechlike qualities.
However, when asked to transcribe a "strangely-synthesized sentence," listeners
readily reported the words of the natural utterances on which the sinewave
signals were modeled. Below is an interactive introduction to the
phenomenon of sinewave speech. Please judge for yourself.