ISDL Resources - Speech Modulation Analysis Demo*

*Every sound file on this page was generated using the Matlab script
"application1_speechAnalysis," which is included with the
Modulation Toolbox.

We acknowledge the support of the U.S. Air Force Office of Scientific Research and the U.S. Office of
Naval Research in the development of this toolbox.

Introduction

This page presents an informal comparison of how coherent and incoherent demodulation methods represent
the linguistic content in recorded human speech. Specifically, how much of speech intelligibility is
in the envelopes (or "modulators"), and how much is in the fine structure (or "carriers")? As shown next, the
answer depends on which modulation decomposition we use. The harmonic coherent model cleanly
defines envelopes and carriers such that only envelopes contain speech information, leaving the carriers to
represent the speaker's pitch. By contrast, the distinction is less clear in the incoherent Hilbert
envelope decomposition, whose carriers are extremely noisy yet remain intelligible when heard on
their own. The following series of demonstrations supports these claims.

We begin by using harmonic coherent analysis to decompose a recorded speech signal into harmonically related
carriers and complex-valued envelopes. Rather than play the original speech signal, we will instead reconstruct
the signal from its harmonic modulation decomposition, one component at a time. Below is the spectrogram for
one carrier, which is a frequency-modulated tone following the fundamental of the speaker's pitch.

So far we have synthesized speech from a time-varying filterbank
in which individual subbands track the harmonics of voiced speech. Each harmonic "subband signal" consists
of a high-frequency carrier (the "fine structure") that multiplies a low-frequency modulator (the "envelope").
How much does speech intelligibility depend upon the carriers? Replacing each modulator with a
flat DC term, we can listen to the "carriers-only" signal below:

The carriers-only signal is not intelligible, which suggests that the carriers hold little or no linguistic
information (aside from secondary pitch cues). The harmonic decomposition has thus reduced the speech signal
to a high-frequency, harmonic buzz plus a collection of low-bandwidth, information-bearing modulators.
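
The product model and the carriers-only substitution described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the toolbox's Matlab implementation: the test signal is synthetic, the fundamental is held constant at 100 Hz instead of being tracked, and the lowpass filter is a plain moving average. All names (`coherent_demodulate`, `win_len`, and so on) are hypothetical.

```python
import numpy as np

fs = 16000                          # assumed sampling rate (Hz)
f0 = 100.0                          # assumed constant fundamental (Hz)
t = np.arange(int(0.5 * fs)) / fs
phase = 2 * np.pi * f0 * t          # with a real pitch track, integrate f0(t) instead

# Synthetic "voiced" signal: three harmonics, each with a slow 2 Hz envelope.
amps = np.array([1.0, 0.6, 0.3])
harmonics = np.arange(1, amps.size + 1)
true_mods = amps[:, None] * (1.0 + 0.5 * np.cos(2 * np.pi * 2.0 * t))
carriers = np.exp(1j * harmonics[:, None] * phase)   # unit-magnitude carriers
x = np.real(np.sum(true_mods * carriers, axis=0))

def coherent_demodulate(signal, carrier, win_len):
    """Estimate a complex modulator by demultiplying with a known carrier.

    Multiplying by the conjugate carrier shifts that harmonic to 0 Hz;
    the moving average then serves as a crude lowpass filter.
    """
    baseband = 2.0 * signal * np.conj(carrier)
    kernel = np.ones(win_len) / win_len
    return np.convolve(baseband, kernel, mode="same")

# A window of one pitch period places filter nulls on the other harmonics.
win_len = int(round(fs / f0))
est_mods = np.array([coherent_demodulate(x, c, win_len) for c in carriers])

# Resynthesis from the decomposition, and the "carriers-only" variant in
# which every modulator is replaced by a flat DC term.
resynth = np.real(np.sum(est_mods * carriers, axis=0))
carriers_only = np.real(np.sum(np.ones_like(est_mods) * carriers, axis=0))
```

Away from the edges of the signal, `est_mods` closely follows `true_mods`, while `carriers_only` is a sum of constant-amplitude harmonics: the buzz without the envelopes.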

Perhaps a better test for intelligibility of the envelopes is to replace the carriers with fixed-frequency
sinusoids that themselves contain no speech information. Despite occasional tonal inflections in the
modulators (a result of imperfect fundamental-tracking), the fixed-carrier synthesis is completely intelligible.
This implies that the envelopes, not the carriers, contain the information in speech.
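
The fixed-carrier substitution can be illustrated as follows. This is a hedged sketch, not the toolbox's code: the modulators are synthetic stand-ins for the output of a harmonic decomposition, and placing the neutral carriers at multiples of 130 Hz is an arbitrary assumption (the page only says they are fixed-frequency sinusoids).

```python
import numpy as np

fs = 16000                          # assumed sampling rate (Hz)
t = np.arange(int(0.5 * fs)) / fs
harmonics = np.arange(1, 5)[:, None]        # column vector: one row per harmonic

# Hypothetical complex modulators (slow envelopes with a fixed phase offset).
modulators = (1.0 + 0.8 * np.cos(2 * np.pi * (1.0 + harmonics) * t)) \
    * np.exp(0.3j * harmonics)

# Original-style carriers: harmonics of a fundamental gliding from 120 to 150 Hz.
f0_track = np.linspace(120.0, 150.0, t.size)
phase = 2 * np.pi * np.cumsum(f0_track) / fs
original_carriers = np.exp(1j * harmonics * phase)

# Neutral carriers: fixed-frequency tones that carry no pitch movement.
fixed_carriers = np.exp(2j * np.pi * 130.0 * harmonics * t)

original_synth = np.real(np.sum(modulators * original_carriers, axis=0))
fixed_synth = np.real(np.sum(modulators * fixed_carriers, axis=0))
```

Because every carrier has unit magnitude, swapping carriers leaves each component's envelope magnitude untouched; only the fine structure changes.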

As a final test, we compute broadband spectrograms for the 16-harmonic original-carrier synthesis and
the 16-harmonic fixed-carrier synthesis. In the side-by-side plots below, the spectrograms depict
very similar resonant structure in the time-frequency plane (only the first 2 seconds are shown). This
is a visual confirmation that replacing the original carriers with neutral tones has not changed the
speech content of the signal.

The conventional demodulation method for speech is to use a fixed filterbank to obtain subband signals, and
then rectify and smooth each subband to find its envelope. Such methods are "incoherent" because they do
not use an explicit carrier estimate to demultiply the subband signal; instead, the envelope is defined
as the subband magnitude and the carrier takes on whatever fine structure remains. The often-used Hilbert
envelope is one such method of incoherent envelope detection. We now analyze the same speech signal as above
using incoherent Hilbert demodulation.
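
For a single subband, Hilbert demodulation as described above amounts to taking the magnitude of the analytic signal as the envelope and keeping whatever remains as the carrier. The sketch below uses NumPy only and builds the analytic signal with an FFT; the function names and the small `floor` guard against division by zero are assumptions made for illustration, and a real-valued carrier (the cosine of the instantaneous phase) is one possible convention.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal via the FFT: drop negative frequencies, double positive ones."""
    n = x.size
    weights = np.zeros(n)
    weights[0] = 1.0                        # DC is kept once
    if n % 2 == 0:
        weights[n // 2] = 1.0               # the Nyquist bin is kept once
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def hilbert_demodulate(subband, floor=1e-12):
    """Return (envelope, carrier) such that envelope * carrier equals the subband."""
    analytic = analytic_signal(subband)
    envelope = np.abs(analytic)             # the Hilbert envelope
    carrier = np.real(analytic) / np.maximum(envelope, floor)
    return envelope, carrier

# Example: a 1 kHz tone with a 5 Hz amplitude modulation.
fs = 16000
t = np.arange(fs) / fs
true_env = 1.0 + 0.5 * np.cos(2 * np.pi * 5.0 * t)
subband = true_env * np.cos(2 * np.pi * 1000.0 * t)
envelope, carrier = hilbert_demodulate(subband)
```

Note that no carrier estimate enters the computation: the envelope is defined first, and the carrier is simply the remainder.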

Using 16 fixed subbands spaced evenly between 0 Hz and the Nyquist frequency of 8000 Hz, we sum together the
Hilbert carriers to produce the incoherent carriers-only signal below. From the spectrogram and the audio,
it is clear that the Hilbert carriers contain a large amount of broadband noise. More surprising is the fact
that the speech itself is still audible in this carriers-only synthesis!
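
A rough sketch of this carriers-only synthesis follows. The band layout (16 bands of 500 Hz each up to 8000 Hz) comes from the text; everything else is an assumption for illustration, including the brick-wall FFT band masks, the function names, and the modulated-noise test signal that stands in for speech. The filters actually used to produce the audio on this page are not reproduced here.

```python
import numpy as np

def analytic_band(x, fs, lo, hi):
    """Analytic signal of x restricted to lo <= f < hi (brick-wall FFT mask)."""
    freqs = np.fft.fftfreq(x.size, d=1.0 / fs)
    mask = (freqs >= lo) & (freqs < hi)     # selects positive frequencies only
    weights = 2.0 * mask
    weights[0] = 1.0 * mask[0]              # DC, if included, is not doubled
    return np.fft.ifft(np.fft.fft(x) * weights)

def hilbert_carriers_only(x, fs, num_bands=16, floor=1e-12):
    """Sum the Hilbert carriers of evenly spaced subbands, discarding envelopes."""
    edges = np.linspace(0.0, fs / 2.0, num_bands + 1)
    total = np.zeros(x.size)
    for lo, hi in zip(edges[:-1], edges[1:]):
        analytic = analytic_band(x, fs, lo, hi)
        envelope = np.abs(analytic)
        # Carrier = subband with its Hilbert envelope divided out.
        total += np.real(analytic) / np.maximum(envelope, floor)
    return total

# Stand-in for speech: noise with a slow 3 Hz amplitude modulation.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
x = (1.0 + 0.9 * np.sin(2 * np.pi * 3.0 * t)) * rng.standard_normal(fs)
carriers_only = hilbert_carriers_only(x, fs)
```

Each carrier is bounded by 1 in magnitude, so the sum cannot exceed the number of bands; where a subband envelope dips toward zero the division magnifies whatever is left, which is one plausible source of noisy fine structure.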

Finally, we examine the broadband spectrogram of the incoherent carriers-only synthesis,
in which spectral resonances are visible. This visually confirms that the incoherent carriers
retain speech information even without modulation. Thus, unlike the coherent decomposition,
the incoherent decomposition does not cleanly separate envelopes from carriers with respect to
intelligibility.

The Modulation Toolbox for Matlab
is publicly available for non-profit research purposes. All of the sound files on this page were
generated using the toolbox, which includes the
modulation spectrogram GUI, as well as standalone functions that can be used for a variety of
experimental topologies.