Pitch in Praat

Jan 15, 2012

This page uses SVG files for the figures, so that they don’t lose detail when
resized. If your browser does not support SVG, a PNG fallback should be
available. If you have any trouble viewing the figures, I would appreciate it if
you could drop me a line.

Periodicity analysis works best on continuous, stable signals. The sounds of
speech, however, are anything but stable, which makes analysis more difficult.
Praat gets around this by assuming that speech is sufficiently stable within
small enough fragments, which are called ‘analysis windows’.

This is the sound we’ll be working on: a complex sound wave with a fundamental
frequency of 140 Hz and a harmonic at 280 Hz.
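A comparable test sound can be generated as a sum of two sinusoids. This is a minimal Python sketch, not the R code used for the figures: the sampling rate of 14000 Hz (chosen only so that one 140 Hz period is exactly 100 samples) and the amplitudes, with the harmonic twice as loud as the fundamental, are hypothetical values.

```python
import math

FS = 14000  # hypothetical sampling rate in Hz: one 140 Hz period = 100 samples


def complex_tone(duration_s, fs=FS, f0=140.0, f1=280.0, a0=1.0, a1=2.0):
    """Sum of a fundamental (f0) and one harmonic (f1) with amplitudes a0 and a1."""
    n = int(round(duration_s * fs))
    return [
        a0 * math.sin(2 * math.pi * f0 * i / fs)
        + a1 * math.sin(2 * math.pi * f1 * i / fs)
        for i in range(n)
    ]


x = complex_tone(0.05)  # 50 ms of signal, i.e. 700 samples at 14000 Hz
```

Because 280 Hz is an exact multiple of 140 Hz, the whole signal repeats every 100 samples, which is the period the pitch analysis should recover.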

The sound in the analysis window

Each window is tapered with a window function, so that the signal fades out
towards its edges instead of being cut off abruptly, which facilitates analysis.
The window function used in this demonstration is a Hanning window (Boersma,
1993). According to the Praat documentation, a Hanning window is more responsive
when the analysis window spans 3 periods, while a Gaussian window is better
suited to a longer analysis window (the Gaussian window is twice as long as the
Hanning window).
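A minimal Python sketch of a Hanning window, assuming the common symmetric definition w[i] = 0.5 - 0.5 cos(2πi / (N - 1)); Praat's own discretization may differ slightly. The window length follows the 3-periods rule above, measured against the lowest pitch we expect to find; the 75 Hz pitch floor, the 14000 Hz sampling rate and both function names are assumptions for illustration.

```python
import math


def hanning(n):
    """Symmetric Hanning window of n samples: 0 at both edges, peaking in the middle."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def window_length(fs, pitch_floor, periods=3):
    """Samples needed to cover `periods` periods of the lowest expected pitch."""
    return int(round(periods * fs / pitch_floor))


w = hanning(window_length(14000, 75.0))  # hypothetical values: 3 / 75 Hz = 40 ms
```

A longer window (such as the Gaussian option's doubled length) would simply use `periods=6` here, at the cost of time resolution.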

Praat does in fact use both of these windows by default, depending on the task
and the degree of precision required.

The Hanning filter function

We apply the filter by multiplying the two curves sample by sample.
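In code, this step is a plain element-wise product. A short Python sketch, where `apply_window` is a hypothetical helper name:

```python
def apply_window(signal, window):
    """Multiply two equal-length curves sample by sample."""
    if len(signal) != len(window):
        raise ValueError("signal and window must have the same length")
    return [s * w for s, w in zip(signal, window)]
```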

The filtered window

To detect a sound’s pitch, Praat uses autocorrelation: each window is compared
with time-shifted copies of itself.

An autocorrelation plot shows, on the Y-axis, how strongly the window correlates
with a shifted copy of itself, and, on the X-axis, the time lag of each shift. If
the curve is periodic, there should be a peak in the autocorrelation curve at a
lag equal to the original curve’s period.
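The curve in such a plot can be computed directly from its definition. This Python sketch uses the quadratic-time sum for clarity (faster implementations typically use an FFT instead) and divides by the value at lag 0, so that the normalized curve starts at 1. The function name and the `max_lag` parameter are illustrative assumptions.

```python
def autocorrelation(x, max_lag=None):
    """Normalized autocorrelation: r[k] = sum(x[i] * x[i + k]) / sum(x[i] ** 2)."""
    n = len(x)
    if max_lag is None:
        max_lag = n - 1
    energy = sum(v * v for v in x)  # value at lag 0, used for normalization
    if energy == 0:
        raise ValueError("autocorrelation is undefined for an all-zero signal")
    return [
        sum(x[i] * x[i + k] for i in range(n - k)) / energy
        for k in range(max_lag + 1)
    ]
```

Lag k is measured in samples; dividing it by the sampling rate gives the time lag shown on the X-axis.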

The autocorrelation is always highest at a lag of 0, so to detect significant
periodicity we need to look for peaks at lags greater than 0. In this case,
however, since we are working with a complex sound wave with a loud harmonic,
the autocorrelation curve shows a false peak (red line) before the lag that we
know corresponds to the sound’s actual fundamental period (blue line), which is
aligned with a lower peak. This happens because windowing makes the
autocorrelation decay as the lag grows, so the peak at the longer, true period
ends up lower than the one produced by the harmonic.
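Looking for peaks away from lag 0 requires listing the candidate peaks of the curve. A small hypothetical helper that returns the local maxima of a sampled curve is sketched below; taking the tallest of these on the raw autocorrelation of the filtered sound is exactly the step that can land on a false peak like the one described above.

```python
def local_maxima(r, min_lag=1):
    """Lags k >= min_lag where r[k] rises above its left neighbour and is not below its right one."""
    return [
        k
        for k in range(max(min_lag, 1), len(r) - 1)
        if r[k - 1] < r[k] >= r[k + 1]
    ]
```

The endpoints are never reported, so lag 0 is excluded automatically, and `min_lag` can push the search further away from it.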

Normalized autocorrelation of the filtered sound

To correct for this, we divide the normalized autocorrelation of the filtered
signal by the normalized autocorrelation of the window function.
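A Python sketch of this division, r_x(k) ≈ r_a(k) / r_w(k), where r_a is the autocorrelation of the filtered sound and r_w that of the window. Following Boersma (1993), only lags up to half the window length are kept. The function names and the guard against a zero denominator are assumptions of this sketch.

```python
def autocorrelation(x, max_lag):
    """Normalized autocorrelation r[k] = sum(x[i] * x[i + k]) / sum(x[i] ** 2), k = 0..max_lag."""
    energy = sum(v * v for v in x)
    if energy == 0:
        return [0.0] * (max_lag + 1)
    return [
        sum(x[i] * x[i + k] for i in range(len(x) - k)) / energy
        for k in range(max_lag + 1)
    ]


def estimate_original_autocorrelation(windowed, window):
    """Estimate the original signal's autocorrelation as r_a(k) / r_w(k)."""
    if len(windowed) != len(window):
        raise ValueError("windowed signal and window must have the same length")
    max_lag = len(window) // 2  # beyond roughly half the window the estimate degrades
    r_a = autocorrelation(windowed, max_lag)  # autocorrelation of the filtered sound
    r_w = autocorrelation(window, max_lag)  # autocorrelation of the window function
    # Guard against a zero window autocorrelation (only possible for very short windows).
    return [a / w if w > 0 else 0.0 for a, w in zip(r_a, r_w)]
```

As a sanity check, a constant signal multiplied by the window is just the window itself, so its estimated autocorrelation is 1 at every lag: the decay introduced by the window has been removed completely.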

Normalized autocorrelation of the window function

The result is an estimate of the autocorrelation of the original signal, which
is both more robust and better suited to the analysis of speech signals than
earlier autocorrelation methods. Note that this estimate becomes increasingly
unreliable for lags beyond roughly half the length of the analysis window
(Boersma, 1993).

Estimated autocorrelation of the original signal

By finding the maximum of this estimated curve at a lag greater than 0, we can
calculate the pitch of the original signal, converting the lag from samples to
Hz:

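f_0 = 1 / (lag / f_s) = f_s / lag

where lag is measured in samples and f_s is the sampling frequency in Hz, so
lag / f_s is the period in seconds.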
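Putting the steps together, here is a minimal end-to-end sketch in Python for a single analysis window. It is not Praat's implementation and omits its other corrections. The textbook Hanning formula, the 75 Hz pitch floor and 600 Hz pitch ceiling that bound the lag search, the 14000 Hz sampling rate, the amplitudes of the test sound and all function names are assumptions.

```python
import math


def hanning(n):
    """Symmetric Hanning window of n samples (assumed textbook form)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Normalized autocorrelation for lags 0..max_lag (direct sum)."""
    energy = sum(v * v for v in x)
    if energy == 0:
        return [0.0] * (max_lag + 1)
    return [
        sum(x[i] * x[i + k] for i in range(len(x) - k)) / energy
        for k in range(max_lag + 1)
    ]


def estimate_pitch(frame, fs, pitch_floor=75.0, pitch_ceiling=600.0):
    """Return the f0 (Hz) of one analysis window, or None if no peak is found."""
    n = len(frame)
    if n < 3 or not any(frame):
        return None
    window = hanning(n)
    windowed = [s * w for s, w in zip(frame, window)]  # the filtered window
    max_lag = n // 2  # the estimate is unreliable beyond half the window
    r_a = autocorrelation(windowed, max_lag)
    r_w = autocorrelation(window, max_lag)
    # Estimated autocorrelation of the original signal: r_a / r_w.
    r_x = [a / w if w > 0 else 0.0 for a, w in zip(r_a, r_w)]
    # Search only lags between 1/ceiling and 1/floor seconds, expressed in samples.
    lo = max(1, math.ceil(fs / pitch_ceiling))
    hi = min(max_lag - 1, int(fs / pitch_floor))
    peaks = [k for k in range(lo, hi + 1) if r_x[k - 1] < r_x[k] >= r_x[k + 1]]
    if not peaks:
        return None
    best_lag = max(peaks, key=lambda k: r_x[k])
    return fs / best_lag  # f_0 = 1 / (lag / f_s) = f_s / lag


fs = 14000  # hypothetical sampling rate
frame = [  # 560 samples = 3 periods of the 75 Hz pitch floor
    math.sin(2 * math.pi * 140 * i / fs) + 2 * math.sin(2 * math.pi * 280 * i / fs)
    for i in range(560)
]
print(estimate_pitch(frame, fs))
```

With this sampling rate, a peak at a lag of 100 samples corresponds to f_0 = 14000 / 100 = 140 Hz, so the printed value should lie close to 140. Searching `r_a` instead of `r_x` gives the uncorrected version, which is the one exposed to the false peak.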

Update (2015/10/03): This page, which was originally written as notes to
myself, is by far the most popular page on this site, so I feel obligated to
clarify that this article covers only one of the improvements that can be made
to the autocorrelation function for pitch detection. More robust algorithms,
such as the one actually implemented in Praat, apply further corrections, for
example path finding to ignore sudden changes in f_0 that are impossible for a
human voice.

All figures on this page were generated in R with the commands that can be
found in this file.
Feel free to look at the definitions of each curve and to modify them for your
own ends.