How does auto-tune pitch correction work?

With the return of The X Factor to TV screens, we look at the physics behind pitch correction.

Credit: Antares Tech

Auto-Tune software in action.

Back on the TV schedule for the autumn, the singing competition programme The X Factor was criticised last year after it emerged that the contestants’ voices were being altered via Auto-Tune.

Producer Simon Cowell subsequently banned any further use of the technology on the show.

But how does audio signal processing help to make singers’ voices sound better?

What would you do if I sang out of tune?

Few singers are perfect. Sometimes, the pitch of their vocal slightly misses the exact note they’re trying to hit.

If they are a little out of tune, the vocal track can still be rescued – or ruined, depending on your point of view – with a little help from the science of signal processing.

The pitch of a note depends on the frequency of the sound wave produced – the A above middle C is usually defined as 440 Hz. Manipulating the frequency can therefore produce a different note, or turn a sound that is slightly off-key into an exact note.

In Western music, each octave is divided into 12 pitches, each separated by a semitone – the difference in pitch between two adjacent keys on a piano or adjacent frets on a guitar neck. The goal of pitch correction is to retune a slightly high or low note to the nearest semitone.

In the system usually used by MIDI instruments, each pitch is assigned a number: the 440-Hz A is 69, and each semitone up or down changes the pitch number by 1. The pitch number p is related to the frequency f by a simple formula: $p = 69 + 12 \log_2(f / 440\,\mathrm{Hz})$.
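As a minimal sketch of that relationship, assuming the standard MIDI tuning convention p = 69 + 12 log2(f / 440 Hz), the conversion in both directions might look like this in Python. The function and constant names are illustrative and not part of any MIDI library.

```python
import math

A4_FREQ_HZ = 440.0  # the A above middle C
A4_MIDI = 69        # its MIDI pitch number


def freq_to_midi(freq_hz):
    """Return the (possibly fractional) MIDI pitch number for a frequency."""
    return A4_MIDI + 12 * math.log2(freq_hz / A4_FREQ_HZ)


def midi_to_freq(pitch):
    """Return the frequency in Hz for a MIDI pitch number."""
    return A4_FREQ_HZ * 2 ** ((pitch - A4_MIDI) / 12)
```

Going up 12 semitones (one octave) doubles the frequency, so pitch number 81 corresponds to 880 Hz.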

If an attempt at singing that A note actually came out at, say, 445 Hz instead, then using a computer to correct the frequency back down would ensure that the recording sounds in tune.
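The retuning target can be worked out in a few lines. The sketch below assumes the standard MIDI pitch numbering (A at 440 Hz is 69, with 12 semitones per octave) and a hypothetical helper name, nearest_semitone; it only computes the target frequency and does not touch any audio.

```python
import math


def nearest_semitone(freq_hz, a4_hz=440.0):
    """Find the closest semitone to freq_hz.

    Returns (MIDI pitch number, its frequency in Hz, offset in cents).
    A positive offset means the sung note is sharp.
    """
    pitch = 69 + 12 * math.log2(freq_hz / a4_hz)  # fractional pitch number
    target_pitch = round(pitch)                   # snap to nearest semitone
    target_freq = a4_hz * 2 ** ((target_pitch - 69) / 12)
    cents_off = 100 * (pitch - target_pitch)      # 100 cents per semitone
    return target_pitch, target_freq, cents_off


pitch, freq, cents = nearest_semitone(445.0)
```

For the 445-Hz example, the nearest semitone is pitch 69 (440 Hz), and the sung note is a little under 20 cents, or a fifth of a semitone, sharp.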

Sound engineers can’t simply change the frequency by itself, however.

The simplest way to shift the frequency is to play the recording back faster or slower, but then the duration of the sound changes too – this is why sped-up tapes sound chipmunk-like.

The frequency can be altered without changing the speed by going digital.

Music by numbers

Although it is possible to alter analogue signals – those based directly on the electrical signal generated by a microphone, or by a guitar pickup – a wider range of effects is possible when working with digital signals.

A digital signal uses discrete values rather than continuous ones, so converting an analogue signal requires measuring it at a series of discrete points, or samples. (Higher sampling rates more closely approximate the original sound.)

The green line is the continuous analogue signal. The blue dots are the points at which it is sampled.
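Sampling can be illustrated with a pure tone. This is a rough sketch in which a sine function stands in for the analogue signal; the function name sample_sine and the two sampling rates are chosen purely for illustration.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sample a sine wave of freq_hz at evenly spaced points in time."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


# The same 10 ms of a 440 Hz tone at two different sampling rates.
coarse = sample_sine(440.0, 8000, 0.01)   # 80 samples
fine = sample_sine(440.0, 44100, 0.01)    # 441 samples
```

The higher rate records more than five times as many points over the same stretch of sound, so the list of numbers follows the original curve more closely.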

These digital signals can be altered so that a sound produces the correct musical note by using a phase vocoder.

This works by first changing the duration of the sound without altering its frequency, and then changing the frequency to both hit the correct pitch and restore the original duration.

The name comes from its use of the signal’s phase information to manipulate the signal in the desired way.

It breaks an audio signal down into many small, overlapping frames and then changes the spacing of those frames to change the total duration of the sound. In practice, this is a complicated task that requires the use of the advanced maths of Fourier transforms to convert the signal into a form that can be manipulated in this way.
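The frame re-spacing idea can be sketched without the Fourier step. The code below is a deliberately simplified overlap-add stretch, not a full phase vocoder: it cuts the signal into overlapping windowed frames and lays them back down at a different spacing, but it skips the phase adjustment that a real phase vocoder performs, so on real audio it would produce audible artefacts. The function names, frame length and hop size are all assumptions made for the example.

```python
import math


def hann(n):
    """Hann window of length n, used to fade each frame in and out."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stretch_overlap_add(samples, factor, frame_len=256, hop_in=64):
    """Change the duration of samples by roughly `factor`.

    Frames are read every hop_in samples and written every
    hop_in * factor samples. No phase correction is applied.
    """
    hop_out = max(1, int(round(hop_in * factor)))
    window = hann(frame_len)
    n_frames = max(1, (len(samples) - frame_len) // hop_in + 1)
    out_len = (n_frames - 1) * hop_out + frame_len
    out = [0.0] * out_len
    norm = [0.0] * out_len  # running sum of window weights per output sample
    for k in range(n_frames):
        src = k * hop_in    # where the frame starts in the input
        dst = k * hop_out   # where the frame lands in the output
        for i in range(frame_len):
            if src + i < len(samples):
                out[dst + i] += samples[src + i] * window[i]
                norm[dst + i] += window[i]
    # Divide by the summed window weights so overlaps do not change loudness.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

With factor set to 1.5, each frame is written half as far again from its neighbour as it was read, so the output lasts about one and a half times as long. The wave cycles in neighbouring frames then no longer line up, which is the problem the phase information is used to fix.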

The sound is then resampled to take it back to its original duration and hit the desired note.

As guitarpitchshifter.com explains, if the aim were to double the frequency then this would be as easy as keeping every second sample and constructing a waveform from those. But when the scaling factor is not a whole number, the new sample points fall between the original ones, so interpolation is used to estimate the value of the signal at each of those points.
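A minimal sketch of that resampling step follows. It assumes linear interpolation, the simplest choice, which may differ from what any particular pitch-correction tool uses; the function name resample_linear is hypothetical.

```python
def resample_linear(samples, ratio):
    """Read through samples at `ratio` times normal speed.

    A ratio of 2 keeps every second sample. For other ratios, values
    between two original samples are estimated by linear interpolation.
    """
    if not samples:
        return []
    last = len(samples) - 1
    n_out = int(last / ratio) + 1
    out = []
    for n in range(n_out):
        pos = n * ratio              # position in the original signal
        i = min(int(pos), last)      # original sample just before pos
        frac = pos - i               # how far pos lies towards the next one
        nxt = samples[min(i + 1, last)]
        out.append((1 - frac) * samples[i] + frac * nxt)
    return out
```

Calling resample_linear([0, 1, 2, 3, 4, 5, 6, 7], 2.0) keeps the values at positions 0, 2, 4 and 6, while a ratio of 1.5 reads positions 0, 1.5, 3, 4.5 and 6, with the halfway points taken as the average of their two neighbours.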

For imperfect singers to remain perfectly in key, we have this piece of maths and physics to thank – or to blame.