Digital Audio - Sampling

Sound, as we know by now, is a cyclic variation of atmospheric pressure. A microphone picks up this variation and reproduces it as an electrical signal. The transformation of the signal from analogue (continuous) to digital (discrete) is called sampling. The name comes from the fact that samples are literally "picked up" from the original signal at regular time intervals. Samplers, then, are circuits that pick up samples from the analogue signal at a constant rate; each sample is equal to the signal's amplitude at the instant in which it is taken. This rate is called the sampling frequency. The following diagram illustrates a continuous signal and its sampled version:

Sampling of a sinusoid

Each stored sample represents the amplitude of the original signal at a certain instant in time. We could take each sample and store it on a suitable storage medium and afterwards, when playback is about to take place, use a circuit that converts each sample back into its corresponding voltage; if we then connect everything to an amplifier and a loudspeaker, we'd be able to listen to the sound we had previously sampled.
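To make the idea concrete, here is a minimal sketch in Python of what a sampler does to a sinusoid. The function name sample_sine and the figures chosen (a 1 kHz tone sampled at 8 kHz) are our own assumptions for illustration, not values taken from the diagram above.

```python
import math

def sample_sine(freq_hz, rate_hz, count, amplitude=1.0):
    """Return `count` amplitude values of a sine wave, one every 1/rate_hz seconds."""
    period = 1.0 / rate_hz  # time between two consecutive samples
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n * period)
        for n in range(count)
    ]

# Illustrative values only: one full cycle of a 1 kHz tone sampled at 8 kHz.
samples = sample_sine(freq_hz=1_000, rate_hz=8_000, count=8)
for n, value in enumerate(samples):
    print(f"t = {n / 8_000 * 1e3:.3f} ms   amplitude = {value:+.4f}")
```

Each printed line is one stored sample: an instant in time and the amplitude the signal had at that instant.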

However, in this conversion some things go astray. What happened to all the intermediate voltages between one sample and the other? Not a trace to be found. But are they really necessary? Let's dig a bit deeper to find some answers, and let's start to bring numbers into the equation too.

The audio signal can be broken down into a sum of single sinusoids, each with its own frequency, amplitude and phase. Since the human ear is limited in its perception of frequencies, audio signals can be considered to be band-limited. In other words, the sinusoids they are made up of have frequencies that are confined within a limited interval. An audio signal's typical band ranges from 20 Hz to 20 kHz.

Nyquist's theorem states that when sampling is carried out at a frequency equal to at least double the band of the signal being sampled, the transition from analogue to digital takes place without any loss of information. This means that, when returning from digital to analogue, once we have converted the samples back into voltage values, we'll obtain the exact same sound we had before the sampling process took place (in order to listen to the sound we need a loudspeaker, which in turn needs to be fed by an electrical signal).
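So what about the intermediate voltages? The sketch below suggests they can be recovered from the samples alone. It uses sinc interpolation (the Whittaker-Shannon formula), a standard result that this text does not otherwise cover; the function names and the 1 kHz / 8 kHz figures are assumptions chosen for illustration. Because we can only sum a finite number of samples, the result is a close approximation rather than an exact match.

```python
import math

def sinc(x):
    """Normalised sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, rate_hz, t):
    """Estimate the signal at time t (seconds) from its samples.

    samples[i] is assumed to have been taken at time (first_index + i) / rate_hz.
    """
    return sum(
        value * sinc(t * rate_hz - (first_index + i))
        for i, value in enumerate(samples)
    )

RATE = 8_000   # sampling frequency in Hz (illustrative)
FREQ = 1_000   # tone frequency in Hz, well below RATE / 2
HALF = 2_000   # number of samples kept on each side of t = 0

samples = [math.sin(2 * math.pi * FREQ * n / RATE) for n in range(-HALF, HALF + 1)]

t = 0.3 / RATE  # an instant that falls between two samples
estimate = reconstruct(samples, -HALF, RATE, t)
exact = math.sin(2 * math.pi * FREQ * t)
error = abs(estimate - exact)
print(f"estimate = {estimate:.6f}  exact = {exact:.6f}  error = {error:.2e}")
```

The small residual error comes from truncating the sum, and it shrinks as more samples are included.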

Unfortunately, in the chain of operations carried out to recover the analogue signal from the stored samples, we'll nevertheless have some loss of information compared to the original signal. This loss isn't due to sampling as such, which doesn't generate errors when carried out in accordance with Nyquist's theorem, but to the process of storing the samples. More specifically, this process involves a quantization operation, which we will shortly take a closer look at and which is the true cause of the loss.

To sum things up: if we sample a band-limited signal at a frequency that is at least double its bandwidth and we do not store or otherwise manipulate the samples, the signal will not be degraded. (In actual fact this last statement isn't entirely true: it holds in theory, which by its very nature doesn't take the limits of physics into account. It's still too early for us to venture into such matters, which we will investigate once things become clearer.) In this sense Nyquist's theorem identifies the smallest number of samples from which the original waveform can be reconstructed without loss of information.

We will not deal with Nyquist's theorem formally in this course (you've been let off the hook) and we shan't disturb its natural habitat in signal theory textbooks; it seems more useful to us to give a practical explanation of the theorem.

Sampling a signal at less than double its band would mean extracting an insufficient number of samples. This implies that the very high frequencies within the signal wouldn't have enough samples to be reconstructed; instead, they would come back as a lower frequency. This is called the aliasing frequency and, being low, it can fall within the audible band. Thus, we will have added to the initial signal a frequency which didn't exist before the sampling process, whereas we will have lost the high frequency.

For the audio signal, let's choose a sampling frequency of 44.1 kHz (if this number sounds familiar, that's because it's the frequency used for music CDs).

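The value of the aliasing frequency is given by the empirical formula below (which is a simplified approximation of a more precise yet elaborate mathematical formula), where fa is the aliasing frequency, fc is the sampling frequency and f is the undersampled high frequency. The formula holds when f lies between fc/2 and fc:

fa = fc - f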

Let's say we superimpose a frequency of 30 kHz onto an audio signal. This frequency is well outside the audible band and it would turn out to be undersampled if we were to use a sampling frequency of 44.1 kHz (whereas to recover it properly we should use a sampling frequency of at least 60 kHz):

fa = 44.1 kHz - 30 kHz = 14.1 kHz
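The arithmetic above can be checked with a short Python sketch. The helper alias_frequency is a hypothetical name of ours, and it uses the more general "folding" form of the calculation (distance from the nearest multiple of the sampling frequency), which reduces to the subtraction above for this case. The second half compares the samples of a 30 kHz cosine with those of a 14.1 kHz cosine, both taken at 44.1 kHz.

```python
import math

def alias_frequency(freq_hz, rate_hz):
    """Frequency (Hz) at which a tone of freq_hz shows up after sampling at rate_hz.

    Distance between freq_hz and the nearest multiple of rate_hz.
    """
    return abs(freq_hz - rate_hz * round(freq_hz / rate_hz))

RATE = 44_100  # sampling frequency in Hz, as in the text

fa = alias_frequency(30_000, RATE)
print(f"aliasing frequency: {fa} Hz")

# Once sampled, the two tones are indistinguishable: their samples coincide.
original = [math.cos(2 * math.pi * 30_000 * n / RATE) for n in range(100)]
aliased = [math.cos(2 * math.pi * fa * n / RATE) for n in range(100)]
identical = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, aliased))
print("samples coincide:", identical)
```

A tone that is already below half the sampling frequency, such as 1 kHz, is returned unchanged by the same helper.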

In the following diagram we can observe an aliasing frequency superimposed over an under-sampled sinusoid:

Aliasing frequency

What happens then if the audio signal we wish to sample contains frequencies that are greater than 20 kHz? In principle we shouldn't bother ourselves with them, because they'd be outside our audible band. However, once the sampling operation has taken place at 44.1 kHz, any such frequency above half the sampling rate would be undersampled, hence it would pop up in the audible band in the guise of an aliasing frequency. To avoid this problem we need to filter all the frequencies above 20 kHz out of the audio signal before it reaches the sampling stage.
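The filtering step can be simulated with the Python sketch below. Note the assumptions: a real anti-aliasing filter acts on the analogue signal, so here a densely sampled signal (176.4 kHz, four times 44.1 kHz) stands in for it, and the filter is a simple windowed-sinc low-pass design of our own choosing, not a description of any particular converter. All names and figures are illustrative.

```python
import math

def lowpass_taps(cutoff_hz, rate_hz, num_taps=101):
    """Windowed-sinc (Hamming) low-pass FIR coefficients with unit gain at 0 Hz."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / rate_hz  # cutoff as a fraction of the sample rate
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def fir_filter(signal, taps):
    """Direct convolution, keeping only the fully overlapped output values."""
    n_taps = len(taps)
    return [
        sum(taps[j] * signal[i - j] for j in range(n_taps))
        for i in range(n_taps - 1, len(signal))
    ]

def tone(freq_hz, rate_hz, count):
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

DENSE_RATE = 176_400  # assumption: stand-in for the analogue signal (4 x 44.1 kHz)
taps = lowpass_taps(20_000, DENSE_RATE)

audible = fir_filter(tone(1_000, DENSE_RATE, 2_000), taps)      # should pass
ultrasonic = fir_filter(tone(30_000, DENSE_RATE, 2_000), taps)  # should be removed

sampled = audible[::4]  # keeping every 4th value corresponds to sampling at 44.1 kHz

peak_audible = max(abs(v) for v in audible)
peak_ultrasonic = max(abs(v) for v in ultrasonic)
print(f"1 kHz peak after filter:  {peak_audible:.4f}")
print(f"30 kHz peak after filter: {peak_ultrasonic:.4f}")
```

The 1 kHz tone passes almost untouched, while the 30 kHz tone is reduced to a small fraction of its original amplitude, so little of it is left to alias once the signal is sampled.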

The following table illustrates some typical sampling frequency values and their usages: