
Given that most people can't hear frequencies above 20kHz anyway, I've never understood the exact arguments for using sampling rates above 48kHz. At 48kHz, I understand that the extra bandwidth above the audio band makes it easier to construct the lowpass filter that removes aliasing, but I don't understand why anybody would want to record at 96kHz.

For projects which are strictly digital, i.e. using pure digital synthesis and not recording any material which would be converted from analog -> digital, is there any advantage to using sampling rates above 44.1kHz?

For everything else, is there any benefit at all to using 96kHz? Is it beneficial when applying some particular type of DSP operation later on? Or is it purely a placebo effect for the ear?

Note: There are other questions here asking about which sampling rates to use for various types of recording projects, but here I am asking for real, hard facts for any mathematical or DSP-related reasons supporting use of higher sampling rates.

6 Answers

I always work at doubled sampling rates when possible, for two important reasons.

First reason: to get rid of the characteristics of the anti-aliasing filter when working with analog sound sources. What is an anti-aliasing filter?

Let's say I am recording at 44100 Hz.
If I record a sine wave below 10 kHz, you can clearly see the sine shape when you plot the sample values on a graph.
If I sample a 0 dBFS sine wave at 22.05 kHz - exactly half the sampling rate - and the samples happen to land on its peaks, they read 1 and -1 alternately.

Now, here's the problem. If I record a 0 dBFS sine wave at 30 kHz and plot the samples, each sample covers more than half a sine period, and - if you played the samples back - you would get a 14.1 kHz sine wave (44.1 kHz minus 30 kHz). If you don't believe me, just make a simple drawing. This fold-back behaviour is called aliasing, sometimes also described as an 'imaging effect'.
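
To make the fold-back concrete, here is a minimal numpy sketch (illustrative, not from the answer) that samples a 30 kHz tone at 44.1 kHz and finds the spectral peak at the 14.1 kHz alias:

    import numpy as np

    fs = 44100                   # sampling rate (Hz)
    f = 30000                    # tone frequency, above the 22050 Hz Nyquist limit
    n = np.arange(fs)            # one second of sample indices
    x = np.sin(2 * np.pi * f * n / fs)

    spectrum = np.abs(np.fft.rfft(x))
    print(np.argmax(spectrum) * fs / len(x))   # ~14100.0 Hz, not 30000 Hz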

This means that before sampling the signal, we have to be sure that there are NO frequencies present whatsoever above the so-called Nyquist frequency (which is half the sampling rate). When using digital sound sources that provide their sounds already sampled, this is not really that big a deal, since they can sometimes just be programmed to never generate a signal above half the sampling rate, or they can filter everything out using a linear-phase brickwall filter that has no effect on the rest.

But if you are sampling a signal from an analog source, this filtering has to be done before the signal is sampled, and the only way to filter an analog signal is with an electronic circuit. Since the filter needs a very steep slope, it will also affect frequencies within the audible range, even though it was not designed to. The filters inside modern A/D converters are quite good, so the problem is small, but it becomes noticeable - and irritating - when you work for several days on 44.1 kHz audio and compare it with 96 kHz. The filter applied when you downsample 96 kHz back to 44.1 kHz is of course a digital filter, and is probably of much better quality. And it is only applied when you are completely done with all the work, so it won't bother you along the way.
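
As a rough sketch of that final downsampling step, here is one way to do it with scipy's polyphase resampler (the 1 kHz test tone is just a stand-in for real program material):

    import numpy as np
    from scipy.signal import resample_poly

    fs_in, fs_out = 96000, 44100
    t = np.arange(fs_in) / fs_in
    x96 = np.sin(2 * np.pi * 1000 * t)   # one second of "96 kHz audio"

    # 44100/96000 reduces to 147/320; resample_poly applies a
    # linear-phase anti-aliasing filter as part of the conversion
    x44 = resample_poly(x96, up=147, down=320)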

Second reason: to get rid of the characteristics of the dithering signal.

When you record at 24-bit resolution and plan to master at 16 bits, you need a dithering signal to mask the rounding errors. Noise is not a pretty thing to have in a recording; broadband noise masks rounding errors best, but noise shaping applied to the dithering signal can make it much less disturbing. If the recording was made at 96 kHz, you can shape most of the dithering noise to frequencies above 24 kHz, where nobody will hear it. That dithering noise is finally filtered out at the very end, at the moment you downsample your project back to 44.1 kHz.
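
A minimal sketch of the plain (unshaped) TPDF dither step when going from float/24-bit material to 16 bits; the function and its details are mine, not a standard API, and the noise shaping discussed above is deliberately omitted for brevity:

    import numpy as np

    rng = np.random.default_rng(0)

    def to_16bit(x):
        """Quantize float samples in [-1, 1] to 16 bits with TPDF dither."""
        lsb = 1.0 / 32768.0   # one 16-bit quantization step
        # Triangular (TPDF) noise spanning +/- 1 LSB decorrelates the
        # rounding error from the signal, leaving benign broadband noise.
        tpdf = rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)
        return np.clip(np.round((x + tpdf * lsb) / lsb),
                       -32768, 32767).astype(np.int16)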

So, bottom lines:
Is it useful when recording analog stuff?

Yes, definitely. You get less disturbance from the anti-aliasing filter and less disturbance from the dithering signal when it is used with proper noise shaping.

Is it useful when working with digital stuff that came right out of my softsynth?

Yes, it is still useful if you plan to work in 24 bits and master down to 16 bits; you can gain a great deal by noise shaping the dithering signal.

"since they can sometimes just be programmed to never generate a signal above half the sampling rate" Definitely true, however: "or they can filter everything out using a linear-phase brickwall filter that has no effect on the rest" I'm not sure that's possible. In order to filter out ultrasound from a digitally-generated wave, you would need to generate it at a higher sampling frequency in the first place (which would still alias, but not as much in the audible band). You cannot filter frequencies that are already aliased.
– endolith Sep 15 '13 at 14:32


"Now if the recording was made using 96KHz, you can noise shape most of the dithering signal to frequencies higher than 24KHz, so nobody will hear them. The dithering noise is at the end of the recording finally filtered out, at the moment you downsample your project back to 44.1 KHz." I don't think that's right, either. If you filter out all of the dither, then your output doesn't have dither anymore? It will go back to having quantization distortion?
– endolith Sep 15 '13 at 14:35

Re first comment: You're absolutely right. I think what I meant to say is that when you're using a digital effect, you can expect the frequency range of its output to be taken care of. Put it this way: if the output comes out aliased in the first place, upping your own sample frequency isn't going to change that. As to your second comment: interesting; it totally depends on the filters used before downsampling. If the noise were imaged back, it would obviously mask the quantisation noise, but it wouldn't sound exactly the same. I think I would shape my noise around the final Nyquist frequency.
– Pelle ten Cate Sep 28 '13 at 14:22

For projects which are strictly digital, i.e. using pure digital synthesis and not recording any material which would be converted from analog -> digital, is there any advantage to using sampling rates above 44.1kHz?

Yes. Some examples:

Creation of frequencies you don't want

Aliasing from digital synthesis

Many square/sawtooth/triangle wave generators are naively written, in that they produce an infinite number of harmonics, which are aliased and sound clearly bad. ([1, 1, 1, 1, -1, -1, -1, -1] is not a correct square wave, and the aliased harmonics will produce radio-tuning sounds in the background during portamento.)

If the sampling frequency is higher, this effect is reduced, because the aliasing frequency is farther away from the audio band.
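
As a sketch of the workaround (scipy assumed, values illustrative): generate the naive square wave at 8x the target rate, then filter and decimate, so the worst harmonics are removed instead of being folded back:

    import numpy as np
    from scipy.signal import resample_poly

    fs, f0 = 44100, 1109.0   # 1109 Hz does not divide fs, so aliasing is audible
    naive = np.sign(np.sin(2 * np.pi * f0 * np.arange(fs) / fs))

    # Same waveform built at 8x fs, then anti-alias filtered and decimated:
    n8 = np.arange(8 * fs)
    clean = resample_poly(np.sign(np.sin(2 * np.pi * f0 * n8 / (8 * fs))),
                          up=1, down=8)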

Aliasing from digital distortion

Likewise, when you use any kind of digital non-linear distortion, it produces an infinite number of harmonics or intermodulation products. The ones that would be produced above the Nyquist frequency are actually aliased back into the audible range.

I'm not sure how much of a problem this is in practice. Lots of things cause small amounts of distortion, like a compressor or a volume fade, but the amount is already negligible, so the aliased portion is even more negligible. With heavy distortion, the aliased frequencies may also go unnoticed because they're buried in the noise. Regardless, a higher sampling rate helps minimize any harmful effects.
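
A small illustration of the effect (the numbers are mine): hard-clipping an 8 kHz sine at 44.1 kHz creates a 3rd harmonic at 24 kHz and a 5th at 40 kHz, both above Nyquist, so they fold back into the band:

    import numpy as np

    fs = 44100
    n = np.arange(fs)
    x = np.sin(2 * np.pi * 8000 * n / fs)
    y = np.clip(1.5 * x, -1.0, 1.0)   # clipping adds odd harmonics

    spectrum = np.abs(np.fft.rfft(y))
    # Peaks: 8000 Hz (fundamental), 44100 - 24000 = 20100 Hz (aliased 3rd),
    # and 44100 - 40000 = 4100 Hz (aliased 5th, well inside the audio band).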

Lack of frequencies you do want

Another possible concern is that synthesized ultrasonic frequencies might become useful later in processing, even though you can't hear them directly in the recording:

Frequency shift from time changes

If you decide to slow something down for whatever reason, those ultrasonic frequencies will become audible frequencies. If you had filtered them out to avoid aliasing at the lower sampling rate, the slowed sound would be missing the high end.
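
A tiny sketch of that scenario (values illustrative): a 30 kHz partial is representable at 96 kHz, and halving the playback speed moves it down to an audible 15 kHz:

    import numpy as np

    fs = 96000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 30000 * t)   # ultrasonic partial, captured at 96 kHz

    # Playing the same samples at half the rate halves every frequency:
    # written out with a 48000 Hz header, the 30 kHz partial sounds at 15 kHz.
    playback_fs = fs // 2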

Distortion/Modulation

As said before, distortion will create new intermodulation frequencies at sum and difference locations from the frequencies in the original recording. This time, we're concerned about desirable audible frequencies being produced by distortion/modulation of ultrasonic frequencies (not related to aliasing). If those ultrasonic frequencies aren't in the recording before distortion, the output will be missing the audible frequencies they produce, and it won't exactly emulate an equivalent analog effect.
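
A hedged sketch of such a difference tone (values illustrative): two ultrasonic tones at 25 and 26 kHz passed through a mild square-law nonlinearity produce an audible 1 kHz component:

    import numpy as np

    fs = 96000
    n = np.arange(fs)
    x = (np.sin(2 * np.pi * 25000 * n / fs) +
         np.sin(2 * np.pi * 26000 * n / fs))
    y = x + 0.1 * x**2   # mild even-order distortion

    spectrum = np.abs(np.fft.rfft(y))
    # A clear peak appears at 26000 - 25000 = 1000 Hz; had the ultrasonics
    # been filtered out beforehand, this audible product would be missing.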

Again, I'm not sure this is a problem in practice, but it's at least plausible, and higher sampling rates that capture the ultrasound mitigate it.

In general, working at higher sampling rates gives "headroom" to prevent problems with effects and stuff that may not be implemented correctly. Like photocopying a photocopy, the better the quality of each copy, the less degradation there will be in the final product.

This is not to say that higher sampling rates are a good idea for playback of the finished mix. They're not. As described above, distortion of ultrasound can produce audible sound, and loudspeakers are the least linear thing in the audio chain, so you want to eliminate any ultrasound from the final mix to prevent it from being distorted by the speaker. There's no benefit to higher sampling rates for music playback; they should only be used in the recording and processing stages. See 24/192 Music Downloads ...and why they make no sense.

Having headroom for effects is a theoretically (and practically) valid reason to use a sampling rate higher than twice the limit of human hearing.

The reason for this is easily visualised by comparison with image editing: if you only have, say, an 800x600 px overall shot of a high-contrast brick wall, fishnet, striped textile, or other finely spaced high-contrast texture, you can only rotate it in 90° multiples without causing a moiré effect and blurring the details. With audio, the distortions that occur during editing have different names, but the same Nyquist-Shannon sampling theorem principles apply. Aliasing is a more commonly used term than "imaging effect" for what happens when the sampled sound has frequency content above half the sampling rate (the Nyquist frequency).

In practice, as Pelle ten Cate already explained, a brick-wall low-pass filter is not achievable; there is always some gradient (slope) to the cut-off.

Another good reason to record at higher sampling rates is to achieve a more precise stereo image, since human hearing relies in large part on tiny time differences between the ears (up to several hundred microseconds; physically these are phase differences) to localise sound sources. The head's "shadow" and other aspects also play a part.

With the audio CD sampling rate of 44100 Hz, each sample represents about 22.7 microseconds, and one period of an 882 Hz tone, for example, spans exactly 50 samples. A shift of just 25 samples (about 570 microseconds) at that middle frequency therefore means a 180° phase shift, i.e. complete cancellation.
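
The arithmetic behind those figures, as a quick sketch:

    fs = 44100
    sample_period_us = 1e6 / fs          # ~22.7 microseconds per sample
    samples_per_cycle = fs / 882         # exactly 50 samples per 882 Hz cycle
    half_cycle = samples_per_cycle / 2   # 25 samples, ~567 us, i.e. 180 degrees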

So, a 44.1 kHz sampling rate is just good enough, but does not really leave much headroom for editing.

Another thing to keep in mind is to use dithering (just as in image editing) to prevent quantization distortion. And next you will ask: should I use 24-bit quantization instead of 16 bits...?

Has it been shown that ultrasound still has an effect on our stereo perception even though we can't consciously hear it?
– endolith Sep 15 '13 at 15:13


No, the effect of interaural time differences on the stereo image is strongest at low frequencies (below about 1500 Hz), where the distance between the ears is shorter than the wavelength, so there is an unambiguous phase difference. At higher frequencies, the difference in sound level has more effect on sound localization. See: en.wikipedia.org/wiki/Interaural_time_difference#Duplex_theory
– peterhil Sep 15 '13 at 18:23

Another good reason to use a higher sampling rate is to work around deficiencies in plugin implementations. Many plugin writers do not properly take into account the bandwidth-expanding effects of nonlinear signal operations, and as a result you can get aliasing before the signal ever leaves the box.

For example, a compressor is basically a voltage-controlled amplifier: it multiplies one signal (the audio) by another signal (the gain). Multiplication of two signals is also known as ring modulation or heterodyning; it produces the sum and difference frequencies of the two inputs. If you multiply a 15 kHz sine by a 10 kHz sine, you get an output with a 5 kHz and a 25 kHz component. If your compressor's gain has a very fast attack and the input signal has a wide bandwidth, the "sum" component can easily exceed the Fs/2 limit on a transient basis, resulting in spurious aliased low-frequency junk in your output signal.
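
A quick numpy sketch of that multiplication (illustrative only), showing the 25 kHz sum component folding down to 19.1 kHz at a 44.1 kHz rate:

    import numpy as np

    fs = 44100
    n = np.arange(fs)
    product = (np.sin(2 * np.pi * 15000 * n / fs) *
               np.sin(2 * np.pi * 10000 * n / fs))

    spectrum = np.abs(np.fft.rfft(product))
    # Peaks at the 5 kHz difference component and at 44100 - 25000 = 19100 Hz,
    # the aliased image of the 25 kHz sum component.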

The real fix is for the plugin to use oversampling internally, but if you can't get that, the next best thing is to run the system at as high an Fs as you can. You won't have any actual audio content up in the stratosphere, but you are protected against some plugin blowing past the boundary.

For what it's worth, the mathematical rationale, at least as far as the audio world needs it, is generally described by the Nyquist-Shannon sampling theorem, sometimes just referred to as the Nyquist theorem, which in basic language states that to fully reproduce a waveform whose highest frequency is n Hz, you need more than 2n samples per second.
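
Stated compactly (this is the standard Whittaker-Shannon interpolation formula, not anything specific to this answer), a signal band-limited below half the sampling rate is exactly recoverable from its samples:

    $$ x(t) = \sum_{n=-\infty}^{\infty} x\!\left(\frac{n}{f_s}\right) \operatorname{sinc}(f_s t - n), \qquad f_s > 2 f_{\max}, \quad \operatorname{sinc}(u) = \frac{\sin(\pi u)}{\pi u} $$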

While this is not 'untrue', working in 24 bits introduces the drawback that you have to use dithering if you want to go back to 16 bits. Dithering noise can be reduced hugely if it is applied to a 96 kHz signal with noise shaping (see my other answer for details).
– Pelle ten Cate Dec 11 '10 at 15:29


All professional audio software works with 32 or 64 bit floating-point internally during the mix, regardless of the bit depth used during recording.
– leftaroundabout Nov 23 '12 at 21:42


@PelletenCate if you work in 16 bits, you are already screwed, because you add quantization noise at each non-trivial editing step. It is very wrong to imply that working with 24 or more bits introduces such a drawback.
– Sarge Borsch Sep 15 '13 at 12:01

I +1'd that. I should not have described that as a drawback, because it's not. I should say, however, that both quantisation noise and unshaped dithering noise are audible on a 44/16 mix. My point is that by switching to 24 bits you give yourself the opportunity to trade one issue (quantisation noise) for another (dithering noise) that can effectively be diminished by recording at a higher sampling rate.
– Pelle ten Cate Sep 28 '13 at 14:30