The limits of human audio perception are well established, running from 20 to 20,000 Hz on average, with only young children able to hear the highest frequencies and the lowest ones generally felt rather than heard. Right now music is sampled at 24 bits, but from everything I've read (there are a couple of articles on this forum that cover it pretty well), what you get is increased dynamic range (headroom), not a better representation of the sound.

I'd be interested in a hypothesis on why more is better, rather than the presumption that it is. The best explanation for why more != better is gregorio's thread from two years ago.

Yes, 20 to 20 kHz is the established metric for the average range of human hearing. No disputes there. However, as stated earlier, higher sampling rates can yield more accurate frequency response within that audible range.

gregorio's explanation hinges on the idea that dither solves everything: "The result is that we have an absolutely perfect measurement of the waveform plus some noise. In other words, by dithering, all the measurement errors have been converted to noise." So dither is perfect except for the fact that it's not perfect; it adds randomized white noise. Instead of adding random noise to the signal to reduce quantization errors, why not start with a more accurate representation of the signal to begin with?

Edit: Dig deeper than the first post on gregorio's thread and you'll see him correcting some of the mistakes in his assertions.
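Both halves of that claim are easy to check numerically. Below is my own minimal numpy sketch (not from gregorio's thread): a quiet 4-LSB sine is quantized with and without TPDF dither. Undithered, the error repeats with the signal and piles up as distortion spikes; dithered, the spikes disappear but the total noise power does go up.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 32768                # FFT length; the tone sits on an exact bin
k = 441                  # tone frequency, in bins
step = 1 / 2**15         # one LSB of a 16-bit signed full-scale range
x = 4 * step * np.sin(2 * np.pi * k * np.arange(N) / N)   # quiet 4-LSB tone

def quantize(sig):
    # Round to the nearest 16-bit level.
    return np.round(sig / step) * step

# Plain quantization: the error is a deterministic function of the signal,
# so its energy piles up in discrete harmonic distortion spikes.
spec_plain = np.abs(np.fft.rfft(quantize(x) - x)) / N

# TPDF dither: +/- 1 LSB triangular noise added before quantization
# decorrelates the error from the signal, at the cost of a bit more noise.
tpdf = rng.uniform(-step / 2, step / 2, N) + rng.uniform(-step / 2, step / 2, N)
spec_dith = np.abs(np.fft.rfft(quantize(x + tpdf) - x)) / N

print(spec_plain.max() > spec_dith.max())                 # True: spikes removed
print((spec_dith ** 2).sum() > (spec_plain ** 2).sum())   # True: more total noise
```

So both sides of the argument are literally true: dither adds noise, and dither converts correlated error into benign noise.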

First off, the 20 Hz to 20 kHz range isn't average human hearing. It would be an absolutely extraordinary range of hearing for any adult. Most people who THINK they have perfect hearing don't. The great irony of audiophile communities is pundits who have spent four decades in mixing rooms with loud rock bands somehow thinking their ears are in perfect condition. I'm going to go ahead and say that nobody posting in this thread really has that range of hearing right at this moment.

You're kind of misrepresenting gregorio there - the pursuit of a more "accurate" representation of the signal becomes silly once you're operating far, FAR beyond the significant error of any of the tools or devices used to record, transmit, or interpret sound, including your ears. The last 8 bits of any 24-bit signal taken from ANY analog source, even a single microphone preamp, will be hard at work representing a stream of random 1s and 0s. There is no signal to be found there. Dithering, while free and harmless, is only theoretically important when the noise floor of the existing material is low enough that the few least significant bits of a 16-bit signal actually follow some kind of regular, and thus more theoretically audible, pattern. With 8-bit signals it's absolutely essential, but at 16 bits it's USUALLY going to be a luxury rather than an absolute requirement.


Agreed. I wasn't trying to claim that everyone has perfect hearing, just that the accepted range of human hearing is around 20 Hz to 20 kHz. That's from birth. Due to presbycusis and exposure to loud noise, frequency response diminishes over time.

Yes, systems capable of playing back 24-bit audio are still rare, but we have certainly surpassed 16-bit in terms of SNR and dynamic range specs commonly seen in decent equipment. I understand your point about dither and how it relates to the limits of human perception.

It seems that no one understands the point I'm trying to make because the focus is on absolute dynamic range. Yes, 16 bits is probably more than enough for most music in terms of absolute dynamic range. No one is asking for 144 dB peaks. But in what way can having more data and more possible amplitude values be a bad thing? Even if music only uses the top 5% of dynamic range, the difference in available amplitude values between 5% of 2^16 and 5% of 2^24 is large.

Here is our disagreement in a nutshell. You say that the extra information is not necessary because we have passed the limits of human perception. I say that the extra information is a positive step because I do not believe we have passed the limits of human perception, and I would want the extra info even if we had passed the limits. How good enough is "good enough," is the question. The answer is subjective, to a certain extent. This is why the debate goes in circles.
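For what it's worth, here is the raw arithmetic behind the counting argument (this says nothing about audibility, just the number of linear quantization levels involved):

```python
# Linear amplitude levels available in the top 5% of each code range.
levels_16 = 2**16
levels_24 = 2**24
print(int(0.05 * levels_16))                      # 3276
print(int(0.05 * levels_24))                      # 838860
print((0.05 * levels_24) / (0.05 * levels_16))    # 256.0 - same ratio as the full ranges
```

Note the ratio is the same 256 whether you take 5% of the range or all of it, since both slices scale together.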


Well, we'll have to agree to disagree then. I think the limits of human perception are pretty well established. Furthermore, there's nothing that says there's more information in the higher sampling rates. You're saying, "What's the harm?" I'm saying, "Why bother?"

Even though my point was about bit-depth, you are reiterating one of gregorio's assumptions about sampling rate that was disproved a few pages into his thread. There are instruments with overtones that reach past 50 kHz, in particular a lot of metallic percussion instruments (cymbals, etc.). In any case, it is not necessary to have superhuman hearing to reap the benefits of using higher sampling rates. There are improvements in the audible range.

Quote:

However, as stated earlier, higher sampling rates can yield more accurate frequency response within that audible range.

Not according to the Nyquist theorem. All the information is captured if the sample rate exceeds twice the highest frequency the signal contains.
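That claim can be demonstrated directly with Whittaker-Shannon interpolation. A small numpy sketch of my own (finite window of samples, tone well below Nyquist) reconstructs the waveform between the sample instants:

```python
import numpy as np

fs = 8000.0                      # sample rate, comfortably above 2x the tone
f = 1234.0                       # tone well below Nyquist (fs/2 = 4000 Hz)
n = np.arange(-400, 400)         # a finite window of sample indices
samples = np.sin(2 * np.pi * f * n / fs)

def reconstruct(t):
    # Whittaker-Shannon interpolation: a sinc centred on every sample.
    return np.sum(samples * np.sinc(fs * t - n))

t = 0.000123                     # an arbitrary instant between two samples
exact = np.sin(2 * np.pi * f * t)
approx = reconstruct(t)
print(abs(exact - approx) < 1e-2)   # True: even a finite sinc sum lands very close
```

With an infinite sum the reconstruction is exact for any band-limited signal; the tiny residual here comes only from truncating the window.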

Quote:

Originally Posted by infinitesymphony

It seems that no one understands the point I'm trying to make because the focus is on absolute dynamic range. Yes, 16 bits is probably more than enough for most music in terms of absolute dynamic range. No one is asking for 144 dB peaks. But in what way can having more data and more possible amplitude values be a bad thing? Even if music only uses the top 5% of dynamic range, the difference in available amplitude values between 5% of 2^16 and 5% of 2^24 is large.

The reason people focus on the dynamic range is that it's what the bit depth ultimately defines:

* The number of quantization steps determines the amount of quantization error.

* The amount of quantization error determines the amount of quantization noise.

* The amount of quantization noise determines the level of the noise floor.

* The level of the noise floor determines the dynamic range.

So the difference between 24 and 16 bits (properly dithered) is in the dynamic range.

I was with you until you said a 30-year-old bit-rate standard is "already good enough." Good enough for whom? For some people, 128 kbps MP3s are good enough. For some people, vinyl is good enough. However, there is always more to gain, even if not everyone can perceive the difference. Your comment about optical perception works well to illustrate my point. Human sight is not fixed at a certain frequency; it is variable. 24 Hz and 30 Hz are good enough for fluid motion, but they are nowhere near the limits of human perception. Your other point, that a 2 MP and a 20 MP picture look the same when the larger image is made smaller, does not really apply here; it would be like saying "16-bit and 24-bit sound the same when 24-bit is reduced to 16-bit." When you reduce the detail on purpose, it no longer matters.

I think "good enough" is when the noise level is quieter than your heartbeat or the blood rushing through your ears. If you need a custom soundproof room (the type that makes your ears ring with the silence and lets you hear your own blood) in order to hear your blood masking the sound of the lack of resolution, I think that's "good enough."

As for the megapixel thing, I actually had a similar conversation with a professor of signal processing who works with images. Think of the area of a printed picture as its "dynamic range": both put limits on how far apart our information is spread. So what I am saying is that a 20 MP image on a 20x20 in print has more "resolution" than a 2 MP image on the same size print. Likewise, the same song in 24-bit vs. 16-bit fits in the same dynamic range (the same canvas, in the image analogy); the only difference is how far apart each step in volume is. My argument is that even with 16-bit and a song using a full 96 dB of dynamic range (a very big canvas), the step size sits at the scientifically accepted threshold of human perception (roughly the volume of a mosquito flying 3 m away, according to Wikipedia).

The other thing to consider is that in audio we are talking about bit depth, which scales exponentially, while megapixel counts scale linearly. Hence, to compare apples to apples, going from 16 to 24 bits multiplies the number of amplitude levels by 2^8 = 256, like going from 2 MP to 512 MP.

PS: I poked around the inter-tubes and found that the human limit of resolution for a 20x13 in picture is closer to 74 MP, so clearly image technology has a ways to go before truly matching human vision.
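That chain from bit depth to dynamic range reduces to the familiar rule of thumb of roughly 6 dB per bit. A minimal sketch of the arithmetic (the 1.76 dB constant assumes an ideal quantizer driven by a full-scale sine):

```python
def dynamic_range_db(bits: int) -> float:
    # Theoretical SNR of an ideal quantizer for a full-scale sine:
    # roughly 6.02 dB per bit plus a 1.76 dB constant.
    return 6.02 * bits + 1.76

print(round(dynamic_range_db(16), 2))                           # 98.08
print(round(dynamic_range_db(24), 2))                           # 146.24
print(round(dynamic_range_db(24) - dynamic_range_db(16), 2))    # 48.16
```

Whether that extra 48 dB of range is ever audible in a real listening room is, of course, exactly what this thread is arguing about.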

The Nyquist theorem is perfect in theory, imperfect in practice. No real-world filter can brickwall at 22.05 kHz without severely affecting the rest of the frequency range. As a compromise, the cut-off frequency is moved back toward the audible range to allow for a smoother roll-off. The most effective solution to this problem is to use a higher sampling rate and move the cut-off further into the high frequencies, which has the dual benefit of being inaudible (because the roll-off lies outside the range of hearing) and of affecting the audible range less, because a more gradual roll-off can be used.

I understand the definition of dynamic range, and that bit depth essentially is a measure of possible dynamic range. My point is that if a higher bit depth gives greater dynamic range overall, then it also affects dynamic range at a local level. This, IMO, is the whole point of using higher bit depths.

Quote:

The Nyquist theorem is perfect in theory, imperfect in practice. No real-world filter can brickwall at 22.05 kHz without severely affecting the rest of the frequency range. As a compromise, the cut-off frequency is moved back toward the audible range to allow for a smoother roll-off. The most effective solution to this problem is to use a higher sampling rate and move the cut-off further into the high frequencies, which has the dual benefit of being inaudible (because the roll-off lies outside the range of hearing) and of affecting the audible range less, because a more gradual roll-off can be used.

That's why DACs upsample.

Quote:

I understand the definition of dynamic range, and that bit depth essentially is a measure of possible dynamic range. My point is that if a higher bit depth gives greater dynamic range overall, then it also affects dynamic range at a local level. This, IMO, is the whole point of using higher bit depths.

One is not going to hear subtleties of microdynamics that are some 76 dB below the main signal (and that's with a very quiet CD mastered at -20 dB, not to mention that proper noise shaping can gain another 20-25 dB).

DACs employ oversampling, not upsampling (though some do also upsample), but it's still not equivalent to using a higher native sampling rate.

I've explained my point about local dynamic range as clearly as I can and people still think I'm talking about low-level signals. Perhaps someone else can explain the practical benefits of 24-bit better than I can, so I'll leave it up to them.

Quote:

DACs employ oversampling, not upsampling (though some do also upsample), but it's still not equivalent to using a higher native sampling rate.

I've explained my point about local dynamic range as clearly as I can and people still think I'm talking about low-level signals. Perhaps someone else can explain the practical benefits of 24-bit better than I can, so I'll leave it up to them.

What I understand is that ADCs oversample, i.e., they sample at a frequency higher than the Nyquist frequency, and that DACs upsample (say by a factor L); the simplest method is to insert L-1 zeros between each pair of samples.

If by local dynamic range you mean creating more intermediate values between two fixed values, I believe I've already addressed this point. Going from 16 to 24 bits, you are going to get about 8 million more intermediate values between 0 and -6 dB, fine, but in reality the maximum difference you are going to get between the 16- and 24-bit signals is some 90 dB down, which is in practice inaudible, and that's without even noise shaping.

24/96 for recording is plenty useful, but for playback, it's overkill.
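The size of that difference is easy to measure. A quick sketch of my own (uniform noise as a stand-in signal, plain rounding, no dither) quantizes the same material to 16 and 24 bits and looks at the worst-case disagreement:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100_000)      # stand-in full-scale signal

def quantize(sig, bits):
    step = 2.0 / 2**bits             # LSB size for a -1..+1 range
    return np.round(sig / step) * step

diff = quantize(x, 16) - quantize(x, 24)
peak_db = 20 * np.log10(np.abs(diff).max())   # relative to full scale (1.0)
print(-97.0 < peak_db < -96.0)       # True: worst case is half a 16-bit LSB
```

That lands at roughly -96 dBFS, the same ballpark as the ~90 dB figure above once real signals below full scale are considered.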

Quote:

The Nyquist theorem is perfect in theory, imperfect in practice. No real-world filter can brickwall at 22.05 kHz without severely affecting the rest of the frequency range.

The theorem is not imperfect in practice; it's real-world physics that (in some ways) complicates the implementation of the theorem. The theorem works exactly as predicted.

Quote:

As a compromise, the cut-off frequency is moved back toward the audible range to allow for a smoother roll-off. The most effective solution to this problem is to use a higher sampling rate and move the cut-off further into the high frequencies, which has the dual benefit of being inaudible (because the roll-off lies outside the range of hearing) and of affecting the audible range less, because a more gradual roll-off can be used.

The roll-off is not a problem. Modern converters run at MHz speeds and use advanced digital filters, so the frequency response can be completely flat 20Hz-20kHz in a 44.1kS/s system.

And modern filters are, of course, phase linear and the ringing is minimal.

Higher sample rates would have made more sense in the past, when good filters were expensive and difficult to optimize. A good converter will sound the same at 44.1kHz as it does at higher sample rates, but a converter with shoddy or improperly implemented filters will in some cases sound better at 48 or 96kHz.

If you want to optimize the sample rate to allow better performance from the worst converters, then it would probably be somewhere between 48 and 60kHz. Dan Lavry suggested 60kHz 10 years ago, but today 48kHz (or maybe even 44.1kHz) is probably enough:

Quote:

I understand the definition of dynamic range, and that bit depth essentially is a measure of possible dynamic range. My point is that if a higher bit depth gives greater dynamic range overall, then it also affects dynamic range at a local level. This, IMO, is the whole point of using higher bit depths.

I'm not sure what you mean by "local level". Do you mean at specific frequencies? The dynamic range refers (unless a specific frequency range is specified) to the total amount of noise in the system, but how the noise is distributed is another thing.

You can, for example, have 144 dB of dynamic range at critical frequencies in a 16-bit system. Another example is DSD/SACD. The total dynamic range is only 1 bit (~6 dB), but thanks to the high bandwidth (1.4 MHz) you can move large amounts of quantization noise to ultrasonic frequencies and get roughly the equivalent of 20 bits of dynamic range up to 20 kHz. That's also how most modern converters work (but with a few more bits and at higher speeds).

The goals when choosing bit depth (and noise shaping) are:

a) That the dynamic range is sufficient for the signal.

b) That the noise floor is low enough to not be audible in the intended listening environment at the intended SPL.

If both a) and b) are covered by the bit depth there are no additional benefits (at least not for playback) in using higher bit depths.
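The noise-shaping idea mentioned above (trading total noise for a quieter audible band) can be sketched with a first-order error-feedback loop. This toy version is my own construction: a deliberately coarse 8-bit quantizer at 64x oversampling so the effect is obvious; real sigma-delta converters are higher order.

```python
import numpy as np

fs = 64 * 44_100                 # heavily oversampled, sigma-delta style
N = 1 << 16
t = np.arange(N) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)     # 1 kHz tone at -6 dBFS

step = 2 / 2**8                  # deliberately coarse 8-bit quantizer

def quant(v):
    return np.round(v / step) * step

# Plain quantization: error energy lands at harmonics across the band.
plain = quant(x)

# First-order noise shaping: feed the previous quantization error back in.
# The loop's error transfer function (1 - z^-1) pushes error energy
# toward high, ultrasonic frequencies.
shaped = np.empty(N)
e = 0.0
for i in range(N):
    v = x[i] + e
    y = quant(v)
    e = v - y
    shaped[i] = y

def inband_power(err, upto=20_000.0):
    # Error energy in the audible band only.
    spec = np.abs(np.fft.rfft(err)) / N
    freqs = np.fft.rfftfreq(N, 1 / fs)
    return np.sum(spec[freqs <= upto] ** 2)

print(inband_power(plain - x) > inband_power(shaped - x))   # True
```

The shaped loop has at least as much total error, but almost all of it sits at ultrasonic frequencies, where the reconstruction filter removes it or it simply goes unheard.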