Ok, back to the original question of the thread. This is what I'd always thought was the reason for needing higher samplerates. Feel free to correct me if I'm wrong about this.

Firstly, as humans we can all hear up to roughly 22050 Hz, and as Nyquist worked out, to record this frequency and all the ones below it you have to take a number of samples per second which is double that, so we get to 44100 Hz, aka 44.1 kHz.

However, if you do this, although it might seem fine, the higher frequencies have less detail the further up you go. A 22050 Hz sound will only get an on and an off point, but none of the smoothness of the curve, so it will be a triangle wave. Not to mention aliasing, which might make it capture a half-on and half-off point.

Now I've been told that this isn't correct, but from everything I've read online it seems to be.

There's also the more debatable issue that higher frequencies add to the warmth of the sound, even though they're inaudible on a conscious level. Hence the term "analog warmth", where valves would add random high frequencies to the audio.

Anyway, that's just some of the stuff floating around in my mind. I don't want another TOS 8 violation >_<

Somehow this question brings out all the know-alls whose world-view evidently depends on CD being a perfect medium. It really mystifies me how a person can become emotionally involved in this issue. In any case some mathematics comes as a relief.

QUOTE (Qjimbo @ Jan 8 2006, 03:20 PM)

Firstly, as humans we can all hear up to roughly 22050 Hz, and as Nyquist worked out, to record this frequency and all the ones below it you have to take a number of samples per second which is double that, so we get to 44100 Hz, aka 44.1 kHz.

However, if you do this, although it might seem fine, the higher frequencies have less detail the further up you go. A 22050 Hz sound will only get an on and an off point, but none of the smoothness of the curve, so it will be a triangle wave. Not to mention aliasing, which might make it capture a half-on and half-off point.

You are not right. Nyquist allows the original function to be reconstructed under certain conditions (that its frequency components, found by Fourier transformation, should be within a bound, say 20 kHz). To go from the samples to the original wave is slightly complicated, I think, and you will need to look at the proof of the theorem. It certainly isn't linear interpolation, which would leave you with your triangle wave. To say this again, Nyquist says that the function which goes from waves with frequencies under 20 kHz to samples taken every 1/44100 s has an inverse, but don't think that this inverse is simple.

Now Nyquist won't go up to 22050 Hz; it will go to any limit under 22050 Hz, say 20 kHz. If you have a pure 20 kHz tone and sample it at 44 kHz (for an infinite length of time), you will be able to deduce that the original wave was a 20 kHz tone if you knew it had no components above 22 kHz.
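To make the "inverse" a bit more concrete, here is a rough sketch in Python. It assumes the inverse is the usual Whittaker-Shannon (sinc) interpolation from the proof of the theorem; the function names, the 15 kHz test tone and the record length are made up for illustration. The real formula sums over infinitely many samples, so the finite sum here is only an approximation.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def sinc_reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation at time t (s), truncated to the samples we have."""
    return sum(s * sinc(fs * t - n) for n, s in enumerate(samples))

def linear_reconstruct(samples, fs, t):
    """Join-the-dots interpolation, i.e. the 'triangle wave' picture."""
    pos = fs * t
    i = int(math.floor(pos))
    frac = pos - i
    return samples[i] * (1.0 - frac) + samples[i + 1] * frac

FS = 44100.0   # sample rate in Hz
F = 15000.0    # illustrative test tone in Hz, below FS/2
N = 4001       # finite record length (assumption: long enough for a decent approximation)

samples = [math.sin(2 * math.pi * F * n / FS) for n in range(N)]

# Evaluate halfway between two samples near the middle of the record.
t = (N // 2 + 0.5) / FS
true_value = math.sin(2 * math.pi * F * t)
sinc_error = abs(sinc_reconstruct(samples, FS, t) - true_value)
linear_error = abs(linear_reconstruct(samples, FS, t) - true_value)

print("true value  :", true_value)
print("sinc error  :", sinc_error)
print("linear error:", linear_error)
```

If this is right, the sinc error should come out far smaller than the linear one, and it should shrink further as the record gets longer.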

The Sampling Theorem says that signals containing frequencies up to 1/2 the sampling frequency (S/2) can be exactly represented by S samples per second. Restated, the highest frequency that can be recorded, and played back, is equal to 1/2 the sample rate. It isn't approximately, or "in the neighborhood of", or "a degraded representation of", it is exactly the same waveform, right up to S/2 (e.g. 22,050 Hz at 44,100 Hz). No information is lost.

The input must be bandlimited to no more than S/2 (the Nyquist frequency) or there will be distortion. If higher frequencies exist in the input they will be partially sampled, creating false information that portrays them as lower frequencies in the audio band. These are called alias images or foldover distortion. If S is the sample rate and F is some frequency between S/2 and S, then its image is created at S - F. Thus as F approaches S, the images are created at lower and lower audio frequencies.
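A small sketch of the foldover rule in Python, for anyone who wants to play with the numbers. The function name is made up, and the modulo step (which extends the S - F rule to inputs above S) is my own addition, not something from the paragraph above.

```python
def alias_frequency(f, fs):
    """Frequency (Hz) at which a tone f shows up after sampling at fs with no anti-alias filter."""
    folded = f % fs              # sampling cannot tell f apart from f + k*fs
    if folded > fs / 2:
        folded = fs - folded     # image at S - F for F between S/2 and S
    return folded

FS = 44100
for f in (10000, 25000, 30000, 43000):
    print(f, "Hz ->", alias_frequency(f, FS), "Hz")
```

As F approaches S the image lands lower and lower in the audio band, e.g. 43000 Hz folds down to 1100 Hz at a 44100 Hz sample rate.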

Since it isn't feasible to build analogue filters that truly act as "brick walls," a stop band filter is employed that begins attenuating the input several thousand Hz below the Nyquist frequency. By S/2 the signal level is down sufficiently that aliasing isn't a significant problem. It is real world filter insufficiency, not any sampling theory constraints, that may limit actual performance to something less than the Nyquist frequency.

It is easier to do a better job by sampling at a much higher frequency and handling the details in the digital domain, either in hardware, or especially in software. In actual practice many ADCs, probably those in most of our soundcards, sample at several million times per second, use some form of sigma-delta conversion, then filter down to the desired sample rate and bit depth.

Sampling means that the continuous analogue waveform voltage can be measured only in terms of a limited number of discrete digital values. Although frequency information can be captured exactly (up to S/2), amplitude accuracy is limited by the size of the intervals into which the amplitude (from zero to maximum) must be assigned (e.g. 0.01 V, 0.02 V, 0.03 V, ...). The input is always recorded as exactly one of those values although the actual sampled input voltage is usually not exactly one of those values. The error can be as great as 1/2 interval.

Increasing the bit depth increases the number of possible values, increasing accuracy. Going from 16 bits to 24 bits increases the accuracy by a factor of 256, reducing the average error amplitude to 1/256 of what it was.

As signal level decreases the errors become larger relative to the signal. Also as signal level decreases, the number of bits, or levels, between it and zero decreases. This provides relatively fewer choices to record its value. Thus the increased number of quantization levels available with greater bit depth becomes more important at lower input levels.

At high signal levels the quantization error is experienced as white noise but at very low signal level it is experienced as distortion. For every bit of depth added, the error, or quantization noise, is decreased by 6dB. Distortion of low level signals (e.g. -90dB to -96dB) is much greater when using 16 bit rather than 24 bit.
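The 6 dB per bit figure and the factor of 256 can be checked with a few lines of Python. This is only the idealized arithmetic (each extra bit halves the step size); the function names and the full-scale span of 2.0 (i.e. -1 to +1) are assumptions for illustration, and real converters will do worse.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (idealized)."""
    return 20 * math.log10(2 ** bits)

def step_size(bits, full_scale=2.0):
    """Size of one quantization interval for a signal spanning full_scale."""
    return full_scale / (2 ** bits)

db_per_bit = dynamic_range_db(1)
range_16 = dynamic_range_db(16)
range_24 = dynamic_range_db(24)
step_ratio = step_size(16) / step_size(24)

print(f"per bit : {db_per_bit:.2f} dB")
print(f"16 bits : {range_16:.2f} dB")
print(f"24 bits : {range_24:.2f} dB")
print(f"16-bit step is {step_ratio:.0f} times the 24-bit step")
```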

There are of course many other considerations to getting good digital audio reproduction. Whether or not some of the factors discussed here are always audible, they are real and measurable.

The Sampling Theorem says that signals containing frequencies up to 1/2 the sampling frequency (S/2) can be exactly represented by S samples per second. Restated, the highest frequency that can be recorded, and played back, is equal to 1/2 the sample rate. It isn't approximately, or "in the neighborhood of", or "a degraded representation of", it is exactly the same waveform, right up to S/2 (e.g. 22,050 Hz at 44,100 Hz). No information is lost.

I thought that only frequencies under half the sampling rate can be represented exactly (so, Fs/2 is the first frequency that cannot be represented anymore). See also Wikipedia.

QUOTE (Qjimbo @ Jan 9 2006, 01:20 AM)

There's also the more debatable issue that higher frequencies add to the warmth of the sound, even though they're inaudible on a conscious level. Hence the term "analog warmth", where valves would add random high frequencies to the audio.

Well, as far as I know these added harmonics are well within audible range.

The majority of people can NOT hear that high. Most are well below 20 kHz, unless you turn up the volume to levels which will certainly damage your hearing.

This thread will end at the same conclusion as every other thread of this kind: for listening, 44 kHz/16 bits is sufficient and going higher has no benefit (for listening). Thus, the only advantage which media like DVD-A etc. will bring is multichannel support, storage space for video, etc.

- Lyx

Absolutely, and for multichannel 96 kHz is overkill. Actually, compared to stereo, multichannel improves the dynamic range (I’d say by more than 10 dB for 5.1). There are other formats which are more widely supported, with a better footprint and a good level of quality.

My immediately available source is Principles of Digital Audio, fourth edition by Ken C. Pohlmann. What it says is consistent with what I remember reading elsewhere. I don't know about the "Wikipedia" as I get a message it is temporarily unavailable, but isn't that a collection to which anyone who thinks he knows something can contribute?

On page 25: "a sampling frequency of S samples/second is needed to completely represent a signal with a bandwidth of S/2 Hz. ... For example, an audio signal with a frequency response of 0 to 20 kHz would theoretically require a sampling frequency of 40 kHz for proper sampling." Further text extends the explanation, making clear that it is indeed S/2 Hz, not S/2 less something.

As for adding "random high frequencies to the audio" for "warmth," another way of saying that is "adding distortion that just happens to please some people." Of course an original analogue source might have had higher order harmonics that cannot be captured. The Sampling Theorem is quite clear that only a properly bandlimited input can be sampled "correctly."

We're splitting hairs, and in practice it really doesn't matter that much ... but here I go: I agree with bug80 that it's everything under S/2, not S/2 itself. Wikipedia is right and the book is not.

AndyH-ha, suppose there's a sine of amplitude 1.0 and frequency 10 kHz, sampled at 20 kHz with a certain phase offset 'ofs'. The samples will be:

1.0*sin(0+ofs) 1.0*sin(pi+ofs) 1.0*sin(2pi+ofs) ...

Now, if you set 'ofs' to pi/2 you'll get

+1 -1 +1 -1 +1 ... fine.

Set 'ofs' to pi/6 and you'll get

+0.5 -0.5 +0.5 -0.5 ... whoa! What's that?

Is it a sampled version of a 1.0-amplitude sine with a phase offset of pi/6, or a 0.5-amplitude sine with a phase offset of pi/2? Even worse: try a phase offset of zero ;-)

Truth is: You can't properly sample and reconstruct a frequency of S/2 because there's this ambiguity ... hence you can only represent everything below S/2.
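The ambiguity is easy to reproduce. Here is a minimal Python sketch using the same numbers as the example (a 10 kHz sine sampled at 20 kHz); the helper name and the choice of 8 samples are just for illustration.

```python
import math

def sample_sine(amplitude, freq, fs, phase, count):
    """Return `count` samples of amplitude*sin(2*pi*freq*t + phase) taken at rate fs."""
    return [amplitude * math.sin(2 * math.pi * freq * n / fs + phase)
            for n in range(count)]

FS = 20000.0   # sample rate in Hz
F = FS / 2     # tone sitting exactly on S/2

a = sample_sine(1.0, F, FS, math.pi / 6, 8)   # amplitude 1.0, offset pi/6
b = sample_sine(0.5, F, FS, math.pi / 2, 8)   # amplitude 0.5, offset pi/2

# Largest difference between the two sample sequences (floating-point noise only).
max_difference = max(abs(x - y) for x, y in zip(a, b))

# With a phase offset of zero every sample lands on a zero crossing.
zero_phase_peak = max(abs(x) for x in sample_sine(1.0, F, FS, 0.0, 8))

print([round(x, 3) for x in a])
print([round(x, 3) for x in b])
print("max difference :", max_difference)
print("zero-phase peak:", zero_phase_peak)
```

Two different sines give the same samples, and with zero offset the tone vanishes entirely, so nothing downstream can tell which signal was actually there.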

This thread will end at the same conclusion as every other thread of this kind: for listening, 44 kHz/16 bits is sufficient and going higher has no benefit (for listening). Thus, the only advantage which media like DVD-A etc. will bring is multichannel support, storage space for video, etc.

- Lyx

Well, for some very, very limited sets of circumstances, one might imagine that 120dB might be required, i.e. a young person with good hearing in the quietest room in the world, listening to a very loud orchestra with lots of percussion, miked pretty close, and wanting to hear the crowd rustle and air conditioning sounds between movements. This would also require quite an extraordinary playback system, indeed. Such playback systems ARE possible, but extremely rare.

But for most (if not all) living rooms, and most people with normal hearing, yes, I think it's pretty much sufficient.

N.B. Before somebody gets nasty and screams TOS (Look, I'm tired of the audio-woo contingent, too, but not every surprising claim is audio woo.), I'm simply stating something that can be observed from Fletcher's zero-loudness curves.

It would be an extraordinary room and system, but both are possible, with effort and cost, and with a young person who doesn't listen to rock to listen.

The Sampling Theorem says that signals containing frequencies up to 1/2 the sampling frequency (S/2) can be exactly represented by S samples per second. Restated, the highest frequency that can be recorded, and played back, is equal to 1/2 the sample rate. It isn't approximately, or "in the neighborhood of", or "a degraded representation of", it is exactly the same waveform, right up to S/2 (e.g. 22,050 Hz at 44,100 Hz). No information is lost.

True; however, you should also point out that as a filter becomes sharper and sharper, its length grows without bound, up to a filter with infinite sharpness that has infinite length.

For a standard FIR antialiasing filter, this requires infinite delay for both sampling and reconstruction. For an IIR filter, this requires infinite delay after the first filter.

So if you are within dF of fs/2, you need at least approximately 2/dF time for the filter to work. That is approximate, but a reasonable estimate of a minimum. When you get to fs/2, the filter delay is infinite in some fashion.
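To get a feel for how quickly that grows, here is the 2/dF estimate turned into a few lines of Python. It takes the rule of thumb above at face value as an approximate minimum, not an exact filter design formula; the function name and the chosen passband edges are only illustrative.

```python
def min_filter_seconds(transition_hz):
    """Rule-of-thumb minimum filter duration (s) for a transition band of the given width (Hz)."""
    return 2.0 / transition_hz

FS = 44100.0  # sample rate in Hz

for passband_edge in (20000.0, 21000.0, 22000.0, 22049.0):
    d_f = FS / 2 - passband_edge          # distance from fs/2 in Hz
    seconds = min_filter_seconds(d_f)
    print(f"edge {passband_edge:7.0f} Hz: dF = {d_f:6.0f} Hz, "
          f"about {seconds * 1000:8.2f} ms ({seconds * FS:7.0f} samples)")
```

A passband up to 20 kHz needs on the order of a millisecond by this estimate, while getting within 1 Hz of fs/2 needs about two seconds.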

Another thing to consider is that with proper dithering and noiseshaping, the effective resolution at 44.1kHz can be boosted by as much as 15dB. This means that a properly produced CD (unfortunately very few exist, but HDCD are an example) will have an effective SNR of 111dB.

How many ADCs or DACs have you seen that can attain that resolution?

Hi Garf, It's quite perturbing to talk about 111 dB of SNR with a DAC having 96 dB (16 bits). Actually dithering/NS decreases distortion while increasing the noise. Since this noise is shaped so as to be barely audible, we can say that we have a psychoacoustic SNR of 111 dB on a device with a dynamic range of 96 dB. Is that correct?

Yes, that's well-explained. The 96dB is over the entire frequency range. You can make tradeoffs and gain at lower frequencies to lose at higher ones.

If you say that CD has an SNR of 96dB, you must also say that SACD has an SNR of 6dB.

Well, for some very, very limited sets of circumstances, one might imagine that 120dB might be required, i.e. a young person with good hearing in the quietest room in the world, listening to a very loud orchestra with lots of percussion, miked pretty close, and wanting to hear the crowd rustle and air conditioning sounds between movements. [...]It would be an extraordinary room and system, but both are possible, with effort and cost, and with a young person who doesn't listen to rock to listen.

Would you agree with my argument that prolonged exposure to such a large loudness variation (which means necessarily very loud levels at the loudest peaks) would eventually cause hearing damage, thereby making itself useless?

It depends on the actual duration of the loud parts, but in general, I agree that listening over a short-term or long-term mean of 85dB SPL is bad news.

On the other hand, means to measure exposure are still rather primitive, and even those of us who prefer to be in quiet surroundings routinely see peaks well above that. Consider applause, but we don't all go deaf from it.

I have to step out and disagree that there is no audible difference between 16-bit and 24-bit recordings. The decision to use 16/44.1 was arrived at not because it was audio nirvana, but because it was the best compromise possible with the digital-to-analogue conversion technology of the '80s.

To hear the benefits of 24-bit vs. 16-bit, you need a few things:

1) Capable and discriminating ears. The fact is that not everyone can hear the difference, and that's fine. However, even with a capable ear, the individual has to know what he's looking for. We're talking about very subtle points here; the difference is not going to be night and day.

2) Capable audio equipment. Just because you have a DVD-A or SACD player doesn't mean your system is capable of delivering an audible difference. You need very good equipment to make 24-bit listening worth your while.

3) Quality recording. It doesn't matter if the recording is 64-bits - if it was poorly recorded, and poorly produced, it's going to sound poor. Adding another 8 bits is not a magical fix-all.

QUOTE

Well, the advantage of DVDA and SACD is exactly that they are multichannel. And have DRM, which is obviously a big advantage to some people.

I disagree with this, especially with respect to SACD. A lot of hardcore audiophiles (the people who would be investing in expensive high-bit recordings) feel very strongly that music should only be two channels.

In SACD multichannel is optional, but 2-channel is required; in other words every SACD will have high-bit two channel audio, but not necessarily multichannel.

Agreed on all counts.

Sometimes I think people mistake failed ABX tests (which if successful, prove there IS a difference), with proof that there is NO difference - which is clearly not the case.

Anyone into Reason? - Might I suggest a project of ABXing with certain fully capable sound cards and a 44.1 kHz 16-bit render, and a 24-bit 96 kHz render direct from Reason? All sound waves are then regenerated to actually use the accuracy available from the end format. It's hard to find CDs that haven't been through a 48 kHz 16-bit step at some point, for example - this will eliminate that kind of variable completely - while unfortunately dislocating the tests from a real-world model to some extent .... I think it would be a worthy project.

I have not abxed it but my subjective tests indicated a perceptible warmth from the 96khz render that was missing from the 44khz render. Some scientific results would be beneficial.

Well, I have tried to ABX with good equipment and failed miserably, as have many others I would suspect. If you think you can hear a difference, why not try yourself. You might reach an interesting conclusion...

It's not a conclusion though - I made that point before. You can only conclude positive results from ABX, not the other way around.

Anyone into Reason? - Might I suggest a project of ABXing with certain fully capable sound cards and a 44.1 kHz 16-bit render, and a 24-bit 96 kHz render direct from Reason? All sound waves are then regenerated to actually use the accuracy available from the end format. It's hard to find CDs that haven't been through a 48 kHz 16-bit step at some point, for example - this will eliminate that kind of variable completely - while unfortunately dislocating the tests from a real-world model to some extent .... I think it would be a worthy project.

I have not abxed it but my subjective tests indicated a perceptible warmth from the 96khz render that was missing from the 44khz render. Some scientific results would be beneficial.

I have been using Reason on and off but cannot test myself as my card only goes up to 48khz/24bit.

You can never prove a negative. The point you are trying to make has often been made before... but it doesn't help anyone any further.

If you think there is an audible difference, please prove it under controlled circumstances.

Otherwise, if it seems no one (here) can yet prove they hear a difference, wouldn't it be more reasonable to assume that the difference really is inaudible than to keep assuming that it is audible, just because an ABX test can only prove positives?

If you had seen 1000 specimens of the same type of bird, and they were all black, would you assume the next one you see would also be black, or would you think it would be white?

You have a good reasoning technique - however assumption does not belong in my scientific vocabulary.

Subjective tests have more value than assumption in my book. Even with all those horrible random variables.

BUT NEVERTHELESS

I will prepare some samples and get ABXing. Unfortunately Reason is not installed and I can't find my discs, so.....

If you turn up your volume loud enough, you can always hear some noise. It would be more interesting to use some extremely dynamic samples, and play them in a silent environment at the loudest possible volume for music listening. This would simulate the "worst case" situation.

Studying the advantage of high sample rates is extremely difficult.

First, one needs "super tweeters" in order to play ultrasonic content. Few speakers are capable of playing back sounds above 20 kHz.

Then, in the last version of his paper, David Griesinger reports that intermodulation distortion at high frequencies, which can make a "counter-difference" between 44100 Hz and 96000+ Hz (96 kHz sounding worse), occurs mainly in amplifiers rather than in speakers. Even with super tweeters, an amplifier may then produce distortion when fed with ultrasonic content.

On the other hand, there is evidence that ultrasonic intermodulation occurs in air: http://www.atcsd.com/hss.html

However, it occurs at high sound pressure, around 130 dB for example. Ultrasonic content in musical instruments is below 60 dB. Moreover, in order to produce audible ultrasonic intermodulation, focused sonic beams are used, while a musical instrument radiates sonic energy in all directions. It seems that ultrasonic intermodulation in air should be inaudible, but I don't know of any studies on this.

The first one shows positive results, but gives absolutely no details on how the statistical confidence has been evaluated. The result is even considered minor, since it was not the goal of the experiment, which focuses on electroencephalograms. The second link above is a similar experiment, which failed.

All these tests deal with files undergoing digital filtering near 22 kHz, and 16-bit rounding or dithering. One can argue that 44100 Hz and 16 bits are enough to achieve transparency on decent hardware, and that the above tests use samples suffering from bad recording or processing. But I listened to the samples of the first comparison in the second link, and could not hear any difference. The guy who got 8/8 has better ears than me.

So we can wonder if raising the resolution of the digital format could improve the sound quality in these cases. I cannot test this hypothesis because my listening abilities are not good enough to tell the difference between two CD players.
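Since statistical confidence keeps coming up, here is a small Python sketch of how an ABX score such as 8/8 is usually judged. It assumes each trial is an independent 50/50 guess under the "no audible difference" hypothesis and computes the one-sided binomial probability of scoring at least that well by luck; the function name is made up, and `math.comb` needs Python 3.8 or later.

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by pure guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

for correct, trials in ((8, 8), (12, 16), (9, 16)):
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

A small p only says that guessing is an unlikely explanation for that score. A large p, like a result barely above chance, does not prove the difference is inaudible; it just fails to show that it is audible.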

Well I agree with what Tristan said – For example on an old Led Zep, or to pick a rarer example Fleetwood Mac – Analogue Mix from 30 years ago, the dynamics were used in a more classical style.

Of course since many people listen in the car these days, or via the radio – most producers compress the music so this hasn’t much bearing on newer recordings.

By a classical style – I mean the overall volume level is in the lower regions of the dynamic range, saving the higher volumes for really powerful moments – Which is one of the benefits you get from Vinyl as well when compared to CD (or so enthusiasts would have you know). This would result in much less accuracy in the reproduction on 16/44 than 24/96 can afford.

So I think as long as the original mixdown is used and recorded direct to 24/96 – you might be able to ABX a quieter moment of a classic Fleetwood Mac record and have reasonable results. It’s a fair comment – scientifically.

HOWEVER

I tried that ABX from above and got only fractionally above complete chance. The results weren’t even worth noting down. So I humbly concede that in practical applications – in 2006 – the difference between 24/96 and 16/44 is scientifically negligible, even if provable.

I didn’t believe it – I didn’t want to - but I couldn’t tell the difference.

Regards

Jon

Understood, but consider a couple of things. The atmosphere itself puts something like 6dB SPL white noise at the eardrum.

16 bits up from that is 102dB.

Now, how often do you listen to peaks above 102dB?

Note, we have not even discussed, yet, room noise, hearing loss, etc. So that explains why, when it's done right, 16 bits shouldn't be too problematic. I do expect one might be able to design a signal that, in a quiet area, caused a problem. I wonder, however, if the average good loudspeaker or headphone could actually reproduce it with anything approaching "fidelity".
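The 102 dB figure is just the noise floor plus roughly 6 dB for each bit. A short Python sketch of that arithmetic, taking the 6 dB SPL floor quoted above as given; the function name and the 24-bit comparison are added only for illustration.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB per bit

def peak_spl_db(noise_floor_spl_db, bits):
    """Peak level (dB SPL) if the quantization floor sits at the given noise floor."""
    return noise_floor_spl_db + bits * DB_PER_BIT

peak_16 = peak_spl_db(6.0, 16)
peak_24 = peak_spl_db(6.0, 24)

print(f"16 bits above a 6 dB SPL floor: {peak_16:.1f} dB SPL")
print(f"24 bits above a 6 dB SPL floor: {peak_24:.1f} dB SPL")
```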

Now, to 44.1 vs. 96. Something you might try is to create some "dummy" data that might distinguish on very contrived signals. I can't dismiss that outright, but I'd suggest that you try some broadband stimuli created at 96, and then the same downsampled to 44.1. The bit depth doesn't matter for this exercise.

Said stimuli ought to be something with both tonal (i.e. sinusoidal) components and peaky components (for instance a Gaussian pulse centered at 15 kHz that's down to -60 dB at 30 kHz and DC), or something like that. Perhaps a center frequency for which the aliasing for the 44.1 case would be obnoxious.

WHEN COMPARING digital audio to digital photography, there is nothing wrong with matching:

• bit depth <=> color depth, and
• sample frequency <=> resolution.

With a higher bit depth, you can hear the tiny shades of sound with more accuracy, as a result of more variation. Zoomed into the sound of, say, a cricket, instead of hearing just a soft and distant '' c r i c k e t '', you can distinguish '' c r I c k ê t ''. More subtlety.

But seeing the difference between light brown and very slightly lighter brown is not easy. You would have to use great concentration and training to do so, if capable at all.

You look at a digital photo of a piece of wood.

1) The smallest details are seen because you have a high resolution image (compare: the sample frequency). It's sharp.

2) Now, the colors. If you would use software to increase the contrast, the patterns of wood grains become more obvious. This would be comparable to turning up the volume when listening to a quiet, subtle sound.

The next argument is clearly:

Yes, all fine, but when listening to a music recording I cannot constantly change the volume.

Or --

There is just a limit to my hearing. I -can- hear that a 4 bit recording, with a mere 2^4=16 possible values between loud and silent, is -not- enough for hearing soft, subtle background sounds. But, my brain and ears are limited and need no more than 2^16=65536 possible values between loud and silent.

. . . Note: I am not even sure if 16 bit implies 2^16 values between +1 and -1, or between +1 and 0. Probably the former . . .
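On that note: in ordinary signed 16-bit PCM (as on CD), the 2^16 codes cover the whole swing from negative to positive full scale, so it is the former. A small Python sketch, where the function names and the simple round-and-clip quantizer are only illustrative assumptions:

```python
def pcm_code_range(bits):
    """Lowest and highest integer codes of signed (two's complement) PCM."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return lowest, highest

def quantize(x, bits):
    """Map a sample in the range -1.0 .. +1.0 to the nearest integer code, clipping at the ends."""
    lowest, highest = pcm_code_range(bits)
    code = round(x * 2 ** (bits - 1))
    return max(lowest, min(highest, code))

for bits in (4, 16):
    lowest, highest = pcm_code_range(bits)
    print(f"{bits:2d} bits: codes {lowest} .. {highest} ({highest - lowest + 1} values)")

print(quantize(-1.0, 16), quantize(0.0, 16), quantize(1.0, 16))
```

So a 4-bit recording has 16 codes from -8 to 7, and a 16-bit one has 65536 codes from -32768 to 32767, with zero in the middle.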

By typing the above paragraphs, I have not succeeded at arguing that one's mind does appreciate more accuracy in amplitude. Nor the hypothesis that a human mind is fast enough (read: fine enough) to prefer a higher sample frequency.

Besides that I am still convinced I personally can hear (feel?) the difference between a normal audio CD and one of the upcoming higher-quality digital recordings, the only comment I can make here is the following:

It does not suffice to blindfold people and ask if they notice any difference between two recordings of the same Fleetwood Mac track, one in CD-audio quality and one in higher quality (with good speakers).

Rather, I would choose to count how many people start to dance in a club. Or what time they go home. How much they enjoy it. ( <-- difficult to measure of course, but the first two are not, if done on a large scale )

A final note. After a digital signal is analogue again, it does not reach your ears as 'blocky' as the digital waveform. The cone of the speaker that produces the sound has some weight and thus will round the sound. I do not know enough about the kinetics and materials (and..., and...) to say anything more about this. Just that it is not to be overlooked.

Tristan Laurillard, Vancouver, Canada

PS. Not everything I wrote here is verified knowledge. Just my own understanding of these matters.