As a scientist (biophysics) and an audiophile, I have my feet in both worlds, and have thus enjoyed this discussion. Here are some thoughts:

I. Open Access.

Several have written it’s unfortunate that this paper, which is of general public interest, is not available to those without an AES subscription (that includes myself). Agreed; and note that the landscape for scientific publication has been changing: researchers are increasingly realizing the value of “open access” papers – ones available freely, without a subscription. Some scientific journals have gone entirely open access, while others (for an additional fee) offer the option of open access publication of your paper. Let me suggest that, if AES doesn’t offer this option, it should, and that those publishing papers of broad interest should endeavor (if financially possible) to avail themselves of this.

II. ABX testing.

I assume from the discussion that this paper makes use of sequential ABX testing (by “sequential” I mean: present A, then B, then X, in sequence). I’d like to see basic research done into sequential ABX testing itself. In particular, sequential ABX testing (double-blinding will be assumed throughout this discussion) is highly regarded for assessing perceptual discrimination, because (with proper statistical analysis) it eliminates false positives – subjects can’t “pass” (i.e., achieve successful discrimination in) a sequential ABX test unless they truly can reliably distinguish A from B. But what about the converse? Achieving accurate discrimination in sequential ABX testing is hard, because it requires accurate memory of both A and B when judging X. Therefore I’d like to offer the hypothesis that subjects can “fail” sequential ABX testing even when they can successfully discriminate between A and B, and suggest a way of testing this. Essentially, one would use simultaneous ABX testing to eliminate the memory requirement, and then compare the results to those for sequential ABX testing. I can’t think of how you’d do simultaneous ABX testing with hearing, but it could certainly be done with vision (using color tiles). The experiment would be divided into three phases:

1) Find the smallest color difference that can be reliably distinguished during simultaneous ABX testing (present all three tiles – A, B, and X – simultaneously, so that the subjects can line them up next to each other for comparison).

2) Repeat the experiment, using that same color difference, with sequential ABX testing.

3) If the subjects fail phase 2 (thus confirming my hypothesis), find the smallest color difference that can be reliably distinguished using sequential ABX testing, and compare to the results of phase 1.

If it is found that subjects indeed fail to reliably distinguish colors when presented sequentially that they can reliably distinguish when presented simultaneously, this will suggest that sequential ABX testing may not be a good method for assessing our perceptual limits (i.e., for determining transparency) -- and not just for vision, but possibly for hearing as well.

III. High-resolution audio and the Shannon sampling theorem.

Those who are dismissive of high-res audio based on theory alone typically cite the Shannon sampling theorem (typically misattributed to Nyquist), noting correctly that we can’t hear over 20 kHz (if that), and that we only need >40 kHz sampling to accurately reproduce this. But 2 x max. frequency is not the theorem’s only requirement. It’s my understanding that it also assumes an infinite signal, perfect sampling, and perfect interpolation. I’ve never seen any of these assumptions discussed in this context, so I’d like to ask how much practical effect these requirements would have on Redbook (16 bit/44.1 kHz) vs. high-res conversion:

1) Infinite signal. I assume we can effectively satisfy this with signal length >> 1/frequency, and that this is thus a non-issue.

2) Perfect sampling. Clearly, sampling need not be perfect, but simply close enough to perfect to be transparent in amplitude and time. Timing errors lead to jitter (right?). So (and this is an engineering question): what’s the relationship between sampling rate and how easy it is to eliminate audible jitter errors?

3) Perfect interpolation. Another engineering question: Naively, unless implementing near-perfect (i.e., transparent) interpolation is trivial, I would think it would be easier to achieve transparent interpolation with a higher sampling rate, because the points are more closely spaced. Is transparent interpolation so easily achievable that the sampling rate has no practical effect?

IV. The effect of mastering.

Many have mentioned that the reason high-res disks do indeed sound better than CDs of the same performance is that they are mastered differently – as labors of love, and without the usual commercial pressures to alter the sound. Given this, I think it would be a public service if someone could produce a Blu-Ray disk corresponding to the songs tested in this study, containing both the high-res and Redbook versions of each, and make it available for sale. That way people could easily experiment for themselves.

V. General thoughts.

I think the reason for the continued controversy about digital audio performance is that we don’t completely understand the biophysics of human hearing (which is why it continues to be an active area of research). If we did, we would know, a priori, what constitutes a complete specification set sufficient to determine transparency, and thus could engineer transparent electronic gear (I say electronic because I am excluding transducers) without listening to it. To the best of my understanding, this is not yet the case, since the errors that our auditory system is capable of detecting can be extraordinarily subtle, and what would constitute a complete set of scalar specifications sufficient to ensure transparency thus remains an open question.

This post has been edited by greynol: Jun 14 2013, 14:02

Reason for edit: Added link to original discussion from which this one was split for being off-topic.

I don't know what you mean by "sequential" ABX testing. Any ABX test that I am aware of allows the subject to listen to any of the three samples, in any order, as many times as he/she pleases until ready to make a choice.

Also, you cannot "eliminate" false positives just as you cannot eliminate false negatives. You can only make the null hypothesis statistically improbable.
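To make "statistically improbable" concrete: under the null hypothesis of guessing, each ABX trial is a fair coin flip, so the significance of a score is just a one-sided binomial tail. A minimal sketch (the 12-of-16 score is an illustrative number, not from the paper):

```python
from math import comb

def abx_p_value(correct, trials):
    """One-sided p-value: chance of scoring >= `correct` out of
    `trials` ABX trials by pure guessing (p = 0.5 per trial)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

# A subject scoring 12 of 16 is significant at the usual 5% level:
print(round(abx_p_value(12, 16), 4))  # 0.0384
```

This is exactly why "eliminate" really means "reduce below a chosen alpha": even a perfect run of 16/16 has a guessing probability of 1/65536, small but never zero.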

I don't know what you mean by "sequential" ABX testing. Any ABX test that I am aware of allows the subject to listen to any of the three samples, in any order, as many times as he/she pleases until ready to make a choice.

Also, you cannot "eliminate" false positives just as you cannot eliminate false negatives. You can only make the null hypothesis statistically improbable.

The point is that it's sequential regardless of the order, and thus relies on memory -- you can't listen to all three tracks simultaneously. If you reread that section of my post carefully I think you'll see what I'm getting at.

And yes, I could (and probably should) have written that sentence less casually and said "reduce the probability of a false positive (where a false positive is defined in statistics as 'the incorrect rejection of a true null hypothesis'*) to a very low level" instead of "eliminate," but the difference between those two statements has entirely no bearing on any of the arguments I've made.

The point is that it's sequential regardless of the order, and thus relies on memory

Memory is always required in order to assess audible differences; not just for testing.

Except comparing two mono signals simultaneously, one fed to each ear via headphones. A niche pastime, but possible.

...but we have plenty of ABX threads that theorist1 could read, where it's explained in painful detail how all the "problems" with ABX testing are either easily avoided, or also apply to any kind of listening test that involves humans listening.

Tests that don't involve humans listening may avoid such problems, but also have less relevance.

No need to listen to each sample from beginning to end (like with an orchestra audition). AFAIK in audio ABX tests the subject is allowed to switch freely between A, B and X as fast and as many times (s)he wants. Since our auditory memory is short, I think this is even a requirement, although I can imagine situations where it can't work, like different tempi.

QUOTE (2Bdecided @ Jun 14 2013, 10:38)

Except comparing two mono signals simultaneously, one fed to each ear via headphones.

I've thought about that as well. It's e.g. very handy for time-aligning two signals. But there's a risk of false positives: just imagine a 180° phase difference. When heard in isolation this is probably inaudible, but when compared directly with the original, it's evident.
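That 180° case is easy to make concrete: a polarity-inverted signal has an identical magnitude spectrum (so it is plausible that it sounds the same in isolation), yet it differs maximally sample-by-sample, which a direct comparison exposes. A small numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)   # stand-in for any audio signal
y = -x                          # 180-degree (polarity) inversion

# Identical magnitude spectrum => same "sound" in isolation...
same_spectrum = np.allclose(np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y)))
# ...yet maximally different sample-by-sample:
max_diff = np.max(np.abs(x - y))
print(same_spectrum, bool(max_diff == 2 * np.max(np.abs(x))))
```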

Achieving accurate discrimination in sequential ABX testing is hard, because it requires accurate memory of both A and B when judging X. Therefore I’d like to offer the hypothesis that subjects can “fail” sequential ABX testing even when they can successfully discriminate between A and B, and suggest a way of testing this.

Are you making the common error here of making a hypothesis and then asserting that your hypothesis must be disproven else ABX testing is flawed? Where is the evidence that someone can "successfully discriminate between A and B" but fail to ABX it? Because they "know" they heard a difference in sighted testing?

My recollection is that studies have shown audio memory starts to lose information at something on the order of 200ms, so ideally ABX tests try to keep switching below this (while not creating audible artifacts in the process). Search for "fast switching" - it has been discussed here.

QUOTE

3) Perfect interpolation. Another engineering question: Naively, unless implementing near-perfect (i.e., transparent) interpolation is trivial, I would think it would be easier to achieve transparent interpolation with a higher sampling rate, because the points are more closely spaced. Is transparent interpolation so easily achievable that the sampling rate has no practical effect?

Moving the sampling rate higher effectively shifts the interpolation errors to a higher frequency. This could indeed be beneficial if they are audible at a lower sampling rate. At HA, the TOS lay out certain requirements for discussing audibility.

QUOTE

Given this, I think it would be a public service if someone could produce a Blu-Ray disk corresponding to the songs tested in this study, containing both the high-res and Redbook versions of each, and make it available for sale. That way people could easily experiment for themselves.

The standard test for this sort of thing only requires a higher res source and SW to convert it to a lower res and back. Various SW is readily available that can do this. IOW, people can indeed easily experiment for themselves.

I will try to address the theory and reasoning of your audio related questions:

QUOTE (theorist1 @ Jun 14 2013, 05:32)

1) Infinite signal. I assume we can effectively satisfy this with signal length >> 1/frequency, and that this is thus a non-issue.

I don't recall if the sampling theorem talks about an infinite signal, but ... why would it be easier to sample a 2-hour video than a 3-minute song? On the other hand, when doing a transform to the frequency domain, the signal is assumed to repeat itself infinitely on each side.
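That "assumed to repeat itself infinitely" point has a concrete, checkable consequence: a finite analysis window whose length is not a whole number of periods leaks energy across FFT bins. An illustrative numpy sketch (the window and frequencies are arbitrary choices):

```python
import numpy as np

fs, n = 44100, 4096
t = np.arange(n) / fs

# Exactly 10 cycles fit in the window: energy lands in a single bin.
spec_exact = np.abs(np.fft.rfft(np.sin(2 * np.pi * (10 * fs / n) * t)))

# 10.5 cycles in the window: energy leaks into neighbouring bins.
spec_leaky = np.abs(np.fft.rfft(np.sin(2 * np.pi * (10.5 * fs / n) * t)))

def peak_energy_fraction(spec):
    """Fraction of total spectral energy captured by the largest bin."""
    return spec.max() ** 2 / np.sum(spec ** 2)

print(peak_energy_fraction(spec_exact) > 0.999,
      peak_energy_fraction(spec_leaky) < 0.9)
```

This is the spectral-leakage effect that windowing functions exist to tame; it concerns analysis, not the sample/reconstruct round trip itself.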

QUOTE (theorist1 @ Jun 14 2013, 05:32)

2) Perfect sampling. Clearly, sampling need not be perfect, but simply close enough to perfect to be transparent in amplitude and time. Timing errors lead to jitter (right?). So (and this is an engineering question): what’s the relationship between sampling rate and how easy it is to eliminate audible jitter errors?

We can be incorrect in two ways when sampling:

1) The time at which we take the sample.
2) The value we obtain for that sample.

Jitter affects 1 (concretely, it means being inexact about the moment at which each sample must be taken, and how that varies between samples), and studies have demonstrated that nowadays jitter is a non-issue. Think about it: CD = 44.1 kHz, DVD = 96 kHz, PC > 1 GHz. (A bad clock on a PC wouldn't just mean that its speed swings; it would completely break the expected response times of the internal buses and of RAM refresh.) It is much easier to get very exaggerated jitter with a vinyl disc than with digital audio.

On 2, the effect is on the noise floor of the signal. It basically adds noise (of different types depending on how the incorrect value is obtained, but no more than that).
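That noise-floor effect can be measured directly: quantizing a full-scale sine to N bits gives a signal-to-noise ratio of roughly 6.02·N + 1.76 dB. A quick numpy check (a sketch; the test frequency is an arbitrary non-integer cycle count to decorrelate the error):

```python
import numpy as np

def quantized_snr_db(bits, n=1 << 16):
    """SNR of a full-scale sine after uniform quantization to `bits` bits."""
    t = np.arange(n)
    x = np.sin(2 * np.pi * 1000.5 * t / n)   # non-integer cycle count
    step = 2.0 / (1 << bits)                 # quantizer step over [-1, 1)
    noise = np.round(x / step) * step - x
    return 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))

# Rule of thumb: SNR ~ 6.02*bits + 1.76 dB, i.e. about 98 dB for 16 bits:
print(round(quantized_snr_db(16), 1))
```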

I assume you already understand the sampling rate (the number of samples obtained per second, or its inverse, the period: how much time elapses between samples).

QUOTE (theorist1 @ Jun 14 2013, 05:32)

3) Perfect interpolation. Another engineering question: Naively, unless implementing near-perfect (i.e., transparent) interpolation is trivial, I would think it would be easier to achieve transparent interpolation with a higher sampling rate, because the points are more closely spaced. Is transparent interpolation so easily achievable that the sampling rate has no practical effect?

Interpolation only applies to resampling. What a DAC implements is a reconstruction filter. A reconstruction filter can (and usually does) oversample to allow using a less complex filter with the same results, but it has no other complication.

A perfect reconstruction filter would be a brickwall filter with its corner just below samplerate/2 (samplerate/2 itself, in fact, cannot be sampled exactly per the theorem). Such a filter is also expected on the input, so what is sampled is already not a perfect signal, due to the impossibility of a perfect filter.

That said, the inexactness of the filters does not invalidate the theorem. What it implies is that there is a band of frequencies that cannot be sampled and/or reconstructed exactly, due to the gradual slope of the filter (a decaying line instead of an abrupt cut). What lies before the filter frequency (not all frequencies, just the ones where the filter starts to act) is merely attenuated, while what lies after it becomes aliasing (a mirroring of the frequencies).

Since it's not difficult to find reconstruction filters flat to 20.5 kHz at a 44.1 kHz sampling rate, and to 22 kHz at 48 kHz, we are talking about a small imperfection.
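The "mirroring of the frequencies" is easy to demonstrate: a tone above fs/2 sampled without an anti-alias filter lands at fs − f. A small numpy check (illustrative numbers):

```python
import numpy as np

fs, n = 44100, 44100                  # one second of samples => 1 Hz bins
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 25000 * t)  # 25 kHz: above fs/2 = 22.05 kHz

spec = np.abs(np.fft.rfft(tone))
peak_hz = np.argmax(spec) * fs / n
print(peak_hz)                        # 19100.0 = 44100 - 25000
```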

... The sampling theorem specifies a band-limited signal less than f/2, and I suspect there is some theory that says that a signal that has a beginning and an end must have components greater than f/2.

A signal with a beginning and an end will contain components in addition to those of the signal. They need not exceed f/2.

The point is that it's sequential regardless of the order, and thus relies on memory

Memory is always required in order to assess audible differences; not just for testing.

Not necessarily. In principle, one could play a pristine recording, and that same recording to which very low levels of a very irritating form of audio distortion had been added, while monitoring a specific physiological attribute. With a sufficiently large N, and with the right attribute, one might find significant differences in physiological response to the pristine and non-pristine versions, even if the subjects subsequently failed to distinguish them based on ABX testing.

Now some might dismiss this as too theoretical, and if you're doing engineering or applied research it might be; but for one with a basic research mindset, these questions are intriguing.

QUOTE (drewfx @ Jun 14 2013, 08:54)

QUOTE (theorist1 @ Jun 13 2013, 23:32)

Achieving accurate discrimination in sequential ABX testing is hard, because it requires accurate memory of both A and B when judging X. Therefore I’d like to offer the hypothesis that subjects can “fail” sequential ABX testing even when they can successfully discriminate between A and B, and suggest a way of testing this.

Are you making the common error here of making a hypothesis and then asserting that your hypothesis must be disproven else ABX testing is flawed?

No, I've done the opposite: I've made a hypothesis and then asserted that, if my hypothesis is confirmed, it suggests that "sequential ABX testing may not be a good method for assessing our perceptual limits (i.e., for determining transparency) -- and not just for vision, but possibly for hearing as well."

QUOTE (drewfx @ Jun 14 2013, 08:54)

Where is the evidence that someone can "successfully discriminate between A and B" but fail to ABX it? Because they "know" they heard a difference in sighted testing?

I never said that evidence exists -- I made a plausibility argument, and proposed an experiment that would test this.

QUOTE (drewfx @ Jun 14 2013, 08:54)

QUOTE (theorist1 @ Jun 13 2013, 23:32)

3) Perfect interpolation. Another engineering question: Naively, unless implementing near-perfect (i.e., transparent) interpolation is trivial, I would think it would be easier to achieve transparent interpolation with a higher sampling rate, because the points are more closely spaced. Is transparent interpolation so easily achievable that the sampling rate has no practical effect?

Moving the sampling rate higher effectively shifts the interpolation errors to a higher frequency. This could indeed be beneficial if they are audible at a lower sampling rate. At HA, the TOS lay out certain requirements for discussing audibility.

Interesting.

QUOTE (drewfx @ Jun 14 2013, 08:54)

QUOTE (theorist1 @ Jun 13 2013, 23:32)

Given this, I think it would be a public service if someone could produce a Blu-Ray disk corresponding to the songs tested in this study, containing both the high-res and Redbook versions of each, and make it available for sale. That way people could easily experiment for themselves.

The standard test for this sort of thing only requires a higher res source and SW to convert it to a lower res and back. Various SW is readily available that can do this. IOW, people can indeed easily experiment for themselves.

Yes, but it would broaden the number of people who would participate in such a test, which I think has general public-education value, since there are consumers who would be happy to buy the Blu-Ray and do the test, but who don't want to mess with downloading and learning how to use conversion software. Also, I understand down-conversion is not trivial, so the Blu-Ray eliminates the possibility that some will hear differences because they purchased flawed software or used it incorrectly. In addition, there are many consumers who own high-res-capable Blu-Ray players but who don't have high-res-capable outboard DACs, or players that can act as such (and one or the other is required for the approach that you suggest).

I will try to address the theory and reasoning of your audio related questions:

QUOTE (theorist1 @ Jun 14 2013, 05:32)

1) Infinite signal. I assume we can effectively satisfy this with signal length >> 1/frequency, and that this is thus a non-issue.

I don't recall if the sampling theorem talks about an infinite signal, but ... why would it be easier to sample a 2-hour video than a 3-minute song? On the other hand, when doing a transform to the frequency domain, the signal is assumed to repeat itself infinitely on each side.

This is entirely beyond the scope of my expertise, but the short response to your question about 2 hours vs. 3 minutes is that, based on the reading I just did, it appears not to be about length per se, but about the extent to which a signal is time-varying (and essentially all music signals are time-varying). This nicely written Wikipedia article helps (http://en.wikipedia.org/wiki/Time–frequency_analysis):

"The practical motivation for time–frequency analysis is that classical Fourier analysis assumes that signals are infinite in time or periodic, while many signals in practice are of short duration, and change substantially over their duration. For example, traditional musical instruments do not produce infinite duration sinusoids, but instead begin with an attack, then gradually decay. This is poorly represented by traditional methods, which motivates time–frequency analysis."

If you read further you will see that the various formulations used to obtain a time-frequency distribution function each have strengths and weaknesses--none is perfect. And if you read further still, the article seems to indicate that doing a Shannon reconstruction on a time-varying audio signal also requires implementing time-frequency analysis, which means that even with perfect sampling and a perfectly bandwidth-limited signal, we can't perfectly reconstruct the analog audio signal. So this would motivate the question: what is the effect of sampling rate and bit depth on the nature of the errors introduced by this time-frequency analysis?

QUOTE ([JAZ] @ Jun 14 2013, 09:32)

QUOTE (theorist1 @ Jun 14 2013, 05:32)

3) Perfect interpolation. Another engineering question: Naively, unless implementing near-perfect (i.e., transparent) interpolation is trivial, I would think it would be easier to achieve transparent interpolation with a higher sampling rate, because the points are more closely spaced. Is transparent interpolation so easily achievable that the sampling rate has no practical effect?

Interpolation only applies to resampling. What a DAC implements is a reconstruction filter. A reconstruction filter can (and usually does) oversample to allow using a less complex filter with the same results, but it has no other complication.

A perfect reconstruction filter would be a brickwall filter with its corner just below samplerate/2 (samplerate/2 itself, in fact, cannot be sampled exactly per the theorem). Such a filter is also expected on the input, so what is sampled is already not a perfect signal, due to the impossibility of a perfect filter.

That said, the inexactness of the filters does not invalidate the theorem. What it implies is that there is a band of frequencies that cannot be sampled and/or reconstructed exactly, due to the gradual slope of the filter (a decaying line instead of an abrupt cut). What lies before the filter frequency (not all frequencies, just the ones where the filter starts to act) is merely attenuated, while what lies after it becomes aliasing (a mirroring of the frequencies).

Since it's not difficult to find reconstruction filters flat to 20.5 kHz at a 44.1 kHz sampling rate, and to 22 kHz at 48 kHz, we are talking about a small imperfection.

Sorry, I'm afraid I don't understand your response here. It appears to be addressing bandwidth limitations and aliasing, which is a separate issue from interpolation. Also, regarding your statement that "interpolation only applies on resampling": From what I've read, the "perfect interpolation" requirement of the Shannon theorem is indeed referring to D->A reconstruction. From http://en.wikipedia.org/wiki/Nyquist–...mpling_theorem:

"Methods that reconstruct a continuous function from the x(nT) sequence are called interpolation. As will be shown below, the mathematically ideal way to reconstruct x(t) involves the use of sinc functions.... Each sample in the sequence is replaced by a sinc function centered on the time axis at the original location of the sample (nT), and the amplitude of the sinc function is scaled to the sample value, x(nT). Then all the sinc functions are summed into a continuous function. A mathematically equivalent method is to convolve one sinc function with a series of Dirac delta pulses, weighted by the sample values. Neither method is numerically practical. Instead, some type of approximation of the sinc functions, finite in length, has to be used. The imperfections attributable to the approximation are known as interpolation error. Practical digital-to-analog converters produce neither scaled and delayed sinc functions nor ideal Dirac pulses. Instead they produce a piecewise-constant sequence of scaled and delayed rectangular pulses, usually followed by a "shaping filter" to clean up spurious high-frequency content."

So I suppose the bottom line is that, even with perfect sampling and a perfectly bandwidth-limited signal, the Shannon theorem (and the attendant engineering practicalities) tells us there are unavoidable distortions introduced by both the time-varying nature of the music signal (which necessitates the implementation of time-frequency analysis) and by interpolation errors. drewfx tells us that higher sampling rates shift the latter to higher frequencies. So that leaves me wondering what the effects of sampling rate and bit depth would be on the former.
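The ideal sinc interpolation in the quoted passage can be sketched in a few lines: for a band-limited tone, summing scaled sinc functions recovers the signal to high accuracy even between sample instants. A toy demonstration with a finite (truncated) run of samples, not a practical DAC:

```python
import numpy as np

fs, f0 = 44100, 1000.0                       # sample rate; tone well below fs/2
n = np.arange(-2000, 2000)                   # a finite run of sample indices
samples = np.sin(2 * np.pi * f0 * n / fs)

def sinc_reconstruct(t_sec):
    """Whittaker-Shannon interpolation: a sum of scaled sinc functions."""
    return float(np.sum(samples * np.sinc(fs * t_sec - n)))

# Evaluate halfway between two sample instants and compare to the true value:
t_mid = 10.5 / fs
err = abs(sinc_reconstruct(t_mid) - np.sin(2 * np.pi * f0 * t_mid))
print(err < 1e-2)   # truncating the sinc sum leaves only a tiny error
```

The residual error here comes purely from truncating the infinite sum, which is exactly the "interpolation error" the Wikipedia passage describes.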

If it is found that subjects indeed fail to reliably distinguish colors when presented sequentially that they can reliably distinguish when presented simultaneously, this will suggest that sequential ABX testing may not be a good method for assessing our perceptual limits (i.e., for determining transparency) -- and not just for vision, but possibly for hearing as well.

This is, I fear, preposterous. One can put colors next to each other, because vision is a spatial sense. Sound is a time-domain sense that is perceived in the frequency domain by the ear. The proper analogy to colors next to each other is SOUNDS ADJACENT IN TIME, which is exactly what ABX testing does.

There is rather some research on discrimination with time lapse, and it is quite true that a break between A/X or B/X (or even A/B) will disrupt the subject, so time proximity WITHOUT glitches is absolutely requisite. This is because partial loudnesses, rather than time-domain waveforms, are what must be recalled, and the first level of memory for such is under 200 milliseconds. So proximate IN TIME presentation is the relevant, germane thing to do.

Since people do perform with such testing down to physical limits, I don't think there is a great deal of room left there for problems.

QUOTE

III. High-resolution audio and the Shannon sampling theorem.

1) Infinite signal. I assume we can effectively satisfy this with signal length >> 1/frequency, and that this is thus a non-issue.

Yep.

QUOTE

2) Perfect sampling. Clearly, sampling need not be perfect, but simply close enough to perfect to be transparent in amplitude and time. Timing errors lead to jitter (right?). So (and this is an engineering question): what’s the relationship between sampling rate and how easy it is to eliminate audible jitter errors?

There's an oldish AES paper that addresses this very nicely. It's not just the amount of jitter, it's also the bandwidth of the jitter, i.e. the spectrum of the deviation from the mean sampling rate, that matters. It's much like FM modulation, only where all sidebands modulate back down into the baseband. It's not trivial, but it's entirely possible to make sure jitter isn't an issue, and most hardware has (finally!) killed this problem dead.
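The FM-like sideband picture can be simulated: sample a pure tone at instants perturbed by a sinusoidal timing error, and the spectrum grows sidebands at f0 ± f_jitter. An illustrative numpy sketch (the 10 kHz / 1 kHz / 5 ns numbers are arbitrary choices, not from the paper):

```python
import numpy as np

fs, n = 44100, 44100                     # one second of samples => 1 Hz bins
f0, f_jit, jit_amp = 10000, 1000, 5e-9   # 10 kHz tone; 1 kHz, 5 ns jitter

t_ideal = np.arange(n) / fs
t_jittered = t_ideal + jit_amp * np.sin(2 * np.pi * f_jit * t_ideal)

spec = np.abs(np.fft.rfft(np.sin(2 * np.pi * f0 * t_jittered)))
spec_db = 20 * np.log10(spec / spec.max())

# Sidebands appear at f0 +/- f_jit, here around -76 dB below the carrier:
print(round(float(spec_db[f0 + f_jit])), round(float(spec_db[f0 - f_jit])))
```

Note the sideband level scales with the tone frequency as well as the jitter amplitude, which is the "bandwidth of the jitter matters" point in narrowband-FM form.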

QUOTE

3) Perfect interpolation. Another engineering question: Naively, unless implementing near-perfect (i.e., transparent) interpolation is trivial, I would think it would be easier to achieve transparent interpolation with a higher sampling rate, because the points are more closely spaced. Is transparent interpolation so easily achievable that the sampling rate has no practical effect?

This is strictly a question of filtering. Filter design for this problem is accomplished, and in fact most 44/16 DACs use a very high sampling rate with a low-bit-count DAC and a lot of digital filtering for this reason. While I have seen some very bad filters (um, in computer audio chains, for instance), the "how" here is well known; it's a question of good engineering practice, and it goes all the way back to Crochiere and Rabiner.

QUOTE

IV. The effect of mastering.

Many have mentioned that the reason high-res disks do indeed sound better than CDs of the same performance is that they are mastered differently – as labors of love, and without the usual commercial pressures to alter the sound. Given this, I think it would be a public service if someone could produce a Blu-Ray disk corresponding to the songs tested in this study, containing both the high-res and Redbook versions of each, and make it available for sale. That way people could easily experiment for themselves.

I've seen some recordings that are two-layer, where the high-rez layer wasn't compressed to (*&(*& and the redbook layer was. 'nuff said?

QUOTE

V. General thoughts.

I think the reason for the continued controversy about digital audio performance is that we don’t completely understand the biophysics of human hearing (which is why it continues to be an active area of research). If we did, we would know, a priori, what constitutes a complete specification set sufficient to determine transparency, and thus could engineer transparent electronic gear (I say electronic because I am excluding transducers) without listening to it. To the best of my understanding, this is not yet the case, since the errors that our auditory system is capable of detecting can be extraordinarily subtle, and what would constitute a complete set of scalar specifications sufficient to ensure transparency thus remains an open question.

Actually we understand the sensitivities of the hearing apparatus quite well, but they are not easily mapped into simplistic things like frequency response or signal to noise ratio.

I gave a plenary talk to InfoComm this Monday on that very subject. It's much like the Heyser Lecture I gave at last fall's AES. You can find a link to it at www.aes.org/sections/pnw if you want to know more.

So this would motivate the question: what is the effect of sampling rate and bit depth on the nature of the errors introduced by this time-frequency analysis?

The majority of filters work in the time domain. The Fourier transform (and so, the frequency domain) is not necessarily involved. As I wrote, I knew that the Fourier transform expects an infinite and continuous signal (that's why windowing becomes necessary, but that's another subject).

But since you ask: the sampling rate affects the bandwidth that a specific transform will show. You can do an FFT of 1024 samples on an 11 kHz signal and on a 44 kHz signal. Both will return 512 bands (and 512 phases). The FFT window of the former spans about 100 milliseconds and its highest band is half the samplerate (5.5 kHz); on the latter it's about 25 ms and the highest band is half the samplerate (22 kHz). And I already wrote about bit depth: you would see its effect in the spectrum's floor (going up or down as bits decrease or increase).
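Those bin counts and spans are easy to verify numerically (taking the nominal "11 kHz" and "44 kHz" as the standard 11025 and 44100 Hz rates):

```python
import numpy as np

for fs in (11025, 44100):                   # nominal "11 kHz" and "44 kHz"
    n = 1024
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)  # centre frequency of each band
    window_ms = 1000.0 * n / fs             # duration covered by the window
    print(fs, len(freqs) - 1, freqs[-1], round(window_ms, 1))
```

Both rates yield 512 bands above DC, topping out at fs/2, with windows of about 93 ms and 23 ms respectively.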

QUOTE ([JAZ] @ Jun 14 2013, 09:32)

Also, regarding your statement that "interpolation only applies on resampling": From what I've read, the "perfect interpolation" requirement of the Shannon theorem is indeed referring to D->A reconstruction.

I sometimes draw distinctions in wording where there isn't really a difference. I meant interpolation as the process of generating a different signal out of the original one (thinking only in the digital domain). You quoted interpolation as the process of making a continuous signal out of a sampled signal. Sure, that's also interpolation, but that's what the DAC does, and it mostly amounts to the filter that I talked about. And yes, nowadays this is done beyond the threshold of audibility.

There is a sampling/interpolation/hearing issue that can only be resolved by studying the ear's response, namely to what extent the hair cells behave so much like idealized strings that the sine functions are the appropriate basis. Such studies may, for all I know, have been carried out explicitly; otherwise, a layman's gut feeling is that the effect is at most worth a slight miscalibration or margin of conservatism. I would be grossly surprised if this could possibly increase the 20 kHz figure by the ten percent required to break through the CD limit.

Anyway, the argument is that the canonical choice of sine functions is due to the wave equation, deduced for a 'spherical cow in vacuum' idealized string, which the hair cells are not. The periodic function that, around the 20 kHz mark, is "least painful given the hearing threshold" is likely not exactly the sine, but likely so close to it that it is nothing to worry about for the purpose of the "20" figure.

Anyway, the argument is that the canonical choice of sine functions is due to the wave equation, deduced for a 'spherical cow in vacuum' theoretical ideal string, which the hair cells are not. The periodic function that, around the 20 kHz mark, is "least painful given the hearing threshold" is likely not exactly the sine, but likely so close that it is nothing to worry about for the purpose of the "20" figure.

I have no idea what you're talking about here. There is no "canonical" choice for a transform, a Discrete Fourier Transform uses both sines and cosines, and this has nothing to do with an ideal string at all.

The basis vectors for an FFT have to do with mathematics, not string vibrations.
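To make that concrete: the DFT's basis vectors are the complex exponentials e^(-2*pi*i*k*n/N), whose real and imaginary parts are the cosines and sines, and they are used because they form an orthogonal set, not because of any physical string. A quick numpy check of that orthogonality:

```python
import numpy as np

N = 8
n = np.arange(N)
# DFT basis: row k is the complex exponential exp(-2j*pi*k*n/N)
B = np.exp(-2j * np.pi * np.outer(n, n) / N)

# Orthogonality: the Gram matrix B @ B^H equals N times the identity,
# which is what makes the transform exactly invertible.
gram = B @ B.conj().T
print(np.allclose(gram, N * np.eye(N)))
```

The same orthogonality is what `np.fft.fft` exploits; the string-vibration story is physics motivation, not a mathematical requirement.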

The hair cells, which work somewhat differently, have nothing to do with "least painful given the hearing threshold" either, as far as I can tell.

The clock responsible for the jitter level is an independent hardware device (provided by the soundcard, or by the motherboard's USB controller in USB adaptive mode), independent of the CPU clock. The CPU cannot influence the final clock directly (except for some hardly predictable induced power-supply noise due to CPU consumption fluctuations).

So this would motivate the question: what is the effect of sampling rate and bit depth on the nature of the errors introduced by this time-frequency analysis?

So that we're crystal clear, time-frequency analysis does not apply to sampling and subsequent reconstruction.

Assuming your signal is band-limited below half the sample rate, the only errors are due to quantization, imperfect clocking, and an imperfect response of the reconstruction filter. These days this can be accomplished without any audible degradation fairly cheaply.
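On the quantization part: for a full-scale sine, uniform quantization to B bits gives a signal-to-noise ratio of roughly 6.02*B + 1.76 dB, which is the "spectrum bottom" moving with bit depth mentioned earlier in the thread. A small numpy check (undithered quantization; the 997 Hz test frequency is an arbitrary choice that avoids lining up neatly with the sample rate):

```python
import numpy as np

fs, f0, n = 48000, 997.0, 1 << 16
t = np.arange(n) / fs
x = np.sin(2 * np.pi * f0 * t)            # full-scale test sine

for bits in (8, 16):
    step = 2.0 ** (1 - bits)              # quantizer step for a [-1, 1] range
    xq = np.round(x / step) * step        # uniform (undithered) quantization
    noise = xq - x                        # quantization error signal
    snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean(noise ** 2))
    print(f"{bits}-bit: measured SNR {snr_db:.1f} dB, "
          f"theory {6.02 * bits + 1.76:.1f} dB")
```

Each extra bit halves the quantizer step and thus buys about 6 dB of noise floor, which is exactly the "bottom going up or down" with bit depth.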


Scientific curiosity aside: if it were found that some audiophile remedy (96 kHz sampling, fancy cables, whatnot) produced audible differences when compared on time-scales shorter than 200 ms, or when compared simultaneously via 2-ch headphones, what would the practical consequence be?

For me to enjoy expensive speakers over inexpensive ones, there must be some long-term memory effect, or at least some change in mood, happiness, etc. If not, then my enjoyment (aside from "owner's happiness" etc.) would be exactly the same post-purchase as pre-purchase of those expensive boxes, at least 200 ms after installing the new ones. What is the fun in that?

If you want to explore new (?) subjective testing areas, I would vote for long-term testing of (sub)conscious "happiness" while blinded to make, model, etc., when installing audio component A over audio component B. That would be the best indicator for questions such as "is it worth it?" and "should I spend my money on cables or on flowers for my wife?", assuming that people don't really like paying for brand, looks, marketing, etc.

Do speakers with a smooth and wide frequency response/polar pattern improve sleep, reduce depression, make people argue less with their spouses, reduce suicide rates, and so on? My guess is that they do, for a small selection of people, and perhaps to such a degree that we will never be able to realistically measure it.

If [...] produced audible differences when compared on time-scales shorter than 200 ms, or when compared simultaneously via 2-ch headphones, what would the practical consequence be?

For me to enjoy expensive speakers over inexpensive ones, there must be some long-term memory effect, or at least some change in mood, happiness, etc. If not, then my enjoyment (aside from "owner's happiness" etc.) would be exactly the same post-purchase as pre-purchase of those expensive boxes, at least 200 ms after installing the new ones. What is the fun in that?

If you want to explore new (?) subjective testing areas, I would vote for long-term testing of (sub)conscious "happiness" while blinded to make, model, etc., when installing audio component A over audio component B. That would be the best indicator for questions such as "is it worth it?"

Long-term, as in a six hour listening session, or as in living with it for a month? The latter would be akin to ordinary drug testing, and ... hm, what is the disease? Has anyone ever gotten positive results measuring happiness out of any reasonably comparable setup?

You seem to have the idea that I actually did write choice of 'transform' ...

A 19 kHz symmetric triangular wave and a 19 kHz sine are different. What does that difference mean in practice? Likely you would get one of two answers: (i) nothing, or (ii) if anything, the triangular is worse, as its useless higher-order components are good for nothing and potentially bad for something. But without a choice of basis, there is absolutely no reason to claim that the triangular has higher-order components at all. It would not be so if we used triangular basis functions ( http://dx.doi.org/10.1016/S0898-1221(99)00075-9 - I am somewhat surprised that this had publishable news value as a research article as late as 1999 ...). We could do that, but there is a good reason why we don't. (Here is a lecturer who has, or at least has a fond hope to have, students bright enough to spot that it isn't obvious: http://www.cv.nrao.edu/course/astr534/FourierTransforms.html .)

What does that difference mean in practice? Likely you would get one of two answers: (i) nothing, or (ii) if anything, the triangular is worse, as its useless higher order components are good for nothing and potentially bad for something.

If I make a transform from triangle waves, which can be done by integrating a Hadamard transform, there will be no higher order components.

The ear, on the other hand, has a very strong resonant response, as in "coupled second order sections" with some delayed feedback. So these "higher order components" that one would see in a triangle wave as analyzed by a Fourier basis will simply not be captured by the ear unless they are within the bandwidth of the ear.
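That claim is easy to check numerically: generate a 19 kHz sine and a 19 kHz symmetric triangle at a high sample rate, remove everything above 20 kHz, and the two collapse to nearly the same waveform, differing only by the triangle fundamental's 8/pi^2 amplitude factor. A sketch (the sample rate, length, and brick-wall FFT filter are choices of convenience; the tone is placed exactly on an FFT bin to avoid spectral leakage):

```python
import numpy as np

fs, n = 192_000, 1 << 15
k = 3243                               # FFT bin index -> tone sits on a bin
f0 = k * fs / n                        # ~19.0 kHz
t = np.arange(n) / fs
sine = np.sin(2 * np.pi * f0 * t)
tri = (2 / np.pi) * np.arcsin(sine)    # symmetric triangle, same phase/period

def brickwall(x, cutoff):
    # crude FFT low-pass: zero every bin above `cutoff` (clean here because
    # the tone and its harmonics are all bin-aligned)
    X = np.fft.rfft(x)
    X[np.fft.rfftfreq(len(x), 1 / fs) > cutoff] = 0
    return np.fft.irfft(X, len(x))

a = brickwall(sine, 20_000.0)
b = brickwall(tri, 20_000.0)

# The triangle's fundamental has amplitude 8/pi^2; past the filter the two
# waves should agree up to that scale factor (plus tiny aliased harmonics).
resid = np.max(np.abs(b - (8 / np.pi ** 2) * a))
print(f"peak residual after 20 kHz low-pass: {resid:.4f}")
```

The surviving residual is a fraction of a percent of full scale, coming from high triangle harmonics that alias back below 20 kHz at this finite sample rate; band-limit the triangle before sampling and even that disappears.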

So, having said that, what's your point? I still see no point beyond a fallacious appeal to ignorance.

Long-term, as in a six hour listening session, or as in living with it for a month? The latter would be akin to ordinary drug testing, and ... hm, what is the disease? Has anyone ever gotten positive results measuring happiness out of any reasonably comparable setup?

As a consumer, differences between e.g. mp3 and CD that are only audible on a timescale of 200 ms are irrelevant. What is relevant is whether those differences are perceived (or lead to a changed state of mind) on a timescale of months and years.

If I am to buy $1000 loudspeakers, I would like the experience of listening to Miles Davis this August to be "better" than if I had bought $100 loudspeakers. If it is not, then I would rather spend that money elsewhere.