As a scientist (biophysics) and an audiophile, I have my feet in both worlds, and have thus enjoyed this discussion. Here are some thoughts:

I. Open Access.

Several have written it’s unfortunate that this paper, which is of general public interest, is not available to those without an AES subscription (that includes myself). Agreed; and note that the landscape for scientific publication has been changing: researchers are increasingly realizing the value of “open access” papers – ones available freely, without a subscription. Some scientific journals have gone entirely open access, while others (for an additional fee) offer the option of open access publication of your paper. Let me suggest that, if AES doesn’t offer this option, it should, and that those publishing papers of broad interest should endeavor (if financially possible) to avail themselves of this.

II. ABX testing.

I assume from the discussion that this paper makes use of sequential ABX testing (by “sequential” I mean: present A, then B, then X, in sequence). I’d like to see basic research done into sequential ABX testing itself. In particular, sequential ABX testing (double-blinding will be assumed throughout this discussion) is highly regarded for assessing perceptual discrimination, because (with proper statistical analysis) it makes false positives very unlikely – subjects are very unlikely to “pass” (i.e., achieve successful discrimination in) a sequential ABX test unless they truly can distinguish A from B. But what about the converse? Achieving accurate discrimination in sequential ABX testing is hard, because it requires accurate memory of both A and B when judging X. Therefore I’d like to offer the hypothesis that subjects can “fail” sequential ABX testing even when they can successfully discriminate between A and B, and suggest a way of testing this. Essentially, one would use simultaneous ABX testing to eliminate the memory requirement, and then compare the results to those for sequential ABX testing. I can’t think of how you’d do simultaneous ABX testing with hearing, but it could certainly be done with vision (using color tiles). The experiment would be divided into three phases:

1) Find the smallest color difference that can be reliably distinguished during simultaneous ABX testing (present all three tiles – A, B, and X – simultaneously, so that the subjects can line them up next to each other for comparison).

2) Repeat the experiment, using that same color difference, with sequential ABX testing.

3) If the subjects fail phase 2 (thus confirming my hypothesis), find the smallest color difference that can be reliably distinguished using sequential ABX testing, and compare to the results of phase 1.

If it is found that subjects indeed fail to reliably distinguish, when presented sequentially, colors that they can reliably distinguish when presented simultaneously, this will suggest that sequential ABX testing may not be a good method for assessing our perceptual limits (i.e., for determining transparency) -- and not just for vision, but possibly for hearing as well.
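To make "reliably distinguish" and "proper statistical analysis" a bit more concrete, here is a minimal sketch of the usual binomial analysis of an ABX run. The function names, the 0.05 significance level, and the example numbers in the usage note are my own assumptions for illustration, not anything taken from the paper. It computes the chance of a given score arising from pure guessing, the score needed to "pass", and how often a subject with a given true per-trial hit rate would actually pass.

```python
from math import comb

def abx_p_value(correct, trials):
    """Chance of scoring at least `correct` out of `trials` by pure guessing (p = 0.5)."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

def min_correct_to_pass(trials, alpha=0.05):
    """Smallest score whose guessing probability is <= alpha (alpha is an assumed threshold)."""
    for k in range(trials + 1):
        if abx_p_value(k, trials) <= alpha:
            return k
    return None  # too few trials to ever reach the chosen significance level

def pass_probability(p_true, trials, alpha=0.05):
    """Chance that a subject who is right with probability `p_true` per trial passes the test."""
    need = min_correct_to_pass(trials, alpha)
    if need is None:
        return 0.0
    return sum(
        comb(trials, k) * p_true ** k * (1 - p_true) ** (trials - k)
        for k in range(need, trials + 1)
    )

if __name__ == "__main__":
    n = 16
    print("score needed:", min_correct_to_pass(n))
    for p in (0.5, 0.6, 0.7, 0.8, 0.9):
        print(f"true hit rate {p:.1f}: passes with probability {pass_probability(p, n):.3f}")
```

With 16 trials and the assumed 0.05 threshold, a subject needs 12 correct, and a subject who genuinely gets 70% of trials right would pass less than half the time. The false-positive rate is set by the chosen threshold, while the false-negative rate depends on both the true hit rate and the number of trials. If memory load lowers the per-trial hit rate in sequential presentation, that is the mechanism by which the hypothesis above would show up.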

III. Sampling theorem.

Those who are dismissive of high-res audio based on theory alone typically cite the Shannon sampling theorem (often misattributed to Nyquist), noting correctly that we can’t hear over 20 kHz (if that), and that we only need >40 kHz sampling to accurately reproduce this. But 2 x max. frequency is not the theorem’s only requirement. It’s my understanding that it also assumes an infinite signal, perfect sampling, and perfect interpolation. I’ve never seen any of these assumptions discussed in this context, so I’d like to ask how much practical effect these requirements would have on Redbook (16 bit/44.1 kHz) vs. high-res conversion:

1) Infinite signal. I assume we can effectively satisfy this with signal length >> 1/frequency, and that this is thus a non-issue.

2) Perfect sampling. Clearly, sampling need not be perfect, but simply close enough to perfect to be transparent in amplitude and time. Timing errors lead to jitter (right?). So (and this is an engineering question): what’s the relationship between sampling rate and how easy it is to eliminate audible jitter errors?

3) Perfect interpolation. Another engineering question: Naively, unless implementing near-perfect (i.e., transparent) interpolation is trivial, I would think it would be easier to achieve transparent interpolation with a higher sampling rate, because the points are more closely spaced. Is transparent interpolation so easily achievable that the sampling rate has no practical effect?
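As a rough illustration of points 1) and 3), here is a minimal sketch of the textbook Whittaker-Shannon (sinc) interpolation formula applied to a finite record of samples. The function names, the record length, and the test frequencies are assumptions chosen for illustration. It is not a model of any real DAC or reconstruction filter, and it says nothing about audibility; it only shows how one could measure the error that comes from truncating the (ideally infinite) sum.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def sinc_reconstruct(samples, fs, t):
    """Whittaker-Shannon interpolation at time t (seconds), truncated to the given samples."""
    return sum(x * sinc(t * fs - n) for n, x in enumerate(samples))

def midpoint_error(freq, fs, n_samples=2001, n_points=50):
    """Worst error at points halfway between samples, near the middle of the record."""
    samples = [math.sin(2 * math.pi * freq * n / fs) for n in range(n_samples)]
    centre = n_samples // 2
    worst = 0.0
    for k in range(-n_points // 2, n_points // 2):
        t = (centre + k + 0.5) / fs  # halfway between two sample instants
        exact = math.sin(2 * math.pi * freq * t)
        worst = max(worst, abs(sinc_reconstruct(samples, fs, t) - exact))
    return worst

if __name__ == "__main__":
    for fs in (44_100, 96_000):
        for freq in (1_000, 20_000):
            err = midpoint_error(freq, fs)
            print(f"fs={fs} Hz, f={freq} Hz: worst mid-sample error {err:.2e}")
```

Running this for a tone well below the Nyquist frequency and one close to it, at both sampling rates, gives a feel for how quickly the truncated sum converges in each case. Real converters use windowed or otherwise optimized filters rather than a bare truncated sinc, so the numbers are only indicative of the trend, not of actual hardware performance.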

IV. The effect of mastering.

Many have mentioned that the reason high-res discs do indeed sound better than CDs of the same performance is that they are mastered differently – as labors of love, and without the usual commercial pressures to alter the sound. Given this, I think it would be a public service if someone could produce a Blu-ray disc corresponding to the songs tested in this study, containing both the high-res and Redbook versions of each, and make it available for sale. That way people could easily experiment for themselves.

V. General thoughts.

I think the reason for the continued controversy about digital audio performance is that we don’t completely understand the biophysics of human hearing (which is why it continues to be an active area of research). If we did, we would know, a priori, what constitutes a complete specification set sufficient to determine transparency, and thus could engineer transparent electronic gear (I say electronic because I am excluding transducers) without listening to it. To the best of my understanding, this is not yet the case, since the errors that our auditory system is capable of detecting can be extraordinarily subtle, and what would constitute a complete set of scalar specifications sufficient to ensure transparency thus remains an open question.

This post has been edited by greynol: Jun 14 2013, 14:02

Reason for edit: Added link to original discussion from which this one was split for being off-topic.

There is a sampling/interpolation/hearing issue that can only be resolved by studying the ear's response, namely to what extent the hair cells behave enough like idealized strings that sine functions are the appropriate basis. Such studies may, for all I know, have been carried out explicitly; otherwise, a layman's gut feeling is that the effect is worth at most a slight miscalibration or margin of conservatism. I would be very surprised if this could raise the 20 kHz figure by the ten percent required to break through the CD limit.

Anyway, the argument is that the canonical choice of sine functions comes from the wave equation, which is derived for a 'spherical cow in vacuum' theoretical ideal string, which the hair cells are not. The periodic function that, around the 20 kHz mark, is "least painful given the hearing threshold" is probably not exactly the sine, but so close to it that it is nothing to worry about for the purpose of the "20" figure.