The test question was: can we hear any effect of A/D/A conversion at 24/96 performed with a studio recorder? Frankly, both the test organizer and I (let's say I was the test statistician) were pretty much convinced that no difference would be detectable.

The test music was played using a gramophone (Bergman Audio Sindre with Air Tight PC-1 cartridge), gramophone preamp (Air Tight ATE-2005), * , amplifier (Soulution 720 / 710), and loudspeakers (Hansen Audio Prince V2). In “analog” trials, the path was as described above. In “digital” trials, a studio recorder (Tascam DV-RA1000) set to “monitor” mode (i.e., A/D then D/A at 24 bit and 96 kHz) was inserted in place of the asterisk.

There were 10 listeners (neither the test organizer nor I participated; he was switching the connections, and I was several thousand kilometers away). All listeners listened together in the same room and left the room while connections were being switched. During listening, the test organizer remained at the back of the room, invisible to the listeners. The listeners (as well as the organizer and I) are members of a small Polish “sensible audiophile” internet forum.

There were 13 test trials: in 7 of them there was A/D/A conversion (D), and in 6 the purely analog path was used (A). The order of D and A trials was random.

Prior to the test, the listeners familiarized themselves with the supposed difference in A vs. D sound and with the recording, which was played a few times in A and D configuration. The listeners received answer cards on which they marked the trials as A or D. They were asked to answer in each trial, even if they were unsure. Prior to the test, they also provided answers to three questions: “Do you think that the effect of digitalization will be audible (yes/no)?”, “Do you consider yourself an experienced listener of vinyl records, using high-quality equipment (yes/no)?”, “How much of your listening time is spent listening to vinyl records (in %)”.

The results were analyzed in two main ways. In the individual analysis, we checked whether any of the participants identified A vs. D at a statistically significant level. Using a one-sided binomial test with Šidák correction (due to multiple listeners, i.e., multiple tests), we determined that at most one error (12/13 correct) is allowed to pass this test (adjusted p=0.017; for 11/13, p=0.107). In the group analysis, we converted the results to proportion correct (e.g., 8/13=0.615) and used a one-sided one-sample Wilcoxon test to determine whether the median proportion correct was significantly higher than 0.5. (Additionally, we calculated a one-sided one-sample t-test; however, due to the small sample size, the normality assumption could not be reliably tested, so we consider the t-test results less trustworthy.)
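For reference, the quoted Šidák-adjusted binomial p-values can be reproduced with a few lines of Python (a sketch; the helper names are mine, not from the original analysis):

```python
from math import comb

def binom_p_one_sided(k, n):
    """Exact one-sided binomial p-value: P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def sidak_adjust(p, m):
    """Sidak correction for m independent tests."""
    return 1 - (1 - p) ** m

n_trials, n_listeners = 13, 10
for n_correct in (12, 11):
    p_adj = sidak_adjust(binom_p_one_sided(n_correct, n_trials), n_listeners)
    print(n_correct, round(p_adj, 3))  # → 12 0.017, then 11 0.107
```

This matches the thresholds above: 12/13 survives the correction (p=0.017), while 11/13 does not (p=0.107).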

The results table, a plot of proportion correct for each listener, and a more detailed description of the analysis are provided at http://www.audio.e-snp.net/wyniki.php (FYI: in the results table, A stands for “analog” trials and C stands for “digital” trials; column 1 is the listener number, columns 2-4 are the answers to the three questions (NIE=no, TAK=yes), and the bottom row shows what actually happened in the trials).

No listener answered with 0 or 1 errors, which was required for a statistically significant outcome of the individual analysis. There was one 11/13 (0.846) result and two 10/13 (0.769) results. No one scored below 6/13 (0.462).

The interesting thing is that the average proportion correct was 0.631, significantly higher than chance (p=0.0322, Wilcoxon test; possibly unreliable t-test: p=0.0093). My interpretation is that the A/D/A process done with the Tascam recorder did audibly influence the signal.

No association was found between the answers to any of the three questions and the proportion of correct answers.
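As an illustration of the group analysis (NOT the actual data: the full per-listener scores were not posted here, so the values below are hypothetical scores merely consistent with the reported summary of one 11/13, two 10/13, nothing below 6/13, mean ~0.631), the one-sided one-sample Wilcoxon test can be run like this:

```python
from scipy.stats import wilcoxon

# Hypothetical per-listener scores out of 13, chosen only to match the
# reported summary statistics; the real individual data may differ.
scores = [11, 10, 10, 9, 8, 8, 7, 7, 6, 6]
props = [s / 13 for s in scores]

# One-sided test of whether the proportions exceed chance (0.5)
res = wilcoxon([p - 0.5 for p in props], alternative="greater")
print(round(res.pvalue, 4))
```

With ties present, SciPy falls back to a normal approximation, so the p-value will not exactly reproduce the reported 0.0322 even if the data matched.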

Did you carefully match the level of the system with and without the Tascam inserted, i.e., are you sure that the Tascam had exactly unity gain? Listeners can hear minute level differences and usually perceive them as qualities other than different level.

Thanks everyone for your comments. And sorry for my slow response; I have to discuss the details with the other test organizer, and being on different continents and in different time zones, we need time for this.

Wouldn't it require using measurement equipment that has better specs than the Tascam?

QUOTE (DonP @ Jun 19 2011, 06:50)

Levels set so clipping might have happened (like when the stylus hit the record)?

The preamplifier was muted before lowering the stylus and unmuted afterwards. It's a slow, ~2s mute/unmute. Placing the stylus and muting/unmuting was done by a person unaware whether the current trial was A or D.

QUOTE (DonP @ Jun 19 2011, 06:50)

Sometimes with equipment like that "monitor mode" is meant for monitoring, not an output used for production, and is not up to the full specs.

We used the regular Analog Out RCA outputs. As far as we know, the signal path is identical to normal "production" recording, except that nothing is actually recorded.

QUOTE (DonP @ Jun 19 2011, 06:50)

What sort of time delay is there in the A/D/A process? Could there be some crosstalk or bleed through that would give a subtle pre-echo of the original analog?

We'll look into the possibility of measuring this delay. On the other hand, the Tascam specs say that crosstalk is <-97 dB.

At this point we have to admit that we have found a potential confound, although it might be irrelevant. We ensured that the levels were below clipping by observing the Tascam's clipping indicators during pre-test runs, and everything seemed fine. But we also recorded the test music using the same configuration and settings, and there was a tiny amount of clipping in the recorded file: 21 samples in the left channel and 98 in the right, out of >17 million samples in each channel. We are not sure whether this can be audible or significant, but it is a methodological flaw, and we'll try to find a way of avoiding it should we re-do the test or do similar tests.
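To put those numbers in perspective, here is a quick back-of-the-envelope calculation (assuming the 96 kHz sample rate of the test, and treating the clipped samples as if they were contiguous, which if anything overstates their prominence):

```python
fs = 96_000          # sample rate (Hz), per the 24/96 setup
total = 17_000_000   # approximate samples per channel, per the post above

for channel, clipped in (("L", 21), ("R", 98)):
    fraction = clipped / total          # fraction of samples clipped
    duration_ms = clipped / fs * 1000   # equivalent duration if contiguous
    print(channel, f"{fraction:.2e}", f"{duration_ms:.2f} ms")
```

Even in the worse channel, the clipping amounts to roughly one millisecond's worth of samples out of about three minutes of audio.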

So, it's not the worst implementation found in a Tascam, but isn't it just like using an 18-bit A/D stage that supports 24-bit resolution (though, would the extra 2-3 bits of range you could get by using the ADCs I mentioned mean much in this type of test)?

I don't know what you're trying to say with "18-bit A/D stage that supports 24-bit resolution", but according to Google it's a 24-bit converter.

I think this is nit-picking anyway. Remember that the analogue source for this test is a record from 1975. It would be spectacular enough if this suggested that a properly done 16-bit A/D/A conversion were audible.

isn't it just like using an 18-bit A/D stage that supports 24-bit resolution (though, would the extra 2-3 bits of range you could get by using the ADCs I mentioned mean much in this type of test)?

Yes, the noise performance of the ADC side of the DV-RA1000 is about 18-bit equivalent. I measured input SNR at 106 dB or so, unweighted, 22 kHz bandwidth, IIRC. I have only ever measured one ADC that did better under the exact same conditions; that one was 19-bit equivalent.

Interesting that no one did worse than 6/13. Just by intuition I find that to be potentially informative.

But the multiple listeners issue is bothersome. Is there any possibility that the listeners influenced each other?

Also, as far as applicability goes, we don't tend to listen to our hi-fi that way, with all that acoustic interference. I don't know whether that makes the test more or less difficult, or doesn't matter at all, but it might.

Interesting that no one did worse than 6/13. Just by intuition I find that to be potentially informative.

Scoring 6/13 is not an indication of being on the right track. 6/13 (and 7/13) is indicative of chance. If you flip a coin, each side is equally likely; if you flip it many times, you would expect to get each side about half the time.

That's why here at Hydrogenaudio we try to make people aware of ABX and of how to interpret the values (asking for 95%, or in some cases even 99%, confidence).

In fact, reaching 16/16 is exactly as likely as reaching 0/16. A 0/16 result, if not obtained by chance, would suggest that a difference was clearly noticed, but the listener misinterpreted which was which when answering.

About the test itself, I am not knowledgeable enough to find what would make it incorrect, but having 3 out of 10 listeners with a result of 10/13 or better is something to try to understand. There is no proof that they could hear a difference (they didn't pass the test), but the results imply that there might have been something.

I disagree. If the listeners were guessing, you would expect a roughly normal distribution centered at 50/50. That some people got 10/13 would be expected. That no one got 3/13, 4/13, or even 5/13 is what stands out about the distribution. It is also, ultimately, why the overall mean was statistically significant: if bad guessers had balanced out good guessers, the mean would have been 50/50. They didn't.
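This intuition can be quantified (a quick sketch: under pure guessing each listener's score is Binomial(13, 0.5), and the ten listeners are treated as independent):

```python
from math import comb

# Probability that a single guessing listener scores below 6/13
p_below_6 = sum(comb(13, k) for k in range(6)) / 2 ** 13

# Probability that none of 10 independent guessers scores below 6/13
p_none_below_6 = (1 - p_below_6) ** 10

print(round(p_below_6, 3), round(p_none_below_6, 3))  # → 0.291 0.032
```

So under the guessing hypothesis, a floor of 6/13 across all ten listeners would itself occur only about 3% of the time, which lines up with the group-level significance reported earlier in the thread.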

We ensured that the levels were below clipping by observing the Tascam's clipping indicators during pre-test runs, and everything seemed fine. But we also recorded the test music using the same configuration and settings, and there was a tiny amount of clipping in the recorded file: 21 samples in the left channel and 98 in the right, out of >17 million samples in each channel. We are not sure whether this can be audible or significant, but it is a methodological flaw, and we'll try to find a way of avoiding it should we re-do the test or do similar tests.

I don't have any backing for this, so please just overlook it if it is too far out.

I've been told that my MH ULN8 should operate at -6 dB (if I'm not wrong) at the input for the A/D converters to perform at their best. Is that just a "fairy tale"? And if not, is this something that is common to A/D converters and could also apply to the Tascam unit? (Nothing is said about the input level in this test, as far as I can see.)

Apologies if I'm breaking any rules here.

Professional recording is usually done at around -24 dBFS. As has apparently been demonstrated here, it does not pay to be stingy with headroom. Watch your meters; there is no standard for what a clipping indicator means.
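For intuition, a level in dBFS maps to a linear fraction of digital full scale as 10^(dBFS/20); a tiny helper (illustrative only) shows what the headroom figures mentioned in this thread mean in sample terms:

```python
def dbfs_to_linear(dbfs):
    """Convert a level in dBFS to a linear fraction of digital full scale."""
    return 10 ** (dbfs / 20)

print(round(dbfs_to_linear(-6), 3))   # → 0.501 (peaks at about half of full scale)
print(round(dbfs_to_linear(-24), 3))  # → 0.063 (the ~-24 dBFS studio practice)
```

In other words, recording at -24 dBFS leaves peaks at only about 6% of full scale, which is exactly why a 24-bit converter's extra dynamic range is useful.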

In case it isn't clear, leaving headroom when recording is to prevent unexpectedly high input levels from clipping. It has nothing to do with the quality of the A-to-D conversion (barring some unusual ADC) or with how the converter operates. As long as there is no clipping, the input can be extremely near, or even at, 0 dBFS.

leaving headroom when recording is to prevent unexpectedly high input levels from clipping.

Under studio conditions it's very well possible to make an educated guess about the max SPL and take some risk, and if clipping happens it's often no problem to record that part again. Live recording is different, so a larger headroom margin can be required. Last weekend I recorded airplanes at an airshow, and even with plenty of headroom the large three-engine airplane took me by surprise. A 24-bit ADC is no luxury under these conditions, because the quieter parts will require at least 20 dB of gain during post-production.