I've noticed an increase in forum debate about the validity of transferring the credibility of ABX from the physical domain to perception testing. I'm wondering if anyone has found a way past this issue?

The purpose of blind testing is to subtract subjectivity from the outcome of, for instance, a drug trial: to assess a medication's impact on a subject's physiology without interference from their psychology. But what about when the purpose of a test is subjective perception? How do we then subtract the effect of the method to arrive at a meaningful outcome?

While we would like to remove expectation bias from the equation, if the conditions under which this is done also change the perceptive state of the listener, the test is invalidated as surely as it would be by tissue sample contamination.

Recent large-scale public experiments by Lotto Labs (http://www.lottolab.org/) demonstrated that perceptual acuity is dramatically altered by test conditions: for instance, time contraction/dilation effects are experienced when subjects are exposed to colour fields. In one experiment, two groups were asked to perform an identical fine-grained visual acuity test. One group was pre-emptively 'manipulated' by filling in a questionnaire designed to lower their self-esteem. This 'less confident' group consistently performed worse on the test than the unmanipulated one: their acuity was significantly impaired by a subtle psychological 'tweak' that wasn't even in effect during the test.

It seems undeniable that the much grosser differences between the mental states of sighted and 'blind' listening - considered generously - cast serious doubt on the results thus obtained.

The harder line is that blind perception tests are a fundamental misappropriation of methodology. In psychology it's axiomatic that for many experiments the subject must be unaware of the nature of the test (see Milgram). If a normalised state is not cunningly contrived, results are at best only indicative of what a subject thinks they should do; at worst, entirely invalid.

Applied to hearing, the point is that the test itself must not change the mental state of the listener.

The contrast between the outcomes of sighted and blind listening tests is as stark as that in demonstrations of suggestibility (see McGurk), but giving too much credence to such an intrinsically unsound experimental approach (i.e. not spotting this difficulty) does our credibility no favours at all.

The only way past the dilemma seems to be direct mechanical examination of the mind during 'normal' listening to explore why the experiences of sighted and unsighted listening differ. This seems to be an interesting question.

In the meantime, the idea that, despite the method problem, results from blind ABX are valid is at least supported by the majority of data derived from home testing, Audio DiffMaker et al., so we needn't get hung up on it.

Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned? Please be specific, thanks.

QUOTE

Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned? Please be specific, thanks.

Start with two signals that are by any reasonable measure vastly different. Then slightly less different. Then slightly less different again. Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.
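
To make that concrete, here's a minimal Python sketch of such a staircase-style positive control. Everything here is illustrative: the 'difference' is treated as a simple level offset in dB, and run_abx_trial is a hypothetical hook into whatever actually plays the samples and collects the listener's answer.

```python
def run_abx_trial(difference_db: float) -> bool:
    """Play A, B, and a randomly chosen X at the given difference level;
    return True if the listener identified X correctly.
    Hypothetical hook: wire this to your real playback/answer UI."""
    raise NotImplementedError

def find_guessing_point(start_db: float = 6.0,
                        trials_per_level: int = 16,
                        pass_mark: int = 12) -> float | None:
    """Halve the difference each round until the listener's score falls
    to roughly chance. 12/16 is the smallest score a pure guesser
    reaches less than 5% of the time. Returns the last level passed."""
    level = start_db
    last_passed = None
    while level > 0.01:  # stop well below any plausible audibility threshold
        correct = sum(run_abx_trial(level) for _ in range(trials_per_level))
        if correct < pass_mark:
            break  # the listener is now 'guessing'
        last_passed = level
        level /= 2  # slightly less different, again
    return last_passed
```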

Seriously, this is a non-issue for most listeners. In most cases I've read about where the DBT result failed to reject the null hypothesis, the listener *believes* they hear a difference both before and *during* the test as well. In other cases they complain that the difference they thought they heard 'sighted' suddenly seems harder to hear when they're listening blind. In neither case is 'bias towards not hearing' a credible factor.

QUOTE

Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned? Please be specific, thanks.

QUOTE

Start with two signals that are by any reasonable measure vastly different. Then slightly less different. Then slightly less different again. Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.

Good example. Thanks. You then cherry pick the test subjects to get rid of the bad apples, I guess.

Can't say I recall ever reading of any DBT in the audio press which did this pre-screening you've just described, but I hope it, or some similar procedure to preclude this particular bias, is standard procedure in the academic world. [I can't edit my original post at this late date; however, I didn't stress enough that the "mischievous" behavior of the listener may (possibly) be at a subconscious level. He/she would pass a lie detector test that they were "doing their best"; that is, they aren't "frauds".]

QUOTE

Seriously, this is a non-issue for most listeners.

So rather than applying the time-consuming control you just described, we could simply ask potential participants whether they might be biased on a conscious or subconscious level instead. Ha-ha!

Rule 1: It is impossible to prove that something doesn't exist. The burden of proof is on the one claiming that a difference can be heard. If you believe that a codec changes the sound, it is up to you to prove it by passing the test. Someone claiming that a codec is transparent can't prove anything.

So even though random results don't prove anything, which I agree is correct, you seem to think that statistical analysis may be applied to them?! Huh? That's what I don't get. If no firm conclusion can be drawn one way or the other, how on earth can one describe the probability/certainty of "no evidence/proof found, at least in this instance"? NOTHING was established in the first place and nothing was proven, so how can you describe the certainty of this "nothing", this randomness, as a percentage?! For all we know the randomness is caused by problems with the test design, such as A, B, and/or C, all things which can be immediately ruled out if the results go the other way (where the listener can successfully hear a difference most or all of the time), and we have no way of knowing the "percentage of likelihood" of problems such as A, B, or C. [But I do like that control you suggested for B. Bravo!]
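
For what it's worth, the percentage in question isn't the "certainty of nothing"; it's the probability that pure guessing would score at least as well as the listener actually did. Here's a small worked example using SciPy's standard binomial test (the scores below are made up for illustration):

```python
from scipy.stats import binomtest

# Under the null hypothesis ("the listener is guessing"), each ABX answer
# is a fair coin flip, so the number correct in n trials is binomial.
# The p-value says how often guessing alone does at least this well.
for correct, trials in [(9, 16), (12, 16), (14, 16)]:
    p = binomtest(correct, trials, p=0.5, alternative='greater').pvalue
    print(f"{correct}/{trials} correct: p = {p:.3f}")

# 9/16  -> p = 0.402: entirely consistent with guessing; it proves nothing,
#          and says nothing about WHY (hearing, attention, test design...)
# 12/16 -> p = 0.038: pure guessing rarely does this well
# 14/16 -> p = 0.002: strong evidence the listener really hears a difference
```

Note the asymmetry: a near-chance score never quantifies "no difference exists"; it only fails to demonstrate one, which is exactly the burden-of-proof point above.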

QUOTE

Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned? Please be specific, thanks.

QUOTE

Start with two signals that are by any reasonable measure vastly different. Then slightly less different. Then slightly less different again. Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.

QUOTE

Good example. Thanks. You then cherry pick the test subjects to get rid of the bad apples, I guess.

um....no. Unless by 'bad apples' you mean people with significant hearing loss, which this control will indeed identify. Look, you've already admitted you have no knowledge of how the science is done. I was merely telling you what Woodinville meant by 'positive control', and he's right: a rigorous experiment typically employs a positive as well as a negative control.

QUOTE

Can't say I recall ever reading of any DBT in the audio press which did this pre-screening you've just described, but I hope it, or some similar procedure to preclude this particular bias, is standard procedure in the academic world. [I can't edit my original post at this late date; however, I didn't stress enough that the "mischievous" behavior of the listener may (possibly) be at a subconscious level. He/she would pass a lie detector test that they were "doing their best"; that is, they aren't "frauds".]

Do you recall the audio press DBT subjects ever NOT claiming they heard a difference, sighted? I don't.

So, against the body of perceptual psychology data, tallying the ways humans are alert for 'difference' whether it exists or not, you posit a population who consciously assert things sound different, yet unconsciously think they sound the *same*.

QUOTE

Please enlighten me. I am not a scientist nor have I had any training or study in this area. What positive and negative controls would one use to avoid this particular problem of bias I just mentioned? Please be specific, thanks.

QUOTE

Start with two signals that are by any reasonable measure vastly different. Then slightly less different. Then slightly less different again. Repeat in increments until the listener starts 'guessing' or their 'bias toward no difference' kicks in.

QUOTE

Good example. Thanks. You then cherry pick the test subjects to get rid of the bad apples, I guess.

QUOTE

um....no. Unless by 'bad apples' you mean people with significant hearing loss, which this control will indeed identify.

Yes, that's essentially what I meant. The bad apples you weed out in pre-screening are the ones who don't show an ability to differentiate between two sources with a pre-established small difference that should be audible to most.

You don't know if it is due to their poor hearing, malicious intent to skew the test, a lack of understanding of how to vote or of what they are to do, or a preconceived notion (a bias) that it is ridiculous to think that power cords on CD players make an audible difference, so they are lackadaisically selecting A vs. B and not really giving it their all [even though, unbeknownst to them, they haven't even started to hear those pairings yet, since they are still in a pre-screening stage designed to weed the biased people out!].
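
For illustration, here's one way the pass mark for such a pre-screening stage could be set (hypothetical numbers; the point is only that the bar is placed where a guesser, whether deaf, inattentive, or "mischievous", almost never clears it):

```python
from math import comb

def screening_pass_mark(trials: int, alpha: float = 0.05) -> int:
    """Smallest number of correct answers on the known-audible 'easy'
    pairing such that a pure guesser clears the screen with
    probability below alpha."""
    for k in range(trials + 1):
        tail = sum(comb(trials, j) for j in range(k, trials + 1)) / 2 ** trials
        if tail < alpha:
            return k
    return trials + 1  # alpha so small that no score qualifies

# A 16-trial screen: listeners must score at least 12; anyone answering
# at random manages that less than 5% of the time.
print(screening_pass_mark(16))  # -> 12
```

Note that such a screen still can't tell you *why* someone failed, only that their answers are indistinguishable from guessing on a difference most people can hear.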

QUOTE

Look, you've already admitted you have no knowledge of how the science is done.

No, I said I wasn't a scientist. I do, however, understand that in a good scientific experiment there should be ways to prevent all forms of bias, even if that bias may be at a subconscious level and/or one thinks "that sort of bias isn't likely to have an impact on the results". There may be unforeseen reasons, not yet thought through, why it does have an impact. Test subjects not giving their full, 100% focused attention, or rushing to give answers compared to the rest, was just an example.

The whole reason we do double-blind testing, not single-blind, is this exact same principle. There's no reason to think that a competent test administrator passing out the test forms and pencils would act or speak in a way that influences the subjects or gives away the identity of A or B; however, to be absolutely sure there's nothing we may have overlooked, we blind them too!
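
As a sketch of what "blinding the administrator too" means in practice for a computer-run ABX (names and structure here are illustrative, not any particular tool's API): the software draws the hidden assignments itself, and no human sees them until scoring.

```python
import random

def make_hidden_schedule(n_trials: int, seed: int | None = None) -> list[str]:
    """Draw the X assignment for every trial up front. Neither the
    listener nor the session operator ever sees this list; the software
    plays X itself, so nobody in the room can leak its identity."""
    rng = random.Random(seed)
    return [rng.choice('AB') for _ in range(n_trials)]

def score_session(schedule: list[str], answers: list[str]) -> int:
    """Compare the listener's answers to the hidden schedule only after
    the session has ended."""
    return sum(x == a for x, a in zip(schedule, answers))
```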

We don't just get rid of the forms of bias we suspect might have an impact; we get rid of ALL biases as best we can.