I've noticed an increase in forum debate about the validity of transferring the credibility of ABX from the physical domain to perception testing. I'm wondering if anyone has found a way past this issue?

The purpose of blind testing is to subtract subjectivity from the effect of - for instance - a drug trial: to assess a medication's impact on a subject's physiology without interference from their psychology. But what about when the purpose of a test is subjective perception itself? How do we then subtract the effect of the method to arrive at a meaningful outcome?

While we would like to remove expectation bias from the equation, if the conditions under which this is done also change the perceptive state of the listener, the test is invalidated as surely as it would be by tissue sample contamination.

Recent large-scale public experiments by Lotto Labs (http://www.lottolab.org/) demonstrated that perceptual acuity is dramatically altered by test conditions: for instance, that time contraction/dilation effects are experienced when exposed to colour fields. In one experiment, two groups were asked to perform an identical fine-grained visual acuity test. One group was pre-emptively 'manipulated' by filling in a questionnaire designed to lower their self-esteem. This 'less confident' group consistently performed worse on the test than the unmanipulated one: their acuity was significantly impaired by a subtle psychological 'tweak' that wasn't even in effect during the test.

It seems undeniable that the much grosser differences between the mental states of sighted and 'blind' listening - considered generously - cast serious doubt on the results thus obtained.

The harder line is that blind perception tests are a fundamental misappropriation of methodology. In psychology it's axiomatic that for many experiments the subject must be unaware of the nature of the test (see Milgram). If a normalised state is not cunningly contrived, results are at best only indicative of what a subject thinks they should do; at worst, entirely invalid.

Applied to hearing, the point is that a test must not change the mental state of the listener.

The contrast between the outcomes of sighted and blind listening tests is as stark as the contrasts demonstrating suggestibility (see McGurk), but giving too much credence to such an intrinsically unsound experimental approach (not spotting this difficulty) does no favours to our credibility at all.

The only way past the dilemma seems to be direct mechanical examination of the mind during 'normal' listening to explore why the experiences of sighted and unsighted listening differ. This seems to be an interesting question.

In the meantime, the idea that - despite the method problem - results from blind ABX are valid is at least supported by the majority of data derived from home testing, Audio DiffMaker et al, so we needn't get hung up on it.

There is not a big distinction between consciously selecting random results (acting unethically/in bad faith) and simply not trying very hard because one thinks, perhaps only subconsciously, that A and B *should* sound alike, so they don't bring their "A game" and simply "phone it in". That's another form of expectation bias, and we don't have a good way to preclude it. This is why applying statistical analysis to such results seems unsettling to me. You never know for sure why the results are random.
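The worry about random-looking results can be made concrete. The standard analysis of an ABX run only computes the probability of the observed score under pure guessing, so a chance-level score from a listener who genuinely hears nothing is indistinguishable from one who didn't try. A minimal sketch of that computation (standard library only; `abx_p_value` is my own illustrative helper, not part of any ABX tool):

```python
from math import comb

def abx_p_value(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """One-sided p-value: probability of scoring at least `correct`
    out of `trials` if the listener is purely guessing."""
    return sum(comb(trials, k) * p_guess**k * (1 - p_guess)**(trials - k)
               for k in range(correct, trials + 1))

# 14/16 is strong evidence of audibility (p ~ 0.002), but 8/16 (p ~ 0.6)
# is exactly what both a real null and a "phoned in" session produce.
```

The asymmetry is the point: a positive result carries information, but the null result on its own cannot distinguish "no audible difference" from "no real effort".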

Here's an example, for all: If asked to participate in a DBT of "the bass response of aftermarket power cords", all of adequate gauge thickness to conduct the current required by the CD player, how many of you would bow out on the grounds that you wouldn't be a good test subject because you find the premise laughable and you'd therefore be biased? If you were to participate, do you really, honestly think you'd be giving it your best effort possible and that there's no way your bias could be influencing your selections, at least at a subconscious level?

Sure. Turning to medicine: what if we were to test the effect of homeopathy? I would have a fairly strong negative expectation bias, especially if I was told it was actually prepared the homeopathically “proper” way. This, and the particular case you mention, could be mitigated by not telling participants what specifically is being tested - and arguably you shouldn't anyway.

And if you have anything like that at hand, introducing a third thingy with a known effect could help the analysis. I.e., if you have A, B and C, where the difference between A and C is well-established and quantified, and the listeners are biased as you describe (or merely not sufficiently randomly drawn – in practice you would have to deal with self-selection), then you might check whether they can distinguish A and C better or worse than “the known average”. That could have been done in the homeopathy case as well. The problem is, a failed control only tells you that you have no test.
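That positive-control idea can be sketched in a few lines. Assume, purely for illustration, that trained listeners are known to distinguish A from C 90% of the time (an invented figure), and that `control_ok` is my own hypothetical helper: if the panel's A-vs-C score falls significantly below the known rate, the panel or the setup isn't performing, and any A-vs-B null result is uninterpretable.

```python
from math import comb

def binom_lower_tail(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(0, k + 1))

def control_ok(correct: int, trials: int, known_rate: float,
               alpha: float = 0.05) -> bool:
    """Positive control: reject the session if the A-vs-C score is
    significantly below the established discrimination rate."""
    return binom_lower_tail(correct, trials, known_rate) >= alpha
```

With the hypothetical 90% baseline, a panel scoring 18/20 on A-vs-C passes the control, while one scoring 10/20 fails it - and, as said above, the failure only tells you the test is void, not whether A and B differ.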