I cannot conclude that there is no difference in the first example's number ordering, because I made the test too hard by relying on people's memory.

This is not like that. The kind of double blind tests I'm thinking of involves rapidly switching (at one's will) between one and the other by clicking on a button or flicking a switch, with uninterrupted playback. It seems that is not what you have in mind. We might be arguing over a misunderstanding. If that is the case, download foobar2000 with the ABX component and you'll see what I'm talking about.
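For readers who haven't used such a tool, the protocol is simple: on each trial X is secretly assigned to A or B, and the listener must identify which it is. This is my own illustrative Python sketch of the scoring logic, not how foobar2000's ABX component is actually implemented:

```python
import random

def run_abx(trials, can_hear_difference):
    """Simulate an ABX session: on each trial, X is secretly A or B
    and the listener names which one they think it is."""
    correct = 0
    for _ in range(trials):
        x = random.choice(["A", "B"])           # hidden assignment
        if can_hear_difference:
            guess = x                           # an ideal listener is always right
        else:
            guess = random.choice(["A", "B"])   # indistinguishable: pure guessing
        if guess == x:
            correct += 1
    return correct

print(run_abx(16, can_hear_difference=True))    # 16/16 for a real, audible difference
print(run_abx(16, can_hear_difference=False))   # hovers around 8/16
```

A real session of course plays audio on each trial; the point is only that the listener's final score can be compared against what pure guessing would produce.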

1. Listen to a recording of the numbers 1-10 in random order, then listen to them again with one of the numbers moved to another position in the ladder of numbers. Do this 3 times back to back.

2. Listen to the same recording using 2 different speakers playing the recordings at the same time, and do it 3 times back to back. Notice how much easier it is to spot which number has moved in the ladder.

Most people will only get the first attempt correct in the first experiment, while everyone will pass the second experiment with flying colors. I cannot conclude that there is no difference in the first example's number ordering, because I made the test too hard by relying on people's memory. Memory is inconsistent, unreliable, inaccurate, and varies with age. The first test cannot accurately represent the truth, while the second test is much more accurate because memory wasn't involved.
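One statistical aside the thought experiment glosses over: "3 times back to back" is far too few trials to separate hearing from guessing. A quick Python sketch (my own illustration, not from the post) of the chance of passing by luck alone:

```python
from math import comb

def chance_of_passing(n, k):
    """Probability of getting k or more trials correct out of n by pure guessing
    (binomial tail with p = 0.5)."""
    return sum(comb(n, i) * 0.5**n for i in range(k, n + 1))

print(chance_of_passing(3, 3))              # 0.125: a perfect 3/3 happens by luck 1 in 8 times
print(round(chance_of_passing(16, 12), 3))  # 0.038: 12/16 is much harder to luck into
```

This is why serious ABX protocols use on the order of 16 or more trials before calling a result significant.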

If I understand you correctly, most ABX testing is more like #2, where two options are directly contrasted.

Memory is flawed. Well designed ABX and double blind testing are specifically designed to help reduce the effect that can have on the results.

And yes, you can only focus on one picture at a time. The "high resolution" portion of your vision is a very small area. Put two side by side 4x6 images on a table in front of you. You must shift focus back and forth to review and compare details. Your brain does this almost instantaneously, giving you the illusion of being able to "see" much more than you actually do. Your ears do the same thing.

It is perfectly true that memory is fallible, and acoustic memory, except for very gross changes, is extremely short.

This is why DBTs are best when using short segments with frequent repeats between A/B, A/X and B/X - but we know that some members here can successfully DBT even small differences between two codecs, such as high quality lossy vs lossless, in some situations with some samples. I can reliably and consistently DBT between two of my CD players; there is nothing magical about this - one is about 1.6 dB louder than the other. I can do this even when I use second-hand recordings of the two as samples, in short segments. 1.6 dB is a lot!
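As an aside, it's easy to see why 1.6 dB is "a lot": dB level differences map to linear amplitude ratios via a simple formula. A small Python sketch (my own, purely illustrative):

```python
from math import log10

def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to a linear amplitude (voltage) ratio."""
    return 10 ** (db / 20)

def amplitude_ratio_to_db(ratio):
    """Inverse conversion: linear amplitude ratio to dB."""
    return 20 * log10(ratio)

print(round(db_to_amplitude_ratio(1.6), 3))  # 1.202: about 20% higher amplitude
```

A roughly 20% amplitude difference is easily audible, which is also why careful level matching matters so much in comparisons.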

Tom Nousaine tested the difference between long-term and short-term testing: he gave listeners a box which might have had 2.5% distortion added, or might not. The long-term listeners were hopeless at it, scoring 50%. Then he used the same box but with rapid DBT switching, and lo and behold, the difference was much easier to detect. The irony is that many of the hard-core anti-DBT folks maintain with unshakeable fervour that they can remember in great detail the differences between their unmodded and now-modded players, when the modding took 3 weeks and involved sending the player away.
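For readers unfamiliar with the figure, "2.5% distortion" roughly means harmonic content at 2.5% of the fundamental's amplitude. A minimal, hypothetical Python sketch of such a signal (my illustration only; the post does not describe how Nousaine's actual box generated its distortion):

```python
from math import pi, sin

def clean_sample(t, freq=1000.0):
    """A pure 1 kHz sine sample at time t (seconds)."""
    return sin(2 * pi * freq * t)

def distorted_sample(t, freq=1000.0, level=0.025):
    """The same tone plus a second harmonic at 2.5% of the fundamental's
    amplitude - one simple way a device could add ~2.5% harmonic distortion."""
    return sin(2 * pi * freq * t) + level * sin(2 * pi * 2 * freq * t)
```

The two signals differ only slightly in waveform, which is exactly why test sensitivity (rapid switching vs long-term listening) determines whether the difference is detected.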

I can reliably and consistently DBT between two of my CD players; there is nothing magical about this - one is about 1.6 dB louder than the other. I can do this even when I use second-hand recordings of the two as samples, in short segments. 1.6 dB is a lot!

Showing why careful level matching is important. :)

Quote:

Tom Nousaine tested the difference between long-term and short-term testing: he gave listeners a box which might have had 2.5% distortion added, or might not. The long-term listeners were hopeless at it, scoring 50%. Then he used the same box but with rapid DBT switching, and lo and behold, the difference was much easier to detect. The irony is that many of the hard-core anti-DBT folks maintain with unshakeable fervour that they can remember in great detail the differences between their unmodded and now-modded players, when the modding took 3 weeks and involved sending the player away.

Exactly. The reviews from the golden-ear crowd based on week- or months-old comparisons of products (or even decades old - I've actually seen "compared with the X turntable I sold 15 years ago..." offered in earnest), system changes and mods make me want to cry.

Indeed - if you're arguing that the human memory is no good at remembering sound quality difference over the tiny timescales involved in ABX testing, then you might be right, but you're also saying that there is no possible way that someone can tell a difference between two interconnects when in between listening, they've gone behind their amp, unplugged the old one, and plugged the new one in. If the only way you can tell a difference is with one in one ear and one in the other, then surely in order to appreciate that difference in your system you'd need to do that at all times. Otherwise you might turn on your music and love the sound, without ever realising that someone swapped your expensive kit for cheap stuff. What with your memory being so unreliable and all.

This happens a lot with arguments against ABX tests. You pick them apart into tiny, tiny pieces, without realising that as you do so, you are proving that if the difference cannot be detected in this situation, it probably can't be heard in the more informal situation where placebo and bias aren't eliminated. And that if the difference does exist, it doesn't matter.

Same as when people say, "sure, you can't hear the difference, but you can perceive it". Right, but if you still can't tell the difference when you don't know which is which, how important is that "perceived" (or more correctly, "not perceived") difference anyway? I mean, are you really willing to spend thousands of dollars on stuff whose presence in your system you can't actually detect without looking? Well, yes, obviously you are. But you might be being silly.

Back in the 90's, Pepsi ran a marketing campaign called the Pepsi Taste Challenge. Time after time, in blind tests, random test subjects consistently chose Pepsi over Coke; so much so that Coke introduced a new flavour. The "New Coke" flopped. The reason? What tastes good with one sip may not be enjoyable by the time you're halfway through a can.

Coke followed up the flop of "New Coke" by doing whole can tests - giving people the full serve rather than just sips. Overwhelmingly, people preferred Coke by the time they drank a significant amount.

I don't want to start a Coke vs Pepsi war here, that's not the point of the story. The point of the story is that what is enjoyable when we flick between sources (sound, taste, texture, any sensory input) may be quite different to what is enjoyable when we stick with that same source for a period of time.

In auditioning, this is also true. If you listen to a high quality mid-focussed speaker/headphone and then switch to a high quality bright, sparkly speaker/headphone, you will probably find one seems more enjoyable to you (based on your tastes), but if you stick with one or the other for a period of time, you will likely learn to enjoy what it offers. In other words, A/B testing is very valuable to determine differences, but it may not be a good/reliable reference for determining true preferences. (E.g. a bright, sparkly headphone may seem amazingly revealing and resolving in a quick A/B, but may become more fatiguing with prolonged listening)

To me, it's all about balancing what tests you use for what ends. Lossless vs lossy is definitely about differences, but it's also about storage capacity for many people. Long term testing will show you if you can still enjoy lossy formats in "real life" even though you may know (from A/B testing) that there is a difference.

I don't want to start a Coke vs Pepsi war here, that's not the point of the story. The point of the story is that what is enjoyable when we flick between sources (sound, taste, texture, any sensory input) may be quite different to what is enjoyable when we stick with that same source for a period of time.

The issue at hand is not one of preference. It is one of difference. If people can't reliably hear a difference (any difference, preferable or otherwise) with fast switching, there's no reason they'd reliably hear one with long tests. Likewise, if we assume people can't tell a difference between Coke and Pepsi with small sips, there's no reason they're going to tell a difference drinking whole cans.

Back in the 90's, Pepsi ran a marketing campaign called the Pepsi Taste Challenge. Time after time, in blind tests, random test subjects consistently chose Pepsi over Coke; so much so that Coke introduced a new flavour. The "New Coke" flopped. The reason? What tastes good with one sip may not be enjoyable by the time you're halfway through a can.

Coke followed up the flop of "New Coke" by doing whole can tests - giving people the full serve rather than just sips. Overwhelmingly, people preferred Coke by the time they drank a significant amount.

Wrong. On a couple of counts, really. The "New Coke" experiment was in the mid-1980s, for one. And the root of the whole "New Coke" failure was the Coke people making the assumption that soda drinkers made their purchasing decisions based on taste alone. The outcome of the Pepsi Challenge was perfectly valid, and there were never any valid blind tests performed that proved that Coke would somehow win the battle were the quantity consumed greater. In a blind test, people preferred Pepsi, though the instant the test cups were labeled, Coke would suddenly seize the day. (Malcolm Gladwell at one point theorized that Coke might have won the blind challenge were people given whole cans to drink, but that was never put to the test. He's kind of infamous for thinking up untested theories posing as simple answers to grand questions.)

People didn't, and still don't make purchasing decisions based on taste. People had a natural affinity for the Coke brand that made the whole Coke experience subjectively more enjoyable than that of Pepsi. This translated to how the drink "tasted" as part of a whole soda-drinking experience. Coke changing the formula changed the experience in subtle ways that added unfamiliarity, which made people uneasy. It should be noted that when "New Coke" first came out, the Coke people touted their own blind tests which showed that the new formula bested Pepsi in blind tests. Again, this result is valid, but it doesn't capture the whole soda-consuming experience.

The whole story of the Pepsi Challenge is a fascinating case study in how brand preference and expectation bias shapes our tastes and preferences. It in no way invalidates the blind testing method. If anything, it reaffirms the power of suggestion.

The issue at hand is not one of preference. It is one of difference. If people can't reliably hear a difference (any difference, preferable or otherwise) with fast switching, there's no reason they'd reliably hear one with long tests. Likewise, if we assume people can't tell a difference between Coke and Pepsi with small sips, there's no reason they're going to tell a difference drinking whole cans.

The issue at hand is one of the arguments against double-blind testing and this is predominantly being discussed as a "switch-at-will" back and forth comparison to minimise the limitations of our memory. With that in mind, an argument against this style of testing is that it removes the influence of long-term preference versus instant reaction. My Shure 535 Ltds sounded awful the other day after listening to some cheap Audio Technicas for a moderate period of time. In an instant I would have said the 535s sounded bad, but I quickly adjusted back to the 535s as the superior sounding phone.

The study showed that people could tell the difference and that people's rapid reaction was favourable to Pepsi, but the proposed suggestion (danroche may be accurate about it never being formally tested, I can't be sure) was that after drinking a larger quantity (e.g. >50% of a can), the Coke would become favourable due to its different flavour profile.

Of course, if you're looking at lossless vs lossy formats, only double blind is necessary because long term perception is irrelevant. If you were blind testing speakers though it would be naive to go on first impressions alone because there are subtleties in any sensory experience that may not present until you relax and get used to a different style of sensory input.

Quote:

Originally Posted by danroche

Wrong. On a couple of counts, really. The "New Coke" experiment was in the mid-1980s, for one. And the root of the whole "New Coke" failure was the Coke people making the assumption that soda drinkers made their purchasing decisions based on taste alone. The outcome of the Pepsi Challenge was perfectly valid, and there were never any valid blind tests performed that proved that Coke would somehow win the battle were the quantity consumed greater. In a blind test, people preferred Pepsi, though the instant the test cups were labeled, Coke would suddenly seize the day. (Malcolm Gladwell at one point theorized that Coke might have won the blind challenge were people given whole cans to drink, but that was never put to the test. He's kind of infamous for thinking up untested theories posing as simple answers to grand questions.)

People didn't, and still don't make purchasing decisions based on taste. People had a natural affinity for the Coke brand that made the whole Coke experience subjectively more enjoyable than that of Pepsi. This translated to how the drink "tasted" as part of a whole soda-drinking experience. Coke changing the formula changed the experience in subtle ways that added unfamiliarity, which made people uneasy. It should be noted that when "New Coke" first came out, the Coke people touted their own blind tests which showed that the new formula bested Pepsi in blind tests. Again, this result is valid, but it doesn't capture the whole soda-consuming experience.

The whole story of the Pepsi Challenge is a fascinating case study in how brand preference and expectation bias shapes our tastes and preferences. It in no way invalidates the blind testing method. If anything, it reaffirms the power of suggestion.

Sorry about the date error, but it's completely irrelevant to the point.

Completely agree about the brand/packaging/appearance influence on perception. That's an argument for double-blind testing, but this thread is about the arguments against double-blind testing and a key argument against it is the limitation of using a back-and-forth switching approach without taking time to consider the influence of "brain burn-in" and the long term effects of a particular sound signature (for example).

If you look at my post, I clearly said that there are truths both ways. Rather than looking for all of the errors, think about the benefits of all methods. As I said:

Quote:

To me, it's all about balancing what tests you use for what ends. Lossless vs lossy is definitely about differences, but it's also about storage capacity for many people. Long term testing will show you if you can still enjoy lossy formats in "real life" even though you may know (from A/B testing) that there is a difference.

Sometimes you need to know exactly what you're listening to in order to make a fully informed decision. That doesn't mean you shouldn't also do blind tests - just use what you need. There's no need to focus on right and wrong, because there's no such thing in every circumstance.

Thank you, for even considering that I might be right. Most people here don’t want to even consider that by depending on their memory they actually invalidate their test results.

“Indeed - if you're arguing that the human memory is no good at remembering sound quality difference over the tiny timescales involved in ABX testing, then you might be right,”

You have a point. However, I was very clear in what I stated. We only bother to do these tests on things we are not sure about (a 10%-20% gain in audio quality, for example), not on differences that are obvious. If someone replaced your $1k+ CIEMs with a $5 pair, you would definitely notice even if you were blinded, and you wouldn't need to do a test to confirm it. You also wouldn't need a test to see if you feel pain every time someone pinches you.

Quote:

Originally Posted by jumblejumble

Indeed - if you're arguing that the human memory is no good at remembering sound quality difference over the tiny timescales involved in ABX testing, then you might be right, but you're also saying that there is no possible way that someone can tell a difference between two interconnects when in between listening, they've gone behind their amp, unplugged the old one, and plugged the new one in. If the only way you can tell a difference is with one in one ear and one in the other, then surely in order to appreciate that difference in your system you'd need to do that at all times. Otherwise you might turn on your music and love the sound, without ever realising that someone swapped your expensive kit for cheap stuff. What with your memory being so unreliable and all.

You may be correct, but this definitely does not belong in "What are the arguments against double blind tests (incl. ABX)?" Let me put it this way: if you listen to something, and then listen to it again while watching me munch on potato chips, the two will sound different to you even if you can't hear me. They sound different because your brain, believe it or not, doesn't have infinite resources, and for a normal human being, the brain allocates most of its resources to processing what you see. The same is true when it comes to short- and long-term memory. You can easily test this by trying to remember what you heard and saw today.

If any scientist declared that the results of a test depending on human memory were absolute fact, almost every engineer and scientist out there would laugh at him or her and probably think very little of them. Yet, people on this site keep insisting that using short-term or long-term memory is still a valid scientific approach. I really have nothing else to add except "whatever".

Quote:

Originally Posted by jumblejumble

This happens a lot with arguments against ABX tests. You pick them apart into tiny, tiny pieces, without realising that as you do so, you are proving that if the difference cannot be detected in this situation, it probably can't be heard in the more informal situation where placebo and bias aren't eliminated. And that if the difference does exist, it doesn't matter.

Same as when people say, "sure, you can't hear the difference, but you can perceive it". Right, but if you still can't tell the difference when you don't know which is which, how important is that "perceived" (or more correctly, "not perceived") difference anyway? I mean, are you really willing to spend thousands of dollars on stuff whose presence in your system you can't actually detect without looking? Well, yes, obviously you are. But you might be being silly.

It doesn't matter if you can perceive a difference or not. The main argument here is that any test that depends on short-term memory, even if only a second has passed, is not a valid scientific test. Thus, findings from such tests should not be taken seriously.

Quote:

Originally Posted by jumblejumble

Same as when people say, "sure, you can't hear the difference, but you can perceive it". Right, but if you still can't tell the difference when you don't know which is which, how important is that "perceived" (or more correctly, "not perceived") difference anyway? I mean, are you really willing to spend thousands of dollars on stuff whose presence in your system you can't actually detect without looking? Well, yes, obviously you are. But you might be being silly.

If you can't tell the difference over the extremely short timespan of an ABX test (rapidly switching back and forth), then you can't tell the difference period. If you're right, what could possibly validate the claims that something makes a difference (if you can even tell in an ABX test)?

In other words, wouldn't you say that an ABX test (which suffers the least from memory-induced limitations) is superior to any other kind of comparison?

The issue at hand is one of the arguments against double-blind testing and this is predominantly being discussed as a "switch-at-will" back and forth comparison to minimise the limitations of our memory. With that in mind, an argument against this style of testing is that it removes the influence of long-term preference versus instant reaction. My Shure 535 Ltds sounded awful the other day after listening to some cheap Audio Technicas for a moderate period of time. In an instant I would have said the 535s sounded bad, but I quickly adjusted back to the 535s as the superior sounding phone.

The study showed that people could tell the difference and that people's rapid reaction was favourable to Pepsi, but the proposed suggestion (danroche may be accurate about it never being formally tested, I can't be sure) was that after drinking a larger quantity (e.g. >50% of a can), the Coke would become favourable due to its different flavour profile.

Once again, we aren't testing preference. We aren't asking "Which do you like more?" That alone introduces so many biases that the results would be meaningless beyond "X% of the people tested prefer Y". It tells us nothing about the magnitude of the difference, which is what we want to know. ABX tests, as they're used to test amps and cables, etc., are about identifying any difference. In this case - and you admit as much yourself with your mention of the Shures and Audio-Technicas - short-term tests are more likely to reveal differences. We aren't talking about "bad" vs. "good"; we are talking about "Are they different at all?"

Your switching between the Shures and Audio-Technica revealed a difference. That's all we want to know from an ABX. We don't want to know which one you like more. We just want to know if, given a third somehow unidentifiable headphone of one of the brands, could you name which brand it is just by listening?

As such your Coke/Pepsi analogy is flawed because there is a clear quantifiable difference between the two, and the test was of preference and not the identification of that difference. The purpose of the test is entirely different. You can't use problems with choosing preference to argue against fast switching ABX, because that's not the point of such a test. That's a strawman.