TNT,
This topic was a short-lived one, but it sums up the truth about speakers.

Differences between speakers do exist.
Differences between well-made speakers are often subtle, not gross (Alan's phrase "comparably good" puts it succinctly).
Determining preferences must be done blind.
Price does not equate to better sound.

However, conclusions drawn from some of the earlier notes must be considered carefully.

In reply to:

In such cases, the rankings we gave would seldom differ by more than a fraction--something like 7.9 out of 10 vs. 8.0 out of 10. This was statistically insignificant after many rounds of listening. Often the rankings would shift a bit if we moved into a different seat in the room.

Statistically insignificant within a pool of people, or insignificant across multiple trials with the same individual?
In this case, if an individual consistently ranks speaker A over speaker B, even if only by fractions of whatever scoring scale was used (note the relative-judgement scoring on p. 11 of Dr. Toole's paper linked by JohnK), then that person has still found a speaker they prefer over another. Would it matter, then, if that speaker cost $1000 more?
Those who seek audio nirvana would say no. Those who are budget-conscious, or simply don't care as much, would say yes. Ultimately, a pool of human subjects still carries subjectivity, which scatters the statistical response. The sample would have to be quite large to have reasonable power to establish strong certainties. There are also statistical methods for sorting like-minded individuals into groups. It would be interesting to poke through some of this old data and revisit ideas that may not have been reviewed.
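To make the "consistent individual preference" point concrete, here is a minimal sign-test sketch in Python (the trial counts are hypothetical, not taken from the NRC data): a small score gap like 7.9 vs. 8.0 says little on its own, but a consistent *direction* of preference across repeated blind trials can be tested against chance.

```python
# Sign test: does a listener's consistent preference for speaker A
# over speaker B across repeated blind trials exceed chance?
from math import comb

def sign_test_p(successes: int, trials: int) -> float:
    """One-sided p-value: probability of >= `successes` preferences
    for A in `trials` comparisons under the null of no preference."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2 ** trials

# Preferring A in 9 of 10 rounds is unlikely under the null,
# while 6 of 10 is entirely consistent with coin-flipping.
print(round(sign_test_p(9, 10), 4))   # -> 0.0107
print(round(sign_test_p(6, 10), 4))   # -> 0.377
```

So even when the score margins are tiny, a listener who picks the same speaker round after round has demonstrated a real preference.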

_________________________"Those who preach the myths of audio are ignorant of truth."

I have had only one graduate course in statistical analysis; however, I do know that the standards at the NRC were quite rigorous, and I suspect that Dr. Floyd Toole could mount a strong defense.

I would point out these tests involved, typically, four or five panel members, and tests were repeated in 25-minute sessions over the course of, usually, three days (if we were reviewing four different speakers). This was done so that every speaker was auditioned by each panel member sitting in five different chairs in the room, and each speaker was placed in four different locations to randomize out the effects of room vs. speaker location vagaries as well as seating location.

Note also that these tests were first done in mono with single speakers, which very quickly lets you isolate glitches and non-linearities. When the tests were repeated in stereo, the individual rankings never changed. The absolute scores went up even for bad speakers, but the relative rankings remained the same.

With excellent speakers, the "comparably good" line became common when it became impossible to rank one good speaker ahead of the other because slight personal preferences would change with each track of music.

Home listening tests are rarely, if ever, performed in mono with a single speaker, and even if you do the test blind in stereo, I believe that biases shift because of the shift in soundstage that occurs with stereo tests. It's beyond the capability of enthusiasts to set up turntables that position stereo pairs in identical spots in the room, which is the only fair way to do it. Turning one pair off, getting up, and moving the other pair into the same spot to repeat the test simply doesn't cut it.

I do know that the standards at the NRC were quite rigorous and I suspect that Dr. Floyd Toole could mount a strong defense.

I would point out these tests involved, typically, four or five panel members, and tests were repeated in 25-minute sessions over the course of, usually, three days (if we were reviewing four different speakers)

Alan, an experiment can involve extremely tight protocols, as you describe for those run at the NRC, but a panel of only five people is the weak point of the experiment. Granted, it is difficult to set up large-scale experiments that encompass hundreds of samples (e.g., hundreds of people), but any statistician could easily challenge data based on low numbers, regardless of protocol. I have at least two very strong publications in my files that demonstrate how easily a small sample can lead to incorrect conclusions drawn from accurate results. This is not to say that the past audio science is useless, but each conclusion drawn from it may be interpreted slightly differently, or viewed with exceptions, depending on whether those papers have proven their point convincingly enough to the reader.
Inevitably there are other ways to look at the data: re-examine approaches that may show something different, or tack on a new one. Every scientist knows there is always something more to find in one's datasets; there is never any end to looking them over. This was my point with regard to the grouping idea. Of those five panel reviewers, did two of them consistently provide data showing they prefer heavier-bass {enter other descriptor here} speakers?
Would a sample of 100 people have shown that 20% of the group tends to rate a certain speaker character higher than the rest? Obviously 'neutral' is the term that has come up time and again to describe the overall sound these panels prefer, but there are still variations at a micro scale within that sample. Were any within-group rankings examined? Again, conclusions might not be evident given the small numbers. Still, as an example: person A may have consistently preferred the Paradigm Studio series in their rankings, while person B on the same five-person panel may have consistently picked out the Axiom M60, yet both speakers sound very similar, very neutral. The point is that although both speakers are "comparably similar", a preference is still being made on an individual basis, and, cost aside, many audiophiles will still go with their preference.
Perhaps with a sample of 50 people there would have been a clearer pattern of one group choosing one speaker over another, even if the two were "comparably similar". With a large enough sample, what was previously statistically insignificant may become statistically significant, since larger numbers give the test greater power to detect an effect.
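As a rough illustration of the power argument (the effect size and listener counts here are assumed for the sake of the example, not taken from any actual panel data), a normal-approximation power calculation shows how a modest effect that a five-person panel would usually miss becomes reliably detectable with 50 listeners:

```python
# Approximate power of a one-sided z-test at alpha = 0.05:
# how often would a true preference effect of size d (in standard
# deviations) be detected with n listeners?
from math import erf, sqrt

def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def power(n: int, d: float) -> float:
    """Normal-approximation power for a one-sample test, alpha = 0.05."""
    z_alpha = 1.6449  # one-sided 5% critical value
    return norm_cdf(sqrt(n) * d - z_alpha)

# A modest effect (d = 0.5) is usually missed by a 5-person panel
# but almost always caught with 50 listeners:
print(round(power(5, 0.5), 2))    # roughly 0.30
print(round(power(50, 0.5), 2))   # roughly 0.97
```

In other words, "statistically insignificant with five listeners" is not the same claim as "no effect exists"; the panel may simply have been too small to see it.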

All the listening tests I do at home are in mono with single speakers. The last switching test I did, sometime last year (Tannoys and Axioms), included some pictures of that setup. The variables I worked to keep constant were seating position (a single spot rather than multiple spots around the room), speaker SPL, and speaker position (both pairs were swapped left and right and placed in exact locations using floor markers). It is true I don't have a fancy turntable to set up the stereo switching sessions, so my home-enthusiast attempts are limited in that regard. Perhaps if future NRC research (or Axiom's) expands to include a larger survey, home enthusiasts such as myself could participate. I would be interested in the process.
That being said, I know the A/B switching tests I've done at home are honest and as objective as I can make them. The first Tannoy vs. Axiom test I did had me completely fooled. I could not tell which speaker was playing (12' sitting distance), did not know which one was on the left or right anyway, and simply started listening for the usual characteristics with the test songs. Both speakers were good, with excellent sound reproduction, but they were different. I was certain that the speaker I kept picking as my preference was the M60. It was only after I asked my 'tech' and followed the connection wires myself that I believed the opposite.
No one can convince me that my A/B setup introduces an easy-to-pick bias after that test, although I will likely not be able to convince anyone else unless they try it themselves. After all, I'm only a sample of one.
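For anyone wanting to try a similar home A/B session, here is a small sketch of how an assistant might generate a balanced, randomized playback order so the listener can't infer which speaker is which (the trial count and 'A'/'B' labels are purely illustrative):

```python
# Home blind A/B protocol sketch: the assistant generates a random,
# balanced sequence of trials and keeps it hidden from the listener
# until all rankings are recorded.
import random

def blind_order(trials=10, seed=None):
    """Return a shuffled, balanced list of 'A'/'B' trial labels."""
    rng = random.Random(seed)
    order = ["A"] * (trials // 2) + ["B"] * (trials - trials // 2)
    rng.shuffle(order)
    return order

# The assistant runs this once and follows the printed sequence.
print(blind_order(10, seed=1))
```

Balancing the count of A and B trials and shuffling the order removes the "which one is playing more often" cue; combined with matched SPL and floor markers, it keeps the comparison about as honest as a home setup can be.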

_________________________"Those who preach the myths of audio are ignorant of truth."

You boys are hilarious.
That trademark sounds like it would make a neat book for the non-fiction, abstract scribblings section.

Just to note, if you read carefully, I never refuted any science; rather, I asked more questions about it. I've been doing way too many peer reviews lately. I'm stuck in edit-and-critique mode.
Peter, pull up those pants, pull up that zipper and for the love of St. Patrick, quit showing off!!

_________________________"Those who preach the myths of audio are ignorant of truth."