Double-blind listening tests were performed on six popular circumaural headphones to study the relationship between their perceived sound quality and their acoustical performance. In terms of overall sound quality, the most preferred headphones were perceived to have the most neutral spectral balance with the lowest coloration. When measured on an acoustic coupler, the most preferred headphones produced the smoothest and flattest amplitude response, a response that deviates from the current IEC recommended diffuse-field calibration. The results provide further evidence that the IEC 60268-7 headphone calibration is not optimal for achieving the best sound quality.

------------------------

I'm hoping this will shake up everything we know about headphones and start the path towards better headphone science.

To clarify, it's currently assumed that an equalization curve (IEC 60268-7 or others) should be applied to a headphone's frequency response before interpretation. Innerfidelity.com's measurements are an example. This paper suggests otherwise.
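To make the compensation idea concrete, here is a minimal sketch of how a target curve gets applied to a raw measurement. All of the numbers are hypothetical, and the target values are placeholders, not the actual IEC 60268-7 diffuse-field curve:

```python
import numpy as np

# Hypothetical data: frequencies (Hz), a raw coupler measurement (dB SPL),
# and an assumed diffuse-field target curve (dB) at the same frequencies.
freqs = np.array([100.0, 1000.0, 3000.0, 10000.0])
raw_response_db = np.array([85.0, 90.0, 99.0, 88.0])
diffuse_field_target_db = np.array([0.0, 0.0, 9.0, 2.0])

# Compensation: subtract the target from the raw measurement, so a headphone
# that exactly matches the target plots as a flat line once the overall
# level offset is removed.
compensated_db = raw_response_db - diffuse_field_target_db
compensated_db -= compensated_db.mean()  # normalize overall level

print(compensated_db)
```

The paper's point is about what the target curve should be, not about this arithmetic: whichever target you subtract defines what "flat" looks like on the published graph.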

So you can use more or less whoever you want as listeners. For higher statistical power and greater repeatability, it's probably better to use trained listeners, as suggested above. Then again, that apparently doesn't matter much, so it's not a big deal. If the results are statistically significant, they should hold regardless.

The point here seems to be that the standard (well, IEC standard) diffuse-field equalization for headphones is not what people prefer, so another definition or equalization of "flat" is better than that one.

It's interesting for a few reasons. Important audio measurements were originally derived from listening, not the other way around, yet now measurements trump listening for some odd reason. This is an example of a past derived measurement not being optimal. This measurement is a bit more complex than, for instance, a distortion threshold, but that will also vary by frequency. There's always something the static measurements don't cover when music is playing and everything is happening at the same time. Your ear is a great way to put it all together.


I agree with this. There are always exceptions, but I do believe a majority of folks will get it right if it's presented to them in a meaningful fashion.

When the authors present some kind of new or non-standard metric and plot it**, you need to know the context and what it means—information that is in the paper but not on that graph. What does a 25 on that scale mean, anyway? You see this kind of thing all the time in research papers.

**That's not even to mention the number of papers with new metrics that are pretty much garbage and don't mean anything significant.

What's plotted is each group's F-statistic as a percentage of the trained group's F-statistic (so by definition the trained group, being compared to itself, gets 100). For those who need a quick primer: in an ANalysis Of VAriance (ANOVA), a higher F-statistic means the result is more statistically significant. The statistic is, very roughly speaking, a ratio between (1) how far the responses deviate from the mean due to the treatment effect (e.g., ratings of sound quality as you switch from one speaker to another) and (2) how far the responses deviate from the mean due to chance and the randomness of the results. Suffice to say, that difference is huge, but an intuitive reading of 30 vs. 100 on that graph may not really correspond to the actual data being shown.
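The F-statistic comparison can be sketched with made-up ratings. All of the numbers below are hypothetical and exist only to show the mechanics: trained listeners give tight, repeatable ratings (small within-group variance), so the same preference gap produces a much larger F:

```python
from scipy.stats import f_oneway

# Hypothetical quality ratings (0-10 scale) for two headphones.
# Trained listeners: consistent ratings -> small within-group variance.
trained_hp_a = [7.0, 7.5, 7.2, 7.4]
trained_hp_b = [4.0, 4.3, 4.1, 4.2]

# Untrained listeners: similar average preference, but much noisier ratings.
untrained_hp_a = [8.5, 5.0, 9.0, 6.5]
untrained_hp_b = [6.0, 2.5, 5.5, 3.0]

# One-way ANOVA: F = (between-group variance) / (within-group variance).
f_trained, _ = f_oneway(trained_hp_a, trained_hp_b)
f_untrained, _ = f_oneway(untrained_hp_a, untrained_hp_b)

# The graph in question plots each group's F as a percentage of the
# trained group's F, so the trained group sits at 100 by construction.
pct = 100.0 * f_untrained / f_trained
print(f"F trained = {f_trained:.1f}, F untrained = {f_untrained:.1f}, "
      f"untrained as % of trained = {pct:.1f}")
```

Note how the rating *noise*, not the rating *gap*, drives the difference: both groups prefer headphone A by roughly three points, but the untrained group's scatter inflates the denominator of F.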

More or less, what it's saying is that people's responses to the same stimulus are inconsistent; with training, they become significantly more consistent.