10 November, 2007, 12:59:47 PM

I recently did some ABX tests of music encoded in FLAC levels 0-8.

To my surprise I could tell the difference more often than not.

Yet I couldn't help but wonder, what does this really prove?

It seems that if an audible difference exists between two tracks, that difference wouldn't necessarily indicate anything as to which track is of higher quality, but rather, that those two tracks are merely different from one another...

Well, if you really did hear a difference between these files, then either you have a strong imagination or your ABX test was not blind. Either way, don't generalize from your personal test to ABX tests as a whole. What an ABX test can prove always depends on the specific setup.

An ABX test tells you how likely it is that you can tell the difference between two tracks. If by "more often than not" you mean that you got it right more than 50% of the time, that doesn't prove anything. If you get significantly above 50% over a large number of trials (more than 20) and it's repeatable then that might imply that there is something wrong with the FLAC encoder. Do you have the ABX log saved to post so I can see what's happened?
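To put numbers on "significantly above 50%", here is a minimal Python sketch of the usual way an ABX score is judged: the one-sided binomial probability of doing at least that well by pure guessing. The function name `abx_p_value` is made up for this example, and it assumes every trial is an independent 50/50 guess.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` of `trials` right by coin-flip guessing."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count every outcome with `correct` or more right answers,
    # then divide by the 2**trials equally likely guess sequences.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for correct in (11, 14, 16):
        print(f"{correct}/20 correct: p = {abx_p_value(correct, 20):.4f}")
```

Under these assumptions, 11 of 20 is something a pure guesser manages about 41% of the time, while 16 of 20 happens by chance well under 1% of the time.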

And yes, you're right, ABX can never tell you which is "better", only that there is a difference. So it's only useful for tuning something for transparency or to test for problems with encoding, not really many other things.

No, you're not. If there is a difference, and the only source of difference is the encoding (because we encoded the same track), then it must be the encoding. And we "know" (this isn't necessarily true) that encodings have a (mathematical) scale of quality. Therefore the lower-quality one must be "worse".

No?

If you had a poorly tuned 320kbps MP3 and a well-tuned VBR MP3 from a different encoder and ABXed the two, then even if you could tell a difference, there's not necessarily a way of telling which is better without referring to the original uncompressed version.

There are other tests to rank differences. The point of ABX is to determine if you can distinguish a difference.

This is more along the lines of what I was asking. It seems that ABX only helps you determine whether there is or is not a difference between two tracks encoded from the same source, and not necessarily whether those differences make one track of higher quality than another...

An ABX test cannot prove that there isn't a difference. It only has the ability to demonstrate that the specific person taking the test was able to determine that there was a difference. These are not the same thing.

Last Edit: 10 November, 2007, 01:44:30 PM by greynol

Is 24-bit/192kHz good enough for your lo-fi vinyl, or do you need 32/384?

I concur with the first response you were given, btw. Telling us that you can tell an audible difference between flac compression levels brings your methodology into serious question.

To clarify, I didn't mean to say I could tell the difference between FLAC 0-8 more often than not...

Rather, that of some of the tracks that I tested, which I had previously encoded from the same source (using the same FLAC 1.2.1 codec), difference to my own ear was indicated to me (and only me) by a probability that was statistically significant.

HOWEVER, I did NOT mean to start some sort of argument as to the means by which I performed the tests or for that matter, anything to do with my testing at all (I probably shouldn't have even mentioned those tests to begin with, it was just for context).

I was just trying to frame a much more general question, pertaining to the difference between two tracks sounding different as opposed to one track sounding better than another...

It seems that if an audible difference exists between two tracks, that difference wouldn't necessarily indicate anything as to which track is of higher quality, but rather, that those two tracks are merely different from one another...

This is absolutely true.

But you shouldn't ever hear a difference between two lossless files, because regardless of the type or amount of compression, they are always decoded (bit identical to original) at playback. So if a difference IS audible, it must be due to improper test methodology or buggy software or somesuch reason...
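One way to check the "bit identical" claim yourself is to decode both FLAC files back to WAV (for example with `flac -d`) and compare the raw samples. The following is only a sketch using Python's standard `wave` and `hashlib` modules; the helper names `pcm_digest` and `same_audio` are invented for this example, and it assumes plain PCM WAV files such as those decoded from a CD rip.

```python
import hashlib
import wave


def pcm_digest(path: str) -> str:
    """Return a SHA-256 digest of a WAV file's format and raw sample data."""
    with wave.open(path, "rb") as wav:
        digest = hashlib.sha256()
        # Include the format so that e.g. 16-bit and 24-bit files never match.
        header = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
        digest.update(repr(header).encode("ascii"))
        # Hash only the decoded samples, not any other container metadata.
        digest.update(wav.readframes(wav.getnframes()))
    return digest.hexdigest()


def same_audio(path_a: str, path_b: str) -> bool:
    """True when both WAV files decode to identical PCM audio."""
    return pcm_digest(path_a) == pcm_digest(path_b)
```

Usage would be something like `same_audio("level4.wav", "level8.wav")`, where both file names are hypothetical. If it returns `True`, the compression level cannot be the source of any audible difference.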

difference to my own ear was indicated to me (and only me) by a probability that was statistically significant.

Wait, so which is it?

The ABX tests which I performed were on tracks which were contained within the set: FLAC levels 0-8.

If I performed an ABX test, the files tested were limited to FLAC 1,2,3,4,5,6,7,8.

That does NOT indicate that I tested ALL levels 0-8.

OF THE TRACKS WHICH I DID TEST, I was able to discern a difference more often than not.

Statistically significant is that which is unlikely to occur by chance.

A statistically significant difference in this case merely implies that statistical evidence indicates that a discernible difference has a greater probability of existing than it does a probability of not existing.

Whether that difference is due to errors in encoding, playback, or whatever, is entirely irrelevant.

In the laboratory, and in most audiophile living rooms, people can be psyched into discerning a difference where none exists. The only thing that accounts for that is belief or expectation creating abnormal conditions inside the listener's head. A statistically significant score, even one approaching certainty, does not make any actual difference come into existence.

FLAC encoding/decoding can be shown to produce zero differences, and often has been so shown, both mathematically and empirically. Proper ABX tests disallow any possibility of expectation or belief affecting test scores. With no data differences, and no opportunity for perceptual bias, any indicated difference, no matter what the test score, indicates something wrong, something not normal in the data or the process.

I propose that this guy submit samples and an ABX log indicating that he can distinguish a difference or the thread be closed.

audioslut512: It is universally accepted that foobar2000 is able to perform proper ABX tests. Make sure you use it and spare us the fluffy language about statistical evidence which is ringing quite hollow.

You can't DO that! FLAC is a lossless format. That means the source is NOT degraded, no matter which level of compression you choose. The only differences between the levels of a lossless encoder are file size and encoding/decoding speed.

If you try to ABX these, you are actually comparing the same audio, and that's why you can't distinguish them.

You will usually use ABX tests to determine whether you are able to distinguish a lossy encoded file from the original. It's very useful for making people shut up when they believe they can tell high-bitrate lossy files from the source.

OF THE TRACKS WHICH I DID TEST, I was able to discern a difference more often than not.

Statistically significant is that which is unlikely to occur by chance.

May I ask for your score? « More often than not » is not very precise. I can (and everybody should achieve the same performance) also reach some 55% or 60% correct trials from time to time while ABXing lossless files. It also works with a WAV vs WAV ABX test, and even without listening to anything. You just need to be a bit lucky during the first trials and limit them to ~20...30, not more. The overall score will naturally tend to 50% in the long term, but if you only keep the first ones you may sometimes get a success rate higher than 50% and maybe the illusion of success.

That's why I'd like some more details: what are the scores of all the tests (not only the most favorable session)? Did you fix the number of trials before starting?
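The lucky-streak effect with short sessions can be simulated without listening to anything at all. This is only a rough sketch: the function names are invented for the example, the seed and session counts are arbitrary, and every "answer" is a simulated coin flip.

```python
import random


def guessing_session(trials: int, rng: random.Random) -> int:
    """Number of correct answers from `trials` pure coin-flip guesses."""
    return sum(rng.random() < 0.5 for _ in range(trials))


def share_of_lucky_sessions(sessions: int, trials: int, min_correct: int,
                            rng: random.Random) -> float:
    """Fraction of guessing sessions that reach at least `min_correct` right."""
    lucky = sum(
        guessing_session(trials, rng) >= min_correct
        for _ in range(sessions)
    )
    return lucky / sessions


if __name__ == "__main__":
    rng = random.Random(2007)  # fixed seed so the run is repeatable
    # 12 of 20 and 300 of 500 are both a 60% score.
    short = share_of_lucky_sessions(10_000, 20, 12, rng)
    long_ = share_of_lucky_sessions(2_000, 500, 300, rng)
    print(f"20-trial sessions scoring 60% or better:  {short:.1%}")
    print(f"500-trial sessions scoring 60% or better: {long_:.1%}")
```

Roughly a quarter of 20-trial guessing sessions should reach 60% or better, while almost no 500-trial session does. That is why reporting only a favorable short session says very little.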

I still don't really understand what you did. Could you just post the logs?

I would if I had them. All I did was ABX Radiohead's Kid A, encoded in FLAC level 4 and level 8, both directly from the CD. I matched the correct tracks to one another 9 of the 10 times I ran it. That's all. Maybe I screwed something up while encoding them that caused one to sound different from the other. Ultimately I had no idea which one was which, I just managed to notice the differences and match the tracks accordingly. Hence my post...
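For reference, 9 correct out of 10 is hard to reach by guessing in a single run: 11 of the 1024 possible guess sequences do it, which is about 1.1%. The sketch below works that out and also shows how the odds change if several runs were made and only the best one reported. The function names are made up for this example, and the number of runs is a hypothetical parameter, since the thread does not say how many sessions were actually run.

```python
from math import comb


def chance_of_score(correct: int, trials: int) -> float:
    """Probability of at least `correct` right out of `trials` by guessing."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def chance_of_any_hit(per_run: float, runs: int) -> float:
    """Probability that at least one of `runs` independent guessing runs hits."""
    return 1 - (1 - per_run) ** runs


if __name__ == "__main__":
    single = chance_of_score(9, 10)  # 11 favourable outcomes out of 1024
    print(f"one 10-trial run, 9+ correct by luck: {single:.4f}")
    # The run counts below are hypothetical, not taken from the thread.
    for runs in (1, 5, 10):
        print(f"{runs:2d} run(s): {chance_of_any_hit(single, runs):.4f}")
```

So a single pre-planned 9/10 run would be worth following up with a saved log and a file comparison, whereas one 9/10 picked out of many runs would be much weaker evidence.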