After encoding, the files were decoded using the same program that encoded them and burned in random order to an audio CD. The CD also contained the original uncompressed wave and the decoded FLAC for reference. The track order and relevant info were kept on a separate piece of paper.

We listened to Q0, 64kbps, 128kbps Mp3 and the Mp3 Vbr medium, but seconds into the tracks we determined it was a waste of time. The quality degradation was too obvious and, in our opinion, these settings are not worth using for high quality audio. This left 9 tracks to compare (Wave and FLAC not counted).

The Testing Stages

The test was taken in three stages. In the first stage the uncompressed wave was tested against the original CD. During the second stage the uncompressed wave was used as a reference and played on request of the listener to do a back to back comparison of a compressed track. In this stage the listener was informed that the reference track was playing. Never did the listener know which type of compression was used on the test track.

The reason for this is simple. We had too many tracks to compare, and after switching several times the listener becomes fatigued. Even the best trained ear will become confused. Your brain will start to ‘correct’ what you hear, much like with vision: put on glasses that turn everything upside down, and after a while your brain will correct the image and turn it back. Similar things happen with hearing. Therefore, a reference point is needed for the listener during the second stage.

During the third stage, the best Ogg and Mp3 tracks are played along with the reference track. This time, however, the listener does not know which track is which. He is left completely in the dark to determine which track sounds best.

During every stage, the display of the CD player is not visible to the listener, and the player is controlled by a person not taking part in the test.

Also, each listener was tested separately, so the listeners could not influence each other.

Stage 1, comparing the uncompressed wave to the original CD

I am glad I did not skip this step, because I had taken for granted that the uncompressed wave and FLAC tracks on my CD-R were identical to the original. It was almost impossible to believe and a great disappointment for me, but it was confirmed by all listeners during several blind tests: the uncompressed wave and the FLAC track on the CD-R did NOT sound identical to the original CD. This was a great setback, as I hadn’t expected it. I attempted to do binary comparisons of the tracks, but we had neither the tools nor the time to get into this. I had three guys breathing down my neck who wanted to listen to music, not watch me operate a mouse and keyboard. So we went back to the sofa to do another test to determine if we could continue with the reference track as it was.

One of the guys reminded us that the laser in the CD player had been replaced several weeks before. Knowing a thing or two about CD players, he explained that the laser might not be focused correctly and therefore might not be reading the CD-R correctly. The player’s error correction would be unable to keep up with the errors, resulting in the audible difference. We also noted that after the CD-R had been playing for a couple of minutes, we could hear static artifacts, as if it was skipping.

The player was taken apart and we experimented with the focus of the laser. We realize this can’t be done optimally without the proper tools, but after some experimenting we found a good setting to continue with. The static was gone and the listeners agreed that the reference track sounded identical to the original CD, so we could continue the test.

The Second Stage

What I found most convincing about this test is that throughout this stage, the results were very consistent among the listeners.

Track 7 (Mp3 Vbr Extreme): near identical to the original; less dynamic; slightly sticky; different but hard to define; less open and vague positioning; high sounds artificial.

Track 8 (Mp3 Cbr 192): missing detail in low and high; image is flat; sticky; very messy; midrange empty. Considered similar to 1 and 6.

Track 9 (Mp3 Cbr 320): near identical to original; slightly uncontrolled; compressed. Considered less than 7.

Mp3 track 7 and Ogg track 5 were selected for the blind test in Stage Three.

The Third Stage

This didn’t take long. The Mp3 track was consistently identified and the Ogg track was considered identical to the original.

Because of the interesting results with the Ogg tracks, the listeners agreed to test Ogg tracks again against the original reference, in a blind test. Q8 and Q10 were considered identical to the original and interestingly, Q4 was considered better than Q6.

What is also interesting to note is that Mp3 was consistently considered less dynamic than Ogg, and to sound messy and uncontrolled. The high was consistently described as ‘artificial’.

This is one reason that justifies the need for ABX tests (and not just a single blind comparison). It's a bit hard to believe that the listeners really heard a difference at -q6 if they totally missed the existing ones at -q4. The probability of picking the right one by chance was 50%.

Did you read Pio2001's explanation of ABX tests? It's very interesting and it should help to understand why the procedure used in this test can't lead to valid conclusions. And some conclusions are indeed questionable. The encoding process doesn't usually reduce dynamics (not at high bitrates at least), and while a loss in high frequencies can be audible, I have never heard serious reports about issues in low frequencies (see the comment on file #8, CBR@192: "missing detail in low and high").
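
To put a number on the 50% remark: a single blind pick between two tracks is right half the time by pure luck, which is why ABX tests repeat the trial many times. The following is only a minimal sketch of the usual one-sided binomial calculation; the function name `abx_p_value` and the example score of 12 out of 16 are illustrative assumptions, not results from this test.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by guessing.

    Assumes every trial is an independent coin flip (p = 0.5), which is
    what you would expect if the listener hears no difference at all.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count all outcomes with `correct` or more hits, then divide by
    # the total number of equally likely outcomes (2 ** trials).
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# One blind comparison, one correct pick: no better than a coin flip.
print(abx_p_value(1, 1))

# Hypothetical score of 12 correct out of 16 ABX trials.
print(round(abx_p_value(12, 16), 4))
```

The smaller the returned value, the less plausible it is that the score came from guessing. A single correct pick can never get below 0.5, no matter how obvious the difference seemed to the listener.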

I hadn't read Pio's post before; it's interesting indeed. We did a similar thing, though not as scientifically: a matrix sheet was used to keep track of the results. In the case of Q4, we switched back to that track several times from the original and from the Q6/Q8 tracks to cross-test the results.

On 'dynamic', it's hard to translate, and different listeners will use different descriptions for the same issue. In this case, dynamics could be described as a combination of liveliness and positioning. Similarly, one of the listeners kept calling the high in the mp3s 'artificial'. It's not my choice of wording; other listeners noted the high was 'sticky'. It's hard to say, but maybe they meant the same thing. It always remains subject to interpretation.

I realize this makes testing questionable, but this was of course never meant to be a 'definitive' or 'exhaustive' test. We are simply a bunch of audiophiles who wanted to do these tests. They should be questioned as we are not an authority on the matter. I guess every one should do his own testing and if you like, keep these results in mind.

I agree. Hearing a difference is one thing; analysing this difference is a completely different story: it's rarely obvious. I have used the same terms myself to describe problems I was unable to describe more precisely (and my English doesn't help). What really matters isn't the words describing the difference but the reality of this difference. That's where ABX scores are really needed.

Of course. And I thank you for sharing your results! Both testing and publishing are time consuming, and even if your test should be performed a second time, this effort still counts as a contribution to the community.

It is rather surprising that someone who has "been a lurker for some time" and is interested in testing the Ogg Vorbis audio codec is not aware of aoTuV.

Also, a decoded FLAC file is the same as the original wave file. That can be easily verified by bit comparing the tracks. Two identical wave files cannot sound different. It is completely unnecessary to include a decompressed FLAC. Whether a wave file burned to an audio CD sounds the same as the original audio CD can be debated if a faulty device is involved, as in your case.

The "test" report as described is nonsense. The test cannot be reproduced and verified by following your description, even if the same equipment were available. It lacks proper notes about each listening stage.

The fact that you state the FLAC and WAVE burnt to CDR were distinguishable from the original CD surely invalidates all the further tests you have done?? They will be exactly the same unless you did something wrong or the hardware is ****!? If this was the case you shouldn't have carried on with any of the tests, and should have looked to correct your error or replace the hardware with something better/more reliable/fixed.
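
The bit comparison mentioned above can be sketched with nothing but Python's standard `wave` module. This is only an illustration under stated assumptions: it handles plain PCM WAV files, the function name `same_pcm` is made up for this example, and it compares the decoded audio frames rather than the raw files, since two WAV files can carry different header chunks and still hold identical audio.

```python
import wave


def same_pcm(path_a: str, path_b: str) -> bool:
    """Return True if two PCM WAV files contain identical audio data."""
    with wave.open(path_a, "rb") as a, wave.open(path_b, "rb") as b:
        # Channels, sample width, sample rate and length must all match
        # before comparing the samples themselves makes any sense.
        format_a = (a.getnchannels(), a.getsampwidth(),
                    a.getframerate(), a.getnframes())
        format_b = (b.getnchannels(), b.getsampwidth(),
                    b.getframerate(), b.getnframes())
        if format_a != format_b:
            return False
        # Reads both tracks fully into memory, which is fine for a
        # CD-length track but not meant for very large files.
        return a.readframes(a.getnframes()) == b.readframes(b.getnframes())
```

With hypothetical file names, `same_pcm("original.wav", "decoded_from_flac.wav")` returning True would confirm that the decoded FLAC and the original wave hold the same samples. It says nothing about what a CD player does when reading the burned disc.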

One: double blind test. This test of yours is not even interesting except to see how much time can be wasted to develop invalid conclusions. Quite a bit it seems.

Second, I have a feeling you can also hear differences in speaker wire, but not in a double blind test of speaker wire. Maggies are nice, for sure. You have an interesting system, but without a valid testing protocol it seems you are just reinforcing your prejudices.

Let us know how you do when an independent party puts you through your paces in a true double blind test.

A few things I have learned in my life are to never trust anyone who has expensive equipment and golden ears, and here I learned that I don't have a pair of them, either. Do an ABX test and post the results, please. This testing, where you could play the original on demand, is worth nothing, I'm afraid. OF COURSE you can hear a difference then.

the uncompressed wave and the FLAC track on the CD-R did NOT sound identical to the original CD.

I guess it could be interpreted in two ways, but the FLAC and WAVE (which sounded the same) did not sound identical to the original CD. I think I gave a valid description of why that was. I also believe we solved the issue. So, it's NOT that the FLAC and the WAVE on the CD-R sounded different from each other.

Having been a lurker, it may surprise you that I don't know what aoTuV is, but I did know I was in for some harsh comments. It's good, and I don't take offense. I only think it's good that everyone is critical and makes up his own mind. And if you think the test is nonsense, move on and read something else.

QUOTE

This testing, where you could play the original on demand, is worth nothing, I'm afraid. OF COURSE you can hear a difference then.

Uhm, don't forget the third stage. We understand that of course, but you can't ignore listening fatigue either. Hence the third stage.

Maybe, one day we will do ABX testing. But I'm afraid it's not going to make much difference, because you can't satisfy everyone. There are always gonna be people who will find fault in what you are doing. I know this for a fact and I didn't do the tests to convince YOU. I know what I heard and how we cross-tested and I trust that we tested sufficiently.

Firstly, thanks for doing a test, it's further than most people get. Unfortunately, your flawed test protocol makes the results of your test suspect. In particular, your methodology opens your results up to the ravages of confirmation bias. I would really encourage you to do another test with a more solid protocol and post your results.

I know what I heard and how we cross-tested and I trust that we tested sufficiently.

As soon as you posted here, however, you became subject to TOS 8:

QUOTE

All members that put forth a statement concerning subjective sound quality, must -- to the best of their ability -- provide objective support for their claims. Acceptable means of support are double blind listening tests (ABX or ABC/HR) demonstrating that the member can discern a difference perceptually, together with a test sample to allow others to reproduce their findings. Graphs, non-blind listening tests, waveform difference comparisons, and so on, are not acceptable means of providing support.

This rule is the very core of Hydrogenaudio, so it is very important that you follow it.

Just a minor thing. You are aware that a stereo sound is technically always 1D (left-right) but your mind can be tricked into thinking something is 2D (left-right+near-far = depth)? A lack of depth means it's 1D because 2D introduces depth and 3D (left-right+near-far+above-below) introduces height.

All members that put forth a statement concerning subjective sound quality, must -- to the best of their ability -- provide objective support for their claims. Acceptable means of support are double blind listening tests (ABX or ABC/HR) demonstrating that the member can discern a difference perceptually, together with a test sample to allow others to reproduce their findings. Graphs, non-blind listening tests, waveform difference comparisons, and so on, are not acceptable means of providing support.

This rule is the very core of Hydrogenaudio, so it is very important that you follow it.

Hehe ... fair enough. Can't really argue with that.

I will try and do the test again, because first and foremost I wanted to get the issue clear for myself. If indeed our testing was flawed, we are fooling ourselves.

Having said that, don't hold your breath, because we live 2.5 hours apart from each other and have full time jobs. In the meantime I will be reading up on how to actually do ABX testing.

During the second stage the uncompressed wave was used as a reference and played on request of the listener to do a back to back comparison of a compressed track. In this stage the listener was informed that the reference track was playing. Never did the listener know which type of compression was used on the test track.

For me, this sounds enough like a blind test (ABX, ABC-HR?). We can argue that it is not double-blind, but it is definitely *NOT* something to throw TOS#8 at.

Oh.. and for those that just stopped at "the uncompressed wave and the FLAC track on the CD-R did NOT sound identical to the original CD" ... Well.. go do something else, if you can't bother reading.

In ABC-HR you have a choice of three signals, R, A and B. R is the reference and you know it is the reference. For A and B, one of them is the reference and the other the processed one but you don't know which is which.

From what I understood, the listener knew at all times if he was listening to the reference or the compressed one, which leads to placebo effect.
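
The blind assignment described above for ABC-HR can be sketched in a few lines. This is a toy illustration, not how any particular ABC/HR tool is implemented: the names `make_abchr_trial` and `picked_processed` are hypothetical, and the sketch covers only the hidden assignment of the reference, not any rating step.

```python
import random


def make_abchr_trial(rng: random.Random) -> dict:
    """Build one trial: R is the known reference, and the reference is
    also hidden behind either A or B, chosen at random."""
    hidden_reference = rng.choice(["A", "B"])
    processed = "B" if hidden_reference == "A" else "A"
    return {
        "R": "reference",               # the listener knows this one
        hidden_reference: "reference",  # hidden copy of the reference
        processed: "processed",         # the encoded track under test
    }


def picked_processed(trial: dict, answer: str) -> bool:
    """True if the listener's answer ("A" or "B") is the processed track."""
    return trial[answer] == "processed"


# The assignment stays with the test operator (or the software);
# the listener only ever sees the labels R, A and B.
trial = make_abchr_trial(random.Random())
print(sorted(trial))
```

The point of the design is that the listener never learns from a label which of A and B is the encoded track, so expectation cannot steer the answer.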

Well, what Jaz pointed out is that the sentence was part of an anecdote by the OP about a faulty CD player. The OP went on to describe what they did next, and in the end all the testers agreed that the uncompressed wave and FLAC sounded exactly the same as the original CD.

Moral of the story of Jaz's posting: please read any post to the end, so you can be sure that the OP is not committing the unforgivable sin.

Mmmm.. well.. now that I read it.. it might not be ABC-HR, since the reference was always indicated, so the comparison was a direct AB. Yet the described procedure is similar, and they didn't know which codec/setting was which. ABC-HR is more than that, anyway.

Well, a faulty CD player "fixed" to a level of transparency agreed by the testers, but possibly/probably still faulty!

I have dealt with a few CD players that had a weakening lens or were out of focus, and they generally present seek/skipping problems, but I have never noticed an audible degradation in any other way.

The fact that they "fixed" it without testing it properly for BER etc. could mean it was still faulty, but below the listeners' perception.

This could have had a detrimental effect on the other lossy tests, or rather may have accentuated the originally perceived effect.