It may be interesting to observe how the performance of the various formats has evolved over the collective tests performed so far. A rigorous comparison wouldn't make sense (the samples and listeners aren't the same), but some tendencies may appear.

Summer 2003:

Spring 2004:

Beginning 2006:

• AAC (iTunes): it was always excellent, but not systematically on top. After the second test, iTunes appeared definitely inferior to the young aoTuV encoder for this group of listeners. Now iTunes is back on top; it really progressed between the 2nd and the 3rd listening tests.

• MP3 (LAME): by far the most noticeable progress. In the first test, MP3 (3.90.3 ABR) appeared definitely outdated. Like many people on this board, I thought that it was just MP3 and couldn't compete with the other formats. In the second test, LAME (3.95 VBR) made the big jump and sounded competitive with iTunes, but worse than aoTuV. Now LAME (3.97 VBR) sounds competitive with all the other contenders, not only in quality but also in speed (recall that --vbr-new was tested for the first time). LAME progressed greatly between the 1st and 2nd tests, and continued to close the gap with the other contenders between the 2nd and the 3rd.

• Vorbis: half-disappointing in 2003 with the official, stagnant encoder, this format emerged as a big champion once resurrected by Aoyumi, and also thanks to QuantumKnot and Nyaochi. The beginning of 2004 was very fertile for the format. Vorbis is still on top, and it shows the biggest progression since the first test.

• WMAPro: this format was absent from one test (the 2nd). In 2003, WMAPro (9.0 ABR) was on top, with the same mark as Vorbis and a slightly lower one than iTunes. In 2006, 9.1 VBR appears fully competitive with both iTunes and aoTuV, and of course with LAME MP3 as well. WMAPro quietly progressed along with all the other contenders.

It's usually very hard to detect an audible improvement when a new encoder is released: the changes are often very subtle, and thus neither really enjoyable nor even convincing. But version after version, one small step after another, the difference becomes more obvious. The evolution of the listening tests apparently illustrates this very well. Progress is made not only at ultra-low bitrates but also at higher ones. At ~130 kbps, quality has reached a level we never heard in the past. I must also confess that I'm amazed by the perceived quality of these encoders. They're reaching transparency on more and more material (count the total number of 5.0 scores in my full test...) and for more and more people. When HA.org was founded in 2001, such quality at this bitrate was only a dream; four years later it has become our reality. I take my hat off to all the developers.

Thanks for working on the test, Sebastian. As for me, I can hardly ABX most modern codecs at this bitrate from the original, and I cannot ABX one codec from another. So it really seems that it is time to lower the bitrate a bit in such public tests.
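As an aside, "can hardly ABX" is quantifiable: an ABX run is usually judged with a one-sided binomial test against pure guessing. A minimal sketch of that calculation (the 16-trial count and the 0.05 threshold below are common conventions chosen for illustration, not figures from this test):

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials`
    ABX trials by guessing alone (binomial with p = 0.5)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

# e.g. 13 correct out of 16 trials:
p = abx_p_value(13, 16)
print(f"p = {p:.4f}")  # ~0.0106, below the usual 0.05 threshold
```

A score whose p-value stays well above that threshold, as with transparent codecs at this bitrate, is statistically indistinguishable from guessing.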

By the way... people who did not enter their (nick)name in ABC/HR (like sehested) will have anonXX in front of their results. Please don't ask me who has which number, since I don't know it by heart.

One more thing: IIRC, a tester entered his name for all results except one. That result is also marked as anonXX, since the tester might have had a reason for not disclosing his name for that single result.

btw, can you create a final zoomed-in plot without the anchor and without the Nero results, please? It's nicer to point people to (most newbies will probably not read or understand the whole Nero explanation).

btw, can you create a final zoomed-in plot without the anchor and without the Nero results, please? It's nicer to point people to (most newbies will probably not read or understand the whole Nero explanation).

Wouldn't that mislead beginners even more, given that none of the tested encoders is proven to be better than another?

(I don't mind if you do, just wanted to point out that there might be a risk...)

btw, can you create a final zoomed-in plot without the anchor and without the Nero results, please? It's nicer to point people to (most newbies will probably not read or understand the whole Nero explanation).

The final zoomed plot does not contain the anchor and I am not going to remove Nero since I see no point in doing so. If people want to read about the Nero problem, they can follow the link from the plot.

Your statement doesn't make any sense to me. What does the fact that the encoders are on par have to do with the Nero results not really being comparable?

Sebastian wrote:

QUOTE

Because of the mentioned problems (unfairness, no real-life relevance...) and after discussing the issue with Francis, Roberto Amorim (rjamorim on Hydrogenaudio Forums) and Darryl Miyaguchi (ff123 on Hydrogenaudio Forums) thoroughly, I decided, against Ivan's and Juha's suggestion, to exclude Nero from the test.

Because of this exclusion, I think a final plot that doesn't mention Nero should also be provided.

edit:

QUOTE (Sebastian Mares @ Jan 15 2006, 04:12 PM)

QUOTE (bond @ Jan 15 2006, 02:52 PM)

Thanks again for this interesting test, Sebastian!

btw, can you create a final zoomed-in plot without the anchor and without the Nero results, please? It's nicer to point people to (most newbies will probably not read or understand the whole Nero explanation).

The final zoomed plot does not contain the anchor and I am not going to remove Nero since I see no point in doing so. If people want to read about the Nero problem, they can follow the link from the plot.

So the exclusion of Nero from the test is not reason enough to provide a final plot without it?

Your statement doesn't make any sense to me. What does the fact that the encoders are on par have to do with the Nero results not really being comparable?

I believe that Nero's overall result (and only the overall one) is purely indicative. People participated in this test, and it would be frustrating not to see any indication of the quality of the disqualified encoder. Of course, nobody should claim that Nero is as good as encoder X or Y according to this test: the tested samples give a wrong and probably overrated image of the real performance of Nero Digital AAC. That's why its results are shown in red, outside the main area, and without any confidence interval bar.
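For readers wondering what those confidence interval bars represent: each bar is an interval around a codec's mean rating. A rough sketch of the common normal-approximation version follows; the ratings below are made up for illustration, and the actual test analysis may use more refined statistics.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(scores):
    """Approximate 95% confidence interval for the mean rating,
    using the normal approximation (1.96 standard errors)."""
    m = mean(scores)
    half = 1.96 * stdev(scores) / sqrt(len(scores))
    return m - half, m + half

# hypothetical listener ratings (1.0-5.0 scale) for one codec
ratings = [4.5, 4.0, 5.0, 4.2, 3.8, 4.7, 4.9, 4.1]
low, high = ci95(ratings)
print(f"mean {mean(ratings):.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```

With so few ratings a t-based interval would be somewhat wider, but the point stands: two codecs whose bars overlap heavily cannot honestly be ranked against each other, and a purely indicative result plotted without any bar should not be compared to the others at all.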

Yeah, that's why I meant there should be both: a final plot with Nero (as currently available) and one without, which can be shown to newbies without making them think Nero performed as shown on the plot (even if it's in red and comes with the link to the explanation).

Happy now?

Damn, you were faster!

lol, yeah, happy now

Though I would be even happier if it were shown on the results page too.