I analyzed my results and the ranking of the encoders is different for each sample. So indeed there's no undisputed winner here. Though I have Helix in first place in some samples. I would never have expected that! Nice surprise. Another surprise to me is that, on some samples, I found LAME 3.97 worse than Fhg or iTunes. And finally, I don't have any results where LAME 3.97 is better than 3.98.2.

QUOTE (DigitalDictator @ Nov 24 2008, 23:56)

This is indeed surprising. I'm sure I've seen smaller, recent ABX tests where Lame outperformed Helix quite clearly. I think Guruboolez and maybe also Halb27 have done a couple, but I might be mistaken.

The last time I saw Francis doing an MP3 listening evaluation with LAME and Helix was in this post.

QUOTE (Squeller @ Nov 25 2008, 08:46)

Is this claim correct? There has been no improvement to the Helix encoder since 2005?

It's correct. Here's the latest compile (v5.1 2005.08.09) used in this test.

The zoomed view is formally correct, but it tends to have a misleading emotional impact on the reader because it emphasizes differences. In its extreme form it can paint a picture of extreme differences where in fact the differences are not worth mentioning. If this test's zoomed view gave only the averages, without the confidence intervals, we would have exactly this extreme form here.

Information at a glance, that's what graphs are for. They easily give a wrong impression if they are not 'ground-based' but instead have a baseline high in the air, just a small step below the lowest results.


That's why I would prefer it if there were no 'zoomed' view.

Basically you are right, but: people with dysfunctional brains who don't figure this out themselves aren't the target audience of HA, I guess.

Just confused. Helix worse than Lame, Helix better than Lame, Fraunhofer better than 3.97?? I'm just waiting for someone to say "lossless is lossy", and then my confusion will be complete.

There has always been a tendency at HA to expect Lame to be seriously superior to other encoders. And listening tests have always been taken too much as 'proof' of this, whereas they really contribute experience with encoders in a fairly objective way, but only within the restrictions of the samples tested and the listening abilities of the participants. It's the best we can do, but it has its limitations.

Why worry? Isn't it a good thing that all the encoders perform very well on the samples? As for Lame 3.98.2: isn't it a good thing that it scores so well? All we have known so far is that it brings improvement over 3.97 for certain classes of problems where 3.97 had rather weak quality. We did not have much experience showing that there is no serious regression with 3.98, which would have been possible. Now that we have reason to believe this is not the case, we can expect with good reason that 3.98 is real progress.

"Test Results" are the results of all participants. "My Average" is a simple linear average of the results as I don't remember how to do other type of analysis (too long ago ). Taking out the highest and lowest result of all encoders produces a similar result as presented above. If anyone can tell me which formula to use in MS Excel to get error margin please do.

I'm really surprised that an encoder that hasn't been tuned since 2005 gets such good results. I have more samples where Helix does better than Lame 3.98.2 than the other way around, although the differences are small. When doing the test I clearly noticed that 2 encoders were better than the rest, and I thought they were the Lame ones.

...Why worry? Isn't it a good thing that all the encoders perform very well on the samples?...

I'm worried now not because Helix is very competitive with Lame 3.98.2 with respect to quality, but because Helix encodes so much faster, and that's very useful when I encode albums from my lossless archive to take them on the road.

I wonder why Lame doesn't do better compared to Helix, having 3 more years of development behind it. I have just included Helix in my foobar2000 Converters list and will play with it in my preferred bitrate range (160-220 kbps).

All results are available for download already so you can calculate whatever you wish. Tukey HSD is something around 0.5 IIRC (I'm at work right now and don't have access to the exact value) so the tolerance bars are around 0.25 in each direction.
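For anyone wanting to relate the quoted HSD to the plotted bars: Tukey's HSD is q * sqrt(MSE/n), and plotting half of it on each side of a mean means two encoders differ significantly when their bars don't overlap. The numbers below are purely hypothetical, chosen only to reproduce the rough magnitudes mentioned above (HSD around 0.5, bars around 0.25):

```python
import math

# All three inputs are assumed for illustration, not taken from the test:
q_crit = 3.9   # studentized-range critical value at alpha = 0.05 (assumed)
mse = 0.4      # mean squared error from the ANOVA (assumed)
n = 24         # ratings per encoder (assumed)

hsd = q_crit * math.sqrt(mse / n)  # Tukey's Honestly Significant Difference
half = hsd / 2                     # tolerance bar plotted on each side
print(f"HSD = {hsd:.2f}, per-side bar = {half:.2f}")
```

With the real MSE and listener count from the published results, the same two lines reproduce the exact bar widths.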

If anything the test shows samples where LAME needs improvement at 128 kbps.

I think we should analyze the results sample by sample and discuss the severity of the problems found. It would be useful to find out whether certain obvious problems with certain encoders were apparently confirmed by the majority of the testers.

In general, I found the choice of the low anchor a bit problematic. The encoder is clearly badly broken. Obviously the 0.99 alpha version is not the version that was involved when the "128 kbps MP3 = CD quality" myth was created. In my experience the release version was already a lot better.

A low anchor that is too bad can have an adverse effect on the rating scale the testers choose to use. It can make the differences between the contenders appear less significant.

/mnt told me that Helix is not gapless, which to me is a serious shortcoming. Another thing is that Helix is not as robust as LAME. But what is stunning people here is the encoding speed of an encoder that hasn't been worked on for 3 years, while the latest fresh LAME is so much slower to encode!