As Sebastian Mares has practically finished accepting test results from participants, I would like to invite everyone interested to take part in an alternative listening test with the same codec contenders but different sound samples. The testing methodology is also a bit different and makes testing much easier, because the sound artifacts are clearly audible in most cases. The main task of a listener is to grade the annoyance of those artifacts.

So the test is based on samples with amplified artefacts? Is that fair to compare codecs that way?

I mean, some artefacts may be more audible than others after the amplification treatment. And depending on the listener the presence of a specific amplified artefact may cause a much worse rating than the original encoded sample.

QUOTE
So the test is based on samples with amplified artefacts? Is that fair to compare codecs that way?

I can’t answer a definite YES or NO. The methodology is pretty new, but all preliminary tests and research show it is promising. Comparison with standard listening tests is very important in this sense.

QUOTE

I mean, some artefacts may be more audible than others after the amplification treatment. And depending on the listener the presence of a specific amplified artefact may cause a much worse rating than the original encoded sample.

Actually the methodology is more complicated than just amplifying artifacts. At least three artificial test stimuli are generated for each codec and sound sample. The final rating is computed on the basis of these stimuli as graded by listeners.
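A toy sketch of the general idea (the stimulus count, the amplification values, and the linear fit are all my assumptions for illustration, not SoundExpert's published formula): if listeners grade several stimuli whose coding artifacts were amplified by known amounts, the grade of the unamplified encoding can be estimated by extrapolating back to zero amplification.

```python
# Hypothetical data: grades for three stimuli of one codec/sample pair,
# each with coding artifacts amplified by a known amount.
amplification_db = [10.0, 15.0, 20.0]  # artifact amplification per stimulus
grades = [3.8, 2.9, 1.7]               # mean listener grades (5 = best)

# Ordinary least-squares line grade = a * amplification + b,
# written out by hand so the arithmetic is visible.
n = len(grades)
mean_x = sum(amplification_db) / n
mean_y = sum(grades) / n
a = sum((x - mean_x) * (y - mean_y)
        for x, y in zip(amplification_db, grades)) / sum(
        (x - mean_x) ** 2 for x in amplification_db)
b = mean_y - a * mean_x

# Extrapolate the fitted line to 0 dB amplification, i.e. estimate how
# the original, unamplified encoding would be graded. Note that such an
# extrapolated grade can exceed the nominal 5-point scale.
estimated_grade = b
print(round(estimated_grade, 2))
```

With this made-up data the extrapolated grade lands above 5, which illustrates why a stimulus-based rating can discriminate between encodings that would all be graded "transparent" in a direct test.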

It would be interesting to check out the comparisons at other bitrates... QDesign's MP2 encoder seems to beat LAME -V 0 --vbr-new in the 224 kbps range. But the diagram might be a bit misleading; I don't know how many people have taken the test.

QUOTE
It would be interesting to check out the comparisons at other bitrates... QDesign's MP2 encoder seems to beat LAME -V 0 --vbr-new in the 224 kbps range. But the diagram might be a bit misleading; I don't know how many people have taken the test.

Right now it is definitely misleading due to the insufficient number of results returned. Full statistics will soon be available by clicking the appropriate rating bar. For now, some information about the reliability of ratings can be derived from the figures in brackets inside the bars, which show, in percent, the width of the error tube for the last 10 values of a rating.
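One plausible reading of those bracket figures (the window size and the formula here are my assumptions, not SoundExpert's documented definition): the spread of the last 10 accumulated values of a rating, expressed as a percentage of the current value.

```python
def error_tube_percent(rating_history, window=10):
    """Spread of the last `window` rating values relative to the current
    value, in percent (hypothetical reconstruction, not the site's formula)."""
    recent = rating_history[-window:]
    width = max(recent) - min(recent)  # absolute tube width
    return 100.0 * width / recent[-1]  # relative to the latest value

# Example: a rating that is settling down as more results arrive.
history = [3.1, 3.4, 3.3, 3.6, 3.5, 3.5, 3.4, 3.6, 3.5, 3.5, 3.5]
print(round(error_tube_percent(history), 1))
```

As more listeners return results, the recent values cluster and the percentage shrinks, which is why a small bracketed figure would indicate a more trustworthy bar.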

I'm not surprised if MP2 at 224 kbps is better than MP3. It has to be so in theory.

"What theory would that be?"

MP3 has a much more efficient encoding structure than MPEG Layer 2. This doesn't magically change at high bitrates.

What about pre-echo then? It was often said that Layer 3 isn't the most efficient encoding layer at high bitrates. An example, with Frank Klemm's comments on this subject, here:

QUOTE

Above 256 kbps MPEG-1 Layer 3 makes no sense. If you need such bitrates, the reason for this high bitrate are flaws introduced with Layer 3. Often Layer 2 performs much better than Layer 3 at the same bitrate. And note that Layer 2 also supports 384 kbps.

For Fatboys also MPEG-1 Layer 1 performs better than Layer 2 at the same bitrate.

Note the parts "256kbps", "Often" and "note that Layer 2 also supports 384 kbps." You have to go high enough for MP2's inefficiency compared to MP3 to be balanced by MP3's limitations (a maximum bitrate of 320 kbps is one of them). MP3 being more efficient isn't going to help it when it's coding at 320 kbps while MP2 is coding at a higher rate.

There's no "theory" which says that at 224kbps layer 2 "must" be better than layer 3.

Frank neglected to say which Layer 2 encoder he was thinking of (remember that Musepack is based on Layer 2).

QUOTE
Note the parts "256kbps", "Often" and "note that Layer 2 also supports 384 kbps." You have to go high enough for MP2's inefficiency compared to MP3 to be balanced by MP3's limitations (a maximum bitrate of 320 kbps is one of them). MP3 being more efficient isn't going to help it when it's coding at 320 kbps while MP2 is coding at a higher rate.

There's no "theory" which says that at 224kbps layer 2 "must" be better than layer 3.

Frank neglected to say which Layer 2 encoder he was thinking of (remember that Musepack is based on Layer 2).

I noted it. But Klemm clearly stated that above 256 kbps Layer 2 is better than MP3 at the same bitrate. He is talking about MPEG Layer 2, not Musepack. You may remember that Frank once talked about releasing an MPEG Layer 2 encoder using the tunings used with Musepack.

But the most important part is that Frank seems to disagree with you about "MP3 has a much more efficient encoding structure than MPEG Layer 2. This doesn't magically change at high bitrates." MP3 has known flaws. You mentioned one of them, pre-echo, in a recent topic. MP3 can't solve this. MP2 is better: http://www.personal.uni-jena.de/~pfk/mpp/timeres.html

As far as I recall, Klemm actually considered moving MPC to a transform codec (like MP3) for exactly the reason Garf stated... He seemed to think that a properly tuned transform codec should outperform current MPC. I agree, though, that that opinion is not reflected in the quote you found.

If you spend more bits on the right coefficients, the time smearing gets smaller. This is true even if the time resolution is low. Since MP3 is more efficient, it can afford to spend more bits to do this (remember GTune Vorbis?). That's why it's not so clear-cut.

On the page you link though, there's this:

QUOTE

To my mind only MPEG-4 AAC is capable of eliminating all disadvantages of the additional frequency resolution. The result is transparent coding at data rates around 120...130 kbps (instead of 170...180 kbps as MPEGplus). But a

* high quality MPEG-4 AAC Encoder is much much more difficult to program and to tune than a MPEGplus encoder

I agree strongly with this (it's also true for MP3 vs MP2), and it points out that the "theoretical" advantages a format may have can be problematic, since they must be used fully and correctly. We have not reached this point with MP3, and it will take much longer with AAC.

That's why I also disagree with the statement that format X must be better than format Y. The implementation is the limiting factor.

QUOTE
That's why I also disagree with the statement that format X must be better than format Y. The implementation is the limiting factor.

I also agree with it, of course. It's just that Frank Klemm's comments make sense. It's like HE-AAC or Parametric Stereo: they outperform LC-AAC or joint stereo in efficiency at low bitrates. But at higher ones, these tools lead to artifacts or distortions you won't get with less efficient tools. This sudden change has nothing to do with magic. A sprinter is always a poor 10,000 meters runner.

There are several samples which may be transparent with Layer 2 at high bitrates, even if the MP2 encoder is not intensively tuned - or may at least be less distorted than a very high quality implementation of MP3 such as LAME. I can ABX castanets at 640 kbps with LAME freeformat, but I'll probably fail with most other formats at half this bitrate.

Implementation plays a big part in quality, but some inherent flaw in design could definitely handicap a format. Klemm often criticized the design of Layer 3 (and not of transform encoders by themselves) in the past.

Actually, I wouldn't exchange one LAME encoded file for 10 MP2 (toolame...) files at 224 kbps. There's maybe less pre-echo (it's not even certain), but there are several other forms of distortion.

The problem with PS and SBR is that they are parametric tools. You cannot add bits to the SBR or PS layer and expect it to keep improving. They are hard-limited.

Normal LC AAC doesn't have such a problem until it hits the hard bitrate ceiling (500-something kbps), and MP3 should be the same. I don't like the analogy for this reason; AAC and MP3 have such a wide usable bitrate range because they work like this and not like PS/SBR.

It's possible that at some point solving MP3 pre-echo by adding bits becomes so terribly inefficient that MP2 surpasses it. But I have my doubts that this happens consistently at 224 kbps, if only because time resolution is one of the very few areas where MP2 is better, and for a lot of material MP3 isn't limited by it.

The testing procedure of this second listening test puzzled me. I downloaded a file, and it appears that I can only rate one encoder for a given sample. It's a single A-B procedure. There's nothing wrong with that. What perplexed me is the role of the anchor. Isn't it intended to prevent the listener from temperamental rating? That implies that the listener can access the anchor while testing the other contenders. But in your testing procedure, you can only access one encoded file and the hidden reference; the anchor, like all other contenders, is not accessible. The anchor therefore can't play any role - at least not an anchor's role. It's just an additional contender.

It also means that the listener can't rate all competitors in the same run. I could download:
- LAME => hear a distortion => give it a rating of 2/5
- then the ANCHOR => find it awful => give it 1/5
- then download LAME again => hear the same distortion as before => give it 4, because I would consider it much better than the anchor quality I still have in mind.

It's clearly recommended to evaluate all encodings in the same run, and to compare them against each other before rating them all. That's how ABC/HR software works. Or at least to have the possibility of ranking all encodings in a short amount of time. With SoundExpert's procedure this looks impossible, and the ranking could vary according to the testing mood, leading to incoherent results.

QUOTE
What perplexed me is the role of the anchor. Isn't it intended to prevent the listener from temperamental rating? That implies that the listener can access the anchor while testing the other contenders. But in your testing procedure, you can only access one encoded file and the hidden reference; the anchor, like all other contenders, is not accessible. The anchor therefore can't play any role - at least not an anchor's role. It's just an additional contender... It's clearly recommended to evaluate all encodings in the same run, and to compare them against each other before rating them all. That's how ABC/HR software works. Or at least to have the possibility of ranking all encodings in a short amount of time. With SoundExpert's procedure this looks impossible, and the ranking could vary according to the testing mood, leading to incoherent results.

Yes. The absence of a low anchor really increases the dispersion of results, but this has to be compensated by broad participation of testers. The target audience of SoundExpert is completely unprepared and in most cases has no idea what listening tests are. Instead of educating and training them (which is hard and thankless in the real world), I decided to offer a listening procedure that is as simple as possible. It utilizes the basic skills of an average listener - just "like" and "dislike" with a few intermediate states. As the artifacts are clearly audible, the influence of "temperamental rating" is indeed not high. Each person has their own "inborn scale of annoyances", and in this case it's better to just use it rather than build a new one for this particular listening test.

Of course, it is a compromise between simplicity of procedure and scientific significance of its results. SoundExpert is a highly experimental research project, and up till now it shows that this compromise works. I think a more fruitful discussion will be possible when the raw stats are available.

And for now I just ask for volunteers to download and grade a test item. Indeed, it's more like fun than a listening test. And as you see, the results are pretty close to Sebastian's.

I understand. The procedure is indeed very easy to understand, which is really important if the purpose is to reach a wide audience through the web. The testing procedure discards all relevance of the anchor concept, so I wondered about the point of using Shine in your test. I suppose your intention was to fully mimic Sebastian's test, right?

No, it’s not true. And now it’s not true twice. I was going to change the version of Nero in the future to the one which will contain the dll (or the ABR part at least) used in the test. Now I’m not sure whether to continue testing with an explanation of the problem with Nero AAC, to exclude it from testing, or to include the real AAC encoder from the latest release instead.

QUOTE

Also, how did you get MAD to decode the Shine samples? It always failed on my side telling me that it cannot decode Dual Channel or something. I had to use LAME for decoding.

As I'm considering using lower bitrates than I did before, Vorbis comes to my mind again (on my iRiver H140, battery drain with Vorbis is unfortunately rather high, but I can compensate for it a bit by using lower bitrates). Can you say something about the pre-echo behavior of aoTuv 4.51 at -q7 or -q6?

QUOTE
I'm not surprised if MP2 at 224 kbps is better than MP3. It has to be so in theory.

"What theory would that be?"

MP3 has a much more efficient encoding structure than MPEG Layer 2. This doesn't magically change at high bitrates.

Sorry for being late with the answer, but now, after guruboolez's posts, I can only add this citation from the well-known paper “MP3 and AAC Explained”:

CODE

5.6. Bit-rate versus quality

MPEG audio coding does not work with a fixed compression rate. The user can choose the bit-rate and this way the compression factor. Lower bit-rates will lead to higher compression factors, but lower quality of the compressed audio. Higher bit-rates lead to a lower probability of signals with any audible artifacts. However, different encoding algorithms do have ”sweet spots” where they work best. At bit-rates much larger than this target bit-rate the audio quality improves only very slowly with bit-rate, at much lower bit-rates the quality decreases very fast. The ”sweet spot” depends on codec characteristics like the Huffman codebooks, so it is common to express it in terms of bit per audio sample. For Layer-3 this target bit-rate is around 1.33 bit/sample (i.e. 128 kbit/s for a stereo signal at 48 kHz), for AAC it is around 1 bit/sample (i.e. 96 kbit/s for a stereo signal at 48 kHz). Due to the more flexible Huffman coding, AAC can keep the basic coding efficiency up to higher bit-rates enabling higher qualities. Multichannel coding, due to the joint stereo coding techniques employed, is somewhat more efficient per sample than stereo and again than mono coding...

MPEG layers were designed to provide sufficient sound quality and at the same time to be efficient in their area of application. For that reason the sweet spot of MP1 (384-448 kbps) provides better audio quality than the sweet spot of MP2 (192-256 kbps), which in turn is better than the sweet spot of MP3 (112-160 kbps). Due to the slow improvement in sound quality at bitrates above the sweet spots, there are points where the previous layer begins to provide better audio quality (AAC may turn out to be an exception to this rule - time and tests will show). For MP3/MP2 such a point is between 224 and 256 kbps. So, using the layers at bitrates higher than their sweet spots is inefficient and could be reasonable for compatibility only.
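The bit/sample figures in the quoted paper convert to bitrates with simple arithmetic (bitrate = bits per sample × sample rate × channels); this small helper just restates that conversion for the 48 kHz stereo case used in the quote.

```python
def sweet_spot_kbps(bits_per_sample, sample_rate_hz=48000, channels=2):
    """Bitrate implied by a per-sample 'sweet spot' figure, in kbit/s."""
    return bits_per_sample * sample_rate_hz * channels / 1000.0

# Figures from the paper: Layer-3 at ~1.33 bit/sample, AAC at ~1 bit/sample.
print(round(sweet_spot_kbps(1.33)))  # ~128 kbit/s for Layer-3
print(round(sweet_spot_kbps(1.0)))   # 96 kbit/s for AAC
```

The same arithmetic shows why the paper expresses sweet spots per sample rather than as a bitrate: the kbit/s figure shifts with sample rate and channel count, while the bit/sample figure stays tied to the codec itself.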