Hello. Now that it’s clear the Apple encoder represents the AAC standard very well, the next step will be a public multiformat test. There is no particular date or plan yet. This is a general discussion to hear opinions, points of view, and suggestions.

3. Bitrate: VBR 96 kbps. The last AAC public test was at 96 kbps (well, it was actually 100 kbps). Also, I think it will be more interesting to compare MP3 at 128-135 kbps with AAC/AoTuV at 96-100 kbps. Probably a lot of people are interested in the trade-off between compatibility and compression efficiency.

4. Samples. Last time we applied the technique of random selection of the samples. 20 samples.
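A minimal sketch of what random selection of 20 samples could look like. The pool size, the file names, and the seed are all made up for illustration; the thread doesn't say how the draw was actually done last time.

```python
import random

def pick_test_samples(pool, count=20, seed=None):
    """Draw `count` distinct samples from `pool` uniformly at random."""
    if count > len(pool):
        raise ValueError("pool is smaller than the requested number of samples")
    rng = random.Random(seed)  # publishing the seed lets anyone reproduce the draw
    return sorted(rng.sample(pool, count))

# Hypothetical pool of 200 candidate tracks (not from the thread).
candidate_pool = [f"sample_{i:03d}.wav" for i in range(200)]
selected = pick_test_samples(candidate_pool, count=20, seed=2012)
print(len(selected), "samples selected")
```

Fixing and publishing the seed is one way to show afterwards that nobody hand-picked the samples.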

The developers are very welcome to participate in this topic.

This test should be more interesting, as it will include codecs that are interesting for actual use.

There is a first issue if MP3 is tested at 96 kbps: MP3 uses a 32 kHz sample rate at 96 kbps, while AAC, AoTuV and Opus use 44.1 kHz. That is a sample rate mismatch.

Possible solutions are a resampler or a higher bitrate for MP3.

I'm really not a fan of testing with multiple things unequal.

The reason for this is that the results lose their meaning and mean whatever the reader wants them to mean... or at least only one side is meaningful. If AAC at 96 beats MP3 at 128, that's potentially interesting, but if they tie or MP3 wins then "of course, it's running at a higher rate, what did you expect?".

Of course, this is also true if the rate changes the bandpass: "It wasn't really better, people just preferred the other lowpass setting".

Bitrate cures a lot of sins too. A little extra rate can be the difference between transparent and not transparent.

What I would recommend is that the rates for the samples under test be kept as close to equal as possible, so long as there is still a good chance of any of them being a winner or loser. E.g. if we think mp3 will very likely lose if run at 96k, then we should use a higher rate, but only enough to correct that. If we run it at 192 and find that it does best, we'll have learned nothing from that effort.

Another solution would be to match bitrates as has been mentioned. We *know* that the samplerate will be different - however those rating the files will rate the files as they hear them without forewarning as to the samplerate of the individual file. Of course, it could be an obvious difference....

Not all cards support automatic switching between sample rates. For example, my E-MU Pre Tracker doesn't. And some onboard cards produce distortion when you switch between sample rates, for example my onboard sound card (ALC272). It's easy to see with the udial.wav sample.

The fact that Lame resamples to 32 kHz at 96 kbps shouldn't worry too much IMO. Maybe the lack of HF is easily noticeable, but at 96 kbps there is worse to be frightened of for mp3. Lacking the extreme frequency range doesn't really hurt much though it's often audible.

Whether it's useful for mp3 to participate in a 96 kbps test is a question of sample selection. With problem samples, especially pre-echo stuff, mp3 won't have a chance, and its participation doesn't make sense. So if mp3 is to participate, we'd better use only regular music, ideally chosen at random.

On the other hand I wouldn't mind if we forget about mp3 at 96 kbps. It's really no candidate for the winner here.

QuickTime at CVBR96k surely doesn't do sample rate conversion by default, but an LPF at 15k-16k or so is applied anyway. AFAIK FhG @ VBR q3 does the same. In other words, "losing highs" is neither specific to LAME nor important.

As for QuickTime true VBR mode, sample rate conversion to 32k is done at quality 54 by default. Therefore, if the test is run at quality 54, a decision must also be made for QuickTime: whether to use sample rate conversion or not.

Some feedback was received on #Hydrogenaudio, and I would like to share some information and answers, basing my opinion on past public tests.

1. Can we go for a higher bitrate (>96 kbps)? There were 531 results for the 64 kbps test and only 280 results for the 96 kbps AAC test, even though it was open for longer. 280 results were just barely enough to close the test. The 64 kbps test ran for 20 days, the 96 kbps test for 35 days (hence it was more difficult). It’s not recommended to keep a test open for more than a month or so.

So it’s hard to imagine a public test at a higher bitrate.
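To put those numbers side by side, here is a quick calculation of results per day for the two tests. The counts and durations are the ones quoted above; everything else (names, structure) is just for illustration.

```python
# Figures quoted in the post above.
tests = {
    "64 kbps": {"results": 531, "days": 20},
    "96 kbps AAC": {"results": 280, "days": 35},
}

def results_per_day(results, days):
    """Average number of submitted results per day the test was open."""
    return results / days

rates = {name: results_per_day(**t) for name, t in tests.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} results/day")
```

That works out to roughly 26.5 results per day at 64 kbps against 8 per day at 96 kbps, so the submission rate dropped to less than a third.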

2a) Would it be more reasonable not to discard all results of a listener who submits more invalid results than the rules allow? That is, keep the “good” results and discard only the erroneous ones? First of all, it's worth mentioning that this question isn’t new. The problem is that it’s impossible to know whether a result was really “good” or just a “lucky guess”. So it wouldn't be accepting the “good” results and discarding the “bad” ones, but rather accepting the “lucky guesses” (which are still invalid) and discarding the “unlucky guesses”.

Answer: it’s more correct to discard all results of a listener who has passed the limit of invalid results (submitted too many of them). Also see (1º)

2b) One might then say that we can check whether the results of this one particular listener correlate well enough with the results of the other listeners, and if so it’s OK to accept “only the good results”. The problem is that developers and members will later complain that the rules weren’t applied homogeneously (to all listeners in the same manner). So a simple and effective decision was made: the rules were _strictly_ applied (as in the previous public tests at 64 and 96 kbps). No case-by-case exceptions. You follow the rules, your results are in. If not, they are out, but you can start again from zero. Simple as that.
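The strict rule described above could be sketched roughly like this. The limit of 2 invalid results, the tuple layout and the listener names are all assumptions for the example; the thread doesn't state the real limit or data format. The sketch also assumes that a listener who stays within the limit keeps their valid results while their individual invalid ones are dropped.

```python
from collections import defaultdict

MAX_INVALID = 2  # hypothetical limit, not the real rule from the test

def apply_strict_rule(submissions, max_invalid=MAX_INVALID):
    """Keep valid results, but drop every result of a listener over the limit.

    submissions: list of (listener, sample, is_valid) tuples.
    """
    invalid_count = defaultdict(int)
    for listener, _sample, is_valid in submissions:
        if not is_valid:
            invalid_count[listener] += 1
    # Listeners over the limit lose everything, including their "good" results.
    excluded = {name for name, n in invalid_count.items() if n > max_invalid}
    return [s for s in submissions if s[0] not in excluded and s[2]]

# Made-up data: alice has 1 invalid result, bob has 3.
submissions = [
    ("alice", "s01", True), ("alice", "s02", False), ("alice", "s03", True),
    ("bob", "s01", False), ("bob", "s02", False), ("bob", "s03", False),
    ("bob", "s04", True), ("bob", "s05", True),
]
kept = apply_strict_rule(submissions)
print(kept)
```

The same function is applied to every listener, which is the whole point: nobody has to judge whether a given "good" result was real or a lucky guess.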

Also, here is my observation about the placebo effect and the interpretation of results in a public listening test.

The score 4.0 means “perceptible but not annoying”. So how should the results of the last AAC test be interpreted in the case of 4.1, 4.2, 4.3, etc.? 2-3 steps (0.2-0.3) are considered “very few” (in the general interpretation. It’s psychology, so I won't opine here). But more than 2-3 steps aren't “very few”. So one could say that 4.2-4.3 is the threshold score where high quality begins. I don't know if it's a coincidence, but this score (4.2-4.3) is also where the placebo effect starts to reveal itself more. (1º) Another interesting observation is that placebo leads to a lower score than normal. There were quite a few results where the listener gave a lower score when it was actually placebo. This leads to “flipped results”: if the average results were Codec A > B > C > D, then the flipped ones (with placebo effect) would be D > C > B > A.

(oh lord, my English)

As for this test, we can go for MP3 at both 96 and 128 kbps. This is the time to throw ideas around.

+1 for keeping the number of codecs small. Consider only three perhaps? Four is stretching it. I really struggled with the five in the last test (AAC @ ~96 kbps [July 2011]).

+1 for using a low anchor which doesn't fall so far behind in quality (as it did in the last test). It was an understandable choice back then, but I think its quality was so low that it didn't help to put things into perspective. (edit: never mind the next part, quality is probably too good to serve as a low anchor even at the lowest setting) How about using lossywav at rather extreme settings? Though I'm not sure if it is a good idea if the low anchor has completely different types of artifacts.

A very low quality low anchor allows the results to be analysed very quickly. There were listeners who couldn't identify the low anchor in the last test. Pretty strange. So it isn't that bad to have a bad low anchor but, yes, I agree that the low anchor should have a little more quality.

Any chance to see Musepack competing against current encoders? Has been a while since it has been tested. Though it probably didn't change much regarding performance, I think it should be tested at lower bitrates than before, too.

USAC has been "released" with the confusing name "Extended HE-AAC." As one might guess from the new name, I don't think USAC is intended to replace AAC (LC or main profile) at higher bitrates; things I've read have seemed to imply that USAC (or at least its sweet spot) is supposed to top out at 64kbps.

Once somebody can get their paws on a USAC encoder I really think we should do another 48kbps (or possibly as low as 32kbps) test; I'd imagine HE-AAC encoders have come a long way in the >5 years since the last 48kbps test, and of course Opus is an important addition. Perhaps that could be the next one after the 96kbps test?

Go ahead and include Opus without waiting for a 1.0 release. The 1.0 release will likely include few changes; the reason it's long in coming has more to do with the drawn-out formalities of standards bodies than with the technical readiness of the format.

Vorbis too, partially because of its use for html5 audio.

I strongly feel that an MP3 encoder at the target bitrate should be included whenever possible, even when the target bitrate means we're confident it will be blown out of the water by the competition. My main reason is that this makes the test results more accessible to a wider audience; people are familiar with MP3 and having it as a point of comparison helps them have a better perspective on what the results mean. It's also simply interesting to see the progress made both in newer formats and in MP3 encoders. It'd be nice to include higher-bitrate MP3 as well; the space-compatibility tradeoff is real, and while "96kbps AAC=128kbps mp3" is probably roughly accurate, it's odd that a claim that's been made so frequently doesn't seem to have been definitively put to the test.

Note that Sebastian's test of MP3 encoders at 128kbps 3 1/2 years ago ended with "The quality at 128 kbps is very good and MP3 encoders improved a lot since the last test. This was the last test conducted by me at this bitrate. It's time to move to bitrates like 96 kbps or 80 kbps." Anybody who thinks that MP3 should only be tested at 128kbps or above needs a reality check; if we test at a bitrate where a significant number of codecs are practically transparent then testing is very difficult for the participants and in the end we get no worthwhile results.

As far as the LAME sample rate thing goes, I'd say we want to provide encoders the latitude to make any decisions they want to as they try to optimize audio quality. The only real reason to worry about LAME resampling the audio is questions of how people's sound cards may deal with the difference-- so just use a high-quality resampler to make all the encoders' decoded samples the same sample rate.
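As a rough illustration of bringing LAME's 32 kHz output up to the 44.1 kHz of the other codecs, here is how the integer up/down factors for a rational resampler could be worked out. The function names are made up for this sketch, and it only computes the ratio and the resulting length; the actual filtering would be done by whatever high-quality resampler is chosen (for example, the factors could be passed to a polyphase routine such as `scipy.signal.resample_poly(x, up, down)`).

```python
from math import gcd

def resample_factors(source_rate, target_rate):
    """Smallest integer (up, down) pair with target_rate = source_rate * up / down."""
    g = gcd(source_rate, target_rate)
    return target_rate // g, source_rate // g

def resampled_length(num_samples, up, down):
    """Number of output samples, rounded up."""
    return -(-num_samples * up // down)  # ceiling division

up, down = resample_factors(32000, 44100)
print(up, down)                             # factors for 32 kHz -> 44.1 kHz
print(resampled_length(32000, up, down))    # one second of 32 kHz audio
```

With all decoded files at the same rate, the sound card never has to switch, so the listener can't tell from clicks or glitches which file came from the 32 kHz encoder.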

That's quite a lot to expect from one test. How many people would rate all 140 samples?

Keep in mind that there will be extra listener fatigue from trying to distinguish between encodes which are pretty transparent, especially for those of us without stellar golden ears. A 96kbps test is going to be considerably harder on people than a 64kbps test, and even in that test, which only had five encoders, only ten people submitted results for all samples.

FhG lost the public AAC 96kbps test to Apple last year by a statistically significant margin. Yes, it's been updated in the meantime and may be competitive with Apple's encoder at 96kbps now, and maybe it's noticeably better than Apple's for HE-AAC and <80kbps bitrates, but I don't think it's an important enough reference point at 96kbps to justify its inclusion at this stage.

Not only is it too early for USAC (no quality stereo encoder publicly available yet), but everything I see supports the idea that it's not really targeted at this bitrate anyway.

Let's just be sure to plan on another lower-bitrate test in the near future -probably 64kbps again, though I'd like to see one at ~48kbps too- to look at updated HE-AAC encoders, improvement in opus since the last test at that bitrate, and USAC.

I think it would be a good idea to run another AAC listening test first (maybe just QT vs. FhG), since the development of the Winamp encoder has been quite active.

If there were a chance that any other encoder already performed as well as or better than Apple AAC at 96 kbps, there would be a lot of people talking about it (me too). That's not the case.

We dedicated a complete test to the AAC format last time. If I conduct the next public test, it will be multiformat (96 kbps). We had a long discussion about that here. Nothing has changed since then. Period.