11 July, 2004, 07:50:54 PM

• My access to the internet is now very limited. Therefore, the encoders I’m using for my tests are not necessarily the most recent ones available on the web. Here, the tests were done when vorbis 1.1 RC1 had already been released, but I didn’t have access to this information…

• This test is something like a work-in-progress. I plan to add more results with time.

[span style='font-size:14pt;line-height:100%']I. PURPOSE OF THE TEST[/span]

Like many people on this board, my principal motivation for audio encoding lies in the possibility of listening to and enjoying music in high quality directly from the computer, which allows very fast browsing and access to an entire record collection. High-quality encoding is a requirement, security a need. I successively used lame mp3, musepack and now lossless, which offers the security of digital data identical to the CD.

Nevertheless, lossy encoding is still interesting: modern hard disks are not necessarily big enough for every collection, and I think there are some benefits in feeding an expensive digital jukebox with audio encodings of “better than just good” quality, like AAC/Vorbis 128 – fine but perfectible.

The choice of the best lossy encoder isn’t really problematic. Musepack (mpc) is still winning most approvals, and is considered fully transparent with the --standard preset. Some elements nevertheless encouraged me to seriously question this leading position of mpc.

• 1/ by occasionally testing the standard preset of mpc, I discovered that small differences are sometimes audible with ordinary music. Now, if mpc isn’t fully transparent at 175 kbps, this format is definitely comparable (which doesn’t mean “equal”) to other lossy solutions, which suffer from the same shortcoming.

• 2/ the leading position of mpc was established a long time ago. It was defined as the “best lossy format” when challengers were not very strong: beta versions of vorbis, lame < 3.90, suboptimal aac encoders. But now there are powerful vorbis encoders (the recent “megamix” merge looks like a serious challenger), optimized AAC encoders (QuickTime CBR and Nero VBR), and mature MP3 solutions (the VBR presets of lame). The leading position must therefore be questioned again, at least by people able to detect differences.

• 3/ This challenge becomes necessary with the growing number of devices supporting new audio formats like AAC and Vorbis. MPC is still confined to the computer, or in the best case to PDAs – and is maybe doomed to this limited usage.

Consequently, I’ve tried to pit other serious encoding solutions against mpc --standard, in order to get a better, up-to-date and personal idea of the quality of this encoder relative to modern and convenient challengers.

[span style='font-size:14pt;line-height:100%']II. CHALLENGERS[/span]

Against musepack --standard, I decided to pit two formats: MP3 with lame 3.97a3 and OGG VORBIS with the recent combined encoder named “megamix”. Explanations:

• first, no [span style='font-size:12pt;line-height:100%']AAC[/span] encoder in the arena. I was tempted to use [span style='font-size:12pt;line-height:100%']Nero AAC[/span], but the latest version I have (2.6.2.0) has some recognized quality problems and is doomed to imminent obsolescence with the third version of Ivan Dimkovic’s encoder. No need to test something outdated… I was also tempted to take [span style='font-size:12pt;line-height:100%']QuickTime AAC[/span], though it’s not VBR and not very flexible (nothing between 160 and 192 kbps: annoying for a fair comparison with MPC --standard). But this encoder is not really suitable, in my opinion, for HQ listening, at least when the user is fond of opera and most of his CDs absolutely need real gapless playback. AAC will be added later, but for now it’s absent from this test.

• the choice of the [span style='font-size:12pt;line-height:100%']lame MP3[/span] version is highly problematic too. Three choices are possible: the last “tested” release (3.90.3), the last gold release (3.96) or the last alpha release (3.97 alpha 3). I’ve decided not to use [span style='font-size:12pt;line-height:100%']3.90.3[/span]. I know that for some people this encoder is the best mp3 codec ever released; I also know that, for historical reasons, 3.90.3 is probably the safest choice. But the difference between the dead 3.90.x branch and the active 3.9x one is not only about quality: the 3.9x releases are much faster (not a luxury considering the slowness of the 3.90.x presets), more complete (a full and redesigned VBR preset scale: the nice -V 5 used in Roberto’s listening test, for example, is a new feature unavailable in 3.90.x) and, last but not least, in perpetual evolution. There’s nobody to correct flaws in 3.90.x, whereas bugs audible with 3.9x could be corrected or reduced by Gabriel, Robert, Takehiro and other developers.

I definitely ruled out 3.90.x for another important reason: there’s no VBR preset corresponding to the MPC --standard bitrate. --alt-preset standard is clearly too high, --medium too low, the -Y switch a hack, and ABR is probably not efficient enough. With the 3.9x branch, there’s an existing preset between --standard and --medium: -V 3. And the -V 3 average bitrate should be close to the MPC --standard one.

Then: [span style='font-size:12pt;line-height:100%']3.96 “gold” or 3.97 alpha[/span]? I’ve decided on the alpha release. I know the risks of an alpha (regressions as well as progress). But I also know that 3.96 is buggy in its fast mode: this convinced me to use a corrected release, even if the test doesn’t concern the fast mode of lame.

• the choice of the [span style='font-size:12pt;line-height:100%']vorbis[/span] version is less problematic. Recent tests were done. CVS/GT3b2 couldn’t resist the aoTuV/GT3/QK32 dream team (aka [span style='font-size:12pt;line-height:100%']megamix[/span]), at least up to 5,99. And even higher, GT3b2 (the previous reference encoder for high bitrates) doesn’t really sound superior (except maybe for one family of problems: micro-attacks). I also recall that I began this test unaware of the release of 1.1 RC1. This latest encoder nevertheless seems to be inferior to “megamix” (the essential but maybe ‘excessive’ tunings of Garf, used at bitrates > -q 5,00, are apparently missing from this RC1 version). The use of “megamix” is therefore pertinent, and my test is probably not made obsolete by this welcome pre-release of oggenc 1.1.

• I don’t forget the promising [span style='font-size:12pt;line-height:100%']WMApro[/span]: I was really pleased, even enthusiastic, about the quality reached by this format with classical music at mid bitrates. Nevertheless, I didn’t include this format in the test. First, I had to limit the number of competitors. Second, I’m not familiar with this encoder and don’t know which setting is the best (which VBR mode? Is the WMApro VBR implementation reliable, or is 2-pass ABR preferable? etc.). Last: there is still no hardware device for WMApro (though that’s not a reason to exclude an audio format from a test including MPC, it’s a disappointing situation).

[span style='font-size:14pt;line-height:100%']III. SAMPLES[/span]

Mid/high bitrate tests are, for me at least, especially painful. It doesn't mean that I hate them, quite the opposite. ...

Samples only concern « classical music », with one exception. I deliberately limited my choice to the music I like. It's not snobbery, and it's not an egocentric attitude: other music is much harder for me to ABX, and my motivation would quickly disappear with music I don't really like. In other words, the scope of these results is VERY LIMITED: they concern my subjectivity (and only mine), and a particular genre of samples (natural instruments, recorded according to high-fidelity principles and not to the marketing “loud” one).

There are solo instruments (organ with Dom Bedos; harpsichord with Fuga; trombones with Orion II), instruments with light accompaniment (cymbals with Krall and Marche Royale; drums with Marche Royale, 2nd part), orchestra (Weihnachts-Oratorium and Platée), chorus (Ich bin der Welt abhanden gekommen) and voice (“Dover, giustizia, amor”). Additional information (artist, performer…) is available in the file tags.

[span style='font-size:14pt;line-height:100%']IV. SETTINGS[/span]

Comparing VBR encoders/settings is problematic. The ideal is to fix a target bitrate, and then to find the corresponding preset for each encoder. I followed the usual (and IMHO the best) methodology: the setting must be chosen from a wide selection of music, and not from the selected samples.

The target bitrate is the average bitrate of the MUSEPACK --standard preset. This average can’t be evaluated precisely: it lies somewhere between 170 and 180 kbps, so approximately 175 kbps. I have verified this value with my classical music library, and people have reported similar values with completely different music.

The remaining task is to find the corresponding VBR settings for LAME MP3 and Vorbis “megamix”. This is where the problems begin…

• The biggest problem lies in the difference in vorbis average bitrate, at the same setting, depending on the kind of music encoded. Classical is bitrate-friendly compared to most other stereo and modern material. With the CVS encoder, I estimated this difference at 10…15 kbps on average for -q 5…6. With “megamix” (or other GT3b2-based encoders), this difference might reach 25…30 kbps for the same setting. I don’t know what to do…
- by testing vorbis with a -q value corresponding to 175 kbps for classical but 200…210 kbps for pop/rock, people may blame me for pitting an advantaged vorbis challenger against musepack.
- by testing vorbis with a -q value corresponding to 175 kbps for pop/rock but 140 kbps for classical, the test will be pointless for me (the winner between mpc@175 and vorbis@145 isn’t very hard to guess…).
- by testing vorbis with a half-baked -q value, I fear that the test won’t correspond to either situation.

• The second big problem is related to the break in the linearity of the vorbis quality scale. Between -q 5,99 and -q 6,00 there’s a substantial bitrate difference (~10 kbps), which also corresponds to a serious quality difference, at least with vorbis 1.00 – 1.01 (including GT3b2). aoTuV (and therefore “megamix”) is based on the same code, but its tuning tried to correct or minimize the quality gap between the two settings. I discovered that for classical music, the fair vorbis setting is very close to this 5,99 value. 6,00 is slightly too high, and I could disadvantage mpc by comparing it to vorbis -q 6,00. On the other hand, I have the feeling that -q 6,00 would show the full potential of vorbis, and that the extra 8…10 kbps could be worthwhile for daily use. Would anyone give up the correction of a quality bug at such a low price (a +5% increase in file size), especially with archiving in mind? Seriously, I don’t think so.
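To put a number on that “+5%”: a rough sketch of the arithmetic, assuming a base of about 175 kbps (the mpc --standard target, which is also close to vorbis -q 5,99 on classical). The function names are only illustrative, and container overhead is ignored.

```python
def relative_increase(base_kbps: float, extra_kbps: float) -> float:
    """Size increase, in percent, when the average bitrate grows by extra_kbps.

    For a fixed duration, file size scales linearly with average bitrate.
    """
    return 100.0 * extra_kbps / base_kbps


def size_megabytes(kbps: float, seconds: float) -> float:
    """Approximate stream size in MB (1 MB = 10**6 bytes), no container overhead."""
    return kbps * 1000 / 8 * seconds / 1e6


if __name__ == "__main__":
    base = 175.0  # assumed base bitrate in kbps
    for extra in (8.0, 10.0):
        pct = relative_increase(base, extra)
        print(f"+{extra:.0f} kbps over {base:.0f} kbps: +{pct:.1f}% in size")
    print(f"one hour at {base:.0f} kbps: about {size_megabytes(base, 3600):.2f} MB")
```

So 8…10 extra kbps on a ~175 kbps base is an increase of roughly 4.6…5.7%, which is where the “+5%” comes from.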

For all these reasons, I’ve decided to put vorbis megamix in the arena at three different settings:
- [span style='font-size:12pt;line-height:100%']-q 6,00[/span]: clearly too “heavy” compared to mpc --standard with non-classical music, but interesting to test against -q 5,99 (to see if the frontier between these two settings still exists with aoTuV/Megamix/1.1)
- [span style='font-size:12pt;line-height:100%']-q 5,99[/span]: the setting whose bitrate matches mpc --standard for classical music (still too heavy with other music), but maybe suboptimal quality for vorbis
- [span style='font-size:12pt;line-height:100%']-q 5,50[/span]: a more universal setting for an acceptable test against mpc --standard. It would be interesting to compare the quality difference between 5,50/5,99 and 5,99/6,00. I suspect (and fear) a much greater jump in the latter pair than in the former.

I discovered that the bitrate of the [span style='font-size:12pt;line-height:100%']-V 3[/span] preset (lame 3.97a3) is really close to the average bitrate of mpc --standard. This applies at least to classical music (I don’t have enough material to measure the average bitrate on other musical genres). -V 3 will therefore be tested.

I’ve also decided to add [span style='font-size:12pt;line-height:100%']-V 2 (--preset standard)[/span]. The bitrate is higher, but I really want to see if this historically leading preset of lame MP3 is competitive against musepack. It would also be interesting to see how lame -V 2 performs compared to vorbis megamix, which is also playable on portable players, but with bad consequences for battery life.
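The matching step used in this section (fix a target bitrate, then keep for each encoder the setting whose average over a wide library comes closest) can be sketched as below. It is only an illustration: `closest_setting` is a made-up helper, and the per-album bitrates in `library` are invented placeholder numbers, not measurements from this test.

```python
from statistics import mean


def closest_setting(target_kbps, bitrates_by_setting):
    """Return (setting, averages): the setting whose mean bitrate over the
    whole library is nearest to target_kbps, plus all computed averages."""
    averages = {s: mean(values) for s, values in bitrates_by_setting.items()}
    best = min(averages, key=lambda s: abs(averages[s] - target_kbps))
    return best, averages


# Placeholder per-album average bitrates in kbps (hypothetical values).
library = {
    "-q 5,50": [158.0, 166.0, 171.0],
    "-q 5,99": [168.0, 176.0, 181.0],
    "-q 6,00": [179.0, 186.0, 193.0],
}

if __name__ == "__main__":
    best, averages = closest_setting(175.0, library)
    for setting, avg in averages.items():
        print(f"{setting}: {avg:.1f} kbps on average")
    print("closest to 175 kbps:", best)
```

The important point is that the averages come from a whole library and not from the test samples themselves; with a library of a different genre, the same function could well pick a different setting.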

First comment: I've added 10 points to each score. I had to find a way to prevent misinterpretation of scores which could at first appear excessively severe. I didn't use a low anchor for this test, and slight flaws sometimes appear terribly annoying in such tests, which lowers the scores a lot. By artificially shifting all scores, I also intended to disconnect the scale I used from the EBU scale (4 = "perceptible but not annoying"; 3 = "slightly annoying", etc...).

With only 10 results, I can’t draw strong conclusions. But some elements of a conclusion are now appearing:

• [span style='font-size:12pt;line-height:100%']MPC --standard[/span] has a serious chance of being the best of the three competitors: eight times in first place, once second, and never last. Very good performance. We can also note that the --standard setting wasn’t sufficient to reach the “transparency” level (except for the organ sample, with negative ABX tests). Nevertheless, I could seriously expect full transparency with a higher setting: none of these samples (except maybe the chorus one) showed severe artifacts, just slight differences. It’s typically the kind of “problem” that disappears with a higher bitrate. Anyway, I’m impressed, because I didn’t think that MPC --standard was so far ahead...

• [span style='font-size:12pt;line-height:100%']LAME MP3[/span] has little chance, in my opinion, of competing with vorbis and musepack at ~175 kbps. The new -V 3 setting sits in last place eight times: too many… even with a limited set of samples. It doesn't mean that -V 3 sounds bad; it's just inferior to modern lossy formats at a similar bitrate. But with improvements, who knows...

But the -V 2 setting (aka --alt-preset standard) is apparently competitive, and could fight (and sometimes win) against vorbis “megamix” -q 5.50 and -q 5.99. Only problem: the bitrate is not the same anymore (195 kbps vs something between 162 and 180 kbps, but with classical music only). It must also be stressed that LAME -V 2 and -V 3 suffer from huge artifacts (the harpsichord and organ samples are severely damaged to my ears), whereas vorbis artifacts were never so bad (except, maybe, with the Orion II sample: micro-attack problems).

In short, LAME -V 2 (--preset standard) is apparently competitive with VORBIS “megamix” -q 5,99, at least with classical music. It would be interesting to see how both contenders perform with other kinds of music at the same settings, which implies a completely different bitrate range (+10..15% with vorbis, and maybe -x% with lame).

• I expected a lot from the [span style='font-size:12pt;line-height:100%']vorbis mixture[/span]. The progress of “megamix” over the CVS encoder is really impressive, and I really wondered how it would perform against other challengers. In the end I’m disappointed, for several reasons:
- First, the coarse-sound problem of the format is still audible with “megamix” up to 5,99. No need to suspect any GT3b2 or QK32 tuning of ruining the benefits of the original aoTuV in this area: the noise problem is particularly audible in “tonal” moments, which are encoded with pure aoTuV code (bit-for-bit identical samples between the aoTuV encoder and the megamix one). This additional noise is probably not too disturbing in daily listening, but in direct comparison with the other challengers, the contrast is still annoying. The problem doesn’t really lie in noise, but in a coarse rendering of voices or instruments: lack of subtlety, fat texture… I think that this problem is a legacy of an internal change that occurred during the RC3 development of Vorbis, in spring 2002. I think I established this fact at ~128 kbps some months ago (correct me if I’m wrong), and I suppose it’s still true at ~160…170 kbps, even with aoTuV (based on the same buggy “final” CVS code).
- Second reason to be disappointed: due to this remaining coarseness up to -q 5,99, there’s still a substantial quality gap between this setting and the rounded -q 6,00. It’s my fault: I expected the aoTuV tuning to erase the existing frontier between -q 5,99 and -q 6,00, but this encoder only reduced the gap. There is a ~10 kbps difference between 5,50 and 5,99 but little quality improvement. There is also a 10 kbps difference between 5,99 and 6,00, but a huge quality improvement is audible. For daily use of a vorbis encoder, this difference is no real problem: the 10 additional kbps of -q 6,00 are obviously worth it if someone is looking for high quality or archiving, and there’s no need to hesitate. But for my test or any similar one, this difference is much more problematic. On one side, I can’t pit mpc --standard against megamix -q 6,00 on a fair basis (the average bitrates don’t match anymore). On the other side, it’s pointless to compare mpc --standard to a handicapped vorbis setting (5,99). It’s like using musepack at --quality 4.99, which also suffers from problems (and a bitrate gap) that no longer exist at --quality 5.00. Cruel dilemma…
- Third reason to be disappointed: even at -q 6,00 (and with 10 extra kbps), megamix apparently couldn’t reach the quality of musepack --standard. More samples are of course needed to reinforce these first conclusions, but I really fear that the solution doesn’t lie in the selection of samples, but rather in further development.

As I said at the very beginning, I consider this test a first step. Additional results should and normally will complete this first phase. I’m waiting for an early release of the Nero AAC encoder in version 3.0.0.0 to add some spice to the test. A separate test, pitting vorbis megamix against the new 1.1, must also be done, in order to be sure that megamix is the best vorbis encoder at this bitrate.

I'd also like to see this test followed up by other people. It would help to compare different HQ encoders on an empirical basis. Feel free to post some results, even for one sample, in this topic.

I've uploaded all samples to a temporary link. I can't keep them online for long, so don't wait if you're planning to do personal tests. ABX logs are available in each archive. Samples are in OptimFROG lossless audio format.

The problem doesn’t really lie in noise, but in a coarse rendering of voices or instruments: lack of subtlety, fat texture… I think that this problem is a legacy of an internal change that occurred during the RC3 development of Vorbis, in spring 2002.

Outstanding work as usual guru. I don't know if it would really solve the fairness issue, but could you increase the mpc setting to 5.1 or 5.2 or something to make it the same bitrate as megamix at -q 6? The bitrates could be matched that way without having to put either codec on the bad side of one of those annoying "thresholds".

From what I can understand, this problem disappears after q5.99.

Yes, the complete range between -q -1 and -q 5,99 is affected by this phenomenon. It's easy to notice with CVS encoders (except 1.1). RC3 and earlier releases are probably free of this problem, and aoTuV/1.1 lower the amplitude of the coarseness.

It's a great shame that this quality frontier is located so high on the bitrate scale. At -q 4 or -q 5, it would be less annoying. But here, this fat sound also affects encodings at 170...210 kbps, HQ settings which should be free of this kind of problem.

Outstanding work as usual guru. I don't know if it would really solve the fairness issue, but could you increase the mpc setting to 5.1 or 5.2 or something to make it the same bitrate as megamix at -q 6? The bitrates could be matched that way without having to put either codec on the bad side of one of those annoying "thresholds".

It's a solution, but I don't like it. It's not for the reference to be fitted to the challengers, but the opposite. Most people use the --standard preset with mpc. They won't use --quality 5.2 or 5.4 and waste bits.

The first step of excellence for mpc is --standard, which corresponds to ~175 kbps on average with the 1.14 encoder. If the first step of excellence for vorbis megamix lies at -q 6, which corresponds to 185...210 kbps, it's a vorbis handicap (a developers' choice, good or bad, I can't say), which shows that the first encoder is more efficient than the second one. In other words, there's an advantage in using mpc: optimal quality is accessible at a lower bitrate. A test shouldn't break this balance.

Anyway, even with a lower bitrate, mpc seems to keep some distance from megamix -q 6,00. I don't expect great changes from using a slightly higher setting for musepack.

Quote

We can also note that the --standard setting wasn’t sufficient to reach the “transparency” level (except for the organ sample, with negative ABX tests). Nevertheless, I could seriously expect full transparency with a higher setting: none of these samples (except maybe the chorus one) showed severe artifacts, just slight differences.

I don't expect great changes by using a slightly higher setting for musepack.

Quote

I could seriously expect full transparency with a higher setting: none of these samples (except maybe the chorus one) showed severe artifacts, just slight differences.

It might appear to be a contradiction, but in my past experience, problems are never solved by adding a few kbps. A more substantial increase (from standard to extreme, there's a 30 kbps difference) is, I'm sure, needed in most cases to "solve" problems (i.e. to lower the distortion level below the tester's threshold of hearing).

Adding 0.2...0.5 points to a quality level is rarely convincing: look at the difference between vorbis 5.50 and 5.99, which is nearly nonexistent.

Is it coincidence that the problem goes away after q5.99, which also happens to be the point at which lossless channel coupling begins?

Lossless channel coupling can be used below q5.99 as well. Q5.99 and below can use a mixture of lossy and lossless coupling if necessary. Q6 is the point at which lossy channel coupling is no longer used.

So, is it coincidence that the problem goes away at the point at which lossy channel coupling is no longer used?

MPC is still confined to computer, or in best case on PDA – and is maybe doomed to this limited usage.

It would be wonderful if this best case were true, but no: on my Palm I can only listen to MP3, Ogg Vorbis and WMA. And I know the same applies to PocketPC, besides some obscure AAC player. Musepack is unfortunately really confined to computers.

Very interesting test. It does confirm the well-known weakness of Vorbis on classical music, and more work needs to be put in to correct this. I'm not sure what is causing the difference in quality between -q 5.99 and 6. The switching off of lossy stereo at -q 6 is one candidate, but point stereo only causes stereo collapse in high-frequency bands. Noise normalisation also affects higher frequencies and turns off at -q 7, I think, so that may not be the reason either. Hmm... I don't know.

It would be wonderful if this best case were true, but no: on my Palm I can only listen to MP3, Ogg Vorbis and WMA. And I know the same applies to PocketPC, besides some obscure AAC player. Musepack is unfortunately really confined to computers.

Hopefully, not for long. See [a href="http://www.hydrogenaudio.org/forums/index.php?showtopic=23362"]here[/a].

Do you have details on the ABX tests? Did you do them for every sample? Did you train before beginning? How many ABX sessions did you perform? What were the results?

Last time you posted something like this, no one cared to perform a statistical analysis in order to rank the encoders with 95 % confidence bars. I guess I'll have to do it myself, but since I don't know how, it will take some time.

If you're talking about ABX logs and comments, they're all in the .zip archive accompanying each sample. Not the best idea, I must say. I'll upload the log files in a separate, slim archive.

Quote

Did you do them for every sample?

Yes, but for some files I gave up ABXing encoded files against encoded files. Sometimes the difference is very small, and this kind of test needs much more concentration. I nevertheless tried to compare encoded files with each other when they shared the same kind of flaw, in order to get a better idea of which sounded better.

Quote

Did you train before beginning?

No. I didn't use the latest ff123 ABC/HR software (which offers a training module). The only training I did was with the Diana Krall sample. It's a sample I discovered some time ago, when I noticed that mpc --standard produces audible distortions on the cymbals. I first began my test with this sample as a dilettante, without ambition, comparing MPC against one Vorbis encoding and one MP3 encoding. Then I decided to avoid possible criticism about bitrate by using a wider set of encodings for vorbis and mp3, in order to see how these formats perform even at higher bitrates, at their optimal quality (the "excellence step" for each format: --standard, --alt-preset standard, and -q 6,00).

Quote

How many ABX sessions did you perform? What were the results?

Generally, I stopped when the pval was low enough. For some files, I ruined the results by making some mistakes. Angry, I damaged the results even more. Therefore, for some files, I went up to 50 trials in order to reach a satisfying pval again.

But you can find precise values by downloading the archive on my ftp.
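For readers who want to recompute a pval from the logs: the usual ABX p-value is the one-sided binomial probability of getting at least that many trials right by pure guessing. A minimal sketch, assuming each trial is an independent 50/50 guess (the function name `abx_p_value` is just an illustration, not something taken from the ABX software):

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of at least `correct` right answers out of `trials`
    when every answer is a coin flip (one-sided binomial test, p = 0.5)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    for correct, trials in ((8, 8), (12, 16), (30, 50)):
        print(f"{correct}/{trials}: pval = {abx_p_value(correct, trials):.4f}")
```

For example, 12/16 gives a pval of about 0.038, while 8/8 gives about 0.004.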

Could you consider adding WavPack hybrid mode (only using the lossy part) at similar bitrates? Because if it doesn't perform badly, it could be a serious alternative (you can have both a lossy and a lossless file).

Hybrid encoders perform poorly at this bitrate, at least with classical: they sound terribly noisy, and the coarseness of vorbis is nothing compared to theirs. These encoders (DualStream and WavPack lossy) are more interesting at ~300 kbps (or maybe lower, with very loud music, like metal). Otherwise, I would have included one of these hybrid encoders.