If Dibrom, JohnV, Gabriel etc etc are reading this thread, please see my previous (long!) post for the serious questions which I can't answer - this post is to clear up the simpler stuff (which I can help with!).

john33,

It's surprising how little difference it makes to the bitrate. Maybe some of the noise-shaping options cause more of the dither to pop up above the ATH - I don't know. With a suitable sample, you could look in spectral view to see. TBH it's hard to know exactly what's happening in the different modes. From a theoretical standpoint, I know that the scale version is best, but what LAME is actually doing is anyone's guess. I'm not about to volunteer to ABX the differences!

mrosscook,

Decreasing the amplitude by 18dB (without dither) is essentially dropping the lower 3 bits of the 16-bit signal that comes off a CD. So, there will only be 13 bits of real information in the resulting file.

To explain: -18dB = dividing by 8 (2*2*2). If you did this exactly, you would end up with the 3 most significant bits of the new file being 0, and the 3 least significant bits of the old file being lost - the middle 13 bits are shifted but remain intact. In practice, we're not dividing by 8 exactly, so it won't be this neat: but it gives an idea of the kind of data that's going into the lossless codecs, and hence why they achieve better compression ratios after the 18dB drop.
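A minimal sketch of the arithmetic above (my own illustration, not from the post): dividing a 16-bit sample by exactly 8 is a 3-bit right shift, so the bottom 3 bits are discarded and the top 3 bits of the result are zero.

```python
# Arbitrary 16-bit sample value; dividing by 8 == shifting right by 3 bits.
sample = 0b0101_1010_1100_0111

eighth = sample >> 3    # exact integer division by 8
restored = eighth << 3  # shift back up to the original level

print(f"{sample:016b}")    # 0101101011000111
print(f"{eighth:016b}")    # 0000101101011000 <- top 3 bits now zero
print(f"{restored:016b}")  # 0101101011000000 <- bottom 3 bits lost for good
```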

The lossy codecs just look at the sound. If the file is loudish throughout, you haven't actually lost any audible content by dropping the level by 18dB - everything is still way above the -96dB CD noise floor, and the compression ratios show this: there's still just as much "sound" to store before and after wave gaining.

M,

Maybe you're reading too much into it! The one with dither (no noise shaping) is very very close too. The ones with noise shaped dither are not as close, which is to be expected because they don't contain the same audio data: they contain extra high frequency noise, added intentionally.

Re: dither/not dither... In this -18dB example, you're losing 3 bits. There's an OTT example at the bottom of this page where 10 bits are removed. It shows why (generally) dither is a good thing, though in practice there are times where the signal can self dither, and extra dither doesn't help. It rarely hurts though.
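A toy illustration of the dither point (my own, not from the linked page): plain truncation turns a quiet signal into silence or steps (distortion correlated with the signal), while adding roughly ±1 new-LSB of TPDF dither before truncating turns the error into benign noise.

```python
import random

random.seed(0)
BITS = 3        # the -18dB example drops 3 bits
LSB = 1 << BITS

def truncate(x):
    """Drop the bottom BITS bits (what an undithered level drop does)."""
    return (x >> BITS) << BITS

def tpdf_truncate(x):
    """Add triangular-PDF dither spanning about +/-1 new LSB, then truncate."""
    d = random.randint(0, LSB - 1) + random.randint(0, LSB - 1) - (LSB - 1)
    return truncate(x + d)

quiet_ramp = list(range(8))  # a signal living entirely in the bottom 3 bits
print([truncate(x) for x in quiet_ramp])       # all zeros: the signal is gone
print([tpdf_truncate(x) for x in quiet_ramp])  # noisy, but on average the
                                               # louder samples survive
```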

From a theoretical stand-point, I know that the scale version is best, but what lame is actually doing is anyone's guess. I'm not about to volunteer to ABX the differences!

The benefit with using --scale is that the scaling is applied to the integer samples after they have been copied into a float buffer so the fractional parts of the results are preserved for the encoder.
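A quick sketch (my own numbers) of john33's point: apply the same scale factor to integer samples and to float copies of them. The integer path throws the fractional part away before the encoder ever sees it.

```python
samples = [1001, -357, 23239]  # arbitrary 16-bit PCM values
scale = 0.33                   # some --scale style attenuation

int_scaled = [int(s * scale) for s in samples]  # fraction lost here
float_scaled = [s * scale for s in samples]     # fraction kept for the encoder

print(int_scaled)                           # [330, -117, 7668]
print([round(f, 2) for f in float_scaled])  # [330.33, -117.81, 7668.87]
```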

This discussion has centred on this issue vis-a-vis LAME. It could, however, equally be applied to the ogg vorbis encoders and, probably, others. Certainly in oggenc/oggdropXPd, the scaling is applied to the input samples after they have been converted to floats but before being fed to the encoder.

John

--------------------

My compiles and utilities are at http://www.rarewares.org/

Thanks john. That much I'd assumed. It's how the level difference affects the psychoacoustics that interests me. If we use fewer bits then either

a) we were using too many to start with, or

b) the quieter version will actually sound worse.

If (a), then in theory we could change the LAME psychoacoustics. What would be more valuable would be to use a ReplayGain calculation to help LAME set the ATH properly, and see what effect this has.

Cheers,
David.

This issue has been lurking for a long time. It's probably about time to figure out what might be done, now that ReplayGain is so well established. However, it may take a few listening tests to decide if b) is noticeable using the appropriate music replaygained down to 89 dB.

I know that bAdDuDeX once complained about a loss of quality when using --scale to lower the volume before encoding. His preferred genre was Metal.

WARNING: There still occured 1 SCF clippings due to a restriction of StreamVersion 7. Use the '--scale' method to avoid additional distortions. Note that this file already has annoying distortions due to slovenly CD mastering.

My humble experiences with this topic: http://www.hydrogenaudio.org/forums/index....=ST&f=16&t=8046

The main "problem" seems to be the sfb21 (high freq) content and LAME VBR. As I understood it, if the track is very loud, this high-frequency content goes above the ATH and causes the bitrate bloat (sometimes dramatically). Correct me if that's wrong, please. :)

You may try the -Y switch in your tests; the difference between original and replaygained encoding should be far less then.

It appears Sony666 has identified the problem. It DOES appear to be sfb21 related with LAME, because here are the resulting bitrates for Ministry's "Impossible":

This suggests to me that LAME is spending too many bits encoding high-volume, high-frequency sounds in modern overcompressed recordings. Using --scale seems to work around this problem, and -Y is of course a workaround. If Sony666 and I are correct, we have found a serious glitch in LAME that needs addressing!

But how can the encoder know how much I turned my volume knob when I listen to the song (or if I replaygain it afterwards, etc.)? The only assumption it should be allowed to make about this is, IMO, that you don't want to ruin your ears when listening, and thus that the peak (up until that point in the file, to have a "causal" encoder) is below the threshold of pain. LAME seems to make other assumptions as well, according to these tests. Garf, do you know how this works in the Vorbis encoder?

I believe the issue was adjusting the ATH curve based on the volume (i.e., how far from full scale) the music is. Perhaps the change in the ATH happened mostly in the upper frequencies. If so, that might explain why it appears to be related to sfb21. In any case, I think this behavior was meant to be a *feature*, not a bug.

ff123,

That's a 40kbps change in total, 39kbps of which appears to be sfb21 (judging from the -Y results). That is a lot of extra information just for what are essentially louder squeaks, don't you think? Why only 1kbps difference when sfb21 is eliminated?

I think you are right in that it is an ATH issue, but this sfb21 thing helps explain why other codecs don't deal with this issue quite as drastically as LAME does. Don't you think?

I believe the issue was adjusting the ATH curve based on the volume (i.e., how far from full scale) the music is. Perhaps the change in the ATH happened mostly in the upper frequencies. If so, that might explain why it appears to be related to sfb21. In any case, I think this behavior was meant to be a *feature*, not a bug.

ff123

The problem as I see it is that volume is not necessarily linked with how far from full scale the digital waveform is. It may, as we see, depend on other things as well - for example, ReplayGain and the volume level on your amplifier.

I guess we just need to roll out some ABX tests. My question is, how much do you need to amplify the specifically --scale'd mp3 before you can tell a difference? (speaking of 89dB) Will it solely depend on how loud or compressed the original was?

--------------------

WARNING: Changing of advanced parameters might degrade sound quality. Modify them only if you are expirienced in audio compression!

Replay gain assumes that you're listening quite loud, so that's not a great worry.

Jebus,

The bits aren't going into sfb21 (i.e. higher frequencies). They're going into making the whole frequency range more accurate, so that sfb21 (above 16kHz) can actually get some bits! sfb21 doesn't have a scale factor of its own (stupid mp3 format), so it gets something related to what the others get (I forget the details - it's been discussed to death).

So, the 39kbps don't go to higher frequencies - they go to all frequencies, as this is the only way to give 16kHz+ "sufficient" bits - probably only a few in this case. Otherwise, it gets starved.

ABX tests: well, if it's a 16kHz+ issue, that's me out! You'll need the people who can hear up there, and also can detect ringing up there, which might be an issue. This should be interesting - we might find that those bits are very much needed! (though not for me!)

Finally reached the end of this thread (to date), and glad to see that sfb21 has been found to be the culprit, as I'd been starting to suspect. I agree with 2Bdecided about how the sfb21 problem manifests itself. It's a common misperception that the extra bits are wasted on 16 kHz+ content, when in fact they're wasted on <16 kHz content if certain things happen in the 16 kHz+ content that require the workaround to maintain quality.

This is my educated guess as to what's happening:

By chance, john33 picked a track where the sfb21 issue wasn't causing much bitrate bloat in normal APS (without the -Y), so it wasn't much higher in the version with no --scale applied.

Jebus picked an album where the sfb21 issue was causing bloat in APS at full volume, but wasn't at lower volume. As 2Bdecided said, there is no scalefactor for 16 kHz+ (sfb21) so the global scalefactor has to be used. If the scale required to get fine enough quantization in sfb21 doesn't match the existing global scalefactor, the workaround is to adjust the global scalefactor, which then causes more bits to be used to encode all the other spectral bands (sfb0 to sfb20). This forces all other bands to be quantized more finely than the masking threshold requires, so wastes bits in encoding detail in the <16 kHz area that is inaudible. (Encoders like Musepack and Vorbis don't have this sfb21 issue, so there's less difference - perhaps a fraction due to the way the adaptive ATH works).

It so happens that the --scale version doesn't need such a big change in the global scalefactor, so it doesn't get forced to waste bits on masked (inaudible) detail in bands 0-20 just to get the quantization noise in sfb21 low enough to go below the masking threshold.
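A toy numeric sketch of the mechanism described above (my own simplification, NOT LAME's real quantizer code): per-band scalefactors can only make bands 0-20 *finer* than the global setting, never coarser, and sfb21 has no scalefactor of its own, so the global value is its only knob. The 2 dB-per-scalefactor-step figure here is an arbitrary illustration.

```python
GLOBAL_STEP_DB = 1.5  # one global-gain step changes the quantizer step by 1.5 dB

def band_step_db(global_gain, band_scalefactor=0):
    # Toy model: each scalefactor step refines a band by 2 dB; scalefactors
    # only subtract, so a band can never be coarser than the global setting.
    return global_gain * GLOBAL_STEP_DB - band_scalefactor * 2.0

# Say sfb21 needs quantization 6 dB finer to fall below its masking
# threshold. Only the global gain can provide that (4 steps of 1.5 dB),
# and it drags every other band 6 dB finer with it -> extra bits spent
# encoding inaudible detail in sfb0-sfb20.
coarse = [band_step_db(100) for _ in range(22)]
fine = [band_step_db(96) for _ in range(22)]  # 4 global steps finer
print(coarse[0] - fine[0])  # 6.0 dB finer in EVERY band, not just sfb21
```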

Probably an analysis with the mp3x graphical frame analyzer could help to verify whether this is true or not.

So, providing the psychoacoustics are correct about the masking threshold for <16 kHz components of the signal, the lower bitrate file should sound just as transparent as the high bitrate file that has been mp3gained (or had RG in Foobar2000 or similar).

So, if I'm correct, then only for cases where sfb21 bitrate bloat is happening, one should obtain just as good transparency with the smaller file (much closer to Musepack --standard --xlevel bitrates and APS -Y bitrates) when compared to the original lossless file with ReplayGain applied (even if played back on a 24-bit system, where the noise floor doesn't rise under ReplayGain), because LAME isn't forced to encode inaudible details just to get around the lack of an sfb21 scalefactor.

Mind you, can this be right? If so, it implies that one could obtain just as low a bitrate without affecting the original volume by using, for example, --scale 0.5 to obtain -6.0 dB volume change in the encode step then applying a corrective mp3gain Constant Gain of +6.0 dB after encoding (or use anything else that edits the global gain, like mp3DirectCut) to restore the original volume.
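The arithmetic behind that round trip can be sketched as follows (my own sums, nothing from LAME or mp3gain itself): encode quieter with --scale, then restore the level via the global gain field, which mp3gain edits in whole 1.5 dB steps.

```python
import math

def db_to_scale(db):
    """Convert a dB change to a linear scale factor."""
    return 10 ** (db / 20)

def scale_to_db(scale):
    """Convert a linear scale factor to a dB change."""
    return 20 * math.log10(scale)

print(round(scale_to_db(0.5), 2))  # -6.02, i.e. --scale 0.5 is roughly -6.0 dB

# Only corrections that are whole multiples of 1.5 dB map exactly onto
# global-gain steps:
print(6.0 / 1.5)                    # 4.0 -> +6 dB is exactly 4 global-gain steps
print(round(db_to_scale(-6.0), 4))  # 0.5012, the exact scale for -6.0 dB
```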

If so, surely Lame APS would already use --scale accompanied with an opposite adjustment of the global gain value, like mp3gain, as a more efficient workaround for the sfb21 issue, and could apply it to individual frames of the MP3 where the bloat occurs.

The only exception is if it doesn't work for changes in 1.5 dB steps, but only happens to work for some values between the steps of the global gain, or if the times when it works can't be predicted by LAME (though LAME is well aware of when the bloat is occurring, because the -Y switch lets it ignore the 16 kHz+ content at those times instead of implementing the bloat-inducing workaround for the lack of an sfb21 scalefactor).

In that case, perhaps some files would become MORE bloated when --scale is applied.

Hmm, I think we need a Lame VBR expert to help here.

Wouldn't it be awesome if full-bandwidth LAME APS could be 20-30 kbps smaller without sounding any less transparent or breaking the MP3 standard! I'm just suspecting there's something to stop it working like that, or it would have been done before.

My opinion is that if you intend to always play your tracks on a specific adjusted level, it would be wise to take this into consideration while encoding.

It seems to me that extraction->album analysis->encoding with proper scale value would be the best choice.

Disabling the ATH adjustment would be a bad idea. The ATH adjustment compensates (a little) for a possible misadjustment of the track level (that is, when the sound engineer did not use a proper level for the track). But to my mind, it also takes care of the constant sensitivity adjustment of the middle ear. Disabling this would reduce the quality. (I used "to my mind" because some do not agree about this.)

Using the auto ATH adjustment to take a whole-track level misadjustment into consideration is not very efficient. First, the ATH adjustment is limited in amplitude, and second, it is progressive rather than instantaneous.

You can alter the ATH base level with --athlower, but I would prefer to use --scale.

But how can the encoder know how much I turned my volume knob when I listen to the song (or if I replaygain it afterwards, etc.)? The only assumption it should be allowed to make about this is, IMO, that you don't want to ruin your ears when listening, and thus that the peak (up until that point in the file, to have a "causal" encoder) is below the threshold of pain. LAME seems to make other assumptions as well, according to these tests. Garf, do you know how this works in the Vorbis encoder?

Vorbis assumes that the loudest sound is played back at a level no higher than what will blow your ears out, and then takes the most pessimistic assumptions about ATH and masking that are applicable given the above.