Hi, today I tried to ABX lossyWAV 1.1.0 with the Ginnungagap problem sample in order to see if there was any improvement since my previous small personal test (Old Test, see posts 68 & 69). I was still able to ABX Q1.0 100% easily, but I was quickly surprised that Q1.5 was, at the beginning, much harder than previously (I failed once, then I was forced to concentrate), but after some retraining I was still able to ABX it 100% ...

then I tried Q2.0 & I failed 100% ... I failed so badly that I decided to edit the wav to focus on the 3 easiest seconds to ABX ... I retried & put the sound level up & ... I failed 100% ...

so well done Nick I cannot ABX Ginnungagap at Q2.0 anymore & the flaw is slightly reduced at lower settings.

1. Is noise shaping only used with the portable preset (and below) or also with the standard preset (and maybe higher presets)? It seems that it's always used with respect to the chosen quality value (q/10).

2. A comparison of ReplayGain values indicates that peak sample values increase with lossyWAV, reaching 1.0 where the originals were below 1.0. Might that indicate clipped samples?

3. Can lossyWAV safely be used with 96 kHz material without any disadvantages? Does it depend on whether noise shaping is used or not?

Many thanks for the additional ABX testing sauvage78! I'm very pleased that you couldn't ABX the most recent problem sample at -q 2.0. This goes some way to reassure me that the --portable preset is a good one (-q 2.5).

1) You can disable shaping with --shaping 0;

2) Samples will sometimes increase due to rounding off lsb's, sometimes decrease - increasing to 32768 will cause some clipping for 16-bit and will be changed to (32767 shr bits-to-remove) shl bits-to-remove. Clips are counted per channel per codec block. If the number of clips exceeds a preset value (see --longhelp) then the bits-to-remove for that channel is reduced by one and the bit removal process is repeated until the number of clips for that channel does not exceed the permitted number or the number of bits removed is zero;

3) Yes - noise shaping is optimised for 44.1kHz and 48kHz - thanks to SebastianG's coefficients - if worried about this noise shaping, use --shaping 0 as above.
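For anyone who wants to play with the clip handling described in answer 2, here is a minimal Python sketch of one channel of one codec block. It is not lossyWAV's actual code: the function name `remove_bits`, the parameter `max_clips` and the round-to-nearest rule are assumptions made for illustration; only the ceiling `(32767 shr bits-to-remove) shl bits-to-remove` and the "reduce bits-to-remove by one and retry" loop come from the description above.

```python
def remove_bits(samples, bits_to_remove, max_clips=1):
    """Round 16-bit samples to multiples of 2**bits_to_remove.

    Hypothetical helper, not lossyWAV's implementation.
    Returns (rounded_samples, bits_actually_removed, clip_count).
    """
    while True:
        half = (1 << bits_to_remove) >> 1
        # Largest representable value: (32767 shr bits) shl bits.
        ceiling = (32767 >> bits_to_remove) << bits_to_remove
        rounded, clips = [], 0
        for s in samples:
            # Assumed rounding rule: nearest multiple, ties upward.
            # Python's >> floors for negative ints, so -32768 never
            # rounds below -32768 and only positive clips can occur.
            r = ((s + half) >> bits_to_remove) << bits_to_remove
            if r > ceiling:
                r = ceiling
                clips += 1
            rounded.append(r)
        # Accept the block if clips are within the limit, or give up
        # once no bits are being removed at all.
        if clips <= max_clips or bits_to_remove == 0:
            return rounded, bits_to_remove, clips
        bits_to_remove -= 1


print(remove_bits([32767, 32760, 100, -32768], 5))
# -> ([32760, 32760, 104, -32768], 3, 1)
```

In this toy block two samples would clip at 5 and at 4 bits, so the loop backs off to 3 bits, where only one clip remains.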

2) Peak samples will sometimes increase due to rounding off lsb's - this will cause some clipping at +32768 for 16-bit and will be changed to (32767 shr bits-to-remove) shl bits-to-remove;

NickC, I was also interested in this explanation.

Am I right to presume that 'shr' is 'bit-shift to the right' by the number of places following the 'shr' command (discarding the LSBs that shift off the end), and 'shl' is 'bit-shift left', shifting to the left while setting the new LSBs to zero?

Rather than do my 2 to the power of 5 calculation, this was done by shifting the maximum value of 32767 by 5 places to the right, going to 0000 0011 1111 1111, then back 5 places to the left, filling LSBs with zeroes, going to 0111 1111 1110 0000.
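The two shifts are easy to check in Python, where `>>` and `<<` behave like 'shr' and 'shl' for non-negative integers. The variable names are just for illustration:

```python
bits_to_remove = 5

shifted_right = 32767 >> bits_to_remove    # drop the 5 LSBs
ceiling = shifted_right << bits_to_remove  # shift back, new LSBs are zero

# Show both values as 16-bit binary strings.
print(format(shifted_right, "016b"))  # 0000001111111111
print(format(ceiling, "016b"))        # 0111111111100000
print(shifted_right, ceiling)         # 1023 32736
```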

So (2 ^ bits-to-remove) = 32 is the clipping error, which is 1/1024th of the target signal amplitude in this case, and represents a smaller error than 32 from the original signal (presumably between 16 and 31), where the target bits-to-remove would have generated a rounding error of 15 if it had 17 bits available to round upwards instead of having to round downwards.

In this 5-bit case, this clipping error is equivalent to clipping caused by increasing gain by 0.0085 dB above full scale, which is very low-level, and might reassure other users (e.g. try amplifying a full-scale signal by 0.0085 dB and ABX the clipping distortion, which will exceed the distortion in lossyWAV [edit]when 5 bits are removed, that is[/edit]). The sample error adds energy at about -60dB relative to a full-scale sample in this case, which is only mildly indicative of what scale of event may happen in the frequency domain to which the ear responds.
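The two dB figures above can be reproduced with a few lines of Python. This is only a sanity check of the arithmetic, assuming the error is taken as 32 steps (32768 versus the ceiling of 32736) and full scale as 32768:

```python
import math

bits_to_remove = 5
ceiling = (32767 >> bits_to_remove) << bits_to_remove  # 32736
error = 1 << bits_to_remove                            # 32

# Gain above full scale that would push the ceiling value up to 32768.
equivalent_gain_db = 20 * math.log10(32768 / ceiling)

# Size of the error relative to a full-scale sample.
error_level_db = 20 * math.log10(error / 32768)

print(round(equivalent_gain_db, 4))  # about 0.0085
print(round(error_level_db, 1))      # about -60.2
```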

Things get more complicated with noise shaping in use, though presumably there's a feed-forward of accumulated error (like with error-diffusion dither in imaging) which enables the always-negative clipping adjustment to be offset by greater likelihood of positive shifts in following samples, and presumably the instantaneous clipping is likely to be incorporated into the high frequency end of the shaped noise unless there happen to be numerous successive clipped samples, which naturally means lower frequencies.

You can, although by default lossyWAV does not enable dithering of any kind at any quality preset. Dither will add more noise than simple rounding. Also, the simple rounding is in some ways analogous to a random dither, albeit with an indeterminate probability density function.

In terms of best practice, the default setting (--standard == -q 5 --shaping 0.5) should be more than adequate for most samples / music. There is little difference between 16 and 24 bit sample depth as you may end up with, say, 8 bits left from the same codec-block for each depth if the method determines that.

Most testing has been carried out at 44.1kHz / 16bit - but higher sample rates should be fine.

What would be best practice in terms of settings (dithering, noise shaping) for these types of audio:
- 16-bit, 44.1/48 kHz
- 24-bit, 44.1/48 kHz
- 16-bit, 88.2/96/176.4/192 kHz
- 24-bit, 88.2/96/176.4/192 kHz

It may be that I should put in place a mechanism whereby the lengths of the FFT analyses used in the processing change from 64/1024 samples at 44.1/48kHz to 128/2048 samples at 88.2/96kHz and 256/4096 samples at 176.4/192kHz. This would require that the codec-block-length would change from 512 to 1024 and 2048 samples respectively.... I'll think on it.

I created four test samples containing white noise: 16-bit/48 kHz, 16-bit/96 kHz, 24-bit/48 kHz and 24-bit/96 kHz. Then I did a frequency analysis of the difference between the original and lossy conversions (default, shaping 0, shaping 0 + dither 1). I conclude three things:

Dithering seems to have no benefit. It just adds noise on top of the existing quantization noise, thus increasing the noise floor.

The bit-depth doesn't seem to matter.

Noise shaping benefits from higher sample rates because the noise is moved even further into inaudible frequency ranges.
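The dither observation can be reproduced outside lossyWAV with a small simulation. The sketch below is not lossyWAV's processing: it rounds white noise to a fixed 5 bits removed, with and without triangular dither of plus or minus one step, and compares the power of the difference signal. The function names and parameters are made up for illustration.

```python
import math
import random


def added_noise_db(samples, bits_to_remove, dither, rng):
    """Power (in dB) of the difference between rounded and original samples.

    Needs bits_to_remove >= 1, otherwise the undithered error is zero.
    """
    step = 1 << bits_to_remove
    total = 0.0
    for s in samples:
        # Triangular dither: difference of two uniform values, +/- 1 step.
        d = (rng.random() - rng.random()) * step if dither else 0.0
        q = math.floor((s + d) / step + 0.5) * step
        total += (q - s) ** 2
    return 10 * math.log10(total / len(samples))


def dither_penalty_db(n=20000, bits_to_remove=5, seed=0):
    """How many dB louder the added noise is with dither than without."""
    rng = random.Random(seed)
    noise = [rng.randint(-20000, 20000) for _ in range(n)]
    plain = added_noise_db(noise, bits_to_remove, False, rng)
    dithered = added_noise_db(noise, bits_to_remove, True, rng)
    return dithered - plain


print(round(dither_penalty_db(), 2))
```

In the ideal model the rounding error has power step²/12 and the triangular dither adds step²/6, so the penalty should come out near 10·log10(3), roughly 4.8 dB, when the same number of bits is removed.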

So, are you more comfortable now having seen the results - I certainly am, thanks for the effort.

QUOTE (Dynamic @ Jul 25 2008, 22:28)

So (2 ^ bits-to-remove) = 32 is the clipping error, which is 1/1024th of the target signal amplitude in this case, and represents a smaller error than 32 from the original signal (presumably between 16 and 31), where the target bits-to-remove would have generated a rounding error of 15 if it had 17 bits available to round upwards instead of having to round downwards.

In this 5-bit case, this clipping error is equivalent to clipping caused by increasing gain by 0.0085 dB above full scale, which is very low-level, and might reassure other users (e.g. try amplifying a full-scale signal by 0.0085 dB and ABX the clipping distortion, which will exceed the distortion in lossyWAV [edit]when 5 bits are removed, that is[/edit]). The sample error adds energy at about -60dB relative to a full-scale sample in this case, which is only mildly indicative of what scale of event may happen in the frequency domain to which the ear responds.

Things get more complicated with noise shaping in use, though presumably there's a feed-forward of accumulated error (like with error-diffusion dither in imaging) which enables the always-negative clipping adjustment to be offset by greater likelihood of positive shifts in following samples, and presumably the instantaneous clipping is likely to be incorporated into the high frequency end of the shaped noise unless there happen to be numerous successive clipped samples, which naturally means lower frequencies.

As bits-to-remove increases, then the difference between what the sample should have been and the clipped sample increases accordingly. However if the bits-to-remove value is high then it is very likely that the audio in that codec-block channel is loud and the clipped sample may be obscured.

It is worth saying again that at preset --standard, only one clip is allowed per channel per codec-block (a single sample lasts 22.7 microseconds at 44.1kHz) and from testing, this does not seem to be audible.

1. The FFT size should vary with sample rate, though good luck making that happen easily with all the optimisation you've done!

2. IMO and IIRC, enabling dither shouldn't raise the noise floor much on average - the noise floor should stay the same, but more bits will have to be kept to achieve this.

I don't enable dither either.

Cheers, David.

I will modify 1.1.0 to increase the FFT lengths at 69.08kHz, 138.15kHz and 276.3kHz, i.e. 64 to 128 to 256 and 512 samples respectively and correspondingly for the other lengths with a similar increase in codec-block length, 512 to 1024 to 2048 to 4096 samples. This means that the maximum length of FFT will be 8192 samples. [edit] arithmetic failure [/edit]

The spreading functions will not need to be changed as they will be working over approximately the same number of FFT bins after each change, taking into account the increase in sample rate.
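As a rough sketch of the scaling just described (not the code that will go into lossyWAV), the lengths can be written as the 44.1/48kHz base values doubled once per threshold crossed. The names `THRESHOLDS_HZ`, `BASE` and `analysis_lengths` are hypothetical, and whether a rate exactly on a threshold counts as crossing it is an assumption.

```python
# Sample rates at which the lengths double, taken from the post above.
THRESHOLDS_HZ = (69_080, 138_150, 276_300)

# Lengths in samples used at 44.1/48kHz.
BASE = {"short_fft": 64, "long_fft": 1024, "codec_block": 512}


def analysis_lengths(sample_rate_hz):
    """Return FFT and codec-block lengths for a given sample rate."""
    doublings = sum(1 for t in THRESHOLDS_HZ if sample_rate_hz >= t)
    return {name: length << doublings for name, length in BASE.items()}


def long_fft_bin_width_hz(sample_rate_hz):
    """Width of one bin of the long FFT, in Hz."""
    return sample_rate_hz / analysis_lengths(sample_rate_hz)["long_fft"]


for rate in (44_100, 96_000, 192_000, 384_000):
    print(rate, analysis_lengths(rate), round(long_fft_bin_width_hz(rate), 1))
```

The bin width stays in the 43 to 47 Hz range at each of these rates, which is why the spreading functions would cover about the same number of bins after each change.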

I feel that an upper frequency limit of 384kHz is high enough, although I am open to suggestions.

Basically this means that at present I am not confident in the high (i.e. >48kHz) sample rate performance of lossyWAV 1.1.0 and would caution anyone using it at these sample rates against using it for anything other than testing purposes.

On dither, of course you're right - the increase in noise due to dithering will mean that fewer bits can be removed to keep the added noise to the same level that it would have been had no dither been used. I will look at creating additional reference threshold constants for the range of dither between --dither 0 (rectangular) to --dither 1 (triangular) to allow dither to be "safely" used.
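To put a rough number on "fewer bits", here is a back-of-envelope sketch under the textbook model, not lossyWAV's reference threshold constants. It assumes plain rounding adds noise of power step²/12, rectangular dither of one step width adds another step²/12, triangular dither adds step²/6, and each bit removed raises the noise by about 6.02 dB. The function name and the "variance units" parameter are made up for illustration.

```python
import math

# Dither power in units of step^2 / 12 (the power of plain rounding noise).
NO_DITHER = 0
RECTANGULAR = 1
TRIANGULAR = 2


def bits_lost(dither_variance_units):
    """Bits that must be kept to hold the added noise at the undithered level."""
    # Total noise power grows by a factor of (1 + units); one bit is a
    # factor of 4 in power, so bits = log4(1 + units) = 0.5 * log2(1 + units).
    return 0.5 * math.log2(1 + dither_variance_units)


print(bits_lost(NO_DITHER))             # 0.0
print(bits_lost(RECTANGULAR))           # 0.5
print(round(bits_lost(TRIANGULAR), 2))  # 0.79
```

So under these assumptions dither costs well under one bit on average, although bits-to-remove is a whole number per codec block.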

Do you refer to lossyWAV in general or to how noise shaping is applied?

QUOTE (Nick.C @ Jul 14 2008, 22:38)

My intention is to understand and implement SebastianG's new noise shaping method, but for that I will also have to introduce / find a PSY model of some kind.

I would hope that by using the new noise shaping method some additional bits can be removed for the same apparent quality level of output, thereby further reducing the bitrate.

What would happen if noise shaping is disabled via --shaping 0? Would that be taken into account by *not* removing those "additional bits" then? Otherwise the non-shaped results might be pretty bad in comparison if used with lower quality settings.

I'm also wondering if trading further removal of bits for better noise shaping really yields useful results as both methods seem to cancel each other out:

Removing more bits:
- lower filesize
- more noise

Stronger noise shaping:
- higher filesize
- less (perceived) noise

I was referring to lossyWAV in general as the 64/1024 sample fft lengths are fixed at present. Disabling noise shaping will reduce filesize and increase perceived noise. However the added noise (especially at higher quality presets) should be at or below the existing noise floor.

The noise shaping implementation results in a trade-off between bits removed and filesize. That is why the option remains for the user to disable noise shaping. At --insane the addition of noise shaping only adds a few kbit/s to the FLAC encoded lossyWAV output. The increase in bitrate is substantially more at lower quality presets.

Modifications for 1.1.0b:

implementation of increasing fft length for increasing sample rate;

improved logfile output and --detail output;

reference threshold constants for rectangular dither and triangular dither have been calculated so added noise should be the same for dither off and any dither level between 0 and 1 - the number of bits-to-remove will however reduce with "increasing" dither.

I expect to post lossyWAV 1.1.0b tonight.

I'm not clear under what circumstances it is appropriate to use dither. Can anybody explain?

I was referring to lossyWAV in general as the 64/1024 sample fft lengths are fixed at present.

Is that the reason why 96 kHz material is currently reduced to the same filesize as 48 kHz material? Are higher sample rates currently treated as if they were played back at a lower speed/rate, thus lowering the frequency spectrum accordingly, which results in a wrong calculation of how many bits can be removed?

QUOTE (Nick.C @ Aug 3 2008, 09:45)

The noise shaping implementation results in a trade-off between bits removed and filesize. That is why the option remains for the user to disable noise shaping.

Does that mean that if noise shaping is disabled, fewer bits are removed? That would reassure me.