I always thought that there is no unsuitable real-world wave input for lossyWAV (i.e. an input which won't result in significantly higher compression when preprocessed with lossyWAV). I just came across an example which seems to defy that rule. I am not sure why. Maybe user error?

The source file is the mp3 audio track of a documentary (so it's mostly human voice with occasional background music and f/x), converted to wav (16-bit stereo 44.1 kHz PCM). I used --portable for lossyWAV and highest compression (8) for FLAC.

The compressibility of the correction file suggests that the source wave file was not changed much. I inspected the correction file to confirm that. There are really very few changes, and even those are hard to spot (it looked like a flatline, even when Wavelab pointed me to the peak values; only after maximizing the vertical scale AND increasing gain by 400% could I barely see some non-zero values).

The point is, preprocessing with lossyWAV saved only 0.2% in file size (YES, I used a 512-sample blocksize for FLAC), while the wiki gives these indicative expected results:

QUOTE

lossless flac: 854 kbit/s
--portable lossyWAV flac: 376 kbit/s (-56%)

This is quite different from my results, both in terms of absolute kbit/s and relative savings (-56% vs -0.2%).

Thank you, Northpack. This increases speed to 9x, but it still isn't close to the 33x I had with 1.1.0c.

I guess that's still due to the new (fixed) noise shaping algorithms. Maybe Nick can comment on that. If you want it fast, I suppose you should stay with 1.1.0c

QUOTE (chrizoo @ Sep 19 2011, 16:50)

I always thought that there is no unsuitable real-world wave input for lossyWAV (i.e. an input which won't result in significantly higher compression when preprocessed with lossyWAV). I just came across an example which seems to defy that rule.

No surprise here. It's known that there are some kinds of signals which FLAC itself compresses very efficiently (e.g. solo piano music). Those signals can actually have a worse compression ratio when preprocessed with lossyWAV, because it forces FLAC to use a very small block size of 512 samples, thus decreasing its efficiency (and lossyWAV can't always make up for the loss in those cases, see the small correction file).

I always thought that there is NO unsuitable real-world wave input for lossyWAV

QUOTE (Northpack @ Sep 19 2011, 17:30)

If you want it fast, I suppose you should stay with 1.1.0c

Supposing 1.3.0 is more efficient (and generally better quality thanks to noise shaping), I'll stay with 1.3.0. I just thought I'd report this; maybe there is a problem Nick could look into, or some error on my behalf.

1) lossyWAV up to v1.2.0 used an approximation for the SQRT and LOG2 functions. These routines were faster than the FSQRT and FYL2X x87 instructions at a cost of accuracy. These approximations were removed in the run-up to the release of v1.3.0 as I realised that the development problems with adaptive noise shaping that I was experiencing were due to the use of approximations rather than the best accuracy available. This change increases processing time.

2) When the latest quality presets were developed I decided to enable the shortest FFT (32 samples, circa 0.73 msec for 44.1kHz output) for all quality presets. This change also increases processing time.

3) If either adaptive noise shaping or fixed noise shaping is enabled, the remove-bits procedure uses an FIR filter (default 96 taps) as part of the bit-removal process. This change also increases processing time.
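The accuracy trade-off behind point 1 can be illustrated with a classic fast-log2 trick, which reads the IEEE 754 exponent and treats the mantissa as a linear fraction. To be clear, this is a generic textbook approximation, not necessarily the exact routine lossyWAV used:

```python
import math
import struct

def fast_log2(x):
    # Reinterpret the 32-bit float: the exponent supplies the integer part
    # of log2(x); the mantissa is used as a linear (hence inaccurate)
    # approximation of the fractional part.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return bits / (1 << 23) - 127.0

# Exact at powers of two, but up to roughly 0.09 off in between:
worst = max(abs(fast_log2(v) - math.log2(v))
            for v in (0.5, 1.0, 1.7, 3.14, 1000.0))
```

Per-bin errors of this magnitude in log-domain signal-strength values are a plausible source of subtle misbehaviour in a noise-shaping development loop, which is consistent with the decision above to trade speed for exactness.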

Regarding content where processing with lossyWAV and compression with FLAC results in a larger file size: this is not a new discovery. Content where no advantage was gained was found quite early in the original development of lossyWAV.

@Nick.C: many thanks for the explanation for the speed impact. In a nutshell it is due to better quality/higher accuracy. Knowing this, the user is certainly more comfortable with the slower speed :-)

QUOTE (Nick.C @ Sep 19 2011, 19:40)

Regarding content where processing with lossyWAV and compression with FLAC results in a larger file size: this is not a new discovery. Content where no advantage was gained was found quite early in the original development of lossyWAV.

which type of content is that?

As said above, I would understand that with audio where FLAC alone compresses extremely well (piano solo, as indicated by Northpack), but in my case ... ?

QUOTE (Northpack @ Sep 19 2011, 17:30)

No surprise here. It's known that there are some kinds of signals which FLAC itself compresses very efficiently (e.g. solo piano music).

@Northpack: From my posting, you know that the audio is nothing of that kind. It's a TV documentary, with voice, music, effects, etc. If that's not an input where lossyWAV can play to its strengths, then what is? I think there is more to my findings than you want to accept. If, despite the nature of the audio input, the source.flac file were already almost "perfectly" compressed (leaving only a tiny margin of reduction of 0.2%), then why does the new version all of a sudden yield 11%? This suggests that there was some problem with 1.1.0c. Once that is established, the question of whether the 11% is too low compared to the average file size savings of 56% is legitimate.

@ chrizoo: Please post a 30 second clip to allow corroboration of findings.

QUOTE (chrizoo @ Sep 19 2011, 20:43)

which type of content is that?

Content with reduced signal strength at some point in the range 20Hz to c.16kHz. The number of bits to remove is calculated from the average signal strength and also the minimum signal strength over the calculation range.
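A rough sketch of that per-block analysis (the helper name is mine; this assumes a single Hann-windowed FFT per block, whereas the real analysis combines several FFT lengths and spreading functions):

```python
import numpy as np

def band_bin_stats(block, fs=44100, f_lo=20.0, f_hi=16000.0):
    # Magnitude spectrum of one Hann-windowed block, restricted to the
    # calculation range; returns (minimum, average) bin strength.
    mags = np.abs(np.fft.rfft(block * np.hanning(len(block))))
    freqs = np.fft.rfftfreq(len(block), d=1.0 / fs)
    band = mags[(freqs >= f_lo) & (freqs <= f_hi)]
    return band.min(), band.mean()

# A block that is all low-frequency tone: the *minimum* bin strength in
# the 20 Hz - 16 kHz range is near zero, so few (if any) bits would be
# removed, even though the *average* strength is healthy.
t = np.arange(512) / 44100.0
min_bin, avg_bin = band_bin_stats(np.sin(2 * np.pi * 440 * t))
```

This is why a dip (or a lowpass) anywhere inside the calculation range suppresses bit removal: the minimum bin drags the permissible noise level down regardless of the average.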

Thanks for taking the time to provide me with the sample. Below are a sox spectrogram of the lossless original and the post-analysis frequency "plots" (short and long) contained in the lossyWAV.log file. From these it can be seen that there is next to no signal above bin 24 of 32 and bin 178 of 512 (c. 16.5kHz and 15.5kHz respectively). This explains why you were getting no reduction in size: no bits were being removed because there is no signal at the high end of the calculation range (which varies upwards with quality setting from about 15.3kHz). As you can see, I have used --limit 14000 for this processing; the resultant lossy.flac file is 196MiB (315 kbit/s) and the lwcdf.flac file is 260MiB.

So, yes, this is a sample that will not benefit from lossyWAV processing using default settings. However, simply setting the upper calculation frequency limit below the lowpass that seems to have been applied to the sample fixes this. It almost looks as if this sample has been resampled from 32kHz (NICAM frequency?).

Sorry for the late reply, I was away. Thank you for having looked into this issue.

How can we generalize from this specific example? What conclusions can be drawn from it on a general level?

I do not understand why a lowpass would affect the zeroing of LSBs and thus prevent lossyWAV from making a wav more compressible. There is also no mention anywhere (lossyWAV wiki, command-line help, etc.) that lowpassed audio does not work with lossyWAV.

Also: how can the sudden jump from 0.2% to 11% (see above) be explained?

lossyWAV is a near-lossless audio processor which dynamically reduces the bitdepth of the signal on a block-by-block basis. Bitdepth reduction adds noise to the processed output. The amount of permissible added noise is based on analysis of the signal levels in the default frequency range 20Hz to 16kHz.

If signals above the upper limiting frequency are at an even lower level, they can be swamped by the added noise. This is usually inaudible, but the behaviour can be changed by specifying a different --limit (in the range 10kHz to 20kHz).

For many audio signals there is little content at very high frequencies, and forcing lossyWAV to keep the added noise level lower than the content at these frequencies can increase the bitrate dramatically for no perceptible benefit.

So, in essence, processing content which has little or no signal at one or more points in the calculation range (20Hz to c.16kHz) will result in fewer bits removed. This is not a lowpass; it is an outcome of the frequency range in which lossyWAV searches for the lowest frequency bin result. The lowest frequency bin result influences the number of bits that lossyWAV will remove from a particular block of audio. This is mentioned both in the long help and in the wiki (under "Quality Presets").

Lowpassed audio by definition has little or no content above a set frequency. If that frequency is below lossyWAV's upper calculation limit then fewer (if any) bits will be removed.
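The bitdepth reduction described above boils down to rounding each sample to a multiple of 2^bits_to_remove, i.e. zeroing that many LSBs. A minimal sketch (my own simplification, not the actual lossyWAV routine):

```python
import numpy as np

def remove_bits(block, bits_to_remove):
    # Round each 16-bit sample to the nearest multiple of 2**bits_to_remove,
    # zeroing that many LSBs; FLAC detects the shared zero LSBs ("wasted
    # bits") and codes the block more compactly.
    step = 1 << bits_to_remove
    rounded = np.round(block.astype(np.int64) / step) * step
    # Clamp to the signed 16-bit range (the real tool treats clipping
    # explicitly, as discussed later in the thread).
    return np.clip(rounded, -32768, 32767).astype(np.int16)

out = remove_bits(np.array([1000, -1003, 517, 24000], dtype=np.int16), 4)
# every value in `out` is now divisible by 16
```

The rounding is what adds the noise: each sample moves by up to half a step, and the block-analysis result above decides how large that step is allowed to be.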

I believe that your recent change from v1.1.0c to v1.3.0 explains the "sudden jump between 0.2% and 11%" as the quality presets were amended during the development of v1.3.0.

First of all, I'm sorry that I raised the issue when the answers seem to have been there already; I simply failed to understand them.

Thanks for explaining. Let me see if I understood:

In essence, for audio input with low levels above 16 kHz there won't be much LSB-zeroing (at least if the --limit parameter is not specified), because there is a greater risk that the added noise might be audible. Correct?

What frequency range is the added noise in? Or does this vary a lot depending on the audio input?

QUOTE

content with little or no signal at one or more points in the calculation range (20Hz to c.16kHz) will result in fewer bits removed.

... but not because it can't be done; rather for "security reasons", right (as mentioned above)?

QUOTE

I believe that your recent change from v1.1.0c to v1.3.0 explains the "sudden jump between 0.2% and 11%" as the quality presets were amended during the development of v1.3.0.

Oh, that explains it then. Thanks.

QUOTE (Nick.C @ Oct 5 2011, 21:40)

This is not a lowpass - this is an outcome of the frequency range in which lossyWAV searches for the lowest frequency bin result.

I wasn't implying that lossyWAV applies a lowpass, but that the audio input HAD been lowpassed previously (before processing with lossyWAV).

What conclusions can non-expert users such as myself draw from all this? Do we have to (or should we) create a spectrogram each time before processing audio with lossyWAV, to see (A) whether lossyWAV preparation will lead to a noteworthy file size reduction and (B) whether we need to use a non-default value for the --limit parameter?

(PS: This also means that instead of sending you 90 minutes of lossless audio, I could just have sent you a jpeg image of the spectrogram? lol)

What conclusions can non-expert users such as myself draw from all this? Do we have to (or should we) create a spectrogram each time before processing audio with lossyWAV, to see (A) whether lossyWAV preparation will lead to a noteworthy file size reduction and (B) whether we need to use a non-default value for the --limit parameter?

What I do is process the file using lossyWAV in the normal way. If the result doesn't give me the file size reduction I'm looking for, I make a choice between a) living with it, b) using a lossy encoder instead, or c) sticking with FLAC.
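The spectrogram check asked about above can also be approximated programmatically. A hedged sketch (function name is mine; it assumes a mono float signal already decoded into a numpy array) that reports the highest frequency with appreciable energy, for comparison against the default c.16kHz upper calculation limit:

```python
import numpy as np

def highest_active_frequency(signal, fs=44100, n_fft=1024, floor_db=-90.0):
    # Average the magnitude spectrum over the whole file, then report the
    # highest frequency whose level exceeds floor_db relative to the peak bin.
    n_frames = len(signal) // n_fft
    frames = signal[:n_frames * n_fft].reshape(n_frames, n_fft)
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).mean(axis=0)
    level_db = 20 * np.log10(spectrum / spectrum.max() + 1e-12)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    active = freqs[level_db > floor_db]
    return float(active.max()) if active.size else 0.0

# A bare 1 kHz tone has no appreciable energy anywhere near 16 kHz, so a
# low --limit (or skipping lossyWAV altogether) would be indicated.
fs = 44100
tone = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)
f_max = highest_active_frequency(tone, fs)
```

If this came back well below c.16kHz for real material, that would be a hint to pass a correspondingly lower --limit, as in the --limit 14000 example earlier in the thread.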

You'll also get zero bits removed if there are deep+wide notch filter(s) anywhere in the audible band.

If there's a chunk of spectrum that you allow lossyWAV to analyse that's noise-free (or content-free), and it's wide enough not to get hidden by the spreading functions, then lossyWAV won't throw any bits away at all. It can't, because doing so would add noise above the noise floor - which is exactly what lossyWAV is designed to avoid.

I'm not sure if any release from Nick has ever taken the Minimum Audible Field / Absolute Threshold of Hearing into account, which would let you add more noise at higher frequencies even if they were included in the analysis. It doesn't help very often, and requires a (potentially very wrong) assumption about the replay level, so it's not really worth it IMO. It could have helped a little here, though not as much as changing the limit or resampling to 32kHz. If you were designing lossyWAV into a lossless codec, you'd put hidden automatic internal resampling in there for situations like this - i.e. any audio where downsampling is basically lossless gets downsampled when encoding, and then upsampled again when decoding.

My advice is just to live with it. If you ever find a lot of content like this (e.g. discs sourced from 128kbps mp3s) you could resample the lot to 32kHz yourself if you want to use lossyWAV and care about efficiency. Or specify a lower analysis frequency limit.

It's interesting that VBR mp3 can bloat when there's lots of information at higher frequencies, while lossyWAV can bloat when there's very little information at the highest analysed frequency. It would have been far easier to design mp3 to avoid this than it would have been to design lossyWAV to avoid it.

This is an absolutely fantastic tool, many thanks for implementing it.

I just ran a test with Absolution by Muse, and it introduced significant audible clipping on a few tracks. My settings were -q X -l 10000. These settings worked fine and proved to be transparent for a jazz sextet CD I tested a few days ago, but the results here are rather nasty. Any suggestions for improving results, aside from upping the q parameter?

You could try using scaling to reduce the amplitude of the signal (pre-processing). This should help to avoid clipping.

Can you please post a clip of an affected area of the sample (less than 30 seconds in length)?

[edit] .... The "-l 10000" is ignoring any signal dips over 10kHz. This will allow signal over 10kHz to potentially be swamped by the added noise (adaptively shaped using your command line). This could be your issue.

In this type of situation, a spectrogram can be useful to determine whether there is any signal above the selected limit (upper frequency limit used in calculations when determining lowest signal power FFT bin). [See the example at post #37]

I suggest that you try removing "-l 10000" from the command line and determining whether that solves the issue. [/edit]

Is it possible to get lower quality (and bitrate) than Extra Portable? I just want to hear what kind of distortions it brings into the sound. It is said of Extra Portable that it is "not fully transparent", but was that proved? Are there any problematic samples for lossyWAV 1.3.0?

IMO the 'not fully transparent' statement is to be understood from lossyWAV's history and from the high quality claims lossyWAV has. lossyWAV is meant to use a lossless codec in a lossy, bit-saving way without making the simplifications audible. And it worked more or less from the start at the quality level 'standard', at an average bitrate of roughly 500 kbps. Nick.C then did a restless job of improving things even before applying adaptive noise shaping, and transparency was achieved at a bitrate of roughly 400 kbps.

Unfortunately there were not many people who did listening tests with resultant audible issues (easy to understand, as lossyWAV quality was great from the very start), so nothing is known for sure. After Nick.C introduced adaptive noise shaping, the transparency border is probably much lower than 400 kbps; perhaps transparency is achieved even with 'extra portable' (I can't remember anybody having reported problems). But because of the lack of listening tests and because of lossyWAV's quality claims, the descriptions of the various quality levels are, I guess, a bit conservative.

Nick.C can report on this better, but as he hasn't replied yet, maybe he's on holiday.

As Horst has said, the aim during lossyWAV development was to produce a perceived quality (even at the lowest quality setting) that was likely* to be transparent.

I say likely because, as has also been mentioned, there were few (but very much appreciated) ABX testers during the development phase.

Problematic samples in lossyWAV tend to manifest themselves as samples from which the process does not manage to remove many bits - these generally have dips in the frequency response inside the calculation range where the lowest frequency bin result is determined.

Problematic samples in lossyWAV tend to manifest themselves as samples from which the process does not manage to remove many bits - these generally have dips in the frequency response inside the calculation range where the lowest frequency bin result is determined.

Do I take it that you mean the only 'problem' with these samples was that the bitrate didn't get reduced compared to lossless? I don't recall reading of any non-transparency in those cases.

As an aside - I had until recently lost touch with Hydrogen Audio and lossyWAV for quite a few months. I've caught up with the 1.3 development thread (link available via lossyWAV Wiki), which was fascinating reading.

May I offer my belated congratulations and thanks to you, Nick, and also to David, Horst and Sebastian for their valuable contributions. I have to say that my ears, equipment and listening environment seem inadequate (from the time or two I tried) to detect the subtle non-transparencies that Horst reported on eig and furious killer samples at the lowest quality settings.

I remember the original idea was exciting - essentially smartly counteracting the bitrate bloat caused by excessive mastering loudness by exploiting the raised noise floor - something which had previously been partially achievable using wavgain to re-scale the audio, albeit with a permanent volume normalization, though that didn't bother me personally.

The compressed bitrates you've now achieved with a pre-processor that only zeroes some of the LSBs is quite remarkable. All this achieved with no filtering of the audio spectrum and no temporal smearing either. It's an awesome piece of work.

And yet there's probably still scope to go a little further and calculate psychoacoustic masking curves under which the shaped noise spectrum could be fitted, which would presumably also calculate bits-to-remove instead based on the area under the masking curve and thus overcome the 'problem' with lowpassed or (rarely) bandpassed material automatically.

The crux of the issue is whether or not the adaptive noise shaping algorithm's influence on the rounding decision takes account of the actual error signal from neighbouring samples, including the enlarged negative-going error signal caused by lossyWAV's form of clipping. If it does, it might compensate for much of that error and its spectral and DC-bias effects in the rounding decisions made over a number of subsequent samples. My supposition is that it does use the actual error; but if it instead assumes, say, a typical statistical distribution of rounding errors, my thoughts might be completely misguided. I suspect Nick.C or SebastianG will be best placed to put me right on this issue.

Early on, 2Bdecided spotted the issue with clipping of positive values near full scale that cannot be rounded up to +32768, because that number cannot be represented in 16-bit signed binary, and we can't use +32767 and still keep the LSBs zeroed across the block to provide coding efficiency to a lossless encoder that recognises unused bits.

There's a Max Clips Per Block variable in Nick.C's lossyWAV to limit the number of clipped samples allowed in any single codec block. If there are too many clips, the Bits To Remove value for that block of 512 samples is reduced and the block is re-processed to see whether Max Clips Per Block is no longer exceeded or whether Bits To Remove needs to be lowered again. The choice of Max Clips Per Block is decided by the quality preset or by user override.

This helps statistically limit the amount of net DC offset that might be introduced into the added noise within the 512-sample block. It also helps ensure that the energy of the additional error spikes remains spread over a relatively broad spectrum (i.e. their short duration provides little power per spectral bin) and contributes only a small amount of total noise energy over the 512-sample block, compared to the noise that would have been added by the lossyWAV rounding procedure operating without clipping.
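The Max Clips Per Block re-processing loop described above might be sketched like this (variable names are mine, and this is a simplification of the actual code):

```python
import numpy as np

def bits_to_remove_with_clip_check(block, wanted_bits, max_clips_per_block=1):
    # Try the wanted bits-to-remove first; if rounding forces more samples
    # above +32767 than Max Clips Per Block allows, lower bits-to-remove
    # and re-process the block.
    for btr in range(wanted_bits, -1, -1):
        step = 1 << btr
        rounded = np.round(block.astype(np.int64) / step) * step
        clips = int(np.count_nonzero(rounded > 32767))
        if clips <= max_clips_per_block:
            return btr, np.clip(rounded, -32768, 32767).astype(np.int16)
    return 0, block  # fallback; btr == 0 never clips

# Five samples sit so close to full scale that rounding up would need
# +32768; bits-to-remove drops until they round exactly.
block = np.array([32760] * 5 + [100] * 507, dtype=np.int16)
btr, out = bits_to_remove_with_clip_check(block, 6)
```

Note that only the positive side needs checking here: -32768 is representable, so rounding down never clips, which matches the asymmetry 2Bdecided pointed out.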

I'm wondering (and I'm not sure whether SebastianG or Nick.C is best placed to answer) whether the Adaptive Noise Shaping (ANS) enabled in version 1.3, involving the carrying-forward of rounding/truncation errors in such a way that they are spectrally shaped, might actually have mitigated a lot of the original "clipping" concern (so long as there aren't an enormous number of clips occurring at very periodic intervals, forcing a tonal error signal that might be unmasked).

In other words, the fact that the round-up or round-down decision is biased by the errors in other samples in the 512-sample block, and by the spectral shape of the signal behind which the residual noise can be masked, may help prevent clipping errors from accumulating across the block as a whole and might even offset any DC bias by the end of the block.

My interpretation might be naive in thinking of it a little like graphical bit-depth reduction, specifically error-diffusion image dithering, where the rounding error is carried over to bias the rounding decision in subsequent pixels. Clearly, in the lossyWAV ANS case, the error signal is also spectrally shaped, by influencing the decision to round up rather than down (or down rather than up) according to the shaping filter.
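The error-diffusion analogy can be made concrete with a one-tap error-feedback quantiser. lossyWAV's adaptive shaping reportedly uses a longer FIR filter, so treat this as the simplest possible analogue rather than the real thing:

```python
import numpy as np

def error_feedback_quantize(samples, bits_to_remove):
    # Each sample's rounding error is subtracted from the next sample
    # before rounding - the 1-D cousin of error-diffusion image dithering.
    step = 1 << bits_to_remove
    out = np.empty(len(samples), dtype=np.int64)
    err = 0.0
    for i, x in enumerate(samples):
        target = float(x) - err            # compensate the carried error
        q = int(np.round(target / step)) * step
        q = max(-32768, min(32767, q))     # lossyWAV-style clamping
        err = q - target                   # carry this sample's error forward
        out[i] = q
    return out

rng = np.random.default_rng(0)
samples = rng.integers(-20000, 20000, 1000)
out = error_feedback_quantize(samples, 4)
# the feedback keeps the running (DC) error bounded, so the block mean is
# preserved to within a small fraction of one quantisation step
```

Because the carried error never grows beyond the order of half a step, any one sample's bias (including a clip's enlarged error) is paid back over the following samples, which is exactly the mechanism speculated about in the surrounding paragraphs.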

Is it the case that the actual rounding error introduced is considered over the duration of the shaping filter? If so, for a sample where clipping occurs, the resulting negative DC offset and the additional error magnitude would tend to be counteracted by contributing to the rounding and noise-shaping decisions of numerous other samples in the same codec block, rather than resulting in a net DC offset for the block and a spectrally spread additional noise contribution (assuming each clipping error is brief, so that it resembles a Dirac delta function: a negative-going one-sample spike with a white spectrum and low power spectral density).

I'm thinking of reducing maximum clips-per-channel-per-codec-block to zero when adaptive shaping is active, to remove the possibility that the current method, which allows a certain number of clips, "overloads" the noise-shaping filter(s).

If my interpretation is right, however, I wonder whether the Maximum Clips Per Block value should actually be allowed to increase further when ANS is enabled. Or should a different measure of the residual (lwcdf) spectrum be used instead when forced down-rounding of high positive values has been detected in a block, for example to ensure that it's a reasonable match to the target noise spectrum, without too much of a DC offset or any large spectral peaks above that targeted noise spectrum?

If I'm being hopelessly naive, I don't insist you explain it so that I understand. You're welcome to tell me that it just doesn't work like that (e.g. that it doesn't take into account the actual error signal, just a statistical assumption). You probably have more important things to do with your time. My serious knowledge of digital signal processing is limited to what I needed in work on very high-rate single-bit discriminated instrumentation, rather than audio file manipulation/post-processing, and didn't include noise-shaping filters.