It's true some heuristics were introduced, especially spreading and skewing; spreading was there from the very start. Without these heuristics the method may have a better justification, but that comes at the price of a seriously increased bitrate. With the advanced options, anyone who wants to can get rid of the heuristics: -skew 0 -snr 0 -fft 10101 -spf 11111-11111-11111-11111-11111 -nts 0, for instance, when using a 64, 256, and 1024 sample FFT. I personally love the reduced bitrate given by spreading and skewing, and experience has made me feel secure enough with it.

I agree, however, that this raises the question of whether we should readjust the quality levels. Maybe -1 should go to Axon's pure method, and maybe -2 should be a mixture of the current -2 and -1, for instance with the FFT usage of -1 (maybe dropping the 128 sample FFT) but with an -nts value of 2. I personally would agree with such a solution.

ADDED: I just saw your new beta, Nick. So I see -snr should be negative to the limit to avoid the skewing/snr heuristics. The spreading length should be 1, however, IMO, to avoid the spreading heuristic. The constant spreading of 4 was just 2Bdecided's spreading heuristic from his start-up, as far as I can see. There's no reason IMO to use a blocksize of 1024; 2Bdecided just used a 1024 sample block size when he started things. Of course, not averaging the FFT output at all is fine in a pure sense, but it is suspected to be huge overkill, especially in the high frequency range, and it brings the bitrate up.

At present you can't use a negative -snr value; the code forces it to a safe range.

Bearing in mind that the source FLAC files amount to 69.36MB / 781kbps, that's not really a great saving.

The pure method isn't attractive to you, and it isn't attractive to me. But it's intrinsically safe as Axon said.

QUOTE (Nick.C @ Nov 27 2007, 23:52)

[edit] And the 4 bin spreading function was there from the very beginning in David's original script. [/edit]

Yes, 2Bdecided used this spreading heuristic from the very start, and we've improved upon it, both with respect to quality and bitrate savings.

ADDED: I just re-read Axon's post. I'm not sure any more that he dislikes spreading, as he seems to accept the critical band heuristics as the most important basis for our current spreading parameters. Admittedly, this already means accepting some heuristics. Anyway, the question remains: should we configure -1 in such a way that the configuration details have a very high degree of theoretical justification?

The primary advantage of lossless formats, it seems to me, is the future-proof factor: being able to benefit when a new and better encoder or a different format comes around, rather than having that option made unattractive by the huge quality-per-bitrate losses involved in transcoding. So has anybody done listening tests to see how files processed by lossyWAV fare when encoded to MP3/AAC/Vorbis/whatever?

Also, where is the preferred place to discuss lossyWAV? It seems like it would belong in the "other lossy formats" forum, but all the discussion of it seems to be restricted to this thread and the original thread in the FLAC forum.

I'm just wanting to see if my understanding of the preprocessing method is somewhat accurate. Let's say that an amplitude of part of a 16-bit wave is +32295 (1111111000100111). lossyWAV will simplify (not "clip", oops, maybe I meant snip?) it so that the binary value contains many trailing zeros, which FLAC will then compress away as wasted_bits. The processed value of that amplitude will then become something like +32256 (1111111000000000) and save 9 bits. Is this the basic principle? Just wanting a little bit of clarification, thanks

Well, insofar as nothing in psychoacoustics is set in stone and there are going to be heuristics to evaluate very complicated phenomena, you can't escape them. I mean, the Bark scale seems like a hack in the first place, as every closed-form EBW equation probably is.

But clearly, spreading exists in any halfway-complete masking model. To leave such a tempting bone out there without chewing on it is madness. I'd just like to know how the predicted -spf numbers line up against what the tunings are, and have an option to use the theoretical numbers.

I would use a different option than -1 for a setting that matched theoretical predictions, because there's still a need for -1 to -3 in their current incarnations. Moreover, whatever setting exists must still be absolutely transparent. It seems like 2BDecided's original code had some artifact problems... which makes no sense if it was purely by the book.

Yes, that's essentially it. It just works a bit the other way around, and clipping isn't a correct description. lossyWAV decides, on a per-block analysis, how many least significant bits are considered non-essential for the 512 samples in the block. If it decides, for instance, that 9 least significant bits can be ignored (that's unusually many; let's also consider 3), then a sample of 1111111000100111 in the block is rounded to 1111111000000000 (resp. 1111111000101000).
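As a non-authoritative sketch of that rounding step (round-half-up is an assumption here; lossyWAV's exact rounding rule may differ in detail), reproducing the numbers from the question above:

```python
def round_lsbs(sample: int, bits_to_remove: int) -> int:
    """Round a PCM sample to the nearest multiple of 2**bits_to_remove,
    zeroing its least significant bits so FLAC can store them as wasted_bits.
    Round-half-up is an assumption; lossyWAV's actual rounding may differ."""
    step = 1 << bits_to_remove
    return (sample + (step >> 1)) // step * step

# The examples discussed above:
print(round_lsbs(32295, 9))  # 32256 (9 trailing zero bits)
print(round_lsbs(32295, 3))  # 32296 (3 trailing zero bits)
```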

In its purest sense, it's lossy, so lossy it is.

All the discussion and uploading lives in here as I am not a member of the developers group and cannot upload in any other forum.

@Halb27: Maybe I'm being a little over protective of the settings we have arrived at after quite a bit of work. Let's rename them as -DAP1, -DAP2 & -DAP3, and start again on the pure method versions. Thinking about it, I feel that -snr may be useful in the pure method.

Attached again (to bring it closer to the conversation): my spreading Excel sheet.

...OFR supports wasted bits, but I can't see a way for it to use a 512 sample frame size (nor, in my opinion, was OFR designed to work with such a small frame size).

The target codec has to be able to work on a multiple of the lossyWAV codec_block_size; otherwise, use -cbs xxx to set the lossyWAV codec_block_size to the same as the target codec's, or I get off my behind and implement a -ofr parameter to specify codec-specific settings (as for WMALSL).

I think OFR support is a story of its own. From a certain point of view, the fact that it supports wasted bits detection and that it shares with LA the crown for the best compression ratios around was very promising. On the other hand, I couldn't find any information about the frame sizes OFR uses, or a possible undocumented switch to make it work with a user-fixed frame size.

As a last chance, I took an OFR file (encoded at the default setting), damaged one single sample with a hex editor, and checked what happened. As a result, I got exactly five seconds of silence in the middle of the music.

So I couldn't do any better than to assume that OFR works with a frame size of 220,500 samples, i.e. five seconds at 44.1 kHz (at least on 44.1 kHz material at the default setting), which means there is practically no chance of using it with lossyWAV.
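The arithmetic behind that deduction is simple:

```python
sample_rate = 44100      # Hz, 44.1 kHz material
gap_seconds = 5          # length of the silent gap caused by one damaged sample
frame_size = sample_rate * gap_seconds
print(frame_size)        # 220500 samples per OFR frame (inferred)
```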

That's a risky assumption, but that is the little I could do. Obviously, I can't be sure at all about such a conclusion, so, when somebody knows better that would be welcome.

I'm glad to see we're all pretty close to each other. And I especially have done a rather bad job explaining the ingredients from the sausage factory. I'll try to do better:

a) the skew and snr options

These options, I think, have the worst theoretical justification. But the only thing they can do is decrease the number of bits removed, i.e. increase the sample accuracy, and thus potentially increase quality compared to not using them. And it was found that they do a very good job of differentiating between 'good' spots, where many bits can be ignored, and 'bad' spots, where we have to keep nearly all the bits.

As far as I was involved in this, I did not find good skew/snr values by listening tests. Instead I have a set of regular music where many bits are expected to be removable on average, and a set of problem samples where it is known that only a few bits can be safely removed. I looked at the resulting bitrates of these sample classes when deciding on skew and snr. I did only a few listening tests for finding the skew/snr values, due to the exclusively defensive nature of these parameters.

A certain danger creeps in with our decision to use a positive -nts value for -2 and -3. We do this because we have an excellent good/bad spot indicator in skew/snr, and because the skew value is something like nts applied to the low-to-medium frequency range, so we can safely lower the nts demand with respect to that range. However, this adds a certain risk for the higher frequencies. We do not do this with -1, which is the option best suited to perfectionists.

An -nts value of 2 for quality level -2 is so close to 0 that I think the practical advantages of skewing with respect to good/bad spot differentiation outweigh the small danger introduced. Sure, we can discuss forever whether the default -nts value should be +2 or +2.5 or +1.5 or maybe 0; in practice it's not very important. Moreover, -nts is our main option apart from the quality parameter, and everybody can easily set it to 0 with -2 or -3. In the end, the -nts values for -2 and -1 match very well, IMO, what we have in mind for these quality levels.

BTW, at least I don't have this very strong demand for 'secure' transparency with -2 and -3. I do with -1, but with -2 (and more so with -3) I accept a very slight risk that the result is not transparent on rare occasions, in cases where I can expect only a negligible problem. So in the end it's the typical lossy approach with -2 and -3, but with extremely high demands for -2, and very high demands for -3.

b) spreading

I'm glad you have a positive attitude towards spreading. When allowing for spreading, I think David Bryant's idea of taking the width of the critical bands into account is a good starting point for deciding on the spreading details. As far as I was involved in the spreading details, my target was to have several FFT bins in every critical band. With this in mind, what at first glance looks a bit dangerous about our -spf values, the rather long spreading length of the highest frequency zone with the 1024 sample FFT, is in fact only a small danger. The problems come rather from the other end, where frequency resolution is pretty low. But as our spreading length is short there with the long FFTs, I think this is adequate. Moreover, we do several FFTs, and especially with -1 this should give a very secure result. Last but not least, we have skewing to bring a big additional safety margin to the low frequencies.

As far as I was involved with the critical bands, my primary consideration was the number of FFT bins in the critical bands, and I backed this up again by checking with my regular and problematic sample sets, looking at the resulting bitrates. Bitrate should be high for the difficult tracks, and rather low for the regular tracks. The final result was that we got a significantly improved security margin for the difficult tracks (compared to what we had before), and a bitrate decrease for the regular tracks. I also did listening tests, but to a lesser degree.

Of course we can endlessly discuss the details of spreading, as well as the other details of how to do the FFT analysis and simplify the result. For instance, I personally would prefer a different FFT coverage of the blocks, and I would prefer a 512 sample FFT instead of the 256 sample FFT with -2, in favour of giving additional security to the low end.
But after all it's not vital to me (and beyond myself it's an open question whether it's useful at all), and IMO our current settings give adequate consideration to the various aspects.
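To illustrate the kind of spreading being discussed, a minimal sketch: each FFT magnitude is averaged with its upper neighbours over a zone-dependent spreading length. The function name, and the choice of a plain average over upper neighbours, are assumptions for illustration; the real lossyWAV spreading function and its per-zone -spf lengths differ in detail:

```python
def spread_bins(magnitudes, lengths):
    """Average each FFT bin with its (length - 1) upper neighbours.

    `magnitudes` are FFT bin magnitudes; `lengths[i]` is the spreading
    length for bin i (short at low frequencies where bins are scarce per
    critical band, longer at high frequencies). Illustrative only."""
    out = []
    for i, length in enumerate(lengths):
        window = magnitudes[i:i + length]
        out.append(sum(window) / len(window))
    return out

# A spreading length > 1 smooths isolated dips between neighbouring bins:
print(spread_bins([8.0, 2.0, 6.0, 4.0], [1, 2, 2, 1]))
```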

So I think your concerns, which originate from the theoretical basis (ensuring quality a priori, without listening tests), are covered well by using -1. This is your quality level, as what we have in mind with -2 and -3 isn't in full congruence with your targets. Sure, any practical suggestion for improving things is welcome.

Sorry, it was me who brought in some confusion by wanting -1 to go the extremely pure way. I've thought it over at night (see my last post) and come to the conclusion that with our current -1 we are going the pure way. Stuff from the sausage factory like skewing doesn't hurt quality a bit; the contrary is true. We do have to make some practical decisions about the way we do the FFT analyses, but here too I think this is in agreement with the pure way, though the details are always disputable.

So I think we can leave -1 as is. Sure suggestions for improvements are always welcome.

-3 is typically used with DAPs, as you said, and -2 is a compromise between -3 and -1, kind of a -1 for the more practically minded.

BTW, your spreading Excel sheet was of high value to me when deciding on the spreading details, as far as it was me who worked out those details.

A suggestion: it looks like it will be hard to disqualify -3 quality-wise (which is a good thing, of course). Maybe for testing we can do it the other way around: start with an even less demanding quality setting, such that we do get into trouble, and increase the quality demands until quality is fine for the problems found. This way we can get a feeling for how big the security margin of -3 is. It is expected to be small, but who knows? Essentially this means we should be able to set -nts to a value higher than +6.

I have no idea what a negative -snr value does. I had thought that bringing in snr gives the relevant minimum the chance to go lower than when not using snr. On this understanding, any snr value can only make things more defensive compared to not using snr. Sure, as we use an snr value of 21, we will get a lower bitrate when turning the -snr value down. However, I wonder what makes your problem sample set go so low in bitrate. I guess a negative snr value has a specific meaning.

Anyway, I'd prefer to use a higher -nts value of up to, say, 40 instead. It would give us the chance to keep the usual skew/snr combination and go to extremes with the noise threshold, to learn about lossyWAV's behavior.

I am beginning to feel that -snr is a bit of packing in the sausage. When I tried -3 -snr -215 (modified average = average - snr_value, i.e. average +215 in this case, effectively removing it from consideration) I got palatable results.

-below  set process priority to below normal.
-low    set process priority to low.

Special thanks:

David Robinson for the method itself and the motivation to implement it in Delphi.
Dr. Jean Debord for the use of the TPMAT036 uFFT & uTypes units for FFT analysis.
Halb27 @ www.hydrogenaudio.org for donation and maintenance of the wavIO unit.

Fixed a pretty massive FUBAR on my part: the variable name for passing in the quality preset wasn't right, so it was always defaulting to -2. That's been fixed. That's what I get for initially working on it for 9 hours straight without breaks.

I just tried insane -nts settings on my problem set to get a feeling about the security margin we have when using -3:

a) -3 -nts 30 => 319/390 kbps for my regular/problem sample set

I was astonished at the quality of Atem-lied, which I tried first. badvilbel was next and also had remarkable quality. bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A, however, have big errors (no ABXing required), and the errors in furious and triangle are also easy to perceive, though quality isn't really bad. The big errors in bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are pretty much of the kind I know from WavPack lossy. Everybody who'd like to hear the potential problems lossyWAV has when the accuracy demand is too low is invited to do a listening test with this setting. The problems with the bad samples mentioned are easy to hear.

b) -3 -nts 20 => 320/405 kbps for my regular/problem sample set

The results were a lot better. Only bibilolo, keys_1644ds, and S37_OTHERS_MartenotWaves_A are not transparent, with bibilolo and S37_OTHERS_MartenotWaves_A already being roughly acceptable. Only keys_1644ds still falls seriously short on quality, though it too has improved remarkably.

c) -3 -nts 16 => 321/419 kbps for my regular/problem sample set

Only keys_1644ds and S37_OTHERS_MartenotWaves_A are not transparent to me. S37_OTHERS_MartenotWaves_A is already very hard for me to ABX, and even keys_1644ds isn't easy.

d) -3 -nts 12 => 326/438 kbps for my regular/problem sample set

Only keys is not totally transparent to me, and I was able to ABX keys only with a pretty weak 7/10 result.

That's a lot of listening! It's reassuring that the previously determined -3 settings have been confirmed by your test.

I went down a slightly different path with -snr <large negative number> to effectively remove it from the calculation of the minimum value for each FFT result. I think that some of your large -nts values would sound *very* different without the -snr safety net. That's not to say that -snr is necessarily bad, but I think it bloats the bitrate a bit.

This gives me an opportunity to thank you all though for the work that you have put in. I think this is an extremely exciting development.

I second this!

Thank you very much!

If lossyWAV gets enough users, I will evaluate whether some modifications to TAK can significantly improve the compression of its output. In this context, "significantly" means by at least about 20 kbps. I have some ideas, but you can't be sure until you've tried it.

Thank you again!

Thomas

*Another* 20 kbps saving! On top of everything else, that would probably push the average output of -3 down to circa 320 kbps using TAK...

Congratulations on the piping, by the way. I may have to beseech aid in implementing it in lossyWAV, though how you pipe in and out of lossyWAV and then ensure that the output pipe goes to the lossless encoder, I haven't the faintest clue...

Just a side note again: when you're going to experiment further (in the code) with settings, it would be best to call those in-between versions alpha again. When you arrive at something you're confident about, you could release another beta. (I'm not saying something isn't right, but maybe another alpha round is needed?)

Well, all I did was change an input range to a particular parameter, I did not substantially change the code. I see what you mean though.

[edit] On reflection, no settings per se have been changed (other than the inclusion of the ability to revert to a close approximation of David's original script), only the ability to change settings has been augmented.

The more I listen to -3 -snr -215, the more I like it. I still think that there is a place for -snr; however, I feel that it needs better explanation. I'll work up a spreadsheet which will graphically demonstrate the effects of the -skew, -nts and -snr parameters on a suitably small fft_length.

The bottom line, though, is that there is only one process which actually modifies the audio data, namely the bits_to_remove procedure; no heuristics in that process at all. The number of bits_to_remove may depend on a heuristically generated minimum_value, but the added noise caused by the subsequent bit reduction has already been calculated, hence the link between minimum_value and bits_to_remove. [/edit]
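A hypothetical sketch of how a minimum_value might be assembled from one FFT's spread results, and why a large negative -snr takes the snr term out of play. The function and variable names, and the exact way the terms combine, are assumptions for illustration; only the "modified average = average - snr_value" relation is taken from the discussion above:

```python
def minimum_value(spread_results, snr_value, nts_value):
    """Pick the controlling minimum from one FFT's spread results.

    The lowest spread value competes with (average - snr_value); the smaller
    of the two wins. With snr_value = -215 the modified average becomes
    average + 215, so the plain minimum always wins and -snr is effectively
    removed from consideration. Hypothetical sketch, not lossyWAV's code."""
    lowest = min(spread_results)
    average = sum(spread_results) / len(spread_results)
    return min(lowest, average - snr_value) + nts_value

# With -snr -215 the snr term no longer constrains the result:
print(minimum_value([10.0, 20.0, 30.0], snr_value=-215, nts_value=0))  # 10.0
```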

Given the bitrate you quoted for your sample set, which consists largely of problem samples, it's hard to imagine that keys_1644ds, bibilolo, or MartenotWaves are fine. I will try it this weekend. Anyway, I'd like to know what a negative -snr value does.

Attached spreadsheet shows how -skew, -snr and -nts interact on a 64 sample FFT (random numbers used for FFT output, F9 to recalculate for another iteration).