I recently discovered that the difference between the source and the encoded audio can easily be obtained by inverting the source audio and mixing it with the encoded one. That gave me an idea for an encoding algorithm: just try to keep the signal difference at (or below) a level defined by the user. Thus, the audio quality is simply measured by the volume of the difference between the signals, and this difference is nothing but the distortion produced by the encoder.

The whole algorithm looks like this:

0. Take the maximum allowed volume of the signal difference from the user.
1. Make a copy of the source audio and invert it.
2. Split both the source and the inverted audio into frames of the same size.
3. Encode the first frame of the source audio, mix the result with the first frame of the inverted audio, and calculate the volume of the resulting difference.
4. If the volume of the difference is higher than allowed by the user, add some bitrate and repeat from item 3.
5. If the volume of the difference is not higher than allowed by the user, add the encoded frame to the final output.
6. Repeat items 3-5 with the second, third, etc. frames until the end of the source file.
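A minimal sketch of the loop described above, in Python. No real codec is called here: `quantize` is a hypothetical stand-in encoder that simply rounds samples to a given bit depth, and "adding bitrate" is modeled as adding one bit per sample. The names `encode_with_error_limit`, `frame_size` and `max_error_db` are made up for illustration, and "volume" is assumed to mean the peak level of the difference in dB relative to full scale.

```python
import math

def quantize(frame, bits):
    """Hypothetical stand-in encoder: round samples in [-1, 1] to a uniform grid."""
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in frame]

def peak_db(samples):
    """Peak level in dB relative to full scale (1.0); -inf for digital silence."""
    peak = max((abs(s) for s in samples), default=0.0)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def encode_with_error_limit(source, frame_size, max_error_db, min_bits=2, max_bits=24):
    inverted = [-s for s in source]                       # inverted copy of the source
    output, bits_per_frame = [], []
    for start in range(0, len(source), frame_size):       # walk through the frames
        src = source[start:start + frame_size]
        inv = inverted[start:start + frame_size]
        bits = min_bits
        while True:
            encoded = quantize(src, bits)                 # item 3: encode the frame
            diff = [e + i for e, i in zip(encoded, inv)]  # mix with the inverted frame
            if peak_db(diff) <= max_error_db or bits >= max_bits:
                break                                     # item 5: within the limit
            bits += 1                                     # item 4: add bitrate, retry
        output.extend(encoded)
        bits_per_frame.append(bits)
    return output, bits_per_frame

# One second of audio: a loud 440 Hz tone followed by the same tone 40 dB quieter.
RATE = 44100
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE // 2)]
source = [0.8 * s for s in tone] + [0.008 * s for s in tone]
encoded, bits_per_frame = encode_with_error_limit(source, 4410, -45.0)
print(bits_per_frame)
```

With a real encoder, `quantize` would be replaced by an encode/decode round trip at a given bitrate setting; the retry loop itself stays the same.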

Of course, this algorithm is much slower than a direct encode, but it definitely should not be slower than video encoding (and people are ready to wait many hours while their videos are being encoded).

I tried to reproduce this algorithm manually in a test using WavPack hybrid mode as the encoder (the source audio sample was split into 11 parts of 1 second each), and it showed that 23.4% of the space/bitrate could be saved. Another important thing is that the user is guaranteed not to get distortions at a volume level higher than he expects, so he can safely encode many files at once without looking at the content. The user is freed both from unnecessary waste of bitrate and from uncontrolled distortions.

The only thing needed is for some audio developers to get interested in this idea and implement it as a computer program.

That gave me an idea for an encoding algorithm: just try to keep the signal difference at (or below) a level defined by the user. Thus, the audio quality is simply measured by the volume of the difference between the signals, and this difference is nothing but the distortion produced by the encoder.

The problem with this approach is that, because of masking, audibility has very little to do with absolute volume. An error at -20 dBFS might be inaudible if it's masked, and very audible at -40 dBFS if it's not masked. So what codecs usually do when they get to step 4 is compute masking thresholds and adjust the error based on how audible it will be. In this case you will likely find that huge error signals are often highly tolerable, while small error signals often are not.

QUOTE

Of course, this algorithm is much slower than a direct encode

It doesn't have to be. You can make this extremely fast by changing the quantization decisions made in your encoder to reflect the absolute error directly, thus no iterative encoding will be needed.

The problem with this approach is that, because of masking, audibility has very little to do with absolute volume. An error at -20 dBFS might be inaudible if it's masked, and very audible at -40 dBFS if it's not masked. So what codecs usually do when they get to step 4 is compute masking thresholds and adjust the error based on how audible it will be. In this case you will likely find that huge error signals are often highly tolerable, while small error signals often are not.

That is a question of further improvement, and it is already an implementation of a psychoacoustic model. I am more interested in the idea I've described as a substitute for lossless, which in my opinion is too excessive. I've made many tests of lossy encoders and realized that they cannot satisfy me in this role. WavPack hybrid is much better, but it is not flexible, and you never know which distortions you will get in the output.

I've made many tests of lossy encoders and realized that they cannot satisfy me in this role. WavPack hybrid is much better, but it is not flexible, and you never know which distortions you will get in the output.

What I meant is that looking at the absolute difference really doesn't tell you anything useful, so while you know what "distortion" is present (or rather the error signal power), you don't really have any idea what it actually sounds like or how good the quality is. The reason codecs do things differently is that the simple approach you're thinking of doesn't actually work.

If you want to demonstrate to yourself how "the sound of the difference" (subtraction) is NOT the same as the "difference in the sound", here are a couple of simple experiments -

Delay one sound by a few milliseconds. This will make no difference in the sound, but when you subtract you will get a huge difference file that's about as loud as either original file with a weird-sounding comb filter effect.

Invert the copy and subtract. Again, there is no difference in sound between the two files. But of course, when you subtract a negative it's the same as adding a positive, and you will get a difference file that's twice as loud as either original (and probably clipped).
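Both experiments can be tried without an audio editor. The Python sketch below is only an illustration under some assumptions: it uses seeded random noise as a stand-in for "one sound", a delay of 132 samples (about 3 ms at 44.1 kHz), and RMS level as the measure of loudness. The helper name `rms_db` is made up.

```python
import math
import random

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

RATE = 44100
rng = random.Random(0)
# Broadband noise as a stand-in signal (an assumption; any audio would do).
signal = [rng.uniform(-0.5, 0.5) for _ in range(RATE)]

# Experiment 1: delay the copy by about 3 ms, then subtract.
delay = 132
delayed = [0.0] * delay + signal[:-delay]
diff_delay = [a - b for a, b in zip(signal, delayed)]

# Experiment 2: invert the copy, then subtract.
inverted = [-s for s in signal]
diff_invert = [a - b for a, b in zip(signal, inverted)]

print("original      :", round(rms_db(signal), 1), "dB")
print("delayed diff  :", round(rms_db(diff_delay), 1), "dB")
print("inverted diff :", round(rms_db(diff_invert), 1), "dB")
```

In the inverted case every sample of the difference is exactly twice the original, so its level sits about 6 dB above the original. In the delayed case the difference is at least as loud as the original for this noise signal, even though the delayed copy would sound the same.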

just try to keep the signal difference at (or below) a level defined by the user. Thus, the audio quality is simply measured by the volume of the difference between the signals, and this difference is nothing but the distortion produced by the encoder.

The only thing needed is for some audio developers to get interested in this idea and implement it as a computer program.

If I understand you correctly, audio developers already implemented this idea four decades ago. The simplest case: take some high-word-length audio (e.g. a CD rip) and convert it to e.g. 8-bit PCM. Your difference signal will always be at the same level, depending on the target word length. Slightly more elaborate cases: A-law or μ-law. There your maximum allowed volume of the difference signal is also known.
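As a rough illustration of the μ-law case, here is a Python sketch using the continuous μ-law companding formula with μ = 255 and a plain 8-bit rounding step. This is an assumption made for brevity, not the bit-exact segmented encoding used by real telephony codecs, and the function names are made up.

```python
import math

MU = 255.0

def mulaw_compress(x):
    """Continuous mu-law curve: maps [-1, 1] to [-1, 1], expanding quiet values."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse of mulaw_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def mulaw_roundtrip(x, bits=8):
    """Compress, round to a uniform grid of the given bit depth, expand again."""
    step = 2.0 / (2 ** bits)
    y = round(mulaw_compress(x) / step) * step
    return mulaw_expand(max(-1.0, min(1.0, y)))

def max_error(lo, hi, points=1000):
    """Largest absolute round-trip error for inputs spread evenly over [lo, hi]."""
    xs = [lo + (hi - lo) * i / (points - 1) for i in range(points)]
    return max(abs(mulaw_roundtrip(x) - x) for x in xs)

loud_error = max_error(0.5, 1.0)      # samples near full scale
quiet_error = max_error(0.005, 0.01)  # samples about 40 dB lower
print(loud_error, quiet_error)
```

The error bound is known in advance in both ranges, but with companding it shrinks along with the signal instead of staying at one absolute level.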

The problem with this approach is that audibility has very little to do with the absolute volume due to masking.

But who can guarantee that this masking will work, and that the difference will be inaudible for all input signals? The whole idea is not about audibility; it is about using the minimum bitrate for maximum mathematical closeness of the output audio to the input audio. Just pure calculation, which seems to be the only guarantee here.

QUOTE (greynol @ Mar 6 2013, 05:22)

However (and IIRC), WavPack Lossy does not use a psychoacoustic model, so this might loosely apply.

At least, we should give it a try...

QUOTE (DVDdoug @ Mar 7 2013, 00:01)

Delay one sound by a few milliseconds. This will make no difference in the sound, but when you subtract you will get a huge difference file

If an encoder shifts the audio along the timeline, that means the idea of this topic is simply not applicable to it.

QUOTE (C.R.Helmrich @ Mar 7 2013, 00:21)

If I understand you correctly, audio developers already implemented this idea four decades ago.

Well, I do not see any software that takes the maximum allowed error level of the audio as an input parameter.

Well, I do not see any software that takes the maximum allowed error level of the audio as an input parameter.

That's because (as others have said!) to guarantee this you do not need anything clever at all. You just need to reduce the bit depth of the audio signal by an amount equivalent to the difference (= noise) you're willing to accept. You'll get 6 dB more noise per extra bit dropped. Lower bit depth = lower bitrate when losslessly encoded. So, use any audio editor that allows you to change the bit depth, then use almost any lossless codec on the result = job done.
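The "6 dB per bit" rule can be checked with a short Python sketch. It assumes plain rounding to a uniform grid without dither or noise shaping, uses seeded uniform random noise as a stand-in signal, and the helper names are made up for illustration.

```python
import math
import random

def reduce_bit_depth(samples, bits):
    """Round samples in [-1, 1] to a uniform grid with 2**bits levels (no dither)."""
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in samples]

def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

rng = random.Random(1)
signal = [rng.uniform(-1.0, 1.0) for _ in range(50000)]

noise_db = {}
for bits in (16, 14, 12, 10, 8):
    reduced = reduce_bit_depth(signal, bits)
    # The difference between the reduced and the original signal is the added noise.
    noise_db[bits] = rms_db([r - s for r, s in zip(reduced, signal)])
    print(bits, "bits -> error at", round(noise_db[bits], 1), "dB")
```

Each bit removed doubles the step size, so the error level rises by 20·log10(2), about 6.02 dB, per bit.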

For a smarter way of doing it, take a look at lossyWAV. I think you can bound how many bits it removes.

LossyWAV is commonly discussed here and I lamented not including it shortly after posting.

QUOTE (softrunner @ Mar 7 2013, 07:59)

But who can guarantee that this masking will work, and that the difference will be inaudible for all input signals? The whole idea is not about audibility; it is about using the minimum bitrate for maximum mathematical closeness of the output audio to the input audio. Just pure calculation, which seems to be the only guarantee here.

Who can guarantee that this "maximum mathematical closeness" will work, especially when it makes no attempt to consider how the human auditory system functions? Also, please don't insult our intelligence by suggesting that we must try all possible input signals before rejecting the assertion that this idea will do better than already established practice built upon well established knowledge when you have not even offered any evidence supporting your concept.

If this isn't about audibility then I completely fail to see the point. Audio quality is one of the primary determinants in gauging the performance of a lossy encoder. Other worthwhile determinants will focus on performance/ease of coding/decoding related issues. Perhaps someone can make a case as to why "maximum mathematical closeness" affects either of these groups or if it may fall into a new and equally important group.

Well, I do not see any software that takes the maximum allowed error level of the audio as an input parameter.

There are a lot of people hating on this concept. We do, actually, see this used in practice, it's just simpler than you'd think. You can change the maximum allowed error level of audio simply by altering the number of bits allocated per sample in an uncompressed context. With an appropriate codec, you can use fractional numbers of bits-per-sample. Then you can compress it down losslessly for a further reduction in file size.

LossyWAV is commonly discussed here and I lamented not including it shortly after posting.

I thought about posting that, but his test samples posted above are 160-256 kbps, so I think he's interested in highly compressed audio, whereas lossyWAV is going to be about 2x that bitrate for good results.

Please show me a lossy algorithm with no psychoacoustic model that beats one with a psychoacoustic model where the metric is how low you can go in average bitrate and achieve transparency or near transparency for non-contrived test samples.

As I see it, the issue at hand is that the level of a difference signal is being ranked over audibility.

EDIT: Saratoga beat me in demonstrating that bitrate is an important factor. To add, WavPack Lossy isn't exactly regarded as being competitive in the sub-256kbit range either.

Please show me a lossy algorithm with no psychoacoustic model that beats one with a psychoacoustic model where the metric is how low you can go in average bitrate and achieve transparency or near transparency for non-contrived test samples.

Oh, I agree that it'll never be competitive. I'm just saying that if you look at it the right way, we kind of already have such a thing.

But who can guarantee that this masking will work, and that the difference will be inaudible for all input signals? The whole idea is not about audibility; it is about using the minimum bitrate for maximum mathematical closeness of the output audio to the input audio. Just pure calculation, which seems to be the only guarantee here.

Your theory is mathematically wrong: the human auditory system is a nonlinear system, so the subtraction of two input signals (lossless - computed error) doesn't work the way you expect it to. We use psychoacoustic models exactly because we haven't an exact mathematical description of the auditory system, otherwise lossy compression would be deterministic and "just pure calculation" (well, more or less... anyway still more complex than sums and subtractions).

Indeed. Softrunner, if you want mathematical closeness (whatever that means), you should at least consider the following clarification of your objective: try to keep the signals difference at the same level (or less) relative to the instantaneous level of the input signal. In other words, you should try to keep the instantaneous signal-to-noise ratio (SNR) at the same level (or higher).

That's already quite close to what modern audio encoders do, by the way. And it makes perfect sense: if you didn't consider the input level, a quiet signal would sound worse after your coding than a loud but otherwise identical signal.
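A small Python sketch of the loud-versus-quiet point, under stated assumptions: a fixed 8-bit rounding grid stands in for "coding with a fixed absolute error", the test signal is a 440 Hz tone followed by the same tone 40 dB quieter, and `frame_snr_db` is a made-up helper, not part of any codec.

```python
import math

def mean_power(samples):
    return sum(s * s for s in samples) / len(samples)

def frame_snr_db(source, decoded, frame_size):
    """Signal-to-noise ratio in dB for each frame of the source."""
    result = []
    for start in range(0, len(source), frame_size):
        src = source[start:start + frame_size]
        dec = decoded[start:start + frame_size]
        noise = mean_power([d - s for d, s in zip(dec, src)])
        signal = mean_power(src)
        if noise == 0:
            result.append(float("inf"))    # frame reproduced exactly
        elif signal == 0:
            result.append(float("-inf"))   # noise added to silence
        else:
            result.append(10 * math.log10(signal / noise))
    return result

RATE = 44100
FRAME = 4410
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(FRAME)]
source = [0.8 * s for s in tone] + [0.008 * s for s in tone]  # loud, then 40 dB quieter

step = 2.0 / 2 ** 8   # the same fixed 8-bit grid for both frames
decoded = [round(s / step) * step for s in source]

loud_snr, quiet_snr = frame_snr_db(source, decoded, FRAME)
print(round(loud_snr, 1), round(quiet_snr, 1))
```

The absolute error is bounded by half a step in both frames, yet the quiet frame ends up with a far lower SNR, which is the case a fixed absolute limit does not distinguish.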

You just need to reduce the bit depth of the audio signal by an amount equivalent to the difference (= noise) you're willing to accept. You'll get 6 dB more noise per extra bit dropped. Lower bit depth = lower bitrate when losslessly encoded. So, use any audio editor that allows you to change the bit depth, then use almost any lossless codec on the result = job done.

And I have to do all this manually for all the lossless files I have? Actually, that's not what I want. And I think this will not be efficient enough.

QUOTE

For a smarter way of doing it, take a look at lossyWAV.

I know about lossyWAV. First, it does not accept the maximum allowed volume of the error signal as an input parameter; the volume of the distortions depends on the material being processed. Also, check this sample. On it lossyWAV gives distortions that are audible even at the "extreme" preset (235 kbps; in FLAC this sample uses 301 kbps), so we have to admit that its bit reduction and masking techniques do not work properly on all kinds of audio material. For this sample WavPack gives a perfect result even at 96 kbps, and its error signal is extremely quiet, so the only thing needed is to guide WavPack, to teach it where to use more bitrate and where to reduce it.

QUOTE (greynol @ Mar 7 2013, 21:05)

Who can guarantee that this "maximum mathematical closeness" will work

It will definitely work. If you do not allow distortions of a certain volume, they will never appear. Very simple logic, which simply works. And I do not claim that it is the final destination. I accept that it is possible to let the encoder be more aggressive in certain circumstances, but that is a question of separate research for each encoder. First, the very simple approach should be implemented.

QUOTE

Also, please don't insult our intelligence by suggesting that we must try all possible input signals before rejecting the assertion that this idea will do better than already established practice built upon well established knowledge when you have not even offered any evidence supporting your concept.

I do not know exactly how it will work, but I want to try it, because the already established practice does not work well enough. All we do is just play blindly with bitrates, believing that we have some quality there. And when we find one more killer sample, we realize that it was just a belief.

QUOTE

If this isn't about audibility then I completely fail to see the point.

The point is to reduce file size without any audible loss of quality, on all possible inputs, with a 100% guarantee. That means there will be no more killer samples at all. Every user will use his own level of allowed distortion, depending on the sensitivity of his ears, and he will know exactly what he gets.

QUOTE (Canar @ Mar 7 2013, 23:20)

You can change the maximum allowed error level of audio simply by altering the number of bits allocated per sample in an uncompressed context. With an appropriate codec, you can use fractional numbers of bits-per-sample. Then you can compress it down losslessly for a further reduction in file size.

All this is far from real practice. I'm not against the existing methods; on the contrary, I am for using them, but while looking at the result they give.

QUOTE (saratoga @ Mar 7 2013, 23:25)

so I think he's interested in highly compressed audio, whereas lossyWAV is going to be about 2x that bitrate for good results.

No, the encoder can use as much bitrate as it needs for the max. allowed signal difference. As a substitute for lossless I would accept a difference of approximately -45 dB or lower, if it were efficient enough.

QUOTE (Nessuno @ Mar 7 2013, 23:54)

We use psychoacoustic models exactly because we haven't an exact mathematical description of the auditory system, otherwise lossy compression would be deterministic and "just pure calculation" (well, more or less... anyway still more complex than sums and subtractions).

Once more, these psychoacoustic models do not guarantee you anything. They give only approximate results and sometimes fail.

QUOTE (C.R.Helmrich @ Mar 8 2013, 01:51)

And it makes perfect sense: if you didn't consider the input level, a quiet signal would sound worse after your coding than a loud but otherwise identical signal.

First, if I understand you correctly: turn the volume control to maximum and you will hear the noise... but nobody listens to music at such a volume. Also, I made a test: I encoded one sample to WavPack at 192 kbps (the lowest possible), and the track peak of the difference file was 0.077026. Then I decreased the volume by 40 dB and encoded again at 192 kbps. Do you think the track peak of the difference file was about the same 0.077026? No, it was 0.000854. Encoders know about such tricks, so we are safe here.
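The two peak values quoted above can be converted to a level change with a few lines of Python. Only the two numbers come from the test described; the variable names are made up for illustration.

```python
import math

peak_before = 0.077026   # difference-file peak at the original volume
peak_after = 0.000854    # difference-file peak after lowering the volume by 40 dB

# Change of the error peak, expressed in dB.
change_db = 20 * math.log10(peak_after / peak_before)

# Peak the error would have if it had dropped by exactly 40 dB.
scaled_peak = peak_before * 10 ** (-40 / 20)

print(round(change_db, 1), "dB")
print(round(scaled_peak, 6))
```

The error peak fell by roughly 39 dB when the input was lowered by 40 dB, so in this test the error followed the signal level rather than staying at a fixed absolute level.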

so I think he's interested in highly compressed audio, whereas lossyWAV is going to be about 2x that bitrate for good results.

No, the encoder can use as much bitrate as it needs for the max. allowed signal difference. As a substitute for lossless I would accept a difference of approximately -45 dB or lower, if it were efficient enough.

Like 2Bdecided suggested, -45 dB error corresponds to 8 bit PCM. You can do this just by peak normalizing your tracks, converting the files to 8 bit and then compressing with flac. No need for anything new.

The point is to reduce file size without any audible loss of quality, on all possible inputs, with a 100% guarantee. That means there will be no more killer samples at all. Every user will use his own level of allowed distortion, depending on the sensitivity of his ears, and he will know exactly what he gets.

Quite funny how you can say this so casually. Anyway, good luck with that.

softrunner, you evidently lack the theoretical basis to cope with many aspects of this argument, but you don't want to listen to what people here with more knowledge and experience in this field are trying to explain to you, because you are in love with your "simple and revolutionary new theory".

It's quite a common pattern, so best wishes for your idea and its implementation, but I think this discussion here has become a non sequitur.

In support of Nessuno's conclusions, as well as the juicy number quoted by greynol, we have this:

QUOTE (softrunner @ Mar 9 2013, 02:09)

I do not know exactly how it will work, but I want to try it, because the already established practice does not work well enough. All we do is just play blindly with bitrates, believing that we have some quality there. And when we find one more killer sample, we realize that it was just a belief.

Yeah. OK.

I don't feel like trying to respond methodically to your, erm, points. What I will say is that (1) a one-size-fits-all approach is not going to work, regardless of how nice and easy it might sound and how much you like it for that reason*, and (2) a uniform level of noise throughout one stream does not necessarily mean a uniform level of non-audibility of the same noise.

Again, if you're wondering why this hasn't been done despite apparently being so simple, you need to consider the very real possibility that it hasn't been done because it's too simple.

* And this sentiment takes us back to your previous ideas about VBR encoding, wherein you were also effectively demanding that people create an encoder that can guarantee transparency to everyone at a single setting. That wasn't viable, either.

On a very basic level, lossy encoders have a mechanism for

1) introducing distortion
2) evaluating the audibility of said distortion (the psycho-acoustic model)
3) storing the distorted data

These three need to work hand-in-hand to make an efficient encoder.

If 2) deviates from the actual way we humans hear, 1) will in some cases add too much distortion, resulting in audible artifacts, and in other cases not add enough distortion, so that 3) will waste many bits.

The "maximum allowed volume of error signal" is a crude representation of human hearing and will thus feature the aforementioned problems, even if it is just used to augment Wavpack's psycho-acoustic model.

* And this sentiment takes us back to your previous ideas about VBR encoding, wherein you were also effectively demanding that people create an encoder that can guarantee transparency to everyone at a single setting. That wasn't viable, either.

Yep, I remember that same line of nonsense.

I think he should focus his efforts in enlisting our help to solve problems concerning energy and the environment. Surely a Nobel prize is just in reach.