How can you say this when SebG and Woodinville both gave you examples to the contrary?

I hit the exact problem Woodinville describes using the method I posted on the first page of this thread - a listener gets stuck in a "false" minimum of audibility, because doubling the difference gives you the original signal back (with the part "removed" by the codec inverted - a difference that is not usually audible). Hardly monotonic - the chance of hearing the artefact becomes zero at a single gain setting (+6dB), and (with the specific audio I used - YMMV!) leaps back to the "expected" function very quickly either side of that.
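A quick numerical sketch of that false minimum, assuming the amplification method is test = ref + k·(out − ref), so k = 2 corresponds to "+6dB" (the signals and frequencies here are invented for illustration):

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
kept = np.sin(2 * np.pi * 440 * t)               # part the codec keeps
removed = 0.01 * np.sin(2 * np.pi * 9000 * t)    # quiet part the codec discards
ref = kept + removed                             # original (reference) signal
out = kept                                       # coded signal

diff = out - ref                                 # equals -removed
test_6db = ref + 2 * diff                        # difference doubled: "+6dB"

# test_6db == kept - removed: identical magnitude spectrum to ref, because
# only the phase of the discarded component is flipped - usually inaudible.
same_spectrum = np.allclose(np.abs(np.fft.rfft(test_6db)),
                            np.abs(np.fft.rfft(ref)))
print(same_spectrum)  # True
```

So at exactly k = 2 the test item sounds like the reference again, and the audibility-vs-gain curve dips back to zero.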

In many papers devoted to "coding margin", special filtering is recommended to eliminate those "ghost" frequencies. We also use it.


How do you know what "it" is? Do you have to work specifically for every bit rate, every bandwidth, every sampling rate, every different encoder?


This is not useful.

By subtracting a portion of the reference signal from the output one, it's not hard to figure out which frequencies are "ghosted" and remove them with an FIR filter. So yes, we do this for every test sample with amplified artifacts. It helps to get smoother perception curves. Every item tested at SE has its own unique curve, plotted from the results of SE listening tests. Extrapolating that curve gives the resulting quality rating for each test item.
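As a hedged sketch of what such ghost-removal filtering might look like (the toy "codec" below just phase-flips one component; frequencies and filter parameters are invented, and SE's actual filtering is not specified in this thread):

```python
import numpy as np
from scipy.signal import firwin, lfilter

fs = 48000
t = np.arange(fs) / fs
# toy codec output whose only change is a phase-flipped ("ghost") 4 kHz component
ref = np.sin(2 * np.pi * 1000 * t) + 0.05 * np.sin(2 * np.pi * 4000 * t)
out = np.sin(2 * np.pi * 1000 * t) - 0.05 * np.sin(2 * np.pi * 4000 * t)

diff = out - ref                                           # dominated by the ghost
ghost_hz = np.argmax(np.abs(np.fft.rfft(diff))) * fs / len(diff)

# 801-tap FIR band-stop centred on the detected ghost frequency
taps = firwin(801, [ghost_hz - 200, ghost_hz + 200], fs=fs)
cleaned = lfilter(taps, 1.0, diff)

rms = lambda x: np.sqrt(np.mean(x ** 2))
# skip the filter's startup transient before measuring the attenuation
atten_db = 20 * np.log10(rms(cleaned[2000:]) / rms(diff[2000:]))
```

With the Hamming-windowed design the ghost component ends up attenuated by well over 40 dB, so it no longer dominates the amplified difference.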

I can see how this could work for a simple low pass filter, but not how it could work for SBR.

With SBR, there's nothing you can usefully present to a listener that's "just like the coded version, but with the faults a bit louder" or "just like the coded version, but with the faults a bit quieter".

It's like me singing the same song twice. You can't figure out how close the two different versions are by subtracting them or amplifying differences. Subjectively (if I was a very consistent singer) the two versions could sound basically identical, but mathematically every sample would be very different, and I can't see how what you propose could work. SBR isn't so different from this example!
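The singing analogy is easy to demonstrate numerically - two hypothetical "takes" that differ only by a 1 ms timing offset have identical magnitude spectra (and would sound the same), yet their sample-by-sample difference is louder than either signal:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
take1 = np.sin(2 * np.pi * 440 * t)              # first "take" of the note
take2 = np.sin(2 * np.pi * 440 * (t + 0.001))    # same note, sung 1 ms later

rms = lambda x: np.sqrt(np.mean(x ** 2))
diff_db = 20 * np.log10(rms(take2 - take1) / rms(take1))   # ~ +5.9 dB

same_spectrum = np.allclose(np.abs(np.fft.rfft(take1)),
                            np.abs(np.fft.rfft(take2)))
print(same_spectrum, round(diff_db, 1))
```

The "error" signal here is nearly 6 dB louder than the music itself, which is why subtracting or amplifying differences tells you nothing about how close the two takes sound.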


This leads to a very simple question: What does "sub-threshold differences in a listening test" mean?

Differences that can be proven to exist with technical means, but are undetectable with a standard listening test.

Let me try this analogy: someone has to leave the next day on a 6-month boat trip. He has to prepare canned food and can choose between two unlabeled lots that look identical. Someone told him that the lots have different "best before" dates: one expires in 1 month, the other in 10 months. He tastes a bit from each, but both taste absolutely identical. He knows that best before dates don't mean the food will be bad the day after, but his chances of surviving the trip are probably better when he picks the fresher one. (btw, the boat is too small to take both)

With SBR, there's nothing you can usefully present to a listener that's "just like the coded version, but with the faults a bit louder" or "just like the coded version, but with the faults a bit quieter".

Why not? If there is a difference from the main signal, then there is something to present. The main question is how well such differences represent the drawbacks that really matter to the HAS. Probably there are some psychoacoustic tricks which are poorly covered by the metric. Then the usual question is: to what extent will such cases affect the final rating? All metrics have their limits.

QUOTE (2Bdecided @ Dec 1 2010, 19:26)

It's like me singing the same song twice. You can't figure out how close the two different versions are by subtracting them or amplifying differences. Subjectively (if I was a very consistent singer) the two versions could sound basically identical, but mathematically every sample would be very different, and I can't see how what you propose could work. SBR isn't so different from this example!


The best before date is a simple function - an apples to apples comparison - you know that 6 months is better than 5 months. You also know that what you want to do (go out longer in the boat) relates to what you are measuring (how long the food will last).

Comparing codecs isn't like this at all. Comparing codecs is an apples to oranges comparison - you don't know that artefacts 6dB below threshold are better than artefacts 5dB below threshold - 1) because the characteristic of the artefacts could be different, and 2) you haven't said what "better" means. Better for what? Not for just listening (either is fine), so for what?

It's like me singing the same song twice. You can't figure out how close the two different versions are by subtracting them or amplifying differences. Subjectively (if I was a very consistent singer) the two versions could sound basically identical, but mathematically every sample would be very different, and I can't see how what you propose could work. SBR isn't so different from this example!

Again - why not?

There must be some disconnect here, because this doesn't make sense to me. Either I don't understand what you mean, or you don't understand what I mean.

If I sing the same thing twice, what do you do to these two files to present them on SoundExpert.com?


There is nothing to present on SE in this case. Or we would have to define the first recording as the reference and ask how different (bad) the second one is. Then why not amplify the difference to some extent?

Comparing codecs isn't like this at all. Comparing codecs is an apples to oranges comparison - you don't know that artefacts 6dB below threshold are better than artefacts 5dB below threshold - 1) because the characteristic of the artefacts could be different, and 2) you haven't said what "better" means. Better for what? Not for just listening (either is fine), so for what?

Do we agree that there are 3 types of quality levels, from better to worse:
1- artefacts are non-existent (-inf), like in lossless coding
2- artefacts are below the hearing threshold
3- artefacts are audible, by at least one listener for at least one (killer) sample

In my view the better codec is the one that will remain in category 2 in any situation (e.g. inserting an Orban in the monitoring chain).

Example: original master is 24/96. Two lossy copies are made, one 16/44.1 and one mp3 at 320 kbps. Both sound identical to the master. I would say the 16/44.1 is better than the mp3, but if you can give arguments for the contrary, I'm all ears.

QUOTE (2Bdecided @ Dec 2 2010, 12:32)

If I sing the same thing twice, what do you do to these two files to present them on SoundExpert.com?

SoundExpert won't work for this, nor will ABX since there's a huge risk for false positives. A lot depends on where you switch from A to B. Small tempo and pitch differences will remain unnoticed when heard in isolation, but as soon as you jump from one to the other they can become apparent. This is the daily job of an audio editor, to find the best spot to inaudibly switch from one take to another. (hint: it's not always easy and I'm glad to be paid per hour)

Comparing codecs isn't like this at all. Comparing codecs is an apples to oranges comparison - you don't know that artefacts 6dB below threshold are better than artefacts 5dB below threshold - 1) because the characteristic of the artefacts could be different, and 2) you haven't said what "better" means. Better for what? Not for just listening (either is fine), so for what?

The SE metric is trying to find a way to know this. And "better" means "as if it was judged by golden ears in a perfect listening environment". What for? I don't know, but all pro audio guys want huge quality margins for their equipment, and most listeners want FLAC while AAC@192 is transparent. Maybe they are just not clever enough.


Do we agree that there are 3 types of quality levels, from better to worse:
1- artefacts are non-existent (-inf), like in lossless coding
2- artefacts are below the hearing threshold
3- artefacts are audible, by at least one listener for at least one (killer) sample

You can certainly define 3 such categories. It also sounds like a thing that's theoretically true (whatever that means). I suspect your categories are completely useless though...

In practice, it's hard to find a codec in category 2 that gives a significant bitrate saving over those in category 1.

It's rather difficult to prove that the codec is in category 2 rather than 3. You've got to get everyone in the world to listen carefully to every possible audio signal.

QUOTE

In my view the better codec is the one that will remain in category 2 in any situation (e.g. inserting an Orban in the monitoring chain).

Ah, good, so now we have everyone in the world listening to every possible audio signal via every possible piece of audio processing. Excellent.

Now, seriously, even if we put the "every person" and "every audio signal" parts to one side, you must realise that for any codec which changes the signal (let's assume the change is inaudible), there must be some audio processing we can do to make that change audible. So no codec can remain in category 2 "in any situation".

QUOTE

Example: original master is 24/96. Two lossy copies are made, one 16/44.1 and one mp3 at 320 kbps. Both sound identical to the master. I would say the 16/44.1 is better than the mp3, but if you can give arguments for the contrary, I'm all ears.

If the mp3 is made from a 16/44.1 file (as is normal) then this is silly - of course it can't be better, since it's a copy of a copy.

However, if the mp3 is made from the 24/96 by resampling to 44.1 but maintaining 24 bits, then it's kind of trivial to find a situation where the mp3 is "better":
- The original master contains a signal at -110dB.
- The mp3 is decoded to 24 bits.
- The "processing" applied to the 16/44.1 wav and the decoded 320kbps mp3 is... increasing the level by 80dB.

Oh look - both sounded identical to the master before processing, but with my highly advanced processing in place (well, OK, it was a volume control!) the mp3 is revealed to be far closer to the master than the 16/44.1 version.
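A sketch of this thought experiment (undithered quantisation, invented 1 kHz tone; real 16-bit mastering would normally add dither, which changes the picture):

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs
master = 10 ** (-110 / 20) * np.sin(2 * np.pi * 1000 * t)   # tone at -110 dBFS

q16 = np.round(master * 2 ** 15) / 2 ** 15   # undithered 16-bit: 1 LSB ~ 3.1e-5
q24 = np.round(master * 2 ** 23) / 2 ** 23   # undithered 24-bit: 1 LSB ~ 1.2e-7

gain = 10 ** (80 / 20)                       # the "+80 dB volume control"
print(np.max(np.abs(q16 * gain)))            # 0.0 - the tone rounded away entirely
print(np.corrcoef(q24, master)[0, 1] > 0.99) # True - 24 bits kept it
```

The -110 dBFS tone is about a tenth of one 16-bit step, so without dither it rounds to digital silence, while the 24-bit copy (and hence a 24-bit-decoded mp3) keeps it intact through the 80 dB boost.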

These are all silly examples, but I think they prove the point - there's far too much assumption in the SoundExpert methods, or the "this thing sounds the same but must be better" statements.


SoundExpert won't work for this, nor will ABX since there's a huge risk for false positives. A lot depends on where you switch from A to B. Small tempo and pitch differences will remain unnoticed when heard in isolation, but as soon as you jump from one to the other they can become apparent. This is the daily job of an audio editor, to find the best spot to inaudibly switch from one take to another. (hint: it's not always easy and I'm glad to be paid per hour)

But SBR is "singing along" with the music without tempo and pitch differences, yet re-creating it from scratch (the original waveform is discarded). ABX works fine. Amplifying the sample-by-sample differences is meaningless.

I don't see any explanation of why the SoundExpert approach works for SBR, or accurately quantifies the subjective quality of SBR wrt "traditional" coding.

It's funny - we've seen a second revolution in audio coding. The first was when basic psychoacoustics came in, and suddenly having a waveform that was "closest" to the original was no longer the way to judge quality. With two codecs, the one which had a greater error signal could sound better.

Now with SBR and PS we have another revolution, where the waveform isn't an (inaudibly) distorted version of the original, but actually bears no resemblance to the original. So any measurements that include psychoacoustics while assuming that the waveform should be at least vaguely similar are also broken.

I'm not convinced that the SoundExpert method actually survived the first revolution, but it's difficult to see how it survived the second.

ABX will survive whatever happens.

I'll eat my words if someone can provide a detailed explanation of how SoundExpert works, and prove a correlation - but if it relies on sticking plasters to undo or account for each new coding trick, it's no good generally.


Ah, good, so now we have everyone in the world listening to every possible audio signal via every possible piece of audio processing. Excellent.

Exactly, that's not very practical. And that's the very reason why so many audio professionals prefer to offer lossless formats and let the customer decide how to process it for his/her personal use. I remember numerous complaints from HA members about online music being only available in lossy formats. Deutsche Grammophon offers both FLAC and 320 kbps mp3, which makes a lot of sense IMO, even if they sound identical.

Now with SBR and PS we have another revolution, where the waveform isn't an (inaudibly) distorted version of the original, but actually bears no resemblance to the original. So any measurements that include psychoacoustics while assuming that the waveform should be at least vaguely similar are also broken.

Below are Diff. Levels of 9 SE samples processed by HE and LC profiles of CT encoder (@128 kbit/s):

As you see, the waveforms of both profiles differ from the reference waveforms to approximately the same degree. So it is an illusion that with SBR the waveforms "bear no resemblance to the original". The illusion is inspired by the knowledge of how SBR works. In reality both waveforms are changed to the same level (at 192 kbit/s the HE versions are even closer to the references than the LC ones, though the encoders and modes are different). The main question is how they are changed.
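The exact definition of SE's "Diff. Level" isn't given in the thread; assuming the common RMS formulation (difference energy relative to reference energy, in dB), it could be computed like this:

```python
import numpy as np

def diff_level_db(ref, out):
    """Assumed RMS formulation: 20*log10( rms(out - ref) / rms(ref) )."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    return 20 * np.log10(rms(out - ref) / rms(ref))

t = np.arange(48000) / 48000
ref = np.sin(2 * np.pi * 1000 * t)
print(round(diff_level_db(ref, 0.5 * ref), 1))   # -6.0: output at half amplitude
```

By such a waveform-level measure, an SBR decode and an LC decode can indeed score similarly even though the mechanisms behind their differences are entirely unlike each other.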


Differences that can be proven to exist with technical means, but are undetectable with a standard listening test.

Thanks to Quantum Mechanics, a good enough measurement will always find a difference if this is an analog signal. If it's a digital signal, well, you have a tiny leg to stand on - but still, let's take a 120-second log sweep from 20 Hz to 15 kHz. 40 dB under that, I put a 4 kHz tone.

Now, the difference is going to be a constant 4 kHz tone. The "noise" is stationary and exactly predictable. Its audibility is going to vary enormously over the time of the sweep.

How many different exemplars do we have to use to decide the audibility of this noise? The scale is continuous, so...?
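This construction can be sketched as follows (duration shortened from 120 s to keep the example light; the point is that the measurable difference is exactly the constant tone, while its audibility depends on where the sweep currently is):

```python
import numpy as np

fs, dur = 48000, 10.0
t = np.arange(int(fs * dur)) / fs
f0, f1 = 20.0, 15000.0

# log sweep with instantaneous frequency f0 * (f1/f0)**(t/dur)
k = np.log(f1 / f0)
sweep = np.sin(2 * np.pi * f0 * dur / k * (np.exp(k * t / dur) - 1))
tone = 10 ** (-40 / 20) * np.sin(2 * np.pi * 4000 * t)   # 4 kHz, 40 dB down

signal = sweep + tone
residual = signal - sweep    # the measurable "difference": exactly the tone,
                             # though masking by the sweep makes its audibility
                             # swing between inaudible and obvious over time
```

The residual has constant level throughout, so no level-based measure of it can track the wildly time-varying audibility of the tone.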