Arnie, you suggested improving this test by upsampling the downsampled material back to the sample rate the original had. I very much see the logic in your suggestion. However, couldn't the brick wall filtering introduce unwanted audible artifacts into the signal?

I think that the audibility of brick wall filtering in the downsampling is the actual object of the test.

QUOTE

I'm referring to the thread "Audibility of 20kHz brick wall filtering". So far (only three people have participated, including me) it seems that brick wall filtering may be audible. Further tests by several people are required, however.

If I understand that test properly, it has a serious limitation - the program material being used is impulses, not real world music. Even impulsive sounds in music fall far short of the extreme spectral content of a steady stream of impulses. Listening to impulses is about as much fun as listening to white noise.

QUOTE

And it is my impression that both downsampling & upsampling use brick wall filtering to avoid aliasing artifacts for downsampling and imaging products for upsampling. Is that assumption correct?

It is my understanding that downsampling uses brick wall filtering, but that upsampling either uses no brick wall filtering at all, or uses brick wall filtering at the Nyquist frequency of the higher sample rate. It is very hard to avoid brick wall filtering in digital, so that's rarely the goal. As I understand it, the major goal of higher sample rates is raising the frequency of any brick wall filters.

QUOTE

So, if I'm using brick wall filtering two times, wouldn't that be even more audible? Or am I getting this wrong?

I think that the corner frequency of the brick wall filters is highly significant. I don't think that anybody disagrees with the idea that, in general, the higher the better. The only questions I'm aware of are how high, and what phase response is required for sonic transparency.

It is my understanding that downsampling uses brick wall filtering, but that upsampling either uses no brick wall filtering at all, or uses brick wall filtering at the Nyquist frequency of the higher sample rate. It is very hard to avoid brick wall filtering in digital, so that's rarely the goal. As I understand it, the major goal of higher sample rates is raising the frequency of any brick wall filters.


I was fooling around with upsampling and found some stuff that varied from my previous understanding.

In CEP 2.0 the upsampling function has an option for using pre/post filtering or not. Furthermore, the program breaks resampling down into steps and gives progress messages.

With the pre/post filtering, upsampling is followed by filtering.

I tried upsampling 21 KHz @ 0 dB @ 44.1/32 to 96/32. Without pre/post filtering there was a spurious response at 23.090 KHz about 120 dB down. With pre/post filtering there was no spurious response of any kind, just a noise floor 160 dB down.

Neither the -160 dB noise floor nor the spurious response could ever imaginably be heard in a relevant listening test. However, the fact that it disappeared after post filtering is pretty good evidence that the post filtering is a brick wall filter at 22.05 kHz.

So upsampling 44.1 KHz sampled information could result in yet another brick wall filter being applied at 22.05 KHz. Any potentially audible artifacts would thus be increased.

Hello, I didn't follow all the discussion about the FireFace, but one thing is sure: if all you got was a -120 dB alias at 23.09 kHz, there was some kind of filtering involved. What CEP calls upsampling can involve any kind of algorithm that, mathematically, involves more or less filtering. Then the program proposes another filtering process that cleans the result.

Pure resampling from 44100 Hz to 96000 Hz without filtering of any kind would consist of inserting 319 null samples between every original sample, then picking one sample out of every 147 in the result.

It would give a pulse train that would feature a flood of harmonics.

Thus, don't draw any conclusion from the hypothesis that no filtering would only introduce a quiet 23 kHz alias. You just applied filtering without knowing it. Sound Forge 4.5 worked the same way: there was resampling, with 4 quality levels, then optional filtering. But in reality, the 4 quality levels were actually 4 filters, from weak to strong.
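For the curious, the pure-resampling recipe (zero-stuffing by 320, then keeping every 147th sample) is easy to verify numerically. This is a minimal sketch, not anything CEP actually does internally; the helper name is my own, and numpy is assumed:

```python
from math import gcd
import numpy as np

def rational_resample_no_filter(x, fs_in=44100, fs_out=96000):
    """Pure rational resampling with no anti-imaging filter:
    insert L-1 zeros between samples, then keep every M-th sample."""
    g = gcd(fs_in, fs_out)
    L, M = fs_out // g, fs_in // g   # 320 and 147 for 44.1k -> 96k
    up = np.zeros(len(x) * L)
    up[::L] = x                       # zero-stuffing: 319 nulls per sample
    return up[::M], L, M

# A 0.1 s, 1 kHz tone at 44.1k: resampled this way, the output is a sparse
# pulse train whose spectrum carries images at n*44100 +/- 1000 Hz.
t = np.arange(4410) / 44100
y, L, M = rational_resample_no_filter(np.sin(2 * np.pi * 1000 * t))
```

The factor 320/147 comes straight from gcd(44100, 96000) = 300, which is why 319 null samples and a 1-in-147 pick give exactly 96 kHz.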

Neither the -160 dB noise floor nor the spurious response could ever imaginably be heard in a relevant listening test. However, the fact that it disappeared after post filtering is pretty good evidence that the post filtering is a brick wall filter at 22.05 kHz.

Yes, from my understanding, it is. Though I don't see the point of doing it in two steps. Judging from the impulse responses with-vs-without-pre-post-filter at http://src.infinitewave.ca [and my own experiments] the post filter just increases the effective filter length = steepness. A steeper interpolation (anti-imaging) filter with [more stop-band attenuation and] its -6 dB frequency a little below the original Nyquist frequency would do the same job.

I tried up sampling 21 KHz @ 0 dB @ 44.1/32 to 96/32. Without pre/post filtering there was a spurious response at 23.090 KHz about 120 dB down.

...if all you got was a -120 dB alias at 23.09 kHz, there was some kind of filtering involved.

I see your point. The frequency is right for being an image: 22.05 - 21.0 = 1.05, and 22.05 + 1.05 = 23.10 - close enough to 23.09 given that the analysis tool was a 16k point FFT. If I use a larger FFT, I get almost exactly 23.10.
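For what it's worth, that image-frequency arithmetic checks out in a couple of lines (a sketch; the FFT-bin figure assumes a 16384-point FFT at the 96 kHz output rate):

```python
# Image-frequency arithmetic for a 21 kHz tone sampled at 44.1 kHz
fs_src = 44100.0
f_tone = 21000.0
image = fs_src - f_tone        # first image folds to 44100 - 21000 = 23100 Hz

# A 16k-point FFT at 96 kHz has bins of ~5.86 Hz, so a peak read off as
# 23.090 kHz is within a couple of bins of the exact 23.100 kHz image.
fft_bin = 96000 / 16384
```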

QUOTE

What CEP calls upsampling can involve any kind of algorithm that, mathematically, involves more or less filtering. Then the program proposes another filtering process that cleans the result.

So it would seem.

QUOTE

Pure resampling from 44100 Hz to 96000 Hz without filtering of any kind would consist of inserting 319 null samples between every original sample, then picking one sample out of every 147 in the result.

It would give a pulse train that would feature a flood of harmonics.

Thus, don't draw any conclusion from the hypothesis that no filtering would only introduce a quiet 23 kHz alias. You just applied filtering without knowing it. Sound Forge 4.5 worked the same way: there was resampling, with 4 quality levels, then optional filtering. But in reality, the 4 quality levels were actually 4 filters, from weak to strong.

Got it. There is at least one layer of brick wall filtering in the CEP 2.1 upsampling, and you have the option of adding one more. The brick wall filters are at the Nyquist frequency of the source data.

In the end, you get a very clean job of up sampling, even without the "post filtering".

- Regarding our choice of format comparison and technical chain, our purpose was to investigate perceptive differences between 88.2 vs. 44.1 in "real-life" use of the equipment, thus by taking into consideration what happens in music production and release.

Not surprising that you'd have marginally detectable differences when things on both the recording and playback sides were changed between trials. If I understand the results (as summarized by krabapple), the higher sample rate was not reliably identified as sounding better.

The strange thing is that there are ABX results with two-tailed statistics. I don't have the article, but it rather seems that these particular listeners mistook A and B with such consistency that they got a score like 2 right answers out of 12, rather than preferring the low resolution version. That's the only way I can interpret "two-tailed ABX results" with listeners "significantly selecting the wrong answer".

I see that you calculated the p-values for many different cases: basically, for the 16 listeners as individuals, then for all of them as a group, then for 3 of them, then for 13 of them, in each case for 3 formats times 5 samples, and you also included two-tailed results in addition to one-tailed results.

It gives a total of [(16 x 5) + (16 x 3) + (1 + 1 + 1) x 3 x 5] x 2 = 346 possible p-values, out of which you got 12 significant ones. However, out of 346 p-values, we should expect on average 346 / 20 = 17.3 of them to be significant by chance - that is, false positives!
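The multiple-comparisons arithmetic above can be reproduced directly (a sketch of the count exactly as given in the post):

```python
# Count of p-values examined: 16 listeners x 5 samples, 16 listeners x 3
# format comparisons, plus 3 groupings (all / 3 outliers / remaining 13)
# x 3 formats x 5 samples -- all doubled for one- and two-tailed tests.
n_p_values = ((16 * 5) + (16 * 3) + (1 + 1 + 1) * 3 * 5) * 2   # 346
expected_false_positives = n_p_values * 0.05                    # ~17.3

# Chance of at least one "significant" result if every null were true:
p_at_least_one = 1 - 0.95 ** n_p_values
```

With 346 tests at alpha = .05, finding 12 "significant" results is actually fewer than the 17.3 expected from chance alone, and the probability of at least one false positive is essentially 1.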

I spotted that too. I didn't do the maths to get the numbers, but it was quite apparent that the paper looked for so many possible results that, with a 5% probability of each one occurring at random, it would have been amazing if a positive result hadn't been found.

Or, to put it more simply, unless I've misunderstood the stats, there were a couple of people who seemed gifted in giving the wrong answer consistently, and everything else was basically random.

QUOTE

The trick is to use a suited analysis that takes into account all these variables at once, but is not prone to false positive picking.

I don't know what method should be used in this case, but I'm sure that some forum members, more knowledgeable than me in statistics, can help.

Good grief. It's hardly state-of-the-art! Thanks for pointing that out Arny.

QUOTE

On balance I don't see any flaws in the Pyramix that would necessarily invalidate results obtained by using it.

I don't know. Check out the passband. There seems to be a ~0.1dB error across most of it. Admittedly it's a linear frequency plot, so it's not "across most of it" as we hear things, but still - you'd want to level match down to 0.1dB wouldn't you? A signal with most energy in the 6-16kHz region (unlikely!) would be reduced by ~0.1dB by this device. That's not really good enough IMO to give robust ABX results. Especially when it's arguably not the actual thing under test - it could easily amplify an otherwise inaudible fault in the thing under test.

Upsampling and oversampling differ, at least in terms of purpose and implementation.

QUOTE

I also don't understand why a 192kHz DAC is supposedly cheaper to build. It is cheaper to build a good sounding 96kHz ADC than a 44.1kHz one, since the latter needs brickwall filtering.

Actually, they both need and get brickwall filtering. Some high sample rate DACs have a slow drop in response above 20 kHz, to maybe 6 dB down at Nyquist. Then they have the usual sharp cutoff. From a digital filter design viewpoint it is all pretty much the same. The gentle roll-off is window dressing. They still need a fairly complex digital filter to get the 90+ dB rejection above Nyquist. Some DACs are programmable to work either way. How moot does that make things?

Putting a gentle ramp a few feet high in front of the brick wall does not mean a significantly gentler stop when you hit the brick wall! ;-)

I don't agree.

1. It doesn't need to be brick wall if you have a potential 76 kHz transition band.
2. 6 dB down and then a brick wall is different from just a brick wall. Any ringing due to the brick wall will be 6 dB down!

I'm not claiming any of this is audible, but it's all real and measurable.

I'm referring to the thread "Audibility of 20kHz brick wall filtering". So far (only three people have participated, including me) it seems that brick wall filtering may be audible. Further tests by several people are required, however.

If I understand that test properly, it has a serious limitation - the program material being used is impulses, not real world music. Even impulsive sounds in music fall far short of the extreme spectral content of a steady stream of impulses. Listening to impulses is about as much fun as listening to white noise.

No, it's real music. Well, it's Limehouse street blues played in a New York jazz club - whether you count that as real music or not is up to you. There's just one impulse at the end of the file as a check. You don't have to listen to that part if you don't want to.

The fact that the study's authors have registered here but eventually did not really contribute much more than 'hello' might suggest that they cannot clear up the raised statistical concerns. If we assume that they do not deliver anything further for said reason, could the claimed significance be dismissed?

In dubio pro reo is generally a good principle. But when you look at the data by krabapple, I think it is fair to say that, even if one cannot be 100% sure, the doubt outweighs the claimed significance by a good margin. My gut tells me that we won't see the study's authors bringing more light to this. Time will tell.

I read the full paper and I think there may be some transitive errors here.

It looks like in some cases listeners can detect a difference between 88.2 and 44.1 native, but not between 44.1 down and 44.1 native, and not between 88.2 and 44.1 down for the same material. This implies that 44.1 native lacks something, but that 88.2 downsampled to 44.1 retains what was lost in 44.1 native. This does not appear to make sense, since both 44.1 native and 44.1 down have the same limits (give or take dithering concerns) - I am confused by this.

In dubio pro reo is generally a good principle. But when you look at the data by krabapple, I think it is fair to say that, even if one cannot be 100% sure, the doubt outweighs the claimed significance by a good margin. My gut tells me that we won't see the study's authors bringing more light to this. Time will tell.

If we had the raw data we could run our own analyses. One thing still puzzles me: the three who showed a significant ability to detect a difference were fantastically unable to correctly match A to X or B to X as required. When I run DBTs in foobar2000 I can do worse than chance, but I never do so badly that foobar2000 decides I was not guessing after all. I just did a faked 4/12 run and my probability of guessing was 93%; when I redid it and got 0/12, my guessing probability went up to 100%.
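Those two percentages match straightforward one-tailed binomial arithmetic (a quick sketch; the function name is my own):

```python
from math import comb

def p_guessing(k, n=12):
    """One-tailed probability of scoring k or more correct out of n ABX
    trials purely by guessing (p = 0.5 per trial)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 4/12 correct: a pure guesser does at least that well about 93% of the
# time. 0/12 correct: a guesser does "at least that well" with certainty,
# hence the reported 100%.
```

So the tool is only reporting how unimpressive the score is relative to guessing; it has no way to flag a suspiciously consistent wrong-answer streak unless a two-tailed test is applied.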

I read the full paper and I think there may be some transitive errors here.

It looks like in some cases listeners can detect a difference between 88.2 and 44.1 native, but not between 44.1 down and 44.1 native, and not between 88.2 and 44.1 down for the same material. This implies that 44.1 native lacks something, but that 88.2 downsampled to 44.1 retains what was lost in 44.1 native. This does not appear to make sense, since both 44.1 native and 44.1 down have the same limits (give or take dithering concerns) - I am confused by this.

When you have confusion like this, the problems are usually procedural. I don't know if it was the statistics or the technical details, and it could be both. It appears that projects involving so-called "hi rez" audio are popular senior year/graduate thesis projects. The problem has been studied for at least a decade without conclusive results that satisfied enough people to put an end to this sort of thing.

Right now, it seems clear that projects like this are a great way to show how sighted evaluations create strong perceptions that become subtle or non-existent when sufficient experimental rigor is added.

IME there is nothing different about hearing as compared to any other human performance issue. Do enough clean trials and do your statistics right and the results are asymptotic to the same result, over and over again. Historically, the asymptote in this realm is that high rez past the CD format either doesn't matter or it matters very little.

We need to be chasing the big, slow, meaty rabbits that hop about all over the place, like miking and speakers and rooms; not the skinny fast rabbits that only come out in vanishing numbers at night.

We need to be chasing the big, slow, meaty rabbits that hop about all over the place, like miking and speakers and rooms; not the skinny fast rabbits that only come out in vanishing numbers at night.

I like that analogy.

Problem is (for the industry) that it's very easy to create and sell you something to "solve" the "problems" caused by the skinny fast rabbits. I mean, how hard is it to handle twice as much data, twice as fast? Wait a couple of years, and most of the problem is solved by default. Whereas miking, speakers and rooms? That would need real research. It's audio FFS - who wants to throw money at that?

(with all due credit to those companies who are out there solving real problems.)

The strange thing is that there are ABX results with two-tailed statistics. I don't have the article, but it rather seems that these particular listeners mistook A and B with such consistency that they got a score like 2 right answers out of 12, rather than preferring the low resolution version. That's the only way I can interpret "two-tailed ABX results" with listeners "significantly selecting the wrong answer".

In the case of the 3 outliers, when tested with violin samples (44.1 native or 88.2 downsampled to 44.1), there were 12 trials (4 per person). Of those 12 trials there was a total of precisely 1 correct answer (identifying A = X or B = X), i.e. one person got 1 out of 4 correct and the other two got 0/4 correct. However, with only 4 trials per sample pair this is not a big stretch.

Separating the 3 outliers out does make a big difference to the overall results, it always increases the overall score, in at least one case pushing it above the threshold.

Interestingly, perhaps, in some cases the difference between 88.2 and 44.1 native can be detected but not 88.2 vs. 88.2 downsampled; in other cases the reverse is true.

They set the threshold at 63% for trials with an n of 52, which seems a bit low, but stats is not my forte. In toto, all samples, all modes, the correct hit rate seems to be between 53% and 55%.

This does not appear to make sense, since both 44.1 native and 44.1 down have the same limits (give or take dithering concerns) - I am confused by this.

If the statistics were significant, it would make sense: when downsampling 88.2 to 44.1 you can use perfect digital antialias filters, while when recording directly at 44.1 you have to use analog antialias filters, so that your signal is lowpassed before it reaches the ADC.

This does not appear to make sense, since both 44.1 native and 44.1 down have the same limits (give or take dithering concerns) - I am confused by this.

If the statistics were significant, it would make sense: when downsampling 88.2 to 44.1 you can use perfect digital antialias filters, while when recording directly at 44.1 you have to use analog antialias filters, so that your signal is lowpassed before it reaches the ADC.

Modern ADCs do have analog anti-aliasing filters, but they are relatively simple and operate at ultrasonic frequencies. The brick wall that is right up against the audio band is digital, and therefore the overall performance can be very similar to what you get if you record at a higher sample rate and downsample in the digital domain. Note that there can be considerable technical variation in the details of how the digital filtering is implemented, whether in the ADC or applied later on.

Thanks all for your interest in our paper. I received an invitation from hydrogenaudio to provide further details on our work, so I will do my best to answer a few questions I could extract from the discussion.

- We used Pyramix 6.0 for down-sampling, as this software is currently used by a lot of audio professionals who produce HD recordings.

- Regarding the statistics, the "p" we provided for the results refers to the probability that we got the result by chance. Traditionally for this kind of test (here an ABX), researchers consider that if p < .05 the result is not obtained by chance (as the probability is below 5%), thus participants could discriminate. If .05 < p < .1, it may be that the result was not obtained by chance, but it is not certain; that is what is called "a tendency". If the test were easy, we would not need statistics, as participants would have almost 100% correct answers. But this test was extremely challenging for the expert listeners, implying a lot of errors even if some of them could perceive some differences between formats in specific cases (musical excerpt, type of format comparison).

- There is no proof that upsampling doesn't introduce artifacts.

- Regarding our choice of format comparison and technical chain, our purpose was to investigate perceptive differences between 88.2 vs. 44.1 in "real-life" use of the equipment, thus by taking into consideration what happens in music production and release: in a few cases, music is produced and released in high-resolution (thus played back in high-resolution); in more cases, music is produced in high-resolution and then down-sampled to 44.1 for commercial release (thus played back in 44.1); in a lot of cases, music is produced and released in 44.1 (thus played back in 44.1). We used the Fireface DAC as it was the only one that allowed us to switch sample rate with a reasonable delay for the test (less than 1 sec.). I wish we could have used a better one. However, the Fireface is still pretty good compared to most playback systems people use in their homes.

I am a sound engineer myself and started working in research as a part-time job 3 years ago. I was glad to work on the high-resolution project, as I have heard a lot of discussions on the topic in studios and during my sound recording studies. My main question was whether it was worth working in high-res when the project was to be released in 44.1. This AES paper is the first publication for this study and provides a few answers, maybe not enough for most of us. There will be more coming up. And maybe other labs will work on that topic too, as there are A LOT of tests to be done.

Bottom line: although the topic is interesting, especially these days when Blu-ray Pure Audio is being defined, never forget that differences between formats, ADCs, DACs, ... remain extremely subtle compared to differences between miking techniques, room acoustics, and of course musicians and their instruments!

Best,
Amandine

Many thanks for your reply, is there any chance we could get access to the raw data ?

Modern ADCs do have analog anti-aliasing filters, but they are relatively simple and operate at ultrasonic frequencies. The brick wall that is right up against the audio band is digital, and therefore the overall performance can be very similar to what you get if you record at a higher sample rate and downsample in the digital domain. Note that there can be considerable technical variation in the details of how the digital filtering is implemented, whether in the ADC or applied later on.

The part about the final filtering being digital seems right, as far as I understand from reading. But based on my experiments, and those of a few others, the result of recording at 44.1 is never like that from recording at 88.2 or 96 and downsampling with good software, as I pointed out earlier in this thread and in at least two others on HA (based on results using test tones, the only way to actually observe the final product). Do you have evidence that some soundcards really do better?