Human hearing beats sound’s uncertainty limit, makes MP3s sound worse

Not Heisenberg's, Fourier's. Knowing how it's beaten may help us encode music.

Modern audio compression algorithms rely on observations about auditory perception. For instance, we know that a low-frequency tone can render a higher tone inaudible. This masking effect is exploited to save space by removing the tones we expect will be inaudible. But our expectations are complicated by the physics of waves and by our models of how human audio perception works.

This problem has been highlighted in a recent Physical Review Letters paper, in which researchers demonstrated that the vast majority of humans can perceive certain aspects of sound far more accurately than a simple reading of the laws of physics allows. Given that many encoding algorithms begin their compression with operations based on that simple physical understanding, the researchers believe it may be time to revisit audio compression.

Time and frequency: Two sides of the same coin

You'll notice I didn't say, "human hearing violates the laws of physics," even though it was very tempting. The truth is that nothing violates the laws of physics, though many things violate the simplified models we use to approximate them.

Take a tone, played continuously forever. The frequency of the tone is very well defined, but it has no start or end point, so the time at which the note was played is entirely uncertain. Conversely, when we beat a drum, the sound has a very sharp temporal definition, but the tone is actually a broad spectrum of individual frequencies added together. These two properties, the timing of a tone and its frequency, are linked: the precision with which one can be measured limits the precision of the other, a relationship called the Fourier uncertainty principle.

In between our infinitely long note and the drum beat are short, sharp packets of sound whose frequency and timing are both as precisely defined as they can be. To pin down the frequency of such a note more accurately, the note would have to last longer; to sharpen its temporal structure, it would have to contain more frequency components. These bits of sound are often called Fourier-limited pulses, since they possess a temporal and frequency uncertainty that are, together, as small as possible.
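The tradeoff can be checked numerically. Here is a small sketch (Python with NumPy; not part of the article) that measures the time spread and frequency spread of a Gaussian pulse, the classic Fourier-limited packet, and confirms that their product sits at the theoretical floor of 1/(4π):

```python
import numpy as np

# Fine, wide time grid so the pulse is fully contained and well resolved
dt = 0.01
t = np.arange(-20, 20, dt)
x = np.exp(-t**2 / 2)              # Gaussian pulse: the Fourier-limited case

def rms_width(v, w):
    """Standard deviation of values v under (unnormalized) weights w."""
    w = w / w.sum()
    m = (w * v).sum()
    return np.sqrt((w * (v - m) ** 2).sum())

sigma_t = rms_width(t, np.abs(x) ** 2)          # temporal uncertainty

X = np.fft.fft(x)
f = np.fft.fftfreq(len(t), d=dt)
sigma_f = rms_width(f, np.abs(X) ** 2)          # frequency uncertainty

product = sigma_t * sigma_f
print(product, 1 / (4 * np.pi))                 # both come out near 0.0796
```

Any linear measurement is stuck at or above that product; the paper's claim is that listeners beat it, which is only possible if the auditory system is doing something nonlinear.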

Humans, are you nonlinear?

These pulses of sound represent the ultimate limits for linear measurements. If human hearing uses a linear form of frequency and temporal sound perception, we should expect that we will not be able to perceive timing and frequency differences that are smaller than these ultimate limits.

To test this, a pair of physicists from Rockefeller University gave a group of subjects tests where they were asked to perceive frequency differences between Fourier-limited sound packets. They were also asked to perceive timing differences between Fourier-limited sounds and to do both simultaneously. The tests were run with distracting high notes being played.

They found humans certainly do not perceive sound in a linear fashion. Indeed, one subject was able to determine the relative timing of notes to an accuracy of about one oscillation period. However, this high temporal precision came at the cost of frequency precision. Even taking the decreased frequency acuity into account, the combined precision was still much better than that given by the limits of a linear model. Likewise, another subject had extraordinary frequency perception at the cost of temporal resolution but still beat the uncertainty limit.

Most subjects clocked in with uncertainty limits about 10 times better than a linear model would suggest, with musicians, composers, and conductors performing best.

Why, yes you are nonlinear

The obvious conclusion, of course, is that humans don't perceive sound linearly. To a large extent, this was already known. We know volume is perceived nonlinearly, but we didn't really know much about temporal/frequency perceptions. Researchers suspected that this was nonlinear—because the brain is anything but linear—but they didn't know which model would accurately represent what goes on in the brain. Researchers and sound engineers have continued to work with linear models because they don't really know what else to use.

As the researchers point out, their results go a long way to eliminate many nonlinear models because most don't predict the combined temporal and frequency resolution found in humans. They also point out the importance of this work for audio encoding. Even now, one of the first steps of many encoders is to use a linear algorithm to break up an audio track into a 2D soundscape, which is then used as input for the actual encoding.

I don't have a lot of time for audiophiles with gold-coated connectors and "unidirectional" coaxial cable, but this data is something I could buy into.

I'm still amazed that $100+ HDMI cables or $300/ft speaker wire even exist, but they do. Companies are successful because people are - for the most part - not scientific. Of course not everyone can be (or should be) an EE, but some basic knowledge of these topics would be useful to all.

I laughed the first time I saw a 6 ft $299 HDMI cable for sale at Best Buy. Had some fun with the sales people that day.

...it's not obvious why you would want to use wavelets here. The DFT actually works pretty well, the main problem being that its frequency bins are evenly spaced. You can do better (and a few formats do) just by using a filterbank, but it's much slower. So there's a tradeoff here between efficiency and performance.

The fixed spacing of the frequency bins is a consequence of the (finite) samples which you feed into your transform. Assuming a practical filter also has to have a finite length, in what sense can a filterbank be "better" than a DFT-based method if it's also slower?

You can get rid of the evenly spaced bins that the DFT forces on you. Your ideal transform should have bins that evenly sample the bark scale. The DFT doesn't do this, and instead oversamples high frequencies (or alternatively undersamples lower frequencies).

But if you have speed to burn, then you can oversample the high frequencies and then you do analysis on groups of bins. That should be no different from unevenly spaced bins. (That's why I included the context of the slower speed).
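To make the bin-grouping idea concrete, here is a rough sketch (Python/NumPy, using Traunmüller's Bark-scale approximation as an assumed stand-in for whatever perceptual scale an encoder would really use) that counts how many uniformly spaced DFT bins land in each critical band:

```python
import numpy as np

def bark_to_hz(z):
    # Inverse of Traunmueller's Bark approximation: z = 26.81*f/(1960+f) - 0.53
    return 1960.0 * (z + 0.53) / (26.28 - z)

fs, nfft = 44100, 1024
bin_freqs = np.arange(nfft // 2 + 1) * fs / nfft   # uniform ~43 Hz bin spacing
edges = bark_to_hz(np.arange(25.0))                # band edges up to ~21 kHz

# Number of uniformly spaced DFT bins falling inside each Bark band
counts = np.diff(np.searchsorted(bin_freqs, edges))
print(counts)
```

The lowest bands catch only a bin or two while the top band swallows well over a hundred, which is the oversampling/undersampling asymmetry described above: grouping bins after a fast uniform transform approximates the uneven filterbank at FFT-like cost.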

redleader wrote:

Kalessin wrote:

A DCT (not a DFT) tends to concentrate the variance from many real-world signals into a relatively few, lower-frequency bands.

The residual signal is not a real-world signal. It's pretty close to white, so transforming it does nothing for you aside from adding rounding error which you then have to handle.

If the residual signal is (practically speaking) white noise then subtracting it should be like adding white noise. Is there something inherently wrong with this if the goal is lossy compression up to some threshold of quality loss?

Quote:

To be clear, the post you quoted referred to lossless coding.

Ah, I missed that point. Well, what's a vanishingly small loss between friends, right?

I don't quite follow the last sentence. Do you mean that most big listening tests are now done at lower than 128K?

Yeah. When was the last big 128k test someone did where even a majority of people rated MP3 lower than lossless? Almost ten years ago, I think.

Kalessin wrote:

Also, are you saying that 128K is "good enough" because the differences are only noticeable on "hard to encode samples"?

No, because there are still samples where it gets it wrong. But the problem for formal testing is picking samples. Say one in ten songs will have an artifact. You pick 10 songs for your listening test. There's a chance that none have that problem, and then you conclude MP3 is transparent. There's a chance you pick three that have it, and then you conclude MP3 isn't transparent.

The problem becomes getting enough statistical power in this situation. You can't realistically test enough samples, so no one bothers. IMO ~160kbps is probably enough though. At that point you're mostly looking at pathological samples if you want to hear artifacts, and they're usually too subtle to notice without careful comparison to the reference.
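The sample-selection lottery described above is easy to quantify. A sketch (plain Python, using the hypothetical one-in-ten artifact rate from the post) of the binomial math:

```python
from math import comb

p = 0.10   # suppose one in ten songs triggers an audible artifact
n = 10     # songs chosen for the listening test

# Probability the test contains no artifact-prone songs at all
p_none = (1 - p) ** n

# Probability it contains three or more
p_three_plus = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3, n + 1))

print(f"P(no artifact samples chosen)   = {p_none:.3f}")        # ~0.349
print(f"P(three or more samples chosen) = {p_three_plus:.3f}")  # ~0.070
```

So roughly a third of such tests would wrongly suggest transparency, and about 7% would overstate the problem, which are exactly the two failure modes described.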

But if you have speed to burn, then you can oversample the high frequencies and then you do analysis on groups of bins. That should be no different from unevenly spaced bins. That's why I included the comment about relative slowness.

To reconstruct the signal you have to actually transmit the bins. If you oversample them, your compression is going to be awful, or you're going to have to have some way of compressing the redundancy you just introduced. You want them critically sampled if possible.

Quote:

Quote:

Kalessin wrote:

Quote:

A wavelet is a transform. It can't compress anything. If you put N bytes into a wavelet transform you get at least N bytes worth of data out. There's no compression...

A DCT (not a DFT) tends to concentrate the variance from many real-world signals into a relatively few, lower-frequency bands.

The residual signal is not a real-world signal. It's pretty close to white, so transforming it does nothing for you aside from adding rounding error which you then have to handle.

I'm not sure what you mean. If the residual signal is (practically speaking) white noise then subtracting it should be like adding white noise.

The goal of residual coding is to losslessly compress the residual using the smallest number of bytes possible. Since it's pretty close to random noise, you're basically fucked.

I like how "anyone with relatively undamaged hearing *should* be able to tell the difference" has backpedaled into a lot of butthurt nonsense.

Shendai wrote:

As an aside, a few people that came over before and casually overheard my music playing asked about it since - in their approximate words - "it sounds better than what I've heard before."

hahahahah

Not butthurt. No nonsense.

Just flatly stating you're wrong.

Wrong about what exactly?

Shendai wrote:

Don't really care what your background or qualifications are. When it comes to enjoying music I trust my ears, not your assertions.

sooooo butthurt

My last exchange with you on this topic, only to clarify.

You're wrong that lossless format = lossy format, as it relates to sound/music.

You wrote "bullshit" as a response to my statement that lossless format is superior. I'm not going to debate it, just going to say that - in my opinion - you're wrong. Many people hear the difference. Many people can hear above 20kHz, for example. I'm not talking about 1-2% edge cases either. FLAC is objectively superior to mp3 from a quality standpoint. If the file sizes were about the same, this wouldn't even be a conversation.

Good, I don't have anything specifically against lossy audio, I merely choose lossless because I can tell the difference, I know that everyone'll say "that's bullshit" but listen to a lossless track and an MP3 side by side with any pair of headphones, and you can easily hear differences. Long story short, if they were to come out with a perfect lossy compression algorithm that I couldn't hear the difference between it and lossless, I'd gladly switch.

Even mainstream lossless audio is not good enough; it's not just a problem of bit rate or compression artifacts. We need higher than CD-quality audio, sampled at higher than 44kHz. The Nyquist-Shannon sampling theorem states that theoretically accurate reconstruction of an original waveform with maximum frequency component f requires a sampling frequency of at least 2f. Note the terms 'theoretical' and 'at least'.

Humans can perceive tones up to around 20kHz. However, that just means a human is capable of hearing a single 20kHz sinusoid. We don't know every nonlinear relationship involved. It may well be possible that a quick, abrupt 'attack' sound, which undoubtedly has components above 20kHz, is perceived differently by the auditory system than one without those components.

It'd be better to play it safe and sample audio at 48kHz (DTS/Dolby Digital) or even 96kHz (DTS-HD/TrueHD).
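For what it's worth, the 2f requirement in the sampling theorem is easy to demonstrate. A sketch (Python/NumPy, not from the thread) showing that a tone above half the sample rate is literally indistinguishable, once sampled, from a folded-down alias:

```python
import numpy as np

fs = 44100.0                 # CD sample rate; Nyquist frequency is 22050 Hz
n = np.arange(1000)          # sample indices

# A 25 kHz tone is above Nyquist...
above = np.sin(2 * np.pi * 25000.0 * n / fs)

# ...so its samples equal those of a 19.1 kHz tone folded about fs/2 (with a sign flip)
alias = np.sin(2 * np.pi * (fs - 25000.0) * n / fs)

print(np.allclose(above, -alias))   # True: the sampled sequences are identical
```

This is why content above fs/2 must be filtered out before sampling rather than "captured anyway": at 44.1kHz, nothing above 22.05kHz survives as itself.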

Even mainstream lossless audio is not good enough; it's not just a problem of bit rate or compression artifacts. We need higher than CD-quality audio, sampled at higher than 44kHz. The Nyquist-Shannon sampling theorem states that theoretically accurate reconstruction of an original waveform with maximum frequency component f requires a sampling frequency of at least 2f. Note the terms 'theoretical' and 'at least'.

The actual usable ratio for modern DACs is about 0.48 of the sample rate (vs. the theoretical Nyquist limit of 0.5), so the upper limit is close to 21kHz. In practice lots of stuff on CD rolls off way below that, because no one cares about such high frequencies. By the time you're in your thirties you've lost all that hearing anyway.

MobiusPizza wrote:

Humans can perceive tones around 20kHz. However, that is just a single sinusoid. We don't know every nonlinear relationship. It may well be possible that a quick, abrupt 'attack' sound, which undoubtedly has components above 20kHz, is perceived differently by the auditory system.

This is so easy to test. Just go try it and see if you can hear it instead of speculating.

MobiusPizza wrote:

It'd be better to play it safe and sample audio at 48kHz (DTS/Dolby Digital) or even 96kHz (DTS-HD/TrueHD)

Isn't 48k DTS a lossy subband codec? That's probably not what you meant to say. IMO 48k would have been nice for everything, but in practice no one ever shows a difference in blind tests, so it's not worth caring about.

But if you have speed to burn, then you can oversample the high frequencies and then you do analysis on groups of bins. That should be no different from unevenly spaced bins. That's why I included the comment about relative slowness.

To reconstruct the signal you have to actually transmit the bins. If you oversample them, your compression is going to be awful, or you're going to have to have some way of compressing the redundancy you just introduced. You want them critically sampled if possible.

I don't think I'm getting my point across. If the FFT is significantly faster than the filter bank, then I can oversample and construct equivalent filter banks in Fourier space. Shouldn't the two methods produce exactly identical results?

redleader wrote:

Kalessin wrote:

Also, are you saying that 128K is "good enough" because the differences are only noticeable on "hard to encode samples"?

No, because there are still samples where it gets it wrong. But the problem for formal testing is picking samples...The problem becomes getting enough statistical power in this situation. You can't realistically test enough samples, so no one bothers. IMO ~160kbps is probably enough though. At that point you're mostly looking at pathological samples if you want to hear artifacts, and they're usually too subtle to notice without careful comparison to the reference.

I see. That's an interesting aspect of the experimental design I hadn't really thought about.

But then it doesn't actually contradict the claim that people with undamaged ears should be able to hear the difference between 128 K and lossless. What it does say is that the differences will only be audible with some material.

If the artifacts are real and audible, then it's not just someone's imagination; it's really a value judgment. People spend huge sums of money for asymptotic gains in quality (real or imagined). One artifact in 10 songs could be something you can (or ought to) live with, or it could be worth 32 more kbps (or whatever) to cut it down to some arbitrarily small number.

But if you have speed to burn, then you can oversample the high frequencies and then you do analysis on groups of bins. That should be no different from unevenly spaced bins. That's why I included the comment about relative slowness.

To reconstruct the signal you have to actually transmit the bins. If you oversample them, your compression is going to be awful, or you're going to have to have some way of compressing the redundancy you just introduced. You want them critically sampled if possible.

I don't think I'm getting my point across. If the FFT is significantly faster than the filter bank, then I can oversample and construct equivalent filter banks in Fourier space. Shouldn't the two methods produce exactly identical results?

Well, first, you don't use an FFT, but rather an MDCT. Second, how does oversampling give you non-uniform sampling? Are we even talking about the same thing?

Edit: shit I said DFT in the first post instead of MDCT
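For readers who haven't met it, the MDCT mentioned here is a lapped transform: blocks of 2N samples overlap by half, yield only N coefficients each, and the time-domain aliasing cancels on overlap-add. A minimal sketch (Python/NumPy, direct matrix form rather than the fast FFT-based factorization real codecs use), with a sine window satisfying the Princen-Bradley condition:

```python
import numpy as np

def mdct(frame, win):
    """MDCT of one frame of length 2N -> N coefficients."""
    N = len(frame) // 2
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (frame * win)

def imdct(coeffs, win):
    """Inverse MDCT -> 2N samples (still aliased until overlap-add)."""
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * win * (basis @ coeffs)

N = 64
win = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # Princen-Bradley window

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
padded = np.concatenate([np.zeros(N), x, np.zeros(N)])    # pad so every sample gets 2 frames

# Analysis + synthesis with 50% overlap-add: the aliasing cancels between frames
y = np.zeros_like(padded)
for s in range(0, len(padded) - 2 * N + 1, N):
    y[s:s + 2 * N] += imdct(mdct(padded[s:s + 2 * N], win), win)

err = np.max(np.abs(y[N:N + len(x)] - x))
print(err)   # reconstruction error near machine precision
```

Despite the 50% overlap, the coefficient count equals the sample count (critical sampling), which is exactly the property flagged earlier in the thread as desirable for compression.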

Kalessin wrote:

redleader wrote:

Kalessin wrote:

Also, are you saying that 128K is "good enough" because the differences are only noticeable on "hard to encode samples"?

No, because there are still samples where it gets it wrong. But the problem for formal testing is picking samples...The problem becomes getting enough statistical power in this situation. You can't realistically test enough samples, so no one bothers. IMO ~160kbps is probably enough though. At that point you're mostly looking at pathological samples if you want to hear artifacts, and they're usually too subtle to notice without careful comparison to the reference.

If the artifacts are real and audible, then it's not just someone's imagination; it's really a value judgment. People spend huge sums of money for asymptotic gains in quality (real or imagined). One artifact in 10 songs could be something you can (or ought to) live with, or it could be worth 32 more kbps (or whatever) to cut it down to some arbitrarily small number.

But then it doesn't actually contradict the claim that people with undamaged ears should be able to hear the difference between 128 K and lossless.

Sure it does. If you get a lot of people together and do a bunch of 128k samples, and they don't rate the 128k files differently than the lossless files, then ...

Kalessin wrote:

What it does say is that the differences will only be audible with some material.

Obviously you can always pick a sample that will produce an audible difference given that you understand what the encode is doing well enough. But so what? If you rig the test the results become uninteresting.

Just flatly stating you're wrong. Not really sure why you care, but not going to agree with you when repeated experience shows me a difference.

Don't really care what your background or qualifications are. When it comes to enjoying music I trust my ears, not your assertions.

There seems to be a downvote patrol here trying to punish anyone who claims to be able to hear the difference between FLAC and lossy compression, which is unfortunate, because there are a lot of people who can hear a noticeable difference, especially for MP3 below 256 kbps.

I would make a YouTube video to demonstrate my own skill with ABX tests, but it doesn't seem like it would be worthwhile when the people here who keep calling bullshit (presumably because they can't hear any difference) would probably just claim I was faking the results. I'm sure the other people here like Shendai feel the same way about the situation.

So which is the more likely explanation: that everyone who prefers lossless to lossy compression is deluding themselves in the same way as audiophiles buying $100,000 cables are deluding themselves, and that they can't really hear the difference because it's somehow impossible for the human ear/brain to detect the difference, or... that it is possible for the human ear/brain to hear the difference between lossy and lossless (because of all that data which is being thrown away under the presumption that people won't notice it was discarded), and that we're making informed decisions as to the encoding formats and bitrates that we choose for our own music collections?

If MP3 works for you, then fine, great, have fun. But it's pretty offensive to be condescended to by people who act like their "non-golden ears" are the gold standard for what everyone else should be doing. Do you also go to the Fraunhofer Institute, Dolby Labs, etc., and tell them they should just give up and close their doors because in your expert opinion 128 kbps MP3 is the be-all and end-all of audio codec technologies and there's no possible way to improve on it, so why bother? If not, then why not? Is it because you know that your "expert opinion" doesn't trump the experiences of actual experts in the field?

There seems to be a downvote patrol here trying to punish anyone who claims to be able to hear the difference between FLAC and lossy compression

Maybe they just don't like butthurt posting?

Puma720 wrote:

So which is the more likely explanation: that everyone who prefers lossless to lossy compression is deluding themselves in the same way as audiophiles buying $100,000 cables are deluding themselves, and that they can't really hear the difference because it's somehow impossible for the human ear/brain to detect the difference, or... that it is possible for the human ear/brain to hear the difference between lossy and lossless (because of all that data which is being thrown away under the presumption that people won't notice it was discarded), and that we're making informed decisions as to the encoding formats and bitrates that we choose for our own music collections?

This is such a pointless thing to say. I haven't seen a post claiming either of these things.

Puma720 wrote:

But it's pretty offensive to be condescended to by people who act like their "non-golden ears" are the gold standard for what everyone else should be doing. Do you also go to the Fraunhofer Institute, Dolby Labs, etc., and tell them they should just give up and close their doors because in your expert opinion 128 kbps MP3 is the be-all and end-all of audio codec technologies and there's no possible way to improve on it, so why bother? If not, then why not? Is it because you know that your "expert opinion" doesn't trump the experiences of actual experts in the field?

Calm down. No one is saying any of those things, so no need to get all worked up over it.

Good, I don't have anything specifically against lossy audio, I merely choose lossless because I can tell the difference, I know that everyone'll say "that's bullshit" but listen to a lossless track and an MP3 side by side with any pair of headphones, and you can easily hear differences. Long story short, if they were to come out with a perfect lossy compression algorithm that I couldn't hear the difference between it and lossless, I'd gladly switch.

I'm curious, have you tried doing a double-blind ABX test? You can look at HydrogenAudio to find those.

I'd be very impressed if you could consistently actually tell which track is which in such a test. For example, I can hear that there's a difference but I cannot determine which is which.

Generally, for me, the one missing the upper and lower ranges is the mp3/lossy version. I tend to notice the missing bass and higher pitches (snare drum/wind instruments/brass instruments) on the mp3 as opposed to the flac/cd versions. And of course compared to the analog version there is a lot of "noise" missing. Now, if you give me a digital sample and compare it to analog, I can tell the digital from the analog, but I would not be able to tell you if the digital version is lossy/lossless unless you gave me a copy of all 3 (analog/lossy/lossless); then I could tell you which is which. EDIT: This is on my home system and even my iPod with good headphones. In the car, with the noise from the wind, road, etc., lossy/lossless is pretty much irrelevant.

generally for me the one missing the upper and lower ranges is the mp3/lossy compression...

Lossy compression does not change the relative intensity of each frequency band (by design). Instead it changes the time/frequency resolution of the waveform, and possibly the stereo separation. So if you hear changes to the frequency content in each band, but do not hear things like pre-echo, you're doing something very wrong.

Perhaps this is a good time to try an ABX test using known good software (e.g. foobar2000)?
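Pre-echo is worth a quick demonstration, since it is the signature transform-codec artifact. A toy sketch (Python/NumPy): quantizing the spectrum of a block containing a click smears the quantization error across the whole block, including before the click ever happens. Real codecs quantize MDCT coefficients and switch to short blocks around transients to keep the smear short; the plain FFT here is just the simplest way to show the mechanism.

```python
import numpy as np

n = 512
x = np.zeros(n)
x[400] = 1.0                      # a click late in the block

# Transform the block, then coarsely quantize the real and imaginary parts,
# standing in for a codec's coefficient quantization
X = np.fft.rfft(x)
step = 0.25
Xq = step * (np.round(X.real / step) + 1j * np.round(X.imag / step))
y = np.fft.irfft(Xq, n)

# Quantization error shows up *before* the transient: that's pre-echo
pre_echo = np.max(np.abs(y[:390]))
print(pre_echo)
```

Because each coefficient's basis function spans the entire block, frequency-domain error has no choice but to spread over all 512 samples, even the silent ones preceding the click.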

I think "butthurt" is in the butt of the beholder. I'm just mildly annoyed.

Anyway, why don't you take a look at some of the crowdsourced encoder ratings at SoundExport.org? Even at 256kbps, there are measurable differences. AAC performs best and MP3 is near the bottom. Curiously, iTunes 7.1's AAC encoder performed the worst, while newer versions of iTunes performed much better.

It doesn't give you non-uniform sampling, but you should be able to construct appropriate bin groups, after transforming, to give the desired (variable) resolution if that's the goal. Then it comes down to which is computationally more efficient, since it should be possible to achieve identical results.

redleader wrote:

Kalessin wrote:

But then it doesn't actually contradict the claim that people with undamaged ears should be able to hear the difference between 128 K and lossless.

Sure it does. If you get a lot of people together and do a bunch of 128k samples, and they don't rate the 128k files differently than the lossless files, then ...

If people with undamaged ears can be trained to recognize artifacts and know where to look for them, then they can hear the difference. This is obviously getting into semantics, but I don't think the previous poster's comment was particularly wrong. It's more that the two of you are arguing about slightly different things.

redleader wrote:

Kalessin wrote:

What it does say is that the differences will only be audible with some material.

Obviously you can always pick a sample that will produce an audible difference given that you understand what the encode is doing well enough. But so what? If you rig the test the results become uninteresting.

I'm going to go out on a limb here and suggest that most "critical" listening is rigged anyway. People enjoy their "sweet" sounding speakers by listening to shrill recordings. Other people justify their big subs by playing tracks with big bass that they couldn't hear before they bought the subs. The improvement is always relative to some known (or perceived) problem. Anyway, it's just a thought. I don't really want to argue about what constitutes "rational" listening behavior.

It doesn't give you non-uniform sampling, but you should be able to construct appropriate bin groups, after transforming, to give the desired (variable) resolution if that's the goal. Then it comes down to which is computationally more efficient, since it should be possible to achieve identical results.

Can you do that with a lapped transform though? I'm not sure if overlapping is going to work in that case.

Kalessin wrote:

redleader wrote:

Kalessin wrote:

But then it doesn't actually contradict the claim that people with undamaged ears should be able to hear the difference between 128 K and lossless.

Sure it does. If you get a lot of people together and do a bunch of 128k samples, and they don't rate the 128k files differently than the lossless files, then ...

If people with undamaged ears can be trained to recognize artifacts and know where to look for them, then they can hear the difference.

Who said untrained listeners? And who says people can be trained to do this?

Kalessin wrote:

This is obviously becoming getting into semantics, but I don't think the previous poster's comment was particularly wrong. It's more that the two of you are arguing about slightly different things.

I think he has no idea what he's talking about.

Kalessin wrote:

I'm going to go out on a limb here and suggest that most "critical" listening is rigged anyway. People enjoy their "sweet" sounding speakers by listening to shrill recordings. Other people justify their big subs by playing tracks with big bass that they couldn't hear before they bought the subs. The improvement is always relative to some known (or perceived) problem.

The kinds of things that break codecs are barely even music. I doubt that is what anyone is referring to.

If people with undamaged ears can be trained to recognize artifacts and know where to look for them, then they can hear the difference.

Who said untrained listeners? And who says people can be trained to do this?

What else could the following comment imply?

redleader wrote:

Obviously you can always pick a sample that will produce an audible difference given that you understand what the encode is doing well enough. But so what? If you rig the test the results become uninteresting.

If there exists "understanding" to be had that allows one to "rig" the test by producing audible differences, then it suggests that listeners can be trained to recognize what they're looking for and where to find them.

redleader wrote:

The kinds of things that break codecs are barely even music. I doubt that is what anyone is referring to.

Many of the historical problem samples for mp3, especially at lower bit rates (including 128K), have contained stuff like piano notes, percussion hits, castanets, etc. All "music" by most fairly open-minded definitions of the term.

It doesn't give you non-uniform sampling, but you should be able to construct appropriate bin groups, after transforming, to give the desired (variable) resolution if that's the goal. Then it comes down to which is computationally more efficient, since it should be possible to achieve identical results

I'm not quite getting why this wouldn't work either - and why it isn't just an implementation efficiency question. Maybe you (redleader) and Kalessin are talking about, and I am understanding, different things?



Thanks.

Redleader is one smug SOB who acts like they know everything about sound quality. I'm just going by my ears and their experience.

That we PERCEIVE sound non-linearly, however, does not change the physics behind the sound generated, nor the physics of the ear itself or of the speakers. Some sounds actually cannot be heard because the sound waves in fact cancel out before they even GET to the ear. Other sounds are outside the normal range of human hearing, and even though they may affect other waveforms in the spectrum, those effects can often be factored in by generating the modified wave in the first place instead of generating both waves and letting them cancel. So there are still lots of types of compression that can be done with zero testably discernible differences.
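The cancellation point is plain superposition, and it's easy to sketch numerically: two equal tones 180 degrees out of phase sum to silence (up to float rounding). A minimal Python illustration, with an arbitrarily chosen sample rate and tone frequency:

```python
import math

# Two equal 1 kHz tones, 180 degrees out of phase, sampled at 48 kHz.
# By superposition their sum cancels to (numerical) silence.
rate, freq = 48000, 1000.0
n = 480  # 10 ms of samples
wave_a = [math.sin(2 * math.pi * freq * t / rate) for t in range(n)]
wave_b = [math.sin(2 * math.pi * freq * t / rate + math.pi) for t in range(n)]
summed = [a + b for a, b in zip(wave_a, wave_b)]

# The residual is pure floating-point rounding error, far below audibility.
print(max(abs(s) for s in summed))
```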

All we know is that our simple models are too simple. But those models were based on the limitations of processing power in the late '90s (or even earlier), so new models can be made.

Also, we don't use a single point sound generator; we use numerous independent speakers. The timing of those speakers is FAR from perfect. If there are even minute differences in when the sound is generated from one speaker versus another, or the angles are odd, or walls are allowed to reflect and refract those signals, unwanted signal cancellation (or the lack thereof when it is wanted) is the result. Even with the best multi-driver isolated headphones, we can't even come close to accurate reproduction of the originating wave, direction included. A single mic recorded the wave, but we may be reproducing it from several drivers, each with dissimilar signal production ranges. This process alone introduces FAR more distortion and unintended sound than the best available compression algorithms do.

In the end though, space (storage) really is not an issue, nor is CPU power for advanced algorithms. We could at this point easily be using spatial sound maps instead of two-dimensional sound waves, with completely lossless compression, at what, maybe 4-5 times our current storage needs? I have 38K tracks, almost all of which are 256 kbps MP3 or AAC, some even better, and it's barely over 100 GB. If it were half a TB, that would still be reasonable for most people (since most people should only have a few thousand songs anyway; I got stream-rip happy a few years ago and amassed way more audio than I can reasonably sort and listen to, and now that we have Pandora and Spotify I hardly ever even use it). We're making all these efforts to study better algorithms when we've already passed the point where a lossless surround audio profile exceeds our ability to reproduce the signal as actual sound. So we should just do that, and start working on 3-space sound generation and all-new methodologies.
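The storage arithmetic behind that claim is a one-liner: bitrate times playing time, divided by eight, gives bytes. A back-of-envelope sketch (the track count, track length, and the ~1000 kbps figure for a typical FLAC are illustrative assumptions, not anyone's actual library):

```python
def library_size_gb(tracks, minutes_per_track, kbps):
    """Rough library size: (kbps * 1000 / 8) bytes per second of audio."""
    seconds = tracks * minutes_per_track * 60
    return kbps * 1000 / 8 * seconds / 1e9

# 10,000 four-minute tracks at a few common bitrates
# (~1000 kbps stands in for an average FLAC rip):
for kbps in (128, 256, 1000):
    print(f"{kbps:>5} kbps: {library_size_gb(10000, 4, kbps):.0f} GB")
```

Even the lossless case lands in the hundreds of gigabytes, which supports the point that storage is no longer the binding constraint.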

But then it doesn't actually contradict the claim that people with undamaged ears should be able to hear the difference between 128 K and lossless.

Sure it does. If you get a lot of people together, play them a bunch of 128k samples, and they don't rate the 128k files differently from the lossless files, then ...

Kalessin wrote:

What it does say is that the differences will only be audible with some material.

Obviously you can always pick a sample that will produce an audible difference, given that you understand well enough what the encoder is doing. But so what? If you rig the test, the results become uninteresting.

This is the idea behind "Mastered for iTunes." Yes, no matter what algorithm you use, applied generically, it will be easy to find files that the algorithm handles poorly, poorly enough to be noticeable even to untrained ears.

What Apple is doing is a) working with AAC, which is far superior to MP3 at the same bitrate in the first place, and b) actually teaching recording studios how to record and how to arrange PROPERLY to avoid weaknesses in the output, and to TUNE the algorithm specifically to the music to eliminate almost all of these audible issues, producing a non-lossless file that no one has yet statistically confirmed sounds any different from the lossless equivalent or the raw original. If we use compression intelligently, the differences become so subtle that even trained experts have admitted they're essentially taking a guess at which sounds better. In more detailed tests, where they're asked to point out the instants at which they hear differences while listening to both tracks interchangeably, they have no accuracy at all in doing so, more often pointing out "differences" where even technicians can't see any on screen, and missing massive waveform differences entirely when those occur just as often (i.e., they're just guessing and really have no idea). If experts can't tell in highly structured tests, it's good enough for the rest of us. Yes, higher-grade speakers/headsets do matter, but as long as you're using tuned compression and proper recording techniques, and as long as we're talking modern compression standards (256 kbps or higher), there should be only a few anomalies, which the studio should be able to detect, resample, and fix before releasing. The difference between new "Mastered for iTunes" 256 kbps AAC files and older directly ripped 256 kbps AAC files is actually more pronounced than the difference between 256 kbps AAC and 128 kbps MP3.

So, yes, to support the article: a better understanding of compression leads to better files, and yes, humans can hear compression. But that is due to POOR USE OF COMPRESSION, not to say we can't compress properly.

Did anyone contest the fact that analog cable quality has an impact on audio quality? The mockery has been fairly aimed at the absurdly expensive digital cables.

Yes: see the last paragraph of the article.

He didn't specifically say analogue vs digital, but since virtually all audio cables and connectors are analogue it's probably what he meant.

1: There's no such thing as a "digital" cable, just digital signaling ON a cable. The cost of the cable itself is actually irrelevant in comparison. Yes, a multi-pin cable costs more than a simple coax cable, at least in most cases (sometimes the amount of metal involved, or being coaxial versus simply twisted, changes that price), but all else equal, digital cables should not cost more. In fact, if anything, since they're less susceptible to interference and thus need LESS shielding (in some cases NONE), they should be the same price or cheaper.

2: Keeping the signal digital until the very last second DOES improve signal reproduction. Each analog hop introduces noise. A digital recording converted immediately to analog, sent from a player to a TV to a receiver and down speaker wires, makes many analog hops, and also goes through protocol conversion multiple times, introducing significant artifacting. Keeping a signal digital from source to receiver, and only converting to analog to go down the speaker wire, IS superior, and yes, even with identical speakers you CAN tell this difference, even using the best possible cables. If the signal were recorded in channel-independent formats, and you actually had endpoint decoders ON the speakers, and we sent digital all the way with only the analog membrane itself as the final stage, we could wire the entire damned system on unshielded pairs and it would be superior. Putting that tech in-speaker is prohibitive for many reasons, but converting to analog at any point before the last point before the speaker should be avoided at all costs. Equally, converting from 5.1 to 2.1 to play through a 5.1 setup (or even 7.1, i.e., any configuration other than the one it was recorded in) is bad for the audio. Many TVs will downsample THX or Dolby 5.1 to 2.1 when using passthrough over fiber, or convert from one digital format to another, which is a lossy conversion, and this should also be avoided.
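The "each analog hop adds noise" claim can be put in numbers with a toy model: independent noise sources add in power, so signal-to-noise ratio falls as hops accumulate. A sketch under the simplifying assumption that every hop contributes equal, independent noise power (all values are illustrative):

```python
import math

def snr_after_hops(signal_power, noise_per_hop, hops):
    """SNR in dB after `hops` analog stages, assuming each stage adds
    independent noise of equal power (noise powers sum linearly)."""
    total_noise = noise_per_hop * hops
    return 10 * math.log10(signal_power / total_noise)

# Starting from a 90 dB SNR, every doubling of hops costs ~3 dB:
for hops in (1, 2, 4):
    print(f"{hops} hop(s): {snr_after_hops(1.0, 1e-9, hops):.1f} dB")
```

Real conversion chains degrade in messier ways (quantization, format transcoding), but the monotonic loss per stage is the point.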

What Apple is doing is a) working with AAC which is far superior at the same bitrate to MP3 in the first place, and b) actually teaching recording studios how to record, and how to arrange PROPERLY to avoid weaknesses in the output, and TUNE the algorithm specifically to the music to eliminate almost all of these audible issues

I don't think this is right. As I understand it (and what I can gather from what Apple says) all "Mastered for iTunes" is is a way to monitor what something is going to sound like in AAC while mastering (it's just an AU plugin that you put last in the monitoring chain). So supposedly, for some reason, you might want to mix/eq/whatever differently for iTunes. It isn't at all obvious to me that this really helps beyond allowing one to say that "I made it sound as good as I can in AAC" (which is something I suppose). It in no way "TUNE[s] the algorithm specifically to the music" (at least not that I know of).

You are correct but zelanni's basically got it as well.

The crux of audio mastering is to make it sound a certain way for a given delivery format or mechanism, be it iPods, 78s, LPs or AM radio --- all of which are examples of the same situation, with specific mastering done for them.

Bah Humbug, all digital sound is lossy. There is no true lossless compression of digital music.

Look, the moment the sound wave hits a diaphragm inside a microphone and the pulse gets converted into a digital signal that gets transcoded into bits, there is loss. Digital music can only estimate and get close to the true sound, but never really hit it.

All studio music sounds wrong. All the studios care about is digital recordings.

Give me a full orchestra and a symphony hall. That is the only reproduction I can enjoy. The record player comes second.

Good. I don't have anything specifically against lossy audio; I merely choose lossless because I can tell the difference. I know everyone will say "that's bullshit," but listen to a lossless track and an MP3 side by side with any pair of headphones and you can easily hear differences. Long story short, if they were to come out with a perfect lossy compression algorithm where I couldn't hear the difference between it and lossless, I'd gladly switch.

I for one will never call bullshit. I accept that there are people with better hearing and better knowledge than mine about music quality and sound. However, to keep my music in a portable and convenient package, I choose to remain in ignorant bliss, because I'm enjoying how I listen to my music now.

I, for example, have noticed a difference while upgrading source quality. But as I too am mostly a mobile listener, and the quality also depends on my headphones, I chose not to go to the lossless category. If I ever buy one of those DAC/amp combos and the headphones I've been eyeing, FLAC it shall be.

And you'll find some of the "reviews" of these cables even funnier!...

QUOTE:-

"If there is one cable I would whole-heartedly trust to my Chimera-hunting needs, this would be the cable. No other cable has the tensile strength to properly and efficiently garrote a lycanthrope, asphyxiate an Esquilax or even gag a mermaid. ..."

Chris Lee / Chris writes for Ars Technica's science section. A physicist by day and science writer by night, he specializes in quantum physics and optics. He lives and works in Eindhoven, the Netherlands.