Not Heisenberg's, Fourier's. Knowing how it's beaten may help us encode music.


Modern audio compression algorithms rely on observations about auditory perception. For instance, we know that a low-frequency tone can render a higher tone inaudible. This effect is exploited to save space by removing the tones we expect will be inaudible. But our expectations are complicated by the physics of waves and our models of how human audio perception works.

This problem has been highlighted in a recent Physical Review Letter, in which researchers demonstrated that the vast majority of humans can perceive certain aspects of sound far more accurately than allowed by a simple reading of the laws of physics. Given that many encoding algorithms start their compression with operations based on that simple physical understanding, the researchers believe it may be time to revisit audio compression.

Time and frequency: Two sides of the same coin

You'll notice I didn't say, "human hearing violates the laws of physics," even though it was very tempting. The truth is that nothing violates the laws of physics, though many things violate the simplified models we use to approximate them.

Take a tone, played continuously forever and ever. The frequency of the tone is very well-defined, but it has no start or end point, so the time at which the note was played is entirely uncertain. Conversely, when we beat a drum, the sound has a very sharp temporal definition, but the tone is actually a broad spectrum of individual frequencies all added together. These two properties, the timing of a tone and its frequency, are related to each other: the precision with which one can be measured limits the precision with which the other can be measured, a relationship known as the Fourier uncertainty principle.

In between our infinitely long note and the drum beat, there are short, sharp packets of sound whose frequency and timing are as precisely defined as they can be. Any individual packet would have to be longer to have a better-defined frequency, and it would have to contain more frequency components to have a sharper temporal structure. These bits of sound are often called Fourier-limited pulses, since they possess a temporal and frequency uncertainty that are, together, as small as possible.
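To make this concrete, here is a minimal sketch (in Python with NumPy; the sample rate, 1 kHz carrier, and 10 ms Gaussian envelope are all assumptions chosen for illustration, not values from the paper) that builds a Fourier-limited pulse and checks that the product of its temporal and spectral spreads sits right at the linear limit of 1/(4π):

```python
import numpy as np

fs = 44_100                         # sample rate in Hz (assumed)
t = np.arange(-0.5, 0.5, 1 / fs)    # one second of time, centered on zero

def rms_spreads(x, t, fs):
    """RMS spread of a signal in time and in frequency."""
    p_t = np.abs(x) ** 2
    p_t /= p_t.sum()
    dt = np.sqrt(np.sum(p_t * (t - np.sum(p_t * t)) ** 2))

    spectrum = np.abs(np.fft.rfft(x)) ** 2
    f = np.fft.rfftfreq(len(x), 1 / fs)
    p_f = spectrum / spectrum.sum()
    df = np.sqrt(np.sum(p_f * (f - np.sum(p_f * f)) ** 2))
    return dt, df

# A Gaussian-windowed 1 kHz tone: the classic Fourier-limited packet.
sigma = 0.010                       # 10 ms envelope (assumed)
pulse = np.exp(-t**2 / (2 * sigma**2)) * np.cos(2 * np.pi * 1000 * t)

dt, df = rms_spreads(pulse, t, fs)
print(f"dt = {dt*1e3:.2f} ms, df = {df:.2f} Hz, dt*df = {dt*df:.4f}")
print(f"linear (Fourier) limit: 1/(4*pi) = {1/(4*np.pi):.4f}")
```

Widening the envelope shrinks the frequency spread and grows the time spread, but for any linear analysis the product never drops below that limit.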

Humans, are you nonlinear?

These pulses of sound represent the ultimate limits for linear measurements. If human hearing uses a linear form of frequency and temporal sound perception, we should expect that we will not be able to perceive timing and frequency differences that are smaller than these ultimate limits.

To test this, a pair of physicists from Rockefeller University gave a group of subjects tests in which they were asked to perceive frequency differences between Fourier-limited sound packets. They were also asked to perceive timing differences between Fourier-limited sounds, and to do both simultaneously. The tests were run with distracting high notes being played.

They found humans certainly do not perceive sound in a linear fashion. Indeed, one subject was able to determine the relative timing of notes to an accuracy of about one oscillation period. However, this high temporal precision came at the cost of frequency precision. Even taking the decreased frequency acuity into account, the combined precision was still much better than that given by the limits of a linear model. Likewise, another subject had extraordinary frequency perception at the cost of temporal resolution but still beat the uncertainty limit.

Most subjects clocked in with uncertainty limits about 10 times better than a linear model would suggest, with musicians, composers, and conductors performing best.

Why, yes you are nonlinear

The obvious conclusion, of course, is that humans don't perceive sound linearly. To a large extent, this was already known. We know volume is perceived nonlinearly, but we didn't really know much about temporal/frequency perceptions. Researchers suspected that this was nonlinear—because the brain is anything but linear—but they didn't know which model would accurately represent what goes on in the brain. Researchers and sound engineers have continued to work with linear models because they don't really know what else to use.

As the researchers point out, their results go a long way toward eliminating many nonlinear models, because most don't predict the combined temporal and frequency resolution found in humans. They also point out the importance of this work for audio encoding. Even now, one of the first steps in many encoders is to use a linear algorithm to break an audio track up into a 2D soundscape, which is then used as input for the actual encoding.
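As a rough illustration of that first step, here is a minimal sketch (Python with SciPy; the sample rate, test tone, and window length are assumptions, and real codecs typically use an MDCT filter bank rather than this plain short-time Fourier transform) of a linear time-frequency decomposition:

```python
import numpy as np
from scipy.signal import stft

fs = 44_100                           # sample rate in Hz (assumed)
t = np.arange(0, 1.0, 1 / fs)
audio = np.sin(2 * np.pi * 440 * t)   # stand-in for a real track

# 1024-sample windows with 50% overlap: each column of Zxx is the spectrum
# of one short slice of the signal, giving a 2D time/frequency picture.
freqs, frames, Zxx = stft(audio, fs=fs, nperseg=1024)
print(Zxx.shape)                      # (frequency bins, time frames)

# A perceptual model would then decide, bin by bin, which of these
# coefficients can be quantized coarsely or dropped as inaudible.
```

Every cell in that time/frequency picture inherits the linear trade-off described above, which is exactly why the researchers think this front end is worth revisiting.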

I don't have a lot of time for audiophiles with gold-coated connectors and "unidirectional" coaxial cable, but this data is something I could buy into.


Chris Lee
Chris writes for Ars Technica's science section. A physicist by day and science writer by night, he specializes in quantum physics and optics. He lives and works in Eindhoven, the Netherlands. Email: chris.lee@arstechnica.com

218 Reader Comments

It turns out that our brains are good at detecting patterns, and we can hear music playing at a volume -below- the level of the noise. So the dynamic range of what our ears can detect as music turns out to be much higher with records than the SNR number would have predicted.

A bit like my 68-year-old father acing his hearing tests year after year, hearing things even a teenager shouldn't be able to hear... and oh, my father's a ham radio operator, used to finding signals way below the noise level.

Yeah, that signal processor between our ears is a truly remarkable piece of kit.
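That "finding signals way below the noise level" trick is easy to demonstrate in software. Here is a minimal sketch (Python with NumPy; the tone level, noise level, and frequencies are made-up numbers for illustration) in which a quiet tone sits about 23 dB below the noise floor, yet correlating against candidate tones still picks it out:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, dur = 8_000, 2.0
t = np.arange(0, dur, 1 / fs)

tone = 0.1 * np.sin(2 * np.pi * 440 * t)    # quiet 440 Hz tone (assumed)
noise = rng.normal(0.0, 1.0, t.size)        # much louder white noise
mix = tone + noise

snr_db = 10 * np.log10(np.mean(tone**2) / np.mean(noise**2))
print(f"raw SNR: {snr_db:.1f} dB")          # roughly -23 dB

# Correlate the mix against candidate tones; the buried one stands out.
for f0 in (300, 440, 600):
    template = np.sin(2 * np.pi * f0 * t)
    score = abs(np.dot(mix, template)) / t.size
    print(f"{f0} Hz correlation score: {score:.4f}")
```

A listener who knows what pattern to listen for is doing something loosely analogous, which is why a raw SNR figure undersells what we can actually hear.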

yet_another_wumpus wrote:

All to avoid changing the reverb of the audio signal (basic reverb is a trivial DSP operation that would bore a 1986 56k DSP to tears

Reverb that such a DSP could produce would bore me to tears. Today reverb is done with impulse responses recorded in actual spaces and applied using convolution to the dry signal.

yet_another_wumpus wrote:

I'm sure you could add any reverb you want real time without issue)

Yes you could, but it would be on the whole mix, not on individual channels. You can't add reverb only to vocals or only to strings. What you would get would sound like swimming shit.
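For readers who haven't met it, the convolution reverb mentioned above is essentially a one-line operation once you have an impulse response. Here is a minimal sketch (Python with SciPy; the file names are placeholders, and it assumes mono WAV files at matching sample rates) that applies a measured room response to a single dry stem:

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

fs, dry = wavfile.read("dry_vocal.wav")            # placeholder dry stem
fs_ir, ir = wavfile.read("concert_hall_ir.wav")    # placeholder impulse response
assert fs == fs_ir, "dry signal and IR must share a sample rate"

dry = dry.astype(np.float64)
ir = ir.astype(np.float64)
ir /= np.max(np.abs(ir))                           # normalize the IR

wet = fftconvolve(dry, ir)                         # the reverb itself
wet /= np.max(np.abs(wet))                         # keep it out of clipping

wavfile.write("wet_vocal.wav", fs, (wet * 32767).astype(np.int16))
```

Applied per stem at mix time, this is how you get reverb only on the vocals; applied to a finished stereo file, you get exactly the whole-mix wash described above.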

With careful speaker placement and some equalizer skills, it is possible to have a place in the room where your favorite music sounds good and where the effects of the room on the sound are minimized. Listening to music over speakers still has some advantages over using headphones.

Properly done lossy encoding is indistinguishable from lossless. That is a fact. You can't argue against an ABX test.

Except that this article confirms that there is no such thing as "properly done lossy encoding" yet.

Judging by the complexity of our hearing system, there will never be such a thing as "properly done lossy encoding".

You know, it is pretty stupid to argue with people and try to convince them that they can't hear the difference when they clearly can. Maybe you can't hear it? You should be happy if that's the case, because the world is full of crappy sound and you can enjoy it while others cringe every time they hear it.

Beg to differ. 16-bit, fair enough. The history of 44.1 kHz as a valid choice was interesting, thank you, but 48 kHz is also compatible with NTSC (and PAL & film) and was, and continues to be, the standard used by recording engineers. There was quite a while when prosumer devices were available that could do 48 kHz to interface with pro gear but specifically wouldn't do 44.1 so that they couldn't be used to duplicate CDs. This was a desirable outcome from Sony's point of view. Witness also the existence of Audio CD-Rs.

I'm not entirely sure about the very first DAT recorders, but all the ones I've ever worked with since the mid-nineties have been switchable 44.1/48 kHz.

The only recording engineers I know who regularly work in 48 kHz work on video soundtracks (and those are the only situations in which I switch). Everybody else is on 44.1.

Haven't studies shown that people who consistently listen to low-quality MP3s start to prefer them?

No. Studies have shown that people do recognize good quality music and that you don't need 'golden' ears for that. Trained listeners will be a little more consistent and need less time to judge.

Quote:

I don't know that "sounds better" is the right criteria.

Of course it is.

There is no effect that makes ALL music sound better, so the best way to make things sound good is to not apply any effects at all. So "sounds better" and "sounds accurate" converge to the same thing when the tested sample is big enough.

There was a study some years back in which it WAS proved that people who listen predominantly to mp3s will prefer that sound.

Fidelity to source material does not necessarily correlate with preference. If you took the average person's stereo and made playback as linear as possible (and thus as true to the original source as possible), many people would HATE the result, as they're used to boosting the bass by 18dB, having ear-searing highs, and no mids.

It may be that the artifacts introduced by the mp3 actually mask a shitty mastering job, or a decision made by the band or the engineer that actually resulted in something intentional but not as pleasant.

Case in point: I have an old (original) cassette of Stevie Wonder's "In Square Circle", which I actually enjoy listening to. Putting on a 24-bit mastered lossless version in the studio is unbearable. It sounds just terrible. As does the original vinyl from 1982. The losses in audio quality incurred by the cassette medium and playback are actually pleasing to the ear.

So in ABX tests, people do need to be able to consistently pick out the individual versions, but PREFERENCE, even when it correlates with a particular version, has no bearing on the effectiveness of the compression algorithm used.

There was a study some years back in which it WAS proved that people who listen predominantly to mp3s will prefer that sound.

There was an article by Geoffrey Morrison, "The Kids Are Alright," last year in Sound & Vision, which introduced some relevant research.

G. Morrison wrote:

A few years ago a Stanford professor lit off the audio doomsday fetishists by claiming that informal tests of his students found that year over year, more preferred the compressed MP3 to that of CD. Despite his own claim this was an informal test, those who wanted to hear such information, accepted it as gospel.[...]Except, turns out, it’s all crap. At the Audio Engineering Society (AES) convention in April, Harman’s Dr. Sean Olive published a paper entitled “Some New Evidence that Teenagers and College Students May Prefer Accurate Sound Reproduction.”[...]58 students, high school and college, listened to four selections, each in CD and 128 kbps MP3. The tests were double blind. The students “expertise” ranged from none (art students) to some (recording arts majors). On average, the students picked CD over MP3 70% of the time. No student showed a preference for MP3, though a small minority showed a difficulty in telling the two apart (a nearly equal number picked CD every time, so...). Interestingly, or perhaps not, the more a student was interested in audio, the more likely they were to choose CD over MP3.[...]In a second round, the same students went through Harman’s speaker listening tests....The students listened to four speakers. As a group they preferred the most accurate speaker of the four, and disliked the least accurate. This falls in line with other tests Harman has done with both trained and untrained listeners.

Depending on the environment, preventing static even on a digital connection does make sense. As static is random, you will occasionally get minor voltage spikes in your signal; if the environment is noisy enough, this can cause errors. Now granted, they do have all sorts of error correction algorithms and such (though I'm not sure they implement this for audio equipment).

Anyway, the point is that those kinds of cables aren't useless, but the need for them is certainly exaggerated by the manufacturers.

I once saw an ad by Monster Cable that showed a noisy AC power supply in one picture, and then compared it to a perfectly flat DC supply in the next picture. Needless to say, I didn't buy their product...

The point there was that toslink cables send optical signals through glass. Visible light is not affected by external electrical interference.

>The ones I really like are the ones with gold-coated connectors on their digital cables.

That way each bit received (1 or 0) is of the highest quality signal.

Except it's not. People purchase high-quality digital cables, and then put *any* old bits into the cable! It's amazing what cruft accumulates in the cable if your original source of bits isn't properly sanitized.

Now, obviously the bits on some older CDs get kind of gunky over time, but a decent Bit Sanitizer placed between the source and the cable will keep the cable clean and also avoid bit cruft build-up in the speakers.

When using a Bit Sanitizer, it's important to change the filters regularly. A plugged filter can fail catastrophically, leading to a surge of nasty bits all hitting the tweeters at once.

I'd be very impressed if you could consistently actually tell which track is which in such a test. For example, I can hear that there's a difference but I cannot determine which is which.

A lot of it just comes down to ear training. For those who have experience in music production, especially, this isn't a difficult task at all. For example, I've ABX tested my laptop's internal soundcard with an external audio interface and been able to identify which was which - while playing the same file, mind you. It is easier to tell lossy vs. lossless formats than it is to compare DAC units, I've found.

I just have a friend play their files to me with me facing forward. But it really doesn't need that much concentration - I can be at an audiophile meet and point out which files in someone else's portable system are 320k - and I'm almost always right.

Well, I will stop the line of questioning because it's not really fair to you. However, I will point out that those aren't double blind tests. The administrator of the test must also not know what the answer is for it to be double-blind.
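For anyone who wants to run this properly, here is a minimal sketch of a self-administered ABX trial (Python; the file names are placeholders and play() is a stub for whatever playback you use). Because the computer picks X at random, no human in the room knows the answer until the trials are scored, which removes the administrator problem described above:

```python
import math
import random

def play(path):
    print(f"(playing {path})")       # stub: swap in real audio playback

def abx(file_a, file_b, trials=16):
    correct = 0
    for i in range(trials):
        x_is_a = random.random() < 0.5
        x = file_a if x_is_a else file_b
        play(file_a); play(file_b); play(x)
        guess = input(f"Trial {i + 1}: was X the A or the B file? [a/b] ").strip().lower()
        correct += (guess == "a") == x_is_a
    # One-sided binomial p-value: the chance of scoring this well by guessing.
    p = sum(math.comb(trials, k) for k in range(correct, trials + 1)) / 2**trials
    print(f"{correct}/{trials} correct; p = {p:.3f} under pure guessing")

abx("track_lossless.wav", "track_320k.mp3")   # placeholder file names
```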

The other interesting thing is not all 320kbps mp3 encoding is created equal...

He doesn't. He uses random play on an iPod without looking at the song. He doesn't even know what I'm hearing until I say 320k or lossless.

As for the encoding issue, there are differences, but the overall quality loss is the more obvious thing, most of the time.

Nearly all of that probably comes from the fact that most MP3 encoders will lowpass between 16 and 18 kHz, even if you think you can't hear above 15 kHz anymore. (Of course, older encoders did suck: Blade and FhG at 320 were often worse than current LAME at 128, and a lot of old stuff is still kicking around on various platforms in that format.)
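If you want to check that lowpass claim on your own files, here is a minimal sketch (Python with NumPy/SciPy; the file name is a placeholder, and it assumes you've already decoded the MP3 to WAV) that estimates where a file's spectral content stops:

```python
import numpy as np
from scipy.io import wavfile

fs, x = wavfile.read("decoded_from_mp3.wav")   # placeholder file
if x.ndim > 1:
    x = x.mean(axis=1)                         # fold stereo to mono

spectrum = np.abs(np.fft.rfft(x.astype(np.float64)))
freqs = np.fft.rfftfreq(x.size, 1 / fs)

peak_db = 20 * np.log10(spectrum.max())
# Highest frequency whose level is within 60 dB of the spectral peak.
levels_db = 20 * np.log10(spectrum + 1e-12)
cutoff = freqs[levels_db > peak_db - 60].max()
print(f"content extends to roughly {cutoff / 1000:.1f} kHz")
```

The exact cutoff depends on the encoder and bitrate, so treat the 60 dB threshold as a rough heuristic rather than a measurement of the encoder's filter.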

> I don't have a lot of time for audiophiles with gold-coated connectors

The ones I really like are the ones with gold-coated connectors on their digital cables.

That way each bit received (1 or 0) is of the highest quality signal.

Bu, bu, but . . . with digital signals, you don't need high quality 1's and 0's. As long as the receiving end of the signal can distinguish a "static-ey" 1 or 0, it has perfect information.

Gold connectors serve exactly the same purpose as the gold-plated fingers on PCI and other circuit cards: they prevent corrosion. A little corrosion on a connector changes its impedance. That's a problem for typical analog audio... if you have a few miles of cable, or very high power (speaker cables); otherwise, not so much. It's far more of a problem for high-speed signals, analog video, and pretty much all digital. Matched impedances actually matter.

That said, gold-plated contacts don't add significant cost. And they're only a good idea for a connector that's going to remain in place for years, like your PCI card; the gold wears off after a handful of plug/unplug cycles. Professional audio connectors are usually plated with some nickel alloy, which resists corrosion but also lasts through years of studio/stage use.

That's where I draw the line for audiophile vs. audiophool. If you're spending crazy money to do things never even considered during the recording and mastering of the music you intend to play, you are a phool indeed. Things like speaker cables, $10,000 for six feet, imbued with various magical properties yet still imparting "color" to the music (aka distortion), spoken of in terms that would embarrass a wine aficionado. Or $1,000 power cables... because after 100 ft of $0.50/foot Romex, that 3 ft power cord really can add "the flavor of a fine arabica coffee bean" to your listening pleasure, as well as protect you from a bevy of imagined audio closet monsters.

There is plenty of ground for ridicule. Don't start on the one thing that does actually have a purpose, albeit not the slightest immediate effect on actual sound.

The point there was that toslink cables send optical signals through glass. Visible light is not affected by external electrical interference.

FYI, the TOSlink (from Toshiba Link) system specifies plastic optical fiber, not glass, and 650 nm red LEDs, not lasers. This is the primary reason this interface is locked to stereo CD/DAT bitrates, and has a pretty high error rate in practice, particularly over longer cables.

Good, I don't have anything specifically against lossy audio; I merely choose lossless because I can tell the difference. I know everyone'll say "that's bullshit," but listen to a lossless track and an MP3 side by side with any pair of headphones, and you can easily hear differences. Long story short, if they were to come out with a perfect lossy compression algorithm where I couldn't hear the difference from lossless, I'd gladly switch.

I'm curious, have you tried doing a double-blind ABX test? You can look at HydrogenAudio to find those.

I'd be very impressed if you could consistently actually tell which track is which in such a test. For example, I can hear that there's a difference but I cannot determine which is which.

You can't tell which is which because you don't have the original sound to compare them against as a standard; otherwise it would be easy. So hearing a difference is sufficient: the lossless one is better!