Updates from the world's leading classical music label.

I’ve been putting together a blog post on the way to get the best sound out of MP3s, but there are so many elements to deal with, I thought I’d tackle it in pieces. For this post, I’m just going to talk about the missing frequencies in an MP3.

One of the ways we can fit more music into an MP3 is by discarding the least important information. High frequency sounds have a lot going on very quickly, and they can take up a lot of space, so there’s a lot to be gained from getting rid of them.1

Still. We don’t want data to be missing. If the range of human hearing is 20-20,000Hz, and everything above 16,000Hz is missing, that feels like a lot. It seems like that would be 20% of the music.

That’s not how frequencies work, though. Every time we go up an octave, the frequency doubles. Going up like this, numbers can get pretty big, pretty fast, and it makes the high frequencies look a lot more important than they really are. If you wanted to make a piano covering the entire range of human hearing2, you’d need to give it 120 keys instead of the normal 88. If, halfway through building it, you decided you only wanted it to go up to 10,000Hz, not 20,000Hz, you wouldn’t remove half the keys. You’d only remove 12 of them – seven white ones and five black ones.
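To check the piano arithmetic, here is a minimal sketch. It assumes equal temperament (twelve keys per octave, each octave doubling the frequency); the function name is made up for illustration.

```python
import math

def keys_between(low_hz, high_hz):
    """Number of equal-tempered semitones (piano keys) between two frequencies."""
    return 12 * math.log2(high_hz / low_hz)

full_range = keys_between(20, 20_000)         # the whole range of human hearing
lost_by_halving = keys_between(10_000, 20_000)  # stopping at 10,000Hz instead
lost_above_16k = keys_between(16_000, 20_000)   # stopping at 16,000Hz instead

print(round(full_range))         # 120
print(round(lost_by_halving))    # 12
print(round(lost_above_16k, 1))  # 3.9
```

Counted in keys rather than in Hz, cutting everything above 16,000Hz removes about four keys out of 120, which is closer to 3% of the keyboard than 20%.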

In any case, 20,000Hz is the highest anybody can hear, not the highest everybody can hear. Above that, your pets might notice, but you won’t. Our sensitivity to high frequencies deteriorates with age, so for most adults the ceiling is more like 16,000Hz. Your kids can probably hear things you can’t, and your pets can hear things your kids can’t.

If, like me, you’ve spent a lot of time around very loud music, your hearing might top out even lower. I can’t hear much above 13,000Hz.

Try it for yourself: this is a 30-second sweep across the full range of human hearing, from 20Hz to 20,000Hz. Hit the play button, and listen until it goes quiet: that’s as high as you can hear.3

[If you're reading this in a feed-reader, you might have to scroll to the bottom of the page or visit the site to see the player]

There’s an argument that, while these frequencies might be inaudible by themselves, they add character to other sounds in ways that are perceptible to our ears. If this were true, it would be relatively straightforward to prove it and, as far as I can see, nobody ever has. It also doesn’t stand up to common sense. Sounds simply don’t become more noticeable when there are other noises; indeed, the opposite is widely accepted.

So there you go: unless you’re a dog, you can test your hearing and pick an MP3 format that only excludes frequencies you can’t hear. There are, of course, other aspects of MP3 encoding that affect the quality of the sound. Next time, we’ll look at bit rates, fixed and variable, and the effect these have on the sound.

1The point I wanted to make here is way too nerdy for the first footnote.7

2Most notes produced by musical instruments are a combination of several related frequencies, overtones or harmonics. In the piano analogy, I’m only talking about the lowest (and loudest) of these frequencies, called the “fundamental”.

3This is a bit of fun, not a diagnostic tool. If you’re concerned about your hearing, you should see a professional. If you’re interested in playing around with acoustics, though, you should check out the tools at this site. The sound clip on this page is a linear sweep at constant amplitude (-3dBFS). If it seems to get louder and quieter over its range, that’s because your hearing is more sensitive to certain frequencies (normally around the range of the human voice). This clip is itself encoded as an MP3, but because it contains an extremely simple sound, it doesn’t need to filter out the high frequencies. The MP3 specification is quite flexible on encoding, but all decoders are essentially the same, so I can be confident that your computer will decode the same sound that I get from this file, regardless of the software used to play it back.
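For the curious, a sweep like the one described in this footnote could be generated roughly as follows. This is only a sketch, not the tool actually used for the clip: the 44.1kHz sample rate and all the names are assumptions.

```python
import math

SAMPLE_RATE = 44_100     # samples per second (assumed; the CD rate)
DURATION = 30.0          # seconds
F_START, F_END = 20.0, 20_000.0
PEAK = 10 ** (-3 / 20)   # -3dBFS as a fraction of full scale, about 0.708

def frequency_at(t):
    """Instantaneous frequency (Hz) of the linear sweep at time t seconds."""
    return F_START + (F_END - F_START) * t / DURATION

def sweep_sample(n):
    """Sample number n of the sweep, in the range -PEAK..PEAK."""
    t = n / SAMPLE_RATE
    rate = (F_END - F_START) / DURATION   # Hz gained per second
    # Phase is 2*pi times the integral of the instantaneous frequency.
    phase = 2 * math.pi * (F_START * t + 0.5 * rate * t * t)
    return PEAK * math.sin(phase)

first_second = [sweep_sample(n) for n in range(SAMPLE_RATE)]
print(round(frequency_at(15.0)))  # 10010: halfway through the clip
```

Because the sweep is linear, the moment the tone disappears maps straight onto a frequency: if it goes quiet for you at second t, frequency_at(t) is roughly where your hearing (or your equipment) gives up.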

4The fundamental frequency of the highest note on the piano is 4186.01Hz, but its overtones will extend upwards beyond the limit of human hearing. If you’re interested in this stuff, I’d recommend this video and, if you still want more detail, this one.

6Dogs can hear up to 60,000Hz, mice up to 90,000Hz and bats up to 120,000Hz.

7Ok. You’ve been warned. An MP3 describes a complex sound wave in terms of lots of little bits of a sine wave: “At this point, the wave goes up with a bump this tall and this long”. If you want to lose the rest of the day in articles about mathematics on Wikipedia, then it might help you to know that this is called a Fourier Series. The reason I bring all this up is because if you’re encoding music this way, the high frequencies take up a lot of space: at 20Hz, there are 20 wobbles in the line to describe each second of music. At 20,000Hz, there are 20,000 of them. By getting rid of a small number of high frequencies, you can get rid of a very large amount of data. The trick is to find the frequencies you won’t miss.
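If you would rather see the Fourier idea in action than read about it, here is a small sketch. It is not how an MP3 encoder works internally; it only illustrates the underlying principle of building a shape out of sine waves, using a square wave as the example. All the names are invented for illustration.

```python
import math

def square_wave(t):
    """Ideal square wave with period 1, taking the values +1 and -1."""
    return 1.0 if (t % 1.0) < 0.5 else -1.0

def fourier_partial_sum(t, n_terms):
    """Sum of the first n_terms odd harmonics of the square wave's Fourier series."""
    return sum(4 / (math.pi * k) * math.sin(2 * math.pi * k * t)
               for k in range(1, 2 * n_terms, 2))

def rms_error(n_terms, points=1000):
    """RMS difference between the square wave and its partial sum over one period."""
    total = 0.0
    for i in range(points):
        t = (i + 0.5) / points   # midpoints, so we never land on a jump
        total += (square_wave(t) - fourier_partial_sum(t, n_terms)) ** 2
    return math.sqrt(total / points)

for n in (1, 3, 10):
    print(n, round(rms_error(n), 3))
```

The error shrinks as more sine waves are added, but each extra one is at a higher frequency, which is where the data cost described above comes from.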

Thank you for this. I’ve been very curious about how MP3 is done, and you’ve given me a start. I’ve looked at Wikipedia, whose article describes in detail how it’s done mathematically, but doesn’t reveal much, to me, about how those processes affect the actual sound wave.

This will be covered in more detail in the next post, but it is difficult to explain. My goal here is to reassure you that compression does not have to destroy your recordings, any more than JPEGs destroy your photographs. To do this, I think it’s important to look at the range of options and to illustrate how it is possible to reduce the size of the file without affecting perceived quality, but it isn’t my aim to offer a complete account of the technical process of compression.

High-end hearing varies by gender, as well as age. Women in general can hear higher tones than men. I think it might have something to do with the length and mass of the bones in the middle ear – smaller better for higher, bigger better for lower?? While it may be true that few adults can hear anything over 16k, most adult _males_ top off somewhere around 12k to 13k. So FWIW, I doubt whatever proximity to loud sounds you had cost you any more hearing range than the life experience of any city dweller.

Perceived sound quality is less about how high the harmonics go, and more about what happens up there in terms of tonal balance and harmonic distortion. The latter is a widely misunderstood term: it’s not like ‘fuzztone’ or other kinds of distortion we might associate with rock guitar. ‘Harmonic Distortion’ means the reproduction creates overtones at frequencies or levels that did not exist in the source. These overtones are way up at the edge of our hearing so we don’t perceive them as sounds-in-themselves, but as subtle differences in the character of the sound. The musical term ‘timbre’ is largely (but not exclusively) about differing overtone structures. So harmonic distortion affects ‘timbre’. And it’s not strictly a numbers game where less is better. Tube amps, for example, produce a LOT more harmonic distortion than solid state amps, but they tend to add overtones the ear finds acceptable if not pleasing, while solid state devices tend to add overtones at frequencies we experience as harsh. And the distortion produced by electronics is small compared to the distortion produced by transducers (e.g. speakers or headphones) which is more often than not also of the harsh variety.

Now, I have no idea how any of this is affected by codec algorithms, but any high efficiency lossy codec like MP3 isn’t just making a digital sample of what was there in the analog signal. It’s making choices about what to keep and what to toss out, and I would guess this is not just a matter of topping off the high end, but could also possibly alter those harmonic structures????

Thanks Dave. These are all excellent points, some of which will be covered in the next post.

The way these codecs work is to describe the waveform in terms of lots of little bits of sine waves of different amplitudes and frequencies. It’s only ever an approximation, but given enough data, it can be a very good one.

Where it always struggles is with sounds that aren’t obviously “musical” – applause and very percussive sounds present particular problems, because they consist of sharp impulses rather than continuous waves.

The idea of describing the shape of a line with an equation instead of a long list of X/Y co-ordinates isn’t inherently inferior (indeed it has some powerful advantages) but its weaknesses are complex and less predictable.

Because of this, because of the many variables involved, and because of the constant developments in encoding algorithms, it seems the sensible approach is to judge its performance on a fair assessment of the quality of the output, rather than on an a priori supposition of what ought to work.

Frequency response alone isn’t a good way to pick a bit rate or file format. I offer the “how high can you hear test” as reassurance, not in an attempt to convince anybody to change their behaviour. Such a test can cast serious doubt on the idea that the absence of a particular frequency is deleterious to your experience of recorded music, but it certainly can’t prove the adequacy of a file format.

Hi Andy
I tried this test on headphones. And of course it does depend on how loud you have it and the frequency response of the phones or speakers.
The response of the ear is amplitude dependent. It is more flat at higher levels of sound. This is why music sounds “better” loud because the ear becomes more responsive to the bass and high treble frequencies. This is why a decent hifi amp will have a “loudness” button which will boost those frequencies when listening at low levels to make the music sound more “linear”.

I listened quite loud and I can detect what seem like “aliases” in the sound. That is other lower frequencies that have been created by the digitisation of the signal. Of course what is being sent over the internet in your test is a digital sound signal.

I cut out at about 11-12 kHz but have to have the volume high for this. Many years ago I stopped hearing tape hiss and listened to cassettes with the Dolby off. Still do. ( “What’s a cassette?” Young person) Moi, born in 1947.

What is really annoying is the way that digital rock music stations which use lower sampling rates and high compression make the music sound raw and distorted at the top end.

When it comes down to it, if you enjoy the music, that is what is important. When I was younger I used to say never mind the music listen to the reproduction. Now I just enjoy the music. I no longer have a golden ear.

You’re right, of course, that it’s easier to hear these things when they’re louder, and that the ear’s frequency response is not the same at low volumes as at high ones.

When we digitise sound, though, we can only record up to a limited amplitude (described as 0dB full scale, or 0dBFS), and that maximum amplitude is the same for every frequency. If you turn it up until the low notes are as loud as you can bear, and the high stuff still isn’t audible, then it is going to be very difficult to construct a musical recording in which these frequencies are audible.

As for the lower frequencies you attribute to aliasing, I think they’re probably created during playback, somewhere in your hardware. I suggest this because:

1) There’s considerable variation in MP3 encoders (I used the Fraunhofer encoder, the LAME encoder is also popular), but the MP3 specification doesn’t leave much room for innovation in the way the encoding is turned back into sound, so everybody should get the same PCM waveform out of the file, regardless of the software they use to decode it. What varies is the way that PCM stream is turned into sound, and that happens in your sound card.

2) I was concerned that such problems might arise, so I looked for them: I’m effectively using an MP3 to show you the things some MP3s can’t contain, so I needed to be sure this MP3 did contain them. To check, I converted the MP3 back into PCM audio, reversed the polarity*, mixed it with the original source file, and checked that they cancelled out. I also visually inspected the waveform to ensure that it was the same shape and without the sort of obvious harmonic distortion that would create additional audible frequencies.

3) We’d expect everybody to hear the same thing – at least, we’d expect the people who heard all the way to the top to hear the lower frequencies.

4) Aliasing affects frequencies above half the sample rate. The sample rate here is 44.1kHz (the same as for a CD), so that limit is 22.05kHz and we should be good up to 20kHz. You certainly shouldn’t experience any aliasing if your hearing cuts out before 15kHz. Since aliasing is a product of the sample rate, it would happen with lossless files too. Here’s the WAV file of the 20-20k sweep.
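To put numbers on point 4, here is a small sketch of where a pure tone ends up after sampling. It assumes an idealised sampler with no anti-aliasing filter in front of it, and the function name is made up for illustration.

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which a pure tone of f_hz appears after sampling."""
    nyquist = sample_rate_hz / 2
    folded = f_hz % sample_rate_hz
    # Anything above half the sample rate folds back below it.
    return sample_rate_hz - folded if folded > nyquist else folded

print(alias_frequency(15_000, 44_100))  # 15000: unchanged
print(alias_frequency(20_000, 44_100))  # 20000: unchanged
print(alias_frequency(30_000, 44_100))  # 14100: folded down into the audible range
```

Nothing in a 20Hz to 20,000Hz sweep goes above 22,050Hz, so at this sample rate there is nothing in the file to fold back.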

You might also try burning this to CD and playing it back on something else. That would tell you if what you’re hearing is a product of the file or your DAC.
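The polarity check described in point 2 can be sketched in a few lines. This is an illustration rather than the exact procedure I used: it assumes both signals are already decoded to floating-point samples between -1 and 1, are the same length, and are lined up in time. The names are invented.

```python
import math

def null_test_residual_db(original, decoded):
    """Peak level, in dB relative to full scale, left after mixing the
    original with a polarity-inverted copy of the decoded signal."""
    residual = [a + (-b) for a, b in zip(original, decoded)]
    peak = max(abs(x) for x in residual)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)

# A 440Hz test tone standing in for the source file (an assumption).
original = [0.5 * math.sin(2 * math.pi * 440 * n / 44_100) for n in range(4410)]
identical = list(original)
slightly_off = [x + 0.001 for x in original]

print(null_test_residual_db(original, identical))            # -inf: perfect cancellation
print(round(null_test_residual_db(original, slightly_off)))  # -60
```

The lower the number, the less the decoded file differs from the source.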

Finally, let’s talk about compression in radio. I’m guessing you know this, but there’s two kinds: dynamic compression (where you make the loud bits quieter, normally so you can make everything else louder) and then there’s lossy compression, where you make the files smaller to save space.

These two different kinds of compression work in totally different ways to achieve totally different effects.

Radio stations often apply dynamic compression to everything they broadcast. The idea is to make all the music (and talking) audible, even if you’re in a noisy environment, like a car. If it’s done right, this is helpful to most listeners, at the expense of a slightly claustrophobic feel to the sound for those listening in very quiet places. Done wrong, it can make all the peaks sound crunchy and unnatural, create sudden quiet moments after each loud peak, or make everything sound boring. Radio engineers would do well to remember that most modern popular music has already been compressed and limited to the point where there’s not much mileage left in this technique.
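As a rough illustration of dynamic compression, here is the static gain curve of a very simple compressor. The threshold, ratio and make-up gain are arbitrary assumptions, and a real broadcast processor also has attack and release times and usually several frequency bands.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0, makeup_db=0.0):
    """Output level in dB for a given input level in dB."""
    if level_db > threshold_db:
        # Above the threshold, every `ratio` dB in gives only 1 dB out.
        level_db = threshold_db + (level_db - threshold_db) / ratio
    # Make-up gain then lifts everything, quiet passages included.
    return level_db + makeup_db

loud, quiet = 0.0, -30.0
print(compress_db(loud), compress_db(quiet))                                # -15.0 -30.0
print(compress_db(loud, makeup_db=15.0), compress_db(quiet, makeup_db=15.0))  # 0.0 -15.0
```

In this example the loud and quiet passages start 30dB apart and end up 15dB apart, with the quiet passage now 15dB louder than it was.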

Digital radio stations (and this is particularly noticeable with American satellite radio stations) use very low bit rates to fit many channels (or many listeners) in a small amount of bandwidth. If it sounds bad, it’s because they’ve done too much of it.

There’s a problem with such tests on computers, given the lousy equipment through which most people will hear this. On my laptop (i.e. sound card -> internal speakers), the wave isn’t accurately replicated at all: At 23 seconds it grows a little dim, but then louder again and then it wobbles up and down at around 16k (which is loud and annoying… like the frequency of a TV on mute.) It doesn’t ever reach 20k, and even if it did, internal computer speakers that cut out everything above 18k wouldn’t reproduce it.

When taking the sound through an external DAC, the wave is fine. Well, presumably it is… at least the progression is smooth and after 27 seconds I just assume it continues.

As a kid I could hear 20k, when my uncle brought out a test-vinyl. But with age, the top frequencies get lost and I’m now down to 18, more likely 17k.

The sensible thing to do for any audio test is to use whatever you’re going to use to listen to the music in a real situation. If you do all your listening on your laptop, you’re not going to miss frequencies it can’t reproduce. If you used your laptop to pick a compression codec and then switched to your DAC, you could be quite disappointed.

Conversely, if you went to a super-quiet acoustically-controlled environment like a mastering studio to pick a codec (or a piece of equipment), you might be taking home rather more firepower than was useful. People would do well to remember this when they’re sitting in the demonstration room in a hi-fi shop.

I don’t think anyone here has mentioned this.
The two most important points:
1. At 44.1kHz, a 20kHz sine wave gets only about two samples per cycle. To a computer this defines a square wave; thus, describing a sine wave with this little information creates (for argument’s sake) the greatest frequency distortion. This effect is relevant way down the spectrum, in regions you CAN hear.
2. Most all analog amplifier designs require extended frequency input to produce lower distortion output.

I can’t hear a 20kHz sine wave or a 20kHz square wave so I can’t check what these sound like.

I will say, though, that although the PCM data can only describe one data point for each of the extremes of the wave and so records no information about the shape of that wave, the DAC won’t reproduce it as a simple square wave, because it does a whole bunch of filtering on the output signal.
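To illustrate why roughly two samples per cycle need not come out as a square wave, here is a sketch of idealised reconstruction using sinc interpolation. Real DACs use practical approximations of this filtering, so treat it as a model rather than a description of any particular hardware; the names and the window size are assumptions.

```python
import math

FS = 44_100.0   # sample rate in Hz
F = 20_000.0    # tone frequency in Hz, just over two samples per cycle

def sample(n):
    """The n-th stored sample of the 20kHz sine wave."""
    return math.sin(2 * math.pi * F * n / FS)

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(t_samples, half_width=2000):
    """Estimate the signal at a (possibly fractional) sample position
    from the stored samples on either side of it."""
    centre = round(t_samples)
    return sum(sample(n) * sinc(t_samples - n)
               for n in range(centre - half_width, centre + half_width + 1))

# Compare the reconstruction with the true sine wave halfway between two samples.
position = 100.5
true_value = math.sin(2 * math.pi * F * position / FS)
print(abs(reconstruct(position) - true_value) < 0.02)  # True
```

The value recovered between the samples sits on the original sine wave, give or take the small error from using a finite window, rather than on the flat top of a square wave.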

Just for kicks, I saved the original sound file (at 256kbps) and imported it into iTunes. I then re-encoded the sample to various lesser AAC bitrates: 128kbps, 64k, and 32k.

When I run the test at 256k I top out between 17 and 18 seconds. (I play through decent headphones, close my eyes, and hit ‘Pause’ when the sound disappears.) The same holds for 128k. At 64k the sound disappears (for me) around 15 seconds, and at 32k around 8 seconds.

I also re-encoded the 64k and 32k files with Variable Bit Rate enabled, and the high end warbles just like it’s coming out of a steaming water kettle whistle. Interestingly, even though the top end of those samples lack fidelity to the true wave form, I can still hear sound up until 17 seconds again.

Most compression algorithms handle sine waves a bit strangely, since these are the building blocks from which they construct sound, so while this is indeed an interesting exercise, it doesn’t tell us much about the way they’d handle music. A sine wave of constant pitch should be just about the easiest thing to encode because every single vibration is the same. The gradual increase in pitch in this file may create big problems for some algorithms, because every single vibration is slightly different. Since this is a noise that rarely occurs in music, it’s unlikely the algorithms will be optimised to support it, especially at low bit rates.

Still, it’s comforting to note that, at 256kbps, it comes out very well indeed.

I then opened each of the sound files in an editing program (Sound Studio) and noticed two things: 1) the wave forms for the lower bit-rate files actually cut off at the times I described earlier, and 2) the VBR files no longer produce the warbling tone that I hear when playing in iTunes. I’m not sure where that’s coming from.

Also of interest – the wave forms in the ‘pure’ 128k, 64k, and 32k files taper off ‘gracefully’ at their respective top-ends, whereas the cutoff of the VBR files is much more abrupt. I suspect that reflects the encoding algorithm in iTunes.


Naxos, the world’s leading classical music label, is known for recording exciting new repertoire with exceptional talent. The label has one of the largest and fastest growing catalogues of unduplicated repertoire available anywhere with state-of-the-art sound and consumer-friendly prices. The catalogue includes classical music CDs and DVDs as well as other genres such as jazz, new age, educational and audiobooks.