That Little Bit of Knowledge That Makes Everything Work

I think everyone knows that lossy audio compression formats like MP3 and AAC sacrifice audio quality for a smaller file size. Some people, myself included, can distinguish between lossless and lossy copies of the same song just by ear, but most cannot, or simply don’t know what to listen for.

You can easily take a diff of two text files and see the differences, so why can’t you do that with audio?

For my experiment, I began with Jillette Johnson’s song “Torpedo” stored as lossless 16-bit/44.1kHz CD audio in the FLAC format. For copyright reasons I won’t provide a download link to the file, but the Internet is a thing, so you can find it if you want.
I began by transcoding this original into several different formats with varying parameters:

320kbps CBR MP3 – 10.9MB

256kbps CBR MP3 – 9.1MB

256kbps VBR MP3 – 8.4MB

128kbps CBR MP3 – 5.6MB

256kbps CBR AAC – 5.4MB

500kbps VBR OGG – 15.5MB

256kbps VBR OGG – 9.6MB

CBR stands for constant bitrate, and VBR is variable bitrate.

I took one channel from each converted file and from the original FLAC file and aligned them to the sample. I then took the difference of the two waveforms. This resulting difference is the error induced by the compression algorithm. I saved three data points from each comparison: the waveform as an image, the frequency-versus-amplitude spectrum plot, and a rendered audio file of the error.

The waveforms plot the relative amplitude at each moment in time, from 0 to 1. The frequency spectra show the average amplitude in decibels (dB) of each frequency from 0Hz to 22,050Hz (the maximum of CD audio) over the entire signal. 0dB is the highest amplitude that can be stored in the audio file; all amplitudes are taken relative to this maximum and expressed in negative decibels, so the more negative the amplitude, the quieter it is.

I have chosen to render the error signals as uncompressed WAV files in order to represent them accurately in their entirety and still allow them to be played in a web browser (most don’t have FLAC support yet). I have not amplified any of the error signals, to keep everything as accurate as possible. If you intend to listen along, try seeking the track to the times listed along the top of the waveform images to hear how different parts of the song sound, like the chorus at 1:40. Also note that I’m not really interested in the exact error-versus-file-size figures, so feel free to work them out on your own.
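A rough sketch of how such a measurement could be done in code, assuming both files have already been decoded to aligned sample arrays (synthetic signals stand in for the real audio here, and the function names are my own illustrative choices, not tools used for the article):

```python
# Sketch of the measurement: subtract the aligned waveforms to get the
# error signal, then look at its amplitude in dB relative to full scale.
import numpy as np

RATE = 44100  # CD sample rate; Nyquist limit is 22,050 Hz

def error_signal(original, transcoded):
    """Sample-wise difference between two aligned waveforms."""
    n = min(len(original), len(transcoded))
    return original[:n] - transcoded[:n]

def average_spectrum_db(signal):
    """Average amplitude per frequency bin, in dB relative to full scale (0 dB)."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal) * 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / RATE)
    with np.errstate(divide="ignore"):
        return freqs, 20 * np.log10(spectrum)

# Stand-in data: a 1 kHz tone and a "lossy" copy with a little added noise.
t = np.arange(RATE) / RATE
original = 0.5 * np.sin(2 * np.pi * 1000 * t)
transcoded = original + 1e-3 * np.random.default_rng(0).standard_normal(RATE)

err = error_signal(original, transcoded)
freqs, db = average_spectrum_db(err)
print(f"peak error amplitude: {20 * np.log10(np.max(np.abs(err))):.1f} dBFS")
```

With real files you would decode both to raw samples first (e.g. with an external decoder) and feed those arrays in instead of the synthetic tone.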

Just a disclaimer: I am not an audio engineer, and these tests are in no way scientifically rigorous. This is just a home-grown experiment.

Let’s take a look at the 320kbps CBR MP3 first. You can click any image to enlarge it in another tab.

As you can see (and hear), the error is clearly there. If you know the song, you can probably even make out lyrics. The first thing to note is that the highest-amplitude error is in the high frequencies, specifically above 16kHz. This is not surprising, because this is how MP3 was designed. MP3 takes into account how we perceive sound: frequencies in the range of the human voice, for example, are most important, while high-frequency content, which for some people is not audible at all, is least important. The highest error amplitude peaks at -53dB, with the average around -61dB.

In order to make some comparisons, let’s take a look at 256kbps CBR MP3.

Notice that the highest peak is at -52dB, not much higher than the 320kbps CBR file; however, the error at the other frequencies is significantly higher, by about 3 to 4dB. Remember that dB is a logarithmic scale, so an increase of 3dB roughly doubles the power, and 10dB is ten times the power.
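As a quick sanity check on the decibel arithmetic (nothing here is specific to this experiment; the helper names are just illustrative):

```python
# Decibels compare two power levels: dB = 10 * log10(P1 / P0).
# For amplitudes the factor is 20, because power scales with amplitude squared.
def db_to_power_ratio(db):
    return 10 ** (db / 10)

def db_to_amplitude_ratio(db):
    return 10 ** (db / 20)

print(db_to_power_ratio(3))      # ~2.0: +3 dB roughly doubles the power
print(db_to_power_ratio(10))     # exactly 10x the power
print(db_to_amplitude_ratio(6))  # ~2.0: +6 dB doubles the amplitude
```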

I have heard that VBR is “better” than CBR, so I decided to test that as well.

Going from 256kbps CBR to VBR, the spectrum is very similar except in the high frequencies. The peak that was very prominent at 320kbps CBR, and slightly less so at 256kbps CBR, is now actually lower in amplitude than the lower frequencies. At a glance it looks like the error has decreased, but look at the scale: the overall error has actually increased by 1 to 2dB. Given that the file is 700kB smaller, it’s debatable whether VBR is “better”.

Just to round out this part of the experiment, I tested 128kbps CBR MP3 as well.

The average error is a whopping 10dB higher than the 256kbps CBR file’s, and it falls off almost linearly as the frequency increases. This is consistent with the gradual drop-off of average amplitude in the original file, so statistically it is expected. At this point the song is becoming scarily audible; making out lyrics is not difficult at all. Just remember that what you hear in the error signals is what is missing from the transcoded file.
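One way to see that the error signal is literally the missing content: by construction, adding it back to the transcoded samples reconstructs the original exactly (made-up sample values here, purely for illustration):

```python
# The error is defined as original - transcoded, so transcoded + error
# recovers the original sample-for-sample.
import numpy as np

original = np.array([0.10, -0.25, 0.40, 0.05])
transcoded = np.array([0.09, -0.24, 0.38, 0.06])  # made-up "lossy" samples
error = original - transcoded

reconstructed = transcoded + error
print(np.allclose(reconstructed, original))  # True
```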

Who says that MP3 is the only format used for lossy compression? I thought I’d move on to the lesser-known OGG Vorbis audio codec. The following is the result of transcoding at 256kbps VBR.

This is quite different from what we saw with the MP3 spectrum, and it sounds very different too. The OGG codec has higher error at low frequencies, which is easier to hear than see; however, the overall average error across the rest of the spectrum is about 2dB less than MP3’s. Being 1.2MB larger than the 256kbps VBR MP3, this is to be expected.

Unlike MP3, OGG is not limited to 320kbps. I ran the test at a bitrate of 500kbps VBR.

The file size is 5.9MB larger than the 256kbps VBR OGG file, but the error is a massive 15dB lower on average. Surprisingly, the error at the low end is still worse than that of 320kbps CBR MP3. Also note that just because the spectrum plot bottoms out around 17kHz does not mean there is no error there; it is just at -82dB, which is near the edge of the zoomed-in area.

The last file type I compared is 256kbps CBR AAC, the same format Apple uses for most iTunes downloads.

I was kind of horrified at how poor the output was. The error is only about 2dB less than that of the 128kbps CBR MP3 file. The only saving grace for AAC is the resulting file size of 5.4MB, 200kB less than the 128kbps CBR MP3. The error also sounds very different from the other formats.

A common scenario is the conversion of one lossy format to another. In my library, all files are either FLAC or MP3, so when purchasing music from iTunes, for example, it would make sense for me to transcode the files to 320kbps CBR MP3.

The error doesn’t seem that bad; however, remember that this is only the error from the second transcode, and it compounds the error from the original transcode. The following is a diff taken from the original FLAC to the new MP3.

The total error is almost indistinguishable from the original AAC’s. This is not surprising, as the relative amplitude of the new error introduced by the AAC-to-MP3 conversion is several orders of magnitude lower than the original error in the AAC file.
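A toy illustration of why the second transcode barely moves the total: uncorrelated noise powers add, so a much quieter second error changes the combined level by only a tiny fraction of a dB. The -55dB and -80dB levels below are made-up stand-ins, not measurements from the article:

```python
# Combine two synthetic error signals at very different levels and check
# that the louder one dominates the total.
import numpy as np

rng = np.random.default_rng(1)
n = 44100

def noise_at_db(level_db, n):
    """White noise normalized to an RMS level of level_db dBFS."""
    x = rng.standard_normal(n)
    return x / np.sqrt(np.mean(x**2)) * 10 ** (level_db / 20)

def rms_db(x):
    return 20 * np.log10(np.sqrt(np.mean(x**2)))

first_error = noise_at_db(-55.0, n)   # error from the first lossy encode
second_error = noise_at_db(-80.0, n)  # extra error from the re-encode
combined = first_error + second_error

print(f"combined error: {rms_db(combined):.2f} dBFS")  # stays near -55 dB
```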

In a future post I’ll explain errors in audio reproduction more mathematically, including quantization error, bit depth, sample rate, and noise floors. Until then, I hope this was interesting and provided a little more insight into audio compression quality.

8 Comments on “Experimental Differences in Audio Compression Formats”

I’m really glad you wrote this. I’m subbed to both of your channels and really like your content and knowledge. Can you please tell me how you went about diffing the audio files and creating the “error difference audio”? It looks like Audacity, but I’d love to know exactly how you did it. Also, you say you’ll talk about bit depth, quantization, noise floors, etc. — when will you do this? I really would like to see more audio talk from you, either a podcast or a video explaining digital audio in depth and the common misconception that lossy is “transparent” compared to lossless.

This is how I made the diffs.
Import both tracks into a project. Zoom in until you can see individual samples. Error will manifest as changes in sample amplitude, not in the overall waveform shape. If you are looking at a sharp peak in amplitude, it should be pretty obvious whether the waves are in sync; you can drag them to fix any offset imposed by a crappy transcoder. Make sure you are comparing left to left or right to right channels on a stereo track. Then select one track and run Effect->Invert, then Tracks->Mix and Render. You should be left with the difference of the two waveforms. You basically have to do a + (-b), since Audacity can’t do subtraction directly.
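For anyone who prefers code to clicking, the same invert-and-mix subtraction could be sketched roughly like this, assuming both channels are already decoded to sample arrays. The brute-force alignment search and all the names here are illustrative, not what was actually used:

```python
# a + (-b) is just subtraction; the only real work is finding the sample
# offset that a sloppy transcoder may have introduced.
import numpy as np

def align_offset(a, b, max_shift=2048):
    """Find the shift of b relative to a that best lines the samples up."""
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        if shift >= 0:
            score = np.dot(a[shift:shift + 4096], b[:4096])
        else:
            score = np.dot(a[:4096], b[-shift:-shift + 4096])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift

def diff_tracks(a, b):
    """Subtract b from a after aligning: the Audacity invert+mix result."""
    shift = align_offset(a, b)
    if shift >= 0:
        a = a[shift:]
    else:
        b = b[-shift:]
    n = min(len(a), len(b))
    return a[:n] - b[:n]

# Stand-in data: b is a 57-sample-delayed copy of a with slight "encoder" noise.
rng = np.random.default_rng(2)
a = rng.standard_normal(20000) * 0.3
b = np.concatenate([np.zeros(57), a]) + rng.standard_normal(20057) * 1e-4

residual = diff_tracks(a, b)
print(f"peak residual: {np.max(np.abs(residual)):.5f}")  # down at the noise floor
```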

I watched one of your videos last night about digital audio and liked the way you explained it. I agree with most of your conclusions and laughed at the word “audiofools”.

Let’s talk about lossy compression. Compression algorithms like MP3 or OGG exploit the effect that some signals mask other signals, so you are not able to hear the masked signal. If the encoder finds such a signal that cannot be heard, it deletes it and thereby saves information.

To make this work, the listener has to fit the psychoacoustic model of the algorithm, and the masking signal has to be reproduced properly by the playback equipment.

There was a test by the well-known German computer magazine c’t around the year 2000, where they did a blind test with compressed and uncompressed audio. In short, they found that “normal listeners” with normally working ears could not distinguish MP3-encoded files at bitrates above 192kbit/s from uncompressed files. Even professional audio engineers and, of course, so-called golden ears could not distinguish the files. Funny enough, most found the 128kbit/s version “the best”.

The point is: there was one person who could distinguish the MP3-compressed files from the uncompressed ones. It later came out that this person had hearing damage, and therefore the psychoacoustic model didn’t work for him. And this can be explained: if you are not able to hear the masking signal properly, you can of course hear the gap and the artifacts of the deleted masked signal, because it is not masked any more.

The same is true for “bad playback equipment” that cannot reproduce the masking signal properly.

So: the better the audio equipment, and the better the hearing, the harder it is to detect the compressed audio, because the psychoacoustic model then works very well.

This leads me back to your video. You said you could detect MP3-compressed files in a blind test several times, even at high bitrates. Earlier in your video you said you can hear up to 15kHz with one ear and up to 14kHz with the other. I don’t know exactly how old you are, but I guess you are a couple of years younger than me (I am 47), and I can still hear up to something between 15 and 16kHz, which is neither good nor bad. So I wonder about the 14 and 15kHz you can hear at your age. As you already suspect, I guess you have some kind of hearing damage that “helps” you detect the compression algorithm.

This does not change your recommendation to use lossless compression, but it then comes from a different perspective.

Hi there.
I saw your amazing video on the ground up explanation on digital audio.
Being an engineer myself, it struck a chord.
I loved it.
I am a digital audiophile myself. I have my Genelec 8250A speakers that I love.

I wanted to know: if I downgrade all my hi-res files from 24/96, 24/192, 24/88.2, or 24/48 to 16/44.1 FLAC, will there be any loss in perceivable quality? Or will mathematical errors be induced?
I plan to do this to all my files to save a huge amount of space.

I’m from the town where they make the Genelec speakers and I’ve had the joy of listening to different kinds of high end audio setups when visiting Genelec and friends who work there. It always blew me away.

However since the equipment is not exactly cheap it took me a long time to invest in it myself. After getting my first set of decent speakers and DAC I stumbled across your youtube video about audiophiles and audiofools and ended up listening to your diffs.

Fucking hell, this is an awesome way of describing the difference between formats. For me the last 1½ hours were an awesome intro to the fundamentals of digital audio and why I should keep investing in high-quality audio files (in retrospect I’m glad I’ve done that for quite a while now, regardless of my audio setup being shit).

AAC cuts the peaks more; on the other hand, MP3 has those rumbling, bubbling errors that I hate when they happen. Not all the time, but in some songs they sound just terrible.

For listening to music on Windows I use WASAPI exclusive mode; for movies I bit-stream to the receiver and decode there (Windows sound mixing isn’t the greatest). For YouTube it doesn’t matter, but for FLAC music and proper movie soundtracks the difference is enough to bother.

I mostly agree with you about digital audio quality, but not 100%: on paper it works; in real life, not so much. The reason to go to a higher bit rate is not purely digital. When you worked on these files, you did everything in a purely digital environment: converted them, then compared the files. I think the proper way to compare them would be to output them to an analog signal, import that back as digital, and only then compare. Analog circuitry makes some errors too, so outputting a higher-quality signal gets you closer to the desired output. Not by much, but there are some very slight differences. Of course, going over 24-bit/96kHz is overkill, but in my experience differences can be heard going lower (not in every song, just in very specific parts of specific songs, granted you know what to listen for). For everyday use, CD quality is enough.

P.S.: Most people don’t care if they hear the details in music.

I don’t really like Vorbis, and you quantify very well what I already kind of suspected. But I do really like Opus, which is basically what you should use in an OGG container instead of Vorbis nowadays. Could you try Opus too?

I must admit I can hear a difference between lossless and lossy bitrates myself, but I really have to concentrate on the sound for that.

I mean, on an average listen, with moderate ambient sound, I can’t feel the difference, even for ugly 128kbps.

But isolated from ambient noise, with good headphones and a good sound card, yes, it can be heard and felt, especially in the high frequencies. Sound gets poorer as bitrate decreases, and the word means just that: you get something poor, not enjoyable; your hearing system can take in far more frequencies than that.

I would conclude it all depends on your needs. If I listen to music without concentrating, just to relax while walking or travelling, 192kbps is pretty nice for me, or even just 128kbps if space requires it.
But if I isolate myself, push the sound through good or average hardware, and want to soak in the sound reaching my ears, I would obviously use FLAC ^^