Lossy Compression: the Sonic Dangers

Editor's Introduction: In 2013, lossy compression is everywhere. Without lossy codecs like MP3, Dolby Digital, DTS, A2DP, AAC, apt-X, and Ogg Vorbis, there would be no Web audio services like Spotify or Pandora, no multichannel soundtracks on DVD, no Bluetooth audio, no DAB and HD Radio, no Sirius/XM, and no iTunes, to quote the commercial successes; and no Napster, MiniDisc, or DCC, to quote the failures. Despite their potential for damage to the music, the convenience and sometimes drastic reduction in audio file size have made lossy codecs ubiquitous in the 21st century. Stereophile covered the development of lossy compression; following is an article from more than two decades ago warning of the sonic dangers.
-Editor

"Low bit-rate" coding, Peter W. Mitchell
"Low bit-rate" coding is the accepted new jargon for digital bit-rate compression schemes that encode audio with many fewer bits than the 1.4 million bits/second used in the CD (footnote 1). Suddenly these schemes are being discussed and evaluated everywhere. Consumer digital recording systems will use moderate bit rates (384 kilobits/second for the Philips DCC and 300kb/s for the Sony MiniDisc), reductions of roughly 4:1 and 5:1 from the CD bit rate. Proposed digital audio broadcasting systems will use a bit rate of 192 or 256kb/s for stereo. Other applications may use still lower bit rates, implying more drastic compression schemes and more audible compromises. You may already be listening to some of these. For example, FM stations, which traditionally have used equalized telephone lines to carry the audio signal from the studio to a transmitter several miles away, are changing over to digitally encoded studio-transmitter links.
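The reduction ratios quoted above can be checked with a few lines of arithmetic. The sketch below assumes the standard CD parameters (44.1kHz sampling, 16 bits per sample, two channels), which the article rounds to "1.4 million"; the function name `compression_ratio` is illustrative and not from the article.

```python
# CD parameters (assumed here, not stated in the article):
# 44.1 kHz sampling, 16 bits per sample, 2 channels.
CD_BIT_RATE = 44_100 * 16 * 2  # bits per second


def compression_ratio(coded_kbps: float) -> float:
    """Return how many times smaller a coded stream is than the CD stream."""
    return CD_BIT_RATE / (coded_kbps * 1000)


if __name__ == "__main__":
    # Bit rates as quoted in the article.
    systems = [
        ("Philips DCC", 384),
        ("Sony MiniDisc", 300),
        ("Digital broadcasting (higher rate)", 256),
        ("Digital broadcasting (lower rate)", 192),
    ]
    for name, kbps in systems:
        print(f"{name}: {compression_ratio(kbps):.2f}:1")
```

On these assumptions the DCC and MiniDisc figures work out to a little under 4:1 and 5:1 respectively, so the article's ratios are rounded.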

Listening tests to evaluate low bit-rate coding systems have been underway in Sweden for about a year. In the US the Digital Audio Subcommittee of the Electronics Industries Association has launched its own program to evaluate digital bit-rate compression schemes and digital broadcasting systems. Participants in a seminar on low-bit coding at the October AES convention agreed on the importance of caution, to avoid the error of adopting a system that might later be found to have audible flaws. Bart Locanthi, former chairman of the AES digital standards committee, told of receiving a DAT copy of sample low bit-rate recordings that were used in the Swedish listening tests. He discovered false tones in the sound that had escaped detection by all of the listening panels!

Andrew Miller of the Canadian Broadcasting Corporation discussed the CBC's tests of seven low bit-rate systems and played sample recordings made through each. The variations in sound quality were shockingly large. The systems that used a moderate amount of bit-rate reduction reproduced vocal timbres reasonably well. But systems with very low bit rates caused drastic changes in the timbre of something as simple as a male speaking voice, dulling its clarity and adding severe colorations. A recording of a glockenspiel provided an even greater challenge. The systems with the lowest bit rates (below 100kb/s) produced grossly dull and distorted sound. Even the best of the tested systems altered the character of the sound slightly at high frequencies.

Of the seven designs, the one that produced the least sonic impairment was the Dolby AC-2, originally developed for satellite relays of digital audio. An earlier Dolby AC-1 system, which uses adaptive delta-modulation and operates at about half the bit-rate of the CD, has been in widespread use for several years, notably for national distribution of TV sound or networked FM broadcasts. (Historical note: The ADM circuit in the AC-1 system is a refined adaptation of the encoder used in a consumer time-delay ambience simulation system marketed by A.D.S. a decade ago.) The AC-2 system is said to provide similar sound quality with much lower bit rates, 192 or 256kb/s, the same as the European MUSICAM system.

A description of the AC-2 system outlined two reasons for its lack of obvious sonic flaws. One is that its digital filters were matched to the "masking" curve of the human ear with exceptional care, erring on the conservative side to ensure that the noise and distortion products that result from low bit-rate coding will always be inaudible. (Of course, skilled use of masking is nothing new for Dolby; every Dolby noise-reduction system from A to SR has relied on it.) The second factor is Dolby's handling of the transitions between digital data blocks, the points where the amount of data compression is altered according to the level, frequency content, and masking behavior of the incoming signal. Many digital compression systems exhibit a momentary dropout or burst of distortion at those transition points, a problem that Dolby solved with a rapid crossfade between successive data blocks.
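The idea of crossfading between successive blocks can be illustrated with a generic sketch. This is not Dolby's implementation, whose details the article does not give; it is a plain linear crossfade over a short overlap, and the names `crossfade` and `overlap` are hypothetical.

```python
def crossfade(prev_block, next_block, overlap):
    """Join two decoded blocks of samples with a linear crossfade.

    The last `overlap` samples of prev_block are blended with the first
    `overlap` samples of next_block, so the output moves gradually from
    one block to the next instead of switching abruptly.
    """
    if not 0 < overlap <= min(len(prev_block), len(next_block)):
        raise ValueError("overlap must be positive and fit inside both blocks")

    head = list(prev_block[:-overlap])   # untouched part of the earlier block
    tail = list(next_block[overlap:])    # untouched part of the later block

    blended = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)      # fade-in weight, strictly between 0 and 1
        old = prev_block[-overlap + i]
        new = next_block[i]
        blended.append((1.0 - w) * old + w * new)

    return head + blended + tail


if __name__ == "__main__":
    # A step from 1.0 to 0.0 is smoothed across the two overlapping samples.
    print(crossfade([1.0] * 4, [0.0] * 4, overlap=2))
```

A hard switch at the block boundary would leave any mismatch between the two blocks as an abrupt step; spreading the change over the overlap region turns it into a short ramp.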

AT&T's T1 long-distance digital transmission standard uses either fiber-optic cable or a special type of coaxial copper cable to provide a bandwidth of 1.5 megabits/second. With digital data compression, several channels of high-quality audio can be carried on a single T1 line, or a combination of audio and computer data. Lucasfilm uses a T1 line to link accounting computers, a network of computer workstations, and several voice-grade telephone channels from Skywalker Ranch (near San Francisco) to studios in Los Angeles. They have also used Dolby AC-2 encoders to send two channels of wide-range audio over the 400-mile path. When a Dolby SR film soundtrack was encoded, transmitted to LA, and bounced back to Skywalker Ranch for comparison with the source signal, the sound was found to be nearly indistinguishable.
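How much audio fits on one T1 line can be estimated with simple integer arithmetic. The sketch below rests on assumptions that are not in the article: the usable T1 payload is taken as 24 x 64kb/s = 1536kb/s (the nominal line rate is 1544kb/s including framing), and the 192 and 256kb/s AC-2 rates are treated as the rate for one stereo pair. The names `streams_per_line` and `reserved_kbps` are hypothetical.

```python
# Assumed usable T1 payload: 24 channels of 64 kb/s each.
T1_PAYLOAD_KBPS = 24 * 64  # 1536 kb/s


def streams_per_line(stream_kbps: int, reserved_kbps: int = 0) -> int:
    """Return how many coded streams of stream_kbps fit on one T1 line.

    reserved_kbps is capacity set aside for other traffic sharing the
    line, such as computer data or telephone channels.
    """
    available = T1_PAYLOAD_KBPS - reserved_kbps
    if stream_kbps <= 0 or available < 0:
        raise ValueError("rates must be positive and fit within the payload")
    return available // stream_kbps


if __name__ == "__main__":
    print(streams_per_line(256))                      # stereo pairs at 256 kb/s
    print(streams_per_line(192))                      # stereo pairs at 192 kb/s
    print(streams_per_line(256, reserved_kbps=512))   # with 512 kb/s kept for data
```

Under these assumptions a dedicated line carries six stereo pairs at 256kb/s or eight at 192kb/s, and fewer once part of the capacity is reserved for data and telephone traffic.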

Lucasfilm plans to lease additional T1 lines and send all 24 channels of a soundtrack to LA for remote mixing. Other plans will give the term "remote recording" a new meaning: instead of transporting a truckload of equipment to record a concert in a church, just hang microphones, preamps, and AC-2 encoders to transmit line-level signals back to the studio and record them there while monitoring the sound in a familiar acoustic environment.
-Peter W. Mitchell

Lossy Images of Audio, Robert Harley
In addition to the Audio Engineering Society's twice-yearly conventions, the organization holds special conferences on particularly timely audio topics. "Images of Audio," the 10th International conference, was held in London this past September. Although the conference's theme was audio for visual images (HDTV in particular), most of the presentations and discussions were about what's happening at the cutting edge of digital audio in general. The 17 technical presentations covered everything from digital audio in video recorders to sample-rate conversion. There was also a series of papers on bit-rate reduction and a panel discussion on these schemes, described in detail below.

The conference was unique in that it included a one-day tutorial on digital audio. Malcolm Hawksford's superb introduction included jitter, oversampling, noise-shaping, filters, and aliasing. In addition to being one of our foremost experts on digital audio, he is an excellent teacher and presenter of technical information.

During his discussion of jitter, Dr. Hawksford mentioned that "we read in magazines about attempts to reduce jitter, like the green pen on a CD." This brought howls of laughter from the audience. Dr. Hawksford quickly admonished the audience with a pointed finger and this response: "Don't be so quick... There may be more to this than you think."

Taking a decidedly contrary attitude was John Watkinson, author of six books on digital audio and video including the excellent The Art of Digital Audio (Focal Press). The thrust of his talk on digital audio data storage was that digital audio storage devices should be thought of exactly as computer data storage devices. He said that digital audio data is identical to data stored on a floppy disc, his American Express card's magnetic stripe, and other forms of data storage. I found Mr. Watkinson's insights into the fundamental principles of digital audio fascinating, but he took the opportunity to attack audiophiles, saying essentially that if the ones and zeros are the same, the sound must be the same: "Somehow I can't conceive of an audiophile [binary] 'one'."

Footnote 1: For more complete discussions of data compression, see "As We See It" in April 1991, "As We See It" in May 1991, "As We See It" in July 1994, and Tom Norton's discussion of the DTS codec in March 1995.
-John Atkinson

An article on 20+year old lossy codecs, really? Is this the internet wayback machine?

For the good of high end audio, it's time to retire, Robert. Stereophile, you can do WAY better for an article on lossy codecs. The only danger is staying the course with Robert and his discussions of old codecs.

You wanna attract a larger high end market? Write about how LAME 3.99.5 V2 or better is audibly essentially transparent with music compared to lossless. Or write about OPUS 1.1 for low bitrate uses. Or how spending wisely on a system like foobar2000 or JRiver playing FLAC or 3.99.5 V0 with a $30 Behringer DAC with a $60 calibrated microphone and REW, using Blue Jeans cables, will allow more funds for better, really cool speakers, multiple subwoofers and room treatments.

An article on 20+year old lossy codecs, really? Is this the internet wayback machine?

My goal is eventually to have everything that was published in Stereophile available in our free on-line archives. I thought this article from 22 years ago would be of interest.

deftoejam wrote:

Write about how LAME 3.99.5 V2 or better is audibly essentially transparent with music compared to lossless. Or write about OPUS 1.1 for low bitrate uses.

With hard-drive prices at an all-time low and fat "pipes" becoming the domestic norm, why would anyone need their music encoded with a lossy codec at all, if they intend to listen to that music seriously? Despite advances in codec technology, there has yet to be a lossy codec that is transparent to all people at all times on all systems with all types of music.

So why not just use FLAC or Apple Lossless for your music library and forget about the possible sonic compromises of a lossy codec, other than where it makes something possible that would otherwise be impossible, such as listening to a live concert from the UK's BBC 3 via the Internet?

Great article and essential reading for manufacturers and discerning consumers (i.e. audiophiles). Or as someone once said "Those who don't have a grasp on the history of their circumstances are likely to repeat some of its mistakes".

After all, as in the case of LAME 3.99xyz, how many times have we heard that since the beginning of digital? I convert my FLAC tracks to 320k MP3's for playing outdoors and on public transport, but I don't lose the originals. As Cookie Marenco has advised, best to get the earliest master possible, even in digital.

Interesting historical reference. But why is this "essential reading"? Nothing wrong with the contents but in 2013, I think it's fair to say that we've all experienced it (MP3, AAC, Ogg Vorbis...) and there's really no big "danger" here so let's not be too emotional about all of this.

I think many of us feel that 320kbps is indistinguishable qualitywise from lossless FLAC or ALAC but that's different from advocating wholesale conversion of music archives to a lossy format! It does however at least help put our expectations into perspective and when/if I need to go portable with my music, conversion to MP3 isn't hysterically treated as if it were some kind of "big deal". No great danger, no boogeyman, no monster.

This article talks about listening tests using MP3 at 128kbps. I think we're all quite aware that this bitrate is not enough to ensure CD-equivalent sonic integrity and, as far as I'm aware, no commercial service has been selling music at this resolution for years... I suppose it's still used for streaming radio, but again, no audio lover I know of would mistake this bitrate for true high fidelity.

Better to keep putting attention on squashed dynamics due to the "loudness wars" than harp on the minimal effects of high bitrate lossy compression these days.

The reason why the attention would be prioritized on lossy files over loudness wars is because the former has permanence based on the reality of the files being archived. Worse I think is that the lossy files (on average) are more likely to be married with loudness adjustments than lossless files. Your experience could vary a lot, and I can't declare a hard-and-fast rule on any of this, but just ask the question around audiophile communities: "If loudness increases bother you, would you be more likely to find those loudness increases in lossy or lossless media?"

I see no difference over the years between loudness of MP3 vs. CD rips. If the original master is loud, it's loud. It's not like record companies release 2 versions - less compressed for CD release, louder for Amazon/iTunes as far as I can tell. It's the vinyl releases which tend to be less loud at least in a large part due to limitations of the technology.

Again, nobody's advocating archiving with a lossy process. MP3 works and serves its purpose with minimal if any perceivable sound degradation to human ears/brains at commonly used (256/320) bitrates in 2013.

In the '80s we knew that cassette was an easily audible downgrade to vinyl albums but many if not most of us were happy to record our albums and play the cassettes, not only in our cars, but often at home just to preserve the condition of the albums.

What we are trying to do is to capture a musical performance perfectly, store it permanently, and reproduce it perfectly later on.

There have always been constraints on the process like size, cost, technological state of the art. In the digital domain, lossy compression was a response to some of those constraints like storage space and transmission bandwidth.

If you want lots of music in a small digital player or you want to stream music over limited bandwidth media, you still need lossy compression of the digital source material. But as constraints fall away we can usher in a new era of ever increasing reproduction fidelity in the digital domain. Lossy compression will seem quaint in a few years.

I hope the recorders of the performance will step up and improve the quality of their source files. (Dynamic range, distortion, etc.) We as "audiophiles" can still pursue our pastime of reproducing those source files as accurately/musically as possible.

The lossy codec tests suffer from the same flaw that all audiology has had since the 1930s, when they started using a vacuum tube sine wave oscillator. The test subjects listen to music predominantly if not exclusively through speakers. This means their psycho-acoustic processing is acclimated to the temporal and spatial distortions of speakers. In fact, it appears from the description that the "Expert listeners" are people who listen to speakers for a living.

Professional acoustic musicians hear differently. In particular, they hear phase and have a much richer perception of acoustic space. The latest research indicates that they have far more inter-connection neurons, especially traversing the corpus callosum, and that the increased neuro-genesis is driven by focused listening to acoustic sounds in childhood. These are the only "Expert Listeners" to music (as opposed to reproduction) in the industrialized world.

I have been working with conservatory trained musicians who have heard acoustic music at least two hours a day from childhood, and they agree that MP3, AAC and internet streaming CODECs are not merely detectable, but un-listenable. Even at 320K, I fatigue in under a half hour and take several hours to recover.

The information lost in bit reduction is largely the low level discrete multi-bounce echoes that illuminate the space where the recording took place. I take exception to the test tracks on that basis.

Two are processed pop recordings that started as mono close-miked, deliberately colored center-electrode "vocal microphones" in a dead studio environment; two are artificial bass instruments with no acoustic reference. Of the transient signals, the fireworks were undoubtedly recorded at a large distance, which is effectively an acoustic peak limiter by absorption of high frequencies and phase shifting the remaining spectrum, and has no acoustic space reference like a rectangular room. Besides, nobody hears fireworks often enough to remember what they sound like. The Glockenspiel and Castanet tracks likely share some of these characteristics.

Modern trumpet typically is played legato, with no transients to serve as time markers for echo decoding. This leaves the one jazz track, which I don't know, and male speech. If the male speech were recorded in near-coincident stereo in a reverberant environment it might indicate something, but this is unlikely.

Test signals should be pure acoustic recordings with no processing, complex and yet clear like a chamber orchestra with one player per part. Staccato technique is essential to exercise the codec response to real acoustics, including staccato acoustic bass instruments. Further, the test signal should be a voice to which the test subject is acclimated by recent experience (less than 24 hours). My favorite test signal is harpsichord because it is the closest acoustic equivalent to a Dirac impulse function and I hear it daily.