The Ghost in the MP3

Tom's Diner <=> moDernisT

"moDernisT" was created by salvaging the sounds lost to mp3 compression from the song "Tom's Diner", famously used as one of the main controls in the listening tests to develop the MP3 encoding algorithm. Here we find the form of the song intact, but the details are just remnants of the original. Similarly, the video contains only material which was left behind during mp4 compression.

(sounds and algorithms by ryan maguire - footage by takahiro suzuki)

I. Overview

The MPEG-1 or MPEG-2 Audio Layer III standard, more commonly referred to as MP3, has become a nearly ubiquitous digital audio file format. First published in 1993, this codec implements a lossy compression algorithm based on a perceptual model of human hearing. Listening tests, primarily designed by and for Western European men and using the music they liked, were used to refine the encoder. These tests determined which sounds were perceptually important and which could be erased or altered, ostensibly without being noticed. What are these lost sounds? Are they sounds which human ears cannot hear in their original context due to universal perceptual limitations, or are they simply encoding detritus? It is commonly accepted that MP3s create audible artifacts such as pre-echo, but what does the music which this codec deletes sound like? In the work presented here, techniques are considered and developed to recover these lost sounds, the ghosts in the MP3, and reformulate these sounds as art.

II. MP3 Compression

The MP3 standard, designed in the early 1990s by the Moving Picture Experts Group, has become an interesting object of critique in contemporary technology studies (Sterne, 2006). That a standard which subtly reduces the audible quality of sound files has remained in place, despite massively increased bandwidth and storage capacity, is impressive and highlights the foresight (and fortune) of the format's creators.

Regardless, the MP3 is not always the most appropriate format for a given task, and a critical evaluation of the technology and its limitations is warranted. Many listeners today listen exclusively to MP3 files, even in settings where the gains from a higher fidelity format would be clearly perceptible. This lossy compression codec has thus come to dominate unanticipated listening spaces.

For example, white, pink, and brown noise, when compressed to the lowest possible MP3 bit rate, sound very different from the original random signals.

Example 1. White, Pink, and Brown Noise - Uncompressed

In comparison, low-frequency sine tones sound quite good when encoded as a 320kbps MP3.

Still, some material has been left behind which, upon examination, is quite interesting.

Example 3. Sine Tone Chords - Uncompressed

Example 4. Sine Tone Chords - 320kbps MP3

Example 5. Sine Tone Chords - 320kbps MP3 "Ghost"

III. Finding the Ghost

Using the Bregman, pyo, and pydub libraries, along with the LAME MP3 encoder, I begin with an uncompressed WAV file and save it as an MP3 file (128kbps in this example), a bit rate at which the encoder does quite well. I chose 128kbps for these examples because that was the "high-quality" bit rate used in the original MP3 development listening tests. In the music I've made (moDernisT, etc.) using this process, I've used 320kbps MP3s.
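The round trip from WAV to MP3 and back can be sketched by calling the LAME command-line encoder directly. This is only an illustration under stated assumptions, not the pipeline used for the piece, which also relied on Bregman, pyo, and pydub. The file names and the helper functions `lame_roundtrip_commands` and `run_roundtrip` are hypothetical. The sketch uses only the `-b` (constant bit rate in kbps) and `--decode` options of LAME.

```python
import subprocess
from pathlib import Path


def lame_roundtrip_commands(wav_path, bitrate_kbps=128):
    """Build the two LAME command lines for a WAV -> MP3 -> WAV round trip.

    The output names (same stem, ".mp3" and "_decoded.wav") are a
    hypothetical convention chosen for this sketch.
    """
    wav = Path(wav_path)
    mp3 = wav.with_suffix(".mp3")
    decoded = wav.with_name(wav.stem + "_decoded.wav")
    # Encode at a constant bit rate, e.g. 128 or 320 kbps.
    encode = ["lame", "-b", str(bitrate_kbps), str(wav), str(mp3)]
    # Decode the MP3 back to WAV so it can be compared with the original.
    decode = ["lame", "--decode", str(mp3), str(decoded)]
    return encode, decode


def run_roundtrip(wav_path, bitrate_kbps=128):
    """Run both commands; requires the `lame` executable on the PATH."""
    for command in lame_roundtrip_commands(wav_path, bitrate_kbps):
        subprocess.run(command, check=True)
```

Before comparing the decoded file with the original, it is worth checking that the two are sample-aligned, since MP3 encoding and decoding can introduce a small delay and padding.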

Example 6. Tom's Diner - Verse 1 - Uncompressed

Example 7. Tom's Diner - Verse 1 - 128kbps MP3 Example

I then analyze, compare, and take the difference between both files.

Example 8. Tom's Diner - Verse 1 - 128kbps MP3 "Ghost" Example

Where the two files are the same or similar, the information in the original audio has been largely preserved in the MP3. However, corresponding time-frequency bins which differ significantly between the two files betray spots where information has been altered or deleted. Different extraction techniques are possible, each leading to slightly different output.
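One possible way to locate the time-frequency bins that differ is to compare short-time Fourier transform magnitudes of the two files. The sketch below uses NumPy only and is one extraction technique among many, not necessarily the one used here. The frame size, hop size, and the -40 dB threshold are illustrative assumptions, and the function names `stft` and `ghost_mask` are hypothetical. It assumes both signals are mono, share a sample rate, and are already time-aligned.

```python
import numpy as np


def stft(signal, frame=1024, hop=256):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame)
    count = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(count)])
    return np.fft.rfft(frames, axis=1)


def ghost_mask(original, decoded, threshold_db=-40.0, frame=1024, hop=256):
    """Flag time-frequency bins where the decoded MP3 has lost energy.

    A bin is flagged when the magnitude missing from the decoded signal
    exceeds `threshold_db` relative to the loudest bin of the original.
    Returns the boolean mask and the lost magnitude per bin.
    """
    n = min(len(original), len(decoded))
    mag_orig = np.abs(stft(original[:n], frame, hop))
    mag_dec = np.abs(stft(decoded[:n], frame, hop))
    # Keep only energy that was present in the original but is gone now.
    lost = np.maximum(mag_orig - mag_dec, 0.0)
    floor = mag_orig.max() * 10.0 ** (threshold_db / 20.0)
    return lost > floor, lost
```

Raising the threshold isolates only the most strongly altered bins, while lowering it lets more of the quiet residue through, which is one reason different settings yield different material.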

Different ways of handling phase estimation also lead to slightly different results.
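To make the role of phase concrete, the following NumPy sketch turns a pair of spectra into a ghost signal using one of three phase choices: the phase of the original, the phase of the decoded MP3, or the phase that results from subtracting the complex spectra directly. These three options are assumptions chosen for illustration. The text does not specify which phase strategies were actually used, and the names `stft`, `istft`, and `ghost_spectrum` are hypothetical.

```python
import numpy as np


def stft(signal, frame=1024, hop=256):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    window = np.hanning(frame)
    count = 1 + (len(signal) - frame) // hop
    frames = np.stack([signal[i * hop:i * hop + frame] * window
                       for i in range(count)])
    return np.fft.rfft(frames, axis=1)


def istft(spectrum, frame=1024, hop=256):
    """Weighted overlap-add inverse of `stft`."""
    window = np.hanning(frame)
    length = frame + hop * (spectrum.shape[0] - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, chunk in enumerate(np.fft.irfft(spectrum, n=frame, axis=1)):
        out[i * hop:i * hop + frame] += chunk * window
        norm[i * hop:i * hop + frame] += window ** 2
    # Guard against division by zero at the very edges of the signal.
    return out / np.maximum(norm, 1e-8)


def ghost_spectrum(spec_orig, spec_dec, phase="original"):
    """Pair the lost magnitude with a chosen phase."""
    if phase == "complex":
        # Subtracting complex spectra fixes magnitude and phase together.
        return spec_orig - spec_dec
    lost = np.maximum(np.abs(spec_orig) - np.abs(spec_dec), 0.0)
    if phase == "original":
        angle = np.angle(spec_orig)
    elif phase == "decoded":
        angle = np.angle(spec_dec)
    else:
        raise ValueError("phase must be 'original', 'decoded', or 'complex'")
    return lost * np.exp(1j * angle)
```

With `phase="complex"` the result matches a plain sample-by-sample subtraction of the two signals, while the other two options keep only the magnitude that was lost and borrow a phase for it, so each choice colors the resynthesized ghost differently.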

IV. Artistic Overview and Background

As previously stated, the MP3 codec was refined using listening tests designed by European audio engineers and featuring the music they chose. In a sense, each of these songs acts as a resonant filter for every file encoded in the MP3 format. Tom's Diner by Suzanne Vega, Fast Car by Tracy Chapman, a Haydn trumpet concerto... these songs carved out the space of sounds that could be successfully encoded as MP3s. To that end, these songs represent a kind of best-case scenario for an MP3 encoding. If anything can be encoded well by this format, it should be these files. And yet these files do leave a residue behind when encoded to MP3. Exploring these sounds helps to define a boundary case for MP3 salvaging.

V. moDernisT

As a preliminary foray into codec ghost composition, I am creating a series of pieces based on the songs used in the original MP3 listening tests. Today, I'd like to briefly discuss my treatment of Tom's Diner. After compressing the original audio to 320kbps MP3s, I begin by analyzing the song structure and interpreting the music and text, and I then attempt to arrange the most interesting recovered material within this framework.

As a case study of the techniques I've used, I'd like to discuss two verses in detail. Verse one finds the narrator in a bustling diner, making observations about her environment. The focus of this text is external to its author, as opposed to later verses which exist in a more subjective, internal space. Using different settings to harvest the lost material, I was able to isolate both clear, pitched content and more ephemeral transient signals.

Using the Python library headspace and a reverb model of a small diner, I began to construct a virtual 3-D space. Beginning by fragmenting and scrambling the more transient material, I applied head-related transfer functions to simulate the background conversation one might hear in a diner. Tracking the amplitude of the original melody in the verse, I applied a loose amplitude envelope to these signals. Thus, a remnant of the original vocal line comes through in its amplitude contour.
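The amplitude-tracking step can be illustrated with a moving RMS envelope. This is a minimal NumPy sketch under stated assumptions: the window length is arbitrary, the names `rms_envelope` and `impose_envelope` are hypothetical, and it does not reproduce the headspace processing or the reverb model described above. A longer window gives a looser contour.

```python
import numpy as np


def rms_envelope(signal, window=1024):
    """Moving RMS of `signal`, returned at the same length as the input."""
    kernel = np.ones(window) / window
    power = np.convolve(signal ** 2, kernel, mode="same")
    return np.sqrt(np.maximum(power, 0.0))


def impose_envelope(texture, melody, window=1024):
    """Shape `texture` with the normalized amplitude contour of `melody`."""
    n = min(len(texture), len(melody))
    envelope = rms_envelope(melody[:n], window)
    peak = envelope.max()
    if peak > 0:
        envelope = envelope / peak  # scale so the loudest point is 1.0
    return texture[:n] * envelope
```

Here `texture` would stand for the scrambled transient material and `melody` for the original vocal line, so the background swells and recedes with the phrasing of the verse.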

Having constructed this background, prominent pitches from the original melody appear and disappear, located variously in this virtual space. These ephemeral sounds hint at a familiar melody, playing with aural memory and imagination, a flickering apparition hovering at the border of consciousness.

Example 12. moDernisT - Verse 1

Verse 5 finds the narrator in a noticeably different psychological state. Instead of buoyantly attending to the activity of the room, she is lost in thought, remembering.

Example 13. Tom's Diner - Verse 5

Accordingly, I have given this material more space. It is less fragmented, the constant background conversation has receded, and the virtual space has drawn closer; it feels more internal than external. Key phrases and snippets of the melody emerge more clearly, and then the outro arrives, once again obscuring the familiar melody but hinting at its former presence. We hear mostly transients, but internally we might fill in the rest.

Example 14. moDernisT - Verse 5

VI. Future

Moving forward, I am planning a series of related compositions, constructed first from the other songs involved in the listening tests, but then probing the space of MP3 compression in different ways, attempting to highlight even more explicitly the filtering effect of this codec. The songs used in developing the MP3 codec are notable for what they are not: they are not music from other cultures, not hip-hop or dance music, nothing with prominent low frequencies, nothing particularly noisy, no outright aggressive sounds, nothing lo-fi. Rather, these sounds have been broadly institutionally accepted and conform to accepted standards of production and recording technique. As MP3s have invaded more and more contemporary listening spaces, the class of privileged sounds which the format inadvertently creates has become more apparent. Originally developed for suboptimal listening environments, MP3s are now heard everywhere: at home, streaming in stores and public spaces, over high-fidelity car stereo systems. This format has become a curator for these spaces, allowing in a great deal of wonderful sound, yes, but to the exclusion of a vast territory in the available sonic terrain. Composing with these sounds and injecting them back into contemporary listening spaces is one possible act of resistance, one available mode of cultural critique.

VII. Conclusion

In conclusion, composing with MP3 files is an attempt to derive interesting material from the sounds that have been rejected from many of our contemporary listening spaces. MP3 ghosts and artifacts are difficult to predict and provide externally generated material to react to and work with as a composer, while not limiting the freedom of the artist to arrange, alter, and interpret these sounds.

Investigating a particular format for its aesthetic possibilities is inspired by musics built around previous technologies: "tape music", for example. I see "format music" as a contemporary analogue of these practices. Each of you reading this is involved with technology in your own way. I have found it extremely fruitful to question and explore the limitations of those technologies with which I find myself intertwined. I hope you have also found interest in questioning these limitations with me.

Special thanks to:
Tara Rodgers, Aden Evens, Larry Polansky, Michael Casey, Matthew Burtner, and many others, for their help in conceiving and realizing this project.