
In an age when Apple has become the top music retailer without selling a single physical disc, audio engineers are increasingly creating specially mastered versions of songs and albums designed to counteract the audio degradation caused by compression. Though audiophiles typically scoff at paying for compressed audio, preferring vinyl or high-end digital formats such as DVD-A, mastering engineers are doing their best to create digital masters that can pass through Apple's iTunes algorithms with minimal sonic corruption.

To highlight work done to improve the sound of compressed music files, Apple recently launched a "Mastered for iTunes" section on the iTunes Store. It now also provides a set of recommendations for engineers to follow when preparing master files for submission to the iTunes Store. To qualify for the "Mastered for iTunes" label, Apple says that files should be submitted in the highest resolution format possible, and remastered content should sound significantly better than the original.

How does this work? Ars spoke with Masterdisk Chief Engineer Andy VanDette, who recently completed a project remastering the bulk of Rush's back catalogue. As part of the process, VanDette created special versions of each song specifically for uploading to the iTunes Store. He described the often lengthy, trial-and-error process of trying to make iTunes tracks sound as close as possible to polished CD remasters.

The state of compressed audio

All music purchased from iTunes is compressed using a "lossy" compression algorithm called Advanced Audio Coding (AAC). Lossy compression algorithms toss out some of the information contained in a digital file in exchange for very small file sizes. Formats like AAC (and MP3) try to be intelligent about what information is tossed out in order to maintain fidelity with the original, uncompressed file. They do so by eliminating frequencies and harmonics least likely to be discerned by the average listener.

(The JPEG image format attempts to do the same thing with photos, eliminating details and colors that aren't likely to be noticed by the average viewer. This is why JPEGs can sometimes look blocky when saved with heavy compression.)

A number of music industry luminaries, including Jimmy Iovine (head of Interscope-Geffen-A&M), Dr. Dre, and most recently Neil Young, have bemoaned the fact that most music now plays back from a compressed file, resulting in a "degradation" of the sound an artist originally tried to create.

"We live in the digital age, and unfortunately it's degrading our music, not improving it," Young said in January during the D: Dive Into Media conference.

Young and his cohorts are attempting to make uncompressed, higher-end audio formats a common standard across the industry. Music recorded over the last decade has typically been captured using 24-bit samples at a 96kHz sampling rate, and advances in computing power and hard disk space have recently made even higher-quality 24-bit, 192kHz digital recording possible.

However, even the standard CD format comes in a much lower resolution—just 16-bit 44.1kHz. Compared to 24-bit 192kHz digital audio, a finished CD only has roughly 15 percent of the information captured during the recording process. Compressing the songs on a CD further into 256kbps AAC "iTunes Plus" format cuts the data down to just one-fifth of the size of CD audio, or as little as three percent of the original 192kHz recordings.
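Those percentages follow from simple bitstream arithmetic. A quick sanity check (a sketch in Python; uncompressed stereo PCM assumed throughout):

```python
# Raw PCM bitstream rates behind the "15 percent" and "three percent" figures.
def pcm_bps(bits, rate_hz, channels=2):
    """Uncompressed PCM rate in bits per second."""
    return bits * rate_hz * channels

studio = pcm_bps(24, 192_000)   # 24-bit / 192kHz master
cd     = pcm_bps(16, 44_100)    # Red Book CD audio
aac    = 256_000                # 256kbps "iTunes Plus" AAC

print(f"CD vs. studio master: {cd / studio:.1%}")   # ~15.3%
print(f"AAC vs. CD:           {aac / cd:.1%}")      # ~18.1%, about one-fifth
print(f"AAC vs. studio:       {aac / studio:.1%}")  # ~2.8%
```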

"We're working with [Apple] and other digital services—download services—to change to 24-bit," Iovine said. Young also admitted to working with Apple to make 24-bit audio standard across its mobile devices, though he suggested that no progress has happened since Steve Jobs—known for his love of classic rock—died last October.

As an audio engineer, VanDette is "hopeful" hardware and storage capabilities will one day make uncompressed, 24-bit audio a practical standard. For instance, digital music service HDtracks already offers a catalogue of 24-bit audio files at various sampling rates up to 192kHz. But such audiophile quality is only beneficial to those with expensive stereo equipment capable of reproducing the subtle nuances captured in these higher-quality files.

"I am encouraged to see a growing catalog at HDtracks, but being able to have your entire album collection in your pocket is cool, too," VanDette told Ars. As long as iPhones and iPods are the most common playback equipment, and the iTunes Store the top source for music, compressed audio files are, practically speaking, here to stay for the foreseeable future.

If you can't beat 'em, join 'em

Want an uphill battle? Try pushing the bulk of consumers to embrace niche audiophile formats and upgrade to capable equipment. Instead, audio engineers have taken to mastering versions of songs and albums specifically for the iTunes Store.

A similar mastering process is already done to prepare albums for other physical formats. As previously noted, recording is typically done in a digital 24-bit 96kHz format. However, audio released in CD format is 16-bit 44.1kHz quality, requiring a conversion from the original source. Engineers adjust equalization, levels, compression, noise filters, and other parameters to cram as much of the source material as possible into those limits.

(Returning to our earlier photo analogy, the process is similar to converting a 14-bit RAW file from a DSLR into a standard 8-bit TIFF.)
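The final step of that conversion, the bit-depth reduction itself, can be sketched in a few lines. This is a minimal, illustrative model only: the TPDF dither here is a common textbook choice, not taken from any particular mastering chain, and all the EQ and level work is left out.

```python
import random

random.seed(0)  # deterministic for the example

def requantize_24_to_16(s24):
    """Reduce a signed 24-bit sample to signed 16-bit with TPDF dither.

    The dither (sum of two uniforms, spanning about +/-1 LSB of the target
    depth) decorrelates the rounding error so it behaves like benign noise
    instead of distortion.
    """
    lsb16 = 256  # one 16-bit step, expressed in 24-bit units
    dither = (random.random() + random.random() - 1.0) * lsb16
    s16 = round((s24 + dither) / lsb16)
    return max(-32768, min(32767, s16))  # clamp to 16-bit range

print(requantize_24_to_16(8_388_607))  # full-scale positive -> 32767
print(requantize_24_to_16(0))          # near silence -> -1, 0, or 1 (by design)
```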

Recording can also be done at varying bit-depths and sampling rates. Sometimes it's still done using vintage analog gear (see recent Grammy winners the Foo Fighters). Albums are still released on vinyl, and in some cases are made available in high-end digital formats such as Super Audio CD (SACD) or DVD-Audio (DVD-A). A mastering engineer will take whatever source material is provided—analog or digital—and optimize it for each release format, taking into account each format's unique strengths and limits.

VanDette explained how mastering varies depending on the age of the original recordings as well as the final output format. Many master recordings for Rush albums are from vinyl's heyday, he said. "Back then we would try and hide as much top end as possible, knowing that the end users' styli would be crap."

"Most listeners today swear they love the bottom end on vinyl, but I remember in the heyday of vinyl, it was all about top end," VanDette told Ars. "'If we could only have a clear top end without all those pops and clicks' we thought," he said, noting the tendency of low-end record players to introduce unwanted noise. "Back then, bottom was the enemy. It made the grooves [in the vinyl] too wide, and forced us to turn down the overall level of the disc."

The constraints of vinyl aren't a concern when mastering for a CD, so it's possible to boost overall levels as well as low frequencies without ruining the rest of the mix. "While remastering the classic Rush albums, I added as much LF as I could, always aware not to cloud the classic 'ping' on Neil's snare, muddle Geddy's voice, or bury Alex's guitar," he said.

"These are some finely balanced mixes, even 35 years later," VanDette said. "I wanted to make sure the listener still heard the classic album come through, without it being too loud, boomy, or modern sounding."

Digital/data compression of audio files has absolutely nothing to do with the dynamic range compression applied to recordings at the mastering stage. (Example: "24-bit" versions of the last three RHCP records will still sound like shit, because the dynamic range is flat as a pancake.)

Indeed, most of these industry experts should stop looking for another way to sell us the same crap for the seventh time, and re-issue--free of charge--correct-sounding versions of what we already paid for.

I'd take a low-bit-rate, high-dynamic-range track over a high-bit-rate, squashed track any day of the week, and twice on Sunday. I have plenty of 128kbps AACs (even MP3s) that sound orders of magnitude better than losslessly-compressed records.

The take-home here is that studios are starting to pump up the bass on iTunes tracks so that they sound "better". Great.

And people wonder why LPs sound so great in spite of hiss and pops back when they didn't mess up the sound. On the upside, it sounds like they're getting ready to sell everyone the original, unprocessed tracks back for more money. Progress!

Nice read. Granted, given how I listen to music, I never heard all the "good stuff," even on CD. I'm glad this is being done for audiophiles, people with better equipment, and everyone else with better ears than I got ^_^

So clearly this is great news for people like me who buy iTunes songs. Much as I hope 1080p video will replace 720p, I welcome any quality bump, so I bought a song to have a listen and see how these new tracks differ.

Unfortunately, I'm a little bit perplexed - I have two copies of "Between the cheats" by Amy Winehouse. One is matched with iTunes Match and is 8.3mb. The other is the newly purchased "mastered" copy and is 7.4mb. Can someone explain to a newb why this is? They also have identical sampling rates which I wasn't expecting, perhaps someone could help me out? :)

"However, even the standard CD format comes in a much lower resolution—just 16-bit 44.1kHz. Compared to 24-bit 192kHz digital audio, a finished CD only has roughly 15 percent of the information captured during the recording process."

That's pretty misleading. If we could hear higher than 20kHz or needed 144dB of dynamic range, this would be a useful measurement. But CD captures 100% of information under 20kHz, and has 96dB of dynamic range, which is far, far more than almost any audio recording ever released (except perhaps those demo discs of cannons being shot off). 24-bit and 192kHz is totally overkill for music listening by humans.

Having lots of overhead is useful in the recording process, since doing processing on the files often yields a better result at 24 bits. But as an end-user delivery format, it's like calling a letterboxed 4x3 image better than a native 16x9, because the 4x3 has all that extra empty picture in it.
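The dynamic-range figures above follow directly from the bit depth: each bit of linear PCM buys about 6.02dB. A quick check (ignoring dither and noise shaping, which shift these numbers slightly):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of n-bit linear PCM: 20*log10(2^n)."""
    return 20 * math.log10(2 ** bits)

print(f"16-bit: {dynamic_range_db(16):.1f} dB")  # ~96.3 dB (CD)
print(f"24-bit: {dynamic_range_db(24):.1f} dB")  # ~144.5 dB
```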

Still waiting for some blind tests where anyone, anywhere can tell the difference between uncompressed and 256 kbit MP3/AAC. Been waiting 10+ years.

Nobody can tell the difference when driving a car with 2% more horsepower. It's still faster.

Point being, bits are cheap, and all things being equal, why not remove any question of subtle but hard to quantify perceptual differences? It's true that it's not demonstrated in blind tests, but 1) tests are by definition atypical listening situations, and 2) it's very hard to rule out transient passages in any music, for any listener, being discernible.

Unfortunately, I'm a little bit perplexed - I have two copies of "Between the cheats" by Amy Winehouse. One is matched with iTunes Match and is 8.3mb. The other is the newly purchased "mastered" copy and is 7.4mb. Can someone explain to a newb why this is? They also have identical sampling rates which I wasn't expecting, perhaps someone could help me out?

Assuming they're VBR files (and not just the same track encoded at different bitrates), often remastered copies will compress better since they tend to have the dynamics crushed out. Less dynamics == less data to store == smaller files.

FWIW, it's pretty rare that I prefer newer masters to originals. The trend these days seems to be to blow the loudness out of proportion by crushing dynamics. Great for listening on an iPod with crappy headphones in a noisy room, but terrible otherwise.

Indeed. If they'd simply offer ALAC, or some other lossless format, there'd be no need to put so much effort into dealing with AAC's "quite quirky" psychoacoustic model.

Unless I'm quite confused, the psychoacoustic model is peculiar to the encoder and not the compression standard. So if Quicktime's AAC output is strange, why doesn't Apple fix that instead of doing this awkward charade?

"Most listeners today swear they love the bottom end on vinyl, but I remember in the heyday of vinyl, it was all about top end," VanDette told Ars. "'If we could only have a clear top end without all those pops and clicks' we thought," he said, noting the tendency of low-end record players to introduce unwanted noise. "Back then, bottom was the enemy. It made the grooves [in the vinyl] too wide, and forced us to turn down the overall level of the disc."

The constraints of vinyl aren't a concern when mastering for a CD, so it's possible to boost overall levels as well as low frequencies without ruining the rest of the mix. "While remastering the classic Rush albums, I added as much LF as I could, always aware not to cloud the classic 'ping' on Neil's snare, muddle Geddy's voice, or bury Alex's guitar," he said.

Reminder that the "constraints of vinyl" are about 12 bits of dynamic range (give or take). Yet those albums still sound great compared to most 16- and 24-bit stuff these days. The article implies that the solution is to add ever more bits to digital, when in reality we already have enough. What we needed back in the day was good mastering, not this loudness-boosted, bass-heavy, made-for-iTunes crap. Fact is you could sell people 64-bit audio and it'd still suck if they keep mastering it as loud as possible.

Lossless compression formats have existed for over a decade. There is absolutely zero excuse for this. Just switch to a lossless format - file size will be bigger, but all the data will still be there. People have broadband now in developed countries. Get your shit together and act like it.

Edit: This is why I still pirate most of my music. Better quality, and I don't have to wait for these idiots to catch up with perfectly good technology that is more than adequate for the job.

Unless I'm quite confused, the psychoacoustic model is peculiar to the encoder and not the compression standard. So if Quicktime's AAC output is strange, why doesn't Apple fix that instead of doing this awkward charade?

Codec design limits the sort of psychoacoustic models one can implement. AAC is more flexible than MP3. And, someday there's likely to be something even better than AAC. So instead of revising tracks "three, four, even five times until I got something that compared well with the CD", the smart thing would be to give customers the same data as the CD. Then, when the "Super Advanced Audio Coding" arrives, Apple can make it a transcoding option and people won't need to re-buy new and improved versions of all their songs.

The whole article should be taken down because it contains inane ramblings with no basis in reality. Subjective claims that indirectly comment on the state of digital audio should not be published as a featured article on Ars Technica.

Neat article. I didn't know there was "mastering for iTunes" going on, but it makes sense these days.

I'd argue that, for the vast majority of consumers, readily available music reproduction is of a much higher quality today than anything that was available prior to the advent of the Compact Disc. So I don't really feel there's much for Neil Young to "bemoan". It's only the relatively small number of people with equipment "capable of reproducing the subtle nuances" who are in any way worse off than they used to be.

If the industry is interested in getting everyone to switch to 24-bit consumer delivery (or whatever constitutes "non/less-degraded music"), then cheap, readily-available equipment needs to be able to reproduce those subtle nuances. Otherwise, there just aren't enough people who are going to care. Since Apple is far and away the most popular producer of music playback equipment these days, they could make this happen by improving the audio playback of their devices across the board.

I think the availability of cheap, large, decent-quality high-definition televisions over the last few years has certainly forced more producers to raise the bar on the video they deliver to consumers. I don't see why this wouldn't happen to audio given similar circumstances.

"However, even the standard CD format comes in a much lower resolution—just 16-bit 44.1kHz. Compared to 24-bit 192kHz digital audio, a finished CD only has roughly 15 percent of the information captured during the recording process."

That's pretty misleading. If we could hear higher than 20kHz or needed 144dB of dynamic range, this would be a useful measurement. But CD captures 100% of information under 20kHz, and has 96dB of dynamic range, which is far, far more than almost any audio recording ever released (except perhaps those demo discs of cannons being shot off). 24-bit and 192kHz is totally overkill for music listening by humans.

Having lots of overhead is useful in the recording process, since doing processing on the files often yields a better result at 24 bits. But as an end-user delivery format, it's like calling a letterboxed 4x3 image better than a native 16x9, because the 4x3 has all that extra empty picture in it.

This is a common misconception. You confuse the format's sample rate and bit depth with its ability to capture a certain range of frequencies and dynamic range. The overhead you speak of as being beneficial during production is still worthwhile to retain in the finished product. Otherwise, any 128kbps MP3 would be as good as a studio master: the frequency response, for example, is certainly there in the MP3.

Resolution is about "shades of gray", not a checklist of commonly offered colors that ignores what's in between. The JPEG analogies in the article have real merit.

And, someday there's likely to be something even better than AAC. So instead of revising tracks "three, four, even five times until I got something that compared well with the CD", the smart thing would be to give customers the same data as the CD. Then, when the "Super Advanced Audio Coding" arrives, Apple can make it a transcoding option and people won't need to re-buy new and improved versions of all their songs.

I think that's basically what Apple is saying to producers in their guidelines: "Apple suggests that submitting high-resolution audio files will become more important down the road."

The whole article should be taken down because it contains inane ramblings with no basis in reality. Subjective claims that indirectly comment on the state of digital audio should not be published as a featured article on Ars Technica.

Everybody involved in recording music checks their mixes and masters on a variety of systems. Audio engineers and musicians use iPods and listen to music on laptop speakers as well, so they understand the need for a mix that translates well to all mediums. Mastering specifically for the iTunes application will definitely be a running joke in the pro audio world thanks to this pointless little marketing fiasco from Apple.

Unless I'm quite confused, the psychoacoustic model is peculiar to the encoder and not the compression standard. So if Quicktime's AAC output is strange, why doesn't Apple fix that instead of doing this awkward charade.

I'm assuming this is some of the misleading language in the article. My interpretation of this is that the output from psychoacoustic models currently in use is hard to predict in general, and not that AAC or Apple's implementation are any "stranger" than the other lossy standards currently available.

The overhead you speak of as being beneficial during production is still worthwhile to retain in the finished product.

Why? Does it provide any benefit for playback purposes? For encoding purposes?

Comparing the lower sample rate with the loss of information that occurs from compression is an apples-to-oranges comparison. There are benefits to having a lossless source (for format-shifting without quality degradation in the future, for example) compared to having only a lossy one, even if the lossy one is perceptually transparent compared to the lossless.

Man, when the real audio guys see this article, they're going to rip you to shreds.

Quote:

(Returning to our earlier photo analogy, the process is similar to converting a 14-bit RAW file from a DSLR into a standard 8-bit TIFF.)

This is WRONG. Utterly, absolutely WRONG. It's magical thinking. It has nothing to do with the physical reality of digital audio. It's voodoo, and you're passing it off as science. You should drop this entire article, or else get y'all some learnin' from the audio guys and post a correction and update, because it is laughably wrong. On an intuitive basis, we think that's how sound SHOULD work -- I used to make arguments much like that. But I was wrong, and so are you.

A CD will reproduce any signal under 20kHz, and with 96dB or less of dynamic range, perfectly. If your source signal is within those constraints, you absolutely cannot gain anything by going to higher sample rates or bit depths.

The reason they work in 24 bits is because, when mixing tracks, you get digital errors. In a 16-bit environment, that noise accumulates, and can become audible if you massage a track enough times. By working in 24 bits, you keep the errors way down below the audible range, and then they just disappear when you remaster to 16 bits.

Once it's actually mastered to disk, and it's not being massaged anymore, a 16-bit, 44.1kHz track is exactly identical to a 24-bit, 192kHz track, at least under 20kHz. The higher-sample-rate track will allow much higher treble frequencies, but there's little sign that humans can hear that high. And the extra bits? Those will avail you precisely nothing. An improved signal-to-noise ratio is meaningless when all the music is jammed into the upper 5 percent of the volume range.

All we really need is just a good mix into 16 bits. That would fix the problem forever. But then they couldn't keep selling you the same music over and over and over.
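The error-accumulation argument above is easy to demonstrate with a toy model: run the same chain of gain changes at 16-bit and 24-bit precision, re-quantizing after every stage the way a fixed-point mixer would, and compare against a full-precision float reference. (The signal and gain values here are made up purely for illustration.)

```python
import math
import random

random.seed(1)
signal = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(2048)]
gains  = [random.uniform(0.5, 1.5) for _ in range(50)]  # 50 processing stages

def process(sig, bits=None):
    """Apply all gain stages; if bits is set, re-quantize after each one."""
    scale = (2 ** (bits - 1)) - 1 if bits else None
    out = list(sig)
    for g in gains:
        out = [x * g for x in out]
        if bits:
            out = [round(x * scale) / scale for x in out]
    return out

ref = process(signal)  # full float precision, no re-quantization
for bits in (16, 24):
    err = max(abs(a - b) for a, b in zip(process(signal, bits), ref))
    print(f"{bits}-bit chain, worst-case error vs. float: {err:.2e}")
```

The 16-bit chain accumulates far more rounding error than the 24-bit one; once processing stops and the result is written to 16 bits a single time, that accumulation stops too, which is the point being made above.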

This is a common misconception. You confuse the format's sample rate and bit depth with its ability to capture a certain range of frequencies and dynamic range. The overhead you speak of as being beneficial during production is still worthwhile to retain in the finished product. Otherwise, any 128kbps MP3 would be as good as a studio master: the frequency response, for example, is certainly there in the MP3.

No, the bit depth in fact represents the ability to capture dynamic range and the sample rate represents the ability to capture frequency range, according to Nyquist. These things are directly and mathematically linked.

If you're capturing 16 bits at 44.1kHz, then by definition you're getting a bitstream of about 1.4Mbps. You can losslessly compress this by maybe half at best. To get down to 128kbps, you're going to have to use lossy compression, and sound quality will suffer noticeably.
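The raw PCM rate is straightforward to compute:

```python
# CD-quality PCM: 16 bits x 44,100 samples/s x 2 channels.
raw_bps = 16 * 44_100 * 2
print(f"raw PCM: {raw_bps / 1e6:.2f} Mbps")                    # 1.41 Mbps
print(f"128kbps needs {raw_bps / 128_000:.1f}:1 compression")  # 11.0:1
```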

Quote:

Resolution is about "shades of gray", not a checklist of commonly offered colors that ignores what's in between. The JPEG analogies in the article have real merit.

No, it isn't, at least when implying that a 16/44 file contains 15 percent of the *useful* information that a 24/192 file has. It's 15 percent as big, sure, but there's not much useful, audible data you can add with those kinds of rates. As another dubious analogy, it's like taking a 15-page document, adding 85 empty pages, and then saying the original document is only 15 percent as good. (I'll grant that upping the sample rate to 48 or even 88.2 might eliminate some extreme high-frequency goofiness caused by filtering, but 192 is like swatting a fly with a sledgehammer.)

A 500 megapixel badly-framed photo with bad lighting is still a worthless photo, no matter how worthy the subject is or how nicely it's displayed. 24 (or 32 or 64) bits of bass-boosted, overcompressed audio is still shit no matter how good the song or speakers are.

In general, my opinion has always been that content producers should always produce for ideal circumstances, and make available (at a minimum) that ideal version. Broadcasters should work to pass this along with as little modification as possible. The listeners are free to choose equipment, enhancements, and modifications to suit their intended usage scenario. In the end, it's much easier for consumers if they can simply look for "the best" equipment without having to take into account the possibility that a recording is non-standard.

This also goes for e-books, websites, video, and anything else, really. You send me a standardized, no-assumptions-made format designed for one or a few ideal situations. I'll be the one to decide whether I want custom fonts, complete or partial CSS3 support, brighter colours, or a particular aspect ratio.

That study showed that with enough exposure to crap, people will learn to prefer crap because it's all they know. That's not the same as saying they implicitly, and without any other influences, would continue to do so.

You can carry this to its logical extension and see how the students would eventually never want to hear live musicians perform because it wouldn't sound "as good" as their iPods. If that makes sense to anyone, then so does the notion that preferring a lower-quality recording is the worthy goal, instead of one that more faithfully recreates the original performance.

The question remains as it always has: strive for better, or be happy with junk and pull everyone else down as well, who might not appreciate that?

And that 8-bit TIFF is exactly the same as that 14-bit RAW for those colors that fall within the same color space. How is the comparison "utterly, absolutely wrong magical voodoo thinking"?

Eyes do not work the same way ears do, and don't have the same limitations. This is again an apples-to-oranges comparison. The equivalent of frequencies outside the hearing range for images would be the infrared and ultraviolet wavelengths.

Nice read. Granted, given how I listen to music, I never heard all the "good stuff," even on CD. I'm glad this is being done for audiophiles, people with better equipment, and everyone else with better ears than I got ^_^

Having a "Mastered for [lossy] iTunes" section is not doing audiophiles any favors. In fact, it is a bit insulting to their intelligence to promote these as tracks that let them "Experience music as the artist and sound engineer intended."

Unfortunately, I'm a little bit perplexed - I have two copies of "Between the cheats" by Amy Winehouse. One is matched with iTunes Match and is 8.3mb. The other is the newly purchased "mastered" copy and is 7.4mb. Can someone explain to a newb why this is? They also have identical sampling rates which I wasn't expecting, perhaps someone could help me out? :)

Assuming they're VBR files (and not just the same track encoded at different bitrates), often remastered copies will compress better since they tend to have the dynamics crushed out. Less dynamics == less data to store == smaller files.

FWIW, it's pretty rare that I prefer newer masters to originals. The trend these days seems to be to blow the loudness out of proportion by crushing dynamics. Great for listening on an iPod with crappy headphones in a noisy room, but terrible otherwise.

This is a common misconception. You confuse the format's sample rate and bit depth with its ability to capture a certain range of frequencies and dynamic range. The overhead you speak of as being beneficial during production is still worthwhile to retain in the finished product. Otherwise, any 128kbps MP3 would be as good as a studio master: the frequency response, for example, is certainly there in the MP3.

No, the bit depth in fact represents the ability to capture dynamic range and the sample rate represents the ability to capture frequency range, according to Nyquist. These things are directly and mathematically linked.

If you're capturing 16 bits at 44.1kHz, then by definition you're getting a bitstream of about 1.4Mbps. You can losslessly compress this by maybe half at best. To get down to 128kbps, you're going to have to use lossy compression, and sound quality will suffer noticeably.

Quote:

Resolution is about "shades of gray", not a checklist of commonly offered colors that ignores what's in between. The JPEG analogies in the article have real merit.

No, it isn't, at least when implying that a 16/44 file contains 15 percent of the *useful* information that a 24/192 file has. It's 15 percent as big, sure, but there's not much useful, audible data you can add with those kinds of rates. As another dubious analogy, it's like taking a 15-page document, adding 85 empty pages, and then saying the original document is only 15 percent as good. (I'll grant that upping the sample rate to 48 or even 88.2 might eliminate some extreme high-frequency goofiness caused by filtering, but 192 is like swatting a fly with a sledgehammer.)

Nyquist only covers one easily-digestible aspect of what a given sample rate accomplishes. Take a look at the number of samples taken to define one digital word. More samples is a more clearly defined facsimile of the natural waveform.

You remember the stairstep graphs, right? The ones with lots of tiny stair steps more closely approximating a curve? That's more samples taken, and is unrelated to frequency response.

And that 8-bit TIFF is exactly the same as that 14-bit RAW for those colors that fall within the same color space. How is the comparison "utterly, absolutely wrong magical voodoo thinking"?

Eyes do not work the same way ears do, and don't have the same limitations. This is again an apples-to-oranges comparison. The equivalent of frequencies outside the hearing range for images would be the infrared and ultraviolet wavelengths.

Leaving aside how we perceive the difference in the resulting output, what happens to the files at the binary level is similar. The comparison was maybe more incomplete than wrong, but it certainly wasn't utterly, absolutely wrong magical voodoo.

(And, no, of course eyes don't have the same limitations as ears. They have different limitations, but limitations just the same. There are plenty of ways you can compare the two.)