19 May, 2003, 12:08:44 PM

Folks, interesting review of the SACD vs. CD audio mastering of Pink Floyd's "Dark Side of the Moon" on the Stereophile web site. It seems that mastering quality plays a significant role in the sound difference between the two.

It is a nice article. Now could someone explain why the "peak limiting" process damages the sound, and how much? I use the Waves L2 Ultramaximizer plug-in at home (which is essentially a peak limiter plus a dithering & noise shaping tool). I wrote about this here before. From what its manual says, it seemed to me like a kind of local replaygaining, using some sort of look-ahead to identify the peaks that should not affect the average sound volume. Therefore you can set the average volume to your regular listening level without worrying about the peaks. Of course, how aggressive it is depends on you, because you determine the level at which those peaks are limited.
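For what it's worth, here's how I picture the look-ahead idea, as a toy Python sketch (purely hypothetical; the function name, window size, and ceiling are my own invention, not Waves' actual algorithm):

```python
def lookahead_limit(samples, ceiling=0.5, lookahead=4):
    """Toy look-ahead brick-wall limiter.

    For each sample, look at the loudest peak within the next
    `lookahead` samples and reduce the gain just enough to keep
    that peak under `ceiling`.  A real limiter would smooth this
    gain envelope; this sketch applies it directly.
    """
    out = []
    for i in range(len(samples)):
        window = samples[i:i + lookahead]
        peak = max(abs(s) for s in window)
        gain = min(1.0, ceiling / peak) if peak > 0 else 1.0
        out.append(samples[i] * gain)
    return out

# Quiet material passes untouched; only the region around the
# upcoming peak gets turned down:
print(lookahead_limit([0.1, 0.9, 0.2], ceiling=0.5))
```

The point of the look-ahead is that the gain comes down *before* the peak arrives, instead of reacting late the way a plain feedback compressor would.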

The author of the article mentions "severely clipped transients" and the waveform of the CD layer having a squared-off shape at the peaks (a result of peak limiting). I'd like to hear your opinions on how harmful this is to the fidelity.

The object of mankind lies in its highest individuals. One must have chaos in oneself to be able to give birth to a dancing star.

A very interesting article. One of my favorite albums of all time, so I've been wondering how they would handle it. I wouldn't hazard a guess as to whether the differences were intentional (OK, I wouldn't doubt it); but having purchased CDs from the first few months they were offered to the public up to the present (yeah, that makes me really old!), I can say with certainty that over the years CDs have gotten louder and have more 'punch'. If you listen to an older CD, you want to increase the volume or adjust the equalization. So it seems like a foregone, and sad, conclusion that most CDs have been butchered over time to obtain the reported 'punch' and volume. So much for the 'perfect' digital copy that was promoted as the CD's selling point when they first came out, and maybe why LPs sounded warmer. Anyway, just an old guy's observation. Thanks for an interesting link.

When I first heard about the re-release, I was actually expecting something like this to happen - the CD layer compressed to hell and sounding much inferior, either to show the world how much of an improvement SACD is sound quality-wise (ha ha), or just to follow the current trend.

This is what the 1992/1994 remastered version looks like, by the way. Heh. Apart from the multi-channel sound (which would be artificially introduced afterwards and not intended by the artist in this case), there's probably zero difference between this and the SACD (DSD) version (there's no tape hiss on the CD version that could have been removed etc., it's all good). The CD version just has the advantage of being playable anywhere and being less expensive - my choice is clear!

Quote

atici wrote:The author of the article mentions "severely clipped transients" and the waveform of the CD layer having a squared-off shape at the peaks (a result of peak limiting). I'd like to hear your opinions on how harmful this is to the fidelity.

Many people say that dynamic range compression (which is, essentially, peak limiting) takes the crispness and sharpness off transients. That's one point, but often it only accounts for a small change in sound (the change in volume left aside for a moment) - for example, I can't ABX the difference between the remastered and original versions of the loud instrumental part (using a 20 second clip thereof) of Dire Straits' "On Every Street". (I used ReplayGain on the excerpts before ABXing, of course.) Here are waveform graphs of the original and remastered versions - you be the judge whether or not this is actually "aggressive" compression. I think it's quite heavy already (it becomes quite obvious when you look at the waveform graphs of the replaygained versions [original / remastered] - the remastered version looks like the peaks have been sawed off in a straight line), but it's nothing like modern albums (track gain -2.62 dB vs. -6.20 dB).

[I just had a thought - perhaps we could organise a small listening test to find out whether anyone can actually tell the difference between remaster and original, using different samples off different albums.]

But what I personally find most annoying about excessive compression is that it just takes the fun out of the music! When listening to loud albums on my CD player, I find myself constantly turning down the volume - until I'm fed up and either turn it *really* low so I just hear some background noise, or turn it off completely. (Bruce Springsteen - The Rising is a good example. It's not LOUD by current standards [album gain approx. -7.5 dB IIRC], nor is it distorted, but nevertheless, except for one *very* quiet track that somehow doesn't fit in there loudness-wise, it's ALL AT THE SAME CONSTANT VOLUME. It gets boring after a few tracks, and I seldom listen through the album completely.)

Of course, if you really go over the top with peak limiting (like here), you're going to introduce horrible distortion. That makes any album sound like crap because it's plain bad quality. Add to that the fatiguing effect, and you won't get much fun out of your music at all.

So... the most annoying thing about dynamic compression is to me that the music gets boring if everything is at the same loudness level.

I'll have to look at my Mobile Fidelity "Ultradisc" of DSotM (circa 1989) to see how it was mastered re: compression. Anyone already looked at it?

The trend toward extreme compression is pretty annoying, actually. Dynamic contrast is a very powerful part of music expression - I guess a dynamic recording wouldn't sell well to the masses as it would get compressed by the radio stations anyway.

Perhaps some day the tendency to overcompress will fade and we'll have more involving recordings. Right now it's like everything is "turned up to eleven."

Quote

It is a nice article. Now could someone explain why the "peak limiting" process damages the sound, and how much? I use the Waves L2 Ultramaximizer plug-in at home (which is essentially a peak limiter plus a dithering & noise shaping tool). I wrote about this here before. From what its manual says, it seemed to me like a kind of local replaygaining, using some sort of look-ahead to identify the peaks that should not affect the average sound volume. Therefore you can set the average volume to your regular listening level without worrying about the peaks. Of course, how aggressive it is depends on you, because you determine the level at which those peaks are limited.

The author of the article mentions "severely clipped transients" and the waveform of the CD layer having a squared-off shape at the peaks (a result of peak limiting). I'd like to hear your opinions on how harmful this is to the fidelity.

The most damaging form of "peak limiting" (a form of dynamics compression) is when you _hard limit_ peaks.

With hard limiting, every amplitude above a certain level, say -3 dB, gets converted to exactly that level (-3 dB in this case). Hard limiting can turn sine waves into waves with "flat heads," which adds a lot of frequencies that weren't present before, and sounds a lot like clipping. (Actually, clipping *is* a hard limit, where the maximum value storable in a WAV is the limit.)
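To make the "flat heads" concrete, here's a tiny Python sketch (my own illustration, with an arbitrary threshold of 0.7):

```python
import math

def hard_limit(x, threshold):
    # Anything beyond +/- threshold is converted to exactly the
    # threshold value; this is what squares off the waveform tops.
    return max(-threshold, min(threshold, x))

# One cycle of a sine wave whose peaks exceed the threshold:
sine = [math.sin(2 * math.pi * n / 32) for n in range(32)]
flattened = [hard_limit(s, 0.7) for s in sine]

# Samples below the threshold are untouched; the peaks become a run
# of identical values, i.e. a flat top full of new harmonics.
```

The flat top is exactly the "squared-off shape at the peaks" the article describes.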

As more dynamics compression is used, the music sounds louder at the same (player) volume level, and sections of music sound more similar in loudness. Also, more new frequencies are introduced, and it gets closer to the virtual clipping effect of a hard limiter.

Really, when you think about it, dynamics compression is a way to get music *to* clip, but with less audible distortion.

As dynamic compression gets used more & more to make louder & louder music, it gets closer to sounding like plain clipping, which is what dynamics compression is trying to avoid in the first place.

LOL, that's a damn good question. Just because DVD-A can store more values, doesn't mean that the peak value will be any louder than the peak of a CD, so you'll probably still see dynamics compression even in 24-bit audio.

And since dynamics compression brings everything up to a normal volume, there will be no need for 24-bit's added dynamic range anyway. And the reduced quantisation noise will be drowned out compared to the distortion that extreme compression makes.

Quote

And since dynamics compression brings everything up to a normal volume, there will be no need for 24-bit's added dynamic range anyway. And the reduced quantisation noise will be drowned out compared to the distortion that extreme compression makes.

Hmm, but in 24 bit audio there will be fewer problems related to the cutoff after peak limiting, because the range is higher, so the peak limiter does not need to be as aggressive. Isn't that the case? (Assuming you listen at the same volume.) After all, the reason one uses a peak limiter is the limited dynamic range, to prevent clipping; otherwise a simple change in volume would solve it. Now, doesn't 24 bit audio also enlarge the interval of wave amplitude it encodes? You would have been right if it merely divided each interval of 16 bit audio into 256 steps...

The problem is that you want to listen to your music at some volume, but the 16 bit range does not allow this, so you clip the peaks so that the part away from the peaks is encoded more precisely, at the expense of introduced problems in the peaking regions. I think the number of people who prefer the CD layer is also an indication of this...

Is it indeed possible, in a professional test setting, to ABX between 24bit/96kHz and a 16bit/44.1kHz downsample of the same song? It should depend on the sample, but I think most members of HA think 24bit/96kHz is overkill and could not be ABXed.

Last Edit: 19 May, 2003, 10:11:19 PM by atici


Quote

Hmm, but in 24 bit audio there will be fewer problems related to the cutoff after peak limiting, because the range is higher, so the peak limiter does not need to be as aggressive. Isn't that the case? (Assuming you listen at the same volume.) After all, the reason one uses a peak limiter is the limited dynamic range, to prevent clipping; otherwise a simple change in volume would solve it.

The range of a 24-bit recording is no "louder" than a 16-bit recording, it is just more accurate. As I said above, the peak value on a CD should be no louder than the peak value on a DVD-A.

You're assuming each amplitude level (65,536 for 16 bits, 16,777,216 for 24 bits) represents the same change in volume, which it doesn't. It *could*, but then DVD-Audio would be, oh, 256 times louder than CD at their respective peaks, with the quantization noise staying at the same volume level as 16 bits. (So you'd need to introduce dither, just as in 16 bits, assuming that no one would make the music on a DVD-A that insanely loud.)

Even if the volume levels of DVD-A and CD aren't equal, what's to prevent the use of dynamic compression just to make the music sound the same volume level? Not to mention, volume has gone up & up on CDs, so why should DVD-As be any different? People still want their music to be louder than the next guy's. Eventually it'll reach that top bit, and dynamic compression will be used out of "necessity."

It's not a format change that's necessary to solve the overcompression problem; it's the whole cycle that needs to stop.

Quote

Is it indeed possible, in a professional test setting, to ABX between 24bit/96kHz and a 16bit/44.1kHz downsample of the same song? It should depend on the sample, but I think most members of HA think 24bit/96kHz is overkill and could not be ABXed.

Assuming you have a 24-bit processing & outputting card, probably so. Someone at least convinced ff123 that he could hear 24-bit vs. 16-bit on a 24-bit soundcard, with real-world-instrument music.

While some people can probably ABX 24 bit vs 16 bit, I'm certain you couldn't ABX 24bit/44.1kHz vs 24bit/192kHz, unless you had extraordinarily acute hearing and the sample was high-passed. Even though some people can hear near or beyond 22.05 kHz, sensitivity drops off significantly at those frequencies, and lower high frequencies in the music would definitely mask such borderline audible content.

I've compared the "Shine On" boxed set version with the standalone version (still sold in stores today) and the redbook layer of the SACD, and I have some conclusions:

1. The SACD might be decent (I can't test it), but the redbook layer of this CD is made from master tapes similar to the ones Shine On was made from, only they boosted whatever they thought would attract more FM radio listeners. As a result, not only is the tape hiss boosted, but there's so much less Parsons-ism in it that I can only feel sad for Alan Parsons and the fact that millions of young fans will be exposed to this version and not the Shine On one.

2. The standalone version sucks. EMI must have tried to make a quick buck here. There are sound drops in too many places and David Gilmour's guitars are much less noticeable and quite muffled. Shame.

3. I believe the MFSL version sounds just as good as the Shine On version, only different in some ways. If anyone would like to post 5-10 seconds from "Time" or "Money" (.wav), I'd appreciate it.

And if Warhol's a genius, what am I? A speck of lint on the ***** of an alien

Quote

The range of a 24-bit recording is no "louder" than a 16-bit recording, it is just more accurate. As I said above, the peak value on a CD should be no louder than the peak value on a DVD-A.

You're assuming each amplitude level (65,536 for 16 bits, 16,777,216 for 24 bits) represents the same change in volume, which it doesn't.

Are you sure about that? I think 24 bit audio also encodes a larger interval of amplitude. I am not assuming that each level represents the same change in volume. But I guess you assume that each step in the 24 bit case is exactly 1/256 of the volume change per step in the 16 bit case.

Because otherwise peak limiting makes sense to me. With 16 bits in hand, you want to encode your music as precisely as possible. And since the peaks occupy a few percent of your music, you can compromise on them for a more accurate encoding of the rest: when you increase the volume before encoding, you can now distinguish between two points that used to have close volume levels and were therefore encoded with the same 16 bit value before. But all of this is based on the assumption above, that 24 bit audio also increases the highest volume that can be encoded.

Quote

It's not a format change that's necessary to solve the overcompression problem; it's the whole cycle that needs to stop.

I think that is stereotyping everyone who does mastering. I am sure there are people (especially in the classical music business) who know what they're doing and would make the most of 24 bits. Just because the Average Joe likes louder music doesn't mean the record companies are dumb enough to produce a master with severely cut-off peaks. And I don't understand the idea behind "Average Joe likes it louder" either. Who could be so dumb as not to touch the volume knob if the recording volume is low? If you like it loud, make it loud.


I bought the SACD version (even though I already own the CD), and I don't find it that bad. IMO, some CD releases are worse than the SACD CD layer (I can't test the DSD layer, since I have no SACD equipment).

Of course, nothing beats the sheer superiority of Shine On version. (Thanks for the hints, Seed)

I have been pondering the peak limiting issue in a naive sense, and I came across this. Frank Klemm again!

Now, given any analog wave, we aim to quantize it in x bits. What volume should we set? If we set the volume low, then during quantization we'd lose more information than we otherwise would, because our ADC couldn't distinguish differences as accurately as it could if we had increased the volume. However, we don't want to increase the volume so much that our signal goes beyond the boundaries of the amplitude range we can represent with our x bits. So the most conservative solution is to increase it so that the highest peak in the signal (during a single mastering cycle) sits at the highest amplitude our x-bit system can represent. Naturally this is not the best solution, because there will be (and there are, with most of the music I listen to) some peaks that do not reflect the general flow of the signal. Because there are probably very few of them, and these peaking regions constitute a small fraction of the piece, at the expense of losing quality in those regions (because of the peak limiting and the cut-off transients) we'd like to gain quality (higher bit resolution) for the rest of the signal. I guess that sums up the peak limiting process in my mind. Therefore peak limiting is essential whether you encode with 16 bits or 24 bits, and should not be denounced. But it is no panacea and should be used with caution: as opposed to the most conservative solution above, in which we don't achieve the full potential of our x bits, we could raise the volume so high, and then cut the peaks off, that we're left with a totally distorted piece of s**t.
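That "most conservative solution" (highest peak exactly at full scale, no limiting at all) can be written down in a few lines of Python (a toy sketch of my own, using 16-bit full scale = 32767):

```python
def normalize_peak_to_full_scale(samples, full_scale=32767):
    """Scale the signal so its single loudest peak sits exactly at
    full scale, then quantize.  No peak is clipped, but if that one
    peak is unrepresentative, every other sample ends up encoded
    with fewer effective bits than it could have been."""
    peak = max(abs(s) for s in samples)
    gain = full_scale / peak
    return [round(s * gain) for s in samples]

# A single rogue peak of 0.25 forces the rest of the signal down:
quantized = normalize_peak_to_full_scale([0.1, -0.25, 0.05])
```

If you could limit that one rogue peak first, the gain could be higher and the quiet samples would use more of the 16-bit grid, which is exactly the trade-off described above.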

Now for the SACD vs. CD case: probably the signal amplitude range a 24 bit SACD can represent is higher than that of the 16 bit CD (could someone clarify this? that's why most professional plugins use >16 bit internal processing, mainly to prevent clipping within the plug-in processing), so the volume of the SACD should have been higher, which is confusing me. Even in the most conservative sense I mentioned earlier, we want the highest peak to sit at the maximum representable amplitude, which is higher for the SACD. So I don't understand why the CD layer sounds louder.

Please point it out if I have made any mistake in my reasoning...

Last Edit: 20 May, 2003, 02:18:39 AM by atici


Peak limiting has nothing to do with resolution. Resolution in bits is just related to the maximum SNR (signal-to-noise ratio) of the signal. Loudness just depends on how you set your amp volume level; it has nothing to do with the number of bits.

So, in practice, resolution just sets the minimum value of the noise floor (quantization noise). With 16 bits, the noise floor can be as low as around -94 dB below full scale. With 24 bits, it can be as low as -144 dB below full scale, though in practice quite a bit less due to real-world electronics.
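Those figures follow from the roughly 6 dB per bit rule. A quick Python check (the exact constant quoted varies a little depending on assumptions about dither and signal shape, hence the "around"):

```python
import math

def quantization_range_db(bits):
    # Ratio between full scale and one quantization step,
    # about 6.02 dB per bit.
    return 20 * math.log10(2 ** bits)

print(round(quantization_range_db(16), 1))  # about 96.3 dB
print(round(quantization_range_db(24), 1))  # about 144.5 dB
```

The 8 extra bits buy 20*log10(256), i.e. roughly 48 dB of extra range, all of it at the quiet end.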

Peak limiting is not needed at all, given how low this quantization noise floor is. It can be useful for avoiding extreme transients, and for making CDs sound louder.

An example of this is Dire Straits' "Brothers in Arms" original 1985 CD. It is one of the first good CDs produced. It sounds very quiet in comparison with most CDs, I'd say because the 16 bit dynamic range is wisely used and there is little peak limiting or compression. If you pump up the volume a little, the CD will sound fine, you won't hear any noise floor (because 16 bits gives plenty of headroom for this), and you have access to great dynamics. This also happens with good classical music recordings, but it is very infrequent in today's pop/rock music.

Severe peak limiting and overcompression are used to give a characteristic "homogeneous," FM-type sound to the music, and to make it sound as loud as possible in comparison with other CDs at the same amp volume. But the dynamics are also wrecked as a consequence.

I meant the effective resolution, i.e. what it sounds like... squeezing the most out of our bits.

Quote

Loudness just depends on how you set your amp volume level; it has nothing to do with the number of bits.

So doesn't a given CD have an intrinsic sound level? That contradicts the same recordings having different volumes (take this album, for instance).

Quote

With 16 bits, the noise floor can be as low as around -94 dB below full scale. With 24 bits, it can be as low as -144 dB below full scale, though in practice quite a bit less due to real-world electronics.

This is due to the inherent thermal noise in the DACs, I guess, right?


Quote

I meant the effective resolution, i.e. what it sounds like... squeezing the most out of our bits.

Yes, but today's overcompressed CDs have an effective resolution (= dynamic range) of as little as 20 dB, while 16 bits allows 94 dB!!

Quote

Quote

Loudness just depends on how you set your amp volume level; it has nothing to do with the number of bits.

So doesn't a given CD have an intrinsic sound level? That contradicts the same recordings having different volumes (take this album, for instance).

True, but the final loudness is set by you and your player's electronics, not the CD. And there's no reason why 24 bit should sound louder than 16 bit; that just depends on the nominal output level your player's electronics produce at digital full scale. In 24 bit sound cards, this full-scale output level is usually the same when playing 16 or 24 bit data; only the quantization noise level changes.

Quote

Quote

With 16 bits, the noise floor can be as low as around -94 dB below full scale. With 24 bits, it can be as low as -144 dB below full scale, though in practice quite a bit less due to real-world electronics.

This is due to the inherent thermal noise in the DACs, I guess, right?

Quote

atici wrote:but that's why most professional plugins use >16 bit internal processing techniques: mainly to prevent clipping within the plug-in processing, so the volume of SACD should have been higher, which is confusing me

That's one thing I don't quite understand either (probably because I lack basic knowledge about the concepts of digital audio). If the maximum volume is the same in 24-bit and 16-bit audio, how can a sample that would be clipped in 16-bit audio not clip in 24-bit? What I'm imagining is something like this (grossly oversimplified):

I'm thinking that 24-bit audio just allows for more precise sample values, which I believe is correct. But there's something I'm missing; if anybody could enlighten me, I'd be grateful.

(Someday I'm gonna order a book on this subject, I surely want to become more educated on these matters.)

Seed:

Quote

I've compared the "Shine On" boxed set version with the standalone version (still sold in stores today) [...]

Uh? I always thought the "standalone" version and the Shine On version would have been identical, because the standalone versions of the Pink Floyd Remasters say "Digital Remasters (P) 1992", which is the same year the Shine On set was released, AFAIK. Hmm, that's very interesting.

Quote

Now, given any analog wave, we aim to quantize it in x bits. What volume should we set? If we set the volume low, then during quantization we'd lose more information than we otherwise would, because our ADC couldn't distinguish differences as accurately as it could if we had increased the volume. However, we don't want to increase the volume so much that our signal goes beyond the boundaries of the amplitude range we can represent with our x bits. So the most conservative solution is to increase it so that the highest peak in the signal (during a single mastering cycle) sits at the highest amplitude our x-bit system can represent. Naturally this is not the best solution, because there will be (and there are, with most of the music I listen to) some peaks that do not reflect the general flow of the signal.

But they do represent the amplitude of the actual waveform at that point. To change them is to distort it.

You're arguing that it's better to distort that one moment than to put up with lower resolution during the majority of the music (due to lower volume). For this latter problem to be at all significant, the music has got to be getting down below 70 dB BELOW that peak value. With classical music, this may happen very occasionally. With pop and jazz and rock, it virtually never does.

So your approach will distort that moment's transient for an inaudible gain in resolution (i.e. reduction in noise) during the rest of the music. What if you can hear the effect of reducing the amplitude of that transient? Then you've added an audible distortion to remove an inaudible problem - madness!

In practice, the records that could (just possibly) benefit from this (wide-dynamic classical recordings) can't use it, because the loudest transient isn't a single sample: the signal stays that loud (within 1-2 dB) for quite a while. Reducing those peaks is audible, so it's not acceptable.

There are plenty of audiophile releases which maintain the peaks, and have a very low average volume. But turn your volume control up, and they sound great. No resolution or noise problems - just very high quality sound. I'm thinking of Chesky, Telarc etc etc etc

More recently, all these remastered albums were made available separately; and additionally, most other Floyd albums have been remastered in the same manner. The only exceptions are the compilation album Works and the most recent albums which have no need for touching up (DSoT, TDB, and p.u.l.s.e).

These remasters are based on the original master tapes, and were done by Doug Sax (supervised by James Guthrie) at the Mastering Lab, in Los Angeles. They generally represent a higher level of quality than the previous Harvest discs (which in turn were generally superior to the Capitol and CBS discs sold in the US). In addition to the heightened sound quality, the remastered editions feature (in almost all cases) expanded booklets with new artwork and lyrics (even on the early albums!); the discs themselves are all picture discs.

NOTE: There has been some disagreement over whether the new EMI discs that have _Shine On_ counterparts are or are not identical. The general consensus is that they are; and if they are not, then they were at least done by the same people, at the same location, with the same equipment, at the same time, and for the same company.

I was puzzled when reviewing the DSotM disc: the CD layer sounded more aggressive than the hybrid's SACD tracks. Not having access to the test equipment JA has on hand, I chalked the differences up to varying characteristics of the two analog-to-digital converters (one PCM-based, the other DSD) used for each layer and the more laid-back qualities of SACD sound.

This is so typical of Hi-Fi reviewing, and what makes me distrust 99% of it. "I chalked the differences up to varying characteristics of the two analog-to-digital converters". Surely anyone who claims to be qualified to judge the sound quality of CDs should know the sound of bad CD mastering, overcompression, peak limiting, etc., and be able to pinpoint and recognise it. They need to know (a) what CD is capable of, and (b) how it's usually abused.

To assume that the difference between the two versions was due to the different formats is exactly what the audiophile crowd want to believe. Maybe it's unfair to blame Jon, since he had been told that the masters were identical - what else should he think?

However, since the record companies have a vested interest in making SACD sound better than CD (and, thus, CD sound worse than SACD!), any review of this nature must check for foul play. Not to do so is negligent. I'm glad that the two reviews (sound and technical) were published, to make the picture clear. As Jon says, "EMI and Sony have conspired to place DSD in a more audiophile light with this manipulation—which is troubling when you start to ponder which other hybrids might have been altered in this manner." - too right! How many have been unquestioningly reviewed?

I'd suggest this: when reviewing something that's supposed to be "better than CD", copy it onto a CD! Get a decent CD recorder or sound card+CD burner, and make an analogue>digital copy. Then play it in a decent CD player. Does it still sound "better than the CD"? If so, suspect that things are not as they seem. Does the CD-R sound significantly worse than the "better than CD" format? If not, please have the guts to say so!

If the CD-R in this exercise actually sounds better than all released CD versions (quite likely) then it shows how shoddy typical CD mastering is. Is the answer a new audio format? Well, it can't hurt the record companies. Then in 20 years time they can just do the same thing to DVD-A and/or SACD and start the whole cycle over again!

Quote

Uh? I always thought the "standalone" version and the Shine On version would have been identical, because the standalone versions of the Pink Floyd Remasters say "Digital Remasters (P) 1992", which is the same year the Shine On set was released, AFAIK.

Yes, I agree. But there was an older version before that. If I recall correctly, they were not only remastered but (partly?) digitally remixed (from the multitracks). I'm not 100% sure about DSotM, but at least Atom Heart Mother was clearly a new mix.

Anyway, thanks for the enlightening pics B) and I'm glad I have that early 90's version too. In that period the label "Digitally Remastered" often meant a better-sounding version. On recent issues it's more like a warning: "This has been messed with." I'm pretty sure EMI and others will replace many "Classic Albums" like this with an SACD (hybrid) version. For new releases it will become standard practice. And then it will be true: the SACD layer will always sound better to the audiophiles (you know why).

On a side note: the original CD version of Animals (recorded 1977) had a highest peak of only 75%. That tells you at least that it was mastered before the "loudness race".

Quote

That's one thing I don't quite understand either (probably because I lack basic knowledge about the concepts of digital audio). If the maximum volume is the same in 24-bit and 16-bit audio, how can a sample that would be clipped in 16-bit audio not clip in 24-bit? What I'm imagining is something like this (grossly oversimplified):

[...]

I'm thinking that 24-bit audio just allows for more precise sample values, which I believe is correct. But there's something I'm missing; if anybody could enlighten me, I'd be grateful.

Again, the volume of recording/playback depends on the equipment, not the data format. If a signal gets recorded that clips at 16 bits, it will also clip at 24 bits.

What a higher bit depth allows, however, is that recording engineers can set the volume lower without worrying about quantization noise being too high compared to the recorded signal. Otherwise, the mixing of the tracks, compression, and normalizing would add to, modify, and raise the quantization noise of the recorded clips. (Quantization noise at 24 bits is 256 times quieter than at 16 bits.) Tracks are recorded at 24 bits; processed, mixed, and compressed at 24 bits; and then converted to 16 bits as the final step in producing the CD tracks. The quantization noise of the 24 bit file will probably be lower than anything the 16 bit file can represent, and even if not, it will be drowned out by the much louder (comparatively) quantization noise of the conversion to 16 bits. And since the LAST stage is the bit-depth conversion, that noise doesn't get amplified.
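A crude Python illustration of why the 16-bit step is left for last (my own toy quantizer; a simple 2x gain stands in for all the mixing and processing):

```python
def quantize(x, bits):
    # Round a sample in [-1.0, 1.0] to the nearest level of a
    # signed grid with 2**(bits-1) positive steps, back to float.
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

x = 0.123456789            # "analogue" input sample
target = 2.0 * x           # ideal result of applying a 6 dB gain

# Process at 24 bits, convert to 16 bits as the last step:
err_good = abs(quantize(quantize(x, 24) * 2.0, 16) - target)

# Convert to 16 bits first, then process: the gain stage doubles
# the 16-bit rounding error along with the signal.
err_bad = abs(quantize(x, 16) * 2.0 - target)
```

Here err_good stays under half a 16-bit step, while err_bad grows beyond a full step, because the early 16-bit error was amplified by the processing.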

This topic, which partly addresses the possible quality of the CD format, encouraged me to cover a couple of points about dither, the difference between dynamic range and signal-to-noise ratio, and how to visualise it more like the ear does, instead of using waveform view to develop rules of thumb that aren't quite true of what you can hear. So it's a bit long, with pictures, but I will return to SACD at the end.

I saw the Dark Side Of The Moon SACD with red book CD layer on sale recently and suspected I'd soon read about it on HA.org for falling foul of this trend towards overloud remastering of the CD version, and here it is!

By the way, to visualise the effects of clipping: a pure full-scale sine tone at 1760 Hz (as used below), amplified by just +1.0 dB (about 12% more amplitude) and peak-clipped to full scale, introduces numerous harmonics (overtones) at multiples of the original 1760 Hz frequency:

1760 Hz sinewave, +1.0 dB FS amplitude, clipped, frequency analysis.
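To reproduce that kind of plot numerically (my own sketch, not the measurement behind the picture above): push a full-scale 1760 Hz sine +1.0 dB over full scale, clip it, and inspect the harmonic bins. Symmetric clipping produces odd harmonics at 3x, 5x, ... the fundamental:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs                              # exactly one second
gain = 10 ** (1.0 / 20)                             # +1.0 dB, about 12% more amplitude
clipped = np.clip(gain * np.sin(2 * np.pi * 1760 * t), -1.0, 1.0)

# With a one-second window, FFT bin k corresponds to exactly k Hz.
spectrum = np.abs(np.fft.rfft(clipped)) / (fs / 2)

def level_db(freq_hz):
    return 20 * np.log10(spectrum[freq_hz] + 1e-30)

for f in (1760, 2 * 1760, 3 * 1760, 5 * 1760):
    print(f, "Hz:", round(level_db(f), 1), "dB")
```

The odd harmonics at 5280 and 8800 Hz show up tens of dB above the noise floor, while the even harmonic at 3520 Hz stays essentially absent because the clipping is symmetric about zero.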

Another point, as I understand it, is that SACD's DSD encoding is completely unlike PCM (which is used for CDs and DVD-Audio) and doesn't have a simple maximum sample value as a hard limit. For that reason the SACD spec includes a loudness measurement on the released music to ensure releases stay within spec. In a way this is more like the analogue limits imposed on vinyl mastering to prevent excessive demands on stylus movement (except you don't have to turn down the bass to fit within the limits, then boost it back in a turntable pre-amp). Having a vague limit (backed by specific measurements) can prevent the record companies from pressuring their mastering engineers to make everything sound "hot" and louder than everything else in the CD changer, so SACD, rather like vinyl, cannot so easily become subject to this loudness war.

PCM (CD, WAV, etc) samples the waveform a fixed number of times per second and represents the analogue value with a fixed number of bits, like 16 or 24, giving 2^16 or 2^24 instantaneous levels.

Dither is essential to minimise harmonic distortion and make digital audio behave suitably like analogue; flat dither tends to add about 3 dB more noise, reducing the SNR accordingly.
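That ~3 dB figure can be checked numerically (my own sketch): quantization error alone has a power of about step²/12, and adding one LSB of flat (rectangular) dither before rounding adds another step²/12, doubling the error power, i.e. roughly +3 dB:

```python
import numpy as np

rng = np.random.default_rng(1)
lsb = 2.0 / 65536                                   # one 16-bit step
t = np.arange(44100) / 44100.0
x = 0.5 * np.sin(2 * np.pi * 997 * t)               # a clean test tone

def error_power_db(signal, dither):
    """Quantize (signal + dither) to 16 bits and return the total error
    power relative to the ORIGINAL signal, in dB."""
    quantized = np.round((signal + dither) / lsb) * lsb
    return 10 * np.log10(np.mean((quantized - signal) ** 2))

plain = error_power_db(x, 0.0)
flat = error_power_db(x, rng.uniform(-0.5, 0.5, x.size) * lsb)
print(round(flat - plain, 2))                       # close to +3 dB
```

The measured difference lands close to the theoretical 3.01 dB; the payoff for that extra noise is that the error becomes uncorrelated with the signal, removing the harmonic distortion.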

Noise-shaped dither will reduce the perceived noise by about 15 dB for 44.1 kHz sampling and about 36 dB for 96 kHz sampling (according to Frank Klemm's information), but the measured noise on the waveform view will be larger, reducing the measured SNR. At the same time the perceived SNR has improved, as has the perceived dynamic range.

You can visualise this effect by looking at the spectral plot (e.g. Frequency Spectrum in Cool Edit or Exact Audio Copy), which is relatively close to the way the ear's cochlea works (different parts detect different frequencies). With about a 1024-sample FFT size and a Blackman window function, you get a pretty good picture. A logarithmic frequency scale from about 20 to 20000 Hz is also more ear-like, but doesn't show the energy density of noise-shaped dither clearly, so I'm using linear frequency here.

The tone was created in Foobar2000 v0.62a as Add Location... tone://1760,2 with ReplayGain values manually set to 0 dB for gain, 1.0 for peak, with only the Mono to Stereo DSP used, converting (Diskwriter) to 16-bit PCM dithered, flat dither (no noise shaping), tone generator set to 44100 S/s, oversample 32x. It was analyzed in EAC's WAV editor (which only accepts 44100 S/s, stereo, 16-bit WAVs).

Even on flat-dithered 16-bit audio, as above, you'll note that the noise in each spectral bin is below -96 dB. For an FFT size of 1024 with a Blackman window function (giving a similar time resolution/averaging time to the ear), the noise is about -120 dB in each frequency bin for flat dither. This means the normalised power (not amplitude) in each frequency bin is 1e-12 (10 to the power of -12, or 10^(-12)), and over the whole power spectrum of 512 bins this adds up to 5.12e-10 of total dither noise power. Converting to dB (10 × log10(5.12e-10)), this comes to -93 dB, which is the expected signal-to-noise ratio after adding dither. (Note that a 1024-point FFT gives 512 bins in the power spectrum because of negative frequencies, which make sense in complex mathematics but are ignored for the power spectrum.)
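That arithmetic, going from per-bin noise level to total SNR, can be written out directly (same numbers as in the paragraph, just computed):

```python
import math

per_bin_db = -120.0                            # noise level seen in each FFT bin
bins = 512                                     # 1024-point FFT -> 512 positive-frequency bins
per_bin_power = 10 ** (per_bin_db / 10)        # 1e-12 normalised power per bin
total_power = bins * per_bin_power             # 5.12e-10 summed across the spectrum
total_db = 10 * math.log10(total_power)
print(round(total_db, 1))                      # about -93 dB: the dithered 16-bit SNR
```

This is why the per-bin noise can sit far below -96 dB even though the total dithered SNR is about 93 dB: the noise power is spread across all 512 bins.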

Now, with Garf's strong ATH noise shaping (recommended) dither applied to 16-bit audio in Foobar2000 v0.62a, dither samples can reach a peak value of ±32 with an average RMS power of -70 dB. Although this is about 26 dB more power, it's perceived to be considerably quieter (perhaps 15-20 dB quieter) than flat dither, because most of that extra power is concentrated in high frequencies to which the ear is insensitive, while there's less power at the frequencies where the ear is most sensitive:

That example includes the same 1760 Hz tone at 0 dB FS. This one is silence plus strong ATH noise shaping dither, just for illustration of the dither spectrum alone:

Silence, strong ATH noise shaping dither, frequency analysis.

You can imagine how much more noise we could fit in, up to 24 kHz, if we had a 48 kHz sampling frequency. This might buy a further few dB of noise reduction in the 1-4 kHz region while still preventing truncation distortion, as dither ought to.

Add up the power of all the components and the total dither power comes to -70dB over the full spectrum, but in the important regions it's well below the -120 dB per bin we had with flat dither. It's more like -138 dB per bin, at about 1-4 kHz where the ear is very sensitive (e.g. babies crying!). 18 dB less is about 1/64th of the power spectral density.

Incidentally, the 1760 Hz tone when scanned in FB2K, generates a replaygain of -15.22 dB, so that tells you how piercingly loud the full scale signal would sound if played without RG, partly because it's in the region where the ear is very sensitive.

Now, let's consider how the ear manages to perceive a sine wave supposedly below the noise floor of 16-bit audio, at -102 dB (amplitude = 0.25 bits, peak-to-peak = 0.5 bits). Remember, the simple Signal to Noise Ratio (SNR) is 96 dB, so a signal at -102 dB FS, by that rule of thumb, ought to be simply lost. Thanks to dither and the way the ear works similarly to the frequency spectrum, that's not the case, and the dynamic range for perceiving tonal frequencies is greater than the SNR.

This is where dither is essential. I'll manually set the RG values to -102 dB to create this tone.

Using flat dither (no noise shaping), you cannot discern the 25.06 sample period of the 0.25-bit high sine wave when you zoom in on the waveform:

This is a much better representation of how the ear perceives things than the waveform view where your eye can't pick out the correlated timings of those sporadic spikes in the least significant bit to notice the frequency that's actually present there. The ear contains resonant detectors which can pick out the frequency, rather like a spectrum analyzer.

So people who worry that a -96 dB sine wave would disappear when applying -6 dB of ReplayGain might be reassured by this demonstration. Dither is required to guarantee that it works, although other tones in the music often effectively provide partial dither even with undithered reproduction - for example, simple MP3 decoders in portables don't suddenly start losing the quiet tones within the music when you apply mp3gain or do deep fadeouts using mp3directCut. You can try various things out more audibly with Foobar2000, by playing in the preferences. You may be surprised how good 8-bit 44.1 kHz audio with strong ATH noise-shaped dither can sound! 8-bit audio uses sample values from -128 to +127 and has a 48 dB SNR (whereas 16-bit uses 1/256th the step size, giving -32768 to +32767).

With strong ATH noise shaping dither, the same signal looks even more deeply buried in noise on the waveform view, but the absence of noise spectral components at the ear's most sensitive frequencies actually helps it stand out even more clearly in the spectral view compared to the noise level at frequencies around it:

Remember that the original 0 dB FS sinewave had a perceived volume (replaygain calculation) of 89 dB SPL + 15.22 dB = 104.22 dB SPL. The -102 dB FS one, shown above, has a perceived loudness of about 2.22 dB SPL (sound pressure level), and you can see that there's scope for a fair bit more reduction before it sinks into the noise floor on the frequency spectrum (even with flat dither, let alone strong ATH noise shaping).
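Here's a small simulation of that sub-LSB effect (my own sketch, using plain TPDF dither rather than Foobar2000's exact dither implementations): a sine at -102 dB FS, about a quarter of a 16-bit step, vanishes entirely when rounded without dither, but survives as a clear spectral line when dither is added before rounding:

```python
import numpy as np

rng = np.random.default_rng(0)
fs, f0 = 44100, 1760
t = np.arange(fs) / fs
tone = 10 ** (-102 / 20) * np.sin(2 * np.pi * f0 * t)   # about 0.25 LSB amplitude

lsb = 2.0 / 65536                                       # one 16-bit step
tpdf = (rng.uniform(-0.5, 0.5, fs) + rng.uniform(-0.5, 0.5, fs)) * lsb

undithered = np.round(tone / lsb) * lsb                 # every sample rounds to zero
dithered = np.round((tone + tpdf) / lsb) * lsb

# One-second window: FFT bin k corresponds to exactly k Hz.
spec = 20 * np.log10(np.abs(np.fft.rfft(dithered)) / (fs / 2) + 1e-30)
print(np.max(np.abs(undithered)))   # the undithered tone is simply gone
print(round(spec[f0], 1))           # the 1760 Hz bin stands well above the
                                    # per-bin dither noise around it
```

The 1760 Hz bin comes out near -102 dB, i.e. the tone's original level, while each surrounding noise bin sits far lower, which is exactly why the ear (working like a spectrum analyzer) can still pick the tone out.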

Now, music isn't all tonal frequencies like sine waves, and it includes transients, percussive noises and vocal sibilants, which are far more noiselike and broad-band in the frequency spectrum.

To try this out, I generated some pink noise at 44100 S/s, stereo, 16-bit. Foobar2000 reported the ReplayGain as -2.07 dB (peak = 0.757446), meaning the original was 91.07 dB SPL. (Programs like Cool Edit can generate such noise.)

pink noise, 91.07 dB SPL perceived loudness, frequency analysis.

I then silenced 0.5 seconds in the middle of the 2 second generated noise using zero-crossing adjustment, then faded in a further 0.2 seconds in the right channel only (from 0% to 100% linear) and saved this as pink_noise_edited_91.07dB_SPL.wav . This modulation of the noisy sound's volume is the sort of thing one needs to perceive (e.g. beating drum, cymbal attack/decay, vocal sibilant sound) and the left-right difference might equate to some impression of stereo image panning.

To bring the noise to a similar perceived loudness as the -102 dB FS tone, i.e. 2.22 dB SPL, I manually entered replaygain values of -88.87 dB (peak amplitude would now be 0.89 of a bit) and wrote it out using the diskwriter (with strong ATH noise shaping dither) then renamed that file pink_noise_edited_02.22dB_SPL.wav

On waveform view, the pink noise signal was completely hidden in the shaped dither, which looked like a flat signal with no variation in amplitude over time. I amplified it by 48.7 dB (by normalising to 25%) to hear it easily, making it equivalent to 8-bits. The sudden cessation of noise then its restoration on the left ear with rapid fade-in on the right was completely obvious despite the extremely low volume and peak amplitude.

Given the broadband, noiselike nature of the signal, the frequency components of the noise, and of the moments when it disappears, are difficult to see in a static spectrum. However, the spectral view (spectrogram) in a WAV editor colour-codes the power of the frequency components as they vary with time, and after 48 dB of amplification it's loud enough to see quite clearly:

I redid the amplified noise as an 8-bit file, with the same dither type (now at 50.22 dB SPL perceived loudness) straight from fb2k, and it sounds the same, as you can hear in (this Monkey's Audio .APE compressed file, 59KB). With this sample, it's somewhere around 40 dB SPL that the variation becomes barely audible in an 8-bit file, so for 16-bit turned up loud, it would be around 48 dB lower, i.e. -8 dB SPL.

Given that the noise could reach a peak level of 1.0000 without distortion or dynamic limiting (2.41 dB louder than the 91.07 dB SPL version, i.e. 93.48 dB SPL), even for pink-noise signals the usable dynamic range is about 101.5 dB (and considerably more with dynamic compression).

For a mixture of noise and tonal signals (the latter may be perceived louder than noise-like signals, such as the 1760 Hz full scale sine tone perceived as +104.22 dB SPL), there's a usable range of around 112 dB before dynamic compression. For sounds with reasonable tonality (i.e. less like uncorrelated noise) they can be perceived over a wider dynamic range in 16-bit audio, of perhaps 120 dB (depending where you decide to call the cut-off).

I think this also demonstrates how sounds sink gracefully into the noise floor with dithered digital audio and aren't abruptly cut off as one might expect from looking at the waveform view.

Anyhow, getting back to SACD's DSD scheme: this operates at a 2.8224 MHz sampling rate using a single bit per channel, where at each clock cycle the signal steps up or down in voltage by a fixed amount. Effectively, the resolution of the system is all created by (noise shaped) dither at inaudibly high frequencies - frequencies that may well be filtered out by the electronics and loudspeakers. It isn't necessary to use any brickwall filtering at around 22 kHz to prevent aliasing. This is 5645 kbps of data in stereo, most of which is involved in dithering at frequencies >30 kHz. Large amplitudes can be encoded at low frequencies, but not at high frequencies, so to preserve bandwidth and audible quality, the SACD format imposes limitations on the loudness of the signal. To follow even quite a loud 1 kHz sinewave upwards, there would be an up or down transition every 0.35 µs, with the up transitions slightly outweighing the down transitions by a suitable amount, so that the average position of the wobbly curve follows the sine wave as closely as possible given that it has to go up or down at each clock cycle. In essence, it's pulse density modulation.
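That "wobbly curve which has to go up or down at each clock" can be sketched as a toy first-order delta-sigma modulator (real DSD modulators are much higher order with aggressive noise shaping, but the principle, pulse density following the input, is the same):

```python
import numpy as np

fs_dsd = 2_822_400                       # DSD clock: 64 x 44.1 kHz
n = fs_dsd // 100                        # simulate 10 ms
t = np.arange(n) / fs_dsd
signal = 0.5 * np.sin(2 * np.pi * 1000 * t)   # a 1 kHz tone at half amplitude

# First-order delta-sigma: output a +1 or -1 step every clock; the integrator
# accumulates the error, so the running density of +1s tracks the input level.
bits = np.empty(n)
integrator = 0.0
for i in range(n):
    out = 1.0 if integrator >= 0.0 else -1.0
    bits[i] = out
    integrator += signal[i] - out

# A crude low-pass (two cascaded 64-sample boxcars) recovers the audio band
# from the one-bit stream, pushing the shaped noise above the audible range.
kernel = np.ones(64) / 64
decoded = np.convolve(np.convolve(bits, kernel, mode="same"), kernel, mode="same")
corr = np.corrcoef(signal[1000:-1000], decoded[1000:-1000])[0, 1]
print(round(corr, 3))                    # decoded output closely follows the input
```

Every output sample is exactly +1 or -1, yet after low-pass filtering the 1 kHz tone reappears, which is the sense in which DSD's resolution is "created by dither" at high frequencies.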

CD has 1411 kbps of data, or about a quarter as much, and can achieve about 115-120 dB of effective dynamic range in the audible band on two channels using noise-shaped dither. With its roughly 5.6 Mbps for two channels, SACD can effectively dither to an adaptable tradeoff between effective bit depth (resolution) and effective bandwidth. SACD is designed for 120 dB SNR in the audible bandwidth and up to 100 kHz maximum bandwidth (albeit with worse SNR outside the audible bandwidth than within it). Furthermore, with lossless compression, they can also put a 6-channel audio stream on the same disc.

The above article also suggests some of the uses of adaptable bandwidth versus resolution tradeoff for archiving various studio media digitally or to allow different mastering techniques to be used in future while remaining compatible with existing home audio equipment.

Personally, I think the inability to set a hard limit and light up those peak meters, combined with the ability to convert directly to noise-shaped, dithered 16-bit 44.1 kHz red-book PCM, may perversely be something that restores decent mastering values and dynamics to music if this format gets off the ground, even if it's not necessary to have quite so much dynamic range and frequency response. Also, it's harder to make a bad SACD player (or DVD-A player) than a bad CD player, because the steep anti-alias filtering near the Nyquist limit and even the error correction and concealment aren't remotely as critical.

SACD cannot be maxed out to a hard limit the same way as CD, and if the production is done in the DSD domain with direct conversion to red book CD format for the second layer of the disc, as described in the link above, the dynamics and mastering of the CD layer will be the same as for the SACD layer. Alternatively, if the mass market is deemed to want its music heavily compressed, the CD layer can be mastered that way while the SACD layer is mastered properly, so at least decent music is available (and hopefully on the same disc you bought before upgrading to SACD).

Really, what I hope for is a return to analogue thinking and discipline, and keeping an engineering safety margin to the limits of the medium rather than knowing you can go right up to full scale. I fear that DVD-A could in no time become as heavily compressed and boring as recent CDs and that remasters of old albums on DVD-A could be as badly damaged as some remastered CDs of late.

At the rec.audio.* newsgroups there has been some debate over this new SACD/CD DSOTM release and the differences found compared with previous CD releases. As you have discovered, the CD layer has been compressed and peak limited compared with previous CD releases. But people in those discussions have found that although the SACD layer is different from the CD layer in this case, the SACD layer has also gone through some compression and peak limiting, again in comparison with previous CD releases. So, little advantage in the use of DSD/SACD here.

As for it being more difficult to make a bad SACD player than a bad CD player due to the lack of a brickwall filter: SACD has other associated problems that CD/PCM doesn't, so this is not a clear-cut issue:

First, some say that DSD converters could be much more sensitive to jitter, since any timing inaccuracies of the converter clock become very important when that clock runs at 2.8 MHz instead of 44.1 kHz. I am not sure of the relevance of this, since typical 44.1 kHz oversampling DACs run at several MHz internally too, and I don't know how actual implementations deal with these possible problems. But one thing is for sure: in an SACD player all the digital electronics must run at 2.8 MHz, whereas in a CD player they may not have to.

Second, a raw decoded DSD stream will have *lots* of ultrasonic noise at high frequencies, which can easily lead to intermodulation distortion products in amplifiers that do have audible relevance, and to amplifier power wasted on playing high-level ultrasonic noise. For this reason, SACD players require an analog filter that cuts the signal above 50 kHz to remove most of this noise, although in some players it can be disabled. So a good analog filter is required, or you may run into problems.