On the way to work today I was listening to Metallica's latest album Death Magnetic. This is a famous example of a CD made so loud it distorts* because some of the sound information has been digitally clipped off. There are many other examples of new music and 'remastered' music that's been pushed to the limit and beyond.

I've read reports that growing public awareness has meant that the trend of pushing the volume has passed its peak (excuse the pun).

* Not the beautiful analogue electrical distortion familiar in metal music, but a sound as if the drummer were frying eggs on the cymbals.

It appears volume is stored as a 16 bit signed binary parameter (I guess this gives about 32000 levels of volume; I don't know what the negative values - those starting with a 1 - would represent). The following is an extract from the article at http://www.barrydiamentaudio.com/loudness.htm:

The advent of the Compact Disc meant recording time was no longer related to recorded levels, so engineers could turn it up and leave it that way for the duration of the disc. Digital however brought its own limits to how loud the signal could be. Unlike analog tape and disks, which reached their overload (and hence distortion) point gradually as the level increased, digital has a maximum that can't be exceeded without resulting in gross distortion.

Audio signals converted to digital are stored as ones and zeros. The lowest level that can be represented in binary form (the "code" of digital storage) would be all zeros. In the 16 bit CD format, this would be 0000000000000000 (16 zeros) and would signify complete silence. A sound at an intermediate level would be represented by a digital "word" consisting of some combination of ones and zeros, depending on exactly how loud that sound is. An example of such a digital word might be 0100100110110111. The highest level would be 0111111111111111 (a zero followed by 15 ones). This represents "full scale", also called 0dBFS (zero decibels, full scale). That's it. There aren't any twos in binary code so this is the loudest you can go. (For technical reasons which are beyond the scope of this article, the loudest value is not a series of 16 ones. Those who wish to delve further into this should look up "twos complement" as it relates to CD encoding.)

If we were to take the conversation and hand claps we talked about earlier and wanted to record them digitally without suffering any distortion, the hand claps (i.e. the peaks) might end up at zero on the digital meter (0dBFS) and the conversation, being say 20 decibels lower in level than the peaks, might end up at -20 on the digital meter. Our average level, the conversation, would be -20, with our peaks, the hand claps, at 0.
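To put rough numbers on those meter readings, here is a minimal Python sketch. It assumes the usual convention that a level in dBFS is 20 times the base-10 logarithm of the sample magnitude divided by full scale (32767 for 16-bit audio). The function names `dbfs` and `amplitude_at` are made up for illustration and do not come from the article.

```python
import math

FULL_SCALE = 32767  # largest positive 16-bit sample value


def dbfs(sample: int) -> float:
    """Level of a sample's magnitude relative to full scale, in decibels."""
    if sample == 0:
        return float("-inf")  # all zeros: digital silence
    return 20 * math.log10(abs(sample) / FULL_SCALE)


def amplitude_at(level_db: float) -> int:
    """Sample magnitude that sits at the given dBFS level (hypothetical helper)."""
    return round(FULL_SCALE * 10 ** (level_db / 20))


if __name__ == "__main__":
    print(dbfs(FULL_SCALE))    # hand-clap peak at full scale
    print(amplitude_at(-20))   # conversation, 20 dB below the peaks
```

On this reckoning, a conversation sitting 20 dB below the peaks uses only about a tenth of the available sample range.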

How does one remain "competitive" and make louder records once their recording is already hitting the digital ceiling? Our competitive engineer might want to make their record average at a level higher than -20. Let's say they wanted to raise the level by 3dB, a very easily audible loudness increase. If they merely raised the overall level by 3dB so the signal now averaged at -17, the peaks which are 20dB louder would now be 3dB over the 0dB distortion free maximum. Since there is no way to digitally represent a signal that exceeds 0dB, the peaks would be "clipped", meaning the natural shape of the sound wave's peak would be cut off at the top and instead of looking like a mountain, would look more like one of the flat topped mesas in southern Utah. Clipping in a music signal sounds quite harsh and "distorted", so our engineer has to find another way. This is where the compression comes in.
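The clipping described here can be sketched in a few lines of Python. This is only an illustration under assumptions of my own: a single cycle of a full-scale sine wave stands in for the music, the 3dB boost is applied as a plain multiplication, and anything outside the 16-bit range is clamped to the nearest limit. The name `apply_gain` is hypothetical.

```python
import math

INT16_MAX = 32767
INT16_MIN = -32768


def apply_gain(samples, gain_db):
    """Scale samples by gain_db decibels, clamping to the 16-bit range."""
    factor = 10 ** (gain_db / 20)  # 3 dB is roughly a factor of 1.41
    out = []
    for s in samples:
        boosted = round(s * factor)
        # Values beyond full scale cannot be stored, so they are cut off.
        out.append(max(INT16_MIN, min(INT16_MAX, boosted)))
    return out


# One cycle of a sine wave that already peaks at full scale, 16 samples long.
wave = [round(INT16_MAX * math.sin(2 * math.pi * n / 16)) for n in range(16)]
louder = apply_gain(wave, 3.0)

if __name__ == "__main__":
    for before, after in zip(wave, louder):
        print(f"{before:7d} -> {after:7d}")
```

In the printed table, the samples around the crest and the trough all end up pinned at the same limit value, which is the flat-topped mesa shape the article describes.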

That's what happened to the Digitmovies release of Miklos Rozsa's 'Sodom and Gomorrah'. The tapes were either sampled at too high a level or equalised absurdly, pushing the volume beyond the clipping point. Or possibly the engineer didn't notice that the remastering had increased the volume.

It appears volume is stored as a 16 bit signed binary parameter (I guess this gives about 32000 levels of volume; I don't know what the negative values - those starting with a 1 - would represent).

Not to get technical here, but 16-bit encoding allows for 2^16 or 65,536 possible levels for each sample. "Signed" means that the audio samples are assigned both positive and negative values; that is, above and below the X-axis of a traditional waveform, respectively.

So for audio CDs, the range of binary values is as follows:

0000000000000000 to 0111111111111111 is 0 (X-axis or no amplitude) to +32767 (maximum positive value) [from the X-axis up to the maximum positive value]

1000000000000000 to 1111111111111111 is -32768 (maximum negative value) to -1 [from the maximum negative value up to just under the X-axis]
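If you want to check those ranges yourself, here is a small Python sketch. It leans on Python's built-in `int.from_bytes` with `signed=True` to do the signed interpretation; the function name `word_to_sample` is just something I made up for this post.

```python
def word_to_sample(bits: str) -> int:
    """Interpret a 16-character string of 0s and 1s as a signed 16-bit sample."""
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 16 binary digits")
    raw = int(bits, 2)  # unsigned reading, 0 to 65535
    # Re-read the same two bytes as a signed (twos complement) number.
    return int.from_bytes(raw.to_bytes(2, "big"), "big", signed=True)


if __name__ == "__main__":
    for word in (
        "0000000000000000",
        "0111111111111111",
        "1000000000000000",
        "1111111111111111",
        "0100100110110111",  # the example word from the article
    ):
        print(word, word_to_sample(word))
```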

(from the article) The highest level would be 0111111111111111 (a zero followed by 15 ones). This represents "full scale", also called 0dBFS (zero decibels, full scale). That's it. There aren't any twos in binary code so this is the loudest you can go.

Per the above, there are actually two "highest levels" - the maximum positive peak and the maximum negative peak.

(from the article) (For technical reasons which are beyond the scope of this article, the loudest value is not a series of 16 ones. Those who wish to delve further into this should look up "twos complement" as it relates to CD encoding.)

I've represented this above. Basically, any sample value with a first bit (or "most significant bit") of "1" is below the X-axis and is assigned a negative value. To find the actual value of the sample, you change each "1" to a "0" and each "0" to a "1", and then add "1" to the value. Convert the resulting value to decimal and then slap the negative sign in front of it. Voila! You've just converted a twos complement binary value to decimal.
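For anyone who would rather see the flip-the-bits-and-add-one recipe as code, here is a rough Python version that follows those steps literally. The function name `twos_complement_to_decimal` is my own invention, and it assumes the input is a string containing only 0s and 1s.

```python
def twos_complement_to_decimal(bits: str) -> int:
    """Convert a twos complement bit string to decimal by invert-and-add-one."""
    if bits[0] == "0":
        # Most significant bit is 0: the value is on or above the X-axis.
        return int(bits, 2)
    # Change each "1" to a "0" and each "0" to a "1".
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    # Add 1, then slap the negative sign in front.
    return -(int(inverted, 2) + 1)


if __name__ == "__main__":
    print(twos_complement_to_decimal("1111111111111111"))  # just under the X-axis
    print(twos_complement_to_decimal("1000000000000000"))  # maximum negative value
    print(twos_complement_to_decimal("0111111111111111"))  # maximum positive value
```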