In a CD (and any other digital recording technology), the goal is to create a recording with very high fidelity (very high similarity between the original signal and the reproduced signal) and perfect reproduction (the recording sounds the same every single time you play it no matter how many times you play it).

To accomplish these two goals, digital recording converts the analog wave into a stream of numbers and records the numbers instead of the wave. The conversion is done by a device called an analog-to-digital converter (ADC). To play back the music, the stream of numbers is converted back to an analog wave by a digital-to-analog converter (DAC). The analog wave produced by the DAC is amplified and fed to the speakers to produce the sound.

The analog wave produced by the DAC will be the same every time, as long as the numbers are not corrupted. The analog wave produced by the DAC will also be very similar to the original analog wave if the analog-to-digital converter sampled at a high rate and produced accurate numbers.

You can understand why CDs have such high fidelity if you understand the analog-to-digital conversion process better. Let's say you have a sound wave, and you wish to sample it with an ADC. Here is a typical wave (assume here that each tick on the horizontal axis represents one-thousandth of a second):

When you sample the wave with an analog-to-digital converter, you have control over two variables:

The sampling rate - Controls how many samples are taken per second

The sampling precision - Controls how many different gradations (quantization levels) are possible when taking the sample

In the following figure, let's assume that the sampling rate is 1,000 per second and the precision is 10:

The green rectangles represent samples. Every one-thousandth of a second, the ADC looks at the wave and picks the closest number between 0 and 9. The number chosen is shown along the bottom of the figure. These numbers are a digital representation of the original wave. When the DAC recreates the wave from these numbers, you get the blue line shown in the following figure:
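As a rough illustration of the sampling step just described, here is a minimal Python sketch. The 100 Hz test wave, the 0.0 to 1.0 input range, and the names `example_wave` and `sample_and_quantize` are assumptions made for this example; they are not taken from the figures.

```python
import math

def example_wave(t):
    # Hypothetical test signal: a 100 Hz sine scaled into the range 0.0..1.0.
    return 0.5 + 0.5 * math.sin(2 * math.pi * 100 * t)

def sample_and_quantize(wave, rate_hz, levels, duration_s):
    """Sample wave(t) at rate_hz and map each sample to the nearest of `levels` integer codes."""
    count = int(round(rate_hz * duration_s))
    codes = []
    for i in range(count):
        value = wave(i / rate_hz)            # look at the wave at this instant
        code = round(value * (levels - 1))   # nearest gradation, 0..levels-1
        codes.append(max(0, min(levels - 1, code)))
    return codes

# 1,000 samples per second with 10 gradations, over one-hundredth of a second.
codes = sample_and_quantize(example_wave, rate_hz=1000, levels=10, duration_s=0.01)
print(codes)
```

The resulting list of integers plays the role of the numbers shown along the bottom of the figure: it is the digital representation that would be stored.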

You can see that the blue line lost quite a bit of the detail originally found in the red line, and that means the fidelity of the reproduced wave is not very good. This is the sampling error. You reduce sampling error by increasing both the sampling rate and the precision. In the following figure, both the rate and the precision have been improved by a factor of 2 (20 gradations at a rate of 2,000 samples per second):

In the following figure, the rate and the precision have been doubled again (40 gradations at 4,000 samples per second):

You can see that as the rate and precision increase, the fidelity (the similarity between the original wave and the DAC's output) improves. In the case of CD sound, fidelity is an important goal, so the sampling rate is 44,100 samples per second and the number of gradations is 65,536. At this level, the output of the DAC so closely matches the original waveform that the sound is essentially "perfect" to most human ears.
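The trend in the figures can be checked numerically. The sketch below assumes a hypothetical 100 Hz test wave and models the DAC as a simple "hold the last value" staircase, which is the simplified picture used in the figures rather than how a real DAC with a reconstruction filter behaves. It measures the RMS difference between the original wave and the staircase for each rate and precision mentioned above.

```python
import math

def wave(t):
    # Hypothetical test signal: a 100 Hz sine scaled into the range 0.0..1.0.
    return 0.5 + 0.5 * math.sin(2 * math.pi * 100 * t)

def rms_error(rate_hz, levels, duration_s=0.01, oversample=16):
    """RMS difference between wave(t) and a quantized sample-and-hold staircase."""
    count = int(round(rate_hz * duration_s))
    total = 0.0
    for i in range(count):
        # Quantize the sample, then convert the code back to the 0.0..1.0 range.
        held = round(wave(i / rate_hz) * (levels - 1)) / (levels - 1)
        # Compare the held value with the true wave at points inside the interval.
        for j in range(oversample):
            t = (i + j / oversample) / rate_hz
            total += (wave(t) - held) ** 2
    return math.sqrt(total / (count * oversample))

settings = [(1000, 10), (2000, 20), (4000, 40), (44100, 65536)]
errors = [rms_error(rate, levels) for rate, levels in settings]
for (rate, levels), err in zip(settings, errors):
    print(f"{rate:>6} samples/s, {levels:>6} levels: RMS error {err:.4f}")
```

Under these assumptions the error shrinks at every step, which matches the qualitative claim that the blue line tracks the red line more closely as rate and precision increase.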

Thanks to MattD for the following [DM]:

Bit depth and sample rate are two different things.

Sample rate deals with the time/frequency domain. The more samples per second, the more time-accurate the recorded data is. Higher sample rates tend to image better for this reason (compare 24/48 to 24/96 to see).

Bit depth has to do with amplitude. Each bit can either be a 0 or a 1. A sample is taken x times a second (where x is the sample rate), and each sample is stored with a fixed number of bits. For 16-bit data, each of those 16 bits can either be a 0 or a 1, yielding 2^16 or 65,536 possible ways of representing a sample. Furthermore, each bit adds about 6 dB of dynamic range, so 16-bit audio has a maximum theoretical dynamic range of 6*16 = 96 dB.

On the other hand, each 24-bit audio sample can be represented in 2^24 or 16,777,216 different ways and has a theoretical maximum dynamic range of 6*24 = 144 dB.

Bit depth and sample rate are independent of each other. The reason that people do not record 16/96 and such is that high frequency information is typically at much lower levels than the main part of the signal. On a 16-bit ADC, this would be near the theoretical bottom of the dynamic range (for example, I have 24/96 recordings that have information from 20-30 kHz at about -80 dB).
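The "about 6 dB per bit" rule of thumb quoted above comes from 20*log10(2), which is roughly 6.02. A small Python sketch of the arithmetic (the function name `dynamic_range_db` is just a label chosen for this example):

```python
import math

def dynamic_range_db(bits):
    """Theoretical ratio, in dB, between full scale and one quantization step."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {2 ** bits:,} levels, about {dynamic_range_db(bits):.1f} dB")
```

This gives roughly 96.3 dB for 16 bits and 144.5 dB for 24 bits, in line with the rounded 96 dB and 144 dB figures.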

This description of sampling frequency is wrong. This is the logical way one might believe that sampling frequency works, and your graphs are very well laid out, but in the real world, it's completely wrong. Think of it this way: if you have any 2 points on a circle, you can draw that exact circle. Having 3 points on the circle will not make it any more accurate. The Nyquist theorem shows us that we can reproduce EXACTLY any waveform whose frequencies all lie below half the sampling frequency, that is, with just over two samples per cycle of its highest frequency. Therefore, a 10,000 Hz sound is exactly the same whether produced by 96,000 samples, 44,100 samples, or 22,000 samples per second. When sampling we do have to remove everything that is more than half the sampling frequency. For example, we have to put a steep low-pass filter starting at 20,000 Hz so that we remove all data above 22,050 Hz when recording at 44.1 kHz. This filter might cause audible problems in the audible band, which is why higher sampling rates can improve audio: you can use a much more relaxed filter that starts well out of audible range and isn't as steep, causing less phase shift down into audible range.

The common argument against this is to look at a sine wave. Let's picture a 20 kHz sine wave. Sampling at 44.1 kHz will give us only slightly more than 2 points per cycle of this wave, yet we can reproduce it perfectly. Someone will often then draw a notch up or down in that sine wave and show that our sample won't include that notch. The overlooked flaw here is that the notch is of a higher frequency than 20 kHz. Hope this is helpful to those looking to gain a deeper understanding of digital sampling theory.
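The point about removing everything above half the sampling frequency can be illustrated with a short Python sketch. The 30 kHz tone and the helper name `sample_cosine` are assumptions picked for this example. At 44,100 samples per second, a 30 kHz cosine produces the same sample values as a 14.1 kHz cosine, so without a filter the two could not be told apart after sampling.

```python
import math

def sample_cosine(freq_hz, rate_hz, count):
    """Return `count` samples of a unit-amplitude cosine at freq_hz, sampled at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz) for n in range(count)]

RATE = 44100                                        # CD sampling rate
above_nyquist = sample_cosine(30000, RATE, 32)      # above RATE / 2 = 22,050 Hz
alias = sample_cosine(RATE - 30000, RATE, 32)       # 14,100 Hz, inside the audible band

# The two sample streams agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(above_nyquist, alias))
print(f"largest difference between the two sample streams: {max_diff:.2e}")
```

This is the same reason the "notch" argument fails: the notch consists of content above half the sampling rate, which the low-pass filter removes before sampling.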

This description of sampling frequency is wrong. This is the logical way one might believe that sampling frequency works, and your graphs are very well laid out, but in the real world, it's completely wrong. Think of it this way: if you have any 2 points on a circle, you can draw that exact circle. Having 3 points on the circle will not make it any more accurate.[snip]

(bolding above is mine) Ironic that in pointing out the error, the provided example analogy is incorrect. Good analogy though. It takes 3 points to define a circle, no more but also no less.