I've avoided writing about this because it's "complicated", but people are starting to ask questions that indicate that they're confused so here goes. It's going to take several posts to cover this, so please bear with me.

So what IS volume, anyway?

Simply stated, volume is a measurement of the “loudness” of a sound (to a physicist or audio engineer, the answer is MUCH more complicated than that). There are lots of ways you can measure volume; one common unit is the decibel (dB), which in this context is a measure of the sound pressure level (SPL) emitted by a speaker.

In general, when discussing volume, there are two terms typically used – attenuation and gain. Attenuation represents a reduction in the amplitude of an audio signal; gain represents an amplification of that signal. If you look at a pro audio receiver, you’ll notice that the receiver represents its loudness as a negative number of decibels (-20dB, for example). This indicates that the receiver is attenuating the input signal by 20dB. By tradition, attenuation is measured in negative decibels and amplification is measured in positive decibels.
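To make the decibel numbers a bit more concrete, here's a minimal sketch of converting between decibels and a linear amplitude multiplier. It assumes the usual amplitude convention (20 × log10 of the ratio); the function names are mine and just for illustration.

```python
import math

def db_to_gain(db):
    """Convert decibels to a linear amplitude multiplier (assumes 20*log10)."""
    return 10.0 ** (db / 20.0)

def gain_to_db(gain):
    """Convert a linear amplitude multiplier (must be > 0) back to decibels."""
    return 20.0 * math.log10(gain)

print(round(db_to_gain(-20.0), 3))  # 0.1   (attenuation: multiplier below 1)
print(round(db_to_gain(-6.0), 3))   # 0.501 (roughly half the amplitude)
print(round(gain_to_db(2.0), 2))    # 6.02  (amplification: positive dB)
```

Negative decibels give a multiplier below 1.0 (attenuation), positive decibels give a multiplier above 1.0 (amplification), and 0dB gives exactly 1.0.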

Audio signals flow through an electrical path and at different points in the path, there are opportunities to either amplify or attenuate the signal – the locations at which the amplification or attenuation occur are called “gain stages”, and they can occur in either analog or digital signals (an amplifier represents a gain stage, as does a potentiometer).

How does volume relate to digital audio?

When converting an analog signal to digital, the analog signal is sampled – the system measures the amplitude of the signal at a fairly high rate (44,100 samples per second for CD audio), then converts each measurement to a digital value (a 16-bit integer for CD, or potentially a 32-bit floating point value). In both cases, there is a reference range of legal samples – for this example, let’s assume that values range from -1.0 to 1.0 (it makes things easier).
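As a rough sketch of what sampling looks like in code, the snippet below generates samples of a sine wave at the CD rate in the -1.0 to 1.0 range, then converts them to 16-bit integers. Scaling by 32767 is just one common convention and is an assumption here, as are the function names.

```python
import math

SAMPLE_RATE = 44_100  # samples per second, as for CD audio

def sample_sine(freq_hz, count):
    """Return `count` floating point samples of a sine wave in -1.0..1.0."""
    return [math.sin(2.0 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(count)]

def to_int16(x):
    """Map a float sample to a 16-bit integer (assumed scale factor: 32767)."""
    x = max(-1.0, min(1.0, x))  # keep the sample inside the legal range
    return int(round(x * 32767))

floats = sample_sine(441.0, 100)          # one full cycle of a 441Hz tone
ints = [to_int16(s) for s in floats]
print(ints[0], ints[25], ints[75])
```

A 441Hz tone takes exactly 100 samples per cycle at this rate, so sample 25 sits at the positive peak and sample 75 at the negative peak.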

Consider the following waveform:

When the waveform is digitized, it is converted to individual samples like below:

Attenuation simply reduces the amplitude of the digital samples, and amplification simply increases the amplitude of the samples.

For the reference waveform, if you attenuate it by 50%, you get something like this:

Note that the waveform hasn’t changed shape, it’s just smaller.

If, on the other hand, you amplify the same signal by 50%, you get:

Note that the samples that went beyond the +1 and -1 range were “clipped” – the samples can’t be represented digitally, so they were cut off. This clipping is very bad, and causes significant audio distortions. The new waveform doesn't really reflect the original waveform.
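Here's a small sketch of both cases on a made-up waveform, assuming samples in the -1.0 to 1.0 range and simple hard clipping at the edges (the sample values and the function name are invented for illustration).

```python
def apply_gain(samples, gain):
    """Scale each sample, then hard-clip the result to the -1.0..1.0 range."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]

wave = [0.0, 0.5, 0.9, 0.5, 0.0, -0.5, -0.9, -0.5]

quieter = apply_gain(wave, 0.5)  # attenuate by 50%
louder = apply_gain(wave, 1.5)   # amplify by 50%

print(quieter)  # [0.0, 0.25, 0.45, 0.25, 0.0, -0.25, -0.45, -0.25]
print(louder)   # [0.0, 0.75, 1.0, 0.75, 0.0, -0.75, -1.0, -0.75]
```

The attenuated version keeps the same shape at half the height. In the amplified version the peaks (0.9 × 1.5 = 1.35) don't fit, so they're flattened to 1.0 and -1.0.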

In addition, if a fixed point digital signal is attenuated then later digitally amplified, the signal resolution will be degraded – if, for example, you apply a -6dB attenuation (which reduces the amplitude by roughly 50%), you divide each of the samples in half (32767 becomes 16383). If you later amplify the signal digitally by the same amount, you get 32766 – and thus you’ve lost some of the original signal information.
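The arithmetic is easy to check with plain integers. This sketch treats -6dB as an exact divide-by-two, which is an approximation, and uses a handful of made-up positive sample values.

```python
originals = [32767, 12345, 3, 1]

# Attenuate: integer division throws away the low bit of each sample.
attenuated = [s // 2 for s in originals]

# Amplify back: the discarded bit cannot be recovered.
restored = [s * 2 for s in attenuated]

errors = [o - r for o, r in zip(originals, restored)]

print(attenuated)  # [16383, 6172, 1, 0]
print(restored)    # [32766, 12344, 2, 0]
print(errors)      # [1, 1, 1, 1]
```

Note that the quietest sample (1) disappears entirely, so the loss hurts quiet passages the most.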

If, on the other hand, you’re using a floating point digital signal, you can attenuate and amplify with less worry – if you apply the same -6dB attenuation and amplification to a floating point sample, the division and multiplication cancel out (.75 becomes 0.375, becomes .75).
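The same round trip in floating point, as a quick sketch. With an exact factor of one half the result is bit-for-bit identical; with the "real" -6dB multiplier (computed here with the usual 20 × log10 amplitude convention, which is my assumption) there can be a tiny rounding error, but it's many orders of magnitude smaller than the fixed point loss.

```python
sample = 0.75

# Exact halving and doubling: only the exponent changes, so nothing is lost.
attenuated = sample * 0.5
restored = attenuated * 2.0
print(attenuated, restored)  # 0.375 0.75

# A true -6dB multiplier is about 0.501, not exactly 0.5.
gain = 10.0 ** (-6.0 / 20.0)
restored_db = (sample * gain) / gain
rounding_error = abs(restored_db - sample)
print(rounding_error < 1e-15)  # True
```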

This is a large part of the reason the audio pipeline was converted to floating point in Vista – a floating point pipeline allows significantly more resolution and higher accuracy when manipulating the digital samples.

Btw, this loss of fidelity doesn’t happen when amplifying analog signals. That’s why it’s important that any amplification be done using analog signals, not digital signals – digital amplification is always lossy.

For audio hardware in Windows, the audio driver specification requires that, for all hardware volume controls on the system, 0dB represent a full fidelity pass-through of audio samples – audio hardware can support either amplification or attenuation (or neither), but 0dB always represents “don’t change the samples”.

Please note that some audio hardware on the market does NOT follow this requirement. We’ve seen audio devices that support a volume range of +0dB to +96dB. We’ve also seen devices that support volume ranges of +10dB to +60dB (mostly these are microphones).

Ok, so much for the basics on "volume", tomorrow I'll start discussing how volume works in the audio engine on Vista.

Each of the WASAPI instances in this picture represents an audio stream; the streams are combined together in a mixer and streamed to the audio driver. The Stream, Simple, and Channel volumes are implemented with a single volume APO inserted on the individual audio streams. The endpoint volume is implemented either with a volume APO inserted post mix or with a hardware volume control, depending on the capabilities of the user's audio solution.

So how do these various volumes interrelate?

First off, the stream volume. The stream volume is a multi-channel volume control that is applied to each audio stream. It's intended for use by applications that want to do really simple 3d effects (using the multichannel per-stream volume to simulate position information when bouncing an animated ball across the screen, for example). It also can be used if your application wants to let the system handle managing volume control for individual audio streams, but the scenarios for this are relatively rare.

Next, the channel volume. The channel volume is a per-session volume and is applied to all the streams in a session (if you remember from the "big picture" post, a session is a container for audio streams). The channel volume exists for one reason only: the waveOutSetVolume API - since the wave volume is a multichannel (ok, it's only stereo) volume, we needed to have an analog in WASAPI. Again, the scenarios for using this volume are relatively rare - changing the channel volume is typically done as a system setup task (room correction) from the control panel (so it affects all applications, not just one application).

Then there's the "simple" volume. The simple volume is a per-session volume and is applied to all the streams in a session. The simple volume is a uniform volume control - it applies to all channels equally. This is the volume control that we expect most applications to use - it provides a very simple mechanism to control an application's volume and mute state, which should be sufficient for the vast majority of applications out there. It's also the volume that is shown in the per-application volume slider in the Vista sound mixer.

Logically you can consider these three volumes as being applied in series (it's not really true, they're all applied at the same time) to create the final volume for each audio stream.
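As a purely illustrative model (this is NOT how the volume APO is written, and the function and parameter names are hypothetical), you can picture the combination as a per-channel multiplication, assuming each volume is a linear scalar between 0.0 and 1.0:

```python
def effective_gains(stream_vol, channel_vol, simple_vol, muted=False):
    """Combine the three volumes into one gain per channel (toy model).

    stream_vol  -- per-channel volumes for this one stream
    channel_vol -- per-channel volumes for the session
    simple_vol  -- single uniform volume for the session
    muted       -- session mute state
    """
    if muted:
        return [0.0] * len(stream_vol)
    return [s * c * simple_vol for s, c in zip(stream_vol, channel_vol)]

# Stereo stream panned toward the left, session at half volume.
gains = effective_gains([1.0, 0.5], [0.5, 0.5], 0.5)
print(gains)  # [0.25, 0.125]
```

Because it all collapses into one multiplier per channel, the samples only need to be scaled once, which is consistent with applying everything "at the same time".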

Finally, there's the endpoint volume. As I've mentioned before, the endpoint volume represents the "master volume" for outputs: it is applied to the post-mix audio stream for a particular endpoint, so it affects every stream playing on that endpoint.
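To show where the endpoint volume sits relative to the mix, here's a toy sketch that sums two mono streams sample by sample and then applies one gain to the result. It's a simplified model with made-up names and values, assuming floating point samples and a linear gain, and it ignores clipping and hardware volume controls.

```python
def mix_and_apply_endpoint(streams, endpoint_gain):
    """Sum the streams sample by sample, then scale the mixed result."""
    mixed = [sum(frame) for frame in zip(*streams)]
    return [m * endpoint_gain for m in mixed]

stream_a = [0.5, 0.25]
stream_b = [0.25, -0.5]

output = mix_and_apply_endpoint([stream_a, stream_b], 0.5)
print(output)  # [0.375, -0.125]
```

Since the gain is applied after the sum, turning the endpoint volume down scales every stream by the same amount.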