Introduction

I have noticed that the Audio section of the tutorials is a hard section to write for. All the articles written for this section receive really bad reviews. I expect that will happen to this one too, once everyone realizes what it's about:

The old Windows Multi-Media extensions library, WINMM.DLL. This tutorial will not talk about DirectSound or any other newer audio system. In this tutorial, we will learn what a sound is and how to play it using WINMM.DLL. In fact, we will write a small library that we can easily reuse in our own code to play sounds!

Be warned: I am generally not an audio person and I don't know much about the subject. This article is not for the hard-core audio enthusiasts who want to program their own DSP. This article is for the rest of us!

What is a CODEC?

CODEC stands for COder-DECoder (it is also commonly read as COmpressor-DECompressor). A CODEC simply knows how to compress and decompress a given format. Though CODECs are generally thought of in the context of video and audio, they are not limited to that scope.

What is an audio codec driver?

You may have heard of or seen "ACM" files; ACM stands for the "Audio Compression Manager". These are drivers installed on your system which export operations that can be performed on a particular audio stream, including converting audio from one type of stream to another. An example of this would be an MP3 ACM codec that converts MP3 audio to PCM audio.

What attributes make up an Audio Stream?

The "Bits Per Sample", the "Number Of Channels", and the "Sample Rate". The "Bits Per Sample" is how many bits make up each sample of sound. The channels can be thought of like threads in a process: they are separate streams of audio contained in one stream that play at the same time. Channel configurations are usually referred to as "Mono" (one channel), "Stereo" (two), and "Surround Sound" (more). The sample rate determines how quickly the samples are played. The lower the sample rate, the fewer samples there are per second and the longer each one covers; this diminishes quality while saving space. The higher the sample rate, the more samples there are per second, increasing quality while consuming more space.

The sample rate is how many samples are taken per second. If a sample rate is "8000 Hz", that is 8,000 samples per second (this is also commonly expressed in kilohertz, 8 kHz). If the sample rate is 44,100 Hz, that is 44,100 samples per second. Let's figure out the math.

If we are playing audio at 44,100Hz and we have 16 bit audio and 2 channels, how many bytes are we playing per second?

(44100 * 16 * 2)/8 = 176400 Bytes Per Second

So, the formula is:

(Sample Rate * Bits Per Sample * Channels)/(Bits Per Byte) = Bytes Per Second
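As a quick sanity check, the formula can be expressed in a few lines of C (a minimal sketch; the function name is just illustrative):

```c
/* Bytes per second = (sample rate * bits per sample * channels) / 8 */
static unsigned long BytesPerSecond(unsigned long sampleRate,
                                    unsigned long bitsPerSample,
                                    unsigned long channels)
{
    return (sampleRate * bitsPerSample * channels) / 8;
}

/* BytesPerSecond(44100, 16, 2) gives 176400, matching the example above */
```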

To portray these settings, the Windows Multi-Media extensions define this audio structure:

Ignore the comment on "PCM" at the top of the declaration; the only difference from the older PCM structure is the addition of cbSize as the last member. PCM is the format we will be talking about in this demo.
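The structure in question is WAVEFORMATEX. The sketch below mirrors its layout from MMREG.H using portable typedefs so it compiles outside of Windows (on a real build you would simply include the SDK headers), and shows it filled in for the 44,100 Hz, 16-bit, stereo example:

```c
#include <stdint.h>

typedef uint16_t WORD;
typedef uint32_t DWORD;

/* Mirrors the WAVEFORMATEX declaration from MMREG.H */
typedef struct tWAVEFORMATEX {
    WORD  wFormatTag;       /* format type (1 == PCM) */
    WORD  nChannels;        /* number of channels (1 = mono, 2 = stereo) */
    DWORD nSamplesPerSec;   /* sample rate, in Hz */
    DWORD nAvgBytesPerSec;  /* for buffer estimation */
    WORD  nBlockAlign;      /* block size of data */
    WORD  wBitsPerSample;   /* bits per sample */
    WORD  cbSize;           /* bytes of extra format info (0 for plain PCM) */
} WAVEFORMATEX;

/* Fill in the structure for 16-bit stereo PCM at 44,100 Hz */
static WAVEFORMATEX MakePcmFormat(void)
{
    WAVEFORMATEX wfx;
    wfx.wFormatTag      = 1; /* WAVE_FORMAT_PCM */
    wfx.nChannels       = 2;
    wfx.nSamplesPerSec  = 44100;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = (WORD)(wfx.nChannels * wfx.wBitsPerSample / 8);
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;
    wfx.cbSize          = 0;
    return wfx;
}
```

Note how nAvgBytesPerSec comes straight out of the formula from the previous section.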

What is PCM?

PCM is Pulse Code Modulation. It is basically "raw", uncompressed audio, and it is generally the format that audio hardware directly interacts with. Though some hardware can play other formats directly, generally the software must convert any audio stream to PCM before attempting to play it.

An analog audio stream generally looks like a wave, with peaks and valleys; the height of the wave is its amplitude. This signal is sent to the audio hardware, where it is digitized. The conversion samples the stream at a given frequency, such as 8000 Hz. Sampling essentially measures the voltage so many times a second and generates a value from each measurement. Remember, I'm not an audio engineer, so this is just a simplistic way of thinking about it. In 8-bit PCM, for example, I believe the baseline is either 127 or 128; this middle value is silence. Anything below it is a lower tone and anything above it is a higher one. If you take a bunch of these values and play them back at a certain speed, they make a sound.

So how does it work?

In Windows, we generally use a few functions to create an audio stream. We need to open the audio output device using waveOutOpen(), specifying the audio format we want to use. This queries the hardware to determine whether the format can be used. If it can, we get back a handle to the device and we're ready to go. If not, we need to choose another format.

Choose Another Format?

Generally, if you do not ask for a variant of PCM, then you will not be able to open the wave device. So, how do you play other audio formats? Simple. Convert them to PCM. You do not need to know how to convert an audio format to PCM or even know anything about the format anymore. You just need to make sure the system has the installed codec that does it for you, then simply perform the conversion. What you do need to know is how to read the file format the audio is stored in!

How do I use the codec to convert?

You simply use the Acm* APIs. These allow you to open a codec and convert to another codec, provided the codec supports converting to that other format! If not, you have to find an intermediate codec. For example, say MP3 converts to GSM and GSM converts to PCM (this is not true, but it's an example); you would convert MP3 to GSM, then GSM to PCM. This can also happen at a more detailed level than the codec itself. Perhaps MP3 only supports converting to 16-bit PCM, so you must use the MP3 codec to convert to 16-bit PCM, then the PCM codec to convert to 8-bit PCM. Again, this is just an example.

You do not need to worry about what codecs are installed on the system, you simply need to call the APIs and the system will handle loading them. If they are not on the system, you will get an error message. This tutorial will not be covering how to create a codec driver nor will it cover the Acm* APIs.

How can the system find the Codec?

The codecs are registered in the registry. Each codec, identified by the "format tag" in the Microsoft structure, is actually a registered value! This means that if you come up with your OWN format and you want it to work (for all time, across all platforms, without ever conflicting), you need to register your format with Microsoft. Microsoft will then assign you a number and add your format to their header files. The currently registered formats are listed in MMREG.H. Here is an excerpt.
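This abridged excerpt reproduces a handful of the registered format tags from MMREG.H in the Windows SDK:

```c
/* Excerpt of registered wave format tags from MMREG.H (abridged) */
#define WAVE_FORMAT_UNKNOWN     0x0000  /* Microsoft Corporation */
#define WAVE_FORMAT_ADPCM       0x0002  /* Microsoft Corporation */
#define WAVE_FORMAT_IEEE_FLOAT  0x0003  /* Microsoft Corporation */
#define WAVE_FORMAT_ALAW        0x0006  /* Microsoft Corporation */
#define WAVE_FORMAT_MULAW       0x0007  /* Microsoft Corporation */
/* ... */
#define WAVE_FORMAT_MPEGLAYER3  0x0055  /* ISO/MPEG Layer3 Format Tag */
```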

These are the first few codecs on the list; the "..." means there are more in between. I just wanted to show the MP3 codec tag since it's a very popular format these days. The comment in the header is also where it is specified that you must register new formats.

The PCM format is not listed, but its format tag is 1.

The Windows Audio Architecture

Windows audio can be described in terms of two architectures: the Multi-Media architecture and the DirectSound architecture. This tutorial will only cover the Windows Multi-Media architecture. It's interesting to know that the DSound library will actually fall back to using the Windows Multi-Media API if the DirectSound driver cannot be found.

Below is a simple text diagram of how the Windows MultiMedia layers work (there is a better diagram on MSDN, FYI).

This is a basic architecture. Your Windows application will call waveOutOpen(), waveOutWrite(), etc. which will filter through WINMM.DLL and all end up in wodMessage() of WDMAUD.DRV. WDMAUD.DRV will in turn queue this request to be processed on a separate thread created in your process space. It will then wait for an event signaling that it can return along with an error code.

The separate thread will open the kernel driver, read/write data to the kernel driver and even send IOCTLs to the driver. Reads/Writes are asynchronous and the driver notifies their completion using an APC callback (standard ReadFileEx/WriteFileEx). This callback filters back up to your application and signals the method you specified. You can specify an event, a window message, a callback function or whatever. We will get into those later.

What is an APC? An APC is an Asynchronous Procedure Call: if a thread is in an alertable state, a queued function can be executed in the context of that thread the next time it is scheduled. (QueueUserAPC() is the user-mode API for queuing one explicitly.) APCs are beyond the scope of this article; for more information, see the Asynchronous Procedure Calls topic on MSDN.

What happens in the kernel?

We do not need to follow all the details, but let's first talk about mixing. If you're using Windows 9x/ME, you will notice that only one application can open and play audio at a time. If you're using Windows 2000, XP or 2003, many applications can open the audio device and play audio at the same time. Why is this? Because those operating systems support kernel mixing. From a simplistic standpoint, the sound card can generally play one stream of data at a time; some audio hardware can handle more streams at once, and some even supports mixing sample streams in hardware (à la the Gravis UltraSound). However, for our purposes, let's say it supports one audio stream. That means multiple applications cannot use the device at once, so Windows 98 simply allows one stream to be played. Windows 2000 is different because it has a kernel mixer driver. This mixer takes all the open audio streams and mixes them into one before sending the result out to the audio hardware. I'm sure the audio is mixed in hardware when it can be, but there is also software mixing. I don't know if there is a limit on how many audio streams can be opened at once; perhaps there is none, as long as system resources are still available.

Now, the audio needs to be sent out the hardware ports to the sound card. It would be very inefficient if the software had to take each byte of data and send it out the hardware port itself; CPU usage would go up, since a thread would need constant attention just to feed audio to the device. The answer? DMA: Direct Memory Access.

What is DMA and how does it work?

DMA allows data to be moved between memory and a device, in either direction, independently of the processor. That is more efficient since the CPU does not need to handle sending data to, or reading data from, the device. How do you set up a DMA transfer? First, you need memory that is NOT PAGEABLE. This memory cannot be paged out to disk, because DMA does not go through the CPU and knows nothing about virtual memory; if the memory has been paged out, DMA will not notice. Will DMA trap? No, it simply reads the memory directly, and the device will attempt to play whatever is there. It could, however, corrupt the system if it is filling memory that the CPU is using for something else. DMA knows nothing about how the CPU has organized memory; it simply gets an address and goes with it. That brings us to the next rules about DMA: the memory must be contiguous, and you must give the DMA controller the PHYSICAL ADDRESS of the memory location, since it does not understand virtual addresses. The last things you need to do are program the DMA controller to talk to the correct device and set the rate at which data is sent; the device must also be set up to receive the data, and the PIC (Programmable Interrupt Controller) comes into play for the interrupt that notifies you when a transfer completes.

How do I setup DMA and the Device?

This is outside the scope of this document. I have not done it in Windows, nor looked up how to do it; most likely there's an API to perform it for you, just like everything else in Windows, since Windows has generally encapsulated the most common components of the PC for standard use by your drivers. I did do this in DOS a long, long time ago, though. It may be a bit beyond the scope of this tutorial, but if you want the source, I can give it to you; I still have it. Be warned that it was written 8 years ago for Watcom.

Windows Applications

OK, getting back on track, we are now in the world of Windows! This means you don't have to worry about setting up the PIC, the audio hardware, an ISR, or the DMA transfer yourself just to play a sound from an application. You don't even have to know how to convert between audio codecs; the only thing you may need to know is how to read the file format. Though even that could be found in a library written by someone else!

So, in our application, we simply want to open a .WAV file and play it. This first requires us to know the .WAV file format. This should be easy to find; there are many resources on the Internet that explain file formats, and a simple Google search should turn one up.

WAV File Format

The wave file format is broken up into RIFF chunks. Some wave files are non-standard and contain additional chunks and information. The program we are writing does not handle those; we will only honor the traditional chunks. Here is a simple breakdown of the .WAV format.

RIFF
<4 bytes - the size of the rest of the file>
WAVE

This is the first thing you generally see in the file, at least in my experience. I have tried to use those 4 bytes as a size, but I've generally found it simpler to skip them. So, in the program we write, I simply skip the first 12 bytes. After that, everything is done in RIFF chunks. A chunk works like this:

So, you have a 4-byte identifier that tells you what the block is. The next 4 bytes tell you the size of the chunk's data, which then follows. Note that the size field does not include the 8-byte chunk header itself, so a whole chunk occupies its size plus 8 bytes. The very first chunk in a WAVE file will be "fmt ". This is described as this:

So, in the program, we simply skip the first 12 bytes and then read this information out of the file. The only other chunk we care about is "data". This is defined as so:

"data" 4 Bytes
DATA SIZE 4 Bytes
<Your Audio Data> (DATA SIZE bytes)

In our program, we will simply loop through the chunks, ignoring everything until we get to the data chunk. We will then read the data chunk into a buffer to be used as our audio. Very simple.
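That loop can be sketched in portable C like this (operating on a WAV file already read into memory; this assumes a little-endian machine and trims error handling for brevity; it is not the article's exact code):

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Walk the chunks in a WAV image, skipping everything until "data".
   Returns a pointer to the audio samples and sets *pDataSize, or NULL. */
static const uint8_t *FindDataChunk(const uint8_t *pWav, size_t fileSize,
                                    uint32_t *pDataSize)
{
    size_t pos = 12; /* skip "RIFF", the 4-byte size, and "WAVE" */

    while (pos + 8 <= fileSize) {
        uint32_t ckSize;
        memcpy(&ckSize, pWav + pos + 4, 4); /* size of this chunk's data */

        if (memcmp(pWav + pos, "data", 4) == 0) {
            *pDataSize = ckSize;
            return pWav + pos + 8; /* audio bytes start after the header */
        }
        pos += 8 + ckSize; /* skip header plus data to the next chunk */
    }
    return NULL; /* no data chunk found */
}
```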

Make it a Library

The first thing we will do is make it a library. In the spirit of OO design, we will create a Create function and a Destroy function. The application calls these to get a "handle" to the object, and the handle is then passed into the other functions provided by the library. We will support one more function, Pause, in order to pause playback. The library we make will be simple and will endlessly loop the sound.

Let's declare our exported functions. Here is the header file we will use:

The library will simply return a handle. The calling application does not need to know anything about the implementation, nor what an HWAVELIB handle really is. The library could return a number, a pointer, or just about anything, as long as the wave library itself can convert it back into the session information for that handle. In our case, we will keep it simple and return a pointer. If the application doesn't know what the data is, it can't operate on it. A user *could* reverse engineer it and find out what it is, but then again, this isn't national security and you could do that with anything, even C++ objects. We're just playing a WAV file!
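The opaque-handle design can be sketched like this (the function names and struct contents are illustrative, not necessarily the article's exact header):

```c
#include <stdlib.h>

/* Opaque handle: the caller never sees what is behind it. */
typedef void *HWAVELIB;

/* Internal session data, known only to the library. */
typedef struct {
    int bPaused;
} WAVELIB;

HWAVELIB WaveLib_Create(void)
{
    WAVELIB *pWaveLib = (WAVELIB *)calloc(1, sizeof(WAVELIB));
    return (HWAVELIB)pWaveLib; /* hand the pointer back as the handle */
}

void WaveLib_Pause(HWAVELIB hWaveLib, int bPause)
{
    if (hWaveLib) {
        ((WAVELIB *)hWaveLib)->bPaused = bPause; /* convert back internally */
    }
}

void WaveLib_Destroy(HWAVELIB hWaveLib)
{
    free(hWaveLib);
}
```

The caller only ever holds the HWAVELIB; the cast back to WAVELIB* happens inside the library, which is what keeps the implementation private.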

Source Formatting

You can generally use whatever style you want; however, I would like to explain mine since you will be reading my source. I like to write functions in the form <ModuleName>_<Function>. This makes the code easy to read, especially in a large project, since you immediately know where each function is located: I see the module name and can go directly to the source. If I'm debugging, these names show up in the stack trace. Sure, with the PDB, the source locations may show up too, but with this convention a quick glance at a call stack tells you where everything lives. I just find it easier; you are entitled to your own methods.

Opening a WAV Device

To open the wave device, we simply fill in our WAVE FORMAT structure, decide on a callback mechanism and call waveOutOpen().
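A sketch of that call (Windows-only, linked against winmm.lib; error handling abbreviated, and the event-based callback choice here is just one of the mechanisms discussed earlier):

```c
#include <windows.h>
#include <mmsystem.h>

/* Open the default wave output device for 16-bit stereo PCM at 44,100 Hz.
   hEvent is an event the driver will signal on notifications (WOM_DONE). */
HWAVEOUT OpenWaveDevice(HANDLE hEvent)
{
    WAVEFORMATEX wfx = {0};
    HWAVEOUT hWaveOut = NULL;
    MMRESULT mmr;

    wfx.wFormatTag      = WAVE_FORMAT_PCM;
    wfx.nChannels       = 2;
    wfx.nSamplesPerSec  = 44100;
    wfx.wBitsPerSample  = 16;
    wfx.nBlockAlign     = (WORD)(wfx.nChannels * wfx.wBitsPerSample / 8);
    wfx.nAvgBytesPerSec = wfx.nSamplesPerSec * wfx.nBlockAlign;

    /* WAVE_MAPPER lets the system pick a device that supports this format */
    mmr = waveOutOpen(&hWaveOut, WAVE_MAPPER, &wfx,
                      (DWORD_PTR)hEvent, 0, CALLBACK_EVENT);

    return (mmr == MMSYSERR_NOERROR) ? hWaveOut : NULL;
}
```

If waveOutOpen() fails, this is the point where you would fall back and try another format, as described above.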

Playing The Sound

The problem with audio is that it's continuous. At no time do we want the audio driver to be without a buffer, or you'll hear a skip or silence! So, what we do is divide the sound up into buffers of a certain length; for example, five buffers, each 1/5 of a second long. When the audio callback occurs, we reuse the completed buffer.

In the application I wrote, I just hard coded these to have 8 buffers of 8000 bytes each. In a more robust application, you should generally allocate these buffers and sizes based upon the audio being played. I would calculate how many bytes it would take to play in a certain period of time, 1 or 2 seconds for example, then break this up into 5 or more buffers. You should experiment to determine what values are right in order to prevent skipping!

So, in order to play an audio buffer, we just need to do the following:

You need to set up this structure with the data buffer and the audio size. You then call waveOutPrepareHeader() to initialize the structure, and waveOutWrite() to play the data. After that, you simply wait for the callback, find the buffer that is done playing, refill it with the next sample, and call waveOutWrite() again! In my program, I used an event to signal my main thread to loop through the buffers; this was the simplest way to help prevent deadlock issues (described later in this article under "BE CAREFUL"). The WaveLib_AudioBuffer function I wrote helps parse through the audio buffer and continuously feeds the .WAV sample to the audio device in a loop.

The last component is the loop that keeps feeding audio data back into the device as buffers are freed. It waits on an event that is signaled when we get a WOM_DONE notification, then loops through all the buffers reading their flags. If the DONE bit is set in a buffer's flags, we clear it, repopulate the buffer with sound, and play!
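That refill loop can be sketched as follows (Windows-only; this assumes the headers were already prepared with waveOutPrepareHeader(), and FillNextChunk is a hypothetical helper standing in for the article's WaveLib_AudioBuffer):

```c
#include <windows.h>
#include <mmsystem.h>

#define NUM_BUFFERS 8
#define BUFFER_SIZE 8000

/* Hypothetical helper: copies the next slice of the .WAV sample,
   wrapping back to the start to loop the sound. */
extern void FillNextChunk(char *pBuffer, DWORD cbSize);

void FeedAudio(HWAVEOUT hWaveOut, HANDLE hEvent, WAVEHDR *pHdrs,
               volatile BOOL *pbWaveShouldDie)
{
    int i;

    while (!*pbWaveShouldDie) {
        /* Signaled by the driver on WOM_DONE (or by the cleanup code) */
        WaitForSingleObject(hEvent, INFINITE);

        for (i = 0; i < NUM_BUFFERS; i++) {
            if (pHdrs[i].dwFlags & WHDR_DONE) {
                pHdrs[i].dwFlags &= ~WHDR_DONE;   /* clear the DONE bit */
                FillNextChunk(pHdrs[i].lpData, BUFFER_SIZE);
                waveOutWrite(hWaveOut, &pHdrs[i], sizeof(WAVEHDR));
            }
        }
    }
}
```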

The cleanup code simply sets bWaveShouldDie to TRUE, calls SetEvent() on the object's event, then waits on the thread handle to become signaled so it can exit. bWaveShouldDie needs to be set first, or our thread may miss the wake-up and never die. Of course, that would only happen if the audio was paused, since while audio is playing the event keeps getting signaled by the WOM_DONE notifications.

Closing The Device

The last thing to note is that you must close the device handle when you are done. Stop feeding buffers to the sound driver and call waveOutReset(). This API cancels all pending buffers and sends them back to the application. waveOutClose() will FAIL if there are still pending buffers in the driver! So remember: stop feeding buffers, call waveOutReset(), then call waveOutClose(). The code above calls the reset in another thread, which is why the uninit does not have it; this is an FYI since it's not immediately apparent and I didn't spell it out. Generally, waveOutReset() will not return until all the buffers have been returned, but I've seen code that, after calling waveOutReset(), waits for the buffers to have their DONE bits set. You can do it however you like, but in my experience, you just need to call the API, make sure you're not calling waveOutWrite() anymore, then call waveOutClose().
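The shutdown order can be sketched like this (Windows-only; unpreparing each header before closing is good practice, though the article does not spell that step out):

```c
#include <windows.h>
#include <mmsystem.h>

/* Shutdown order: the caller must stop feeding buffers FIRST, then reset,
   then close. waveOutClose() fails with WAVERR_STILLPLAYING if buffers
   are still pending in the driver. */
void CloseWaveDevice(HWAVEOUT hWaveOut, WAVEHDR *pHdrs, int numBuffers)
{
    int i;

    waveOutReset(hWaveOut); /* cancel pending buffers, marking them DONE */

    for (i = 0; i < numBuffers; i++) {
        waveOutUnprepareHeader(hWaveOut, &pHdrs[i], sizeof(WAVEHDR));
    }

    waveOutClose(hWaveOut);
}
```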

BE CAREFUL

Do you remember what I told you about the audio driver creating a separate thread in your process to perform actions? That thread's message queue is guarded by critical sections. If you attempt certain calls from inside the callback routine, you can deadlock. One example is trying to route data in a circle: say you open the wave input device using waveInOpen() and also open the output device, then in your WaveIn callback you try to send the captured data out the waveOut device. Situations like this can deadlock, because the waveOutWrite() needs to be processed on the very wave thread that the callback is issued on. The callback blocks waiting for the wave thread to perform the write, while the wave thread is blocked waiting to grab the critical section the callback is holding. This deadlocks! Watch out for this...

PlaySound()

So, this whole article could have been summed up with one line of code: pass a file name to this API and it plays the sound for you. However, what fun would that have been, and what would you have learned? This tutorial may not have taught you how to play MP3s, but it is the basis of how you could play anything using the same architecture; just add file parsing and audio stream conversion. Besides, wasn't this at least a little more interesting?
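For completeness, that one-liner looks like this (Windows-only, linked against winmm.lib; "sound.wav" is a placeholder path):

```c
#include <windows.h>
#include <mmsystem.h>

int main(void)
{
    /* SND_FILENAME: the first argument is a file path.
       SND_SYNC: block until the sound finishes playing. */
    PlaySound(TEXT("sound.wav"), NULL, SND_FILENAME | SND_SYNC);
    return 0;
}
```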

Conclusion

Audio is not as fun as it used to be when you had to work with hardware yourself, however it is easier for an application to use audio now. I hope you have at least learned something in reading this article!

P.S. I know there's a skip in the code during the replay loop on some audio files. See if you can fix it!

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

About the Author

Toby Opferman has worked in just about all aspects of Windows development including applications, services and drivers.

He has also played a variety of roles professionally on a wide range of projects, including pure research roles, architect roles and developer roles. He was also solely responsible for debugging traps and blue screens for a number of years.

Previously of Citrix Systems, he is very experienced in the area of Terminal Services. He currently works on operating systems and low-level architecture at Intel.

Comments and Discussions

How can we control the 2 speakers individually, through separate channels, using waveOutOpen()?
For example: user tones (knob clicks, soft sounds, touches, etc.) are played at a fixed volume that is not user adjustable on one speaker mapped to one channel, while alarm tones are played at the alarm volume set by the user on a separate speaker through a different channel.

Hello all,
I'm pretty new to this.
I'm using PlaySound to play a wav file, and now I need to change the volume
of the file, using a progress bar to control it.
This is how I play the file:

if(m_PlayMMSound)
{
m_PlayMMSound->PostThreadMessage( WM_PLAYMMSOUND_PLAYSOUNDPTR ,0,(LPARAM)m_PlayThread);
m_PlayMMSound->PostThreadMessage(WM_PLAYMMSOUND_PLAYFILE,0,(LPARAM)strName.GetBuffer(strName.GetLength()));
}
Can anyone help me with that?
Thanks in advance,
Hari.

Hi, your sample is exactly what I need except for one feature: I have to play 2 Wav file and I need to control the volume for each file independently.
Can you tell me how I can control volume for each thread ?
If I call waveOutSetVolume( pWaveLib1->hWaveOut, dwVolume1 );
waveOutSetVolume( pWaveLib2->hWaveOut, dwVolume2 );

I get the same volume level even if dwVolume1 and dwVolume2 have different values.

Even in Visual Studio, there should be a method of adding library files outside of the source; doing so allows the code to be more portable between environments. In the SOURCES/makefile I just add the list of libraries to include there. Visual Studio should have an option to do this outside of the actual code, so the code would not need to be modified.

Added the C files to a Visual Studio 2005 C++ project, then went to
Properties and, for WaveLib.c and wavetest.c, set 'No Pre-Compiled Headers' because it wouldn't work with stdafx.h. Now I am getting
far fewer errors (20 instead of 200), but the first one is:

error 2061: syntax error: identifier HWAVELIB - wavelib.h line 15

It seems it cannot find the definition for the HWAVELIB type.

There are another 19 errors, but I think they all stem from this.

What have I missed?

Have been through your article very carefully and can't see what I have missed - there are no special configuration instructions in the tutorial, and no clues in the discussion forums.

The code comes with a hand-crafted makefile; using VC++'s VCVARS.BAT you can type "nmake" to build it. The sources are in C, so if you rename the files to .CPP or attempt to compile them as C++ you may run into problems as well, since I've never compiled it that way. This code is also from 2003, so it probably has some code rot.

Try "TestDemo"; it comes with an audio library similar to what is in this article, but it's from the same year and I haven't used it in years. If you use the DDK, I have ported this audio library with more options and put it into a 2D game. That one compiles with the DDK/WDK build environment, if you want to use that.

Hi,
I want to play two sound files at the same time and listen to them
separately on the left and right speakers; that is, the sound of one file should play on the left
speaker and the second file should play on the right speaker.

Hi!
I noticed that the sound device isn't working at exactly the rate I specified, which is 8000 Hz. This is important to me because I use this application for playing UDP data.
Is there some way to increase the rate accuracy of the sound device?