Introduction

I really like C#. It takes me about half as long to get the compiler to understand what I mean in C# as in C++. But because Managed DirectX is pretty new, it's not easy to find good examples of using it from C# and .NET managed code.

In this second revision of my article, I pass on some of the things I've learned about playing sounds in C# code, both from my own experiments and helpful input from the good folks who read the first one and commented. Thanks to all of you!

In particular, I've found out how to avoid an annoying bug in DirectSound's interaction with some sound-card drivers. It appears when you stream long sounds through a realistically sized buffer that was allocated with default properties. (I'll explain that somewhat awkward clause later!)

Here's my motivation: I'm working in C# on some new tools for transforming signals, especially sounds. As a part of that work, I needed to be able to read in lengthy sound files, modify them, write them out, and listen to them in the .NET environment. It seemed that would be easy, so I searched for some example code and documentation about working with WAVE files and playing them with the Managed DirectX ("MDX") DirectSound classes. I found that the vast majority of DirectX examples are written in unmanaged C++ (or even in C, working at the Win32 API level!). I found C# examples of reading WAVE files, but they sometimes broke on legitimate files, and didn't really support writing out your own. I found articles on playing short sounds from static sound buffers (more explanation below, and in the DirectX documentation). I found an interesting article, "Building a Drum Machine with DirectSound" by Ianier Munoz, that shows one approach to streaming sampled sounds in MDX.

The DirectX Sample Browser that comes with the December 2005 SDK has one sample (helpfully named "CaptureSound Managed") that shows how to get sound into a streaming CaptureBuffer, which is a similar, but not identical, problem. Both of these samples are useful, but I thought their approaches were unnecessarily complex for what I wanted to do -- stream long sound files through an MDX DirectSound buffer of reasonable size.

So I wrote my own classes, staying within the managed boundaries of .NET. I think my approach is both reasonably efficient and reasonably easy to understand and use.

Included in this project:

A RiffWaveReader class that wraps a RIFF WAVE file and parses it, so your C# code can easily get at its format properties and the WAVE data itself.

A RiffWaveWriter class that accepts sampled PCM WAVE data and assembles a RIFF-standard-conforming WAVE file to write to disk.

MdxSoundBuffer classes that wrap and manage MDX DirectSound "Secondary Buffers," which are the in-memory objects MDX uses to play sounds. There are both static and streaming sub-classes. The StaticSoundBuffer is for playing short sounds stored in-memory. The StreamingSoundBuffer and SimpleStreamingSoundBuffer are for arbitrarily long sound files, stored on disk, that just flow through a short, circular in-memory buffer to play. Each incorporates its own server thread that keeps the MDX Secondary Buffer full. I'll point out the difference between these two classes below.

A little dialog form, SimpleMdxPlayer, which works with SimpleStreamingSoundBuffer and shows just the very simplest and most reliable way to play both static and streaming sounds in MDX DirectSound.

Another dialog form, MdxSoundSurveyor, lets you try out and explore some of the interesting things that happen on your system when you set up DirectSound devices and buffers in various ways. It also demonstrates recovery from some kinds of errors, including "out of memory" errors, malformed .wav files, and spurious events caused by the bug mentioned above.

Background

WAVE Sound Files

The standard most commonly used for storing media files (including, in the Windows world, AVI and WAV) is the "Resource Interchange File Format" (RIFF). There are several more-or-less clear articles about RIFF available here at The Code Project, and on the web more generally. The two I found most useful are the Wikipedia article on RIFF and an MSDN article on the format.

Although RIFF encompasses dozens of different kinds of resource files, the only one MDX DirectSound understands is WAVE (.WAV or .wav). Furthermore, MDX DirectSound only works with linear PCM (pulse code modulation) files, and, according to the DirectX 9.0 SDK documentation for the WaveFormat.BitsPerSample property, only 8-bit or 16-bit samples. That's really an arbitrary restriction imposed by Microsoft's MDX architects, not inherent in the RIFF specification or in the PCM sampling technique. If you or I want to do something fancier, we're on our own!

Just a few bits of jargon: In the real world, "PCM" means the (sound) signal was sampled at regular intervals in time, and that each sample is represented in full (not, say, as a difference from the previous sample). It does not specify how these instantaneous values are approximated ("quantized") and stored. In this project I use 16-bit integers and the standard stereo arrangement. That means that every time sample has two 16-bit numbers: 4 bytes. The set of two channels' values in a single time sample is called a "frame".
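The following minimal sketch makes the frame layout concrete. It is not code from the article's download. It interleaves separate left and right 16-bit channels into the byte order that a WAVE data chunk uses: the left sample comes first, then the right sample, and each sample is stored low byte first. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper: packs stereo 16-bit PCM into interleaved,
// little-endian frames (L0, R0, L1, R1, ...), 4 bytes per frame.
public static class Pcm16Frames
{
    public const int Channels = 2;
    public const int BytesPerSample = 2;                         // 16 bits
    public const int BytesPerFrame = Channels * BytesPerSample;  // 4 bytes

    public static byte[] Interleave(short[] left, short[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("Channels must have the same length.");

        byte[] bytes = new byte[left.Length * BytesPerFrame];
        for (int frame = 0; frame < left.Length; frame++)
        {
            int offset = frame * BytesPerFrame;
            // Low byte first: WAVE sample data is little-endian.
            bytes[offset]     = (byte)(left[frame] & 0xFF);
            bytes[offset + 1] = (byte)((left[frame] >> 8) & 0xFF);
            bytes[offset + 2] = (byte)(right[frame] & 0xFF);
            bytes[offset + 3] = (byte)((right[frame] >> 8) & 0xFF);
        }
        return bytes;
    }
}
```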

Two-channel, 16-bit linear PCM, sampled at 44100 Hz, is "standard" CD quality. It's very common and perfectly adequate for this example. In real life, however, floating-point samples are much easier to do math with. That's why I have my own version of WaveFormatTag. That way, I can process to my heart's content before finally converting to 16-bit linear PCM to feed to the sound card, without accumulating significant arithmetic errors.
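Here is a sketch of that final conversion step. It assumes floating-point samples normalized to the range -1.0 to +1.0; the article doesn't say which range it uses, so that is an assumption. Each value is scaled, rounded, and clipped to 16 bits. SampleConversion and FloatToPcm16 are illustrative names, not the article's API.

```csharp
using System;

public static class SampleConversion
{
    // Converts samples nominally in [-1.0, 1.0] to 16-bit linear PCM.
    // Scaling by 32767 keeps +1.0 and -1.0 symmetric; out-of-range values
    // are clipped to the 16-bit limits instead of overflowing.
    public static short[] FloatToPcm16(float[] samples)
    {
        short[] result = new short[samples.Length];
        for (int i = 0; i < samples.Length; i++)
        {
            double scaled = Math.Round(samples[i] * 32767.0);
            if (scaled > short.MaxValue) scaled = short.MaxValue;
            if (scaled < short.MinValue) scaled = short.MinValue;
            result[i] = (short)scaled;
        }
        return result;
    }
}
```

Clipping keeps an out-of-range value at full scale rather than letting it overflow the 16-bit range.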

I first developed this project using Visual Studio 2003 and .NET 1.1, and it still relies on the December 2005 MDX DirectSound documentation. Please refer to that for background and clarification of the many things I don't cover. We may find better documentation (and more-complete MDX classes) in later updates -- I hope so!

The value proposition for MDX is much faster development, with most of the performance of the unmanaged flavor. The downsides:

A modest performance hit, probably on the order of 5 or 10%.

Less-predictable timing than unmanaged C++ (even given the fact that Windows is by no means a real-time OS) due to the operation of the .NET garbage collector in the managed environment.

Notably sketchy and sometimes very confusing documentation.

A less-complete class library than for the unmanaged API.

Oh, and there's no surplus of good sample code. In fact, I found that it's necessary to consult the documentation for unmanaged C++ programming to get answers to even intermediate-level questions about MDX programming. Despite all that, I'm glad to be working with MDX in C#. Really.

The DirectSound Device Class

To play sounds, the DirectSound classes need a DirectSound.Device object, as discussed in the MSDN documentation. This is the basic programming interface to the properties, capabilities, and low-level drivers for a hardware sound-I/O device. There are lots of different sound-I/O devices, with a pretty confusing array of capabilities (see the Microsoft.DirectX.DirectSound.Caps structure documentation for more than one probably needs to know.) In my MdxSoundSurveyor_Load(*) method, I find the default Device, instantiate a Device object, and dump its Caps structure to the DebugConsole (see below). I use only the very most basic default capabilities, except for setting the cooperative level to "Priority," which is the only appropriate choice. Examining the properties of the default device can give you some insight into what your sound-I/O device (or "sound card") can do, and what changes later when you set up SecondaryBuffers.

Secondary Buffers

As a .NET programmer, you probably won't deal with feeding the Device and its associated primary buffer directly. Instead, you create DirectSound.SecondaryBuffer objects for each sound resource or file. Short sounds can be very simply loaded in their entirety into a "static" SecondaryBuffer. But audio occupies lots of memory, and a "short" song can easily occupy 30 MB. Try a Strauss waltz and you've got an "Out of Memory" exception. That's why there's the hard way: streaming data into a short SecondaryBuffer (perhaps 1-2 seconds' worth) and keeping it filled with new samples as the sound plays.
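The arithmetic behind those sizes is simple enough to sketch. This hypothetical helper computes the number of bytes needed for a given duration of linear PCM, which is sample rate times channels times bytes per sample, multiplied by the duration.

```csharp
using System;

public static class PcmMath
{
    // Bytes needed to hold 'seconds' of linear PCM audio.
    public static long BytesForDuration(double seconds, int sampleRate, int channels, int bitsPerSample)
    {
        int blockAlign = channels * bitsPerSample / 8;           // bytes per frame
        long frames = (long)Math.Ceiling(seconds * sampleRate);  // whole frames
        return frames * blockAlign;
    }
}
```

At CD rates, that works out to 176,400 bytes per second. A three-minute song therefore needs 31,752,000 bytes, which is the roughly 30 MB mentioned above. A 1.5-second streaming buffer needs only 264,600 bytes.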

The specification and operation of SecondaryBuffers can be pretty confusing. I'll try to explain what you need to know in fewer words than the SDK uses. But I'll be leaving out a lot of what you don't need in order to understand this article.

When you create a SecondaryBuffer, you specify the characteristics of the sound file, and the properties and "location" of the buffer, by creating and modifying a Microsoft.DirectX.DirectSound.BufferDescription object to pass to the constructor. When you are working with a modern, PCI-bus-based system, the "location" property doesn't mean what it might seem to. Because the PCI bus is fast enough to communicate many channels of sound to the sound card, the sound data in your SecondaryBuffers is really in the main system memory. (Many, if not most, modern sound cards have no sound-data-buffer memory of their own.) If you specify the LocateInHardware property, all that really happens is that the MDX framework tries to set up the sound card to read your sound data from memory and do the arithmetic to mix your sound into the sound stream you hear through your speakers.

It wasn't so very long ago that mixing sound in real-time could significantly burden the CPU. But these days, CPUs are so fast that letting them do the arithmetic to mix a dozen extra channels into the main sound output is no problem at all. And, as it turns out, merely specifying LocateInSoftware minimizes a problem with streaming buffers' Notification Events. (Those will be covered in the next section: please read on!) The unmanaged C++ documentation for DirectSound mentions this problem, but the MDX (C#) documentation does not. (SuperWill posted a message for the first revision of this article, pointing out a partial workaround. I suspect he was familiar with the C++ DirectX API!)

Unfortunately, if you let the framework decide, and your sound card supports hardware buffers, it may well choose LocateInHardware -- at least it does on my development system! Then you'll have to deal with lots and lots of spurious notifications.

And as it turns out, that's still not quite the whole story. Even if I use a software secondary buffer, I still get one lone spurious notification, within the first segment of the buffer, on each of my systems. The simplest way around that is just to load only 3 of the 4 buffer sectors at any one time.

Notifications During Playback

The SecondaryBuffer object can send Notification Events as it plays, when the play pointer has passed preset positions in the buffer. This involves just a brush with Win32 WaitHandles, a topic many .NET programmers would be happy to avoid. But it's really the most effective way to know when to refill sound data in the buffer. I notice that Ianier's drum machine code polls the position in the buffer at regular time intervals, rather than using the Notify events. For his application, with multiple streams, that's simpler. But because his timer-driven code may not be fired in precise lockstep with the real-time playing of sound, he must poll the current status of both the play and write pointers and pull variable amounts of sound data into the buffer to ensure it doesn't run dry. On the other hand, the DirectX SDK CaptureSound sample uses Notification Events, somewhat as suggested in the MDX DirectSound documentation on MSDN.
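For comparison, the polling approach boils down to ring-buffer arithmetic. The question is how many bytes can be written between the application's own next-write offset and the current play position, wrapping around the end of the buffer. This is a simplified sketch, not Ianier's code. It ignores DirectSound's separate write cursor. It also assumes the caller never lets the buffer drain completely, because an empty ring and a full ring look the same here. RingBufferMath is a hypothetical name.

```csharp
using System;

public static class RingBufferMath
{
    // Bytes writable from nextWrite up to (not including) playCursor in a
    // ring of 'size' bytes, rounded down to whole frames of blockAlign bytes.
    // Both positions must lie in [0, size). Equal positions return 0 (full).
    public static int WritableBytes(int playCursor, int nextWrite, int size, int blockAlign)
    {
        if (playCursor < 0 || playCursor >= size || nextWrite < 0 || nextWrite >= size)
            throw new ArgumentOutOfRangeException("playCursor");
        int free = (playCursor - nextWrite + size) % size;
        return free - free % blockAlign;
    }
}
```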

Specifically, the MSDN section on "Using Streaming Buffers" directs us as follows: "When all data has been written to the buffer, set a notification position at the last valid byte, or poll for this position. When the cursor reaches the end of the data, call Buffer.Stop." And indeed the CaptureSound sample has space for one extra notification event in its array of notification positions, so I think the Microsoft programmer intended to do this. The trouble is, trying to set a notification while the buffer is playing causes an INVALID_CALL exception. Oops.

Never fear: there's a workaround for that, too. There is always some pipeline delay between the executing code feeding the data to the SecondaryBuffer, and the real-world clock that converts it to sound in your sound card's Digital-to-Analog Converter ("DAC"). So one can take advantage of this latency and call SecondaryBuffer.Stop(), set the Notification event at the end of the data, and do SecondaryBuffer.Play() again, (probably) without breaking the sound stream. (Thanks to Aaron Lerch for this observation.) I make it a point to preset as much of this little dance as I can in the constructor, to minimize the time spent Stop()ped.

As all mathematicians and programmers know, boundary conditions bite. So it is with our project. In the CaptureSound sample, after the capture stops, the samples remaining in the buffer get handled "manually" instead of with an event. That's easy, because one knows where the end is, and can ignore the rest of the (stale) data in the CaptureBuffer. And it doesn't need to happen in real time!

Windows is many things, but it's not a hard-real-time OS. So, in our players, even though we can set the end-of-data event notification, we can't really guarantee that we can call SecondaryBuffer.Stop() before old data following it in the buffer gets played. So, just to be professional about it, I make sure there's a stretch of silence after the end of the sound, which will almost never get played. It will very rarely matter, but most of us know from experience that what can go wrong, will.

Errors

Yes, even perfect code (yeah, right) can experience errors. Some can be dealt with gracefully, some not. I mentioned above that you can get an OutOfMemoryException if you try to load too large a sound file into a static SecondaryBuffer, or if it just fits and then, naturally, you try to play it. (It's not too clear just why playing the buffer takes significantly more memory than loading it, but I've seen that cause the exception. Virtual vs. virtuous, perhaps!) In either case, you'd like to catch the exception, release the memory and other (device) resources associated with the sound file, stream, and buffer, and prompt the user to try something else. The best, most concise, and most effective treatment of deterministic finalization in .NET that I've seen is in Juval Lowy's book, Programming .NET Components. His Deterministic Finalization template, which I use for handling Dispose(), Close(), and the like, is available from the iDesign website.

I hereby recommend and acknowledge using his technique in my code.
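For readers who haven't seen it, here is a sketch of the standard .NET Dispose pattern that templates like Lowy's build on. It is not the iDesign template itself. SoundResourceHolder is a hypothetical class that stands in for a reader or buffer that owns a stream. In this sketch, Dispose() can safely be called more than once, Close() is a synonym for it, and members throw ObjectDisposedException after disposal.

```csharp
using System;
using System.IO;

public class SoundResourceHolder : IDisposable
{
    private Stream stream;
    private bool disposed;

    public SoundResourceHolder(Stream stream) { this.stream = stream; }

    public bool IsDisposed { get { return disposed; } }

    public long Length
    {
        get { ThrowIfDisposed(); return stream.Length; }
    }

    // Close() is offered as a familiar synonym for Dispose().
    public void Close() { Dispose(); }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);   // in case a subclass adds a finalizer
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposed) return;        // calling Dispose twice is harmless
        if (disposing && stream != null)
        {
            stream.Dispose();        // managed resources: release only when disposing
            stream = null;
        }
        disposed = true;
    }

    protected void ThrowIfDisposed()
    {
        if (disposed) throw new ObjectDisposedException(GetType().Name);
    }
}
```

A finalizer that calls Dispose(false) belongs only in a class that directly owns unmanaged handles. This class owns only a managed Stream, so it has none.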

Free! Console with Every Windows App

There's another bit of background that you may already know, but I find it so useful in development that I want to point it out here. If you build a Windows Forms app as a Console Application (set the project's Output Type property: right-click the project in Solution Explorer and choose Properties), you'll see a background console as in my screenshot. I write to the console in Debug mode using my DebugConsole class, whose methods carry the [Conditional("DEBUG")] attribute, so calls to them evaporate if you do a Release build. My MdxSoundSurveyor class uses DebugConsole a great deal, to let you see lots of the gory details of devices, buffers, and the operation of the thread that streams data through the StreamingSoundBuffer. The MdxSoundSurveyor.exe included in the download was built as a Console App in Debug mode, so you will see everything when you run it on your own system.
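A minimal sketch of such a logger follows. It is an illustration, not the DebugConsole class from the download. The key detail is that [Conditional("DEBUG")] goes on the method. When DEBUG isn't defined in the calling code, the compiler drops each call entirely, including the evaluation of its arguments.

```csharp
using System;
using System.Diagnostics;

public static class DebugConsole
{
    // Calls are compiled away unless DEBUG is defined where the call is compiled.
    [Conditional("DEBUG")]
    public static void WriteLine(string format, params object[] args)
    {
        Console.WriteLine("[{0:HH:mm:ss.fff}] {1}", DateTime.Now, string.Format(format, args));
    }
}
```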

A Tour of the Code

Namespace StreamComputers.Riff

First, in the namespace StreamComputers.Riff, I have several classes for handling RIFF WAVE audio files. The abstract class RiffWaveManager holds the reference to the Stream object for the file, and the format and other properties of the audio WAVE file. It supplies some helper methods that return multi-line strings useful for investigating file and format properties using, of course, the DebugConsole, or any other facility you'd like.

The RiffWaveReader subclass constructor takes a WAVE filename string, creates a FileStream for it, and a BinaryReader on the FileStream, then attempts to parse the file to extract its format and length properties. If it finds a broken file (try opening badRIFF.wav), it catches the error, disposes itself, and throws a custom RiffParserException upward to the object that constructed it, in our case, the Windows Forms app. That way, the app can inform the user that his musicmonger cheated him, and politely ask if he wants to try another file.
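The parsing step can be sketched as a walk over RIFF chunks. The parser checks the "RIFF" and "WAVE" identifiers, then reads each chunk's four-character ID and 32-bit little-endian size. It picks out the "fmt " and "data" chunks and skips everything else, including the pad byte after an odd-sized chunk. WaveInfo is a hypothetical name, and RiffParserException here is only a stand-in for the article's class. This sketch rejects a chunk that runs past the end of the file. A production reader might choose to clamp an overlong data chunk instead, but that is a policy decision this sketch doesn't make.

```csharp
using System;
using System.IO;
using System.Text;

public class RiffParserException : Exception
{
    public RiffParserException(string message) : base(message) { }
}

public class WaveInfo
{
    public short FormatTag, Channels, BlockAlign, BitsPerSample;
    public int SampleRate, AvgBytesPerSec;
    public long DataOffset = -1, DataLength;

    public static WaveInfo Parse(Stream stream)
    {
        var info = new WaveInfo();
        bool haveFmt = false;
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using (var r = new BinaryReader(stream, Encoding.ASCII, true))
        {
            try
            {
                if (ReadId(r) != "RIFF") throw new RiffParserException("Not a RIFF file.");
                r.ReadUInt32();                                  // RIFF size (not relied on)
                if (ReadId(r) != "WAVE") throw new RiffParserException("Not a WAVE file.");

                while (stream.Position + 8 <= stream.Length)
                {
                    string id = ReadId(r);
                    uint size = r.ReadUInt32();
                    long body = stream.Position;
                    if (body + size > stream.Length)
                        throw new RiffParserException("Chunk '" + id + "' runs past end of file.");

                    if (id == "fmt ")
                    {
                        if (size < 16) throw new RiffParserException("fmt chunk too short.");
                        info.FormatTag = r.ReadInt16();
                        info.Channels = r.ReadInt16();
                        info.SampleRate = r.ReadInt32();
                        info.AvgBytesPerSec = r.ReadInt32();
                        info.BlockAlign = r.ReadInt16();
                        info.BitsPerSample = r.ReadInt16();
                        haveFmt = true;
                    }
                    else if (id == "data")
                    {
                        info.DataOffset = body;
                        info.DataLength = size;
                    }
                    // Chunks are word-aligned: an odd-sized chunk is followed by a pad byte.
                    stream.Position = body + size + (size & 1);
                }
            }
            catch (EndOfStreamException e)
            {
                throw new RiffParserException("Unexpected end of file: " + e.Message);
            }
        }
        if (!haveFmt || info.DataOffset < 0)
            throw new RiffParserException("Missing fmt or data chunk.");
        return info;
    }

    private static string ReadId(BinaryReader r)
    {
        return Encoding.ASCII.GetString(r.ReadBytes(4));
    }
}
```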

If the file is to its liking (even if the music isn't), the RiffWaveReader object that's constructed exposes the format and length properties, and supplies methods to read data from the file into transfer buffers. GetDataBytes(*) is useful for transferring to MDX buffers, while GetPCM16DataFrames(*) accesses the audio for further processing and perhaps writing to an output file. Both of these methods accept a starting position, so you can "seek" to a point inside the sound file to start reading.

The RiffWaveWriter class encapsulates a file stream and BinaryWriter for making a new WAV file. As I warned you, I'm only dealing with standard "CD audio" format -- but you should be able to add methods to handle other PCM formats without much trouble.

(No, MP3 is not a PCM format. Please don't ask me about making or playing MP3s -- I still have ears. If you want more from me on audio quality, please consider my article, "It Ain't Just Rocket Science" in Positive Feedback online.)

The SetCD16Info() method sets up the RIFF header fields for that standard CD audio format. WriteHeader() actually writes the header to the file stream, and WritePCM16DataFrames(*) does what it says, too -- that's the way you'd put your processed audio into the file. When you're done, call WriteFinal(), which fixes up the two length fields, and then call Close() to flush the file to disk and close it before disposing of the writer, stream, and the RiffWaveWriter object itself.
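Here is a sketch of that header-then-fix-up sequence for 16-bit stereo at 44,100 Hz. The writer first emits the canonical 44-byte header with zero placeholders and then appends little-endian samples. At the end, it seeks back to patch two fields: the RIFF size at offset 4 (file length minus 8) and the data size at offset 40. Cd16WaveWriter is a hypothetical class, not the article's RiffWaveWriter. It assumes the stream starts empty at position 0.

```csharp
using System;
using System.IO;
using System.Text;

public sealed class Cd16WaveWriter
{
    private const int Channels = 2, SampleRate = 44100, BitsPerSample = 16;
    private const int BlockAlign = Channels * BitsPerSample / 8;   // 4 bytes per frame
    private readonly BinaryWriter writer;
    private long dataBytes;

    // leaveOpen: true, so the caller decides when to close the stream.
    public Cd16WaveWriter(Stream stream)
    {
        writer = new BinaryWriter(stream, Encoding.ASCII, true);
    }

    public void WriteHeader()
    {
        writer.Write(Encoding.ASCII.GetBytes("RIFF"));
        writer.Write(0);                          // RIFF size placeholder, offset 4
        writer.Write(Encoding.ASCII.GetBytes("WAVE"));
        writer.Write(Encoding.ASCII.GetBytes("fmt "));
        writer.Write(16);                         // PCM fmt chunk size
        writer.Write((short)1);                   // format tag 1 = linear PCM
        writer.Write((short)Channels);
        writer.Write(SampleRate);
        writer.Write(SampleRate * BlockAlign);    // average bytes per second
        writer.Write((short)BlockAlign);
        writer.Write((short)BitsPerSample);
        writer.Write(Encoding.ASCII.GetBytes("data"));
        writer.Write(0);                          // data size placeholder, offset 40
    }

    // Samples are interleaved L, R, L, R, ...; BinaryWriter writes little-endian.
    public void WritePcm16Frames(short[] interleaved)
    {
        if (interleaved.Length % Channels != 0)
            throw new ArgumentException("Expected whole stereo frames.");
        foreach (short sample in interleaved)
            writer.Write(sample);
        dataBytes += interleaved.Length * 2L;
    }

    // Patches both length fields; the 32-bit fields limit a WAVE file to 4 GB.
    public void WriteFinal()
    {
        writer.Seek(4, SeekOrigin.Begin);
        writer.Write((uint)(36 + dataBytes));     // file length minus 8
        writer.Seek(40, SeekOrigin.Begin);
        writer.Write((uint)dataBytes);
        writer.Seek(0, SeekOrigin.End);
        writer.Flush();
    }
}
```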

Namespace StreamComputers.MdxDirectSound

The StreamComputers.MdxDirectSound namespace contains the Windows Forms application MdxSoundSurveyor. It's meant to be built as a Windows Console Application, in Debug mode, and that's what I did for you to make MdxSoundSurveyor.exe. It lets you explore some details of the workings of MDX DirectSound on your system. You use it by stepping through these actions:

Select a WAV file

If you wish, you can truncate it to some short length and write it out (very useful for testing!)

Create a static or streaming buffer to be "located" (really, mixed) in hardware or software, and, finally

Play the buffer with the classic three-button audio player interface: Play, Pause, and Stop

With the Free Bonus Console in the background, you can see fascinating facts about the sound device, WAV files, the buffers, and the progress of the streaming data transfer, if you choose that option.

As with any GUI, the hard part is sequencing the user through operations that make sense, and defending against nonsensical inputs. And as with any GUI, this one isn't perfect -- but it's usable.

When it first loads, the Surveyor finds the default sound device and dumps its Caps structure to the console. Note the number of free buffers and free hardware memory bytes before you create the secondary buffer. My Creative SB Audigy2 has 62 free "buffers" (mono mixing channels) and 0 free bytes when I'm not running another sound-enabled application. That's what I'd expect: Windows occupies 2 of the 64 maximum "buffers" (channels) for stereo system sounds, and there's no memory on-board.

Now, select a sound file, and see if RiffWaveReader OKs it. If not, try, try again.

Next, select whether you want your sound mixed into the output by the sound-card hardware ("Locate (Mix) in Hardware"), or the CPU ("Locate (Mix) in Software").

Now you can select either a static or streaming buffer to play it. When you hit Create Buffer, the label control displays the size of the buffer (unless it won't fit, in which case a message box tells you to try something else). I chose a small enough buffer size for the streaming case that almost any system should have enough free memory to use it. If you haven't got 256K, God bless you.

The hardware device's Caps should now reflect a smaller number of available "buffers" (channels) if you chose "Locate (Mix) in Hardware."

Now take a look at the code. The abstract class MdxSoundBuffer holds a reference to an MDX DirectSound SecondaryBuffer. What kind of SecondaryBuffer gets created is up to the subclasses. It also promises that its subclasses will implement the IPlayable interface: you guessed it, Play(), Pause(), and Stop().
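The IPlayable contract and its transport logic can be sketched without any DirectX types. SoundBufferBase, PlayState, and the three protected hooks are hypothetical. In an MDX subclass, the hooks would presumably map onto the SecondaryBuffer's Play, Stop, and SetCurrentPosition methods. Stopping a DirectSound buffer leaves its play position where it was, so in this sketch Pause and Stop differ only in whether the position is rewound. CountingSoundBuffer is a trivial fake that exists only to exercise the state transitions.

```csharp
using System;

public interface IPlayable
{
    void Play();
    void Pause();
    void Stop();
}

public enum PlayState { Idle, Playing, Paused }

public abstract class SoundBufferBase : IPlayable
{
    public PlayState State { get; private set; }   // starts as Idle

    public void Play()
    {
        if (State == PlayState.Playing) return;
        StartOrResume();        // resumes from wherever Pause left off
        State = PlayState.Playing;
    }

    public void Pause()
    {
        if (State != PlayState.Playing) return;
        Halt();                 // keep the current position
        State = PlayState.Paused;
    }

    public void Stop()
    {
        if (State == PlayState.Idle) return;
        Halt();
        Rewind();               // the next Play starts from the beginning
        State = PlayState.Idle;
    }

    protected abstract void StartOrResume();
    protected abstract void Halt();
    protected abstract void Rewind();
}

// Trivial fake used only to exercise the transitions.
public sealed class CountingSoundBuffer : SoundBufferBase
{
    public int Starts, Halts, Rewinds;
    protected override void StartOrResume() { Starts++; }
    protected override void Halt() { Halts++; }
    protected override void Rewind() { Rewinds++; }
}
```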

The StaticSoundBuffer subclass is simple. It tries to create a SecondaryBuffer to hold the file, and catches exceptions that might reasonably occur: OutOfMemoryException if your tastes are bigger than your budget, and ArgumentException if the file is corrupt. It informs the user of the problem, and throws upward, so its client (the GUI) can dispose of it, and let the user try again.

Back at the GUI, if you chose the static buffer radio button, the buffer length is the size of the entire data payload of the WAVE file. When you hit Play, you should hear it from your system's default sound device. If you don't, check your system's sound settings -- it's not my fault. You can pause, (un)pause, and let it play to the end, or hit Stop. All very normal.

And Now the Fun Begins

If you select a streaming buffer, and hit Create Buffer, you'll see a much smaller buffer size shown. As I explain in the comments in the StreamingSoundBuffer subclass, I chose a nice power-of-two size for the buffer, 256K bytes, which at CD-audio rates is about 1.5 seconds of sound. (Eventually, we'll tell the buffer to play and loop back when it reaches the end.) Now the fun begins.

The constructor uses a RiffWaveReader to parse the file and ensure it's legit. Then it makes a DirectSound.WaveFormat struct with the file's format properties, and passes it to the constructor for a DirectSound.BufferDescription object. Then I set some of its properties differently than the defaults. Consult the MDX documentation for the vast array of possibilities, few of which concern me. At last, a different SecondaryBuffer constructor (there are seven!) sets up a streaming type buffer: one that does not load itself with sound data.

To load the buffer with data, we need some more mechanisms. As I mentioned in the background discussion, DirectSound offers a way to get the buffer to send us events when it has played some of its data, and we can safely fill in new data over the old. I use a single AutoResetEvent as a signal, and set four points in the buffer where I want to be notified. So I've defined four equal segments of the buffer, each holding about 370 ms of sound data. (This differs from the arrangement described in the MDX documentation, which seems to use multiple events.) A Notify object associated with the buffer gets the array of BufferPositionNotify structs as a parameter in its SetNotificationPositions(*) method. So, after we fill the buffer with the first 256K of sound and set it to playing, it'll fire events every 370 ms.
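The layout arithmetic is easy to check in code. This hypothetical helper computes the segment size and one common choice of notification offsets, which is the last byte of each segment. It also computes the durations at CD rates.

```csharp
using System;

public static class StreamingLayout
{
    public const int BufferBytes = 256 * 1024;                  // 262,144 bytes
    public const int SegmentCount = 4;
    public const int SegmentBytes = BufferBytes / SegmentCount; // 65,536 bytes

    // Last byte of each segment: once the play cursor passes it, that whole
    // segment has been played and may be refilled.
    public static int[] NotificationOffsets()
    {
        int[] offsets = new int[SegmentCount];
        for (int i = 0; i < SegmentCount; i++)
            offsets[i] = (i + 1) * SegmentBytes - 1;
        return offsets;
    }

    public static double Milliseconds(int bytes, int bytesPerSecond)
    {
        return 1000.0 * bytes / bytesPerSecond;
    }
}
```

At 176,400 bytes per second, each 65,536-byte segment lasts about 371.5 ms and the whole 262,144-byte buffer lasts about 1.486 s. These values match the roughly 370 ms and 1.5 s figures above.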

But the GUI won't be happy being bugged every 370 ms. So StreamingSoundBuffer.CreateDataTransferThread() makes a dedicated thread to handle the events and transfer data to the buffer in 64K byte blocks.

The thread will wait for an event, then dutifully execute its work function, the cleverly named DataTransferActivity().

At this point, I'd like to shift your attention to the SimpleStreamingSoundBuffer class. It's a simpler version of the StreamingSoundBuffer code we've been looking at, so it will be easier to follow.

(The only reason to keep StreamingSoundBuffer around is to handle the problems caused when (for some reason I haven't thought of) you need to have a streaming buffer handled and mixed by the sound-card hardware, or if you really want to see the weird stuff that happens during play.)

SimpleStreamingSoundBuffer's thread's main job is TransferBlockToSecondaryBuffer(). It also must watch for two special situations.

First, it checks to see whether there's any more wave data available to transfer. If so, it checks to see whether it's been aborted (the user hit the Stop button or its SoundBuffer is being replaced by another). In that case, it returns immediately and the thread terminates. Otherwise, it waits for a notification event. When that happens, it has room to transfer another block to the buffer.
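Here is a sketch of that loop, written as a variation on the steps above. The steps above check for an abort and then block on the notification event alone. This version instead waits on both events with WaitHandle.WaitAny, so a Stop request can't get stuck behind a notification that never arrives. TransferLoop and NotifyEvent are hypothetical names, as is the transferBlock delegate, which writes one block and returns the number of real bytes it wrote.

```csharp
using System;
using System.Threading;

public sealed class TransferLoop
{
    private readonly AutoResetEvent notify = new AutoResetEvent(false);  // set by the buffer
    private readonly ManualResetEvent abort = new ManualResetEvent(false); // set by Stop()
    private readonly Func<int> transferBlock;
    private readonly int blockSize;
    private Thread thread;

    public TransferLoop(Func<int> transferBlock, int blockSize)
    {
        this.transferBlock = transferBlock;
        this.blockSize = blockSize;
    }

    public AutoResetEvent NotifyEvent { get { return notify; } }

    // Read these only after Join() has returned true.
    public int BlocksTransferred { get; private set; }
    public bool ReachedEnd { get; private set; }

    public void Start()
    {
        thread = new Thread(Run);
        thread.IsBackground = true;   // don't keep the process alive
        thread.Name = "DataTransfer";
        thread.Start();
    }

    public void Abort() { abort.Set(); }

    public bool Join(int timeoutMs) { return thread.Join(timeoutMs); }

    private void Run()
    {
        // WaitAny reports the lowest signaled index, so abort (index 0)
        // wins if both events are set at once.
        WaitHandle[] handles = { abort, notify };
        while (true)
        {
            if (WaitHandle.WaitAny(handles) == 0) return;   // aborted
            int written = transferBlock();
            BlocksTransferred++;
            if (written < blockSize)                        // last, short block
            {
                ReachedEnd = true;
                return;
            }
        }
    }
}
```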

The TransferBlockToSecondaryBuffer() method calls RiffWaveReader.GetDataBytes(*). When the data stream from the sound file ends, it signals this fact by returning the number of bytes it actually read from the file, and then the thread notices it was less than the block size. (RiffWaveReader fills the remainder of the transfer buffer with 0's.)
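Here is a sketch of that read-and-pad behavior. It assumes that the data chunk's offset and length are already known from parsing. ReadBlock is a hypothetical stand-in for GetDataBytes(*). It seeks to a position inside the data, reads what it can, and zero-fills the rest of the transfer buffer. It returns the count of real bytes so the caller can detect the end of the data.

```csharp
using System;
using System.IO;

public static class BlockReader
{
    public static int ReadBlock(Stream stream, long dataOffset, long dataLength,
                                long positionInData, byte[] buffer)
    {
        long remaining = Math.Max(0L, dataLength - positionInData);
        int wanted = (int)Math.Min(buffer.Length, remaining);
        stream.Position = dataOffset + positionInData;   // "seek" inside the data

        int total = 0;
        while (total < wanted)
        {
            int n = stream.Read(buffer, total, wanted - total);
            if (n == 0) break;           // file shorter than its header claims
            total += n;
        }
        // Pad the rest of the block: zero is silence for 16-bit PCM.
        Array.Clear(buffer, total, buffer.Length - total);
        return total;
    }
}
```

Zero bytes are silence only for signed formats such as 16-bit PCM. 8-bit WAVE samples are unsigned, with silence at 128, so padding for them would use 0x80 instead.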

We're almost done now. We could just set the end-notification event at the end of the sound data, wait for it, call Buffer.Stop(), and then call it a day. But we don't know just how long it will take to get the event and respond, and the data's end just might be right at the end of a sector. In that case, the buffer might play old data for a moment, before it actually stops. Just to be professional about it, let's set one more whole sector to silence (0s in the CD audio case), to avoid making a nasty noise. (That may seem unnecessary, but a single sample of bad digital audio data is much more perceptible than a bad pixel, or even a frame of video.)
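The bookkeeping for the ending can be sketched as follows. The sketch assumes that the stream's first byte went to buffer offset 0 and that data has been written around the ring contiguously since then. From the total number of data bytes, it computes two values. The first is the ring offset of the last valid byte, which is where the end-of-data notification would be set during the Stop/set/Play dance. The second is the segment after the one that holds the end, which gets overwritten with silence. EndOfStreamPlan is a hypothetical name.

```csharp
using System;

public static class EndOfStreamPlan
{
    public struct Plan
    {
        public int LastValidByteOffset;   // ring offset for the end-of-data notification
        public int SilenceSegment;        // segment to overwrite with silence
    }

    // Assumes data byte 0 was written at ring offset 0 and the ring has been
    // refilled contiguously ever since.
    public static Plan Compute(long totalDataBytes, int bufferBytes, int segmentBytes)
    {
        if (totalDataBytes <= 0)
            throw new ArgumentOutOfRangeException("totalDataBytes");
        int segmentCount = bufferBytes / segmentBytes;
        int lastValid = (int)((totalDataBytes - 1) % bufferBytes);
        int endSegment = lastValid / segmentBytes;
        return new Plan
        {
            LastValidByteOffset = lastValid,
            SilenceSegment = (endSegment + 1) % segmentCount
        };
    }
}
```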

The thread waits for play to reach the end, Stop()s the buffer, and puts the SimpleStreamingSoundBuffer into the Idle state. Then, it returns from its work method, and goes away to wherever dead threads go. (Yeah, I know...)

That pretty much covers the operation of the SimpleStreamingSoundBuffer.

Its older brother, StreamingSoundBuffer, is considerably more complicated, because of the bug I mentioned several times before.

It took me a day or so to figure out what was going on when I started experimenting with streaming buffers. There seems to be a bug in the way hardware-capable sound card drivers interact with DirectSound. When the streaming buffer is "located in hardware", that is, the sound card is in charge of accessing sound data from system memory and mixing it to the output stream, the notification event occasionally fires for no apparent reason. This happens unpredictably, but it seems to be correlated with other activities on my computer -- opening an IE window, moving files around, etc. Sometimes it seems to fire off just for fun. I can make it go crazy with extra events just by streaming some MP3 sound in Windows Media Player at the same time our application is running. If you examine the screenshot, from MdxSoundSurveyor, you'll see lots of these spurious events dutifully reported by the transfer thread. Individually, I could hear them as forward jumps of a song lyric -- just a fraction of a second. So I set a trap, as shown in the code. The good news is that the regular events are reliable. If they dropped out, the fix would have been harder! An extra event would make the thread write data over the segment that's now playing, an error. So the filter looks at the current play pointer, and refuses to transfer a block into the segment it's in.
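The trap can be sketched as a small scheduler. It remembers which segment it will refill next and, on each notification, compares that segment with the one the play cursor is in. SegmentScheduler is a hypothetical name. The play cursor is passed in as a plain integer; in MDX, it would come from the buffer's current play position. The sketch captures the rule described above (never write into the segment that's playing), but it is not the article's exact code.

```csharp
using System;

public sealed class SegmentScheduler
{
    private readonly int segmentBytes;
    private readonly int segmentCount;

    public SegmentScheduler(int segmentBytes, int segmentCount, int firstSegmentToFill)
    {
        if (segmentBytes <= 0 || segmentCount <= 0)
            throw new ArgumentOutOfRangeException("segmentBytes");
        this.segmentBytes = segmentBytes;
        this.segmentCount = segmentCount;
        NextSegment = firstSegmentToFill % segmentCount;
    }

    public int NextSegment { get; private set; }
    public int SpuriousCount { get; private set; }

    // Called for each notification with the current play cursor (in bytes).
    // Returns the buffer offset for the next block, or -1 if the event is
    // treated as spurious because that segment is playing right now.
    public int OnNotification(int playCursor)
    {
        int playingSegment = playCursor / segmentBytes;
        if (playingSegment == NextSegment)
        {
            SpuriousCount++;
            return -1;
        }
        int offset = NextSegment * segmentBytes;
        NextSegment = (NextSegment + 1) % segmentCount;
        return offset;
    }
}
```

For example, suppose playback starts with segments 0 to 2 loaded (the 3-of-4 arrangement described earlier). A genuine event while segment 0 plays refills segment 3. An extra event that arrives while segment 0 is still playing would target segment 0, so the scheduler counts it and ignores it.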

(I think this is related to the fact that the sound card has only one interrupt to get the Windows OS's attention. Perhaps there's no reliable way to tell just which buffer (that is, channel or sound data stream) is in need of service. But I'd think some kind of vectoring or cause-identification would be implemented. If you know the details of sound-card device drivers and hardware, please let me know what's really going on!)

Fortunately, the confusion doesn't happen when the CPU does the data access and mixing. So the problem can be avoided by always selecting "LocateInSoftware" in the BufferDescription object used to construct a streaming SecondaryBuffer. That's wired into SimpleStreamingSoundBuffer, which the SimpleMdxPlayer uses. If you really don't need to spare the CPU the tiny extra burden of buffer handling and mixing, by all means use the simple version!

(One more side note: because static buffers, used for short sounds, don't use notification events, they don't have this problem. So, for instance, if you're coding up a game with lots of short sound effects or loops, you can let the DirectX framework decide for you to use the hardware to mix these buffers into the output stream, saving some CPU cycles while still avoiding the spurious event problem. That's what I do in the SimpleMdxPlayer.)

Well, the tour's over. Hope you liked it. On the other tentacle, if you think this is way too many words, just read the code.

Points of Interest

I recommend using the SimpleStreamingSoundBuffer unless you absolutely must use the sound-card hardware to mix your streaming sound. Just use SimpleMdxPlayer as an example.

If you do some exploring with MdxSoundSurveyor, please let me know if you ever get more than one spurious notification when the buffer is "Located in Software." Similarly, if your system does not get them when you use a streaming buffer in hardware, with Windows Media Player playing sounds at the same time, let me know what brand and model of sound card you're using, what driver version, etc!

If you find any further bugs with DirectSound or my code, please let me know!

License

This article has no explicit license attached to it, but may contain usage terms in the article text or the download files themselves. If in doubt, please contact the author.
