Streaming Music with XAudio2

Sam was the only member of the party who had not been over the river before. He had a strange feeling as the slow gurgling stream slipped by: his old life lay behind in the mists, dark adventure lay in front.

--- J.R.R. Tolkien, The Lord of the Rings

Streaming is the process of playing back an audio file while keeping only a small portion of its data in memory. This allows large audio files, such as background music, to be played back with very little memory usage.

To stream an audio file, its data must be read in chunks instead of being loaded completely at once. To do so, the audio data is read asynchronously and the data chunks are stored in a queue of buffers. Once a buffer is filled, it is submitted to a source voice, which then processes the buffer, i.e. plays back the audio data. Once the source voice has finished playing the data inside a buffer, the buffer becomes available again for reading more data. This process allows large audio files to be played back with minimal memory consumption. To harness the power of this technique, the streaming code should be placed in a separate thread, where it can sleep while it waits for long-running disk and audio operations to finish. XAudio2 uses callback structures to wake those threads by triggering events when audio operations have finished.
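The buffer rotation described above can be sketched without any audio API. The following is a minimal, platform-neutral model, not the tutorial's actual implementation: the names ChunkedStreamer and ChunkInfo are hypothetical, and a std::vector stands in for the file on disk. It only shows how a fixed ring of buffers is refilled chunk by chunk, so that the memory held stays constant no matter how large the source is.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Describes one chunk that is ready to be handed to the player.
struct ChunkInfo {
    std::size_t bufferIndex = 0;  // which buffer of the ring holds the chunk
    std::size_t byteCount = 0;    // number of valid bytes in that buffer
};

// Hypothetical model: reads a source chunk by chunk into a small ring of buffers.
class ChunkedStreamer {
public:
    static constexpr std::size_t bufferCount = 3;  // same value as maxBufferCount in the text

    // The source must outlive the streamer; it stands in for the file on disk.
    ChunkedStreamer(const std::vector<std::uint8_t>& source, std::size_t chunkSize)
        : source_(source), chunkSize_(chunkSize)
    {
        assert(chunkSize > 0);
        for (auto& buffer : buffers_)
            buffer.resize(chunkSize);
    }

    // Copies the next chunk into the next buffer of the ring.
    // Returns false once the whole source has been consumed.
    bool readNextChunk(ChunkInfo& info)
    {
        if (position_ >= source_.size())
            return false;
        const std::size_t count = std::min(chunkSize_, source_.size() - position_);
        std::copy_n(source_.begin() + static_cast<std::ptrdiff_t>(position_), count,
                    buffers_[current_].begin());
        info.bufferIndex = current_;
        info.byteCount = count;
        position_ += count;
        current_ = (current_ + 1) % bufferCount;  // wrap around: old buffers are reused
        return true;
    }

    // Memory held for audio data; independent of the size of the source.
    std::size_t residentBytes() const { return bufferCount * chunkSize_; }

    const std::vector<std::uint8_t>& buffer(std::size_t index) const { return buffers_[index]; }

private:
    const std::vector<std::uint8_t>& source_;
    std::size_t chunkSize_;
    std::size_t position_ = 0;
    std::size_t current_ = 0;
    std::array<std::vector<std::uint8_t>, bufferCount> buffers_;
};
```

In a real streamer a buffer may only be overwritten after the source voice has finished playing it; the model leaves that synchronization out on purpose.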

In this tutorial we will learn how to implement the technique just described using the Windows Media Foundation's Source Reader in asynchronous mode.

Asynchronous Reading

The Source Reader operates either in synchronous mode or asynchronous mode. In the previous tutorial we used the Source Reader in synchronous mode, which is the default. In synchronous mode, the IMFSourceReader::ReadSample method blocks while the media source produces the next sample. The larger the audio file we are trying to load, the longer the calling thread is blocked. Obviously this is not what we want for a game.

In asynchronous mode, ReadSample returns immediately and the work is performed on another thread. After the operation is complete, the Source Reader calls the application through the IMFSourceReaderCallback callback interface. To use asynchronous mode, a pointer to a callback structure must be provided when the source reader is created.

The callback interface has the following three methods:

IMFSourceReaderCallback::OnEvent

This method is called when the source reader receives certain events from the media source.

IMFSourceReaderCallback::OnFlush

This method is called when the IMFSourceReader::Flush method completes.

IMFSourceReaderCallback::OnReadSample

This method is called when the IMFSourceReader::ReadSample method completes.

There isn't much to say about this structure. I won't talk about the COM stuff, as it is rather tedious. Just note that whenever we read a sample, all we really have to do is check whether we have reached the end of the audio file.
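To illustrate the idea without the COM plumbing, here is a small platform-neutral model of such a read callback. It is a sketch under stated assumptions, not the Media Foundation interface itself: ReadCallbackModel, onReadSample, waitForRead and kEndOfStreamFlag are hypothetical names, and a std::condition_variable takes the place of the event that wakes the streaming thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Stand-in for the end-of-stream bit in the stream flags (value chosen for this sketch).
constexpr std::uint32_t kEndOfStreamFlag = 0x2;

// Hypothetical model of a read callback: records the result of a read
// and wakes the thread that is waiting for it.
class ReadCallbackModel {
public:
    // Called on the reader's thread when a read has completed.
    void onReadSample(std::uint32_t streamFlags)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if ((streamFlags & kEndOfStreamFlag) != 0)
                endOfStream_ = true;
            readCompleted_ = true;
        }
        readDone_.notify_one();
    }

    // Called on the streaming thread: sleeps until a read has completed,
    // then reports whether the end of the file has been reached.
    bool waitForRead()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        readDone_.wait(lock, [this] { return readCompleted_; });
        readCompleted_ = false;
        return endOfStream_;
    }

    // Clears the end-of-stream state so the file can be read again.
    void restart()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        endOfStream_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable readDone_;
    bool readCompleted_ = false;
    bool endOfStream_ = false;
};
```

The streaming thread would request a read, call waitForRead, and then decide what to do based on the returned end-of-stream state.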

SourceReaderCallback sourceReaderCallback;

This is the callback structure for the source reader as explained above.

StreamingVoiceCallback streamingVoiceCallback;

This is a callback structure similar to the one described above for the source reader, but this one is used by the source voice while playing the audio chunks. We will talk more about this in a moment.

static const int maxBufferCount = 3;

This member defines the maximal number of buffers to use during streaming.

bool stopStreaming = false;

This boolean member tells the streaming function whether it is time to go to bed or to happily continue streaming audio.

In the next section, we will talk about the following three functions in greater detail:

createAsyncReader

This method creates a source reader in asynchronous mode.

streamFile

This method streams an audio file from the hard drive.

loopStream

This method is the actual workhorse for the audio streaming.

Asynchronous Source Reader

As in the synchronous case, to stream a file, we first have to attach a source reader to a file on the hard drive. To get asynchronous reading, we set the MF_SOURCE_READER_ASYNC_CALLBACK attribute to a pointer to our callback structure before the source reader is created:

Final Preparations

The streamFile method simply prepares the source reader for asynchronous reading and then loops over the audio data for as long as we desire, using an XAudio2 source voice to play back the audio chunks that are available. To play back those chunks, another callback structure is needed, a source voice callback structure:

Once again, we will ignore the tedious COM stuff. There really isn't much to do here: the only event we are interested in handling at the moment is reaching the end of a buffer, which is done, what a surprise, in the OnBufferEnd method.

Looping

The actual work is done in the looping function. Once again, as in the synchronous case, we will enter an endless loop. This time, though, there are two breaking conditions: if the stream is specified to loop, the only way to get out of the endless loop is to set the stopStreaming boolean to true; otherwise the loop also ends when the end of the audio file is reached.

Notice that this time all but the first parameter of the ReadSample method are set to zero. In asynchronous mode the four output parameters must be null, because the result is delivered to the OnReadSample callback instead. Once the sample has been read, we check whether we have reached the end of the file and, if so, whether we should restart the stream (loop) or stop streaming:

The restart function of the callback structure simply sets the end-of-stream boolean to false and empties the sample. PROPVARIANTs are used to set properties of objects; in this case, we reset the current position of the source reader to zero, the beginning of the audio file. Passing GUID_NULL as the time format means that the position is given in 100-nanosecond units. By passing VT_I8, we specify that the type of the property is an 8-byte signed integer in the little-endian byte order format.
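The breaking conditions of the loop can be condensed into a single decision. The following sketch is only an illustration of that logic; StreamAction and nextAction are hypothetical names that do not appear in the engine, and the three booleans correspond to the stopStreaming member, the end-of-stream state of the callback, and the loop setting of the stream.

```cpp
#include <cassert>

// What the streaming loop should do after a read has completed.
enum class StreamAction { Continue, Restart, Stop };

// Hypothetical helper: maps the three conditions described in the text to one action.
inline StreamAction nextAction(bool stopStreaming, bool endOfStream, bool loop)
{
    if (stopStreaming)
        return StreamAction::Stop;      // an explicit stop request always wins
    if (!endOfStream)
        return StreamAction::Continue;  // more data to read
    return loop ? StreamAction::Restart  // rewind to the beginning of the file
                : StreamAction::Stop;    // end of a non-looping stream
}
```

A looping stream therefore never returns Stop on its own; only the stop request ends it.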

Okay, with all the looping and breaking stuff out of the way, it is time to actually read the data and to prepare it to be played back by an XAudio2 source voice. This is actually similar to the synchronous case:

The beginning is just the same as for the synchronous case: we have to convert the sample data into a contiguous buffer. Once done, the data is copied into an array of bytes. The only real difference is here:

// wait until the XAudio2 source voice has played enough data
// we want at most maxBufferCount - 1 buffers on the queue, so that there is always one free buffer for the Media Foundation streamer
XAUDIO2_VOICE_STATE state;
for (;;)
{
    sourceVoice->GetState(&state);
    if (state.BuffersQueued < maxBufferCount - 1)
        break;
    WaitForSingleObject(streamingVoiceCallback.hBufferEndEvent, INFINITE);
}

If there is no free buffer, we have to wait until the source voice is done playing back the audio data before we can start filling the next buffer.
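The same throttling can be modelled with standard C++ only, which makes the waiting logic easier to see in isolation. This is a sketch under assumptions, not XAudio2 code: VoiceQueueModel is a hypothetical class, its onBufferEnd method plays the role of the OnBufferEnd callback, and a std::condition_variable replaces the Windows event handle.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of a voice's buffer queue with the throttling rule from the text.
class VoiceQueueModel {
public:
    static constexpr int maxBufferCount = 3;

    // Blocks until fewer than maxBufferCount - 1 buffers are queued,
    // then counts one more buffer as submitted.
    void submitWhenRoom()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        bufferEnded_.wait(lock, [this] { return buffersQueued_ < maxBufferCount - 1; });
        ++buffersQueued_;
    }

    // Plays the role of OnBufferEnd: the voice has finished one buffer.
    void onBufferEnd()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (buffersQueued_ > 0)
                --buffersQueued_;
        }
        bufferEnded_.notify_one();  // wake the streaming thread if it is waiting
    }

    int buffersQueued() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return buffersQueued_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable bufferEnded_;
    int buffersQueued_ = 0;
};
```

With maxBufferCount set to 3, at most two buffers are ever queued, so one buffer always remains free for the next read.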

A stream event links to an audio file on the hard drive (the audio file we want to stream), defines whether the file should be looped, and specifies its type, i.e. whether it is a music file or a sound effect. The type is needed to send the source voice into the correct submix voice.

To be able to stream music files while playing the game, the audio component of our game calls the above implemented XAudio2 streaming function from a worker thread:

In C++11, starting a new thread is quite easy: we simply have to create a new std::thread. As parameters we pass the function to be executed on the new thread, in this case the streamFile method of the AudioEngine class, then the instance of the class (engine), and finally the desired arguments for the streamFile function.
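The parameter order is easiest to see in a stripped-down example. The AudioEngine below is a hypothetical stand-in whose streamFile only records its arguments, and its parameter list (a file name and a loop flag) is an assumption made for this sketch; only the std::thread pattern itself is the point.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical stand-in for the audio engine; the real streamFile would stream audio.
class AudioEngine {
public:
    void streamFile(const std::string& fileName, bool loop)
    {
        lastFile = fileName;  // record the arguments so the call can be observed
        looped = loop;
    }

    std::string lastFile;
    bool looped = false;
};

// Runs streamFile on a worker thread and waits for it to finish.
inline void streamOnWorkerThread(AudioEngine& engine, const std::string& fileName, bool loop)
{
    // member function pointer first, then the instance, then the arguments
    std::thread worker(&AudioEngine::streamFile, &engine, fileName, loop);
    worker.join();
}
```

The sketch joins immediately to stay simple; a game would keep the std::thread object alive and join it only once the stream has been stopped.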

To stop a stream, we simply set the stopStreaming variable of the XAudio2 engine to true:
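A flag that is written by one thread and read by another should be safe to share. The text declares stopStreaming as a plain bool; the following sketch instead uses std::atomic<bool>, which is an assumption of this example and not necessarily how the engine is written. StreamWorkerModel and its members are hypothetical names, and the loop body is a placeholder for reading and submitting audio chunks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical model of a streaming worker that runs until it is asked to stop.
class StreamWorkerModel {
public:
    ~StreamWorkerModel() { stop(); }

    void start()
    {
        assert(!worker_.joinable());  // only one worker at a time
        stopStreaming_ = false;
        worker_ = std::thread([this] {
            while (!stopStreaming_) {
                ++chunksStreamed_;          // placeholder for reading and submitting a chunk
                std::this_thread::yield();
            }
        });
    }

    // Requests the stop and waits until the worker thread has left its loop.
    void stop()
    {
        stopStreaming_ = true;
        if (worker_.joinable())
            worker_.join();
    }

    bool isRunning() const { return worker_.joinable(); }
    long chunksStreamed() const { return chunksStreamed_; }

private:
    std::atomic<bool> stopStreaming_{false};
    std::atomic<long> chunksStreamed_{0};
    std::thread worker_;
};
```

Joining in stop guarantees that the worker no longer touches any buffers once stop returns.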