10 years ago I blogged that one of my most wanted C# language features was the ability to perform reinterpret casts between different array types (e.g. cast a byte[] to a float[]). This is something you frequently need to do in audio programming, where performance matters and you want to avoid unnecessary copies or memory allocations.

Span<T>

So I'm very happy that in .NET Core 2.1, the new Span<T> functionality gives me exactly what I wanted. It's very exciting to see the significant performance optimisations this is already bringing to ASP.NET Core and wider parts of the .NET framework.

I've now updated my proof of concept app to use the final released bits and published the code to GitHub, so here's a quick run-through of the changes I made and their benefits.

IWaveProvider and ISampleProvider

The two main interfaces in NAudio for classes that provide a stream of audio are IWaveProvider and ISampleProvider. IWaveProvider allows you to read audio into a byte array, so it is flexible enough to cover audio in any format. ISampleProvider is for when you are dealing exclusively with IEEE floating point samples, which is typically what you want whenever you are mixing or otherwise manipulating audio streams.

Both interfaces are very simple. They report the WaveFormat of the audio they provide, and define a Read method to which you pass an array that you want the audio to be written into. Having the caller supply the buffer is, of course, for performance reasons: you don't want to allocate a new memory buffer every time you read some audio, as this will happen many times every second during audio playback.

Both Read methods also take an offset parameter. This is because in some circumstances the start of the buffer is already filled with audio, and we don't want the new audio to overwrite it. The count parameter specifies how many elements we want to be written into the buffer, and the Read method returns how many elements were actually written.
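For reference, here's a sketch of the two array-based interfaces (their Read signatures are as in NAudio 1.x), together with a hypothetical ConstantSampleProvider, which is not part of NAudio, showing how an implementation has to honour offset and count. The WaveFormat class here is a cut-down stand-in for NAudio's real class, included only so the snippet compiles on its own.

```csharp
using System;

// Cut-down stand-in for NAudio's WaveFormat, just enough for this sketch.
public class WaveFormat
{
    public int SampleRate { get; set; } = 44100;
    public int Channels { get; set; } = 2;
}

// The array-based interfaces: the caller supplies the buffer, an offset and a count.
public interface IWaveProvider
{
    WaveFormat WaveFormat { get; }
    int Read(byte[] buffer, int offset, int count);
}

public interface ISampleProvider
{
    WaveFormat WaveFormat { get; }
    int Read(float[] buffer, int offset, int count);
}

// Hypothetical provider that outputs a fixed number of samples of one constant value.
public class ConstantSampleProvider : ISampleProvider
{
    private readonly float value;
    private int samplesRemaining;

    public ConstantSampleProvider(float value, int totalSamples)
    {
        this.value = value;
        samplesRemaining = totalSamples;
    }

    public WaveFormat WaveFormat { get; } = new WaveFormat();

    public int Read(float[] buffer, int offset, int count)
    {
        // write no more than requested, and no more than we have left
        int samplesToWrite = Math.Min(count, samplesRemaining);
        for (int n = 0; n < samplesToWrite; n++)
        {
            // the offset has to be added to every index we write to
            buffer[offset + n] = value;
        }
        samplesRemaining -= samplesToWrite;
        // report how many samples were actually written; 0 signals end of stream
        return samplesToWrite;
    }
}
```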

So what does this look like if we take advantage of Span<T>? Well, it eliminates the need for an offset and a count, as a Span<T> already encapsulates both concepts.

This not only simplifies the interface, but greatly simplifies the implementation, as the offset no longer needs to be factored into every read from or write to the buffer.
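Here's a sketch of what Span-based versions of the interfaces might look like, with the same cut-down WaveFormat stand-in and the hypothetical ConstantSampleProvider rewritten to match. It illustrates the idea rather than reproducing the shipping NAudio 1.x API.

```csharp
using System;

// Cut-down stand-in for NAudio's WaveFormat, just enough for this sketch.
public class WaveFormat
{
    public int SampleRate { get; set; } = 44100;
    public int Channels { get; set; } = 2;
}

// Span-based interfaces: the Span carries both the offset and the count.
public interface IWaveProvider
{
    WaveFormat WaveFormat { get; }
    int Read(Span<byte> buffer);
}

public interface ISampleProvider
{
    WaveFormat WaveFormat { get; }
    int Read(Span<float> buffer);
}

// Hypothetical provider that outputs a fixed number of samples of one constant value.
public class ConstantSampleProvider : ISampleProvider
{
    private readonly float value;
    private int samplesRemaining;

    public ConstantSampleProvider(float value, int totalSamples)
    {
        this.value = value;
        samplesRemaining = totalSamples;
    }

    public WaveFormat WaveFormat { get; } = new WaveFormat();

    public int Read(Span<float> buffer)
    {
        int samplesToWrite = Math.Min(buffer.Length, samplesRemaining);
        // index 0 is wherever the caller's Span starts, so there is no offset arithmetic
        buffer.Slice(0, samplesToWrite).Fill(value);
        samplesRemaining -= samplesToWrite;
        return samplesToWrite;
    }
}
```

A caller that still wants to append to a partly filled array simply passes array.AsSpan(offset, count), so the offset logic lives in one place rather than in every provider.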

Creating Spans

There are several ways to create a Span<T>. You can go from a regular managed array to a Span, specifying the desired offset and number of elements:

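```csharp
// one second of audio (SampleRate * Channels samples)
var buffer = new float[WaveFormat.SampleRate * WaveFormat.Channels];
// create a Span over part of this buffer: it starts at index 'offset'
// and covers 'samplesRequired' elements (both assumed to be in scope)
var spanBuffer = new Span<float>(buffer, offset, samplesRequired);
```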
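A self-contained sketch (SpanViewDemo and FillRange are illustrative names) showing that a Span created this way is a view onto the array, not a copy: writing through the Span changes the underlying array.

```csharp
using System;

public static class SpanViewDemo
{
    // Creates a Span over part of a new array, writes through it, and returns the array.
    public static float[] FillRange(int length, int offset, int count, float value)
    {
        var buffer = new float[length];
        // throws ArgumentOutOfRangeException if offset + count exceeds buffer.Length
        var spanBuffer = new Span<float>(buffer, offset, count);
        for (int n = 0; n < spanBuffer.Length; n++)
        {
            // spanBuffer[n] is the same memory as buffer[offset + n]
            spanBuffer[n] = value;
        }
        return buffer;
    }
}
```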

You can also create a Span based on unmanaged memory. This is used by the WaveOutBuffer class, because the buffer is passed to some Windows APIs that expect the memory pointer to remain valid after the API call completes. That means we can't risk passing a pointer to a managed array, as the garbage collector could move the memory at any time.

To do this, we allocate some unmanaged memory with Marshal.AllocHGlobal, and then create a new Span based on it. Unfortunately, there is no Span constructor taking an IntPtr, so we are forced to use an unsafe code block to turn the IntPtr into a void*.
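A minimal sketch of that approach, using a hypothetical UnmanagedSpanDemo helper rather than NAudio's actual WaveOutBuffer code. Because of the unsafe block, the project needs unsafe code enabled (AllowUnsafeBlocks set to true in the project file).

```csharp
using System;
using System.Runtime.InteropServices;

// Requires unsafe code to be enabled (<AllowUnsafeBlocks>true</AllowUnsafeBlocks>).
public static class UnmanagedSpanDemo
{
    // Allocates unmanaged memory, fills it through a Span<byte>, then copies it back
    // out via the raw pointer to show that both refer to the same memory.
    public static byte[] WriteAndReadBack(int length, byte value)
    {
        IntPtr pointer = Marshal.AllocHGlobal(length);
        try
        {
            unsafe
            {
                // there is no Span<T>(IntPtr, int) constructor, so convert to void*
                var span = new Span<byte>(pointer.ToPointer(), length);
                span.Fill(value);
            }

            var copy = new byte[length];
            Marshal.Copy(pointer, copy, 0, length);
            return copy;
        }
        finally
        {
            // the garbage collector doesn't manage this memory, so free it explicitly
            Marshal.FreeHGlobal(pointer);
        }
    }
}
```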

It's also possible to create a new Span from an existing Span. For example, in the original implementation of OffsetSampleProvider, we need to read samplesRequired samples into an array called buffer, at an offset calculated from the original offset we were passed plus the number of samples we've already written into the buffer (samplesRead).
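The following is a simplified, hypothetical sketch of that pattern, not NAudio's actual OffsetSampleProvider code: DelayedSampleProvider writes some leading silence and then reads the rest from its source, and ArraySampleProvider is a stand-in source. ISampleProvider is trimmed to its Read method to keep things short.

```csharp
using System;

// ISampleProvider trimmed to just its Read method for this sketch.
public interface ISampleProvider
{
    int Read(float[] buffer, int offset, int count);
}

// Hypothetical source that plays back the contents of an array once.
public class ArraySampleProvider : ISampleProvider
{
    private readonly float[] samples;
    private int position;

    public ArraySampleProvider(float[] samples) { this.samples = samples; }

    public int Read(float[] buffer, int offset, int count)
    {
        int n = Math.Min(count, samples.Length - position);
        Array.Copy(samples, position, buffer, offset, n);
        position += n;
        return n;
    }
}

// Hypothetical provider that emits some silence before its source.
public class DelayedSampleProvider : ISampleProvider
{
    private readonly ISampleProvider source;
    private int delaySamplesRemaining;

    public DelayedSampleProvider(ISampleProvider source, int delaySamples)
    {
        this.source = source;
        delaySamplesRemaining = delaySamples;
    }

    public int Read(float[] buffer, int offset, int count)
    {
        // write any remaining silence, starting at the caller's offset
        int samplesRead = Math.Min(count, delaySamplesRemaining);
        Array.Clear(buffer, offset, samplesRead);
        delaySamplesRemaining -= samplesRead;

        int samplesRequired = count - samplesRead;
        if (samplesRequired > 0)
        {
            // easy to get wrong: both the original offset and samplesRead must be added
            samplesRead += source.Read(buffer, offset + samplesRead, samplesRequired);
        }
        return samplesRead;
    }
}
```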

The Span<T> implementation, by contrast, uses Slice to create a new Span of the desired length (samplesRequired), starting at the desired offset (samplesRead) within the existing Span. Because the existing Span already starts in the right place, there is no need to add on an additional offset, which eliminates a common cause of bugs.
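For comparison, here's the same hypothetical sketch rewritten against a Span-based Read method, using Slice where the array version had to compute offset + samplesRead. As before, the class names are illustrative and ISampleProvider is trimmed to its Read method.

```csharp
using System;

// ISampleProvider trimmed to just a Span-based Read method for this sketch.
public interface ISampleProvider
{
    int Read(Span<float> buffer);
}

// Hypothetical source that plays back the contents of an array once.
public class ArraySampleProvider : ISampleProvider
{
    private readonly float[] samples;
    private int position;

    public ArraySampleProvider(float[] samples) { this.samples = samples; }

    public int Read(Span<float> buffer)
    {
        int n = Math.Min(buffer.Length, samples.Length - position);
        samples.AsSpan(position, n).CopyTo(buffer);
        position += n;
        return n;
    }
}

// The same hypothetical delay logic, rewritten to use Slice instead of offsets.
public class DelayedSampleProvider : ISampleProvider
{
    private readonly ISampleProvider source;
    private int delaySamplesRemaining;

    public DelayedSampleProvider(ISampleProvider source, int delaySamples)
    {
        this.source = source;
        delaySamplesRemaining = delaySamples;
    }

    public int Read(Span<float> buffer)
    {
        int samplesRead = Math.Min(buffer.Length, delaySamplesRemaining);
        buffer.Slice(0, samplesRead).Clear();
        delaySamplesRemaining -= samplesRead;

        int samplesRequired = buffer.Length - samplesRead;
        if (samplesRequired > 0)
        {
            // the slice starts exactly where the next sample belongs: no offset to add
            samplesRead += source.Read(buffer.Slice(samplesRead, samplesRequired));
        }
        return samplesRead;
    }
}
```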

Casting

I've said that one of the major benefits of Span<T> is the ability to perform reinterpret casts, so we can essentially turn a Span<byte> into a Span<float> or vice versa. The way you do this has changed since the beta bits: you now use MemoryMarshal.Cast, which is pretty straightforward.
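A small sketch of MemoryMarshal.Cast in action (ReinterpretCastDemo is an illustrative name). Writing a float through the Span<float> changes the bytes of the original array, because both views share the same memory; the bytes are laid out in the machine's native byte order.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ReinterpretCastDemo
{
    // Writes a float through a Span<float> view of a byte array. The bytes change
    // because MemoryMarshal.Cast reinterprets the same memory instead of copying it.
    public static byte[] WriteSecondFloat(float value)
    {
        var bytes = new byte[8];                                   // room for two floats
        Span<float> floats = MemoryMarshal.Cast<byte, float>(bytes.AsSpan());
        floats[1] = value;                                         // occupies bytes 4 to 7
        return bytes;
    }

    // The cast length is rounded down: trailing bytes that don't make up a whole
    // float are not included in the resulting Span.
    public static int FloatCount(byte[] bytes) =>
        MemoryMarshal.Cast<byte, float>(bytes.AsSpan()).Length;
}
```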

This greatly simplifies many of the helper classes in NAudio that let you switch between IWaveProvider and ISampleProvider. SampleToWaveProvider, for example, can now use MemoryMarshal.Cast in its Read method.
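Here's a hypothetical sketch of how a Span-based SampleToWaveProvider could use MemoryMarshal.Cast. The class is named SpanSampleToWaveProvider to make clear it is an illustration rather than NAudio's code, the ConstantSampleProvider source is likewise made up, and both interfaces are trimmed to their Read methods.

```csharp
using System;
using System.Runtime.InteropServices;

// Both interfaces trimmed to Span-based Read methods for this sketch.
public interface ISampleProvider { int Read(Span<float> buffer); }
public interface IWaveProvider { int Read(Span<byte> buffer); }

// Hypothetical source producing an endless stream of one constant sample value.
public class ConstantSampleProvider : ISampleProvider
{
    private readonly float value;
    public ConstantSampleProvider(float value) { this.value = value; }

    public int Read(Span<float> buffer)
    {
        buffer.Fill(value);
        return buffer.Length;
    }
}

// Hypothetical Span-based adapter in the spirit of SampleToWaveProvider:
// presents a float sample source as a source of raw IEEE float bytes.
public class SpanSampleToWaveProvider : IWaveProvider
{
    private readonly ISampleProvider source;

    public SpanSampleToWaveProvider(ISampleProvider source) { this.source = source; }

    public int Read(Span<byte> buffer)
    {
        // view the caller's byte buffer as floats: no copy and no intermediate buffer
        Span<float> samples = MemoryMarshal.Cast<byte, float>(buffer);
        int samplesRead = source.Read(samples);
        // the caller expects a byte count, not a sample count
        return samplesRead * sizeof(float);
    }
}
```

Because the cast rounds down to whole floats, a 10 byte buffer is treated as two samples and the method reports 8 bytes written.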

This removes the need for the WaveBuffer hack we previously relied on to avoid copying in this method.

Span<T> Limitations

There were a few limitations I ran into that are worth noting. First of all, a Span<T> can't be stored in a field of a class (read Stephen Toub's article to understand why). So in the WaveOutBuffer class, where I wanted to reuse some unmanaged memory, I couldn't construct a Span<T> up front and reuse it. Instead, I had to hold on to the pointer to the unmanaged memory and construct a Span on demand.
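A minimal sketch of that workaround, using a hypothetical UnmanagedAudioBuffer class rather than NAudio's actual WaveOutBuffer. The class keeps the IntPtr and length as fields and builds a new Span each time one is asked for; returning a Span from a method is fine, it just can't be stored in a field. Unsafe code must be enabled to compile it.

```csharp
using System;
using System.Runtime.InteropServices;

// Requires unsafe code to be enabled (<AllowUnsafeBlocks>true</AllowUnsafeBlocks>).
public sealed class UnmanagedAudioBuffer : IDisposable
{
    // private Span<byte> span;  // would not compile: a class can't have a Span<T> field
    private IntPtr pointer;      // so keep the raw pointer and length instead
    private readonly int length;

    public UnmanagedAudioBuffer(int length)
    {
        this.length = length;
        pointer = Marshal.AllocHGlobal(length);
    }

    // The pointer that would be handed to the native API.
    public IntPtr Pointer => pointer;

    // Construct a fresh Span over the unmanaged memory each time one is needed.
    public unsafe Span<byte> AsSpan()
    {
        if (pointer == IntPtr.Zero) throw new ObjectDisposedException(nameof(UnmanagedAudioBuffer));
        return new Span<byte>(pointer.ToPointer(), length);
    }

    public void Dispose()
    {
        if (pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(pointer);
            pointer = IntPtr.Zero;
        }
    }
}
```

A production version would also want a finalizer or a SafeHandle so the memory is released even if Dispose is never called.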

This limitation also affects how we might design an audio recording interface for NAudio. For example, suppose we had an AudioAvailable event that was raised whenever recorded audio was available. We might want it to provide us with a Span<T> containing that audio.
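A sketch of how such an event could be declared, with hypothetical names throughout. Since an EventArgs class can't hold a Span<T>, one option is a custom delegate type that takes the Span as a parameter.

```csharp
using System;

// Hypothetical delegate type. A Span<T> can't be stored in a field of an EventArgs
// class, so the Span is passed to handlers directly as a parameter instead.
public delegate void AudioAvailableHandler(Span<byte> audio);

// Hypothetical recorder used only to illustrate how the event would be raised.
public class FakeRecorder
{
    private readonly byte[] captureBuffer = new byte[1024]; // reused for every capture

    public event AudioAvailableHandler AudioAvailable;

    // Simulates the capture device filling the first bytesRecorded bytes of the buffer.
    public void SimulateCapture(int bytesRecorded)
    {
        // handlers only see the valid portion of the reused buffer
        AudioAvailable?.Invoke(new Span<byte>(captureBuffer, 0, bytesRecorded));
    }
}
```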

I'm not sure yet whether this approach is preferable to using Memory<T>. The recording part of my proof of concept application isn't finished yet, so I'll try both approaches when it's ready.
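For comparison, a sketch of the Memory<T> alternative, again with hypothetical names. Memory<T> can be stored in a class, so the standard EventHandler<TEventArgs> pattern works, at the cost of allocating an EventArgs object for each callback.

```csharp
using System;

// Hypothetical EventArgs. Unlike Span<T>, Memory<T> can be stored in a class field.
public class AudioAvailableEventArgs : EventArgs
{
    public AudioAvailableEventArgs(Memory<byte> audio) { Audio = audio; }
    public Memory<byte> Audio { get; }
}

// Hypothetical recorder used only to illustrate how the event would be raised.
public class FakeMemoryRecorder
{
    private readonly byte[] captureBuffer = new byte[1024]; // reused for every capture

    public event EventHandler<AudioAvailableEventArgs> AudioAvailable;

    public void SimulateCapture(int bytesRecorded)
    {
        // one EventArgs allocation per callback; a handler that keeps the Memory<byte>
        // will see its contents overwritten by the next capture
        var args = new AudioAvailableEventArgs(new Memory<byte>(captureBuffer, 0, bytesRecorded));
        AudioAvailable?.Invoke(this, args);
    }
}
```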

Next steps

There is still a fair amount I'd like to do with this sample to take full advantage of Span<T>. There are more array allocations that could be eliminated, and there should no longer be any need for pinned GCHandle instances.

There are also plenty more NAudio classes that could be converted to take advantage of Span<T>. Currently the sample app just plays a short tone generated with the SignalGenerator, so I'd like to add audio file reading, as well as recording. Feel free to submit PRs or raise issues if you'd like to help shape what might become the basis for a future NAudio 2.0.

Span<T> and .NET Standard

Of course, one big blocker to the adoption of Span<T> is that it is currently supported on .NET Core 2.1 only. It's not part of .NET Standard 2.0, and there seem to be no immediate plans to create a new version of .NET Standard that supports Span<T>, presumably due to the challenges of back-porting all of this to the regular .NET Framework. This is a shame, because it means that NAudio cannot realistically adopt it if we want one consistent programming model across all target frameworks.

Conclusion

Span<T> is a brilliant new innovation that has the potential to bring major performance benefits to lots of scenarios, including audio. For the time being, though, it is only available in .NET Core applications.

About Mark Heath

I'm a Microsoft MVP and software developer based in Southampton, England, currently working as a Software Architect for NICE Systems. I create courses for Pluralsight and am the author of several open source libraries. I currently specialize in architecting Azure based systems and audio programming.