"programming": ["audio", "low-level", "iOS"]

Tag Archives: audio

With this post I’m taking a slight diversion away from low-level DSP and plug-ins to share a fun little experimental project I just completed. It’s a game of Tic-Tac-Toe using the FMOD sound engine with audio based on the minimalist piano piece “Für Alina” by Arvo Pärt. In the video game industry there are two predominant middleware audio tools that sound designers and composers use to define the behavior of audio within a game. Audiokinetic’s Wwise is one, and Firelight Technologies’ FMOD is the other. Both have their strengths and weaknesses, but I chose to work with FMOD on this little project because Wwise’s Mac authoring tool is still in the alpha stage. I used FMOD’s newest version, FMOD Studio.

The fun and interesting part of this little project was working with the fairly unusual approach to the audio. I explored many different ways to implement the sound so that it both reflected the subtlety of the original music and reacted to player actions and the state of the game. Here is a video demonstrating the result. Listen for subtle changes in the audio as the game progresses. The behavior of the audio is all based on a few simple rules and patterns governed by player action.

I’m both happy and relieved that progress on the Match Envelope plug-in is proving successful (so far, anyway)! It’s up and running, albeit in skeleton form, in Audacity (both Mac and Windows) and Sound Forge (Windows). As I was expecting, it hasn’t been without its fair share of challenges, and one of the biggest has been the UI: how will the user interact with the plug-in efficiently given the inherent limitations of the interface?

The crux of the problem stems from the offline-only nature of the Match Envelope plug-in. Similar to processes like normalization, where the entire audio buffer needs to be scanned to determine its peak before being scanned a second time to apply the gain, I need to scan the entire audio buffer (or at least as much as the user has selected) to get the envelope profile before applying it onto a different audio buffer.
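As a quick illustration of that two-pass pattern, here is a minimal normalization sketch (my own example, not code from the plug-in):

```cpp
#include <cmath>

// Two-pass normalization: pass one scans the whole buffer for its peak,
// pass two applies the gain that brings that peak to 1.0.
void normalize(float* buf, int n) {
    float peak = 0.0f;
    for (int i = 0; i < n; ++i)            // pass 1: find the peak
        peak = std::fmax(peak, std::fabs(buf[i]));
    if (peak <= 0.0f) return;              // silence: nothing to do
    float gain = 1.0f / peak;
    for (int i = 0; i < n; ++i)            // pass 2: apply the gain
        buf[i] *= gain;
}
```

The envelope extraction has the same shape: one full scan to gather information, a second pass to use it.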

This part of the challenge I foresaw as I began development. I knew of VST’s offline features, however, and planned to explore them to solve some of the interface difficulties I anticipated. What I didn’t count on was that host programs widely do not support VST offline functions; in fact, Steinberg has all but removed the example source code for offline plug-ins from the 2.4 SDK (I’m not up to speed on VST3 yet). Thus I have been forced to use the normal VST SDK functions to handle my plug-in.

So here is the root cause of perhaps the main challenge I had to deal with: the host program that invokes the plug-in is responsible for sending the audio buffer in blocks to the processing function, which is the only place I have access to the audio stream. The function prototype looks like this:
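Reconstructed from the VST 2.4 SDK (the trivial pass-through body is mine, added purely for illustration; VstInt32 stands in for the SDK’s 32-bit integer typedef):

```cpp
typedef int VstInt32; // stand-in for the SDK's typedef

// The VST 2.x processing callback a plug-in overrides (in the real SDK
// this is a virtual member of AudioEffectX). The body here is a trivial
// stereo pass-through, just to show the shape of the call.
void processReplacing(float** inputs, float** outputs, VstInt32 sampleFrames)
{
    for (int ch = 0; ch < 2; ++ch)
        for (VstInt32 i = 0; i < sampleFrames; ++i)
            outputs[ch][i] = inputs[ch][i];
}
```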

'inputs' contains the actual audio sample data that the host has sent to the plug-in, 'outputs' is where, after processing, the plug-in places the modified audio, and 'sampleFrames' is the number of samples (the block size) in the audio sample data. As I mentioned earlier, not only do I need to scan the audio buffer first to acquire the envelope profile, I need to divide the audio data into windows whose size is determined by the user. It’s pretty obvious that the number of samples in the window will not equal sampleFrames (at least not 99.99998% of the time), effectively complicating the implementation of this function three-fold.

How should I handle cases where the window size is less than sampleFrames? More than sampleFrames? More than double sampleFrames? Complicating matters further is that different hosts will pass different block sizes in for sampleFrames, and there is no way to tell exactly what it will be until processReplacing() is invoked. Here is the pseudocode I used to tackle this problem:
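In sketch form (hypothetical names, simplified to a single channel and an averaging envelope):

```cpp
#include <cmath>
#include <vector>

// Accumulate samples across successive processReplacing() calls, emitting
// one averaged envelope value per completed window. Windows smaller than,
// larger than, or straddling sampleFrames are all handled by the same
// inner loop, with leftovers carried over in 'count' and 'sum'.
struct EnvelopeExtractor {
    explicit EnvelopeExtractor(int ws) : windowSize(ws) {}

    int windowSize;               // samples per window, set by the user
    int count = 0;                // samples accumulated in current window
    double sum = 0.0;             // running sum of |sample| in that window
    std::vector<float> envelope;  // one averaged value per window

    void process(const float* input, int sampleFrames) {
        int i = 0;
        while (i < sampleFrames) {
            // Samples needed to finish the current window, capped at what
            // this block still contains: no per-sample end-of-window test.
            int need = windowSize - count;
            int run  = need < sampleFrames - i ? need : sampleFrames - i;
            for (int j = 0; j < run; ++j)
                sum += std::fabs(input[i + j]);
            i     += run;
            count += run;
            if (count == windowSize) {   // window complete: store average
                envelope.push_back((float)(sum / windowSize));
                sum   = 0.0;
                count = 0;               // leftovers carry via count/sum
            }
        }
    }
};
```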

The code determines how many windows it can process in any given loop iteration of processReplacing(), given sampleFrames and windowSize, storing leftover samples in a variable that is carried over into the next iteration. Once the end of a window is reached, the values copied from the audio buffer (our source envelope) are either averaged together or their peak is found, whichever the user has specified, and that value is then stored in the envelope data buffer. Large windows are handled separately from small ones to avoid testing for the end of the window on every sample processed.

Once this part of the plug-in began to take shape, another problem cropped up. The plug-in requires three steps (one is optional) taken by the user in order to use it:

1. Acquire the source envelope profile from an audio track,

2. Acquire the destination audio’s envelope profile to use with the match % parameter (optional),

3. Select the audio to apply the envelope profile onto, and process.

It became clear that, since I was not using VST offline capabilities, the plug-in would need to be opened and reopened two to three times to make this work. This isn’t exactly ideal and wasn’t what I had in mind for the interface, but the upside is that it’s been a huge learning experience. As such, I decided to split the Match Envelope plug-in into two halves: the Envelope Extractor and the Envelope Matcher.

I felt this was a good way to go because it separated two distinct elements of the plug-in and clarified which parameters belong with which process. For example, the match % and gain parameters have no effect on the actual extraction of the envelope profile, only on how it is processed onto the destination audio. I, like many others I assume, like fiddling around with parameters and settings on plug-ins, and it can get very frustrating when they have no apparent effect, creating confusion and possibly the impression of bad design.

To communicate between the two halves, the Envelope Extractor writes the extracted envelope data to a temporary binary file, which the Envelope Matcher reads back in order to process the envelope; this has proven to work very well. In debug mode I also write a lot of data out to temporary debug files (one from the Envelope Extractor, one from the Envelope Matcher) to monitor what the plug-in is doing and how all the calculations are being done.
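A minimal sketch of that hand-off, assuming a hypothetical file name and a raw-float format (the plug-in’s actual file layout may differ):

```cpp
#include <cstdio>
#include <vector>

// The Extractor half dumps the envelope values as raw floats...
bool writeEnvelope(const char* path, const std::vector<float>& env) {
    FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    size_t n = std::fwrite(env.data(), sizeof(float), env.size(), f);
    std::fclose(f);
    return n == env.size();
}

// ...and the Matcher half reads them back before processing.
std::vector<float> readEnvelope(const char* path) {
    std::vector<float> env;
    FILE* f = std::fopen(path, "rb");
    if (!f) return env;
    float v;
    while (std::fread(&v, sizeof(float), 1, f) == 1)
        env.push_back(v);
    std::fclose(f);
    return env;
}
```

Raw binary floats round-trip exactly, which matters here: the Matcher must apply precisely the values the Extractor computed.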

Some of these non-ideal interface features I plan on tackling with a custom GUI, which offers much more flexibility than the extremely limited default UI. Regardless, I’m excited that I’ve made it this far and am very close to having a working version of this plug-in up and running on at least two hosts (Adobe Audition also supports VST and, as far as I know, offline processing, but I have not been able to test it as I don’t own it yet).

After this is done, I do plan on exploring other plug-in types to compare and contrast features and flexibility (AU, RTAS, etc.), and I may find a better solution for the interface. Of course, the plug-in could work as a standalone app where I have total control over the UI and functionality, but it would lack the benefit of doing processing right from within the host.

I decided to start this little blog about my current endeavors in audio programming because, since I started, I’ve already learned a great deal of fascinating and wonderful things about audio in the analog and, especially, the digital domain. Some of these things I already knew, but my understanding of them has deepened; other concepts are completely new. Sharing this knowledge, and the discoveries and challenges I encounter along the way, seemed like a good idea.

Sound is such an amazing thing! I’ve always known (and been told, as I’m sure we all have) that math is a huge, inseparable part of it. But precisely how much, and at what complexity, I didn’t fully know until I dove into audio programming. Advanced trigonometry, integrals, and even complex numbers are all there in the theory behind waveforms and signal processing. Fortunately, math was consistently my best subject in school, and trigonometry was one of my favorite areas of it.

What further steered me in this direction was my growing fascination with audio implementation in video games. As I taught myself the various middleware tools used in the industry (FMOD, Wwise and UDK), it became clear how much I loved it, and how the implementation and integration of audio in a game could add to the gameplay, immersion, and the overall experience.

With that little introduction out of the way, I’ll end this first post with a little example of what I’ve picked up so far. I’m reading through the book “Audio Programming” (Boulanger and Lazzarini), and early on it walks through the process of writing a real-time ring modulator. Building on this, I adapted it to accept stereo input/output, as it was originally mono. You then input two frequencies (one for the left channel and one for the right) that are modulated with the stereo input signal, resulting in a ring-modulated stereo output. Ring modulation is a fairly simple DSP effect that just multiplies two signals together, producing strong inharmonic partials in the resulting sound, which is usually very bell-like. Here is a snippet of my modified code in which I had to create my own stereo oscillator structure and send it to the callback function that modulates both channels:
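In sketch form, with hypothetical names rather than the book’s actual code: a two-channel sine oscillator whose output is multiplied sample-by-sample with the incoming stereo signal.

```cpp
#include <cmath>

static const double TWO_PI = 6.283185307179586;

// Per-channel sine oscillator state for a stereo ring modulator.
// Ring modulation is just multiplication of the two signals, so the
// per-frame work is one sin() and one multiply per channel.
struct StereoOscillator {
    StereoOscillator(double freqL, double freqR, double sampleRate)
        : incrL(TWO_PI * freqL / sampleRate),
          incrR(TWO_PI * freqR / sampleRate) {}

    double phaseL = 0.0, phaseR = 0.0;
    double incrL, incrR;  // per-sample phase increments

    // Interleaved stereo in/out, numFrames frames, as a real-time audio
    // callback would typically receive them.
    void ringMod(const float* in, float* out, int numFrames) {
        for (int i = 0; i < numFrames; ++i) {
            out[2 * i]     = in[2 * i]     * (float)std::sin(phaseL);
            out[2 * i + 1] = in[2 * i + 1] * (float)std::sin(phaseR);
            phaseL += incrL; if (phaseL >= TWO_PI) phaseL -= TWO_PI;
            phaseR += incrR; if (phaseR >= TWO_PI) phaseR -= TWO_PI;
        }
    }
};
```

Wrapping the phase keeps it in [0, 2π) so precision doesn’t degrade over a long-running session.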

And here is a recording of my digital piano being played into the real-time ring modulator (which I did with a single microphone, so the recording is in mono unfortunately):