"programming": ["audio", "low-level", "iOS"]

Tag Archives: compression

In part 1 I detailed how I built the envelope detector that I will now use in my Unity compressor/limiter. To reiterate, the envelope detector extracts the amplitude contour of the audio that will be used by the compressor to determine when to compress the signal’s gain. The response of the compressor is determined by the attack time and the release time of the envelope, with higher values resulting in a smoother envelope, and hence, a gentler response in the compressor.

The compressor script is a MonoBehaviour component that can be attached to any GameObject. Here are the fields and corresponding inspector GUI:

The two most important parameters for a compressor are the threshold and the ratio values. When a signal exceeds the threshold, the compressor reduces the level of the signal by the given ratio. For example, if the threshold is -2 dB with a ratio of 4:1 and the compressor encounters a signal peak of +2 dB, the gain reduction will be 3 dB, resulting in the signal’s new level of -1 dB. The ratio is just a percentage, so a 4:1 ratio means that the signal will be reduced by 75% (1 – 1/4 = 0.75). The difference between the threshold and the signal peak (which is 4 dB in this example) is scaled by the ratio to arrive at the 3 dB reduction (4 * 0.75 = 3). When the ratio is ∞:1, the compressor is turned into a limiter. The compressor’s output can be visualized by a plot of amplitude in vs. amplitude out:

Plot of amplitude in vs. amplitude out of a compressor with 4:1 ratio.

When the ratio is ∞:1, the resulting amplitude after the threshold would be a straight horizontal line in the above plot, effectively preventing any levels from exceeding the threshold. It can easily be seen how this then would exhibit the behavior of a limiter. From these observations, we can derive the equations we need for the compressor.
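The relationships in the worked example can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the component's code; the function names are made up for this sketch, and all levels are in dB.

```python
import math


def slope_from_ratio(ratio: float) -> float:
    """Fraction of the overshoot that is removed: 1 - 1/ratio.

    A 4:1 ratio gives 0.75; an infinite ratio (limiter) gives 1.0.
    """
    if math.isinf(ratio):
        return 1.0
    return 1.0 - 1.0 / ratio


def gain_reduction_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Gain reduction in dB; zero while the level is at or below the threshold."""
    overshoot = level_db - threshold_db
    if overshoot <= 0.0:
        return 0.0
    return slope_from_ratio(ratio) * overshoot


def output_level_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Level after compression, in dB."""
    return level_db - gain_reduction_db(level_db, threshold_db, ratio)
```

With a threshold of -2 dB, a 4:1 ratio and a +2 dB peak, `gain_reduction_db` returns 3 dB and `output_level_db` returns -1 dB, matching the example. With `math.inf` as the ratio, the output is pinned to the threshold.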

All amplitude values are in dB for these equations. We saw both of these equations earlier in the example I gave, and both are pretty straightforward. These elements can now be combined to make up the compressor/limiter. The Awake method is called as soon as the component is initialized in the scene.

Here is the full compressor/limiter code in Unity’s audio callback method. When placed on a component with the audio listener, the data array will contain the audio signal prior to being sent to the system’s output.

First off, I included a few utility functions in the component that convert between linear amplitude and dB values, which we can see in the function above. Pre-gain is applied to the audio signal prior to extracting the envelope. For multichannel audio, Unity unfortunately gives us an interleaved buffer, so this needs to be deinterleaved before sending it to the envelope detector. Recall that the detector uses a recursive filter and thus has state variables. This could of course be handled differently in the envelope detector, but it’s simpler to work on single continuous data buffers.
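As a rough sketch of the two ideas in this paragraph (dB conversion and deinterleaving), here is a small Python version. It is not the Unity component's code: the function names and the -120 dB floor used for silent input are assumptions made for this illustration.

```python
import math


def lin_to_db(amplitude: float, floor_db: float = -120.0) -> float:
    """Convert linear amplitude to dB, clamping silence to an assumed floor."""
    if amplitude <= 0.0:
        return floor_db
    return max(20.0 * math.log10(amplitude), floor_db)


def db_to_lin(db: float) -> float:
    """Convert a dB value back to linear amplitude."""
    return 10.0 ** (db / 20.0)


def deinterleave(data, channels):
    """Split an interleaved buffer [L0, R0, L1, R1, ...] into one list per channel."""
    return [data[ch::channels] for ch in range(channels)]


def interleave(buffers):
    """Inverse of deinterleave: merge per-channel lists back into one buffer."""
    return [sample for frame in zip(*buffers) for sample in frame]
```

Each per-channel list can then be fed to its own envelope detector, so the detector's filter state only ever sees one continuous signal.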

When working with multichannel audio, each channel will have a unique envelope. These could of course be processed separately, but that would disturb the relative levels between the channels. Instead, I take the maximum envelope value and use that for the compressor. Another option would be to take the average of the two.

I then calculate the slope value based on whether the component is set to compressor or limiter mode (via a function delegate). The following loop is just realizing the equations posted earlier, and converting the dB gain value to linear amplitude before applying it to the audio signal along with post-gain.
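The gain loop described here might look roughly like the following Python sketch. It assumes the per-channel envelopes have already been extracted (as linear amplitudes, one value per frame) and that `data` is the interleaved, pre-gained signal; `apply_compression` and its parameters are hypothetical names, and the -120 dB floor for silence is an assumption.

```python
import math


def apply_compression(data, channels, envelopes, threshold_db, slope, post_gain=1.0):
    """Apply one shared gain per frame, driven by the loudest channel envelope.

    slope is 1 - 1/ratio for a compressor, or 1.0 for a limiter.
    """
    out = list(data)
    frames = len(data) // channels
    for i in range(frames):
        # Use the maximum envelope across channels so that every channel
        # receives the same gain and the relative levels are preserved.
        env = max(envelopes[ch][i] for ch in range(channels))
        env_db = 20.0 * math.log10(env) if env > 0.0 else -120.0
        overshoot = env_db - threshold_db
        gain_db = -slope * overshoot if overshoot > 0.0 else 0.0
        gain = 10.0 ** (gain_db / 20.0) * post_gain
        for ch in range(channels):
            out[i * channels + ch] = data[i * channels + ch] * gain
    return out
```

Swapping `max` for an average over the channels gives the alternative mentioned earlier.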

This completes the compressor/limiter component. However, there are two important elements missing: soft knee processing, and lookahead. From the plot earlier in the post, we see that once the signal reaches the threshold, the compressor kicks in rather abruptly. This point is called the knee of the compressor, and if we want this transition to happen more gently, we can interpolate within a zone around the threshold.
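One common way to do that interpolation is a quadratic blend across a knee of a given width centred on the threshold. The sketch below uses that widely used formulation as an assumption; it is not taken from the component, and `knee_db` is a hypothetical parameter.

```python
def soft_knee_reduction_db(level_db, threshold_db, slope, knee_db):
    """Gain reduction in dB with a soft knee of width knee_db around the threshold.

    slope is 1 - 1/ratio. With knee_db == 0 this reduces to the hard knee.
    """
    overshoot = level_db - threshold_db
    half_knee = knee_db / 2.0
    if knee_db <= 0.0 or overshoot >= half_knee:
        # Above the knee (or no knee at all): plain hard-knee reduction.
        return slope * max(overshoot, 0.0)
    if overshoot <= -half_knee:
        # Below the knee: no reduction.
        return 0.0
    # Inside the knee: quadratic interpolation that meets both ends smoothly.
    return slope * (overshoot + half_knee) ** 2 / (2.0 * knee_db)
```

At the bottom of the knee the reduction is 0 dB, and at the top it equals the hard-knee value, so the curve joins both straight segments without a corner.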

It’s common, especially in limiters, to have a lookahead feature that compensates for the lag of the envelope detector. In other words, when the attack and release times are non-zero, the resulting envelope lags behind the audio signal as a result of the filtering. Because of this lag, the compressor/limiter will miss some of the peaks it is supposed to attenuate. That’s where lookahead comes in. In truth, it’s a bit of a misnomer because we obviously cannot see into the future of an audio signal, but we can delay the audio to achieve the same effect. This means that we extract the envelope as normal, but delay the audio output so that the compressor gain value lines up with the audio peaks that it is meant to attenuate.
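A minimal sketch of the delay idea, assuming the gain values have already been computed from the undelayed signal. `LookaheadDelay` and `apply_with_lookahead` are hypothetical names for this illustration, and the gains in the usage note are invented to mimic a lagging envelope.

```python
from collections import deque


class LookaheadDelay:
    """Fixed delay line: each sample comes out delay_samples calls later."""

    def __init__(self, delay_samples: int):
        self._buf = deque([0.0] * delay_samples)

    def process(self, sample: float) -> float:
        self._buf.append(sample)
        return self._buf.popleft()


def apply_with_lookahead(samples, gains, delay_samples):
    """Multiply the delayed audio by gains derived from the undelayed audio."""
    delay = LookaheadDelay(delay_samples)
    return [delay.process(x) * g for x, g in zip(samples, gains)]
```

For example, with `samples = [0.0, 0.0, 1.0, 0.0, 0.0]` and a gain of 0.5 that only arrives two samples after the peak (`gains = [1.0, 1.0, 1.0, 1.0, 0.5]`), a delay of 2 lines the peak up with the reduced gain, while a delay of 0 lets it through untouched. The cost is that the output is late by the same number of samples.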

This entry in my making of a plug-in series will detail what went into finalizing the prototype program for the Match Envelope plug-in. A prototype of this kind is usually a command-line program wherein much of the code is actually written to implement functionality and features, and then later transferred into a plug-in’s SDK (in my case, the VST SDK). Plug-ins, by their very nature, are not self-executable programs and need a host to run, so it is more efficient to test the fundamental code structure within a command-line program.

Having now completed my prototype, I want to first share some of the things I improved upon as well as new features I implemented. One of the main features of the plug-in that I mentioned in part 1 was the match % parameter. This effectively lets you control how strongly the envelope you’re matching affects the audio, and rather than being a linear effect, it is proportional to the difference between the amplitude of the envelope and the amplitude of the audio. Originally this was the formula I used (from part 1):

We could see that this mostly gave me the results I was after, but if we look closely at the resulting waveform, there is some asymmetry in comparing it to the original (look at the bottom of the waveforms). One of the mistakes in this formula was in comparing the interpolated value ‘ival’ with the actual sample amplitude of the ‘buffer’. To remedy this, I now also extract the envelope of the destination audio that we are applying the envelope on to (with the same window width used to extract the source envelope) and use this value to compare the difference with ‘ival’. This ensures a more consistent and accurate comparison of amplitudes.

The other mistake in the original formula was to linearly affect ‘a’, the alpha value that is input by the user in %, by the term that calculates the difference between the amplitudes. So while ‘a’ did affect the resulting ‘ival’ proportionally, ‘a’ itself was not scaled proportionally. The final equation, then, became:

I use two strategies when deciding on an appropriate mathematical formula for what I need. One is considering how I want a value to change over time, or over some range of values, and turning to a kind of equation that does that (i.e. should it be a linear change, exponential, logarithmic, cyclical, etc.). This leads to the second method, and that is to use a graph to visualize the shape of change I am after; this leads to an equation that defines that graph.

Here is a quick graphic and audio to illustrate these changes using the same flute source as the envelope and triangle wave as its destination from part 1:

Shortly, we will be seeing some much more interesting musical examples of the plug-in at work. But before that, we can see another feature at work above that I implemented since last time: junction smoothing.

In addition to specifying the length of the envelope, the user also specifies a value (in msec) to smooth the transition from the envelope match to the original, unmodified audio. Longer values will obviously make the transition more gradual, while shorter makes it more abrupt. The process of implementing this feature turned out to be reasonably simple. This is the basic equation:

where ‘ival’ is the interpolated value and ‘jpos’ is the current position within the bounds of the junction smoothing specified by the user. ‘jpos’ starts at 0, and once the smoothing begins, it increments (within a normalized value) until it hits 1 at the end of the smoothing. The larger ‘jpos’ gets, the less of the actual interpolated value we end up with in our ‘jval’ result, which is used to scale the audio buffer (just as ‘ival’ does outside of junction smoothing). In other words, when ‘jpos’ hits 1, ‘jval’ will be 1 and so we multiply our audio buffer by 1; then we have reached the end of our process and the original audio continues on unmodified.
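The behavior described here (‘jval’ equal to ‘ival’ when ‘jpos’ is 0, and equal to 1 when ‘jpos’ is 1) is consistent with a simple linear crossfade, sketched below in Python. The exact form of the equation is an assumption, as are the function names; for simplicity the sketch also holds ‘ival’ constant over the junction, whereas in practice it would vary per sample.

```python
def junction_value(ival: float, jpos: float) -> float:
    """Blend from ival (jpos = 0) to unity gain (jpos = 1)."""
    jpos = min(max(jpos, 0.0), 1.0)  # keep jpos normalized
    return ival + jpos * (1.0 - ival)


def smooth_junction(buffer, ival, sample_rate, smooth_msec):
    """Scale buffer by jval while jpos ramps from 0 to 1 over smooth_msec."""
    n = max(int(sample_rate * smooth_msec / 1000.0), 1)
    out = []
    for i, x in enumerate(buffer):
        jpos = i / n  # normalized position within the junction
        out.append(x * junction_value(ival, jpos))
    return out
```

Once ‘jpos’ reaches 1 the scale factor is exactly 1, so every sample after the junction passes through unmodified.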

Before we move on, here is a musical example. This very famous opening of Debussy’s “Prelude to the Afternoon of a Faun” seemed like a good excerpt to test my plug-in on. This is the original audio (the opening is very soft, as quiet moments usually are in classical recordings in order to preserve dynamic range, so I had to amplify it, which is why there is some audible low-level noise):

Using this as the source envelope, I used a window size of 250 msec at 90% match to apply on to this flute line that I recorded in Logic, doubling the original from the audio above (the flute sample is from Vienna Special Edition).

Junction smoothing was of course applied during the process to let the flute line fade out as in its original incarnation. Without smoothing it would have abruptly cut off. This gives us a seamless transition from the end of the flute solo into the orchestral answer.

It was very important in this example to specify a fairly large window duration, because we don’t want to capture the tremolo of the original flute solo, as this would fight against the tremolo of the sampled recording that we are applying the envelope on to. We can hear a little bit of this in places even with a 250 msec window, so I intend to test this further to see how it may be avoided or at least minimized.

This brings me to the other major challenge I faced in developing this plug-in since part 1: stereo handling. Dealing with stereo files isn’t complicated in itself, but there were a few complexities I encountered along the way specific to how I wanted the plug-in to behave. Instead of only allowing a 1-to-1 correspondence (i.e. only supporting mono to mono, or stereo to stereo), I decided to allow for the two additional situations of mono to stereo and stereo to mono.

The first two cases are easy enough to deal with, but what should happen if the source envelope is mono and the destination audio is stereo, and vice versa? I decided to allocate 2-dimensional arrays for the envelope buffers to hold the mono/stereo amplitude data and then use bitwise flags to store the states of each envelope:

This saves on having multiple variables representing channel states for each envelope, so I only have to pass around one variable that contains all of this information that is then parsed in the appropriate places to retrieve this information.

As we can see, only one variable (‘envFlags’) is used in the extraction of the source envelope, and we can find out mono/stereo information by using bitwise AND with the corresponding enum definition of the flag we’re after. Furthermore, in the case of the source envelope being stereo but destination audio being mono, I combine the amplitude data of the two channels into one, according to either average or peak extraction method (also specified by user).
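The flag idea can be sketched in Python with `enum.IntFlag`. The flag names below are hypothetical stand-ins for the enum definitions mentioned above (the original listing is not reproduced here), and `combine_to_mono` is an illustrative helper for the stereo-to-mono case.

```python
from enum import IntFlag


class EnvFlags(IntFlag):
    """Hypothetical flag layout: one bit per piece of envelope state."""
    SRC_STEREO = 1 << 0  # source envelope has two channels
    DST_STEREO = 1 << 1  # destination audio has two channels
    PEAK_MODE = 1 << 2   # combine channels by peak instead of average


def channel_count(flags: EnvFlags, stereo_flag: EnvFlags) -> int:
    """Bitwise AND with the flag of interest tells us mono (1) or stereo (2)."""
    return 2 if flags & stereo_flag else 1


def combine_to_mono(left, right, use_peak: bool):
    """Merge two channels of amplitude data by peak or by average."""
    if use_peak:
        return [max(l, r) for l, r in zip(left, right)]
    return [(l + r) / 2.0 for l, r in zip(left, right)]
```

A single value such as `EnvFlags.SRC_STEREO | EnvFlags.PEAK_MODE` then carries everything a function needs to know, instead of several separate parameters.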

The difficulty in implementing this feature wasn’t so much in how to get it done, but how to get it done more efficiently, without a massive number of parameters passing around and a whole lot of conditional statements within the main processing loop to determine how many channels each envelope has. We can see some of this at work within the main process:

I use another variable (‘stereo_src’) that extracts some information from the bitwise flags to take care of the case where the source envelope is mono but destination audio is stereo. Since the loop covers the channels of the destination audio (in interleaved format), I needed a way to restrict out of bounds indexing of the source envelope. If the source envelope is mono, ‘stereo_src’ will be 0, so the indexing of it will not exceed its limit. If both envelopes are stereo, ‘stereo_src’ will be 1, so it will effectively “follow” the same indexing as the destination audio.
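The indexing trick can be illustrated with a short Python sketch. It assumes the source envelope is stored as one list per channel and, to keep the example small, multiplies the destination samples directly by the envelope value rather than applying the full match formula; `apply_envelope` is a hypothetical name.

```python
def apply_envelope(dst, dst_channels, src_env, stereo_src):
    """Scale interleaved destination audio by the source envelope.

    stereo_src is 0 when the source envelope is mono and 1 when it is stereo,
    so ch * stereo_src never indexes past the last envelope channel.
    """
    out = []
    for i, x in enumerate(dst):
        frame, ch = divmod(i, dst_channels)
        out.append(x * src_env[ch * stereo_src][frame])
    return out
```

With a mono envelope both destination channels read from channel 0; with a stereo envelope each destination channel follows its own envelope channel, and no conditional is needed inside the loop.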

For the next, and last, musical example of this entry, we change things up a whole lot. I’m going to show the application of this plug-in to electronic dance music. This match envelope plug-in can emulate, or function as, a kind of side chain compressor, which is quite commonly found in EDM. Here is a simple kick drum pattern and a synth patch that goes on top:

By applying the match envelope plug-in to the synth pattern using the kick drum pattern above as the source envelope (window size of 100 msec and 65% match), we achieve the kind of pumping pattern in the synth so characteristic of this kind of music. The result, and mix, are as follows:

This part has really covered the preliminary features of what I’m planning to include in the Match Envelope plug-in. As I stated at the start, the next step is to transfer the code into the VST SDK (that’ll be part 3), but this also comes with its share of considerations and complications, mainly dealing with UI. How should this appear to the user? How do you neatly package it all together to make it easy and efficient to use? How should the parameters be presented so that they are intuitive?

Almost all plug-ins/hosts offer up a default UI, which is what I’ll be working with initially, but eventually a nice custom graphical GUI will be needed (part 4? 5? 42?).