Example 3: Streaming

Introduction

This example shows how to process streaming audio a block at a time,
rather than operating on a complete recording at once as in the previous
examples.

This program doesn't do anything particularly useful: it simply
inverts the phase of the signal, and does so not by the obvious method of
changing the sign of each sample, but by changing the sign of each
spectrogram coefficient. Consider the expression coef =
-coef a placeholder for your own streaming filter or effect
code.

The next couple of lines work around a design flaw in
libsndfile. By default, when reading a 16-bit
audio file as floating point data and then writing the data
back as another 16-bit audio file, libsndfile will use slightly
different scale factors on input and output, and the output will
not be bit-identical to the input. To make it easier to verify
that this example actually yields correct results to within
the full 16-bit precision, we select a non-normalized floating
point representation, which does not suffer from this flaw.

Spectrum Analysis Parameters

As in Example 1, the parameters are chosen for analyzing music, but
to reduce the latency, the number of frequency bands per octave is reduced
from 48 to 12 (one per semitone), and the lower frequency limit of
the bandpass filter bank is raised from 20 Hz to 200 Hz.

Calculating Latency

The spectrogram coefficients are calculated by applying symmetric
FIR filters to the audio signal. This means a spectrogram coefficient
for any given point in time t is a weighted average of samples
from both before and after t, representing both past and future
signal. The width of the filter impulse response depends on the
bandwidth, which in turn depends on the center frequency of its band.
The lowest-frequency filters have the narrowest bandwidths, and
therefore the widest impulse responses, and need the greatest amount
of past and future signal. The width of the filter impulse response
is called its support, and the worst-case (widest) support of
any analysis filter can be found by calling the function
gaborator::analyzer::analysis_support(). This returns
the one-sided support, the width of the impulse
response to each side of its center, as a floating point number.
To be on the safe side, let's round this up to the next integer:

size_t analysis_support = ceil(analyzer.analysis_support());

Similarly, when resynthesizing audio from coefficients, calculating
a sample at time t involves applying symmetric FIR
reconstruction filters, calculating a weighted average of both past and
future spectrogram coefficients. The support of the widest reconstruction
filter can be calculated by calling
gaborator::analyzer::synthesis_support():

size_t synthesis_support = ceil(analyzer.synthesis_support());

In a real-time application, the need to access future signal
samples and/or coefficients causes latency. A real-time audio
analysis application that needs to examine the coefficients for
time t can only do so when it has received the input samples up
to time t + analysis_support, and therefore has a minimum latency of
analysis_support. A real-time filtering or effect
application, such as the present example,
incurs latency from both analysis and reconstruction
filters, and can only produce the output sample for time t once
it has received the input samples up to
t + analysis_support + synthesis_support,
for a minimum latency of analysis_support + synthesis_support.
Let's print this total latency to standard output:

In a practical real-time system, there will be additional latency
caused by processing the signal in blocks of samples rather than a
sample at a time. Since the block size is a property of the overall
system, and causes latency even if the Gaborator is not involved, that
latency is considered outside the scope of this discussion.

Streaming

To mimic a typical real-time system, the audio is processed
in fixed-size blocks (here, 1024 samples). If the size
of the input file is not divisible by the block size, the last block
is padded with zeroes.
The variable t_in keeps track of time, indicating
the sampling time of the first sample of the current input block,
in units of samples.

Now we can spectrum analyze the current block of samples. Note how
the time range,
t_in...t_in + blocksize,
is explicitly passed to analyze().

analyzer.analyze(buf.data(), t_in, t_in + blocksize, coefs);

The call to analyze() updates the coefficients
for the time range from t_in - analysis_support to
t_in + blocksize + analysis_support. The oldest
blocksize samples of this time range,
that is, from t_in - analysis_support to
t_in - analysis_support + blocksize, have now been updated for
the last time and will not be affected by future input blocks.
Therefore, it is now safe to examine and/or modify these
coefficients as required by your application. Here, by way
of example, we simply change their signs to invert the phase of the signal.
Note that unlike the earlier filter example where apply()
applied a function to all the coefficients, here it is applied only to
the coefficients within the time range specified by the last two arguments.

Next, we will generate a block of output samples. To get correct results,
we can only generate output when the coefficients that the output samples
depend on will no longer change. Specifically, a resynthesized audio
sample for time t will depend on the coefficients of the
time range t - synthesis_support...t +
synthesis_support. To ensure that the resynthesis uses only
coefficients that have already been processed by
the apply() call above, the most recent block of samples
that can safely be resynthesized ranges from t_out = t_in -
analysis_support - synthesis_support to t_out +
blocksize.