Preamble

The Gaborator is a header-only library — there are no C++ files
to compile, only header files to include.
The core spectrum analysis and resynthesis code is in
gaborator/gaborator.h, and the code for rendering
images from the spectrogram coefficients is in
gaborator/render.h.

#include <gaborator/gaborator.h>
#include <gaborator/render.h>

The program takes the names of the input audio file and output spectrogram
image file as command line arguments, so we check that they are present:

Reading the Audio

The audio file is read using the libsndfile library
and stored in a std::vector<float>.
Note that although libsndfile is used in this example,
the Gaborator library itself does not depend on or
use libsndfile.

The Spectrum Analysis Parameters

Next, we need to choose some parameters for the spectrum analysis:
the frequency resolution, the frequency range, and optionally a
reference frequency.

The frequency resolution is specified as a number of frequency
bands per octave. A typical number for analyzing music signals is 48
bands per octave, or in other words, four bands per semitone
in the 12-note equal tempered scale.

The frequency range is specified by giving a minimum frequency;
this is the lowest frequency that will be included in the spectrogram
display.
For audio signals, a typical minimum frequency is 20 Hz,
the lower limit of human hearing. In the Gaborator library,
all frequencies are given in units of the sample rate rather
than in Hz, so we need to divide the 20 Hz by the sample
rate of the input audio file: 20.0 / fs.

Unlike the minimum frequency, the maximum frequency is not given
explicitly — instead, the analysis always produces coefficients
for frequencies all the way up to half the sample rate
(a.k.a. the Nyquist frequency). If you don't need the coefficients
for the highest frequencies, you can simply ignore them.

If desired, one of the frequency bands can be exactly aligned with
a reference frequency. When analyzing music signals, this is
typically 440 Hz, the standard tuning of the note A4.
Like the minimum frequency, it is given in
units of the sample rate, so we pass 440.0 / fs.

The parameters are held in an object of type
gaborator::parameters:

gaborator::parameters params(48, 20.0 / fs, 440.0 / fs);

The Spectrum Analyzer

Next, we create an object of type gaborator::analyzer;
this is the workhorse that performs the actual spectrum analysis
(and/or resynthesis, but that's for a later example).
It is a template class, parametrized by the floating point type to
use for the calculations; this is typically float.
Constructing the gaborator::analyzer involves allocating and
precalculating all the filter coefficients and other auxiliary data needed
for the analysis and resynthesis, and this takes considerable time and memory,
so when analyzing multiple pieces of audio with the same
parameters, creating a single gaborator::analyzer
and reusing it is preferable to destroying and recreating it.

gaborator::analyzer<float> analyzer(params);

The Spectrogram Coefficients

The result of the spectrum analysis will be a set of spectrogram
coefficients. To store them, we will use a gaborator::coefs
object. Like the analyzer, this is a template class parametrized
by the data type. Because the layout of the coefficients is determined by
the spectrum analyzer, it must be passed as an argument to the constructor:

gaborator::coefs<float> coefs(analyzer);

Running the Analysis

Now we are ready to do the actual spectrum analysis,
by calling the analyze method of the spectrum
analyzer object.
The first argument to analyze is a float pointer
pointing to the first element in the array of samples to analyze.
The second and third arguments are of type
int64_t and indicate the time range covered by the
array, in units of samples. Since we are passing the whole file at
once, the beginning of the range is sample number zero, and the end is
sample number mono.size(). The fourth argument is a
reference to the set of coefficients that the results of the spectrum
analysis will be stored in.

analyzer.analyze(mono.data(), 0, mono.size(), coefs);

Rendering an Image

Now there is a set of spectrogram coefficients in coefs.
To render them as an image, we will use the function
gaborator::render_p2scale().

Rendering involves two different coordinate
spaces: the time-frequency coordinates of the spectrogram
coefficients, and the x-y coordinates of the image.
The two spaces are related by an origin and a scale factor,
each with an x and y component.

The origin specifies the point in time-frequency space that
corresponds to the pixel coordinates (0, 0). Here, we will
use an origin where the x (time) component
is zero (the beginning of the signal), and the y (frequency) component
is the band number of the first (highest frequency) band:

render_p2scale() supports scaling the spectrogram in
both the time (horizontal) and frequency (vertical) dimension, but only
by power-of-two scale factors. These scale factors are specified
relative to a reference scale of one vertical pixel per frequency band
and one horizontal pixel per signal sample.

Although a horizontal scale of one pixel per signal sample is a
mathematically pleasing reference point, this reference scale is not
used in practice because it would result in a spectrogram that is much
too stretched out horizontally. A more typical scale factor might be
210 = 1024, yielding one pixel for every 1024 signal
samples, which is about one pixel per 23 milliseconds of signal at a
sample rate of 44.1 kHz.

int x_scale_exp = 10;

To ensure that the spectrogram will fit on the screen even in the
case of a long audio file, let's auto-scale it down further until
it is no more than 1000 pixels wide:

while ((n_frames >> x_scale_exp) > 1000)
x_scale_exp++;

In the vertical, the reference scale factor of one pixel per
frequency band is reasonable, so we will use it as-is. In other words,
the vertical scale factor will be 20.

int y_scale_exp = 0;

Next, we need to define the rectangular region of the image
coordinate space to render. Since we are rendering the entire
spectrogram rather than a tile, the top left corner of the
rectangle will have an origin of (0, 0).

int64_t x0 = 0;
int64_t y0 = 0;

The coordinates of the bottom right corner are determined by the
length of the signal and the number of bands, respectively, taking the
scale factors into account.
The length of the signal in samples is n_frames,
and we get the number of bands as the difference of the end points of
the range of band numbers:
analyzer.bandpass_bands_end() - analyzer.bandpass_bands_begin().
The scale factor is taken into account by right shifting by the
scale exponent.

The right shift by y_scale_exp above doesn't actually
do anything because y_scale_exp is zero, but it would be
needed if, for example, you were to change y_scale_exp to
1 to get a spectrogram scaled to half the height. You could also make a
double-height spectrogram by setting y_scale_exp to -1,
but then you also need to change the
>> y_scale_exp to
<< -y_scale_exp since you can't shift by
a negative number.

We are now ready to render the spectrogram, producing
a vector of floating-point amplitude values, one per pixel.
Although this is stored as a 1-dimensional vector of floats, its
contents should be interpreted as a 2-dimensional rectangular array of
(y1 - y0) rows of (x1 - x0) columns
each, with the row indices increasing towards lower
frequencies and column indices increasing towards later
sampling times.

Writing the Image File

To keep the code simple and to avoid additional library
dependencies, the image is stored in
pgm (Portable GreyMap) format, which is simple
enough to be generated with just a few lines of inline code.
Each amplitude value in amplitudes is converted into an 8-bit
gamma corrected pixel value by calling gaborator::float2pixel_8bit().
To control the brightness of the resulting image, each
amplitude value is multiplied by a gain; this may have to be adjusted
depending on the type of signal and the amount of headroom in the
recording, but a gain of about 15 often works well for typical music
signals.

Postamble

To make the example code a complete program,
we just need to finish main():

return 0;
}

Compiling

If you are using macOS, Linux, NetBSD, or a similar system, you can build
the example by running the following command in the examples
subdirectory.
You need to have libsndfile is installed and supported by
pkg-config.

Compiling for Speed

The above build command uses the Gaborator's built-in FFT implementation,
which is simple and portable but rather slow. Performance can be
significantly improved by using a faster FFT library. On macOS, you
can use the FFT from Apple's vDSP library by defining
GABORATOR_USE_VDSP and linking with the Accelerate
framework:

On Linux and NetBSD, you can use the PFFFT (Pretty Fast FFT) library.
You can get the latest version from
https://bitbucket.org/jpommier/pffft,
or the exact version that was used for testing from gaborator.com:

Running

Running the following shell commands will download a short example
audio file (of picking each string on an acoustic guitar), generate
a spectrogram from it as a .pgm image, and then convert
the .pgm image into a JPEG image: