Spectralextractor uses frequency variation to discriminate between pitch and noise.
The frequency of each bin is tracked and used to create a signal measuring the rate at
which the bin is changing frequency; the higher this rate of frequency change, the more
the bin is associated with noise rather than pitch components. That is, the bin’s frequency
instability becomes a correlate for noise. A frequency change threshold is set to extract
pitch or noise: when the rate of change falls below the threshold the bin is identified
with pitch components; when it rises above, with noise. A response time control (lowpass
filter) slows the rate at which the signal is allowed to cross the threshold, thereby
preventing gurgle noise. A lowpass filter (threshold accumulator) is also applied to the
tracked bin frequency to smooth out artifacts from the process. While spectralextractor
functions differently from spectwarper, its results are very similar. I prefer spectwarper,
which has less grit to it, although both are interesting. Spectralextractor is newer, a
work in progress, and requires more refinement: its control parameters are not yet general
enough, and their effect changes when the FFT size is modified (beware).
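
The pitch/noise decision described above can be sketched roughly as follows.
This is only an illustration, not the PVCX implementation: the function name,
the units (Hz per second), and the use of a one-pole lowpass for the response
time are all assumptions made for the example.

```python
import math

def classify_bins(freq_tracks, frame_rate, threshold, response_time):
    """Label every bin of every frame as 'pitch' or 'noise'.

    freq_tracks   -- list of frames, each a list of per-bin frequencies (Hz)
    frame_rate    -- analysis frames per second
    threshold     -- frequency change threshold (assumed unit: Hz per second)
    response_time -- smoothing time in seconds (assumed one-pole lowpass)
    """
    # Lowpass coefficient: larger response times give slower threshold crossings.
    if response_time > 0:
        coeff = math.exp(-1.0 / (response_time * frame_rate))
    else:
        coeff = 0.0
    num_bins = len(freq_tracks[0])
    smoothed = [0.0] * num_bins
    previous = freq_tracks[0]
    labels = []
    for frame in freq_tracks:
        row = []
        for k in range(num_bins):
            # Rate at which the bin is changing frequency.
            rate = abs(frame[k] - previous[k]) * frame_rate
            smoothed[k] = coeff * smoothed[k] + (1.0 - coeff) * rate
            row.append("pitch" if smoothed[k] < threshold else "noise")
        labels.append(row)
        previous = frame
    return labels

# Bin 0 holds steady at 440 Hz; bin 1 jumps between 1000 and 1100 Hz.
frames = [[440.0, 1000.0 if i % 2 == 0 else 1100.0] for i in range(10)]
labels = classify_bins(frames, frame_rate=200.0, threshold=1000.0,
                       response_time=0.01)
```

In this toy input the steady bin stays on the pitch side of the threshold,
while the unstable bin crosses to the noise side once the smoothed rate of
change has built up.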

Amplitude Change Response Time in Seconds

Amplitude Reports Print Mode

Two flags are provided for controlling the output amplitude statistics;
one turns the statistics on or off, and the other sets how often they will
be reported. The statistics provide the peak output level in amplitude and
decibels. With integer format output files, output values exceeding the
normalized peak amplitude of 1.0 (0 dB) are clipped to a value of 1.0, and
the statistics are placed in clip mode; in clip mode, reports are made only
for frames where clipping occurs. The peak amplitude, its time, and the number
of clipped samples are reported at the end of processing. With floating-point
format output files, output values exceeding the normalized peak amplitude
of 1.0 are not clipped, since they will be rescaled in the second pass; output
statistics proceed normally throughout. The levels before and after rescaling
are reported at the end of processing.

0 turns amplitude reports off, 1 turns them on.

Analysis Frames per Second

This controls how often the phase vocoder will perform an analysis on the
signal. It is a translation of the classic decimation control that specifies
how many samples to skip between analysis frames. More frames increase
the time resolution but decrease processing speed. 200 frames per second
is a good reference point. If you expand time you should increase this
proportionately to maintain about 200 or more frames per second.
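
As a rough illustration of how frames per second relates to the classic
decimation value, the sketch below converts one into the other. The function
names are hypothetical, and scaling the frame rate by the expansion factor is
one reading of "increase this proportionately", not a rule taken from PVCX.

```python
def decimation_in_samples(sample_rate, frames_per_second):
    """Samples skipped between analysis frames for a given frame rate."""
    return max(1, round(sample_rate / frames_per_second))

def suggested_frame_rate(time_expansion, reference_rate=200.0):
    """Scale the 200 frames/second reference point when time is expanded."""
    return reference_rate * max(1.0, time_expansion)

hop = decimation_in_samples(48000, 200)   # 48 kHz file at 200 frames/second
rate_for_4x = suggested_frame_rate(4.0)   # analysis rate for a 4x expansion
```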

Begin Time in Seconds

The time, in seconds, at which to begin processing the soundfile.

End Time in Seconds

The time, in seconds, at which to stop processing the soundfile. 0 or less
is equivalent to the duration of the soundfile.

Low/High Shelf Equalization

Equalization has been provided at various points in routines to allow for
the needed adjustment of spectra. The EQ consists of low and high shelf segments,
whose width is adjusted through control of the shelf breakpoint frequency.
The region between the shelf segments is represented by a linear decibel
gradient between the decibel levels of the two shelves. Some routines implement
the EQ before pitch changes, others after. EQ placed before pitch changes
(pre-transpose/shift) will be transposed along with the pitch changes,
whereas EQ placed afterwards (post-transpose/shift) will stay fixed
as shifts and transpositions occur.

Low Shelf Gain

Determines how the amplitude of sounds below the low shelf frequency will
be affected.

High Shelf Gain

Determines how the amplitude of sounds above the high shelf frequency will
be affected.

Low Shelf Frequency

Determines the frequency below which the low shelf gain will be used.

High Shelf Frequency

Determines the frequency above which the high shelf gain will be used.
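
The shelf equalization described above can be pictured with a small gain
function. This is a sketch under assumptions: the function and argument names
are invented for the example, and the gradient between the shelves is taken to
be linear in Hz, which the text does not state (it could equally be linear in
some other frequency scale).

```python
def shelf_eq_gain_db(freq, low_freq, high_freq, low_gain_db, high_gain_db):
    """Gain in dB applied at frequency freq (Hz) by a low/high shelf EQ."""
    if freq <= low_freq:
        return low_gain_db       # inside the low shelf
    if freq >= high_freq:
        return high_gain_db      # inside the high shelf
    # Between the shelves: straight-line interpolation of the dB levels.
    position = (freq - low_freq) / (high_freq - low_freq)
    return low_gain_db + position * (high_gain_db - low_gain_db)

# Cut 6 dB below 200 Hz, boost 6 dB above 2000 Hz.
midpoint_gain = shelf_eq_gain_db(1100.0, 200.0, 2000.0, -6.0, 6.0)
```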

FFT Length

The FFT size must be a power of 2. Larger FFT sizes resolve frequencies
better but transient behavior more poorly. Choose your FFT size according
to the sound you are working with. A size of 1024 or 2048 works well in
most cases.

Frequency Change Threshold

Frequency Shift Factor

With the frequency shift control, a constant or function value is added
to all the bin frequencies to produce a nonlinear pitch domain translation
of the spectrum. Frequency shift is related to things like ring modulation
and their similarly nonlinear shifts of pitch characteristics. Use this
to create small distortions of the harmonic integrity of a sound.
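
A short sketch shows why adding a constant distorts harmonic integrity. The
function name is hypothetical and the three frequencies stand in for bin
frequencies; after the shift the partials are no longer whole-number multiples
of the lowest one.

```python
def shift_frequencies(bin_freqs, shift_hz):
    """Add the same offset in Hz to every bin frequency."""
    return [f + shift_hz for f in bin_freqs]

harmonics = [100.0, 200.0, 300.0]          # harmonic series on 100 Hz
shifted = shift_frequencies(harmonics, 30.0)
ratios = [f / shifted[0] for f in shifted]  # no longer 1, 2, 3
```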

Gain in Decibels

Gain can be applied to the output and other components. 0 dB represents
unity gain, no change. A change of +/- 6 dB represents approximately a
doubling or halving of the amplitude. Increments of 10 dB are loosely
associated with one change in dynamic level.
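
The decibel figures above follow from the standard amplitude conversion,
sketched here with a hypothetical helper name.

```python
def db_to_amplitude(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)

unity = db_to_amplitude(0.0)     # no change
louder = db_to_amplitude(6.0)    # close to, but not exactly, 2
softer = db_to_amplitude(-6.0)   # close to, but not exactly, 0.5
```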

Oscillator Resynthesis Threshold in Decibels

The phase vocoder resynthesizes the signal using one of two methods, depending
on the type of changes made to the FFT. If the changes are only to the magnitudes
(amplitudes), then the faster overlap/add method is used. If however changes
in frequency are made, then the FFT integrity is compromised, necessitating
use of the oscillator bank method in which each bin is synthesized as a
sine wave changing in frequency and amplitude. This method is slower, although
a resynthesis threshold is available that can be used to increase the computation
speed by turning off bins whose amplitude falls below the threshold. A threshold
of -60 dB is appropriate, although safety warrants using a lower threshold
if the spectrum is thin and its decays are exposed; use your ear.
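
The effect of the resynthesis threshold can be sketched as a filter on which
bins get an oscillator. The names are hypothetical, and the example assumes
amplitudes normalized so that 1.0 corresponds to 0 dB; how PVCX actually
references the threshold is not stated here.

```python
def active_bins(amplitudes, threshold_db):
    """Indices of bins loud enough to be resynthesized by an oscillator."""
    floor = 10.0 ** (threshold_db / 20.0)   # dB threshold as linear amplitude
    return [k for k, amp in enumerate(amplitudes) if amp >= floor]

# With a -60 dB threshold the middle bin is skipped, saving one oscillator.
kept = active_bins([0.5, 0.0005, 0.002], -60.0)
```

Lowering the threshold (for example to -80 dB) keeps more of the quiet bins,
at the cost of more oscillators to compute.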

Output Format

The output sound file is written as a NeXT/Sun format sound file in either
16-bit short or 32-bit floating point format, of one or more channels. The
channels are processed one at a time beginning with the first channel. The
first pass writes zeros in the channels yet to be processed, replacing them
when processing proceeds to those channels.

0 tells PVCX to use the format of the input file, 1 equals integer format,
and 2 equals rescaled floats.

Peak Rescale Level

Selecting the floating-point output file format invokes an amplitude
rescaling feature. Once processing is complete, a second pass through the
sound file is made to rescale the values to the decibel level specified.
A dB rescale level of 1 causes rescaling to the level of the original input
file.
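
The second pass amounts to a single scale factor applied to every sample, as
in this minimal sketch. The function name is hypothetical, and the special
rescale level of 1 (match the original input file) is left out because it
needs the input file's peak, which the example does not have.

```python
def rescale_to_peak(samples, target_db):
    """Scale samples so their peak magnitude sits at target_db (0 dB = 1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)             # silence: nothing to rescale
    scale = 10.0 ** (target_db / 20.0) / peak
    return [s * scale for s in samples]

# Floating-point output may exceed 1.0 before the second pass.
rescaled = rescale_to_peak([0.5, -2.0, 1.0], 0.0)
```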

Pitch Transposition in Semitones

With the pitch transposition control, a constant or function value is multiplied
against all bin frequencies. This is classic transposition, here specified
in semitones of transposition (12 semitones equals an octave). Conversion
is made to produce the appropriate frequency multiplier.
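
The conversion from semitones to a frequency multiplier is the usual
equal-tempered one, sketched below with hypothetical function names. Because
every bin is multiplied by the same ratio, harmonic relationships survive.

```python
def semitones_to_ratio(semitones):
    """Frequency multiplier for a transposition given in semitones."""
    return 2.0 ** (semitones / 12.0)

def transpose_frequencies(bin_freqs, semitones):
    """Multiply every bin frequency by the transposition ratio."""
    ratio = semitones_to_ratio(semitones)
    return [f * ratio for f in bin_freqs]

octave_up = transpose_frequencies([100.0, 200.0, 300.0], 12.0)
fifth_ratio = semitones_to_ratio(7.0)
```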

Resynthesis Channel

All routines allow both monophonic and multi-channel input files to be
processed. With multi-channelled files, you can either select one channel
and produce a monophonic output file, or process all the channels. Channels
are numbered beginning with 1. Processing of multi-channelled files is done
one channel at a time beginning with channel 1, with zeros written to channels
which have yet to be processed. Processing one channel at a time requires
less memory and allows you to audition the output sooner than if you did
all channels at once.

Use 0 to process all channels.

Spectrum Warpshape Index

Many of the routines employ the principle of warping in which a distribution
of values is transformed by an identity function. In these places an exponential
function is employed to remap a 0-1 range of values into a new orientation
that preserves the minima (0) and maxima (1) while bringing the distribution
closer to either extreme as a result of the curvature of the exponential
function selected. The curvature of the exponential function is selected
through a warp index. Specifically, warp index w will reorient the input
x through the function below (^ = exponentiation).

y = (1. - (e^(x * w))) / (1. - (e^w))

In this function, the warp index of 0 produces a linear function and an
untransformed output. Positive warp index values of increasing magnitude
produce curves of increasing concavity (increasing slope) that draw values
towards the 0-valued minima, and reduce the function integral. Negative
values do the opposite, drawing values towards the maxima of 1, increasing
the integral.
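
The warp function can be tried out directly. The sketch below is a plain
transcription of the formula; the function name and the small-w cutoff are
choices made for the example, the cutoff being needed because the formula
becomes 0/0 at exactly w = 0, where its limit is the linear function y = x.

```python
import math

def warp(x, w):
    """Remap x in the range 0-1 through the exponential warp with index w."""
    if abs(w) < 1e-9:
        return x                 # limiting case: linear, output unchanged
    return (1.0 - math.exp(x * w)) / (1.0 - math.exp(w))

toward_zero = warp(0.5, 2.0)     # positive index pulls 0.5 down towards 0
toward_one = warp(0.5, -2.0)     # negative index pushes 0.5 up towards 1
```

For any index the end points are preserved: warp(0, w) is 0 and warp(1, w)
is 1.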

The practical use of this mechanism is found in various places. One such
place is the reshaping of the frequency response distribution characteristics.
In this, positive warp indices cause the peaks of the response to be accentuated
while the weaker frequencies are expanded out (i.e. pushed towards 0). Negative
values have the opposite effect as they compress the dynamic range of the
response and raise the relative level of the weaker noise components. Another
place where warp applies is in the remapping of FFT amplitudes through the
spectrum warpshape. In this, the successive FFT frames have their amplitudes
remapped by the identity function, similarly expanding or compressing the
dynamic range depending upon the warp specified; 0 (linear warp function)
leaves the amplitudes unchanged.

Threshold Accumulator Response Time in Seconds

Time Expansion/Contraction Factor

Once the spectral modifications are made to the FFT analysis, an inverse
FFT is invoked to produce the samples of a time-domain signal. The classic
phase vocoder paradigm controls the number of samples through the interpolation
value and its relation to the decimation. The arcane relationship of decimation
and interpolation is here translated into the parameter of time expansion/contraction,
allowing for the direct scaling of time. Use values greater than 1 to expand
time, values less than 1 to contract it.
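
In the classic phase vocoder the time scaling equals the ratio of
interpolation to decimation, so the translation can be sketched as below. The
function name is hypothetical, and rounding to a whole number of samples is an
assumption of the example rather than a description of PVCX.

```python
def interpolation_in_samples(decimation, time_factor):
    """Synthesis hop that stretches time by time_factor for a given analysis hop."""
    return max(1, round(decimation * time_factor))

expanded = interpolation_in_samples(220, 2.0)     # twice as long
contracted = interpolation_in_samples(220, 0.5)   # half as long
```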

Time Interval Between Reports

Window Size in Samples

The window size is a less opaque parameter; like the FFT, it must be a
power of 2. Windows twice the size of the FFT work well. Larger window sizes
may resolve frequencies better. Specifying 0 for the window size will automatically
set the window to twice the FFT size.
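
The window size rules above can be condensed into a few lines. The function
names are hypothetical, and raising an error on a bad size is a choice made
for the example; it does not describe how PVCX reports invalid sizes.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... and False for everything else."""
    return n > 0 and (n & (n - 1)) == 0

def resolve_window_size(fft_size, window_size=0):
    """Return the window size to use; 0 selects twice the FFT size."""
    if not is_power_of_two(fft_size):
        raise ValueError("FFT size must be a power of 2")
    if window_size == 0:
        return 2 * fft_size
    if not is_power_of_two(window_size):
        raise ValueError("window size must be a power of 2")
    return window_size

default_window = resolve_window_size(1024)        # twice the FFT size
explicit_window = resolve_window_size(1024, 4096)
```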

Window Type

The FFT and inverse FFT are computed using a window. Like the FFT size,
the shape of the window used can affect the quality of the analysis and
resynthesis. (See F. R. Moore, Steiglitz, or Roads for further explanation.)
A variety of windows are available including: Hamming, Rectangular, Blackman,
Triangular, and Kaiser (in 8 different forms as related to 8 different alpha
values). Blackman (-w2) or Kaiser (-w8) are recommended for most applications.
In some unusual cases where transient behavior is being lost, consider using
other windows such as the Rectangular, although take care to ensure that
it is not producing pops or a buzzy sound.