Applications

The simplest application of PARSHL is as an analysis tool, since the
amplitude, frequency, and phase trajectories give a very good picture
of the evolution of a sound in time. The tracking characteristics of
the technique yield more accurate amplitudes and frequencies than an
analysis based on an equally spaced bank of filters (the traditional
STFT implementation).
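The trajectories produced by such an analysis drive an oscillator bank at resynthesis time. The following is a minimal sketch of one such oscillator, assuming per-frame amplitude and frequency trajectories for a single partial; the function name, linear interpolation between frames, and sample-by-sample phase accumulation are illustrative choices, not PARSHL's actual implementation.

```python
import math

def resynthesize(amps, freqs, frame_len, sr):
    """Resynthesize one partial from per-frame amplitude (linear)
    and frequency (Hz) trajectories.

    Amplitude and frequency are linearly interpolated between
    consecutive frames; phase is accumulated one sample at a time."""
    out = []
    phase = 0.0
    for k in range(len(amps) - 1):
        for n in range(frame_len):
            t = n / frame_len  # position within the frame, 0..1
            a = amps[k] + t * (amps[k + 1] - amps[k])
            f = freqs[k] + t * (freqs[k + 1] - freqs[k])
            phase += 2.0 * math.pi * f / sr  # per-sample phase increment
            out.append(a * math.sin(phase))
    return out
```

A full resynthesis would sum one such oscillator per trajectory; the phase trajectory from analysis can be used to set each oscillator's starting phase.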

In speech applications, the most common use of the STFT is for data
reduction. With a set of amplitude, frequency, and phase functions we
can resynthesize many sounds very accurately from much less
information than the original sampled sounds require. From our work it
is still not clear how important the phase information is for
resynthesis without modifications, but McAulay and Quatieri [12] have
shown the importance of phase in the case of speech resynthesis.

One of the most interesting musical applications of STFT techniques is
their ability to separate temporal from spectral information, and,
within each spectrum, pitch and harmonicity from formant information.
In §7, Parameter Modifications, we discussed some of these, such as
time scaling and pitch transposition. But this group of applications
holds many possibilities that still need to be carefully explored. In
the few experiments we have done to date, the tools presented give
good results in situations where less flexible implementations do not,
namely, when the input sound has inharmonic spectra and/or rapid
frequency changes.
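Because the modifications operate on trajectories rather than samples, time scaling and pitch transposition decouple cleanly. The sketch below illustrates the idea on a single trajectory; the function names and the use of simple linear interpolation are assumptions for illustration, not PARSHL's actual code.

```python
def pitch_transpose(freqs, ratio):
    # Pitch transposition: scale every frequency-trajectory value;
    # amplitude trajectories (formants aside) are left untouched.
    return [f * ratio for f in freqs]

def time_scale(traj, factor):
    # Time scaling: resample the trajectory itself (amplitude or
    # frequency) by linear interpolation; the synthesis frame rate
    # stays the same, so pitch is unchanged.
    n = max(2, round(len(traj) * factor))
    out = []
    for i in range(n):
        x = i * (len(traj) - 1) / (n - 1)
        j = int(x)
        j1 = min(j + 1, len(traj) - 1)
        t = x - j
        out.append(traj[j] + t * (traj[j1] - traj[j]))
    return out
```

Transposing without time scaling, or stretching without transposing, is then just a matter of which trajectory the transformation is applied to.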

The main characteristic that differentiates this model from the
traditional ones is the selectivity of spectral information and the
phase tracking. This opens up new applications that are worth our
attention. One of them is the use of additive synthesis in
conjunction with other synthesis techniques. Since the program allows
tracking of specific spectral components of a sound, we have the
flexibility of synthesizing only part of a sound with additive
synthesis, leaving the rest for some other technique. For example,
Serra [22] has used this program in conjunction with LPC
techniques to model bar percussion instruments, and Marks and Polito
[11] have modeled piano tones by using it in
conjunction with FM synthesis. David Jaffe has had good success with
birdsong, and Rachel Boughton used PARSHL to create abstractions of
ocean sounds.

One of the problems encountered when using several techniques to
synthesize the same sound is the difficulty of creating the perceptual
fusion of the two synthesis components. By using phase information we
have the possibility of matching the phases of the additive synthesis
part to the rest of the sound (independently of what technique was
used to generate it). This provides improved signal ``splicing''
capability, allowing very fast cross-fades (e.g., over one frame
period).
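A fast splice of this kind can be sketched as a linear cross-fade over the overlap region, assuming the synthesis phases have already been matched to the existing signal at the splice point; the function name and fade shape are illustrative.

```python
def crossfade_splice(a, b, fade_len):
    """Splice signal a into signal b with a linear cross-fade of
    fade_len samples (e.g., one frame period).

    Assumes the phases of b were matched to a at the splice point,
    so the overlapping region sums coherently."""
    assert len(a) >= fade_len and len(b) >= fade_len
    out = list(a[:-fade_len])
    for n in range(fade_len):
        g = n / fade_len  # fade-in gain for b, fade-out (1 - g) for a
        out.append((1.0 - g) * a[len(a) - fade_len + n] + g * b[n])
    out.extend(b[fade_len:])
    return out
```

Without phase matching, the same cross-fade would have to be much longer to hide phase cancellation in the overlap.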

PARSHL was originally written to properly analyze the steady state of
piano sounds; it did not address modeling the attack of the piano
sound for purposes of resynthesis. The phase tracking was primarily
motivated by the idea of splicing the real attack (sampled waveform)
to its synthesized steady state. It is well known that additive
synthesis techniques have a very hard time synthesizing attacks, both
due to their fast transition and their ``noisy'' characteristics.
The problem is made more difficult by the fact that we are very
sensitive to the quality of a sound's attack. For plucked or struck
strings, if we are able to splice two or three periods, or a few
milliseconds, of the original sound into our synthesized version, the
quality can improve considerably, while retaining a large
data-reduction factor and the possibility of manipulating the
synthesized part. When
this is attempted without the phase information, the splice, even if
we do a smooth cross-fade over a number of samples, can be very
noticeable. By simply adding the phase data the task becomes
comparatively easy, and the splice is much closer to inaudible.