Ph.D. Dissertation

December 2, 1998

Scott Levine's doctoral thesis is presented here in
PostScript and Adobe Acrobat (.pdf) formats. The PostScript version has been
compressed using gzip. The PostScript files can be viewed with, for example,
Ghostscript. The Adobe Acrobat viewer is available online.

Abstract

In the world of digital audio processing, one usually has
the choice of either performing modifications on the raw audio
signal or data-compressing the audio signal. Performing
modifications on a data-compressed audio signal, however,
has proved difficult in the past. This thesis provides a
new representation of audio signals that allows for both
very low bit rate audio data compression and high-quality
compressed-domain processing and modifications. In this
context, the processing possibilities are time-scale and
pitch-scale modifications.

This new audio representation segments the audio into
separate sinusoidal, transient, and noise signals. During
regions identified as attack transients, the audio is modeled
by well-established transform coding techniques. During the
remaining non-transient regions of the input, the audio is
modeled by a mixture of multiresolution sinusoidal modeling
and noise modeling. Careful phase-locking techniques at the
time boundaries between the sines and transients allow for
seamless transitions between representations. By separating
the audio into three individual representations, each can be
efficiently and perceptually quantized.
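
The segmentation idea above can be illustrated with a minimal sketch. It is not the detector used in the thesis: it assumes a simple frame-energy jump rule, and the function names, frame length, threshold, and test signal are all hypothetical choices made for illustration. Frames whose energy rises sharply relative to the previous frame are labeled as transients (which the thesis would hand to a transform coder), and all other frames are labeled for sines+noise modeling.

```python
import math

def frame_energies(signal, frame_len):
    """Return the energy of each non-overlapping frame of `signal`."""
    return [
        sum(x * x for x in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

def label_frames(signal, frame_len=64, ratio_threshold=8.0, floor=1e-9):
    """Label each frame 'transient' or 'sines+noise'.

    Assumption: a frame counts as a transient when its energy exceeds
    `ratio_threshold` times the previous frame's energy. This toy rule
    stands in for the thesis's transient detector, which is not shown here.
    """
    labels = []
    prev = None
    for energy in frame_energies(signal, frame_len):
        if prev is not None and energy > ratio_threshold * max(prev, floor):
            labels.append("transient")
        else:
            labels.append("sines+noise")
        prev = energy
    return labels

# Hypothetical test signal: a quiet 440 Hz tone with a loud burst at sample 256.
sample_rate = 8000
signal = [0.1 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(512)]
for n in range(256, 272):
    signal[n] += 1.0

labels = label_frames(signal)
print(labels)
```

Only the frame containing the burst onset is flagged; the frame after it is not, because its energy falls rather than rises. A real coder would also need to decide how long each transient region lasts, which this sketch leaves out.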

Sound Examples

The following sound examples (*.wav) are referred to in
Appendix A of the thesis. In this set of sound examples,
the sines+transients+noise compression scheme at 32 kbps/ch
described in the thesis is compared to MPEG-AAC, also at 32
kbps/ch. The MPEG-AAC examples were encoded in October 1998
using source code from FhG.
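
To put 32 kbps/ch in perspective, the short calculation below compares it with uncompressed PCM. The 44.1 kHz sample rate and 16-bit sample size are assumptions (CD-quality audio); this page does not state the format of the original recordings.

```python
# Assumed uncompressed format (not stated on this page): CD-quality PCM.
sample_rate_hz = 44_100
bits_per_sample = 16

pcm_kbps_per_channel = sample_rate_hz * bits_per_sample / 1000  # 705.6 kbps/ch
coded_kbps_per_channel = 32                                     # rate quoted above

compression_ratio = pcm_kbps_per_channel / coded_kbps_per_channel
print(f"{pcm_kbps_per_channel} kbps/ch -> {coded_kbps_per_channel} kbps/ch "
      f"(about {compression_ratio:.2f}:1)")
```

Under these assumptions, both coders in the comparison operate at roughly a 22:1 reduction relative to the uncompressed signal.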

The next several examples show the ability to perform pitch
and time-scale modifications in the compressed domain. That
is, while the audio is being decoded, it is also being time
and/or pitch scaled. There is no need for external
post-processing modification algorithms. All the following
examples use the pop example, It Takes Two. The first two
examples show the sound quality difference in slowing down
music using the quantized sines+transients+noise system in
this thesis versus the quality from commercially available
software, Cool Edit.
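
Why a sinusoidal representation lends itself to time scaling during decoding can be shown with a minimal sketch. It is not the thesis's decoder: it assumes a single sinusoidal track described by per-frame (frequency, amplitude) pairs, omits the transient and noise components entirely, and uses hypothetical function and parameter names. Slowing down amounts to synthesizing more samples between consecutive parameter frames while leaving the frequencies untouched, so the pitch does not change.

```python
import math

def synthesize(frames, hop, sample_rate, time_scale=1.0):
    """Synthesize one sinusoidal track from per-frame (freq_hz, amp) pairs.

    `time_scale` > 1 slows playback by stretching the synthesis hop.
    Frequencies are not altered, so pitch is preserved.
    """
    out_hop = int(round(hop * time_scale))
    out = []
    phase = 0.0
    for (f0, a0), (f1, a1) in zip(frames, frames[1:]):
        for n in range(out_hop):
            t = n / out_hop
            freq = f0 + (f1 - f0) * t   # linear frequency interpolation
            amp = a0 + (a1 - a0) * t    # linear amplitude interpolation
            out.append(amp * math.sin(phase))
            phase += 2 * math.pi * freq / sample_rate
    return out

def zero_crossings(samples):
    """Count sign changes, a rough proxy for pitch times duration."""
    return sum((a < 0) != (b < 0) for a, b in zip(samples, samples[1:]))

# Hypothetical input: 10 segments of a steady 440 Hz partial.
frames = [(440.0, 0.5)] * 11
normal = synthesize(frames, hop=100, sample_rate=8000)
slowed = synthesize(frames, hop=100, sample_rate=8000, time_scale=2.0)

print(len(normal), len(slowed))
print(zero_crossings(normal), zero_crossings(slowed))
```

The slowed output is twice as long and has about twice as many zero crossings, meaning the same number of crossings per second and therefore the same pitch. Pitch scaling would instead multiply the frame frequencies by a constant while keeping the hop fixed.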