Why The Fourier Transform
• Natural for visualizing audio signals: the ear performs a kind of Fourier analysis
• Spectral models can be very compact and flexible:
  – MPEG audio coding
  – Sinusoidal modeling (“additive synthesis”)
    ∗ AES talk on the history of spectral modeling at CCRMA and elsewhere
• Any linear time-invariant (LTI) system can be implemented in the frequency domain by means of the Fourier transform
• Efficient FFT implementations exist that make it possible to implement very large LTI systems in real time, e.g., room impulse-response convolutions of length 10,000 to 100,000 samples
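To make the LTI bullet concrete, here is a minimal Python sketch (standard library only) showing that convolving a signal with an impulse response gives the same result whether it is computed directly or by multiplying FFT spectra. The recursive radix-2 FFT and the helper names (`fft`, `ifft`, `fft_convolve`, `direct_convolve`) are illustrative assumptions, not a production implementation. Zero-padding to at least `len(x) + len(h) - 1` points keeps the FFT's circular convolution from wrapping around.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT (illustrative); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def ifft(X):
    """Inverse FFT using the identity ifft(X) = conj(fft(conj(X))) / N."""
    n = len(X)
    y = fft([v.conjugate() for v in X])
    return [v.conjugate() / n for v in y]


def fft_convolve(x, h):
    """Linear convolution computed in the frequency domain."""
    n_out = len(x) + len(h) - 1
    n_fft = 1
    while n_fft < n_out:  # zero-pad so circular convolution equals linear convolution
        n_fft *= 2
    X = fft(list(x) + [0.0] * (n_fft - len(x)))
    H = fft(list(h) + [0.0] * (n_fft - len(h)))
    y = ifft([a * b for a, b in zip(X, H)])  # pointwise product of the spectra
    return [v.real for v in y[:n_out]]


def direct_convolve(x, h):
    """Reference time-domain convolution, O(len(x) * len(h))."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.5, -0.25, 0.125]  # hypothetical short impulse response
fast = fft_convolve(x, h)
slow = direct_convolve(x, h)
print(max(abs(a - b) for a, b in zip(fast, slow)))  # round-off level only
```

For very long impulse responses, real-time systems typically split the input into blocks (e.g., overlap-add) so that the FFT cost is spread over time.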

Introduction to Audio Spectrum Analysis
Spectrum analysis of real-world signals typically occurs in short segments. We are therefore most interested in short-time spectrum analysis:
• Spectral content typically varies over time.
• The human ear uses less than one second of past sound to form a spectrum.
• There is a limit to the length of signal we can analyze at once.

To extract and analyze a sound segment, it is necessary to apply a window function. An unmodified segment extraction corresponds to a “rectangular window”. Everything we ‘look at’ will be through a ‘window’, hence it is important to realize what the window is doing to our underlying signal.

Applications we’ll discuss first:
• Spectral Analysis for Display
• FIR Filter Design
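The segment extraction described above can be sketched as follows. In this minimal Python sketch, the frame length `M`, hop size `hop`, test signal, and the function name `frames` are illustrative assumptions. Multiplying by an all-ones (rectangular) window leaves each extracted segment unmodified, which is exactly the sense in which plain segment extraction is a rectangular window.

```python
import math


def frames(x, M, hop, window):
    """Return windowed segments of length M taken every `hop` samples."""
    if len(window) != M:
        raise ValueError("window must have length M")
    out = []
    for start in range(0, len(x) - M + 1, hop):
        segment = x[start:start + M]
        out.append([s * w for s, w in zip(segment, window)])  # pointwise windowing
    return out


x = [math.sin(0.1 * n) for n in range(1000)]  # hypothetical test signal
rect = [1.0] * 256                             # rectangular window
segs = frames(x, 256, 128, rect)
print(len(segs))               # 6
print(segs[1] == x[128:384])   # True: the rectangular window changes nothing
```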

Example of Windowing
Let’s look at a simple example of windowing to demonstrate what happens when we turn an infinite-duration signal into a finite-duration signal through windowing.

Complex sinusoid:
$$x(n) = e^{j\omega nT}, \qquad 0 \le \omega T < \pi$$
Notes:
• The real part is $\cos(\omega nT)$.
• Only positive frequencies are present in our signal. A fancy name for this is an ‘analytic signal’.

This signal has infinite duration. (It doesn’t die out as $n$ increases.) In order to end up with a signal that dies out eventually (so we can use the DFT), we need to multiply our signal by a window (which does die out).


The following is a diagram of a typical window function:
[Figure: Zero-Phase Window. Amplitude (0 to 1) versus time (samples, −1000 to 1000).]

This is loosely called a “zero-centered” (or “zero phase”, or “even”) window function, which means its phase in the frequency domain is either zero or π, as we will see in detail later. (Recall that a real and even function has a real and even Fourier transform.) The window is also nonnegative, as is typical.
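A quick numerical check of the “zero or π” phase claim, as a minimal Python sketch: it evaluates the DTFT of a rectangular window centered on $n = 0$ (an assumed example; any real, even window would do) directly from the definition. The imaginary part vanishes to round-off, and on the first sidelobe the real value is negative, i.e., the phase there is π. The helper `dtft` is hypothetical and written for clarity, not speed.

```python
import cmath
import math


def dtft(x, n0, omega):
    """DTFT of x at omega (rad/sample), where sample x[k] sits at time n = n0 + k."""
    return sum(xk * cmath.exp(-1j * omega * (n0 + k)) for k, xk in enumerate(x))


M = 21                  # odd length, so the window can be centered on n = 0
half = (M - 1) // 2
w = [1.0] * M           # rectangular window on n = -half, ..., half (an even function)

grid = [math.pi * i / 200 for i in range(201)]      # 0 to pi rad/sample
max_imag = max(abs(dtft(w, -half, om).imag) for om in grid)

omega_side = 3 * math.pi / M                        # near the first sidelobe peak
W_side = dtft(w, -half, omega_side)
closed_form = math.sin(M * omega_side / 2) / math.sin(omega_side / 2)

print(max_imag)                  # round-off only: the transform is real
print(W_side.real, closed_form)  # both negative: phase pi on the first sidelobe
```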


We might also require that our window be zero for negative time. Such a window is said to be ‘causal’. Causal windows are necessary for real-time processing:
[Figure: Linear Phase Window (Causal). Amplitude (0 to 1) versus time (samples, −1000 to 1000).]

By shifting the original window in time by half its length, we have turned the original non-causal window into a causal window. The Shift property of the Fourier Transform tells us that we have introduced a linear phase term.
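The shift claim can be checked the same way. In this minimal Python sketch (the triangular window shape and the `dtft` helper are illustrative assumptions), the same samples are placed once on $n = -(M-1)/2, \ldots, (M-1)/2$ and once on $n = 0, \ldots, M-1$. The causal transform should equal the zero-phase transform times the linear-phase factor $e^{-j\omega (M-1)/2}$ predicted by the shift property.

```python
import cmath


def dtft(x, n0, omega):
    """DTFT of x at omega (rad/sample), where sample x[k] sits at time n = n0 + k."""
    return sum(xk * cmath.exp(-1j * omega * (n0 + k)) for k, xk in enumerate(x))


M = 21
half = (M - 1) // 2
# Triangular window 1 - |n|/(half + 1): real, even, and nonnegative (illustrative choice)
w = [1.0 - abs(n) / (half + 1) for n in range(-half, half + 1)]

errors = []
for omega in [0.1, 0.5, 1.0, 2.0]:
    W_zero = dtft(w, -half, omega)   # zero-phase placement: real-valued
    W_causal = dtft(w, 0, omega)     # causal placement: shifted right by `half` samples
    predicted = cmath.exp(-1j * omega * half) * W_zero   # shift theorem
    errors.append(abs(W_causal - predicted))

print(max(errors))  # round-off level
```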


Putting all this together, we get the following. Our original signal (unwindowed, infinite duration) is
$$x(n) = e^{j\omega_0 nT}, \qquad n \in \mathbb{Z}$$
A portion of the real part, $\cos(\omega_0 nT)$, is plotted below:
[Figure: Real part $\cos(\omega_0 nT)$ of the complex sinusoid. Amplitude (−1 to 1) versus time (samples, −2000 to 2000).]

The imaginary part, $\sin(\omega_0 nT)$, is of course identical but for a 90-degree phase shift to the right.


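The Fourier transform of this infinite-duration signal is a delta function at $\omega_0$ (up to a constant scale factor that depends on the transform convention):
$$X(\omega) = \delta(\omega - \omega_0)$$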

The Convolution Theorem tells us that multiplication in the time domain corresponds to convolution in the frequency domain. Hence, in our case, we obtain the convolution of a delta function at frequency $\omega_0$ with the transform of the window. The result of convolution with a delta function is the original function, shifted to the location of the delta function. (The delta function is the identity element for convolution.) Denoting the window transform by $W(\omega)$, the windowed sinusoid therefore has the transform
$$X_w(\omega) = W(\omega - \omega_0)$$
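A minimal Python sketch of this result, assuming a zero-centered rectangular window of length $M = 31$ and a hypothetical frequency $\omega_0 T = 0.8$ rad/sample: the DTFT of the windowed complex sinusoid, computed directly, matches the window transform evaluated at $\omega - \omega_0$, and its peak at $\omega_0$ equals $W(0) = M$. The `dtft` helper is an illustrative assumption.

```python
import cmath


def dtft(x, n0, omega):
    """DTFT of x at omega (rad/sample), where sample x[k] sits at time n = n0 + k."""
    return sum(xk * cmath.exp(-1j * omega * (n0 + k)) for k, xk in enumerate(x))


M = 31
half = (M - 1) // 2
w = [1.0] * M                 # zero-centered rectangular window, n = -half..half
omega0 = 0.8                  # hypothetical sinusoid frequency (omega_0 * T, rad/sample)

# Windowed sinusoid x(n) w(n) with x(n) = exp(j * omega0 * n), n = k - half
xw = [w[k] * cmath.exp(1j * omega0 * (k - half)) for k in range(M)]

errors = []
for omega in [0.0, 0.5, 0.8, 1.0, 2.5]:
    X_w = dtft(xw, -half, omega)                  # transform of the windowed signal
    W_shifted = dtft(w, -half, omega - omega0)    # window transform shifted to omega0
    errors.append(abs(X_w - W_shifted))

peak = dtft(xw, -half, omega0)
print(max(errors))   # round-off level
print(peak)          # approximately M + 0j
```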


[Figure: Spectrum of the windowed sinusoid. Amplitude (dB, 0 to −50) versus $\omega T$ (−3 to 3), with the main lobe centered at $\omega_0$ and the sidelobes labeled.]


Summary

• Windowing in the time domain resulted in a ‘smearing’ or ‘smoothing’ in the frequency domain. We need to be aware of this if we are trying to resolve sinusoids that are close together in frequency.
• Windowing also introduced side lobes. This is important when we are trying to resolve low-amplitude sinusoids in the presence of higher-amplitude signals. When we look at specific windows, we will be looking at this behavior.
• A sinusoid with amplitude $A$, frequency $\omega_0$, and phase $\phi$ becomes a window transform shifted out to frequency $\omega_0$ and scaled by $Ae^{j\phi}$.

There are many types of windows, which serve various purposes and exhibit various properties.
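The side-lobe point can be quantified with a minimal Python sketch. All parameters are illustrative assumptions: a strong sinusoid and a weak one 40 dB below it, separated by 0.5 rad/sample, analyzed with length-64 windows. Since each sinusoid contributes a scaled, shifted copy of the window transform, the strong sinusoid's leakage at the weak one's frequency is $|W(\Delta\omega)|/|W(0)|$ relative to its own peak; as a rough criterion, the weak peak is treated as masked when that leakage exceeds it. The Hann window is a standard tapered window used here only as a point of comparison.

```python
import cmath
import math


def dtft_mag(w, omega):
    """|W(omega)| for window samples w[n], n = 0..len(w)-1."""
    return abs(sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w)))


def leakage_db(w, delta):
    """Window transform level at offset delta, in dB relative to the main-lobe peak."""
    return 20 * math.log10(dtft_mag(w, delta) / dtft_mag(w, 0.0))


M = 64
delta = 0.5          # hypothetical frequency separation (rad/sample)
weak_db = -40.0      # hypothetical weak-sinusoid level relative to the strong one

windows = {
    "rectangular": [1.0] * M,
    "hann": [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)],
}
results = {}
for name, w in windows.items():
    leak = leakage_db(w, delta)
    results[name] = leak
    verdict = "masked" if leak > weak_db else "visible"
    print(f"{name}: strong-signal leakage {leak:.1f} dB -> weak sinusoid {verdict}")
```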

Since the DTFT of the rectangular window approximates the sinc function, it should “roll off” at approximately 6 dB per octave, as verified in the log-log plot below:
[Figure: DFT of a Rectangular Window, M = 20. Amplitude (dB, 0 to about −54) versus normalized frequency (rad/sample, logarithmic axis from 0.1 to 6.4), showing the partial main lobe and an ideal −6 dB per octave line.]

As the sampling rate approaches infinity, the rectangular-window transform (the “aliased sinc”, or asinc, function) converges exactly to the sinc function. Therefore, the departure of the roll-off from that of the sinc function can be ascribed to aliasing in the frequency domain, due to sampling in the time domain.
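The aliasing explanation can be checked with a few lines of arithmetic. In this minimal Python sketch, the sidelobe envelopes are assumed to be $1/\omega$ for the sinc (continuous-time rectangular window) and $1/|\sin(\omega/2)|$ for the asinc (sampled rectangular window), since the numerator $\sin(M\omega/2)$ only sets where the peaks fall; constant factors cancel in the per-octave dB drop. The sinc falls by exactly $20\log_{10}2 \approx 6.02$ dB per octave, while the asinc drop, $20\log_{10}(2\cos(\omega/2))$, shrinks as $\omega$ approaches π.

```python
import math


def sinc_envelope(omega):
    """Sidelobe envelope of the continuous-time rectangular-window transform (~ 1/omega)."""
    return 1.0 / omega


def asinc_envelope(omega):
    """Sidelobe envelope of the sampled (asinc) transform, ~ 1/|sin(omega/2)|."""
    return 1.0 / abs(math.sin(omega / 2))


def octave_drop_db(envelope, omega):
    """How many dB the envelope falls between omega and 2*omega."""
    return 20 * math.log10(envelope(omega) / envelope(2 * omega))


rows = []
for omega in [0.1, 0.2, 0.4, 0.8, 1.6]:
    rows.append((omega, octave_drop_db(sinc_envelope, omega),
                 octave_drop_db(asinc_envelope, omega)))
    print(f"{omega:.1f} -> {2 * omega:.1f} rad/sample: "
          f"sinc {rows[-1][1]:.2f} dB, asinc {rows[-1][2]:.2f} dB")
```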


Sidelobe Roll-Off Rate
In general, if the first $n$ derivatives of a continuous-time function $w(t)$ exist (i.e., they are finite and uniquely defined), then its Fourier transform magnitude is asymptotically proportional to
$$|W(\omega)| \to \frac{\text{constant}}{\omega^{n+1}} \quad \text{as } \omega \to \infty.$$
Proof: Look up “roll-off rate” in the text index.

Thus, we have the following rule of thumb:
$$n \text{ derivatives} \longleftrightarrow -6(n+1) \text{ dB per octave roll-off rate}$$
(since $20\log_{10}(2) = 6.0205999\ldots$). This is also $-20(n+1)$ dB per decade.

To apply this result, we normally only need to look at the window’s endpoints; the interior of the window is usually differentiable to all orders. Examples:
• Amplitude discontinuity ←→ −6 dB/octave roll-off
• Slope discontinuity ←→ −12 dB/octave roll-off
• Curvature discontinuity ←→ −18 dB/octave roll-off

For discrete-time windows, the roll-off rate slows down at high frequencies due to aliasing.
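A rough numerical check of the rule of thumb, as a minimal Python sketch. The three length-1001 windows are illustrative choices matching the examples above: rectangular (amplitude discontinuity), triangular (slope discontinuity), and Hann (curvature discontinuity at the endpoints). The sidelobe envelope is estimated by the largest DTFT magnitude on a short band starting at $\omega = 0.2$ rad/sample and on the corresponding band one octave higher. These frequencies are low enough that aliasing is small, but the band maxima only approximate the envelope, so expect values near, not exactly at, 6, 12, and 18 dB per octave.

```python
import cmath
import math


def dtft_mag(w, omega):
    """|W(omega)| for window samples w[n], n = 0..len(w)-1."""
    return abs(sum(wn * cmath.exp(-1j * omega * n) for n, wn in enumerate(w)))


def band_peak(w, lo, hi, steps=100):
    """Largest |W| on a uniform grid over [lo, hi]; a rough sidelobe-envelope estimate."""
    return max(dtft_mag(w, lo + (hi - lo) * i / steps) for i in range(steps + 1))


def octave_rolloff_db(w, omega=0.2, rel_width=0.1):
    """Estimated sidelobe drop in dB between about omega and about 2*omega."""
    near = band_peak(w, omega, omega * (1 + rel_width))
    far = band_peak(w, 2 * omega, 2 * omega * (1 + rel_width))
    return 20 * math.log10(near / far)


M = 1001
L = (M + 1) // 2
windows = {
    # amplitude jump at both ends
    "rectangular": [1.0] * M,
    # rectangle convolved with itself, peak normalized to 1: slope jumps
    "triangular": [(L - abs(n - (L - 1))) / L for n in range(M)],
    # zero value and zero slope at the ends, but a curvature jump
    "hann": [0.5 - 0.5 * math.cos(2 * math.pi * n / (M - 1)) for n in range(M)],
}
rolloff = {name: octave_rolloff_db(w) for name, w in windows.items()}
for name, db in rolloff.items():
    print(f"{name:12s} about {db:.1f} dB per octave")
```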

• As M gets bigger, the main lobe narrows (better frequency resolution).
• M has no effect on the relative height of the side lobes (same as the “Gibbs phenomenon” for Fourier series).
• The first sidelobe is only 13 dB down from the main-lobe peak.
• Side lobes roll off at approximately 6 dB per octave.
• A phase term arises when we shift the window to make it causal, while the window transform is real in the zero-centered case (i.e., when the window w(n) is an even function of n).
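The first three bullets can be checked numerically with a minimal Python sketch that uses the closed-form rectangular-window magnitude $|\sin(M\omega/2)/\sin(\omega/2)|$ (the helper names are illustrative). The main lobe spans $4\pi/M$ rad/sample between its first zeros, so it halves each time $M$ doubles, while the first-sidelobe level relative to the peak stays near −13 dB (it approaches about −13.3 dB for large $M$).

```python
import math


def rect_mag(M, omega):
    """|sin(M*omega/2) / sin(omega/2)|, the rectangular-window transform magnitude."""
    if omega == 0.0:
        return float(M)
    return abs(math.sin(M * omega / 2) / math.sin(omega / 2))


def first_sidelobe_db(M, steps=2000):
    """Peak of the first sidelobe (between zeros at 2*pi/M and 4*pi/M), in dB re the peak."""
    lo, hi = 2 * math.pi / M, 4 * math.pi / M
    peak = max(rect_mag(M, lo + (hi - lo) * i / steps) for i in range(1, steps))
    return 20 * math.log10(peak / M)


table = []
for M in [20, 40, 80, 160]:
    mainlobe_width = 4 * math.pi / M          # zero-to-zero width in rad/sample
    table.append((M, mainlobe_width, first_sidelobe_db(M)))
    print(f"M = {M:3d}: main lobe {mainlobe_width:.4f} rad/sample, "
          f"first sidelobe {table[-1][2]:.2f} dB")
```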

Frequency Resolution
The next series of plots shows the effect that an increased window length has on our ability to resolve two sinusoids.
Two Cosines (“In-Phase” Case)
• 2 cosines separated by $\Delta\omega = 2\pi/40$

One Sine and One Cosine (“Phase Quadrature” Case)
All Four Resolutions Overlaid
• Same plots as in the preceding panels, just overlaid
• Peak locations are biased in under-resolved cases, both in amplitude and frequency
[Figure: Magnitude (0 to 0.5) versus frequency $\omega T$ (rad/sample, 0 to 3) for rectangular windows of length M = 20, 30, 40, and 80, overlaid.]

The preceding figures suggest that, for a rectangular window of length $M$, two sinusoids can be most easily resolved when they are separated in frequency by
$$\Delta\omega \ge 2\Omega_M, \qquad \Omega_M \triangleq \frac{2\pi}{M}.$$
This implies there must be at least two full cycles of the difference frequency under the window. (We’ll see later that this is an overly conservative requirement: a more careful study reveals that 1.44 cycles is sufficient for the rectangular window.)

In principle, arbitrarily small frequency separations can be resolved if
• there is no noise, and
• we are sure we are looking at the sum of two ideal sinusoids under the window.

However, in practice, there is almost always some noise and/or interference, so we prefer to require sinusoidal frequency separation by at least one main-lobe width (of the sinc function in this case, or of the window transform more generally) whenever possible.

The rectangular window provides an abrupt transition at its edge. We will later look at some other windows that have a more gradual transition. This is usually done to reduce the height of the side lobes.
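The resolution experiment behind these figures can be approximated with a minimal Python sketch. Simplifying assumptions, not taken from the plots: complex sinusoids instead of real cosines (so there are no negative-frequency images), phases referenced to the window center, a rectangular window, and $\omega_1 = 1.0$ rad/sample with $\Delta\omega = 2\pi/40$. The sketch counts local maxima of the magnitude spectrum between the two frequencies, plus half a separation on each side; the helper names are hypothetical.

```python
import cmath
import math


def windowed_signal(M, freqs, phases):
    """Length-M rectangular-windowed sum of complex sinusoids, phases referenced to the center."""
    c = (M - 1) / 2
    return [sum(cmath.exp(1j * (f * (n - c) + p)) for f, p in zip(freqs, phases))
            for n in range(M)]


def dtft_mag(x, omega):
    """|X(omega)| for samples x[n], n = 0..len(x)-1."""
    return abs(sum(xn * cmath.exp(-1j * omega * n) for n, xn in enumerate(x)))


def count_peaks(M, omega1, delta, phase2=0.0, steps=400):
    """Count local maxima of |X| on [omega1 - delta/2, omega1 + 1.5*delta]."""
    x = windowed_signal(M, [omega1, omega1 + delta], [0.0, phase2])
    lo, hi = omega1 - delta / 2, omega1 + 1.5 * delta
    mags = [dtft_mag(x, lo + (hi - lo) * i / steps) for i in range(steps + 1)]
    return sum(1 for i in range(1, steps) if mags[i - 1] < mags[i] >= mags[i + 1])


delta = 2 * math.pi / 40          # separation used in the figures
lengths = [20, 30, 40, 80]
in_phase = {M: count_peaks(M, 1.0, delta) for M in lengths}
quadrature = {M: count_peaks(M, 1.0, delta, phase2=math.pi / 2) for M in lengths}
print("in-phase:  ", in_phase)
print("quadrature:", quadrature)
```

A count of one means the two sinusoids merge into a single spectral peak; a count of two means they are resolved in this simple sense, although the peak locations may still be biased.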

Resolution Bandwidth (resolving sinusoids)
Our ability to resolve two closely spaced sinusoids is determined primarily by the main-lobe width of the Fourier transform of the window we are using. Let $B_w$ denote the main-lobe width in Hz, with the main-lobe width defined as the width between zero crossings:

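For a rectangular window of length $M$, the zero crossings nearest the main-lobe peak lie at $\pm f_s/M$ Hz, so $B_w = 2f_s/M$. Requiring $B_w \le |f_2 - f_1|$ gives
$$M \ge 2\,\frac{f_s}{|f_2 - f_1|}.$$
Thus, to resolve the frequencies $f_1$ and $f_2$ under a rectangular window, it is sufficient for the window length $M$ to span at least 2 periods of the difference frequency $f_2 - f_1$, measured in samples, where 2 is the width of the main lobe measured in sidelobe-widths. A rectangular window of length $2f_s/|f_2 - f_1|$ or greater is said to resolve the sinusoidal frequencies $f_1$ and $f_2$.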
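The length requirement $M \ge 2 f_s/|f_2 - f_1|$ is simple arithmetic. This minimal Python sketch (the function name and the 440 Hz / 460 Hz at 44.1 kHz example are hypothetical) rounds $M$ up to an integer; the optional `cycles` argument lets you plug in the less conservative 1.44-cycle figure quoted above for the rectangular window.

```python
import math


def min_rect_window_length(fs, f1, f2, cycles=2.0):
    """Smallest integer M spanning `cycles` periods of the difference frequency |f2 - f1|.

    cycles=2.0 corresponds to M >= 2 * fs / |f2 - f1| (one main-lobe width of separation).
    """
    if f1 == f2:
        raise ValueError("f1 and f2 must differ")
    return math.ceil(cycles * fs / abs(f2 - f1))


fs = 44100.0                                   # hypothetical sampling rate (Hz)
M_conservative = min_rect_window_length(fs, 440.0, 460.0)
M_rect_144 = min_rect_window_length(fs, 440.0, 460.0, cycles=1.44)
print(M_conservative)  # 4410 samples, i.e. 0.1 s
print(M_rect_144)      # 3176 samples
```

In normalized terms, the separation $\Delta\omega = 2\pi/40$ used in the resolution figures gives $M \ge 80$ samples under the two-cycle rule.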