Consider the requirements for acoustically simulating a concert hall
or other listening space. Suppose we only need the response at one or
more discrete listening points in space (``ears'') due to one
or more discrete point sources of acoustic energy.

First, as discussed in §2.2, the direct signal
propagating from a sound source to a listener's ear can be simulated
using a single delay line in series with an attenuation scaling or
lowpass filter. Second, each sound ray arriving at the listening
point via one or more reflections can be simulated using a
delay line and some scale factor (or filter). Two rays create a
feedforward comb filter, like the one in Fig.2.9.
More generally, a tapped delay line
(Fig.2.19) can simulate many reflections. Each tap brings out one
echo at the appropriate delay and gain, and each tap can be
independently filtered to simulate air absorption and lossy
reflections. In principle, tapped delay lines can accurately simulate
any reverberant environment, because reverberation really does
consist of many paths of acoustic propagation from each source to each
listening point. As we will see, the only limitations of a tapped
delay line are (1) it is expensive computationally relative to
other techniques, (2) it handles only one ``point to point'' transfer
function, i.e., from one point source to one ear, and (3) it should
be changed when the source, listener, or anything in the room moves.
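
As a minimal sketch of the tapped delay line described above, the following Python fragment sums one delayed, scaled copy of the input per propagation path. The function name `tapped_delay_line` and the particular delays and gains are hypothetical, chosen only for illustration; per-tap filtering (air absorption, lossy reflections) is omitted and each tap is a plain gain.

```python
def tapped_delay_line(x, taps):
    """Pass the samples in x through a tapped delay line.

    taps is a list of (delay_in_samples, gain) pairs, one pair per
    simulated propagation path (direct signal or reflection).
    """
    max_delay = max(delay for delay, _ in taps)
    y = [0.0] * (len(x) + max_delay)
    for n, xn in enumerate(x):
        for delay, gain in taps:
            # Each tap contributes one echo at its own delay and gain.
            y[n + delay] += gain * xn
    return y


# Hypothetical paths: direct signal plus two weaker, later reflections.
taps = [(3, 1.0), (7, 0.5), (12, 0.25)]
h = tapped_delay_line([1.0], taps)  # impulse response of the model
```

Feeding in a unit impulse returns the impulse response, which is nonzero only at the tap delays; this sparsity is what makes a tapped delay line cheaper than a dense FIR filter of the same length.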

Figure 3.1 depicts the general reverberation scenario for three
sources and one listener (two ears). In general, the filters should
also include filtering by the pinnae
of the ears, so that each echo can be perceived as coming from the
correct angle of arrival in 3D space; in other words, at least some
reverberant reflections should be spatialized so that they
appear to come from their natural directions in 3D space
[248]. Again, the filters change if anything changes in
the listening space, including source or listener position. The
artificial reverberation problem is then to implement some
approximation of the system in Fig.3.1.

Figure 3.1:
General reverberation simulation for three sources
and one listener (two ears). Each filter can be implemented
as a tapped delay line FIR filter.

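In the time domain, the signal $y_i(n)$ at ear $i$ in Fig.3.1 is the
sum of the three source signals $s_j(n)$, each convolved with its own
FIR filter $h_{ij}$:

$$ y_i(n) = \sum_{j=1}^{3} \sum_{m=0}^{M_{ij}} h_{ij}(m)\, s_j(n-m), \qquad i = 1, 2, $$

where $M_{ij}$ denotes the order of FIR filter $h_{ij}$. Since many
of the filter coefficients $h_{ij}(n)$ are zero (at least for small
$n$), it is more efficient to implement them as tapped delay
lines (§2.5) so that the inner sum becomes sparse.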
For greater accuracy, each tap may include a lowpass
filter which models air absorption [314] and/or
spherical spreading loss (see §2.3).
For large $n$,
the impulse responses are not sparse, and we must either implement
very expensive FIR filters, or approximate the tail of the impulse
response using less expensive IIR filters; this subject--``late
reverberation'' approximation--is taken up in §3.4.
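
The structure of Fig.3.1 can be sketched in a few lines of Python, assuming each source-to-ear filter is stored sparsely as a dictionary mapping a tap delay (in samples) to a gain. The name `render_ears`, the data layout, and all numeric values are assumptions made for this illustration; each ear signal is simply the sum over sources of the sparsely filtered source signals.

```python
def render_ears(sources, filters, num_out):
    """Mix several sources into several ear signals.

    sources:  list of sample lists, one per source j
    filters:  filters[i][j] is a dict {delay: gain} giving the sparse
              impulse response from source j to ear i
    num_out:  number of output samples to compute per ear
    """
    ears = []
    for ear_filters in filters:
        y = [0.0] * num_out
        for s, taps in zip(sources, ear_filters):
            # Only the nonzero taps are visited, so the inner sum is sparse.
            for delay, gain in taps.items():
                for n, sn in enumerate(s):
                    if n + delay < num_out:
                        y[n + delay] += gain * sn
        ears.append(y)
    return ears


# Three hypothetical sources (the third is silent) and two ears.
sources = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0]]
filters = [[{0: 1.0, 2: 0.5}, {1: 0.25}, {0: 1.0}],   # paths to ear 1
           [{1: 1.0}, {0: 0.5}, {3: 1.0}]]            # paths to ear 2
left, right = render_ears(sources, filters, num_out=6)
```

The cost per output sample is proportional to the total number of nonzero taps, not to the filter order, which is why this form is attractive for the sparse early part of the response.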

For music, a typical reverberation time is on the order of one second. Suppose we choose exactly one second
for the reverberation time. At an audio
sampling rate of 50 kHz, each filter in
Fig.3.1 requires 50,000 multiplies and additions
per sample, or 2.5 billion multiply-adds per second. Handling
three sources and two listening points (ears), we reach 30 billion
operations per second for the reverberator. This computational load
would require at least 10 Pentium CPUs clocked at 3 Gigahertz,
assuming they were doing nothing else, and assuming both a multiply
and addition can be initiated each clock cycle, with no wait-states
caused by the three required memory accesses (input, output, and
filter coefficient). While these numbers can be improved using FFT
convolution instead of direct convolution (at the price of introducing
a throughput delay which can be a problem for real-time systems), it
remains the case that exact implementation of all relevant
point-to-point transfer functions in a reverberant space is very
expensive computationally.
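
The arithmetic behind these figures can be reproduced as follows. This is only a back-of-the-envelope sketch under the assumptions stated in the text: a one-second impulse response, a 50 kHz sampling rate, direct-form FIR filtering, and an idealized 3 GHz CPU that completes one multiply and one addition per clock cycle. The variable names are illustrative.

```python
fs = 50_000                 # audio sampling rate (samples per second)
t60 = 1                     # reverberation time (seconds)
taps_per_filter = fs * t60  # FIR length: one tap per sample of the response

# One multiply-add per tap per output sample.
macs_per_filter = taps_per_filter * fs

num_filters = 3 * 2         # three sources times two ears
# Count the multiply and the addition as separate operations.
total_ops = 2 * macs_per_filter * num_filters

clock_hz = 3_000_000_000    # assumed 3 GHz CPU
ops_per_cpu = 2 * clock_hz  # one multiply plus one add per clock cycle
cpus_needed = total_ops / ops_per_cpu

print(macs_per_filter, total_ops, cpus_needed)
```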

While a tapped delay line FIR filter can provide an accurate model for
any point-to-point transfer function in a reverberant environment, it
is rarely used for this purpose in practice because of the extremely
high computational expense. While there are specialized commercial
products that implement reverberation via direct convolution of the
input signal with the impulse response, the great majority of
artificial reverberation systems use other methods to synthesize the
late reverb more economically.

One disadvantage of the point-to-point transfer function model
depicted in Fig.3.1 is that some or all of the filters must
change when anything moves. If instead we had a computational model
of the whole acoustic space, sources and listeners could be moved as
desired without affecting the underlying room simulation.
Furthermore, we could use ``virtual dummy heads'' as listeners,
complete with pinnae filters, so that all of the 3D directional
aspects of reverberation could be captured in two extracted signals
for the ears. Thus, there are compelling reasons to consider a full
3D model of a desired acoustic listening space.

Let us briefly estimate the computational requirements of a ``brute
force'' acoustic simulation of a room. It is generally accepted that
audio signals require a 20 kHz bandwidth. Since sound travels at
about a foot per millisecond (see §B.7.14 for a more
precise value), a 20 kHz sinusoid has a wavelength on the order of
1/20 feet, or about half an inch. Since, by elementary sampling
theory, we must sample faster than twice the highest frequency present
in the signal, we need ``grid points'' in our simulation separated by
a quarter inch or less. At this grid density, simulating an ordinary
12'x12'x8' room in a home requires more than 100 million grid points.
Using finite-difference (Appendix D) or waveguide-mesh techniques
(§C.14, Appendix E) [518,396],
the average grid point can be
implemented as a multiply-free computation; however, since it has
waves coming and going in six spatial directions, it requires on the
order of 10 additions per sample. Thus, running such a room simulator
at an audio sampling rate of 50 kHz requires on the order of 50
trillion additions per second (100 million grid points, times 10
additions, times 50,000 samples per second), which is already more
than a thousand times the load of the three-source, two-ear
simulation of Fig.3.1. Scaling up to a
100'x50'x20' concert hall requires more than 5 quadrillion operations
per second. We may conclude, therefore, that a fine-grained physical
model of a complete concert hall over the audio band is prohibitively
expensive.
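
The estimate can be checked with a short calculation. The sketch below assumes the figures used in the text: grid points spaced a quarter inch apart, about 10 additions per grid point per sample, and a 50 kHz sampling rate. The function names are hypothetical.

```python
def grid_points(dims_ft, spacing_in=0.25):
    """Number of grid points in a box with the given dimensions in feet."""
    points = 1
    for d in dims_ft:
        points *= round(d * 12 / spacing_in)  # feet -> inches -> grid steps
    return points


def additions_per_second(dims_ft, adds_per_point=10, fs=50_000):
    """Additions per second for a multiply-free mesh covering the box."""
    return grid_points(dims_ft) * adds_per_point * fs


room = (12, 12, 8)      # ordinary room, in feet
hall = (100, 50, 20)    # concert hall, in feet

print(grid_points(room))             # 127401984 grid points
print(additions_per_second(room))    # about 6.4e13 additions per second
print(additions_per_second(hall))    # about 5.5e15 additions per second
```

The cost grows with the volume of the space, so the concert hall is roughly 87 times more expensive than the small room at the same grid density.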

The remainder of this chapter will be concerned with ways of reducing
computational complexity without sacrificing too much perceptual quality.