This 2004 article, written by S. V. Rice and
S. M. Bailey, tells the story of FindSounds.

Sounds are selected and incorporated
into theatrical productions and radio and television programs.
Music is composed of sounds, and music combined with dialogue and sound
effects forms the movie soundtrack. Sounds are vital to animation
and computer games.

Sounds for Theatre, Film, Radio, and Television

Sound effects were used in the ancient Greek
theatre of Aeschylus, Euripides, and Sophocles. In Elizabethan
theatre, scripts called for the sounds of alarms, chimes, and gunshots,
and skilled vocalists imitated the baying of hounds and crowing of
roosters. Many theatres utilized "thunder runs," sloping wooden or
iron alleys down which cannon balls were rolled to produce the sound of
thunder [1]. In 1708, John Dennis devised an improved method for
making thunder: shaking a metal sheet that is suspended by wires.
His "thunder sheet" was widely copied by others, whom he accused of
"stealing his thunder," originating the expression.

Silent films
were accompanied by a pianist or organist and often by sound-effects
artists working their craft. In the 1930s, the production of sound
effects for "talkies," theatre, and radio increased in sophistication.
Thousands of prerecorded sounds became available on 78 rpm phonograph
records, and "manual" sound effects were created by clever use of an
enormous variety of objects and devices. A 1936 "how-to" guide,
written by a stage director of the Old Vic Theatre in London, instructs
the "effectsman" in the art of creating "noises off" (off-stage sound
effects) including household, machine, nature, and "explosive" sounds
[2]. A 1940 guide for radio describes how to make sounds using
"gadgets that can be found in most attics or basements" and mentions an
"alphabetical glossary" at NBC Radio containing thousands of techniques
for sound generation [3]. Such a cookbook might include the
following recipes.

fire —
crinkle cellophane; the faster you crinkle, the bigger the fire

rain —
sprinkle salt on paper

walking in mud —
handle a soggy newspaper

The Warner Brothers were especially creative
in their cartoons. The sound of the Road Runner flipping his
tongue was produced by rapidly popping a finger out of a bottle five
times. An inertia starter for an old prop plane created the sound
of the Tasmanian Devil spinning wildly.

In Raiders of the
Lost Ark (1981), the sound of face punches came from slapping a
leather jacket onto the hood of an old fire engine and from dropping
overripe fruit onto concrete; the sound of the giant rolling boulder
is the sound of a Honda station wagon rolling down a gravel slope.
The ghost sounds in Ghostbusters II (1989) were produced by a
rice steamer [4,5].

"Sound is 50% of the motion
picture experience." — George Lucas

Sounds for Music

The musical
instruments available to composers of Western music were essentially
unchanged throughout the 18th and 19th centuries. By the beginning
of the 20th century, composers sought to enlarge the palette of sounds.
The percussion section, home to unconventional instruments, was expanded
by Debussy and Strauss. In addition to innovative use of
percussion, Stravinsky and Bartók devised novel techniques for playing
traditional instruments to obtain new sounds.

Russolo and
Marinetti of the Italian Futurist movement presented a concert in Milan
in 1914 that employed "bumblers, exploders, thunderers, and whistlers."
Satie's Parade, incorporating sirens, starting pistols,
typewriter, and foghorn, caused a scandal in Paris in 1917; conservative
listeners considered it blasphemous for music to include such sounds.
A similar furor erupted in New York in 1927 when Antheil's Ballet
Mécanique was performed by an ensemble of pianos, anvils, bells,
buzzers, saws, car horns, and airplane propellers.

In the 1920s,
Edgard Varèse crusaded for the right to make music with any and all
sounds. His sentiments were echoed in the 1930s by John Cage,
whose First Construction in Metal (1939) utilizes five
differently-pitched thunder sheets and four brake drums. Pierre
Schaeffer, the "Musician of Sounds," led a group of Paris musicians
known as Musique Concrète. His pioneering composition Étude
aux Chemins de Fer (1948) is a fascinating montage of sounds
recorded at the Paris train depot and demonstrated that any sound is raw
material for creative use.

The arrival of electronic sound
synthesizers was hailed by many composers. The RCA Electronic
Music Synthesizer of the 1950s could generate a sequence of sounds and
the composer could specify the pitch, volume, color, articulation, and
duration of each sound. Varèse extolled the "electronic medium"
for adding "an unbelievable variety of new timbres to our musical
store," and for "the possibility of obtaining any differentiation of
timbre, of sound-combinations, and new dynamics far beyond the present
human-powered orchestra." The Moog synthesizer of the 1960s, made
famous by Wendy Carlos in Switched-on Bach (1968), was the
first synthesizer to be mass produced, and by the early 1970s, the use
of synthesizers was widespread [6-8].

"I don't care too much about music. What I
like is sounds." — Dizzy Gillespie

Transforming Sounds to Create More Sounds

Phonographs with variable speed control were needed in the 1920s to
play "78 rpm" records because the speed at which they were actually
recorded ranged from 70 to 85 rpm. Interesting sounds can be
created by slowing down or speeding up a recording, so the speed control
became a valuable tool of the sound designer. Hindemith and Toch
composed short pieces using phonographic speed change by 1930, and
Varèse, Cage, and Schaeffer experimented considerably with the
technique. In his book on sound effects, Robert L. Mott recounts
how a single recording of a waterfall, when played at different speeds,
was used to create the sounds of ocean surf, city traffic, a jet
airplane, an atomic bomb explosion, and a printing press [9]. In
Indiana Jones and the Last Crusade (1989), a recording of
chickens was speeded up and used as the sound of a cave filled with rats
[10]. Walter Murch, regarded as the dean of sound designers, would
change the speed of a sound (for example, the outboard motor in
Godfather II, 1974) so that it would harmonize with the background
music and prevent dissonance [11].

The physical environment in
which sounds are recorded can have a great influence on the recording.
A carpeted living room, a tiled bathroom, a suburban backyard, and an
urban alley alter sounds in distinct ways. Sounds can be recorded
through a window, open or closed. The sound of Luke Skywalker's
land speeder in Star Wars (1977) is the sound of a Los Angeles
freeway recorded through a vacuum-cleaner tube [10]. Ann Kroeber's
Common Sounds Heard in Uncommon Ways (2000) includes sounds
captured by microphones placed inside a steam iron and a soda
machine.

In a process known as "sweetening," sounds are layered
to create new sounds. For King Kong (1933), the
pioneering Murray Spivack devised the sounds of the giant ape by
blending recordings of lions and tigers, some played in reverse and at
different speeds. For the 1998 version of Godzilla, sound
designers developed the monster's roar by combining musical instrument
and animal sounds with the original roar from the 1950s Japanese films.
The voice of Chewbacca in Star Wars was constructed from bear,
dog, lion, and walrus vocalizations. The sound of the sandworms in
Dune (1984) was a mixture of speed-altered recordings of a
baboon, horse, puma, and several pigs [12]. The sounds of
torpedoes in The Hunt for Red October (1990) were layered with
"animal growls and shrieks, a Ferrari engine, and a screeching screen
door spring" to "imbue the weapon with a vengeful purpose" [5].

Sounds may be transformed electronically by a variety of techniques
including equalization, filtering, reverberation, modulation, chorusing,
flanging, and phasing. Digital audio workstations make it easy to
edit sounds and to juxtapose and overlay them in unlimited ways, a
practice that David Sonnenschein has aptly termed "the digital sculpting
of sounds" [5].

"Choice is the beginning of art." — Igor
Stravinsky

Storage and Retrieval of Sounds

The phonograph was the first device for audio storage and retrieval.
For a 1930s radio drama, a sound-effects artist would play sound-effects
records using three or more turntables, each with two tone arms and
speed and volume controls. Effects were marked on the records with
chalk for fast cueing. The dexterity of the 1930s artist would
impress today's hip-hop disc jockeys, for whom the turntable is a
musical instrument in its own right.

In the early 1950s, the tape
recorder became a tool for creative use. Tapes could be speeded up
and slowed down, and could be cut and spliced for editing.
Multi-track tapes facilitated sound mixing. In the 1960s,
cartridge and cassette tapes emerged along with the "cart machine" for
triggering the playback of cartridge tapes.

Audio went digital
with the arrival of the compact disc (CD) in 1983 and digital audio tape
(DAT) in 1987. Digital samplers also arrived in the 1980s,
enabling brief digital recordings or "samples" to be played by pressing
the keys of a piano-style keyboard. (The first sampler, the
Mellotron, was developed in the 1960s and assigned a tape to each key.)
By the 1990s, digital recordings were commonly stored in computer disk
files, and affordable software became available to play, record, and
edit them.

Sequential listening to audio recordings is a tedious
way to search for sounds. The printed liner notes of records,
tapes, and CDs provide descriptions of recordings, and in electronic
form, they can be searched by keyword to locate recordings of interest.
However, the value of this technique is limited by the fact that sounds
are so difficult to describe.

Onomatopoeia is the formation of
words to imitate sounds, for example, buzz, crunch, hiss, pop, screech,
and thud. People who catalog sounds have raised onomatopoeia to an
art form in desperate attempts to describe sounds. The following
descriptions appear in a current sound-effects catalog: gedunk, kablam,
kabong, pingy wobbles, wiggle bowang, zing. And catalogers work
overtime to find the right adjectives: "searing harmonic slashes,"
"industrial amorphous textured presence," "incendiary fuzz mutations."
Such descriptions convey little information, do not translate well to
other languages, and are nearly useless for keyword searches.

Describing the source of a sound, if known, is far easier than
describing the sound itself, and most catalogers resort to this
approach. Most of us know the sounds of a "Honda Accord idling,"
"several coins dropped on a tile floor," and a "roller coaster passing
by." Source descriptions are less useful if we are unfamiliar with
the sounds, for example, "llama vocalizing," "slab of steel emerging
from a furnace," and "water lock gates opening."

Although easier
to describe, the source of a sound is of little interest to a sound
designer who intends to use the sound for something else. In fact,
knowing the source makes it harder to evaluate the sound. It is
difficult to imagine that a cat can create the sound of a monster, but
if you don't know that a sound came from a cat, you can listen to it
objectively. Mott encourages sound designers to "disassociate the
names of the sounds with the sounds themselves" and to "concentrate on
the sound" and "ignore its source" [9]. Legendary sound designer
Ben Burtt makes it a practice to play sounds for the director without
telling him their source so that he will listen to them without being
influenced by their origin [4]. Gary Rydstrom, another renowned
designer, believes the most important talent for sound design is the
ability to separate what a sound is from how it is made [5].

If
the source of a sound is a synthesizer, then how should it be described?
Consider a synthesizer sound used in a Star Trek movie to warn
that the dilithium crystals are going to overload [4]. "Weird
electronic sound" and "dilithium crystal alarm" are clearly inadequate
for retrieval purposes. A synthesizer can generate thousands of
sounds that cannot meaningfully be expressed in words.

The
limitations of searching for sounds by searching their text descriptions
have inspired computer scientists to develop methods for content-based
audio retrieval. In a "sounds-like search" or "query by sound
example," a computer algorithm identifies the sounds in a collection
that are most similar to an example or prototype sound. Recordings
are retrieved based on how they sound, regardless of how, or whether, they
have been described in words. The example sound may be all or part of
any recording. It may be an ad hoc recording of the user's voice
or props mimicking a desired sound, or a recording that has been
retrieved by a prior sounds-like search or keyword search.

The
Comparisonics® "sound-matching" algorithm was
developed in 1997. In the "indexing" step, digital audio data is
analyzed by the algorithm and characterized by "signatures," where each
signature is a vector of perceptual features encoded as a 16-byte
quantity. In the comparison step, a signature is derived from the
prototype and compared with the signatures computed for an indexed
collection. For each indexed sound, a score is determined
indicating the degree of similarity between the sound and the prototype,
ranging from 0 (least similar) to 100 (most similar, i.e., identical).
The sounds most like the prototype are displayed for the user in order
of decreasing score, so that the best matches appear first in the list.
The time required to compute the signature of a recording is less than
one percent of the recording's playing time; therefore, sounds may be
indexed in real time, as they are being recorded. In the
comparison step, similarity scores can be computed for more than two
million pairs of signatures per second.
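
The indexing and comparison steps can be illustrated with a minimal Python sketch. The article does not disclose which perceptual features make up a signature or how the score is computed, so everything below is an assumption made for illustration: signatures are treated as 16 unsigned bytes, and the hypothetical similarity function simply maps the mean absolute byte difference onto the 0 to 100 scale. Only the signature size, the score range, and the ordering of results come from the text.

```python
from typing import Dict, List, Tuple

SIGNATURE_BYTES = 16  # signature size stated in the article


def similarity(a: bytes, b: bytes) -> float:
    """Hypothetical score: 100 for identical signatures, 0 for maximally different.

    This is NOT the Comparisonics scoring function, which is not published
    in the article; it is a stand-in so the ranking step can be shown.
    """
    if len(a) != SIGNATURE_BYTES or len(b) != SIGNATURE_BYTES:
        raise ValueError("signatures must be 16 bytes")
    mean_diff = sum(abs(x - y) for x, y in zip(a, b)) / SIGNATURE_BYTES
    return 100.0 * (1.0 - mean_diff / 255.0)


def rank(prototype: bytes, index: Dict[str, bytes],
         limit: int = 200) -> List[Tuple[str, float]]:
    """Score every indexed signature against the prototype, best match first."""
    scored = [(name, similarity(prototype, sig)) for name, sig in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]
```

Because each comparison touches only two small fixed-size signatures rather than the audio itself, scoring a large index is cheap, which is consistent with the throughput figures quoted above.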

This algorithm emulates
the human perception of sound similarity. Computers lack ears and
human intelligence, so it is a challenge to develop an algorithm that
hears sounds as humans do. Ultimately, human listeners are the judges of
its accuracy. The Comparisonics algorithm is designed to work for all
possible sounds and can compare recordings even if they differ in their
duration, sample rate, file format, resolution, or compression.

Searching the Web for Sounds

FindSounds.com
is a free Web site developed by Comparisonics Corporation where visitors
can search the Web for sounds. It is a Web search engine like
Google, but on a smaller scale and with a focus on sounds. Each
month it processes more than one million sound searches for more than
100,000 unique visitors. Since its debut on August 1, 2000, it has
processed more than 35 million sound searches. FindSounds.com
appeals to the general Internet audience and is especially valuable to
sound designers, musicians, filmmakers, videographers, animators, and
game developers.

As with other Web search engines, queries are
processed using a precomputed index of Web files. However, rather
than indexing HTML pages or image files, the FindSounds index stores
information about audio files. In response to a query, a list of
"hits" provides links to audio files. Clicking on a link causes an
audio file to be downloaded and played by an audio player program on the
user's computer (e.g., Windows Media Player). Any file may be
saved to the user's hard drive. Like any Web content, files may
contain copyrighted material and it is the user's obligation to obtain
copyright clearance if required for the intended use.

Keyword
searches are performed by entering any word or phrase in a search box,
or by clicking on one of the 500 "keyword links" that appear within
categories on the Sound Types page. For example, clicking on the
"elephant" link is a
shortcut for typing "elephant" into the search box. The results of
a keyword search for "bell" are shown in Figure 1 below. Up to 200
hits may be retrieved and are displayed ten to a page. Clicking on
a URL or play icon downloads and plays a file. A short description
of the sound appears in bold lettering below the URL, followed by the
file size, number of channels, resolution, sample rate, and duration.
Clicking on the "show page" link displays a Web page that refers to the
file and may contain copyright information. The "e-mail this
sound" link makes it easy to e-mail the file's URL.

Figure 1. List of hits for a keyword search
at FindSounds.com.

Notably, above each URL is a Comparisonics waveform
display. This is an audio waveform display that has been color
coded to convey the frequency content of the recording. Reds
signify high frequencies, greens denote middle-to-high frequencies,
blues represent low-to-middle frequencies, and dark colors indicate low
(bass) frequencies. Similar sounds are mapped to similar colors,
and changes in sound are seen as changes in color. This display
serves as a "thumbnail" image providing information about the sounds in
a file. Users learn to "read" the waveform, that is, they can get
an impression of what a file will sound like simply by inspecting its
waveform, which helps them to decide which files to download and play.
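
The idea of coloring a waveform by frequency content can be sketched in a few lines of Python. The article gives only the color order (dark, blue, green, red from low to high), so the band edges below are assumptions. The frequency estimate uses a crude zero-crossing count as a swapped-in stand-in, since the analysis actually used by the Comparisonics display is not described.

```python
from typing import List

# Upper band edges in Hz are assumed for illustration; the article states
# only that dark = bass, blue = low-to-middle, green = middle-to-high, red = high.
BANDS = [(250.0, "dark"), (1000.0, "blue"), (4000.0, "green")]


def band_color(freq_hz: float) -> str:
    """Map a frequency estimate to one of the four color names."""
    for upper, color in BANDS:
        if freq_hz < upper:
            return color
    return "red"


def zero_crossing_freq(window: List[float], sample_rate: int) -> float:
    """Crude frequency estimate: half the zero-crossing rate of the window."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a < 0) != (b < 0))
    return crossings * sample_rate / (2.0 * len(window))


def color_strip(samples: List[float], sample_rate: int,
                window_size: int = 1024) -> List[str]:
    """Return one color name per full window of samples, in playback order."""
    return [
        band_color(zero_crossing_freq(samples[i:i + window_size], sample_rate))
        for i in range(0, len(samples) - window_size + 1, window_size)
    ]
```

A strip that changes from "dark" to "red" partway through would signal a recording that starts with a low rumble and ends with a high-pitched sound, which is the kind of at-a-glance reading the display is meant to support.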

To the right of the play icon is the sounds-like search icon.
Clicking on this icon launches a sounds-like search that utilizes the
Comparisonics sound-matching algorithm to
locate sounds on the Web that are similar to this sound. The 200
best matches are returned, ten to a page, in order of decreasing
similarity to the prototype. The matches are determined based
entirely on their audio characteristics, uninfluenced by file names and
text descriptions. As a result, the sound of a revving engine may
match a growling tiger, screeching tires may match a ranting chimpanzee,
and a tympani roll may match a rumble of thunder. Such matches are
of interest to sound designers but would never be discovered from text
descriptions. A sounds-like search is a tool for browsing,
exploring, and discovering sounds.

A "combined" search is both a
sounds-like search and a keyword search. After performing a
sounds-like search, the user can limit the display of matches to those
that have been described using a particular keyword. For example,
if the prototype is the sound of an engine, the user might choose to
limit the display of matches to those labelled "engine."
Creatively applied, a combined search can find coyote howls that sound
like a siren and saxophone samples that resemble an elephant's bellow.
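
A combined search amounts to filtering a ranked sounds-like result list by keyword. The following Python sketch assumes a hypothetical `Hit` record and a simple case-insensitive substring match on the description; the matching rules actually used by FindSounds are not specified in the article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    url: str
    score: float                # similarity to the prototype, 0 to 100
    description: Optional[str]  # None when the sound has no text label


def combined_search(hits: List[Hit], keyword: str) -> List[Hit]:
    """Keep only sounds-like hits whose description mentions the keyword.

    Unlabelled hits can never match a keyword, so they drop out here even
    though a plain sounds-like search would still return them.
    """
    needle = keyword.lower()
    matching = [h for h in hits
                if h.description and needle in h.description.lower()]
    return sorted(matching, key=lambda h: h.score, reverse=True)
```

Using a siren as the prototype and "coyote" as the keyword, for instance, would surface the coyote howls that score highest against the siren.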

The FindSounds index is highly selective. It does not include
speech or song recordings, although it does include non-speech
utterances of the human voice (e.g., a grunt or scream) and samples of
notes, chords, and beats that could be incorporated into a song.
Because speech and song recordings are excluded, a keyword search for
"elephant" returns only elephant sounds. By contrast,
indiscriminate indexing of audio files produces a list of hits in which
elephant sounds are interspersed with recordings of people speaking
about elephants and with songs about elephants (e.g., Henry Mancini's
Baby Elephant Walk).

The FindSounds index is created by
a semi-automated process. First, the FindSounds "spider" program
finds audio files on the Web and downloads them for analysis.
FindSounds.com is focused on short recordings, so files longer than 10
seconds are rejected. A file will also be rejected if it has an
invalid format or unsupported compression, or is a poor-quality
recording (i.e., is too quiet, has an excessive DC offset, or has a
sample rate below 8 kHz). The analysis automatically rejects about
90% of the files. The remaining 10% proceed to the auditioning
phase in which a human listener rejects any file that contains at least
one spoken word (to exclude speech recordings) and any file that
contains a sequence of at least three different notes or chords (to
exclude song recordings). Any file deemed obscene is also rejected
(to make FindSounds.com safe for children to use). About 85% of
the auditioned files are rejected.
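
The automatic part of this screening can be sketched as a small Python function. The 10-second and 8 kHz limits come from the text; the thresholds for "too quiet" and "excessive DC offset" are assumed values, since the article does not state them, and the `Clip` type is a hypothetical stand-in for a decoded audio file.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_DURATION_S = 10.0      # from the article
MIN_SAMPLE_RATE_HZ = 8000  # from the article
MIN_PEAK = 0.05            # assumed threshold for "too quiet"
MAX_DC_OFFSET = 0.1        # assumed threshold for "excessive DC offset"


@dataclass
class Clip:
    samples: List[float]  # mono samples normalized to [-1.0, 1.0]
    sample_rate: int      # in Hz


def rejection_reason(clip: Clip) -> Optional[str]:
    """Return why a clip is rejected automatically, or None if it passes."""
    if clip.sample_rate < MIN_SAMPLE_RATE_HZ:
        return "sample rate below 8 kHz"
    if not clip.samples:
        return "invalid format"
    if len(clip.samples) / clip.sample_rate > MAX_DURATION_S:
        return "longer than 10 seconds"
    dc_offset = sum(clip.samples) / len(clip.samples)
    if abs(dc_offset) > MAX_DC_OFFSET:
        return "excessive DC offset"
    if max(abs(s) for s in clip.samples) < MIN_PEAK:
        return "too quiet"
    return None
```

Files that return None here would still go on to the human auditioning phase, which no automatic check in this sketch attempts to replace.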

Text descriptions cannot
reliably be derived in an automatic way from audio file names or from
text that surrounds links to audio files; therefore, accepted files go
through a labelling process in which a human cataloger listens to each
file and enters a description for it, if it is possible to do so.
These descriptions appear in bold lettering in a list of hits and are
used to answer keyword queries. However, many sounds defy
description. About 58% of the files in the index are described in
words; the remaining 42% are unlabelled, yet can be retrieved by a
sounds-like search.

Automatic duplicate detection is an essential
part of the indexing process. The FindSounds spider has located as
many as 367 identical copies of a single recording. URLs of copies
are saved in a database so that if one copy becomes inaccessible (i.e.,
the file goes offline), the index can be updated to refer to another
copy. Users receive the URL of only one copy in a list of hits so
they are not bothered by multiple hits for identical files.
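
One simple way to implement this bookkeeping is sketched below in Python. It assumes that "identical" means byte-for-byte identical files, detected with a content hash; the article does not say how FindSounds detects duplicates, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def group_duplicates(files: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Map each content digest to every URL that serves identical bytes."""
    copies: Dict[str, List[str]] = defaultdict(list)
    for url, data in files:
        copies[hashlib.sha256(data).hexdigest()].append(url)
    return dict(copies)


def first_accessible(urls: List[str],
                     is_online: Callable[[str], bool]) -> Optional[str]:
    """Pick the single URL to show in a hit list, skipping dead copies."""
    for url in urls:
        if is_online(url):
            return url
    return None
```

Keeping the full list of URLs per digest is what allows the index to fall back to another copy when the one currently shown goes offline.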

Over
its lifetime, the FindSounds spider has located about 10 million audio
files on the Web and about 90% of these were rejected automatically.
The remaining one million files, after duplicates are detected,
represent about 600,000 different recordings. Of these,
auditioners have accepted about 100,000 for inclusion in the FindSounds
index. However, because files on the Web become inaccessible over
time, the current number of indexed files is about 50,000.

Expanding the Search

FindSounds Palette is a software program introduced by Comparisonics
Corporation in 2002 that extends the capabilities of FindSounds.com.
It is an audio player, recorder, editor, database, search engine, and
Web browser, all in one program. FindSounds Palette provides
access to a palette of sounds stored locally and on the Web.

Users can catalog and search audio files stored on their local disks and
local area network. A database named "MyPalette" stores
information about local audio files. The user may enter the
following metadata into MyPalette for each file: description, source,
copyright, notes, genre, key, and tempo. In addition, each file
may be placed in a class (Effect, Instrument, or
Other) and in a category and sub-category. The main window of
the program displays a hierarchical view of MyPalette files organized by
class, category, and sub-category.

The FindSounds index is
accessible from the program and is called "WebPalette." With one
query, a user can search MyPalette and WebPalette to find local and
remote files satisfying search criteria. Up to 200 MyPalette hits
are returned in one list, and up to 200 WebPalette hits are retrieved in
another. For each hit, icons are provided for playing the file,
opening the file in the audio editor, and launching a sounds-like search
using the file as the prototype. Once opened in the audio editor,
a WebPalette file can be saved locally to MyPalette.

Sounds in
MyPalette and WebPalette are located by keyword, sounds-like, and
combined searches. For any search, the user may place restrictions
on file format, file size, number of channels, resolution, sample rate,
duration, key, and tempo. Keyword searches may apply to any
combination of text fields: file name, description, source, copyright,
notes, genre, category, and sub-category. The user may specify a
desired range of similarity scores in a sounds-like search.

Users
can search not only the sounds of local and remote files, but also
sounds obtained by changing the speeds of these recordings. Each
file in MyPalette may be indexed at its normal speed and 24 speed
variations: the normal speed increased by one to 12 semitones (one
octave) and decreased by one to 12 semitones. This has the effect
of multiplying the size of the local audio collection, but without
occupying additional disk space because each audio file is stored only
once, at its normal speed. A collection of 10,000 local audio
files thereby becomes a searchable database of 250,000 sounds. The
sound that the user is seeking may already be on the user's hard drive
but has yet to be heard by human ears.
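
The arithmetic behind these speed variations is easy to check with a short Python sketch. It assumes the standard equal-tempered relation, in which a shift of n semitones corresponds to a playback-speed ratio of 2^(n/12); the function names are illustrative and not part of FindSounds Palette.

```python
from typing import List


def speed_ratio(semitones: int) -> float:
    """Playback-speed multiplier for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def speed_variations(max_semitones: int = 12) -> List[int]:
    """Semitone offsets indexed per file: normal speed plus shifts both ways."""
    return list(range(-max_semitones, max_semitones + 1))


def searchable_sounds(file_count: int, max_semitones: int = 12) -> int:
    """Number of distinct sounds searchable from a collection of files."""
    return file_count * len(speed_variations(max_semitones))


def variation_duration(duration_s: float, semitones: int) -> float:
    """Playing time of a recording after the speed change is applied."""
    return duration_s / speed_ratio(semitones)
```

With the default of 12 semitones in each direction, `speed_variations()` yields 25 offsets, so `searchable_sounds(10_000)` gives the 250,000 figure quoted above, and a 10-second recording raised by one octave plays in 5 seconds.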

Each WebPalette file is
indexed at more than 40 speeds. The 50,000 sounds in the
FindSounds index become a searchable collection of 2,000,000 sounds,
which amounts to more than 1500 hours of audio. Users can find
many interesting matches in this expanded collection. A speed
variation is indicated in a list of hits by a number of semitones that
is positive if the variation is faster than normal speed and negative if
it is slower. The Comparisonics waveform display is colored to
represent the sound of the speed variation, and when a user clicks on
the play icon, the recording is played at the indicated speed.

In
the audio editor, the user can play, record, and edit an audio file
while viewing its Comparisonics waveform display. Editing
operations include cut, copy, paste, mix, delete, fade, adjust volume,
change speed, undo, and redo. In addition, metadata describing a
MyPalette file may be entered and edited. The user may pan and
zoom the waveform display; its colors help the user to "see" the sounds.
The user may select any sound by highlighting it in the waveform
display. Clicking on the sounds-like search icon retrieves sounds
in MyPalette and WebPalette that are similar to the selected sound.
A user can be recorded mimicking a desired sound and the recording can
be edited or speed-changed to "fine tune" it before launching a search
for similar sounds. When a local sound is used as the prototype in
a WebPalette search, a signature is computed to characterize the sound,
and it is the signature, not the voluminous audio data, that is
communicated over the Internet to the FindSounds query processor.

Figure 2a below shows the Comparisonics waveform display of a
recording of a whale that has been sped up by four semitones.
The first part of the recording has been selected (indicated by the
black background) and is used as the prototype in a sounds-like search
of WebPalette. Figure 2b shows a list of hits in order of
decreasing similarity score. Because the hits sound similar to the
prototype, their waveforms have similar colors. Each hit is a
speed variation indicated by a positive or negative number of semitones.
In this example, the prototype matched speed-altered recordings of
whales, loons, a sparrow, a mosquito, human burps, radar beeps, a bell,
a whimpering gorilla, a screaming toad, a Japanese wood flute, radio
beacons, and the routing tone used by the Irish telephone system.

Figure 2b. List of hits for a sounds-like
search in FindSounds Palette.

Future Directions

Computer
technology has contributed to the "democratization" of multimedia
production. Music composition and movie editing can be
accomplished using personal computers and millions of people are
embracing the opportunity. Creative people seek the best access to
the most sounds. FindSounds.com and FindSounds Palette have
succeeded in increasing the access to sounds; however, there is more
that can be done.

Today there are countless hardware and software
devices for electronically synthesizing and transforming sounds,
offering limitless possibilities. However, these devices currently
have no mechanism in place for searching the sounds they produce.
A user explores the sounds of a synthesizer by the tedious process of
setting parameters, playing a sound, changing the parameters, playing
another sound, changing the parameters again, and so on. Wouldn't
it be wonderful to perform a sounds-like search of the universe of
sounds that a synthesizer can produce? The user could examine a
list of hits, quickly audition any sound in the list, and obtain the
parameter settings used to generate each sound. This concept could
also be applied to manual sound-making devices (like the gadgets used on
Foley stages) to discover sounds and the recipes for producing them.

Collections of audio recordings are untapped resources. With
only meager access afforded by keyword searches, thousands of sounds
remain hidden. Millions more sounds can be derived automatically
from these collections (via speed change and other transformations), but
are unsearchable without content-based retrieval.

In the year
1624, Sir Francis Bacon wrote New Atlantis in which he
describes his vision of the future. We close with an excerpt that
is prophetic.

"We
have also sound-houses, where we practice and demonstrate all
sounds, and their generation. We have harmonies which you
have not, of quarter-sounds, and lesser slides of sounds.
Diverse instruments of music likewise to you unknown, some
sweeter than any you have, together with bells and rings that
are dainty and sweet. We represent small sounds as great
and deep; likewise great sounds, extenuate and sharp. We
make diverse tremblings and warblings of sounds, which in their
original are entire. We represent and imitate all
articulate sounds and letters, and the voices and notes of
beasts and birds. We have certain helps, which set to the
ear do further the hearing greatly… We have also means to
convey sounds in trunks and pipes, in strange lines and
distances."

— Sir Francis Bacon, New
Atlantis (1624)

References

[1] D. Kaye and J. LeBrecht, Sound and Music for the Theatre: The Art
and Technique of Design, Focal Press, 2000.