Understanding Natural Sound

Figure 1: Typical frequency response effects of reflections and air loss over distance. This is true for natural sources and speakers. Speaker systems can be equalized to mimic a particular source/listener distance.

One of the most enduring goals in sound system performance is to create a “natural” sound. In the case of sound reinforcement systems, this can be achieved with one simple step: mute the speakers. Within milliseconds we will be reminded that we don’t actually want natural sound. The voice coming from the stage sounds distant and hard to understand. We are suddenly aware that everyone around us is coughing. Without the support of a sound system, the human speakers must talk in an unnatural way in hopes of projecting to the rear. There is nothing natural about actors whispering loudly, but it beats not hearing the lines.

The goal of a sound reinforcement or cinema video playback system is very rarely to sound natural. It is, instead, to sound bigger, closer, and easier to understand than natural. In short, our business is creating magic sound, not natural sound. Our challenge is to maintain the illusion of natural sound so people are not distracted by our manipulations. To succeed, we must not let anything slip that alerts the listener to the man behind the curtain. The ear’s ability to detect even very slight changes in the audio stream is wired into our brains and those of every motile species on Earth. Ask a deer hunter how quickly prey can locate you when you make a sound.

This article will discuss some of the key issues involved in the creation of magic “natural” sound and how to avoid having our tricks discovered.

Figure 2: Localization mechanism in the horizontal plane

Transmitting Natural Sound

Let’s first spend a moment examining the behavior of natural sound to better know what we are working toward. Natural sound emanates from a source whose placement is definable, such as a voice or a musical instrument. The sound then propagates from this source in a very predictable way that affects its loudness, frequency response, and timing. The point of origin is important because our visual link to the source provides the listener with expectations of what is natural for the given distance. When we see someone speaking at 30 meters, we don’t expect it to sound as if they are right next to us. We know this difference intuitively, but we may not have thought through the acoustical physics in play here.

The inverse-square law, which states that the sound level will drop 6dB for every doubling of distance, governs the level component. This law is broken more often than a highway speed limit. In fact, unless you spend a lot of time skydiving or in anechoic chambers, you have probably never heard this law fully adhered to. Why? Because any reflected energy is added to the direct sound like a sonic recycling program and reused. This causes the level reduction to be less than the 6dB called for by law but not evenly over frequency. The reflections favor the low-frequency range over the highs because at low frequencies the sources are more omnidirectional and the surfaces are more reflective. The opposite is true for the HF range. Even the direct sound has reductions greater than the inverse-square law predicts because air is a lossy medium in the very high frequency (VHF) range. As direct and/or reflected path lengths get longer, there is more and more VHF range loss compared to all other ranges.
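To put numbers on the level component, here is a minimal Python sketch of the inverse-square law on its own. It assumes an ideal point source in a free field, with no reflections and no air loss, which is exactly the condition we almost never get to hear. The function name is hypothetical, chosen only for illustration.

```python
import math


def inverse_square_drop_db(near_m: float, far_m: float) -> float:
    """Free-field level drop in dB when moving from near_m to far_m.

    Assumption: ideal point source, no reflections, no air absorption.
    """
    if near_m <= 0 or far_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far_m / near_m)


# Each doubling of distance costs roughly another 6 dB.
for far_m in (2, 4, 8, 16, 32):
    drop = inverse_square_drop_db(1, far_m)
    print(f"1 m -> {far_m:>2} m: {drop:5.1f} dB down")
```

In a real room the measured drop would be smaller than these figures in the lows, where reflections add back in, and larger in the very highs, where air loss takes its toll.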

The low-end rise will even occur outdoors, where at the very least we have a reflecting ground plane. The tilt becomes more dramatic for indoor spaces. The longer the distance and the bigger and more reverberant the space, the more we can expect to hear the frequency response tilt the lows up and the highs down compared to the midrange. Does this mean a large-volume room such as an opera house will make a voice on stage sound as if the highs are rolled off and the bass boosted in the back row? Absolutely yes, when compared with standing a few feet in front of the singer. They are both natural sound.

Receiving Natural Sound

Our aural localization system can identify the origin of a sound source in both the horizontal and vertical planes. The mechanisms, however, are totally different. The horizontal system is a dual-channel comparator that monitors relative level and arrival time between our two ears (hence the term binaural hearing). For example, a single sound source ahead and to our left will be both louder and arrive earlier at our left ear. The two horizontal systems validate each other in the brain, and the source is conclusively localized. With a single natural source, it is difficult to conjure up a scenario where our brain’s two horizontal systems are in conflict. At the very least this would require a reflection, which is, in effect, a second source. As we will see shortly, we can create localization conflicts with multiple speakers. Our vertical plane localization is two independent single channels. Each ear maps out the vertical plane separately by the learned reflection signature of our outer ear structure, the pinna. Sound coming from above us is reflected differently into the ear canal than sound below and so on. This mechanism is far less sensitive than the horizontal system. When multiple arrivals occur, such as direct sound and a reflection, the conflict is resolved primarily by loudness, with arrival time being a much smaller factor than in the horizontal plane.

Our aural system coexists with our visual sense, and the brain expects the results to match. The range and location seen by our eyes should match our ears. The audio range finder is the amount of tilt in the frequency response, and the audio localization is done by the mechanisms just discussed.

The last part is timing. Sound is slower than light, so natural sound transmission means that the sound will always be behind what we see. Interestingly, though, we are amazingly easy to dupe on this. We can easily be fooled and in fact, often get annoyed when we are not. I will explain with some examples. If we are 33 meters (110ft.) from a person speaking on stage, we should expect to notice the sound being approximately 100 milliseconds behind what we see. That is natural. But if there was video projection where we could see moving lips, we would complain that the “sound system” (i.e. natural sound) is out of sync. I watch fireworks and I am surprised over and over that sound and light are out of sync. I am firmly convinced no one would notice if the sound was magically accelerated to the speed of light for these events. Movies prove it. Such magic acoustics are found in the explosions in Hollywood films, where no matter the distance (and even in outer space), we are always in time. So when it comes to synching to video, we need not mimic natural sound behavior but rather improve it in order to be perceived as natural.
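The delay figure above is simple arithmetic: distance divided by the speed of sound. A minimal sketch follows, assuming a speed of sound of 343 m/s (dry air at about 20 degrees C). That value is an assumption, not something stated in the text, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def acoustic_delay_ms(distance_m: float,
                      speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Time in milliseconds for sound to travel distance_m."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return 1000.0 * distance_m / speed_m_per_s


# The 33 meter example from the text.
print(f"{acoustic_delay_ms(33.0):.1f} ms")
```

At 33 meters this comes to about 96 ms, in line with the "approximately 100 milliseconds" quoted above. Light covers the same distance in a time we can treat as zero.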

How Speakers Relate to Natural Sound

Now that we know how natural sound is transmitted and received, we can look at how we can push the envelope and make magic sound without detection. Let’s first look at the things to avoid. Visible speakers would be a start, but they’re not actually the most important. A set of black boxes can disappear from the mind very quickly if we cannot detect their usage. But even the most cleverly hidden speaker will give up its position if it breaks one of the big rules.

First, there can’t be distortion, hum, noise, rattles, or buzzes. No matter how loudly a human shouts, they don’t go into clipping or feedback. The sound system must have enough headroom to be free from these risks. On the other side, the noise floor must be under control so there is no audible line-frequency hum, hiss, or digital quantization noise.

A less obvious matter is the frequency range of the speaker system. There is nothing in natural acoustics that limits the frequency range of sound transmission. The limits are our ears and our speaker systems. To mimic natural sound requires a full-range sound system, not one that covers just the vocal range. Systems with truncated frequency extremes are easily detected by the absence of the full spectrum found in natural sound. Conversely, a system that exaggerates either the LF or HF regions will also be very easily detected. Equalization will be covered shortly.

Maintaining a plausible link between the visual placement of the original sound source and its amplified copy is critical to the illusion of natural sound. The location of the speaker(s) will be the most critical parameter. The ideal speaker locations would image well to the performers and have enough acoustic gain to be stable. How about chest-mounted, battery-powered speakers for all performers?

There is no way to win the localization game without good locations, and yes, the plural is intentional. Since we can’t place speakers on the performers, we are left with two options: go up or go out. We will need to do both. Because localization errors can be detected in either the horizontal or vertical plane, we will need positions that keep the image plausible in both planes for a wide variety of seats. A solo center cluster can be effective for a balcony, providing a centered horizontal image and, unless it is too high, an acceptable vertical image. By contrast, this location would create extreme vertical image distortion at the floor level. Now let’s go the other way: left/right systems alone. These can be placed low on the sidewalls, which gives us a realistically low vertical image. Unfortunately, this placement will pull the image for off-center seats to one side or the other in the horizontal plane.

Therefore a triangular configuration (at least) is required to improve our chances of realistic localization. The center helps to mitigate our sensitive horizontal localization mechanism. The level and timing relationship between the three sound systems and the fourth source (the original) will determine the sound image placement for each different seat in the hall. The geometry makes it absolutely impossible to develop a perfect timing and level scheme that will guide all seats to the original image source. This is a very complicated subject on its own, but suffice it to say that, although timing relationships get the most press on this matter, level is the dominant parameter.

First we must remember that level is half the game in the horizontal plane and practically all of it in the vertical. Level-based localization will hold up over much larger proportions of a hall, especially in large spaces. Level relationships between sources are ratios, and therefore they scale well. Timing relationships are not ratios; they are differences. And the trouble is we have only about 7 milliseconds of difference to work with in the horizontal plane before the game is over. That is about 2.4 meters of path length difference.
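The ratio-versus-difference point can be seen by scaling a seat's geometry up. The sketch below is illustrative only: it assumes two equally driven speakers that each follow the free-field inverse-square law, a speed of sound of 343 m/s, and a hypothetical seat that is 5 m from one speaker and 6 m from the other. All function names and distances are invented for the example.

```python
import math

SPEED_OF_SOUND_M_PER_S = 343.0  # assumption: dry air at about 20 degrees C


def level_difference_db(near_m: float, far_m: float) -> float:
    """Level offset between two equal sources; depends only on the distance ratio."""
    return 20.0 * math.log10(far_m / near_m)


def arrival_difference_ms(near_m: float, far_m: float) -> float:
    """Arrival-time offset; depends on the distance difference, not the ratio."""
    return 1000.0 * (far_m - near_m) / SPEED_OF_SOUND_M_PER_S


def path_difference_m(time_ms: float) -> float:
    """Path length difference that corresponds to a given time offset."""
    return SPEED_OF_SOUND_M_PER_S * time_ms / 1000.0


# Same seat proportions in a room 1x, 2x and 4x the size.
for scale in (1, 2, 4):
    near_m, far_m = 5.0 * scale, 6.0 * scale
    print(f"scale {scale}x: "
          f"level offset {level_difference_db(near_m, far_m):.2f} dB, "
          f"time offset {arrival_difference_ms(near_m, far_m):.1f} ms")

print(f"7 ms corresponds to {path_difference_m(7.0):.2f} m of path difference")
```

Under these assumptions the level offset stays put as the room grows, while the time offset grows with it and, at four times the size, lands well past the 7 millisecond window.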

The shape of the room will help guide the strategy here. Narrow rooms can do well with big left/right systems and a small centerfill to help keep the horizontal image inward. Wide rooms, and particularly fan-shaped rooms, will favor a big central system and smaller left/right systems to bring the image down. All configurations benefit from frontfills, which provide both vertical and horizontal help.

Equalizing for a Natural Sound

A human speaker is a point source with a typical coverage angle of about 60 degrees and, like a loudspeaker, is more directional in the highs than in the lows. That does not mean we need a 60-degree speaker system, but we do need to keep the system free from the frequency response distortions of multiple sources known as comb-filtering. We can create combing by cupping our hands around our mouth. We don’t want our speakers to sound like this, so we must minimize multi-sourcing between speakers and strong reflections. This is a giant subject of its own, big enough to write a whole book about (I did), but we will touch on it briefly here. The system must be designed with extreme care regarding the overlap between speakers. If speakers have lots of overlap, they must be kept physically close together. If they are far apart, then the overlap must be minimized. Otherwise a large proportion of our listening area will have large peaks and dips that cannot be solved effectively with an equalizer.
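Why overlap and spacing matter can be sketched with the textbook two-arrival model of comb filtering: one signal summed with a delayed, possibly quieter copy of itself. This is a simplified model under stated assumptions (two arrivals only, identical spectra, no room), not a description of any particular measurement method. The function names are hypothetical.

```python
import math


def comb_null_frequencies_hz(delay_ms: float, max_hz: float = 20000.0) -> list:
    """Cancellation frequencies for two arrivals offset by delay_ms.

    Nulls fall where the delay equals an odd number of half periods:
    f = (2k + 1) / (2 * delay), for k = 0, 1, 2, ...
    """
    if delay_ms <= 0:
        raise ValueError("delay must be positive")
    delay_s = delay_ms / 1000.0
    nulls = []
    k = 0
    while (2 * k + 1) / (2.0 * delay_s) <= max_hz:
        nulls.append((2 * k + 1) / (2.0 * delay_s))
        k += 1
    return nulls


def comb_ripple_db(level_offset_db: float) -> float:
    """Peak-to-dip ripple when the later arrival is level_offset_db quieter."""
    ratio = 10.0 ** (-abs(level_offset_db) / 20.0)  # linear amplitude ratio
    if ratio >= 1.0:
        return math.inf  # equal levels: the dips are complete cancellations
    peak_db = 20.0 * math.log10(1.0 + ratio)
    dip_db = 20.0 * math.log10(1.0 - ratio)
    return peak_db - dip_db


print([round(f) for f in comb_null_frequencies_hz(1.0)[:4]])
for offset_db in (0, 6, 12, 20):
    print(f"{offset_db:>2} dB offset -> ripple {comb_ripple_db(offset_db):.1f} dB")
```

In this model a small time offset pushes the first null up toward the top of the audible range, which is the argument for keeping overlapping speakers physically close. A large level offset shrinks the ripple, which is the argument for minimizing overlap when the speakers are far apart.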

One of equalization’s primary roles in maintaining natural sound is range setting. This is done by how flat we make the system. The frequency areas of interest are the upper and lower extremes: below 100Hz and above 4kHz. These relate to mimicking the natural effects of sound transmission over distance described above. If a system is razor flat from top to bottom, then we are sonically placing the listener so close that there is minimal LF addition from the room and HF reduction from the air. Of course we want to move the audience closer but not so close it becomes uncomfortable or grossly out of proportion. A system with lows too far reduced or highs boosted above flat is patently unnatural since you are placing the listener practically in the source. Therefore we let the low end gradually rise and the high end gradually fade until the sonic range of the system is adjusted to the scale appropriate to maintain a magnified natural sound.

Crossover Considerations

One of the characteristics of loudspeakers not found in natural sources is the acoustical crossover. Amazing story: I worked with a performer who was told for years that her voice had a crossover at 800Hz and that she should only use a particular speaker system with the same crossover. And we wonder why people are skeptical of sound engineers!

Our speaker systems sub-divide the spectrum to maximize power, frequency range, and control of coverage. Since audible response transitions are a clue to unnatural sound, we must take great care at the crossover. On one hand there are advantages to making sharp transitions to minimize interaction during the hand-off. On the other hand, sharp transitions are the most easily audible. Consider the movement from a front-loaded cone driver to a horn at 1kHz. The cone’s directional pattern may be very wide as it transitions into a narrow horn. This can make an easily detected transition as a source moves up the scale.

There is much more to this subject, but hopefully this will provide some insight into how to provide the most natural sound from your speaker system.