What you say can hurt you

by Doug Hanson On Oct 1, 2007

"Can you hear me now?" asks the popular Verizon commercial. In audio analysis, the answer may well have been "Yes." But that didn't mean the evidence could be used. Limitations in first-generation audio analysis techniques and tools often prevented forensic analysts from making sense of what they heard.

Advances in digital sound recording technology and audio analysis software have changed this. The restriction on audio forensic analysis traditionally has been the ability of software programs to manipulate sound data. Today's sophisticated software packages are a boon to forensic analysts because they permit analysis, manipulation and interpretation of audiotapes to recover information that in the past may have been lost. Now, complex analysis can even take place on older equipment, which has always been able to collect sound data - the limitation has typically been in the software.

Today's police departments and prosecutors understand valuable evidence often rests on an answering machine recording, cell phone message, 911 call tapes, etc. Most state crime laboratories and many private companies now perform audio forensic investigation. As the use of this evidence increases, "mum's the word" may take on new meaning among the criminal element, who may soon find what they say is as incriminating as what they do.

The value of audio evidence

Audio evidence was once overlooked because law enforcement lacked convenient means of analysis. This changed as federal funds became available to enhance forensic laboratories' ability to analyze both audio and visual evidence. In fact, the Department of Homeland Security has allocated funds for small- to mid-sized departments to add audio forensics capabilities.

A number of companies produce forensic audio analysis software packages. These vary in the types and depth of analysis they perform as well as in price, which can range from $400 to $1,500. Among the companies marketing this software are: Enhanced Audio Inc. of York, Pennsylvania; Cedar Audio of Portland, Maine; Clarifying Technologies of Raleigh, North Carolina; and Sound Forge, a division of Sony Corporation. (See "Establishing the audio/video forensic lab" on Page 140.)

But it's no panacea. It's true that advances in electronic communications and security technology have led to an increase in the number and types of audio recording devices (cell phones, video cameras, computer voicemail, cassette recorders and audio recording security cameras, to name a few). However, in many cases the recording quality of the device itself remains relatively poor. While a recording may sound good to the average listener, the level of background noise, the device's ability to reliably reproduce speech frequencies without distortion and the audio frequency spectrum it can record are all factors in obtaining an audio recording of sufficient quality for evidentiary purposes.

Don't spoil the evidence

Handling audio recordings and tapes is also a critically important factor in developing defensible audio evidence. As with all crime scene evidence, strict chain-of-custody rules and procedures must be adhered to. When a piece of audio evidence is found, it should be immediately placed in an evidence bag or appropriate container and shipped to the audio lab. Evidence should not be lugged around in a coat pocket for hours or left on the front seat of a closed squad car in 95-degree temperatures. This situation not only leads to evidence degradation but also can provide a point for the defense to question the data. Audio evidence also should not be copied to low-quality recording equipment. This evidence should only be reproduced and analyzed on certified equipment designed for audio forensic use.

There are many other ways audio evidence can be spoiled. Consider the following scenario: An officer is working undercover to crack a major drug ring in the city. For months he has worked his way through a number of small-time sellers and has finally gained enough credibility to be presented to the top guys for a major buy. His plan is simply to make the buy and record the event. He has studied enough to know that a digital recording cassette machine will give him much better sound than an old-style micro-cassette recording device. His department has acquired the best digital device on the market specifically for this score. He has studied the digital recorder's manual and knows that it can be set to record for 2, 4 or 8 hours, but that the longer the recording time, the lower the audio quality. Because of compression, the longer the recording time, the more audio information must be squeezed onto the tape. He sets his machine for 2 hours since he is sure the meeting will not last more than 10 to 15 minutes.

He is now ready to make the buy and obtain the vital evidence. However, even with all of his preparation, he has made a basic mistake. The digital recorder is inside the lining of his sports coat. He has his cell phone turned on and clipped to his belt near the recorder. The buy goes down exactly as planned, and as the officer makes his way back to the station, he feels pretty good that his efforts are paying off.

As the audio technician begins to analyze the tape, it becomes evident there is a great deal of background noise occurring at regular intervals. The impulse signals produced from the apparently silent cell phone come through the recording at a much greater level than the audio and sound like small arms fire on the tape. Cell phones and other digital devices must always be kept far away from digital recording devices. This tape, despite the officer's careful preparation, will be difficult, if not impossible, for a jury to hear and clearly understand.

Years ago, this would have been the end of the story. But with today's forensic audio analysis software, such evidence can be salvaged. The technician simply applies a set of adaptive filters to the tape. These so-called "smart filters" can effectively remove a variety of background noises and are particularly good at clearing out impulse signals from cell phones. The end result is an audio tape that clearly identifies what is happening and who is involved. The cleaned-up tape can be easily heard and understood by a jury. Even so, problems like this can be avoided if an audio technician is consulted as officers develop operational plans for a case like this.
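To give a rough sense of what clearing out impulse noise involves, the sketch below replaces isolated spikes in a sampled signal with the local median. It is a simple median-based despiker, not the adaptive filtering that commercial forensic packages apply. The function name `suppress_impulses`, the window size and the threshold are all assumptions chosen for illustration.

```python
import math
from statistics import median


def suppress_impulses(samples, window=5, threshold=4.0):
    """Replace samples that deviate sharply from their neighborhood median.

    Hypothetical helper for illustration only: a median-based despiker,
    not the adaptive filter a forensic package would apply.
    """
    half = window // 2
    cleaned = list(samples)
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        neighborhood = samples[lo:hi]
        local_median = median(neighborhood)
        # Median absolute deviation: a spread estimate that a lone spike barely moves.
        mad = median(abs(x - local_median) for x in neighborhood)
        if abs(samples[i] - local_median) > threshold * (mad + 1e-9):
            cleaned[i] = local_median
    return cleaned


# Made-up test signal: a slow sine wave with one loud click added at sample 30.
clean = [math.sin(2 * math.pi * i / 50) for i in range(100)]
noisy = list(clean)
noisy[30] += 5.0
cleaned = suppress_impulses(noisy)
```

A fixed threshold like this works only when the spikes are far louder than the speech. Real interference comes in bursts and overlaps the voice, which is one reason the filter settings call for an experienced technician.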

Interpreting the data

A problem also arises with advanced methods of audio analysis in the interpretation of the analyzed data. When any evidence is presented in court, prosecutors expect it to be challenged by the defense. A major challenge is always to the credibility of the laboratory and the technician who performed the analysis.

In all areas of forensic analysis, laboratory and technician credibility is important. This is particularly true in the audio area, where improper manipulation of analytical software can easily lead to a wrong conclusion. Any size department can buy the software and recording equipment to set up an audio lab. However, this lab will only be as good as the technician running it. With other software, users can read books or pick up the basics through trial and error, but in audio analysis, running the proper software and filters is just a small part of the equation. More important is data interpretation, and this only comes from hands-on experience analyzing and interpreting audio data. A well-trained technician is essential, especially in the area of speaker identification. Identifying a specific voice, isolated from a tape with several voices, and then comparing it to tapes of suspect voices requires a great deal of experience. Only a certified speaker identification expert should evaluate this type of data.

Many forensic audio experts say audio analysis is part science, part art and a large part gut feeling that only comes from experience. A feeling for the right filter or parameter to apply to a tape to bring out a needed layer of sound comes with time. This experience can greatly decrease the time needed to obtain necessary evidence from a tape. Consider the following:

A call comes in to a 911 operator from a distraught, frantic woman. She is screaming and hard to understand. As the operator calms her down, she realizes the woman is being threatened with violence and may already have been beaten. The operator immediately dispatches a patrol car to the address. When officers arrive, they find the caller alone, very agitated, and insisting the call was "a mistake" and nothing was wrong. On the way to the scene, the officers learned there was a restraining order against the woman's ex-husband. They pay a visit to the ex-husband on the job. He claims to have been there all night, and his supervisor backs up his story. Officers then ask a forensic lab manager to listen to the 911 tape, since they believe it might provide a clue as to what actually happened at the woman's apartment.

The forensic unit analyzes the tape by applying the latest computer analysis software; the voices of both the 911 operator and the caller can be clearly heard. The analysis also reveals a barely detectable third voice.

The audio technician uses the software's leveling module to normalize the recording, bringing all three voices to a common signal level. It is now clear the voice in the background is threatening the woman with bodily harm. But who is talking in the background? The officer learns the woman has a boyfriend. Conversations with other residents also reveal a male tenant has been repeatedly trying to gain her affections.
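The leveling step can be pictured as scaling each voice segment to a common loudness. The sketch below does this with root-mean-square (RMS) normalization. The function names, the target level and the assumption that each voice has already been isolated into its own segment are illustrative; they do not describe any particular product's leveling module.

```python
import math


def rms(samples):
    """Root-mean-square level of a non-empty sequence of audio samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def level_segment(samples, target_rms=0.1):
    """Scale a segment so its RMS level matches target_rms.

    Hypothetical helper: assumes the segment holds a single voice.
    """
    current = rms(samples)
    if current == 0:
        return list(samples)  # a silent segment has nothing to scale
    gain = target_rms / current
    return [x * gain for x in samples]


# Made-up segments standing in for a loud caller and a faint background voice.
caller = [0.5 * math.sin(2 * math.pi * i / 40) for i in range(400)]
background = [0.01 * math.sin(2 * math.pi * i / 25) for i in range(400)]
leveled = [level_segment(seg) for seg in (caller, background)]
```

Raising the gain on a faint voice raises the noise recorded with it by the same factor, so leveling is normally combined with filtering.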

Both men are brought in for interrogation. Their answers to specific questions are collected by a digital tape recorder similar in quality to the 911 devices. The questions are structured to elicit certain words and phrases found on the 911 tape. The forensic audio expert then compares each man's response to the original tape. In a process called "speaker identification," the technician compares bits from the interrogation tapes to corresponding bits on the original tape. An energy distribution pattern of each voice over the vocal frequency range is developed and analyzed. Each man's energy distribution pattern is unique to the individual in a manner similar to the uniqueness of a fingerprint.
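To make "energy distribution over the vocal frequency range" concrete, the sketch below measures what share of a short clip's spectral energy falls into each of a few frequency bands, then compares two such profiles. It is a toy illustration under stated assumptions, not a speaker identification method. The band edges, the function names and the use of cosine similarity as the comparison are all assumptions, and real examinations rely on far richer features and on expert judgment.

```python
import cmath
import math


def band_energy_profile(samples, sample_rate, band_edges):
    """Fraction of spectral energy in each band defined by band_edges (Hz).

    Uses a naive discrete Fourier transform, so keep the input short.
    """
    n = len(samples)
    energies = [0.0] * (len(band_edges) - 1)
    for k in range(1, n // 2):  # skip the DC bin; stop below Nyquist
        freq = k * sample_rate / n
        coeff = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        power = abs(coeff) ** 2
        for b in range(len(energies)):
            if band_edges[b] <= freq < band_edges[b + 1]:
                energies[b] += power
                break
    total = sum(energies)
    return [e / total for e in energies] if total else energies


def cosine_similarity(a, b):
    """Similarity of two profiles: 1.0 means identical shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Made-up clips: pure tones stand in for two very different voices.
RATE = 8000
BANDS = [0, 500, 1000, 2000, 4000]
clip_a = [math.sin(2 * math.pi * 250 * t / RATE) for t in range(256)]
clip_b = [math.sin(2 * math.pi * 1500 * t / RATE) for t in range(256)]
score = cosine_similarity(band_energy_profile(clip_a, RATE, BANDS),
                          band_energy_profile(clip_b, RATE, BANDS))
```

A score near 1.0 means the two clips spread their energy across the bands in much the same way. Whether a given score supports an identification is the interpretive question the article reserves for a trained expert.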

At this point the audio lab technician is stumped. He lacks the appropriate training to accurately interpret this type of pattern. An outside expert in speaker identification is called in. The expert studies the tapes and makes a number of adjustments to the software to highlight certain parts of the recording. After his analysis, he determines the voice on the tape is that of the other tenant and not the boyfriend.

The use of an outside expert is critical in this type of case unless the local analyst has many years of experience with voice/speech analysis. In addition to years of experience, an audio speech expert also should be a forensic audio/video examiner board-certified by the American Board of Recorded Evidence. On the witness stand, an expert's credentials will be a strong point with a jury and cannot easily be discredited by the defense. The expert also has had experience in presenting data to a jury in a clear and easily understood manner.

The future sounds clear for forensic audio analysis. New software applications continue to be developed to further enhance audio analysis techniques. Databases of background sounds and other useful audio information are being created. And a variety of training opportunities in forensic audio analysis have sprung up across the country.

DNA isn't the only forensic science making it tougher for criminals to escape the long arm of the law. Today's criminals need to watch what they say because law enforcement may be listening.

Establishing the audio/video forensic lab

The following factors should be considered when establishing or increasing the capability of a modern audio/video forensic laboratory:

Identifying potential employees

Requiring a lengthy apprenticeship or equivalent experience of personnel in certain audio and video analyses fields

Equipping the laboratory to play back and record in numerous analog and digital audio and video formats, and then providing the capability to improve voice intelligibility, compare voices, identify non-voice signals, authenticate recordings, enhance video images and conduct other related analyses