Sound Searching

For the past four months, Merck & Co. has been watching its words. In September plaintiffs' attorneys requested that the pharmaceutical company preserve all of its Vioxx-related voicemails. The New Jersey Superior Court judge presiding over the case allowed the request despite Merck's objections.

"We argued that the request was unduly burdensome," says Ted Mayer, partner at Hughes Hubbard & Reed and outside counsel for Merck. "Still, the order went through so we are complying."

So far, Merck hasn't had to produce any audio recordings for the plaintiffs. But if it does, it could find itself in over its head.

That's because culling through audio files has historically been a time-consuming and expensive process.

"Previously the only alternative was to put headphones on contract attorneys and have them sit in a room and listen to the audio files," says Mary Mack, technology counsel at FIOS, an e-discovery service provider. "Then you'd have to send all responsive recordings out to a transcription service for a cost."

But technology is catching up to meet the needs of in-house counsel. Thanks to improved speech-recognition software, searching through audio files--or audio mining as it's commonly referred to--has never been easier or cheaper.

"Counsel can now index and search huge volumes of voice recordings very quickly," Mack says. "This allows them to reduce the volume of recordings to review and to produce them much faster."

Mandatory Mining

Discovery orders that include demands for audio files are still somewhat uncommon. But recent changes to court procedures and rules may increase the chances that a company will have to scan audio files for responsive content.

According to the amendments to the Federal Rules of Civil Procedure, which went into effect Dec. 1, all electronically stored information, including sound and visual recordings, is subject to discovery. The explicitness of the new rules eliminates any wiggle room corporate defendants may have had to avoid such requests in the past.

"Most in-house counsel are just starting to get their arms around how relatively simple forms of content, such as e-mails and spreadsheets, fit into e-discovery," says Barry Murphy, a senior analyst at Forrester Research, a technology research company. "Now that the courts have determined that any type of communication is discoverable, we'll probably start seeing more requests for audio recordings. In-house counsel need to establish a process to deal with that very quickly."

And that process has to be much less expensive than relying on a room full of attorneys and transcribers. A contract attorney will usually charge at least $100 an hour to listen to audio content--which would translate into a $50,000 bill for a standard 500-hour project.

Add the fee for a transcription of responsive files, and the price goes up at least another $10,000. But with powerful speech-recognition technology, these exorbitant fees may become a thing of the past.

Word Find

Consider the same project with speech-recognition software. No human listeners. No transcription services. The only price is that of the software, which on average costs about $30,000 for a 500-hour project using either of the major speech-recognition technologies aimed at the legal market.
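The comparison can be worked through in a few lines. This is only a back-of-the-envelope sketch using the figures quoted in this article (at least $100 an hour, at least $10,000 for transcription, about $30,000 for software); the function names are hypothetical, and real projects will vary.

```python
# Figures come from the article; names are illustrative assumptions.
HOURS = 500                  # "standard" project size cited above
ATTORNEY_RATE = 100          # dollars per hour, the stated minimum
TRANSCRIPTION_FEE = 10_000   # stated minimum for transcribing responsive files
SOFTWARE_COST = 30_000       # stated average for a 500-hour project


def manual_review_cost(hours, hourly_rate, transcription_fee):
    """Cost of human listening plus outside transcription."""
    return hours * hourly_rate + transcription_fee


def savings(manual_cost, software_cost):
    """Difference between the manual route and the software route."""
    return manual_cost - software_cost


manual = manual_review_cost(HOURS, ATTORNEY_RATE, TRANSCRIPTION_FEE)
print(manual)                          # 60000
print(savings(manual, SOFTWARE_COST))  # 30000
```

On these minimum figures, the software route costs roughly half as much as the manual one.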

The two major technologies are phoneme-based and speech-to-text--each of which employs different audio-mining techniques.

Speech-to-text software uses advanced speech-recognition technology to match spoken words to textual words stored in its built-in dictionary. The software then creates a searchable transcript of the audio file.

"Web sites such as YouTube.com allow you to search against the metadata of video clips to find a match, but you won't be able to go to the exact second where you really want to listen," says Robert Weideman, senior vice president of marketing and product strategy for Nuance, which manufactures a line of speech-recognition software known as Dragon. "Our product enables you to access a specific location within a recording using the time- stamped -transcript."

Yet speech-to-text has its drawbacks. It's not much faster than human listening. Also, because it relies on an internal dictionary, the software has a difficult time deciphering proper nouns, which reduces its accuracy. For example, if it comes across the spoken word "Microsoft," it may slot in the textual word "microwave."

As a result, some companies are using phoneme software instead. Rather than converting sound to text, it interprets the individual components of human speech to find keywords. For example, the word "litigation" phonetically looks like this--l-i-t-i-gei-sh-u-n. When searching for the word "litigation," the software breaks it down phonetically and scans the audio file phoneme-by-phoneme for a match.
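The phoneme-by-phoneme scan can be sketched as follows. This is a toy illustration only: it assumes the recording has already been reduced to a clean sequence of phonemes and looks for an exact run, whereas commercial products presumably score approximate matches against the audio itself. The pronunciation table, the sample data and the phoneme_search function are all hypothetical.

```python
# Hypothetical pronunciation table, using the informal spelling given above.
PRONUNCIATIONS = {
    "litigation": ["l", "i", "t", "i", "gei", "sh", "u", "n"],
}

# Hypothetical phoneme stream for a recording: (start time in seconds, phoneme).
audio_phonemes = [
    (0.0, "th"), (0.1, "u"),
    (0.2, "l"), (0.3, "i"), (0.4, "t"), (0.5, "i"),
    (0.6, "gei"), (0.7, "sh"), (0.8, "u"), (0.9, "n"),
    (1.0, "h"), (1.1, "o"), (1.2, "l"), (1.3, "d"),
]


def phoneme_search(stream, keyword):
    """Return start times where the keyword's phonemes occur as an exact run."""
    pattern = PRONUNCIATIONS[keyword.lower()]
    symbols = [phoneme for _, phoneme in stream]
    hits = []
    # Slide a window the length of the pattern across the phoneme stream.
    for i in range(len(symbols) - len(pattern) + 1):
        if symbols[i:i + len(pattern)] == pattern:
            hits.append(stream[i][0])
    return hits


matches = phoneme_search(audio_phonemes, "litigation")
print(matches)  # [0.2]
```

Because no transcript is produced, a proper noun missing from a dictionary is not replaced by a wrong word; the search either finds the sound pattern or it does not.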

"By not producing the text file, we are actually able to do an hour's worth of human listening in one minute," says David Fishel, senior director of business development and technology counsel for Nexidia, a phoneme-based speech-recognition software provider. "We also don't inject any errors, which is common when transcribing audio."

Increasing Accuracy

Transcription errors are why speech-to-text solutions tend to have lower accuracy rates than phoneme-based solutions. Although both technologies are fairly reliable, the speech-to-text method can reach 85 percent accuracy at best, and it gets there mainly when users populate the software's dictionary with additional terms. Phoneme-based technology can be as much as 95 percent accurate. However, these rates drop fairly quickly if the quality of the recordings is poor.

"Accuracy can be much lower if the recording is a telephone conversation with a bad connection, the people are distant from the microphone when the recording was made or if a person just isn't speaking clearly," Weideman says.

No matter which flavor of speech-recognition software a legal department decides to purchase, what's most important is that general counsel adjust their e-discovery plans and policies to include audio recordings.

"This is not something we see many companies proactively planning for," Murphy says. "But they need to because technically everything is discoverable, and it takes a very specialized tool to search audio files."