Speech Tech Sounds Better All The Time

When you think speech technology, odds are you think of HAL from 2001, or perhaps, more generously, R2-D2 and C3PO from the Star Wars movies. And of course, that level of speech recognition is still a long way off. But what we have today might be good enough.



But as I walked around the SpeechTek 2010 conference last week, I was again reminded that speech recognition may not be at that level, but it's certainly good enough for a lot of applications today, and it's being readied for even more uses in the near future.

We often think of speech recognition in terms of applications like Dragon NaturallySpeaking, but the biggest use for speech recognition today seems to be in the call center, with interactive voice response (IVR) systems handling and routing simple queries, and analytics software helping companies search through their records looking for patterns. If you've ever heard "this call may be recorded for quality purposes," odds are some software with speech recognition is involved. And I know that most times I call my bank or credit card company, I go through an IVR system before I get to the right person. Indeed, the fact that the conference was held in conjunction with the CRM Evolution conference aimed at call centers is an indication that such applications still dominate the market.

All sorts of firms produce products that serve different parts of the market, with Nuance (which makes Dragon), Microsoft (through its TellMe brand), and Loquendo probably the best-known providers of the engines themselves.

Both Nuance and Microsoft have widened their offerings in recent years through acquisitions.

Microsoft's Grant Shirk, director of industry solutions, says the company is focusing on a "cloud-based platform" for speech, with its products used by companies such as Fidelity, UPS, and Avis.

But in the long run, Microsoft views speech as part of a "natural UI" that will combine speech, touch, and gesture recognition. Indeed, in a keynote speech at the conference, Microsoft Speech General Manager Zig Serafin talked about the transition to the "natural UI era."

For instance, Microsoft has integrated speech and gesture recognition into its Kinect for the Xbox, and speech and touch features into Windows Phone 7. Indeed, Shirk gave me a great demonstration of using speech on a Windows Phone 7 device. You just press and hold the center button on the bottom of the phone, and you can say things like "Start Outlook." You can go into Bing and say things like "Find Italian Restaurants near me." Or just say the name of an airline and flight, and get the status.

The TellMe technology is part of the speech recognition that is already embedded in Windows 7, but in many ways, this technology seems to be moving beyond such obvious dictation, Shirk said. For instance, he noted how a new Voice Mail Preview function could transcribe voice messages and put the text into your e-mail inbox, with the person leaving the message not even knowing it.

Improvements in the technology are based in part on getting more samples, and in building the usage of existing products so that developers can gather more data. The company is also interested in building in semantic intent and context, so that such software does a better job in understanding what you mean, building on tools like the Powerset engine Microsoft recently acquired. The goal is to "stop transcribing and start understanding."

Nuance may be best known for Dragon NaturallySpeaking -- still the best-selling dictation program, and one that keeps improving. But it also makes a wide variety of products aimed at other speech applications, which are often used in the telecommunications and financial services industries, including an on-demand version it recently acquired with BeVocal. The company says it has more than 4,000 deployments of customer care applications.

Laura Marino, senior director of product management, said the company is particularly looking at improving the grammar of a conversation, making software that is a "smart listener." She noted that such "adaptive grammar" makes it easier for the software to understand what someone means in a conversation.

Another area of research, she said, was "dialog strategies", making the applications more conversational: asking questions and responding to them. The company also talked about "natural languages" and is working on making the next generation of such applications closer to talking to a live agent. She noted how many people prefer going to an ATM rather than to a teller, so making such "self-service" applications work even better is a key focus.

Of course, the company is also looking to build on its wide variety of applications, with Dena Skrbina, senior director of solutions marketing, telling me that consumer speech applications were driving improvements in IVR speech, and vice versa. She said the use is broadening beyond customer care applications to full problem-solving solutions, including such things as outbound messages, in systems that notify customers of changes or alerts via SMS, e-mail, or voice calls.

For instance, the company offers a system for phone companies that combines speech with a visual display, so you can easily navigate your bill. One such system, Skrbina said, has sent more than 12 million alerts. The company counts MetroPCS and T-Mobile among its customers.

Personally, I've noticed a great deal of improvement in speech over the past few years, from dictation programs to mobile search to IVR solutions. And while no IVR system is as good as talking to a real live person, there are plenty of times when I'd rather get a quick answer over an IVR system than wait on hold for a live agent. Voice recognition has come a long way, but of course, it's still nowhere near as good as it looks in the movies.

Michael J. Miller's Forward Thinking Blog: forwardthinking.pcmag.com
Michael J. Miller is chief information officer at Ziff Brothers Investments, a private investment firm. From 1991 to 2005, Miller was editor-in-chief of PC Magazine, responsible for the editorial direction, quality and presentation of the world's largest computer publication.
Until late 2006, Miller was the Chief Content Officer for Ziff Davis Media, responsible for overseeing the editorial positions of Ziff Davis's magazines, websites, and events. As Editorial Director for Ziff Davis Publishing since 1997, Miller took an active role in...
