Speech Recognition Finally Finding Its Voice in Mobile Technology

Voice-recognition technology has made significant progress of late, becoming a popular feature of smartphones and of automotive navigation and entertainment systems. A panel of Silicon Valley tech experts says it still has its glitches but can eventually improve to the point where talking to a machine is like talking to a person.

PALO ALTO, Calif. -- If speech-recognition technology were a human, it would be like a 5- or 6-year-old child. You can speak to a 1-year-old, but you have to speak slowly and simply, using small words. By 5 or 6, the child starts to better understand your words and, more importantly, your meaning.

The comparison of computer speech development to human speech development came up during a panel discussion Aug. 20 at a forum hosted by the Churchill Club of Silicon Valley in Palo Alto, Calif. Representatives of a speech-recognition software company and an automaker, along with Apple co-founder Steve Wozniak, discussed where speech recognition has been and where it's going.

Speech is becoming the new computer user interface, said Quentin Hardy, deputy technology editor of The New York Times and moderator of the panel, continuing a long line of UI evolution from the punch card and the command line interface to the mouse and the touch-screen.

With each advance, the interaction has become less machine-like and more human. When we want to get someone's attention, we tap them on the shoulder like we tap on a screen, said Wozniak, and when we want to talk to someone, we speak.

"We love our computers; we love our phones. We are getting that feeling we get from another person," he said.

Speech-recognition technology has evolved from the machine understanding voice commands to understanding meaning and context, said Ron Kaplan, senior director and distinguished scientist at Nuance. The company's voice-recognition technology has been licensed to Apple for use in the Siri personal assistant feature on the iPhone 4S and to Ford Motor Co. for its MyFordTouch system, which is also based on Microsoft Sync.

"One of the enabling technological advances that makes more accurate speech recognition possible and makes more accurate understanding of intent possible, is the ability to accumulate large amounts of data from lots of user experiences and to sift and organize and build models from it," Kaplan said.

In other words, like a child's, the technology's vocabulary and understanding grow the more it hears what people say to it.

Ford opened a lab in Silicon Valley at the beginning of the year, and the unit is organized like a startup, far from the bureaucracy at Ford headquarters in Dearborn, Mich. The lab is continually working to improve the accuracy of MyFordTouch, which was introduced in 2007. Drivers use voice commands to get directions, adjust the heating and air conditioning or change radio stations. Ford parked a 2012 Focus Electric sedan equipped with MyFordTouch in the hotel ballroom where the event was held.