Siri Effect: Why Natural Voice Recognition Changes Everything

When Tim Cook stood in for Apple honcho Steve Jobs this fall, one day before Jobs’ death, and introduced the iPhone 4S, it soon became evident that the star of the show was not the phone itself, but Siri, the iPhone’s new speech-controlled personal assistant. Unlike traditional voice-recognition (VR) software, Siri doesn’t make you bark commands; it understands natural language—the language we use to navigate the world.

This giant step in the evolution of the way humans and machines communicate has the power to rock the world for both users and companies, experts believe. Even Microsoft, Apple’s arch-competitor in Redmond, Wash., gives the devil his due and praises what Apple has done to raise the profile of devices that have a natural user interface (NUI) like Siri.

Natural user interface

“Apple has done a great job capturing the public imagination,” Ilya Bukshteyn, senior director of sales and marketing with Microsoft’s Tellme group, told BusinessNewsDaily. “We’re really happy to see Apple help shift the experience and get over the barrier to using natural user interfaces.”

Apple is far from the only game in town. Microsoft has baked advanced voice- and gesture-recognition functionality into its Windows 7, Xbox Kinect and new Windows Phone 7.5 software. The company’s Tellme speech-recognition platform is being incorporated into products spanning automobiles, mobile devices, gaming and personal productivity technology.

Siri-like apps for Android such as Speaktoit Assistant are also helping to spread the NUI word. And software developers are waiting in the wings for Apple to open Siri’s application programming interface to the outside world so they can begin creating complementary apps.

“Talking personal assistants is a new generation of user interface,” said Ilya Gelfenbeyn, founder of Speaktoit. “In the past, we used command line and graphical interfaces and are now starting to explore opportunities to speak naturally to the computer. This is the most natural way for people to communicate.”

Natural language

Unlike earlier voice-recognition applications that users quickly came to hate for their obtuseness when faced with normal human speech, Siri and other new applications understand natural speech. The technology is an amalgam of speech-recognition algorithms with a hint of artificial intelligence (AI) and natural language processing blended in, all connected to apps on your device or to databases around the Web such as Bing, Google or Yelp, which have the information you’re looking for.

Tap the new iPhone and you enter a world of instant information and interaction. You ask Siri questions and she gives you answers. But you don’t have to remember keywords and use specific commands. You can use her to send messages, schedule meetings, place phone calls and fire up applications. And like a flesh-and-blood human assistant, she keeps getting smarter as she gets to know you better.

“When using Siri today, it feels magical about 65 percent of the time,” said Chris Ulm, CEO of Appy, a game developer for iPhone and iPad. “It really seems to understand natural speech, and when it does what you want—like turning on ‘American Idiot’ on your phone or setting a reminder for stopping by the market on the way home—it starts to feel like a genie in a bottle. As Siri is used, it will start to be patched and steadily improved so that magical feeling will be 95 percent of the time.”

User friendliness

Siri represents a quantum leap in user friendliness from the early VR apps, which users quickly abandoned.

“What sets Apple’s efforts apart from what has gone before is that I don’t have to learn commands,” said Michael Gartenberg, research director for Gartner, the IT research firm. “It uses natural language processing and has enough connection to external databases. It lives up to the demos and hype and makes life easier.”

Consumer acceptance of voice-recognition applications was a long time coming. Users were easily frustrated by cumbersome limitations. Because users quickly abandoned apps that didn’t deliver good results, those apps never generated the kind of usage data that developers needed to improve them.

“Speech recognition had a hard time getting traction with users,” Microsoft’s Bukshteyn said. “On day one it was as good as it would get. I had to train myself, and the failure rate was pretty high. Users would try it at most three times. The industry was stuck in a vicious cycle.”

Adoption was also stymied by a fact of human nature: People don’t like to be embarrassed in public. They avoided using speech recognition around others because they thought the stilted commands made them sound like robots in a B movie.

Processing moves to the cloud

Two things changed the acceptance of voice recognition and helped improve the quality of the apps and the user experience, said Bukshteyn. First, data processing moved to the cloud, which not only added processing horsepower and provided access to vast databases of information but also opened up the user data that scientists needed to improve applications. Second, VR was integrated with the entire user experience.

“Speech came out of the cold,” said Bukshteyn. “Speech was always a separate app. When speech became part of the natural user interaction, usage went up.”

Gartner’s research director agrees that integration is the key.

“It’s not the technology itself that’s so important,” said Gartenberg. “At the end of the day, it’s how the technology is integrated into the larger experience that matters.”

This new generation of speech-controlled devices is changing the way we interact with the world and how companies interact with consumers.

Changing the way we interact

“Voice interface simplifies the way people communicate with services and devices,” said Speaktoit’s Gelfenbeyn. “There is no need to search for a particular app or web service—just ask your personal assistant for what you want and get it done. Voice interfaces will also become a new important channel for companies communicating to consumers.”

These apps, said Appy’s Ulm, are making computers behave more like us instead of the other way around. In the future, he said, machine interaction will become more natural and human-centered and our devices will be able to understand body language, expressions, voice tone and complex emotional content.

“Siri is a good step in this direction in that it carves off a piece of ‘fuzzy’ interaction that humans do and builds the AI around a very functional role, that of an assistant with very expected behaviors—helping make appointments, reminders, turning things on or off and taking dictation,” he said.

Voice is only the first step, according to Microsoft. The company envisions an interactive landscape that involves touch and gesture recognition as well as voice input, said Bukshteyn. In fact, Microsoft is also beginning to bake that functionality into products such as the Xbox Kinect, where the user’s body functions as the onscreen controller.

“TV is probably the next frontier,” said Bukshteyn.

The consumer is also going to have to do his or her part to make the interactive vision a reality, he said. Remember, people had trouble learning how to use the mouse when it first was factored into the human-computer equation. He hopes the learning curve will be shorter this time because the experience is more natural.

But it will require training to get used to this new way of interacting.

“Consumers will require education on this kind of different paradigm,” said Gartenberg.

Commerce as a service

So will companies. Stefan Schmidt, vice president of product strategy at hybris, a U.K.-based company that develops e-commerce software, believes that the introduction of these new interactive interfaces will create a new paradigm for companies—commerce as a service. This means creating a commerce platform that is equipped to support all consumer touch points.

“Online, offline, it doesn’t matter,” Schmidt said. “They have to open up their systems so that other systems like Siri can access them.”

A personal assistant, after all, is only as good as the databases it is able to connect with.

The rise of the concept of commerce as a service will also cause retailers and companies to revisit the definition of the store, Schmidt said.

“The store is not dead and isn’t going away,” he said. “But retail has to redefine the role of the store.”

Future stores may serve more as showrooms that carry a limited stock of merchandise and allow consumers to touch and feel the company’s products than as full-line operations that stock all models, shapes and sizes, Schmidt said.

Great expectations

All of this is going to put pressure on manufacturers to deliver the kinds of devices and integrated experience that consumers want. They started the ball rolling; now they need to eat their dog food with gusto.

“Consumers are going to expect a different kind of experience,” said Gartenberg at Gartner. “Vendors are going to have to live up to that.”

The children may lead them, Microsoft believes.

“The next generation is going to grow up expecting to be able to interact naturally using speech, touch and gestures with every device they encounter in their lives,” Bukshteyn said.