Voice recognition in Google Now is getting better and can now handle up to seven languages at once.
CNET

MOUNTAIN VIEW, Calif. -- OK, Google. Why can't I have chocolate?

That's one question that Google Now's voice search can't answer easily, and it's an often-asked one from kids with access to their parents' phones, said Google Search team Vice President Tamar Yehoshua.

Google Now, Google's search-and-knowledge personal assistant, currently can respond to queries in around 52 spoken languages. The service will soon gain the ability to switch between up to seven languages on the fly, like a proper multilingual robot. Google originally told CNET that the feature would begin updating on Wednesday, June 25, but Google has since said that the feature will be delayed until later in the summer with no confirmed release date.

You'll have to preselect your secondary languages, but once you do, Google Now will switch among them on its own. When the feature does launch, simultaneous multiple-language support is expected to roll out to all Google Now users.

Google researchers told CNET that seemingly simple language-recognition tasks are much harder than they appear. Yehoshua said during a recent lunchtime conversation at Google's Building 43 here that she's looked into how many people are aware that they can search Google by asking their phones.

"Fifty percent of smartphone and tablet users in the US are aware of voice search, and one-third of those use it," Yehoshua said. But she added that most people don't realize how natural conversational queries have gotten with Google Now.

Around 130 million people in the US have used Google's voice recognition for search within the three years that the feature has been available, according to numbers from the Pew Research Internet Project. Searching Google by voice is available on Windows, Mac, Linux, and Chrome OS desktops in Google Chrome, and in apps for Android, iOS, and Windows 8.

"Most people will use it for things like checking the weather," Yehoshua said. "They don't know that you can ask, 'Do I need an umbrella today?'"

Johann Schalkwyk, a lead staff software engineer on Google's voice-recognition team, discussed some of the myriad problems that Google is working to solve.

"In order for this digital assistant to be part of your everyday life, it just has to work," he said. The problem is that's not always the case. Ambient noise, such as from your car if you're using it while zipping down a freeway, is one problem. Another is accents and unusual speaking patterns, such as those from children.

Google Now is about a year or two away from beginning to be able to recognize kids' speech, he said, an impressive prediction given the problems. Legally, there are issues with the retention of data from children, as covered in the US by the Children's Online Privacy Protection Act. But there are technological problems as well.

"Speech and input modalities are very difficult" for the technology to recognize from children starting as young as 3 to around 10 years old, he said. "They're learning to enunciate better; they don't always speak grammatically; they yell at the phone; they hyper-enunciate -- 'DIE-no-saur.'"

Despite the problems, Schalkwyk believes Google's progress in voice recognition will solve current woes sooner rather than later.

"It's going to be five years, maybe less, before my computer can recognize child speech as well as I can," Schalkwyk said.

Although Google just announced that the recognition technology has gotten good enough to understand Indian accents, that still leaves a virtual Tower of Babel of speech misheard and misunderstood, and leaves the people using the technology frustrated.

A third problem that Google has yet to solve is what Schalkwyk, a South African native, called a "far field environment." That's when the distance from the mouth of the person speaking to the microphone is too large for the technology to work well. Even the 6 to 9 feet between your couch and your TV can be too far for the tech to handle well.

Schalkwyk said that while Google is employing better and more microphones to capture a stronger audio signal, his division relies more heavily on research into "deep neural networks."

"Recurrency, the input of one neuron that goes back and feeds upon itself, models dynamic signals in speech very well," he said. Basically, language modeling copies how the human brain picks up audio, "leading to pretty dramatic breakthroughs. On top of that, if you just add a lot of data, that's very useful."

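Schalkwyk's description of recurrency can be illustrated with a toy example. The Python sketch below is not Google's model; it is a single hypothetical neuron with made-up weights (`w_in`, `w_rec`), meant only to show how feeding an output back into the next step lets a network carry information forward through a changing signal such as speech.

```python
import math


def run_recurrent_neuron(inputs, w_in=0.5, w_rec=0.9, bias=0.0):
    """Feed a sequence through one recurrent neuron; return its state at each step.

    The weights are arbitrary illustrative values, not taken from any real system.
    """
    state = 0.0
    states = []
    for x in inputs:
        # The neuron's previous output (state) is fed back in with the new input.
        state = math.tanh(w_in * x + w_rec * state + bias)
        states.append(state)
    return states


# One pulse of input followed by silence.
states = run_recurrent_neuron([1.0, 0.0, 0.0, 0.0])
```

In this example the state fades gradually after the pulse instead of dropping to zero at once, which is the sense in which the neuron "remembers" earlier input. Setting `w_rec` to 0 removes the feedback, and the memory with it.
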
Despite all the advanced scientific research that goes into Google telling you if you need an umbrella today, voice recognition still has a long way to go.

Schalkwyk confirmed what many Google users have already figured out: Google Now doesn't do well with names, especially those of places and restaurants. Some of that will be fixed as Google builds its knowledge graph, its database filled with facts about the real world and the connections between them.

There's also a problem with what can be charitably called the "dork factor." It's just not particularly cool to talk to your phone anymore, and even less so when a robotic voice answers you back.

Yehoshua said there are no plans to change the initiating phrase, "OK Google," or to soften the robotic timbre of the responses.

"There are going to be environments where voice is better, and there are environments where you want to be more polite," she said. She declined to offer guidance on what those were, although she did note that voice search usage is high in countries like Japan where typing is not as easy as it is in English.

As for what to tell your kids when they ask Google, rather than you, why they can't have chocolate: Perhaps Google Now should tell them to eat their veggies.

About the author

Senior writer Seth Rosenblatt covered Google and security for CNET News, with occasional forays into tech and pop culture. Formerly a CNET Reviews senior editor for software, he has written about nearly every category of software and app available.