How Google Retooled Android With Help From Your Brain

A picture of the human voice, courtesy the AndroSpectro app. Photo: Ariel Zambelich/Wired

When Google built the latest version of its Android mobile operating system, the web giant made some big changes to the way the OS interprets your voice commands. It installed a voice recognition system based on what’s called a neural network, a computerized learning system loosely modeled on the human brain.

For many users, says Vincent Vanhoucke, a Google research scientist who helped steer the effort, the results were dramatic. “It kind of came as a surprise that we could do so much better by just changing the model,” he says.

Vanhoucke says that the voice error rate with the new version of Android, known as Jelly Bean, is about 25 percent lower than with previous versions of the software, and that this is making people more comfortable with voice commands. Today, he says, users tend to use more natural language when speaking to the phone. In other words, they act less like they’re talking to a robot. “It really is changing the way that people behave.”

It’s just one example of the way neural network algorithms are changing the way our technology works, and the way we use it. This field of study had cooled for many years, after spending the 1980s as one of the hottest areas of research, but now it’s back, with Microsoft and IBM joining Google in exploring some very real applications.

When you talk to Android’s voice recognition software, the spectrogram of what you’ve said is chopped up and sent to eight different computers housed in Google’s vast worldwide army of servers. It’s then processed using the neural network models built by Vanhoucke and his team. Google happens to be very good at breaking up big computing jobs like this and processing them very quickly. To figure out how to do it for speech, the company turned to Jeff Dean and his team of engineers, a group that’s better known for reinventing the way the modern data center works.
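How Google actually divides the work among its servers is not described here, but the basic idea of chopping a spectrogram into pieces for eight machines can be sketched in a few lines of Python. Everything in this example, including the function name `split_into_chunks` and the stand-in frame data, is a hypothetical illustration rather than Google's code.

```python
def split_into_chunks(frames, n_workers=8):
    """Split a sequence of spectrogram frames into n_workers contiguous chunks.

    Chunk sizes differ by at most one frame, so no worker gets much more
    to do than any other. (Assumption: the real system may split differently.)
    """
    base, extra = divmod(len(frames), n_workers)
    chunks = []
    start = 0
    for i in range(n_workers):
        # The first `extra` workers each take one additional frame.
        size = base + (1 if i < extra else 0)
        chunks.append(frames[start:start + size])
        start += size
    return chunks


# Stand-in for 20 spectrogram frames; real frames would be lists of numbers.
frames = list(range(20))
chunks = split_into_chunks(frames)
chunk_sizes = [len(chunk) for chunk in chunks]
```

Each chunk could then be handed to a separate machine, with the partial results stitched back together in the original order.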

Neural networks give researchers like Vanhoucke a way of analyzing lots and lots of patterns (in Jelly Bean’s case, spectrograms of the spoken word) and then predicting what a brand new pattern might represent. The metaphor springs from biology, where neurons in the body form networks with other cells that allow them to process signals in specialized ways. In the kind of neural network that Jelly Bean uses, Google might build up several models of how language works, one for English-language voice search requests, for example, by analyzing vast swaths of real-world data.
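To make the learn-then-predict idea concrete, here is a minimal sketch of a single artificial neuron, known as a perceptron, written in Python. It looks at a handful of labeled patterns, adjusts itself whenever it guesses wrong, and is then asked about patterns it has never seen. The training data, labels and function names are invented for illustration; Jelly Bean's models are far larger and are trained on real speech.

```python
def predict(weights, bias, pattern):
    """Return 1 if the weighted sum of the pattern is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, pattern)) + bias
    return 1 if score > 0 else 0


def train(examples, epochs=10):
    """Perceptron learning rule: nudge the weights after every wrong guess."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for pattern, label in examples:
            # error is 0 for a correct guess, otherwise +1 or -1.
            error = label - predict(weights, bias, pattern)
            weights = [w + error * x for w, x in zip(weights, pattern)]
            bias += error
    return weights, bias


# Made-up two-number "patterns" with made-up labels (1 or 0).
examples = [
    ((2.0, 1.0), 1),
    ((3.0, 2.0), 1),
    ((-1.0, -1.0), 0),
    ((-2.0, 0.0), 0),
]
weights, bias = train(examples)

# Patterns the neuron never saw during training.
new_guess_a = predict(weights, bias, (2.5, 1.5))
new_guess_b = predict(weights, bias, (-1.5, -0.5))
```

With this toy data the neuron labels the first new pattern 1 and the second 0, matching the group each one sits closest to.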

“People have believed for a long, long time — partly based on what you see in the brain — that to get a good perceptual system you use multiple layers of features,” says Geoffrey Hinton, a computer science professor at the University of Toronto. “But the question is how can you learn these efficiently.”

Android takes a picture of the voice command and Google processes it using its neural network model to figure out what’s being said.

Google’s software first tries to pick out the individual sounds of speech: the different types of vowels and consonants that make up words. That’s one layer of the neural network. Then it uses that information to build more sophisticated guesses. Each layer of these connections drives it closer to figuring out what’s being said.
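The way one layer's output becomes the next layer's input can be shown with a toy network in Python. This sketch has nothing to do with speech: the weights are set by hand, not learned, and all the names are hypothetical. The first layer answers two simple questions about a pair of on/off inputs (is at least one on? are both on?), and the second layer combines those answers to decide whether exactly one input is on, something a single layer of this kind cannot work out by itself.

```python
def step(z):
    """A neuron 'fires' (1) when its input is positive, otherwise stays at 0."""
    return 1 if z > 0 else 0


def layer(inputs, weights, biases):
    """Run one layer: every neuron takes a weighted sum of all the inputs."""
    return [
        step(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hand-picked (not learned) weights, purely for illustration.
HIDDEN_W = [[1, 1], [1, 1]]   # neuron 0: "at least one on"; neuron 1: "both on"
HIDDEN_B = [-0.5, -1.5]
OUTPUT_W = [[1, -1]]          # "at least one on" but not "both on"
OUTPUT_B = [-0.5]


def network(inputs):
    hidden = layer(inputs, HIDDEN_W, HIDDEN_B)    # first layer: simple features
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]   # second layer: combines them


results = {pair: network(pair) for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
```

In a real speech system the layers would be far wider, and the weights would come from training on recorded speech, not from a programmer's hand.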

Neural network algorithms can be used to analyze images too. “What you want to do is find little pieces of structure in the pixels, like for example like an edge in the image,” says Hinton. “You might have a layer of feature-detectors that detect things like little edges. And then once you’ve done that you have another layer of feature detectors that detect little combinations of edges like maybe corners. And once you’ve done that, you have another layer and so on.”
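Hinton's edges-then-corners description can be sketched in Python on a tiny black-and-white image. This is a hand-written illustration with hypothetical function names, not a trained network: real systems learn their feature detectors from data. The first "layer" marks places where brightness changes between neighboring pixels, and the second marks 2x2 blocks where a vertical edge and a horizontal edge meet.

```python
def vertical_edges(img):
    """Mark brightness changes between left-right neighbors (vertical edges)."""
    return [[abs(row[c + 1] - row[c]) for c in range(len(row) - 1)]
            for row in img]


def horizontal_edges(img):
    """Mark brightness changes between up-down neighbors (horizontal edges)."""
    return [[abs(img[r + 1][c] - img[r][c]) for c in range(len(img[0]))]
            for r in range(len(img) - 1)]


def corners(img):
    """Second layer: a 2x2 block is a corner if it holds both kinds of edge."""
    v = vertical_edges(img)
    h = horizontal_edges(img)
    found = []
    for r in range(len(img) - 1):
        for c in range(len(img[0]) - 1):
            has_vertical = v[r][c] or v[r + 1][c]
            has_horizontal = h[r][c] or h[r][c + 1]
            if has_vertical and has_horizontal:
                found.append((r, c))  # top-left pixel of the 2x2 block
    return found


# A 5x5 image: a bright square (1s) in the bottom-right on a dark background.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
corner_blocks = corners(image)
```

Here the only block flagged is the one whose top-left pixel sits at row 1, column 1, which is where the square's top edge and left edge meet.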

Neural networks promised to do something like this back in the 1980s, but getting things to actually work at the multiple levels of analysis that Hinton describes was difficult.

But in 2006, there were two big changes. First, Hinton and his team figured out a better way to map out deep neural networks, meaning networks that make many different layers of connections. Second, low-cost graphics processing units came along, giving academics a much cheaper and faster way to do the billions of calculations they needed. “It made a huge difference because it suddenly made things go 30 times as fast,” says Hinton.

Google’s Jeff Dean did the computing work to spruce up Android’s voice recognition. Photo: Ariel Zambelich/Wired

Today, neural network algorithms are starting to creep into voice recognition and imaging software, but Hinton sees them being used anywhere someone needs to make a prediction. In November, a University of Toronto team used neural networks to predict how drug molecules might behave in the real world.

Jeff Dean says that Google is now using neural network algorithms in a variety of products — some experimental, some not — but nothing is as far along as the Jelly Bean speech recognition software. “There are obvious tie-ins for image search,” he says. “You’d like to be able to use the pixels of the image and then identify what object that is.” Google Street View could use neural network algorithms to tell the difference between different kinds of objects it photographs — a house and a license plate, for example.

And lest you think this may not matter to regular people, take note. Last year Google researchers, including Dean, built a neural network program that taught itself to identify cats on YouTube.

Microsoft and IBM are studying neural networks too. In October, Microsoft Chief Research Officer Rick Rashid showed a live demonstration of Microsoft’s neural network-based voice processing software in Tianjin, China. In the demo, Rashid spoke in English and paused after each phrase. To the audience’s delight, Microsoft’s software simultaneously translated what he was saying and then spoke it back to the audience in Chinese. The software even adjusted its intonation to make itself sound like Rashid’s voice.

“There’s much work to be done in this area,” he said. “But this technology is very promising, and we hope in a few years that we’ll be able to break down the language barriers between people. Personally, I think this is going to lead to a better world.”