Google Voice Search, Deep Learning, and SHIELD

Google Voice Search, Android TV’s speech-recognition service, is one of the many ways the NVIDIA SHIELD streaming media player is ahead of its time.

Voice search may sound futuristic, but some estimates suggest that within the next five years more than half of all internet searches will be spoken. And on SHIELD, this intelligent search capability is already transforming the living room experience.

No one wants to tap away at a keyboard or wrestle with a mouse while relaxing on the living room couch. Selecting one letter at a time with a controller is just plain slow.

SHIELD, the flagship Android TV box, taps into Google's advanced speech recognition technology and deep learning to help you easily find the song, game, or video you're looking for, using only your voice.

It goes way beyond the basics you’d expect from a streaming media player connected to your TV, like searching for a movie title or artist name. For example, just press the voice button and say “best football movies,” “best WW2 movies,” or even “best rom-com movies,” and up pops a list of familiar favorites available to you with a click of your remote.

Google also searches across apps to find content. Search for "Transformers" and it'll tell you the movie is available on Google Play, CinemaNow, and Sling TV.

Deeper still, you can easily look up "Oscar nominated films of 2014," settle bets by asking "Who acted in Ocean's 11?" or browse an actor's filmography by asking to see "George Clooney films."

Moving from the virtual world into the physical, you can ask things like "When is the next Giants game?" and you'll quickly learn the outcome of their most recent game, who they played, and their upcoming schedule. Or ask "Where is Cal Poly?" to see which city your friend's new college is in.

This is where the real power kicks in. Follow-up searches build on context, so you can then ask "How long does it take to drive there?" and the answer is based on both your current location and your previous search. Amazing? Yes, and it's just getting started.

Ask something like "Who was the 42nd President of the US?" and then follow up with "Who was his Vice President?" This is starting to get fun, right?

Curious about the weather? You can ask for "Weather in Chicago" and see the forecast, but a more natural question might be "Will I need an umbrella?" If you've just asked about a city, you'll get an answer for that city. Ask the same question later, without that context, and you'll automatically get an answer for your home location.

Google Voice Search is very powerful. So is Voice Commands, which lets you reach content without even opening an app. Simply say "Play Rolling in the Deep," and Adele's music video pops up on YouTube and starts playing automatically. Ready to play an awesome Android game? Tell your TV to "Open Talos Principle" and the game launches.

The Deep Learning Science Behind Voice Search

So how does it all work? Here’s where it quickly goes from a fun consumer experience to a very complex computer science problem.

The words you say are transcribed and interpreted in their correct semantic context by extremely complex algorithms. The algorithms behind voice search are developed and continuously improved using deep learning, a branch of artificial intelligence (AI).

Deep learning focuses on building computer programs, typically multi-layer neural networks, that learn patterns from data and keep improving as they are exposed to more of it. For voice search, that data includes the thousands of hours of spoken queries that users send to Google's speech recognition servers every day.

Deep learning is also used by companies such as Facebook, Twitter, Yahoo, and Microsoft, which use GPUs (graphics processing units) to train their systems in areas such as speech recognition, image recognition, and video analytics. Not coincidentally, the GPU is the same fundamental technology at the heart of NVIDIA SHIELD.

So, tired of the tedious character-by-character text input on your cable or satellite remote? Ready to step into a world where a nearly endless spectrum of entertainment is only a voice search away?