GOOG-411 Puts Us Deep in the Matrix

Whoa, apparently GOOG-411 is putting us all into the Matrix, doing unpaid labor for Google. Fascinating.

The reason we really did it is because we need to build a great speech-to-text model … that we can use for all kinds of different things, including video search.

The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video, we can do it with high accuracy.

Interesting. I called to find a Chinese restaurant in Chinatown in NYC to see if it could understand Chinese names. Well, of course it couldn’t. I liked how, after I identified Chinese restaurants as the category and then gave an intersection, it gave me a restaurant that was reasonably nearby but by no means the closest (Grand Sichuan on Canal Street). The system pronounced it “Grand see-kwan.” (Note to file: learn pinyin for the category “Chinese restaurants,” given this huge market segment in the USA.) Then I twice demanded the information be sent as a text message. What I received was a completely different restaurant, China Grill, more than four miles away and over 200 Chinese restaurants away from my selected intersection. Since China Grill is one of the most prominent and expensive restaurants nominally in this category, I had to wonder….