
Pushing the Limits of Google’s Speech Recognition

By David F. Gallagher June 29, 2009 6:01 pm

Last week I asked readers to pick up the phone and help me test a feature in Google’s new phone call management service that automatically transcribes voice mail messages. I encouraged creativity, and I got a lot. People called up and left all kinds of stuff for Google’s speech-recognition algorithms to puzzle over.

The aim here was not to see how well the service, called Google Voice, handled transcriptions of the day-to-day mundanities of voice mail. Google itself acknowledges that the feature, which it says is “the only fully automated voice-mail transcription on the market,” is still sort of experimental. The company is pushing the limits of technology with this, so the goal was to see how far they could be pushed before they broke. Not all that far, it turns out.

Here’s a sampling of what the service produced:

Transcript by Google: “here’s a message to test out google voice let us go there and you and i as the evening is spread out against this guy like a patient he’s arrive to be on the table steve cool kind identify the source of that quote to”
Notes: Google had trouble with “the sky” (“this guy”), but many Jimi Hendrix-loving humans have had the same problem. More interesting is that Google often has trouble processing its own name. In the second reference here it came out as “cool.” Note that Google Voice doesn’t even attempt to punctuate the transcripts, since the point is really to give you the general gist of the message.

Transcript: “we hope you straight to be self evident that all manner created equal apparent out by their creator with certain valuable right the money for like 430 in the pursuit of happiness this is securities rights concert instituted mobile among the and driving their just powers from the content of the covered that whatever any former governor because destructive of the and it is the right people to offer to abolish it thank you and see if you got on the 28th sunday schnitzels cripples organizing powers and such 4 at today i’m sheltie most likely to check to see if you have any”
Highlight: “laying its foundations on such principles” > “sunday schnitzels cripples”
Notes: The general gist of the message is totally lost here. Luckily for Google, nobody talks like this nowadays.

Transcript: “hi david this is carol and i just got a call from monty python hi here’s here’s for the said to meet this carriages no more in tennessee seems to be hey it’s expired and gone to meet it’s maker this is shirley parents it’s a staff the rest of the life your prius in case if you hadn’t mailed it to the purchase that would be pushing updates it’s run down the curtain enjoy inquire into school this is at net expire parrot now i think somebody’s up faulty or i we should investigate and get back to me on thanks bye”
Highlight: “This parrot is no more. It has ceased to be.” > “this carriages no more in tennessee seems to be”

Transcript: “please bring taking this lady to those did guaranteeing both in the way it but all means you’re the bar does i’m over at first upgrade see where the jebra walk my son the charge the light the closet catch you where the job job burden’s sean from E S and there’s match”
Notes: At least five people called up and read chunks of “Jabberwocky,” which of course is totally unfair to Google, but fun nonetheless. With two of these, Google Voice took a pass and failed to produce a transcript at all.

Transcript: “that i’d i did i thought we’d just google voice with an australian accent or even i know the extent from sydney no but brisbane or can brock where we can have can group call on a emu and typos plus lots of poisonous snakes in spot as especially trying to unnecessarily skid torres you can has mel gibson we apologize for russell crowe integrate we’ve done in the ray bridge that name and address of those women o’clock row goodbye bye”
Highlight: “platypus” > “typos”
Notes: A number of people tried to stump Google with their wacky accents. As you can see, the results were mixed.

Transcript: “hi i have nothing in whittier clever i’m sorry but perhaps we can see you grew voice can handle my southern accent thank you”
Notes: Not great, but that’s a pretty thick accent.

Transcript: “hopefully 79 you have a couple field kinda gets reviewed room what’s up coming at 7701 also shoot shaving she is about 3 o’clock monday cool my apple dealed citgo you”
Notes: “Beowulf” is of course another unfair test. The same caller left another message with an excerpt from “The Blind Men and the Elephant,” which is in modern English, and Google couldn’t come up with a transcript.

Transcript: “david this is mark graydale from louise open tacky testing google voice for you i’m actually testing to technologies because i’m using magic jack and my phone service for this call hope it works bye office 5 7 4 43”
Highlight: “Louisville, Kentucky” > “louise open tacky”

Transcript: “this is janet and a message must include the name of the caller but if you would smithsonian side wonder if offered to do any better and especially if the given name is relatively uncommon or if you’d given me into surname combined sound a little too much like a more common phrase for example of many of my classmates were offspring of immigrants we did not realize the insurance and unusual names for the kids danny too don wong gary chinchilla wellington to she would man flu you know certainly do these names lose something in the google voice translation”
Highlight: “humans misspell” > “you would smithsonian”
Notes: Janet, the answer is that Google Voice does miss a lot of surnames. It did just fine with some of the odder names here, probably because they are longer words that give it more to work with, and you left gaps between them.

Transcript: “i’m sorry i can’t come to the door right now i’m very hill and i’m afraid it in my weakened condition i could taste and nasty spelled down the stairs and subject myself so for the school absences you can reach my parents at the place of business thank you for stopping by i appreciate your concern for my well being it will be remembered long after this this is past”
Notes: This came out well, perhaps because the caller, doing his best Matthew Broderick impersonation, was speaking slowly.

Thanks to everyone who helped out with this. I’ll post a few more calls and transcripts on Tuesday. Update: Here’s Part 2.

I’ve been using Google Voice since the start (having been a Grand Central user) and find that in reality, most of the time I can get the gist of the message despite the glitches – and the advantage of being able to skim my “voice” mail quickly far outweighs the quirky transcriptions. It’s not great, but it’s great!

Not true: the startup Yap powers a number of other companies’ voicemail-to-text features (GotVoice, YouMail, etc.) with machine-based speech recognition as well. It was actually first to market with this capability (its R&D team previously worked at IBM and Nuance).

Voice is going to be bigger than you can think of in the future. Plastic credit cards will go, and so will passports and driver’s licenses; voice biometric ID will take over. You don’t need to type or spell with voice.

I’ve got a better idea: play these same clips to someone and have them provide a transcript. Then compare it against what Google came up with. To be honest with you, I would have struggled with most of these if I had just listened to them without any sort of background or context whatsoever (and Google never has any context). Heck, I doubt anyone not from the greater New York area would get “Ronkonkoma”.
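A comparison like the one this commenter proposes is commonly scored with word error rate (WER). WER counts the word substitutions, deletions and insertions needed to turn the reference (human) transcript into the machine one, then divides by the number of reference words. The sketch below is a minimal Python illustration of that standard metric, not anything Google has said it uses. The function name `word_error_rate` is hypothetical. The sketch assumes both transcripts are unpunctuated, space-separated words, like the ones above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: word-level Levenshtein distance between a
    human reference transcript and a machine transcript, divided by
    the number of words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # prev[j] holds the edit count to turn ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # reference word deleted
                curr[j - 1] + 1,     # extra word inserted
                prev[j - 1] + cost,  # substitution, or exact match
            )
        prev = curr
    # Guard against an empty reference to avoid dividing by zero
    return prev[len(hyp)] / max(len(ref), 1)


print(word_error_rate("this parrot is no more it has ceased to be",
                      "this carriages no more in tennessee seems to be"))
print(word_error_rate("louisville kentucky", "louise open tacky"))
```

On the Monty Python highlight this gives 0.5: four substituted words and one dropped word, out of ten. For “Louisville, Kentucky” it gives 1.5. Because insertions are counted, WER can exceed 1 when the machine transcript is longer than the reference.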

As a test of how far you can push the technology until it breaks, I think it is better to start easy and work your way up to hard, rather than to start out with ridiculous and then conclude you can’t push the technology very far. I realize this post is primarily about fun, and only secondarily about testing the software, but I would be more interested in more “real world” examples.

The other thing this test points out is just how awful telephone audio quality is. I’m always shocked when I hear telephone recordings. They’re terrible!

For getting the gist of a message sometimes it is good, sometimes it is not. In my experience (I have been a user since early on and used transcripts starting on day one of availability), it varies depending on accent and speed of speech. I have received messages from people in the southeast that were darn near unintelligible. Other accents are generally better, if they don’t talk too fast.

One feature of the transcription is that it detects phone numbers and reports them in the format “(555)-555-5555”. I’ve probably gotten 5-10 messages with phone numbers, and all of them were correct. No more having to listen to the message over and over to catch that fast sequence of digits at the very end.
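The commenter describes the feature’s behavior, not how it is implemented. As a rough illustration only, here is a hypothetical Python function; `format_phone_numbers` is an invented name, not part of any Google API. It assumes the recognizer has already written spoken digits as numerals, as in the transcripts above. It also assumes that only runs adding up to exactly ten digits (a North American number) should be rewritten in the “(555)-555-5555” style.

```python
import re

# Hypothetical sketch; Google has not published how Google Voice does this.
# Matches a run of digit groups separated by single spaces,
# e.g. "212 555 0123" or "2 1 2 5 5 5 0 1 2 3".
DIGIT_RUN = re.compile(r"\b\d+(?: \d+)*\b")


def format_phone_numbers(transcript: str) -> str:
    """Rewrite any digit run totaling exactly ten digits as (AAA)-EEE-NNNN."""

    def repl(match: re.Match) -> str:
        digits = match.group(0).replace(" ", "")
        if len(digits) != 10:
            # Leave times, street numbers and partial numbers untouched
            return match.group(0)
        return f"({digits[:3]})-{digits[3:6]}-{digits[6:]}"

    return DIGIT_RUN.sub(repl, transcript)


print(format_phone_numbers("call me back at 212 555 0123 thanks"))
```

The ten-digit rule is the simplest assumption that fits the comment. It leaves short numbers such as “3 o’clock” alone. It would miss a phone number that runs straight into another number, so a real system would need more context than this.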

Although I have not tested Google Voice, I can say that it is not the only fully automated voice-mail transcription on the market. There are others that were around before Google Voice.

One of them is //www.vlingo.com. I should note that I’m not affiliated with this company; I know of it from a forum I attended in Boston.

One point I recall from it was that the technology is still not able to fully automate complex vocabulary, but it is approaching a level where simple command-and-control voice input can be successfully recognized. Therefore we should expect good performance in those basic everyday applications.

There’s a website that is now trying to get all these ridiculous Google Voice transcriptions cataloged for your pleasure. Check out //gvscrewups.blogspot.com. You can even submit your best GV messages that have been butchered beyond belief.


About

Gadgetwise is a blog about everything related to buying and using tech products. From figuring out which gadget to buy and how to get the best deal on it to configuring it once it’s out of the box, Gadgetwise offers a mix of information, analysis and opinion to help you get the most out of your personal tech.