Posted by ScuttleMonkey on Monday February 08, 2010 @05:12PM from the ford-why-is-this-fish-in-my-ear dept.

nikki4 writes to tell us that, building on major improvements to its existing voice recognition tool for smartphones, Google is aiming for new translator software that will provide instant translation of foreign languages. "The company has already created an automatic system for translating text on computers, which is being honed by scanning millions of multi-lingual websites and documents. So far it covers 52 languages, adding Haitian Creole last week. Google also has a voice recognition system that enables phone users to conduct web searches by speaking commands into their phones rather than typing them in. Now it is working on combining the two technologies to produce software capable of understanding a caller’s voice and translating it into a synthetic equivalent in a foreign language."

Maybe my experience is atypical, but Google doesn't seem to translate pages very well. I can only imagine how bad it will be having a phone do this. "Did that guy's phone just call me what I think it did?"

As someone with a background in AI and HCI, I completely agree. Unfortunately, we still have a long way to go, and I think that Google is jumping the gun on this. It should prove to be quite humorous, even as first steps go.

Well, to be fair, they did say this would be something they are shooting to accomplish "in a few years." Still a tough task, but they're giving themselves some time. Considering how far they've come already, I'm really excited to see how this works out. -Taylor

As a professional translator and interpreter, I also agree that Google has done better than anyone else, and that that's still not particularly good by objective standards. I've been using their Translation Center (NOT Google Translate) for a while now, and I've seen their translation memory evolve before my eyes.

The basic problem, however, is that the computer doesn't actually understand what it's spitting back to you. It only spits back the translations others have provided for similar phrases. It doesn't know if they're any good. Sometimes they're surprisingly good, and sometimes they're bizarrely bad.
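To make the "parroting" point concrete, here is a toy translation-memory lookup. It is purely illustrative and makes no claim about how Google's Translation Center works internally: it returns the stored translation of the most similar previously seen phrase, with no notion of whether that translation is any good. The memory contents, the `lookup` function, and the 0.6 cutoff are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: English phrases mapped to French
# translations that earlier contributors supplied. Nothing here checks
# whether a stored translation is actually good.
MEMORY = {
    "where is the bathroom": "où sont les toilettes",
    "how much is this umbrella": "combien coûte ce parapluie",
    "i would like the beef soup": "je voudrais la soupe de bœuf",
}


def lookup(phrase, cutoff=0.6):
    """Return (stored_source, stored_translation, score) for the most similar
    stored phrase, or None if nothing reaches `cutoff`. The score measures only
    surface similarity between source strings, not translation quality."""
    phrase = phrase.lower().strip()
    best = None
    for source, target in MEMORY.items():
        score = SequenceMatcher(None, phrase, source).ratio()
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is None or best[2] < cutoff:
        return None
    return best


print(lookup("Where is the bathroom?"))
print(lookup("Where is the bank?"))
```

The second query clears the cutoff because it shares the prefix "where is the ba", so the memory confidently hands back the bathroom translation for a question about a bank: surprisingly good one moment, bizarrely bad the next.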

There's a lot of ambiguity in human writing, and even more so in speech. Even assuming you hear the words correctly, it's tricky to tease out the precise meaning they wanted to convey, and trickier still to re-express that in another language, with appropriate cultural and regional context.

Google will get better and better at parroting good translating and interpreting decisions, but software will never be able to make those decisions, because, in the final analysis, they are subjective decisions.

Think about how successful Google has been with search. Prior to the web, we would have idealized search as speaking with an expert who has all the knowledge that exists on the web. Various efforts still strive for that vision today (Ask Jeeves, Wolfram Alpha, etc.). But clearly it is unreachable for the foreseeable future. Yet search is very useful.

Similarly, this universal translator may well reach a point where it is possible to visit a place, buy things, have a meal, ask where the toilets are, and get back home, particularly when both parties in the conversation are familiar with the limitations of translation. That would be extremely useful, even if it covers only 1/100 of what a native bilingual speaker understands, or of what you would need for nuanced treaty negotiations or to author a respectable translation of War and Peace.

In Chinese, then back to English:

Think about how the success has been with Google search. Prior to site, we will work with specialists who have all the knowledge and presence on the network to speak idealized search. However, efforts to fight the idea, (it is by virtue of, wolphramalpha, etc.). But obviously can not access the foreseeable future. However, the search is very useful.

Similarly, the universal translator is likely to reach a point of view, is that we can visit places, buy things, eat a meal, and asked where the toilets and get back home, especially when the parties are familiar with the limitations of dialogue and translation. This will be very useful, even if only 1 / 100 of the machine for all those who understand the bilingual, or you need to nuanced negotiation of a treaty, or the author's respect for the translation of war and peace.

(Disclaimer: I happen to work for Google, but not on anything related to machine translation.)

You are demonstrating his very point. Translation will not get nuanced stuff, but it could greatly help everyday interactions for travelers or recent immigrants.

Let's do English->Chinese->English on his actual examples:

buy things, "What is the cost of this umbrella?" -> "What is the cost of this umbrella?" (Note I didn't say "how much for" because I am familiar with the limitations of translation and know that phrase is a colloquialism.)

have a meal, "I would like to order the beef soup." -> "I would like to order beef soup."

ask where the toilets are, "Where is your bathroom?" -> "Where is the bathroom?"

get back home "How do I get to the Hilton hotel?" -> "How do I get in the Hilton Hotel?"

I'd say that is pretty passable. Now, it would be better if folks could learn the local language, but for anyone who travels a lot you realize that it is not practical to learn a new language every single trip. Something like this might also help more folks travel with a little less fear, and experience places they otherwise wouldn't. Tools such as this could also allow older immigrants more access to the country they now live in.

Machine translation has a long, long way to go to even be considered "good", but having something close to the state of the art, working on improving it, and making it free for all to use seems like a good thing to me.

I, Google Maps and do a link and, please, I do not think it is successful. Aruuebu, all Web search experts, to discuss the background of existing knowledge. In addition, various efforts and vision, dedication, today (U.S. time askjeeves, wolphramalpha etc.) lead. But this has obviously reached the foreseeable future. However, the search is very useful.

Similarly, from the perspective of our toilet, I can eat, and to translate the conversation reaches the end of a universal translator, the parties, to purchase a backup location to buy me a house helpful must.

That is in all seriousness one of the funnier things I have read in a long, long time! (Yes, I am tired.)

The biggest problem is that if you give Google just one sentence and not the whole context, it will never be able to translate it correctly, especially with languages that rely heavily on context in text and conversation.

Only if language is thought of in terms of rules and grammar parsing. If statistics are used (Bayesian filters, for example), then it's not that hard.

As a demonstration of this, look at spam emails: today's clients have nearly a 99% success rate at catching spam. I honestly do not remember the last time I had a legitimate email treated as spam in Gmail, Thunderbird, or Outlook 2007.
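The "Bayesian filter" idea referred to here can be sketched as a textbook naive Bayes classifier with add-one smoothing. This is a minimal illustration with made-up training messages, not the implementation used by Gmail, Thunderbird, or Outlook.

```python
import math
from collections import Counter

# Made-up training data: (message text, is_spam).
TRAINING = [
    ("cheap pills buy now", True),
    ("win money now", True),
    ("buy cheap watches", True),
    ("meeting moved to noon", False),
    ("lunch at noon tomorrow", False),
    ("draft of the paper attached", False),
]


def train(messages):
    """Count words per class and messages per class."""
    words = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, is_spam in messages:
        words[is_spam].update(text.lower().split())
        docs[is_spam] += 1
    return words, docs


def spam_probability(text, words, docs):
    """P(spam | text) under naive Bayes with add-one (Laplace) smoothing."""
    vocab_size = len(set(words[True]) | set(words[False]))
    n_docs = sum(docs.values())
    log_score = {}
    for label in (True, False):
        total = sum(words[label].values())
        score = math.log(docs[label] / n_docs)  # log prior
        for w in text.lower().split():
            score += math.log((words[label][w] + 1) / (total + vocab_size))
        log_score[label] = score
    # Normalize the two log scores into a probability without underflow.
    top = max(log_score.values())
    spam = math.exp(log_score[True] - top)
    ham = math.exp(log_score[False] - top)
    return spam / (spam + ham)


words, docs = train(TRAINING)
print(round(spam_probability("buy cheap pills", words, docs), 3))
print(round(spam_probability("see you at lunch tomorrow", words, docs), 3))
```

No grammar is involved: the classifier only counts which words co-occur with which label, which is the sense in which statistics can sidestep rule-based parsing.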

Maybe my experience is atypical, but Google doesn't seem to translate pages very well. I can only imagine how bad it will be having a phone do this. "Did that guy's phone just call me what I think it did?"

If you haven't used it recently, try it now. Speaking as a linguist I am incredibly impressed by the speed of their progress.

And speaking as a translator, I can say my job is in no danger.

But putting that aside, it's rather absurd to even compare text-to-text with speech-to-speech processes; they're entirely different beasts. I routinely get messages left through my Google Voice account in non-English languages, and the text that is left in my inbox is ridiculously bad. If they can't even get voice-t

I can only imagine how bad it will be having a phone do this. "Did that guy's phone just call me what I think it did?"

Everyone says this will happen, and I don't understand it. When two human beings are trying to communicate, there's a lot more going on than just the actual words that get transmitted from one to the other. If some tourist is trying to communicate with you using his phone, and the thing comes up with "How much is your wife?", you're probably not even going to be offended - it'll be hilarious

I tried to translate your sentence multiple times, then back to English so I could post the ridiculous result.

Except Google's translation was actually pretty good.

Try a more complicated example. For instance, starting here:

"It's probably pretty good at translating translations it produces back into the same source text. If you figure that a phrase structure in one language corresponds to a certain data structure in Google Translate, then it makes sense that this data structure would survive multiple passes through the same restructuring algorithm..."

translating to Japanese and back to English yields this:

"It is translated to produce translated text back to the very same source is probably a good thing. Cases, one single phrase structure of language specific data structures in the Google translation, it is this data structure makes sense and survival of multiple paths through the same algorithm structure corresponding figures..."

Here you've got badly handled idiomatic phrases all around... For example, the Google translation into Japanese used "seiseisuru honyaku no honyaku dewa ii koto da" at the end of the first sentence ("created-translation's translation is good" or something like that). On the translation back, the connection between "good" and "translation" was lost: Google slapped on a fairly generic "is probably a good thing", picking the bit of uncertainty out of the start of the Japanese sentence and combining that with the "dewa ii koto da", but dropping the whole idea of what it is that's good... which is something that can be kind of vague in the structure of Japanese. Meanwhile, the phrase "source text" was transliterated into katakana, but it got broken up in the translation back to English and wound up in two different locations in the sentence...

The whole conditional clause in the second sentence got kind of mangled. In the Japanese translation it starts with "baai wa": baai means "case" or "situation" - the structure of the sentence establishes this "case" being described as a possibility... Google lost all that, and just said "cases," Then, at the end of the sentence, after the ellipsis, "figure", from "if you figure" in the English original, was tacked on as "taiousuru zu" - "interacting drawing" or "interacting figures". In the return-to-English version this somehow wound up back before the ellipsis again.

The rest of the second sentence in Japanese is something like "if this data structure uses the same intermediary algorithm, several passes of the algorithm should be survived and it should make sense." The apparent problem there is something analogous to operator precedence in arithmetic. The "and" is meant to mean that the surviving translation should still make sense - but this clause apparently got broken up... like the reverse translation assumed that "uses the same intermediary algorithm... should be survived" was all one stand-alone clause - and so it assumed that clause had nothing to do with "this data structure", switched the order of the "and" around, etc...

My hobby is building Gundam models, and one of the most comprehensive review sites for new Gundam kits is in Korean. Believe me, we all try using Google Translate or Babel Fish on Dalong's site from time to time, but the result is rarely worth the effort.

My favorite is for Chinese literature, which it is horrendously bad at. Take a line from the Diamond Sutra, which is a model piece of Chinese literature. This is what Google Translate spews out for one line:

All Xian Sheng, begin with a difference non-action law.

Now for a human translation:

All worthy sages vary only in their mastery of the unconditioned Dharma.

Google Translate's version means nothing whatsoever, not even giving a hint about the actual meaning.

Google's translator apparently does no grammatical analysis, relying entirely on an internal corpus of bilingual documents to make word and phrase equivalency guesses. On top of that, it has no AI for understanding context and analyzing semantic ambiguities. So unless you're asking it to translate simple phrases it already has a perfect translation of in its database, it's hard to see how Google Translate will ever be more than a poorly-functioning gimmick. For languages like Japanese omit subjects, nest cl

It would be interesting if the English were translated correctly into Japanese and then translated back by Google. That might yield better results. Just translating back and forth within the same system will yield strange results.

It is really bad at Asian languages. Even simple sentences sometimes get mangled, especially in a language like Korean, where the subject and object are often implied and understood from context. A machine can't remotely pick up on that.
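The corpus-driven "word and phrase equivalency guesses" described a couple of comments up can be illustrated with the relative-frequency estimate used in textbook phrase-based statistical MT: count how often a source phrase was aligned with each target phrase and pick the most frequent. This is a simplified sketch with invented aligned pairs, not Google's actual system, and it deliberately has no model of context.

```python
from collections import Counter, defaultdict

# Hypothetical (source phrase, target phrase) pairs extracted from aligned
# Spanish-English text. Real systems extract millions of these.
ALIGNED_PAIRS = [
    ("banco", "bank"),
    ("banco", "bank"),
    ("banco", "bench"),
    ("perro", "dog"),
]


def build_phrase_table(pairs):
    """Estimate p(target | source) by relative frequency."""
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        counts[src][tgt] += 1
    table = {}
    for src, tgt_counts in counts.items():
        total = sum(tgt_counts.values())
        table[src] = {tgt: n / total for tgt, n in tgt_counts.items()}
    return table


def best_translation(table, src):
    """Pick the most probable target phrase, ignoring all surrounding words."""
    options = table.get(src)
    if not options:
        return None
    return max(options, key=options.get)


table = build_phrase_table(ALIGNED_PAIRS)
print(table["banco"])
print(best_translation(table, "banco"))
```

Ask it about "banco" in a sentence about sitting in a park and it still says "bank": nothing in this model looks at the neighboring words, which is exactly where implied subjects and objects would have to be recovered.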

Google is developing software, the first foreign language translation of a phone almost immediately - Hitchhiker Guide's may sound like a fish galaxy.

Building on existing technology, speech recognition and automatic translation by Google is expected to have a basic system ready in a few years time. If successful, it's finally over 6000 languages in the world can be translated into the interaction between.

Does anyone use voice recognition software? Here are a couple of my voicemails transcribed by Google Voice:

Hey man, Hello, this is gonna ask you about Stockton uncle in a missed your call, so, so give well. Okay bye.

Hey it's me and I for me. Long, My of the day. So Hey Jared, Here doing. If you come for another anti, gimme a call before you go to sleep and stuff, so give me a favor you familiar with it. I love you bye.

I find Dragon to be much better, simply because they require you to upload your contact list (privacy issue flared up a while back but their new privacy agreement is pretty in-depth and satisfactory for me), so any contacts are not garbled, like they were in GVoice voicemail->text (a feature I love, but is definitely not for the easily confused).
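One plausible reason an uploaded contact list helps is vocabulary biasing: the recognizer produces several scored hypotheses, and ones containing known contact names get a bonus. This is a sketch of that general idea under stated assumptions, not Dragon's or Google's documented method; the hypotheses, scores, and bonus weight are invented.

```python
def rescore(hypotheses, contacts, bonus=2.0):
    """Re-rank recognizer hypotheses, boosting any that contain contact names.

    hypotheses: list of (text, log_score) pairs from a recognizer.
    contacts: names from the user's address book.
    bonus: log-score added per matching token (an arbitrary illustrative value).
    """
    names = {name.lower() for name in contacts}
    rescored = []
    for text, log_score in hypotheses:
        hits = sum(1 for token in text.lower().split() if token in names)
        rescored.append((text, log_score + bonus * hits))
    # Highest score first.
    return sorted(rescored, key=lambda h: h[1], reverse=True)


# Invented hypotheses for one stretch of audio.
hypotheses = [("hey jarred here", -3.8), ("hey jared here", -4.1)]
print(rescore(hypotheses, [])[0][0])
print(rescore(hypotheses, ["Jared", "Natasha"])[0][0])
```

Without contacts the acoustically preferred "jarred" wins; with "Jared" in the address book the bonus tips the ranking toward the real name.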

I'm sure if Google used their contacts list (from your google voice or gmail contacts) privacy freaks would rip them a new one... so not sure what they can do ot

Google voice has definitely gotten better at recognizing my name after accepting/rejecting voicemails and filing a feedback request (right near the beginning).

The thing I have noticed is that it is a little too trigger-happy to label the first bit of audio as a standard greeting (hello or something). I have a friend who seems to manage to *never* start a voicemail with an immediate greeting. There is always like a giggle or some background sounds or a last word or two of conversation with a real person.

I suspect your friends might have thick regional accents. The voicemail transcriptions are the main reason I use it. Perfect, no. But I'd say it usually only messes up about one or two words per paragraph of text, on average. Except for one friend with a really noticeable Texas accent. It has serious trouble with his.

I live in the Midwest and my friends (at least in the case of these voicemails) are intelligent and well-spoken. They weren't talking as though a computer was going to interpret what they were saying, but aside from that they were speaking about as clearly as you could hope for.

It doesn't work that badly for people with a clear accent. The problem is getting the software to work as well with Southerners from the US or Scottish people. But I am glad Google is continuing to work on it despite it not being perfect. Someone has to keep pushing it if it's going to improve.

The problem is getting the software to work as well with Southerners from the US or Scottish people.

It's a problem of inputs. I'm a native Mississippian. When I was in high school (early 90s), a friend of mine had an older brother who worked for IBM designing their first voice-operated phone tree systems. He brought out a list of about 100 words and asked people to call an 800 number and read them off to train the system on Southern accents.

FWIW, you can have a pretty significant effect on Google's system just by using GOOG-411 repetitively; I've trained it in how to pronounce a local restaurant corr

The problem with voice recognition is inherently a user-related problem. All this fluid/casual conversation, regional dialects, muffled voices, uneven, lackadaisical cadences, not to mention you kids and your fads and lexicon of so-called 'lingo'. If everyone just spoke like robots there'd be no problems. Humph!

Speech to text can be so perfectly on the mark sometimes (when you expect it to be way off) and it can be way off on something so simple.

My girlfriend is a history major and she always handwrites her papers, and because I can type 70 WPM (in bursts, not constant) I usually end up typing them up for her. I decided we'd try the speech-to-text service on my laptop, with the USB microphone that came with Rock Band.

The paper was on Women in Ancient Rome, so you can imagine that there would be a ton of errors when it

"Hi, Stephen, it’s Natasha from BBC Newsnight in London. Just to say I’ve sent you two texts. One is to say that we could do it at eleven am your time after the launch, or any time sooner after the launch, or we could do it at midday as we suggested earlier. I, er, if you could text me back about that, and I’ve sent you the details of Skype that you need to do too. If you could give me a call back. Enjoy the launch and I’ll speak to you after that. Thank you Bye."

I’ve transcribed it from the voicemail sound file that resides online in my inbox on the Google Voice site. All fine. I have also ticked the option for Google Voice to send me a text transcript of any voicemail. Below is their interpretation of Natasha’s message; it’s rather endearing how hopelessly wrong the largest company on earth gets it.

"Hi Stephen. It’s Jeff from BBC needs in nuns. And just to say I sent 80 tax, one, if to say we could do it. I left in i a m your time off to go into any time soon, or the court and full we could grab me today as we suggested at. A. F. I. If you could text me back byebye. I’ve sent you the details of skylights that you need to 3 T if you could give me a call. Bye. Enjoy the loans. I’ll speak to you after that. Thank you. Bye"

On a more serious note, such transcripts at least allow you to get an idea of the rough content and tone of a message without having to stop and listen to it, a much more concentration-intensive task.

"On a more serious note, such transcripts at least allow you to get an idea of the rough content and tone of a message without having to stop and listen to it, a much more concentration-intensive task."

What part of Stephen Fry's amusing example has *anything* to do with the actual content of the message? It didn't even get the caller's name right. All Stephen Fry could have gleaned from that was that his own name was Stephen. Thanks, Google Voice!

I use voice recognition all the time. Lots of people do. I use the voice-search on my Droid. You have to enunciate fairly clearly, but it's faster than typing. And when it's wrong, that's fine--you type it out instead. I also use Google Voice transcriptions. Are they perfectly correct? Heck no. They have tons of mistakes. But the transcription is accurate enough that one can glance it over and immediately know the general subject matter of the voicemail, which immediately tells you if you need to: (1) call

I'm with everyone else... Google isn't great at translating, and sadly it's pretty much the best. I speak a myriad of languages, and Google only does well with Latin-based languages, and only if they are grammatically perfect. You could always figure it out by context, but when you get to German or Russian, then you're in trouble. Hell, imagine Mandarin/Cantonese? Pretty soon, though, everyone will be able to understand everyone else and I won't be as cool anymore :)

As long as people use it to improve their understanding and not to officially communicate with others, I have no problem with that. It can be somewhat offensive to receive papers that are badly translated. If you want to communicate or sell me something, at least try to learn my language instead of faking it with computer translators. You should see the ridiculous English to French translations sometimes...

The lack of interest in learning other languages can and will lead to embarrassing situations...

The bigger problem for most people is practicing the language; most Americans will only encounter significant numbers of speakers of working-class, highly idiomatic, Indian-influenced Spanish. Those in New York and New England have the opportunity to learn Canadian French, a language that other French speakers will find archaic. It's definitely easier to keep up your skills than it used to be, with the Internet, but it's still difficult.

The problem is primarily things like diction. You can "train" someone sitting in front of a computer to speak slowly and clearly with good diction. Fine.

The problem is that the most useful usage model for a cell phone translator would be getting a cab or walking into a store. You talk into your phone and it says something to the other person in their language. Wonderful, because you have "trained" yourself to speak clearly and slowly with good diction.

Then the other person mumbles something back at you in their language that neither you nor the cell phone can make heads or tails of. You can't "train" them, so it will never work for that.

From my limited experience, English has its share of strange accents and such, but in large measure people can speak with good diction and pronunciation. Lots of non-English languages seem to promote far less clarity, and human-to-human it doesn't really impair communication that much. Human-to-machine is a whole different story, and we are very far away from being able to do speech recognition with poor pronunciation and poor diction.

in large measure people can speak with good diction and pronunciation.

I wonder to what degree this is an artifact of Hollywood and the BBC pushing out their bland accents - as an American, I find it easy to understand RP, but a lot of casual British speech (e.g., on radio shows meant for domestic consumption) is very difficult for me to understand unless I've been listening to a lot of it lately.

Just about everyone is capable of some degree of accent- and code-switching, and American accents have become much, much more uniform than they were in my grandparents' time - idio

Sure, this has its limitations. We're not going to be conducting diplomacy with aliens on the deck of the starship Enterprise using cell phone machine translation. But for simple, easy-to-understand things like "Where is the bathroom?" or "The cheese is old and moldy" this thing will be sufficient, I'm sure.

There is also an art to using machine translation. I don't know how to describe it, but if you input things like you'd imagine a foreigner saying them, the translation will be much better. Your in

This seems like something that the NSA is probably salivating over. Imagine being able to translate intercepts in near real time with accurate voice recognition. I'm sure they already have imagined it. That technology is nothing short of a Manhattan Project for the SIGINT community.

And they probably started working on it a decade or two ago, and have a working version now :P

I can't confirm or deny its existence. But I can offer you one reason for the denial: the FBI. If we admitted we had it, they'd want to use it. The last time they asked for forensic analysis, it was for their cross-dressing commander who wanted to know who had shit on his front doorstep. After intensive analysis, it was determined to be a squirrel. Then they called us because they couldn't identify which squirrel. We've been hesitant to offer our services ever since. We did find the squirrel though. In anot

It is really easy to make fun of translate.google.com based on how it translates Chinese to English. This is quite silly IMHO, as Chinese is possibly the hardest language in the world. (Travel around China and you'll find semi-literate taxi drivers, even in the major cities.[*]) This is a good article on why Chinese is hard: http://www.pinyin.info/readings/texts/moser.html [pinyin.info].

A better example would be, say, Dutch. Translate the OP from English to Dutch and back to English (i.e. a worst-case scenario), and you end up with this:

"The company has an automatic system for translating texts on computers, sweetened by scanning millions of multilingual websites and documents. Until now includes 52 languages, adding Haitian Creole last week. Google has a system telephone speech recognition that allows users to query websites by speaking commands into their phones instead of typing them in. Now it is working on combining the two technologies to software to understand voice of a caller and translating it into a synthetic equivalent in a foreign language to produce. "

This is perfectly legible to me, and vastly better than what you got when Babel Fish was introduced 11 years ago. There is a good TechTalk about the topic at http://www.youtube.com/watch?v=y_PzPDRPwlA [youtube.com], which should be required viewing before making fun of Google's machine translation efforts.

Voice recognition is harder, but for continuous, untrained speech recognition Google Voice is pretty cool. I've gotten some barely intelligible voice messages on my Google Voice number, and where Google Voice is sure (i.e. black text) it is 95%+ correct; where it is not sure it is maybe 30% correct, but for another 30% it is not possible to figure out what was said, except by taking context into consideration. Google Voice transcribing a call from a mobile phone is better than what you got with Dragon Dictate 5 years ago even with a good microphone, so it is not unlikely that in a few years it will be better than naive human transcription. Humans will be better at guessing based on context, though.
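The "black text when sure" behavior amounts to thresholding per-word confidence scores. A minimal sketch, assuming the recognizer returns (word, confidence) pairs; the sample data, the 0.8 threshold, and the bracket notation standing in for grey text are all hypothetical, not Google Voice's actual output format.

```python
def render(words, threshold=0.8):
    """Join recognized words, wrapping low-confidence ones in brackets
    (standing in for the grey text a UI might show)."""
    out = []
    for word, conf in words:
        out.append(word if conf >= threshold else f"[{word}]")
    return " ".join(out)


def confident_fraction(words, threshold=0.8):
    """Share of words at or above the confidence threshold."""
    if not words:
        return 0.0
    return sum(conf >= threshold for _, conf in words) / len(words)


# Invented recognizer output for one phrase of a voicemail.
hypothesis = [("call", 0.95), ("me", 0.91), ("before", 0.88), ("you", 0.9),
              ("go", 0.85), ("to", 0.6), ("sleep", 0.42)]
print(render(hypothesis))  # call me before you go [to] [sleep]
print(confident_fraction(hypothesis))
```

Splitting output this way is what lets a reader trust the confident words and fall back on context for the rest, as the comment describes.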

Some caveats to what you wrote about Chinese: spoken Chinese is actually one of the simplest languages on earth, certainly a lot less confusing than the many cases and forms of English. Written Chinese, however, is surely one of the hardest. The article you link to hints at the reason: it was not in the interests of the upper class through most of Chinese history to be understood by the lower classes. In Europe this was "solved" by the upper classes conducting their business in a different language - the

A less 19th-century European perspective might be that the Chinese mandated the continuity of their literary tradition, and thus words used 2000 years before still needed to be mastered. Of course this was difficult, but this was also in a culture where scholars memorized the Confucian classics as children. The scholar class had the job of studying and passing on literature, just as the Brahmans in India had the difficult task of memorizing the Vedas precisely. Or how Buddhist monks memorized massive sutras

It is really easy to make fun of translate.google.com based on how it translates Chinese to English...

A better example would be, say, Dutch.

Dutch and English are strongly related and almost mutually intelligible. You can generate a 99% understandable translation simply by substituting one word for another. That's why people use languages such as Japanese, Chinese, or Korean for demonstrating the problem with machine translation: since the languages are about as different from English as you can get, it will
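The word-substitution claim for Dutch can be tried directly. The sketch below uses a tiny hand-made dictionary (an assumption, purely for illustration; real Dutch-English dictionaries have several senses per word) and translates word by word with no reordering.

```python
# Hypothetical hand-made Dutch -> English dictionary, illustration only.
DUTCH_EN = {
    "waar": "where", "is": "is", "het": "the", "toilet": "toilet",
    "ik": "I", "heb": "have", "boek": "book", "gelezen": "read",
}


def substitute(sentence):
    """Translate word by word; unknown words pass through unchanged."""
    return " ".join(DUTCH_EN.get(w, w) for w in sentence.lower().split())


print(substitute("Waar is het toilet"))       # where is the toilet
print(substitute("Ik heb het boek gelezen"))  # I have the book read
```

The first sentence comes out fine. The second comes out as "I have the book read", which an English speaker understands but would never say, because Dutch puts the participle at the end of the clause; with Japanese or Korean the same approach would not even be understandable.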

Google Voice transcribing a call from a mobile phone is better than what you got with Dragon Dictate 5 years ago even with a good microphone, so it is not unlikely that in a few years it will be better than naive human transcription.

This has pretty much been the state of AI-type research ever since it started. It's always "it's so-so now, but it will be better in x years".

I think we can expect Google's translation system to be a bit better in 5 years, but not an order of magnitude better. In 5 years we'll still be seeing the same kind of errors we see now, just not with commonly used phrases.

After the translation, you'll hear advertising based upon what was said.

Person Using Translator: Excuse me, where is the bathroom?
Phone in Local Language: Excuse me, where is the bathroom?
Local Person (in local language): Down that hall, third door on your left.
Phone in Person's Language: Down that hall, third door on your left. By the way, One Week Bath will build your dream bathroom in one week, guaranteed! Visit www.OneWeekBath.com today!

When in the Course of human events, it becomes necessary for one people to dissolve the political bands which have connected them with another, and to assume among the powers of the earth, the separate and equal station to which the Laws of Nature and of Nature's God entitle them, a decent respect to the opinions of mankind requires that they should declare the causes which impel them to the separation.

Google is in the business of collecting data and applying it to practical problems. I imagine voice-to-text (VTT) will be vastly improved over its generations by users accepting or rejecting the VTT result and Google then pooling the results data. The same thing could be done for translation from one language to another.

I see it as crowdsourcing the algorithm accuracy checks among millions of people, allowing them to improve the algo at a much faster rate than they (or their competitors) would otherwise be able to do in a closed testing environment.

This all rests on the speculation that Google pools the results of translations or VTT along with whether users accept or decline them. I wouldn't be surprised in the slightest if they did.
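If accept/reject signals were pooled as speculated here, the simplest aggregation would be a smoothed acceptance rate per candidate output. This is a hypothetical sketch of that idea, not a description of anything Google has confirmed; the audio ID, candidate strings, and the prior of one accept and one reject are assumptions.

```python
from collections import defaultdict


class FeedbackPool:
    """Pools hypothetical accept/reject votes on candidate outputs."""

    def __init__(self):
        # (input id, candidate output) -> [accepts, rejects]
        self.votes = defaultdict(lambda: [0, 0])

    def record(self, source, candidate, accepted):
        """Store one user's verdict on one candidate."""
        self.votes[(source, candidate)][0 if accepted else 1] += 1

    def score(self, source, candidate):
        """Acceptance rate with add-one smoothing; unseen candidates score 0.5."""
        accepts, rejects = self.votes.get((source, candidate), (0, 0))
        return (accepts + 1) / (accepts + rejects + 2)

    def best(self, source, candidates):
        """Pick the candidate users have accepted most reliably so far."""
        return max(candidates, key=lambda c: self.score(source, c))


pool = FeedbackPool()
for _ in range(3):
    pool.record("clip-1", "recognize speech", True)
pool.record("clip-1", "wreck a nice beach", True)
pool.record("clip-1", "wreck a nice beach", False)
pool.record("clip-1", "wreck a nice beach", False)
print(pool.best("clip-1", ["recognize speech", "wreck a nice beach"]))
```

The smoothing keeps a candidate with a single lucky "accept" from outranking one with a long track record, which matters when votes come from millions of casual users.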

This is what I don't understand: on the Google Translate page there is a "suggest a better translation" feature. But the only people who would use Google Translate are those who aren't able to translate it themselves, and hence are in no position to help out! Unless there are fluent speakers who use Google Translate for fun, I don't see much feedback coming from there...

Sometimes I need a phrase translated, and I know some of the words but not all. If I come across a translation that doesn't seem right at all, I'll get a second opinion by trying another engine, and usually by doing a reverse translation in both. And if I'm still skeptical, I'll just Google the translation and see if I can get the gist of the true meaning.

Google is working on a translation system that's based on the massive information they've gathered off the internet. To get an idea of how this works, have a look at the 2009 Google Wave developer presentation. Fast-forward to about 1h 12min: http://www.youtube.com/watch?v=v_UyVmITiYQ [youtube.com]

In another demo (which I can't find right now) they show how the translation engine understands the context of the conversation.

It's easy to see how this could be applied to a phone call using the right voice recognition software.

Because a cellphone is a portable communications device and a modern one has major compute power and storage, thanks to decades of Moore's Law. Adding a translation application to such a platform - even if an only moderately competent one - is a natural fit and a potentially major benefit to the user at negligible cost to the provider.

... a partially telepathic device, rather than a pure computer program? (And invented by Spock's mother - a scientist who ended up marrying a high-ranking (and of course telepathic) Vulcan she encountered during her research?)