Google Translate Can Teach Itself With "Zero Shot Translation"

A new artificial intelligence capability in Google Translate, called Zero Shot Translation, can now translate between multiple pairs of languages, including pairings the system has never been exposed to before.

In a blog post Tuesday, a trio of Google researchers explained how upgrades to their recently unveiled Google Neural Machine Translation (GNMT) system let it pinpoint the required target language and translate between different combinations of languages, sight unseen. This means Google Translate can now translate, say, Korean into Japanese, despite having received no Korean-into-Japanese training data. An accompanying research paper was published on arXiv (“archive”), an online repository for scientific research papers hosted by Cornell University.

Last week, the Google Translate team announced that the service could translate whole sentences at a time, as opposed to merely reading text as a succession of discrete individual words. The new system, GNMT, consequently makes for a more cohesive, organic, and generally human-sounding experience. (The company had announced its intention to adopt the new A.I. technology back in September.)

But GNMT only enabled Google Translate’s development team to improve translations between the languages they tested the system on, meaning the bulk of the 103 languages Google Translate supports couldn’t benefit. This is where Zero Shot Translation comes in.

Essentially, if the system has already learned to translate Japanese to English, as well as Korean to English, the new model can reuse those same parameters to translate Japanese to Korean, without ever being fed explicit Japanese-to-Korean examples. This is what Google means by “Zero Shot,” illustrated by the yellow dotted lines in the accompanying diagram.
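To make the idea concrete, here is a minimal Python sketch of the bookkeeping involved, not Google's actual code. It assumes the approach described in the accompanying research paper, where a token such as `<2ko>` is prepended to the input sentence to tell a single shared model which language to produce. The function names, the language codes, and the set of trained pairs are all hypothetical and chosen purely for illustration.

```python
from itertools import permutations

# Hypothetical set of (source, target) pairs that have parallel training data.
TRAINED_PAIRS = {("ja", "en"), ("en", "ja"), ("ko", "en"), ("en", "ko")}


def tag_source(sentence: str, target_lang: str) -> str:
    """Prepend a target-language token (e.g. '<2ko>') to the source sentence.

    The token format is an assumption based on the research paper's examples.
    """
    return f"<2{target_lang}> {sentence}"


def zero_shot_pairs(trained_pairs):
    """List the language pairs that never appeared in training.

    A single multilingual model can still be asked for these pairs simply
    by choosing the matching target-language token at translation time.
    """
    langs = sorted({lang for pair in trained_pairs for lang in pair})
    return [pair for pair in permutations(langs, 2) if pair not in trained_pairs]


if __name__ == "__main__":
    # Japanese <-> Korean was never trained on, so it is a "zero shot" pair.
    print(zero_shot_pairs(TRAINED_PAIRS))
    # A Japanese sentence tagged to request Korean output.
    print(tag_source("こんにちは", "ko"))
```

The sketch only shows how a request is labeled and which pairs count as zero shot; the translation itself would be done by the trained neural network, which is not modeled here.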

Beyond the immediate benefits the new system will provide to users, the linguistic implications of Google Translate A.I.’s ability to learn this are pretty rad: the success would seem to indicate that sentences with similar meanings are represented in similar ways across language barriers (an “interlingua,” as the Google blog post put it). The developers actually built a 3D model of the system's translations across all possible language pairs, the effect of which was to illuminate an apparent “overall geometry” shared by the sentence semantics of different languages.