A picture tells a 1,000 words. Here's about 750 on Facebook using pics to school AI translators

Not a great way, but an interesting way, to teach bots

Computers are getting pretty good at translating the world's languages. However, as they say, onwards and upwards. Eggheads are now trying to teach machines to do the job in a more human-like way.

“While most machine translation systems to date are trained on large parallel corpora, humans learn language in a different way: by being grounded in an environment and interacting with other humans,” Facebook's AI research team and academics at New York University set out in a paper that appeared on arXiv this month.

Thus, rather than explicitly training neural networks on pairs of languages, the team taught bots new lingo by making them play a communication game.

Here's how it worked for two bots: an English one learning Japanese, and a Japanese one learning English. The English computer player – or speaker – is given an image, such as a photo of a galaxy, and tries to describe the picture to the second player, the Japanese bot.

The second bot – or listener – is given two pictures: a target image, and what's called a distractor image, which is a picture of something else. In this example, the target image shown to the listener is the photo of the galaxy, and the distractor image is a picture of a plant. Based on the English speaker's description, the second bot has to guess which of the two pictures it has been shown – the galaxy or the plant – is the one described by the speaker. The listener isn't told which is the target picture; it has to work it out for itself.

The speaker's goal is to send a message that both accurately describes the target image and helps the listener identify the correct image.
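For the curious, one round of the game described above can be sketched in a few lines of Python. This is a toy illustration, not the researchers' code: the image features, the word vectors, and the function names `speak`, `listen` and `play_round` are all made up for the example, and nothing here learns. In a real system, the success or failure of each round would be the signal used to adjust both players' networks.

```python
import random

def dot(a, b):
    """Similarity score between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical image features, standing in for what a vision network would extract.
IMAGES = {"galaxy": (1.0, 0.0), "plant": (0.0, 1.0)}

# Hypothetical speaker vocabulary: how strongly each word matches each feature.
SPEAKER_VOCAB = {"galaxy": (0.9, 0.1), "plant": (0.2, 0.8)}

# Hypothetical listener: its own reading of each incoming word.
LISTENER_VOCAB = {"galaxy": (0.8, 0.3), "plant": (0.1, 0.9)}

def speak(image_features):
    """Speaker picks the word that best matches the target image."""
    return max(SPEAKER_VOCAB, key=lambda w: dot(SPEAKER_VOCAB[w], image_features))

def listen(message, candidates):
    """Listener picks the candidate image that best matches the message."""
    meaning = LISTENER_VOCAB[message]
    return max(candidates, key=lambda name: dot(meaning, IMAGES[name]))

def play_round(target, distractor, rng):
    """Play one round; return the message, the guess, and whether it was right."""
    message = speak(IMAGES[target])
    candidates = [target, distractor]
    rng.shuffle(candidates)  # the listener is not told which one is the target
    guess = listen(message, candidates)
    return message, guess, guess == target

if __name__ == "__main__":
    rng = random.Random(0)
    print(play_round("galaxy", "plant", rng))
    print(play_round("plant", "galaxy", rng))
```

Swapping the roles each round, as the bots do, would simply mean giving each player both a speaking and a listening vocabulary.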

As both players take turns to be the speaker and listener, they’re trained to map the right words to the right images in two different languages. It works similarly to machine translation where neural networks learn to map corresponding words in different languages to translate text.
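To see why a shared picture can stand in for a dictionary, here is a minimal sketch in which each language's words are tied to the same image features, and a word is translated by finding its closest match on the other side. The vectors are invented for illustration, and the nearest-match lookup is a simplification chosen for this example rather than the method in the paper.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm

# Hypothetical word vectors, both expressed in the same image-feature space.
ENGLISH = {"galaxy": (0.9, 0.1), "plant": (0.2, 0.8)}
JAPANESE = {"銀河": (0.8, 0.3), "植物": (0.1, 0.9)}

def translate(word, source_vocab, target_vocab):
    """Return the target-language word whose vector is closest to the source word's."""
    vector = source_vocab[word]
    return max(target_vocab, key=lambda t: cosine(vector, target_vocab[t]))

if __name__ == "__main__":
    print(translate("galaxy", ENGLISH, JAPANESE))
    print(translate("植物", JAPANESE, ENGLISH))
```

No English-Japanese word pair is ever stored; the only link between the two vocabularies is the image space they share.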

“It is natural to use vision as an intermediary: when communicating with someone who does not speak our language, we often directly refer to our surroundings,” the team's paper stated.

When the game was made more difficult by using complete sentences rather than single words, in English and German, the system struggled, the researchers admitted. But performance was slightly better with three players instead of two.

In the three-player setup, each player has to communicate with two other bots, each speaking a different language. The researchers noticed that the quality of the translations between pairs of languages improved.

Douwe Kiela, a researcher at Facebook, told The Register that "this is probably because of what are called ensemble effects in machine learning: as more agents interact with each other, they learn from more diverse data, which allows them to learn faster and as it turns out, to become better at translation."

Studies

Experimenting with scenarios where multiple agents are forced to talk to get a task done is quite popular. OpenAI and Baidu both carried out similar studies to get bots to invent their own language about objects in their environment.

The results from Facebook's latest multi-agent tests show that passing image descriptions between bots is a poor way to build a translation engine, but it's an interesting technique nevertheless.

Kiela explained the study was focused on "low-resource translation."

"[It's] an interesting AI problem: we are getting pretty good at translation when there is a lot of parallel data available," he said. "Parallel means having an original sentence and a corresponding translation, but this kind of data isn’t available for a lot of language pairs.

"Low resource machine translation is still very much an open problem. Our method shows that parallel data is not strictly necessary, as long as there is an intermediate common ground, in this case in the form of images."

"It can potentially be used to improve existing translation systems, especially for low resource languages, and it can lead to new translation methods. A problem with the current method is that there are no images for abstract sentences. For example, 'Democracy is a political system' does not have corresponding images. We plan to work on that in future work." ®