AI can translate between languages in real time as people speak, according to fresh research from Chinese search giant Baidu and Oregon State University in the US.

Human interpreters need superhuman concentration to listen to speech and translate at the same time. There are, apparently, only a few thousand qualified simultaneous interpreters, and the job is so taxing that they often work in pairs, swapping places after 20-to-30-minute stints. And as conversations progress, the chance of error increases.

Machines have the potential to trump humans at this task, considering they have superior memory and don’t suffer from fatigue. But it’s not so easy for them either, as researchers from Baidu and Oregon State University found.

They built a neural network that can translate from Mandarin Chinese to English in almost real time, with the English translation lagging just a few words behind the speaker. The results have been published in a paper on arXiv.

The babble post-Babel

Languages have different grammatical structures, and the word order of sentences often doesn't match up, making it difficult to translate quickly. The key to a fast translation is predicting what the speaker will say next as they talk.

In the AI engine, an encoder converts the words of the source language into a vector representation. A decoder then predicts the probability of the next target word given the words produced so far. The decoder always trails the encoder, generating translated words until the whole speech or text has been processed.
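The decode-while-you-read loop can be sketched as follows. This is a minimal illustration only: `next_word` here stands in for the neural decoder, which is not reproduced, and the fixed lag `k` and the toy dictionary-based predictor in the usage example are assumptions for demonstration, not part of Baidu's system.

```python
def wait_k_translate(source_words, next_word, k=5):
    """Simultaneous decoding with a fixed lag: emit one target word for
    each new source word, after waiting for the first k source words."""
    target = []
    for t in range(len(source_words)):
        if t + 1 < k:
            continue  # still waiting for the first k source words
        prefix = source_words[:t + 1]  # the decoder only sees this prefix
        target.append(next_word(prefix, target))
    # Source finished: flush the remaining target words.
    while True:
        word = next_word(source_words, target)
        if word is None:
            break
        target.append(word)
    return target


# Toy stand-in for the decoder: word-by-word dictionary lookup.
lexicon = {"wo": "I", "ai": "love", "ni": "you"}

def toy_next_word(prefix, target):
    if len(target) < len(prefix):
        return lexicon[prefix[len(target)]]
    return None  # nothing left to emit
```

With `k=2`, the translation starts after two source words have been heard and then keeps pace; with a `k` longer than the sentence, the sketch degenerates into ordinary wait-for-the-whole-sentence translation.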

“In one of the examples, the Chinese sentence ‘Bush President in Moscow…’ would suggest the next English word after ‘President Bush’ is likely ‘meets’”, Liang Huang, principal scientist at Baidu Research, explained to The Register.

“This is possible because in the training data, we have a lot of ‘Bush meeting someone, like Putin in Moscow’, so the system learned that if ‘Bush in Moscow’, he is likely ‘meeting’ someone.”

The problem with languages

The difficulty depends on the languages being translated, Huang added. Closely related languages, such as French and Spanish, have similar structures, so their word orders align more closely.

Japanese and German sentences are constructed with the subject at the front, the object in the middle, and the verb at the end (SOV). English and Chinese also start with the subject, but put the verb in the middle, followed by the object (SVO).

Translating from Japanese or German into English or Chinese is therefore more difficult. “There is a well-known joke in the UN that a German-to-English interpreter often has to pause and ‘wait for the German verb’. Standard Arabic and Welsh are verb-subject-object (VSO), which is even more different from SVO,” he said.
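The word-order problem above can be made concrete with a small sketch. Assuming we have a word alignment from each target word back to the source words it depends on (a simplification; real alignments are learned, and these toy alignments are invented for illustration), the earliest a target word can be uttered is right after the last source word it aligns to:

```python
def minimum_lags(alignment):
    """For each target word, the 1-based position of the latest source
    word it aligns to -- the earliest moment it could be spoken."""
    return [max(src_positions) + 1 for src_positions in alignment]


# SVO -> SVO: each target word aligns to the source word in the same
# position, so the interpreter can proceed almost word by word.
svo_alignment = [[0], [1], [2]]

# SOV -> SVO: the target verb (second word) aligns to the *last* source
# word, so the interpreter must "wait for the verb" before continuing.
sov_alignment = [[0], [2], [1]]
```

For the SVO pair the lags are `[1, 2, 3]`, a steady one-word trail; for the SOV source they are `[1, 3, 2]`, meaning the second target word cannot be produced until the entire source sentence has been heard.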

The new algorithm can be applied to any neural machine translation model and only involves tweaking the code slightly. It has already been integrated into Baidu’s internal speech-to-text translation system and will be showcased at the Baidu World Tech Conference next week, on 1 November, in Beijing.

“We don’t have an exact timeline for when this product will be available for the general public, [but] this is certainly something Baidu is working on,” Liang said.

“We envision our technology making simultaneous translation much more accessible and affordable, as there is an increasing demand. We also envision the technology [will reduce] the burden on human translators.”