One model is better than two. Yandex.Translate launches a hybrid machine translation system

14 Sep 2017

Today Yandex.Translate launched a hybrid machine translation system that combines neural and statistical approaches to deliver higher-quality translations by drawing on the complementary strengths of both models. The new system first translates each user query with both a statistical and a neural machine translation model. Next, CatBoost, our gradient boosting library, ranks the two outputs and selects the higher-quality translation.

There are several approaches to machine translation, and over the years a number of technological advances have improved its quality. Since its launch in 2011, Yandex.Translate has been powered by statistical machine translation, a widely used approach that works by comparing example translations to find statistical correspondences between words in the two languages.

With today’s launch, Yandex.Translate now also includes a neural machine translation component, a method that has led to more fluent, human-like translations in the last few years. The new Yandex.Translate system is unique in offering users a free machine translation service that combines these two methods.

Statistical translation and neural translation models each have different strengths that complement each other. When combined in our new hybrid machine translation system, they will produce higher quality results than either of the underlying models alone.

Statistical models prove extremely efficient at memorizing example translations and can produce better translations of words or phrases that are seen less frequently in the training data. However, statistical machine translation breaks sentences up into words or phrases during the translation process, which sometimes makes it challenging to construct fluent translations.

Neural machine translation models, on the other hand, can process entire sentences at once. Neural models choose a translation based on the full context of a query, often resulting in much more fluent, human-like translations. But because the neural network relies on context to learn how a word is translated, it often fails to learn reasonable translations for words that it saw only a few times in the training data. By combining the two systems, which excel in different areas, we see significant improvements in translation quality over either of the individual methods.
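For readers who want a concrete picture of the selection step, here is a minimal Python sketch of the general pattern: produce one candidate per model, score each candidate, and keep the best. It is an illustration under stated assumptions, not Yandex's implementation. The names `Candidate`, `select_translation`, `toy_score`, and `TRANSLATORS` are hypothetical, the two "translators" are canned lookups with transliterated placeholder outputs, and the hand-written scoring rule only stands in for a trained ranker such as the CatBoost model described above, whose features are not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Candidate:
    system: str  # which model produced the text, e.g. "smt" or "nmt"
    text: str    # the candidate translation


Translator = Callable[[str], str]
Scorer = Callable[[str, Candidate], float]


def select_translation(query: str,
                       translators: Dict[str, Translator],
                       score: Scorer) -> Candidate:
    """Translate the query with every model and return the top-scoring candidate."""
    candidates = [Candidate(name, translate(query))
                  for name, translate in translators.items()]
    return max(candidates, key=lambda c: score(query, c))


def toy_score(query: str, candidate: Candidate) -> float:
    """Hypothetical stand-in for a trained ranker.

    Assumption: neural output gets a small bonus for fluency, and any
    candidate containing an unknown-word marker is heavily penalized.
    """
    fluency_bonus = 1.0 if candidate.system == "nmt" else 0.0
    unknown_penalty = 5.0 * candidate.text.count("<unk>")
    return fluency_bonus - unknown_penalty


# Canned outputs that mimic the behaviour described in the text: the neural
# model fails on a rare word, while the statistical model has memorized it.
TRANSLATORS: Dict[str, Translator] = {
    "smt": lambda q: {"hello world": "privet mir",
                      "hello quokka": "privet kvokka"}[q],
    "nmt": lambda q: {"hello world": "privet, mir",
                      "hello quokka": "privet, <unk>"}[q],
}

common = select_translation("hello world", TRANSLATORS, toy_score)
rare = select_translation("hello quokka", TRANSLATORS, toy_score)
print(common.system, "|", rare.system)
```

In this toy setup the neural candidate is chosen for the common query and the statistical candidate for the query with the rare word. A production system would replace `toy_score` with a model trained on quality judgments.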

The hybrid system will initially be launched for the English and Russian language pair, which accounts for 80 percent of the tens of millions of daily Yandex.Translate requests. The Yandex.Translate team also hopes to add other language pairs in the near future.

Yandex’s new Head of Machine Translation, David Talbot, explains, “We are excited to launch our new hybrid system for Yandex.Translate users. Ultimately, we want to develop a deeper understanding of how we can better assist Yandex users with their language needs, be it communication, language learning or simply accessing the huge amounts of information on the web available in other languages.”