Neural Machine Translation: Stock vs Custom Engines

With the rise of Neural Machine Translation (NMT) in recent years and its application to numerous fields and situations, our idea of Machine Translation (MT) has changed. What was once seen as a practical system providing fast but poor-quality translations is now capable of producing natural-sounding output and can be trained and customised via machine learning and AI.

How does Neural Machine Translation work?

Neural Machine Translation is produced by systems called MT engines. To produce the high quality we’ve seen in recent years, engineers first need to “train” the neural network with translation data, and lots of it. That explains why quality is high in certain language pairs and low in others: the more past translation data there is, the higher the output quality of the MT engine. For example, for a number of European languages, the European Union has generated enormous amounts of translated data, so MT engines are able to produce relatively high-quality output. This is less the case for language pairs such as English into Arabic, or for non-European languages in general (Chinese being an exception).

Another thing to take into account is that, because an NMT engine relies on past translation data, the quality of this data also affects the performance of the engine. Certain NMT providers, such as the German company DeepL, have gained quite a reputation by training their engines with carefully selected high-quality translation data. NMT output quality therefore relies on two things: the quantity and the quality of translation data. This is the main difference between the two main offerings of NMT engines: so-called “stock” engines and custom engines.
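
To make the point about data quality concrete, here is a minimal Python sketch of the kind of basic checks that could be applied to translation data before training. The function name, the sample segments and the length-ratio threshold are illustrative assumptions, not the method of any particular vendor.

```python
def clean_parallel_corpus(pairs, max_length_ratio=3.0):
    """Keep only (source, target) pairs that pass a few basic quality checks.

    The checks and the default threshold are illustrative assumptions.
    """
    seen = set()
    cleaned = []
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        # Drop pairs where one side is empty.
        if not source or not target:
            continue
        # Drop pairs where the target is a copy of the source (likely untranslated).
        if source == target:
            continue
        # Drop pairs whose lengths differ too much (likely misaligned).
        ratio = max(len(source), len(target)) / min(len(source), len(target))
        if ratio > max_length_ratio:
            continue
        # Drop exact duplicates.
        if (source, target) in seen:
            continue
        seen.add((source, target))
        cleaned.append((source, target))
    return cleaned


raw_pairs = [
    ("Good morning", "Bonjour"),
    ("Good morning", "Bonjour"),   # duplicate
    ("Terms and conditions", ""),  # missing translation
    ("OK", "Veuillez lire attentivement les conditions générales"),  # misaligned
    ("Invoice", "Invoice"),        # untranslated copy
]

print(clean_parallel_corpus(raw_pairs))
```

Real pipelines typically go much further (language identification, alignment scoring, human review), but even simple filters like these show how "quality" can be treated as something you select for rather than something you merely hope for.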

Stock NMT engines

Stock Neural Machine Translation engines are trained with massive amounts of translated data, and that is where they get their reliability from (this is of course relative). The most famous stock engines are Google NMT, Microsoft Bing and Amazon MT. These are usually publicly available and free to use.

Stock engines are useful for general and simple texts with few or no figures of speech, and are helpful for getting the meaning of a text instantly. Output quality tends to suffer greatly as soon as the text becomes more technical, domain-orientated or stylistic. This is why machine translation engineers have been working on custom NMT engines.

Custom NMT engines

Custom Neural Machine Translation engines are the answer to the question: if the performance of an NMT engine relies on the quality of the translation data used for its training, can I influence that performance by carefully selecting the training data?

For example, by training an NMT engine only with data from the finance and banking domain, you could hope to achieve higher-quality output when translating content specific to this domain. Similarly, training an NMT engine with client-specific translation data makes it possible to produce translations that use the client’s specific lingo, tone, etc. That is why certain custom NMT engine vendors such as Kantan MT or Globalese claim higher translation quality for certain domains in specific language pairs.
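
As a rough illustration of selecting training data by domain or client, the Python sketch below filters a small set of stored translations. The `Segment` structure, its `domain` and `client` labels and the sample data are hypothetical, and assume that past translations were tagged when they were stored.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    target: str
    domain: str  # hypothetical label, e.g. "finance" or "legal"
    client: str  # hypothetical client identifier


def select_training_data(segments, domain=None, client=None):
    """Return only the segments matching the requested domain and/or client."""
    return [
        s for s in segments
        if (domain is None or s.domain == domain)
        and (client is None or s.client == client)
    ]


translation_memory = [
    Segment("Account balance", "Solde du compte", "finance", "bank_a"),
    Segment("Interest rate", "Taux d'intérêt", "finance", "bank_b"),
    Segment("Terms of service", "Conditions d'utilisation", "legal", "bank_a"),
]

finance_data = select_training_data(translation_memory, domain="finance")
bank_a_data = select_training_data(translation_memory, client="bank_a")
print(len(finance_data), len(bank_a_data))
```

The narrower the filter, the more specific the engine's vocabulary and tone can become, but the less data is left to train on, which is the trade-off between quantity and quality described earlier.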

Once again, the result still depends on the amount of data. Usually, the engine first uses a base of translation data as a starting point and is then trained with the specific data for its intended purpose. As you use the custom MT engine and have the translated output reviewed and edited by professional translators and subject matter experts, you can feed those translations back into the engine as training data to improve quality over time.
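
The feedback loop described above can be sketched in a few lines of Python. The record fields (`mt_output`, `final_translation`, `approved`) and the sample data are assumptions made for illustration; an actual workflow would depend on the translation tools in use.

```python
def collect_feedback_pairs(reviewed_segments):
    """Turn reviewer-approved translations into new (source, target) training pairs."""
    pairs = []
    for segment in reviewed_segments:
        # Only approved translations are fed back into the engine.
        if segment["approved"]:
            pairs.append((segment["source"], segment["final_translation"]))
    return pairs


reviewed = [
    {"source": "Account balance", "mt_output": "Balance de compte",
     "final_translation": "Solde du compte", "approved": True},
    {"source": "Interest rate", "mt_output": "Taux d'intérêt",
     "final_translation": "Taux d'intérêt", "approved": True},
    {"source": "Wire transfer", "mt_output": "Transfert de fil",
     "final_translation": "", "approved": False},
]

new_pairs = collect_feedback_pairs(reviewed)

# Count how many approved segments the reviewer had to change.
edited_count = sum(
    1 for s in reviewed
    if s["approved"] and s["mt_output"] != s["final_translation"]
)
print(new_pairs, edited_count)
```

Tracking how many segments reviewers had to change gives a simple, if crude, indication of whether the engine is improving from one training round to the next.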

At 2M, we work with a team of computational linguists who can customise NMT engines with training data to suit the domain and terminology of our clients. Our engines are securely hosted on our private servers, complying with the stringent cybersecurity regulations of the Australian Federal Police, Defence Department and QLD Government. We also use NMT output when assisting research institutions in translating vast amounts of content that would otherwise remain untranslated due to lack of budget and time.

We look at what is “fit for purpose”, that is, the intended use of the translated assets, in order to determine the most suitable linguistic approach.