How to Write for Machine Translation

Writing documentation that a Machine Translation engine can parse successfully is essential to producing better and more affordable machine translations.

Thankfully, this is something that you can do by following a few simple rules when writing your own documentation. The most important thing to remember is to keep your writing clear and concise. The simpler your writing is, the easier it is for a Machine Translation engine to read it.

In this blog, we will take a closer look at how to produce clear and concise documentation.

Many organisations use “controlled language” to write for translation. Controlled language is much stricter than our everyday writing style. The aim of controlled language is to produce coherent and comprehensible documentation that is easy for a Machine Translation engine to read. Controlled language is particularly useful when writing instructional content. Uwe Muegge, a leading figure in the translation industry, has developed the Clout™ rule set; Clout stands for Controlled Language Optimised for Uniform Translation, and this blog references a number of rules within this set. This blog also references rules within Strunk and White’s The Elements of Style, which are useful for all content types.

Rule 1. Avoid misspellings
The simplest and most basic rule of all! A Machine Translation engine cannot accurately translate a misspelled word. Ensure that you proofread your text before running it through your translation engine.

Rule 2. Keep your sentences short and concise
Where possible, avoid conjunctions (and, but, etc.), relative pronouns such as “which”, and sentences with more than one clause. Keep your sentences shorter than 25 words. Ensure that each sentence is grammatically complete (begins with a capital letter, has at least one main clause, and ends with a punctuation mark).
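
If you want to check a draft against this rule automatically, a short script can help. The following is only a minimal sketch in Python: the function names are made up for illustration, the sentence splitter is naive, and it cannot tell whether a sentence has a main clause. It only checks the word limit, the capital letter, and the ending punctuation.

```python
import re

MAX_WORDS = 25  # Rule 2: keep sentences shorter than 25 words


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def check_sentence(sentence):
    """Return a list of Rule 2 problems found in a single sentence."""
    problems = []
    if len(sentence.split()) >= MAX_WORDS:
        problems.append("too long")
    if not sentence[0].isupper():
        problems.append("no capital letter")
    if sentence[-1] not in ".!?":
        problems.append("no ending punctuation")
    return problems


if __name__ == "__main__":
    draft = "Build the engine. then translate the files"
    for s in split_sentences(draft):
        print(s, "->", check_sentence(s))
```

A sentence that starts with a number or a quotation mark will be reported as having no capital letter, so treat the output as a prompt for a second look rather than a verdict.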

Rule 3. Use a simple grammatical structure
Do not overcomplicate the structure of your sentences.

Example:
Show that you can organise your thoughts by using a simple sentence structure in your texts. = Correct
You, in your texts, to show that you can organise your thoughts, should use a simple sentence structure. = Incorrect

Rule 4. Use the active voice
The active voice is a direct writing style that cuts out vagueness and ambiguity. It is very difficult for Machine Translation engines to successfully translate vague phrases or those with double meanings.

Example:
“My first time building a KantanMT engine will always be remembered” = Incorrect

The incorrect phrase is vague because it is unclear who will always remember you building your first KantanMT engine; it could be you, someone else, or the world in general.

“I will always remember building my first KantanMT engine” = Correct
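
Passive sentences like the incorrect example can sometimes be spotted with a simple pattern. The sketch below is a rough heuristic, not a grammar checker: it assumes that a form of “to be” followed by a word ending in -ed or -en signals the passive voice. It will miss irregular participles such as “built” and will wrongly flag phrases such as “is often”. The name looks_passive is made up for illustration.

```python
import re

# A form of "to be" followed by a word ending in -ed or -en.
# This is a deliberately crude approximation of the passive voice.
PASSIVE_PATTERN = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)


def looks_passive(sentence):
    """Return True if the sentence matches the rough passive pattern."""
    return PASSIVE_PATTERN.search(sentence) is not None


if __name__ == "__main__":
    examples = [
        "My first time building a KantanMT engine will always be remembered",
        "I will always remember building my first KantanMT engine",
    ]
    for text in examples:
        print(looks_passive(text), "-", text)
```

Use a check like this to find sentences worth rereading, then decide for yourself whether the active voice would be clearer.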

Rule 5. Write phrases that you can recycle
Write phrases that you can reuse, word for word, throughout your documentation. A Machine Translation engine can recognise and accurately translate repeated phrases.

Example:
You could use the following list within several sections of the same document.
“Follow these three steps to build your KantanMT engine:

Rule 9. Repeat nouns instead of pronouns
This improves the clarity of sentences.

Example:
“You must build the KantanMT engine before using the KantanMT engine to translate client files” = Correct
“You must build the KantanMT engine before using it to translate client files” = Incorrect
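
You can also scan a draft for pronouns that might be better replaced with the noun. This is a minimal sketch in Python; the pronoun list is a small, hypothetical starting point that you would extend for your own content, and find_pronouns is a name made up for illustration.

```python
import re

# Hypothetical starter list; extend it to suit your own documentation.
PRONOUNS = {"it", "its", "they", "them", "their"}


def find_pronouns(sentence):
    """Return the listed pronouns found in the sentence, in order."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    return [w for w in words if w in PRONOUNS]


if __name__ == "__main__":
    examples = [
        "You must build the KantanMT engine before using the KantanMT engine to translate client files",
        "You must build the KantanMT engine before using it to translate client files",
    ]
    for text in examples:
        print(find_pronouns(text), "-", text)
```

The script only lists the pronouns it finds. You still need to decide which noun each pronoun stands for before you replace it.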

And that’s it! Pretty simple, right? By following these simple but important rules, you will write documentation that is much more Machine Translation friendly. That means less post-editing time, faster output, lower costs, and happier clients!
