CX2 is a major version of Content Translation, first released in September 2018. It keeps the general design of Content Translation, but several features are thoroughly updated. In particular, it integrates VisualEditor as the editing environment, improves paragraph alignment, and changes how templates are supported in several ways. As of October 2018, CX2 is still in development. You can try it by clicking "Try the new version" on the Content Translation dashboard.

Is Content Translation a machine translation or an automatic translation tool?

Neither.

Content Translation is a working environment for human wiki editors that helps create the first revision of an article by translating an existing article from another language. It automatically helps to adapt images, links, footnotes, templates and some other formatting elements.

For some language pairs it also has the option of integrating a machine translation service. The currently supported services are Apertium, Matxin, Yandex.Translate, and Youdao. More services may be added in the future.

Definitely! This was clear even before development started: there have been so many past attempts at making similar tools that it's impossible to count them. Some are listed at m:Machine translation#Attempts (please add any others you know of there).

As of October 2018, over 40,000 users have created over 370,000 articles with it since it was enabled in January 2015. More than 500 users have each created over 100 articles using Content Translation. So it is more certain than ever that there is demand for it.

How does the Content Translation tool differ from the Translate extension?

The Translate extension was initially built with a focus on translating software user interface messages for MediaWiki and other programs. As such, it's built to keep translations in sync with the source language, whereas each language edition of a Wikimedia project is supposed to be independent and not mirror a source language.

To provide its advanced features on MediaWiki wiki pages, Translate requires preparation of the source page for translation with some additional markup, which can't yet be handled visually. This markup is distracting to most editors, and we did not want to expose Content Translation translators to it.

It is available for logged-in users of a wiki where it's enabled, and it must be enabled as a beta feature in the preferences.

When I try to use Content Translation, I see a blank page and nothing works. How can I fix this?

The most common reason for this is old local JavaScript code on your wiki, which is incompatible with Content Translation. Such code usually appears in Gadgets, user scripts, or Common.js. If you recently enabled a gadget or changed a user script, try disabling it. If this fixes the problem, talk to the gadget maintainers on your wiki and ask them to fix it.

The incompatibility is usually resolved by going to Special:CX and finding the failing source code line in your browser's JavaScript console. Most often the failure is caused by JavaScript code that uses the #bodyContent element. This element is not available on Special:CX, so you should edit the JavaScript code to check whether #bodyContent actually exists before using it.
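The existence check described above could look roughly like the following. This is only a sketch: the function name and the CSS class are made up for illustration, and your gadget will do something different with the element. The point is simply to return early when #bodyContent is missing instead of letting the script fail.

```javascript
// Hypothetical gadget helper. `doc` is anything with a querySelector method;
// in a browser gadget you would pass the global `document`.
function decorateBodyContent(doc) {
  const body = doc.querySelector('#bodyContent');
  if (!body) {
    // No #bodyContent here (for example on Special:CX): do nothing.
    return false;
  }
  body.classList.add('my-gadget-active'); // hypothetical class name
  return true;
}

// In a gadget: decorateBodyContent(document);
```

Taking the document as a parameter is just a convenience that lets the helper be tried outside a browser; in a real gadget the guard can be written inline.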

If this explanation is too complicated or doesn't help, please report the issue at Talk:CX, and if possible, notify the people who maintain gadgets and user scripts on your wiki.

The focus for initial development is articles in Wikipedia. In the future it may also be enabled in Wikivoyage and other sites.

What are the steps to create a new article with the Content Translation tool?

The main entry point to Content Translation is a button on your contributions page:

1. Click "Contributions" in your personal bar (near "Log out").

2. Click "New contribution" and select "Translation".

3. Click "Create new translation".

4. Select the language from which you want to translate in the "From:" field, and type the name of the article in that language.

5. Select the language to which you want to translate and type the title of the new article.

6. Click "Start translation". This will take you to the translation interface.

7. Type the translation of each paragraph in the translation column. You don't have to translate all the paragraphs. Translate as much as needed for the wiki in your language.

Until you publish, the translation is regularly saved automatically, so you don't have to worry that you'll lose it. To come back to an article that you started translating, repeat steps 1 and 2 and select the article from the list that you'll see.

8. When you have written everything you want for the first version of the new translated article, click "Publish translation". This will create a new article in the wiki in the target language with the title that is given at the top of the translation column.

"Suggestions" are personalized, automatically generated links for translating articles that exist in one language but not in another. They are generated based on the articles that the user viewing the CX dashboard translated recently, and on frequently read articles in one language that don't exist in other languages. So, for example, if you translated an article about a city in India, you are likely to see suggestions to translate more articles about cities in India. And if there is an article about the recently elected president of Slovakia in English, but not in your language, you are likely to see a suggestion to translate it.

How can I read a suggested article in the source language before I translate it?

Click the language name.

For example, if the suggestion is to translate an article from English to Italian, you will see "English > Italiano" at the bottom of the suggestion row. The word "English" will be a link to the article in English.

The tool will try to adapt references as much as possible between the source and target languages.

If you delete a reference from the translation, you can add it back by placing the caret where you want to add the reference, clicking the reference in the source column and then clicking "Add reference".

Adapting references may be challenging given that different languages use different citation formats. If there is a reference that you cannot adapt, or that is adapted incorrectly, please report a bug.

Yes, images will be copied just like paragraphs - simply by clicking. The translator will have to type the caption, of course.

It works only if the image is stored in a common media repository (for Wikimedia projects this is Commons). It doesn't work for files stored in the projects locally. It also won't work if the image is a part of a template, such as an infobox; however, it may be possible to adapt the image if the template is adapted.

The creators of the source article are credited in a link to the revision of the source article from which the translation was made, which is added automatically to the edit summary of the published version. This is compatible with the CC BY-SA license and with Wikimedia's Terms of Use, which require you as the re-user to give attribution "through hyperlink [...] to the page or pages that you are re-using".

Will the previous revisions of the translated article be imported when I am translating?

Not automatically.

In some language editions of Wikipedia, such as German, there is a custom of importing the revisions of the article that was translated so that they appear in the history as if they were made before the first actually translated revision. This can be done in the same way with articles that were created using Content Translation.

At the moment Content Translation is focused on creating the first version of the article. After publishing, the translation cannot be loaded as such from the dashboard, and the published page must be edited as a regular wiki page.

There is a plan to make it possible to add translated paragraphs to already-published pages.

Link adaptation: Links are adapted automatically when the linked page is available in the target language through an interlanguage link. Basic manipulation of links is also possible: removing them and picking them from other sources.

Category adaptation: Categories that have a directly corresponding category page in the target language linked by an interlanguage link will be added to the translated page.

Image adaptation: Images are copied to the translated article in one click.

Machine translation and translation memory: These are similar to what is used in the Translate extension.
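To make the idea behind link and category adaptation concrete, here is a rough sketch of looking up target-language titles through interlanguage links. It is not the actual Content Translation code. The data shape, the function name and the sample titles are assumptions chosen for illustration, and nothing is fetched from a wiki.

```javascript
// Illustrative sample data (not fetched from any wiki):
// source title -> { languageCode: titleInThatLanguage }.
const sampleLanglinks = {
  'Mumbai': { it: 'Mumbai', fr: 'Bombay' },
  'Pune': { fr: 'Pune' },
};

// Hypothetical helper: splits the source titles into those that have an
// interlanguage link to targetLang and those that do not.
function adaptLinks(sourceTitles, links, targetLang) {
  const adapted = {};
  const missing = [];
  for (const title of sourceTitles) {
    const target = links[title] && links[title][targetLang];
    if (target) {
      adapted[title] = target; // link can point at the target-language page
    } else {
      missing.push(title);     // no counterpart: left for the translator
    }
  }
  return { adapted, missing };
}

console.log(adaptLinks(['Mumbai', 'Pune'], sampleLanglinks, 'it'));
```

With this sample data, translating into Italian adapts the first link and reports the second as missing, which is the case where the translator would remove the link or pick a target by hand.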

Can anybody read the text that is saved automatically while I am writing the translation?

The data for translation memory will have to be filled from some initial translations, so it may take a while from the time that translation memory is enabled for Content Translation until it becomes useful.

As good as any other articles created in the Wikipedias in the respective languages.

Between the deployment of Content Translation as a beta feature in some languages in January 2015 and November 2015, about 30,000 articles were created. In July 2015 CX was enabled as a beta feature in all languages, and from then until November 2015 fewer than 10% of the articles created using CX were deleted. For comparison, the deletion rate for articles created using the wiki syntax editor goes up to 50% in English.

The articles that were not deleted developed as usual Wikipedia articles: people fixed layout, added or edited paragraphs, added templates, improved references, and so on. Usually these improvements were done both by the person who created the first version and by other wikipedians.

There is no machine translation for my language. How is Content Translation useful to me and my wiki?

By itself Content Translation is not a machine translation tool. Its primary focus is to help people to create translated wiki pages as efficiently as possible. It includes tools that are tightly integrated with MediaWiki and its usual content creation and editing workflow: display of the source and the translation side-by-side; adaptation of links, categories, images and text formatting; publishing to different namespaces; interlanguage links. These features are already supposed to make typing translated articles by hand easier.

This is not just theory. Content Translation was enabled in the French Wikipedia on March 31 2015 and by June 7 it was used to create 500 articles, even though machine translation was not available.

The fact is that machine translation is not available for the majority of languages in which there are Wikipedias, so most language pairs will only be able to use Content Translation as a tool to translate articles manually with the above adaptation tools. If you want to help create a machine translation engine for your language, see How can I improve machine translation support for my language?

Machine translation to my language is bad, and it's easier to translate manually. How is Content Translation useful to me and my wiki?

As written in the previous answer, Content Translation is not by itself a machine translation tool, but a tool to create translated wiki pages. It is designed to be useful even without machine translation.

Machine translation works quite well in some languages, and then it can make the translators' work even more efficient. Machine translation support for a language pair is enabled after testing and approval from people who know the language well.

If machine translation support for your language is enabled, but you don't want to use it, you can disable it and still enjoy the other tools, such as link, category, and image adaptation, as well as dictionaries (if available for your language).

For languages in which machine translation is supported in Content Translation, machine translation will be auto-filled upon clicking a paragraph in the translation area.

Initially we used the Apertium engine, which is free software and can be installed and maintained on our own servers. At a later point we may use Moses and other engines. In November 2015 we added Yandex for limited use from English to Russian, and in the months after that we extended this to most languages that Yandex supports. We have added more machine translation services since then: Matxin and Youdao. You can read more about the process on the page dedicated to machine translation systems in Content Translation.

Content Translation just pastes the same text from the source article. How do I actually make it translate the content?

First of all, it's important to understand that Content Translation by itself is not a machine translation tool. The translation is supposed to be done by humans. For some languages machine translation is integrated, but even then it is always supposed to be fixed by the translator.

If Content Translation supports machine translation to your language, you'll be able to enable it in the "Automatic translation" card that appears in the sidebar when you click a paragraph.

If machine translation to your language is not supported, you will only have two options: to retype the information that is initially pasted in the source language, or to disable it entirely in the "Automatic translation" card and type everything by yourself. In all cases you'll be able to use other tools, such as link, image and reference adaptation.

What languages are being handled by Yandex? Are there plans to add more?

Yandex is available at present for more than 70 languages. As Yandex’s language coverage expands, we will consider enabling the new languages for Content Translation. Please note: Yandex machine translation will not be available when translating into English.

As a user of Content Translation you will not notice any difference in the translation interface, as the Yandex machine translation system displays the translated content in the same way that Apertium currently does for the 45 supported language pairs.

Yandex provides a free-to-use API key that allows websites and other services to use their translation system. Content Translation also uses a unique API key to access this service on Yandex’s server. When a user starts translating an article, the HTML content of each section of the source article is sent to the Yandex server, and a translated version is obtained and displayed in the corresponding place in the translation column of Content Translation. Links and references are adapted as usual and users can modify the content as required.

This process continues for all the sections of the article being translated. For better performance, the translations for consecutive sections are pre-fetched. The user can save the unpublished translation (to work on it again at a later time) or publish the article in the usual manner. The article is published on Wikipedia like any other normal article with appropriate attribution and licenses.
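The pre-fetching idea can be sketched as follows. This is a simplified illustration, not the code used by Content Translation or cxserver: the function names are invented, the look-ahead of one section is an assumption, and the translation call is a stand-in that only tags the text so that the example runs offline.

```javascript
// Stand-in for a call to a machine translation service (hypothetical).
async function fakeTranslate(html) {
  return `[mt] ${html}`;
}

// Hypothetical helper: returns translations section by section and starts
// fetching the next `lookahead` sections as soon as one is requested.
function createPrefetchingTranslator(sections, translate, lookahead = 1) {
  const pending = new Map(); // section index -> Promise of translated HTML

  function request(i) {
    if (i >= 0 && i < sections.length && !pending.has(i)) {
      pending.set(i, translate(sections[i])); // start the request once
    }
    return pending.get(i); // undefined when i is out of range
  }

  return {
    get(i) {
      const current = request(i);
      for (let k = 1; k <= lookahead; k++) {
        request(i + k); // pre-fetch the following sections
      }
      return current;
    },
  };
}

const translator = createPrefetchingTranslator(
  ['<p>One</p>', '<p>Two</p>', '<p>Three</p>'],
  fakeTranslate
);
translator.get(0).then((html) => console.log(html));
```

Because each section's promise is stored, moving on to the next section reuses the request that was already started instead of sending the same content again.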

Content Translation evolved from a long-standing need to bridge the gap in the amount of content between Wikipedias in different languages. Like all other software used on Wikimedia sites, Content Translation is also open source. In this particular case as well, we are using an open source client to interact with the external service and import freely licensed content in order to help users expand our free knowledge.

To use Yandex’s machine translation system we are not adding any proprietary software in the Content Translation code, or on the Wikimedia websites and servers. The service is free of charge and available for everyone.

Only the freely available Wikipedia article content (in segments) is sent to the Yandex service and the obtained translated content is freely usable on Wikipedia pages. The translated content can be modified by users and this data is also available publicly under a free license through the Content Translation API. This is a valuable resource made available for the community to develop open source translation services for those languages where they don't exist yet.

After studying the implications carefully, we found that the fact that the content was previously stored in a closed source service does not limit the freedom of our knowledge or our software in the present or the future. We have taken special care to make sure that the content provided is freely licensed so that it complies with Wikipedia policies. This includes a long process for legal and technical evaluation and compliance. The summary of the terms of use is also available.

From user feedback we have seen that machine translation support is really helpful for users and we want to support all languages in the best way. Guided by the principles of Wikimedia Foundation’s resolution to support free and open source software, we will prioritise the integration of open source services whenever they are available for a language. Apertium has been a critical part of Content Translation since its inception, but currently it only provides machine translations for 45 of the numerous possible language combinations that Wikipedia can support.

Should I be worried about my personal information when using Yandex?

Irrespective of the service being used, you can be sure that only Wikipedia content from existing articles is sent and only freely licensed content will be added back to the translation. No personal information is sent, and communication with those services happens on the server side, so they are isolated from the user's device. Please refer to this diagram for more details.

What if Yandex is the only machine translation tool available and I don’t want to use it?

Machine translation is an optional feature in Content Translation that you can easily disable at will. If more machine translation systems are added for your languages, you can choose to enable MT again and select the MT service of your choice.

Will the content translated by Yandex be free for use in Wikipedia?

Yes. The content received from Yandex is otherwise freely available on the Yandex web translation platform. Content Translation receives it via an API key to make it seamlessly available on the translation interface. This content can be modified by the users (if necessary) and used in Wikipedia articles under free licenses.

Can this content be used for improving machine translation systems in general?

Yes. Translations made in Content Translation are saved in our database. This information will be made publicly available for anyone to use as translation examples to improve their translation services (university research groups, open source projects, commercial companies, anyone!). The content can be accessed via the Content Translation API. Please note that only information related to translated text is publicly available. This includes the source and translated text, the source and target languages, and an identifier for the segment of text.

Because it should be easier for translators who are beginners with Wikipedia editing, and because it was much easier to implement features such as the adaptation of references, images and links, as well as machine translation integration, in an HTML-based WYSIWYG editor. Content Translation is an article creation tool rather than an article editing tool. Because it is not supposed to be a full-fledged article editing environment, it only provides the most basic formatting tools. After an article is created, it can be edited in the VisualEditor or in the source editor, just like any other article.

In more technical terms, Content Translation uses a simple HTML "contenteditable" element that is available in modern browsers. It transforms the source article's HTML to the translation, and when publishing the article as a wiki page, it converts the translation to wikitext using Parsoid. At the moment, Content Translation does not use the VisualEditor for editing the translation, though this may be done in the future.

As of mid-2016, there is no plan to add the ability to write the translation in wiki syntax in Content Translation. There is a possibility that this will happen in the future, however, for example by integrating the new wikitext editor that is being developed by the Visual Editor team.

If you are concerned about the quality of the wiki syntax, you can publish your translation to user or draft space by changing the target title appropriately.

There has been a lot of research on the topic; see m:Machine translation#Attempts. For instance: «The quantitative results show that the contributions can improve the accuracy of a combination of RBMT-SPE pipeline at around 10 %, after the post-edition of 50,000 words in the Computer Science domain. We believe that these conclusions can be extended to MT engines involving other less-resourced languages lacking big parallel corpora or frequently updated lexical knowledge» (doi:10.1007/978-3-642-35085-6_4).

We treat machine translation only as a tool that may help a human translator be faster. Publishing machine-translated articles is not the intention of Content Translation, and it is actively discouraged.

Will there be a feature to prevent bulk publishing of unedited machine translated text?

Yes!

We take article quality seriously. Machine translation is only a tool that helps the translator be more efficient, and the developers understand well that all translations must be edited by a human. The translation interface will show a warning if the translator tries to publish an article that only has machine translation. The developers will work with the editing communities to adjust this for the needs of every language.
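One way such a check could work is to compare what the machine translation service returned with what the translator is about to publish. The sketch below is only an illustration under stated assumptions: the section shape (`mt` and `final` fields), the function names and the 0.9 threshold are all hypothetical, and the actual rules in Content Translation may differ and may vary by language.

```javascript
// Hypothetical section shape: { mt: machine-translated text, final: text to publish }.
// Returns the share of machine-translated sections that were left unchanged.
function unmodifiedMtRatio(sections) {
  const withMt = sections.filter((s) => typeof s.mt === 'string' && s.mt.length > 0);
  if (withMt.length === 0) {
    return 0; // nothing was machine translated
  }
  const untouched = withMt.filter((s) => s.mt.trim() === s.final.trim()).length;
  return untouched / withMt.length;
}

// The threshold is an assumed value, chosen only for this example.
function shouldWarnBeforePublishing(sections, threshold = 0.9) {
  return unmodifiedMtRatio(sections) >= threshold;
}

const draft = [
  { mt: 'Mumbai is a city.', final: 'Mumbai is a city.' },
  { mt: 'It has port big.', final: 'It has a large port.' },
];
console.log(unmodifiedMtRatio(draft), shouldWarnBeforePublishing(draft));
```

A per-language threshold would be one simple way to let editing communities tune the warning to their needs.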

The dictionaries will initially be taken from the free dictionaries of the FreeDict project. Later other dictionaries may be added, such as Wiktionary, OmegaWiki, terminology collections, and possibly other open sites.

The ContentTranslation extension works from the outset with multiple wikis and it needs to synchronize information between them. To make this possible, it uses an additional component called "cxserver". It also optimizes much of the connection to translation tools, such as dictionaries, machine translation, etc.

Markup applied to some part of the text. Essentially, these are HTML tags such as anchor, bold, italic and underline.

card

a box which appears in the tools column on the special page and provides translation tools for specific context, e.g. a box that allows editing links

columns

vertical areas in which Special:ContentTranslation is divided: there are currently three columns (source, translation, tools)

Content Translation (CX)

This tool, consisting of the ContentTranslation extension and the cxserver backend.

cxserver

Backend for CX written in Node.js, handling text segmentation and providing a consistent API for services like machine translation, dictionaries and translation memories.

glossary

A list of terms with definitions or translations.

GWT (Given-When-Then)

GWT is a semi-structured way to write down test cases. They can either be tested manually or automated as browser tests with Selenium.

lemmatization

closely related to stemming. Mapping multiple grammatical variants of the same word to a root form; e.g. (swim, swims, swimming, swam, swum) -> swim. Derivational variants are not usually mapped to the same form (so happiness !-> happy).
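As a toy illustration of the lemmatization entry, the mapping can be thought of as a lookup table from inflected forms to a root form. The table and function name below are invented for this example. Real lemmatizers rely on much larger dictionaries or morphological rules.

```javascript
// Tiny hand-written table of inflected forms (illustrative only).
const lemmaTable = new Map([
  ['swims', 'swim'],
  ['swimming', 'swim'],
  ['swam', 'swim'],
  ['swum', 'swim'],
]);

// Returns the root form if the word is in the table, otherwise the
// lowercased word itself.
function lemmatize(word) {
  const key = word.toLowerCase();
  return lemmaTable.get(key) || key;
}

console.log(['swim', 'swims', 'swimming', 'swam', 'swum'].map(lemmatize));
console.log(lemmatize('happiness')); // derivational variant, left as it is
```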