Statistical and neural machine translation (SMT/NMT) methods have been used successfully over the last two decades to build MT systems for many popular languages, with significant improvements in the quality of automatic translation. However, these methods still rely on a few natural language processing (NLP) tools to pre-process human-generated texts into the forms they require as input, and/or to post-process their output into proper textual forms in the target languages.

In many MT systems, the performance of these tools has a great impact on the quality of the resulting translation. However, there has been little discussion of these NLP tools, their methods, their roles in MT systems built on different methods, or their coverage of the world's many languages. In this workshop, we would like to bring together researchers who work on these topics and review the most important tasks that these tools will need to perform for MT in the coming years.

These NLP tools include, but are not limited to, several kinds of word tokenizers/de-tokenizers, word segmenters, and morphological analysers. In this workshop, we solicit papers dedicated to these supplementary tools as used for any language, and especially for low-resource languages. We would like to obtain an overview of these NLP tools from our community. Evaluations of these tools in research papers should show how they have improved the quality of MT output.

TOPICS

We solicit original research papers, review papers, and position papers on these tools. Multilingual and/or cross-lingual NLP tools for MT of low-resource languages are especially welcome. Topics of the workshop include, but are not limited to:

– Research and review papers on pre-processing and/or post-processing NLP tools for MT

– Position papers on the development of pre-processing and/or post-processing tools for MT