Have you ever wanted a ‘mega-TM’ or AutoSuggest dictionary filled with content available on the web? Are you fed up with the poor quality of multilingual corpora such as the European Medicines Agency (EMEA) documents available through the OPUS project?

Join us on our journey to learn how we downloaded (parts of) the Internet (in fact, large parts of the available EMEA documents), how we batch-converted the PDF files into Word files, and how we cleaned up the Word files to improve the results of the alignment process. Learn about the tools we used to batch-align the files and what we did to clean up the aligned TMs.

We will also discuss how much the huge TM and AutoSuggest dictionary improved our productivity, and which problems still need to be solved to reach the productivity gains claimed for MT and post-editing.

Date and time: May 28th, 2013, 15:30 CET (Central European Time)
Duration: 120 minutes, including Q&A
Language: English