WMT 2017 Quality Estimation Datasets – phrase-level

Bilingual corpora labelled for quality at the phrase level (for researchers working on quality estimation or evaluation of machine translation).
7,500 machine translations annotated for quality with binary labels (good/bad) at the phrase level (67,817 phrases in total), intended for training and testing quality estimation systems.
The corpus consists of source segments in English, their machine translations, a segmentation of these translations into phrases, and a binary score assigned by human annotators indicating the quality of each phrase.
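The corpus description implies a simple structure for each annotated segment: one English source segment, one machine translation split into phrases, and one human good/bad label per phrase. The Python sketch below models that structure in memory as a rough illustration. The class name `PhraseAnnotatedSegment`, the helper `bad_phrase_ratio`, the lowercase `"good"`/`"bad"` label strings, and the space-joining of phrases are all hypothetical choices made for this example; they do not describe the actual file layout of the released data.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label strings; the released files may encode labels differently.
VALID_LABELS = ("good", "bad")


@dataclass
class PhraseAnnotatedSegment:
    """Hypothetical in-memory view of one annotated segment (not the release's file format)."""
    source: str            # English source segment
    mt_phrases: List[str]  # machine translation, segmented into phrases
    labels: List[str]      # one human label per phrase: "good" or "bad"

    def __post_init__(self) -> None:
        # Every phrase must carry exactly one label.
        if len(self.mt_phrases) != len(self.labels):
            raise ValueError("each MT phrase needs exactly one label")
        # Reject anything outside the binary label set.
        for label in self.labels:
            if label not in VALID_LABELS:
                raise ValueError(f"unexpected label: {label!r}")

    @property
    def translation(self) -> str:
        # Assumption: phrases are whitespace-separated, so joining with a
        # single space rebuilds the full machine translation.
        return " ".join(self.mt_phrases)


def bad_phrase_ratio(segments: List[PhraseAnnotatedSegment]) -> float:
    """Fraction of phrases labelled "bad" across a collection of segments."""
    total = sum(len(s.labels) for s in segments)
    bad = sum(label == "bad" for s in segments for label in s.labels)
    return bad / total if total else 0.0
```

Validating the label count when a segment is constructed is one simple way to catch a phrase segmentation and its labels falling out of step before any training or evaluation begins.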

IMPORTANT LEGAL NOTICE
TAUS Terms of Use (https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21).
TAUS grants to QT21 User access to the WMT Data Set with the following rights:
i) the right to use the target side of the translation units into a commercial product, provided that QT21 User may not resell the WMT Data Set as if it is its own new translation;
ii) the right to make Derivative Works; and
iii) the right to use or resell such Derivative Works commercially and for the following goals:
i) research and benchmarking;
ii) piloting new solutions; and
iii) testing of new commercial services.