Tense-Annotation

This dataset contains parallel English and French texts from the Europarl corpus (Koehn, 2005).

The files provide alignments of EN and FR verbs, along with information on their position, tense and voice. They can therefore be used in translation studies for these languages and/or for training translation systems that can make use of the labels in this resource.

Although the resource was created semi-automatically, the verb alignments and inferred tenses are of high precision, especially in the second of the two files contained in the package:

Tense-Annotation-full.txt : complete alignment.

Tense-Annotation-gold.txt : alignments only for cases where both an EN and an FR tense were inferred from the verbs.

A description of the method that was used to create the alignment will soon be published.

Work regarding this resource was partially funded by the SNF Sinergia projects COMTIS and MODERN. We would also like to thank Andrei Popescu-Belis, Bastien Crettol, Yann Rodriguez and Vincent Spano of Idiap for their help in making the data available.