Word sense disambiguation for statistical machine translation

In this thesis, we show for the first time that lexical semantics modelling is useful in Statistical Machine Translation (SMT). Word Sense Disambiguation (WSD), the task of resolving sense ambiguity to identify the right translation of a word, is one of the major challenges faced by language translation systems. If the English word "drug" translates into French as either "drogue" (a narcotic) or "médicament" (a medicine), then an English-French machine translation system needs to disambiguate every use of "drug" in order to translate it correctly. Considerable effort has gone into designing and evaluating dedicated WSD models, in particular through the Senseval series of workshops. This is partly motivated by the often unstated assumption that any full translation system, to achieve full performance, will sooner or later have to incorporate individual WSD components. However, in most machine translation architectures, in particular SMT, the WSD problem is typically not explicitly addressed. This paradoxical situation has encouraged speculation that, given recent progress in SMT, SMT models are already very good at WSD and that current WSD systems have nothing to offer state-of-the-art SMT. Going beyond these untested assumptions and speculative claims, we conduct the first direct, extensive empirical study of the strengths and weaknesses of WSD and SMT. Using the state-of-the-art HKUST WSD system, we show, surprisingly, that incorporating WSD predictions into SMT does not improve translation quality. Puzzlingly, we also report results suggesting that typical SMT models cannot disambiguate word translations as well as dedicated WSD systems. These seemingly contradictory results lead us to generalize conventional WSD models to incorporate assumptions at least as strong as those in state-of-the-art SMT.
Specifically, (1) WSD targets are generalized from words to phrases, (2) WSD sense inventories and annotation are learned automatically in the same way as conventional SMT translation lexicons, and (3) WSD models are fully integrated in SMT decoding. Remarkably, the resulting generalized Phrase Sense Disambiguation (PSD) models improve translation quality across four different Chinese-to-English translation tasks, as measured by eight common automatic evaluation metrics. Further analysis reveals that generalization from conventional WSD to PSD is necessary in order to obtain consistent improvements in translation quality.
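The three generalizations above can be illustrated with a minimal sketch. It is not the thesis's implementation: the toy training examples, the naive Bayes classifier (swapped in here as a simple stand-in for whatever WSD classifiers the actual system uses), the function names, and the baseline phrase-table probabilities are all assumptions made for illustration. The sketch treats aligned target phrases as the "senses" of a source phrase, learns them from aligned examples rather than from a hand-built sense inventory, and adds the context-dependent score as one extra feature alongside a context-independent translation score.

```python
import math
from collections import Counter, defaultdict

# Invented toy data: (source phrase, context words, aligned target phrase).
# Source keys are plain strings, so multi-word phrases work the same way.
EXAMPLES = [
    ("drug", ["police", "seized", "illegal"], "drogue"),
    ("drug", ["dealer", "arrested", "illegal"], "drogue"),
    ("drug", ["doctor", "prescribed", "patient"], "médicament"),
    ("drug", ["pharmacy", "patient", "dose"], "médicament"),
]


def train(examples):
    """Count candidate translations and their context words per source phrase."""
    sense_counts = defaultdict(Counter)    # source -> Counter(target)
    context_counts = defaultdict(Counter)  # (source, target) -> Counter(word)
    vocab = set()
    for src, ctx, tgt in examples:
        sense_counts[src][tgt] += 1
        context_counts[(src, tgt)].update(ctx)
        vocab.update(ctx)
    return sense_counts, context_counts, vocab


def psd_log_probs(src, ctx, model):
    """Naive Bayes estimate of log P(target | source, context), add-one smoothed."""
    sense_counts, context_counts, vocab = model
    candidates = sense_counts.get(src)
    if not candidates:
        return {}  # unseen source phrase: no prediction
    total = sum(candidates.values())
    scores = {}
    for tgt, n in candidates.items():
        counts = context_counts.get((src, tgt), Counter())
        # One extra slot is reserved for context words never seen in training.
        denom = sum(counts.values()) + len(vocab) + 1
        score = math.log(n / total)
        for word in ctx:
            score += math.log((counts[word] + 1) / denom)
        scores[tgt] = score
    # Normalize so the candidate probabilities sum to one.
    log_z = math.log(sum(math.exp(s) for s in scores.values()))
    return {tgt: s - log_z for tgt, s in scores.items()}


def combined_scores(src, ctx, model, base_log_probs, psd_weight=1.0):
    """Log-linear combination of a context-independent score and the PSD score.

    base_log_probs is a hypothetical stand-in for phrase-table log probabilities.
    """
    psd = psd_log_probs(src, ctx, model)
    return {
        tgt: base_log_probs[tgt] + psd_weight * psd[tgt]
        for tgt in psd
        if tgt in base_log_probs
    }


if __name__ == "__main__":
    model = train(EXAMPLES)
    base = {"drogue": math.log(0.6), "médicament": math.log(0.4)}
    scores = combined_scores("drug", ["doctor", "patient"], model, base)
    print(max(scores, key=scores.get))
```

In this toy setting the context-independent baseline alone prefers "drogue", while the combined score prefers "médicament" in a medical context. A real decoder would score many overlapping phrase candidates per sentence and tune the feature weight together with its other model weights.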