Complexity of alignment and decoding problems: restrictions and approximations

Abstract

We study the computational complexity of the Viterbi alignment and relaxed decoding problems for IBM model 3, focusing on the problem of finding a solution that has significant overlap with an optimal one. That is, an approximate solution is considered good if it looks like some optimal solution with a few mistakes. Mistakes can be wrong values (such as an incorrectly aligned word or a wrong word in decoding), as well as insertions and deletions (spurious or missing words in decoding). In this setting, we show that it is computationally hard to find a solution that is correct on more than half (plus an inverse polynomial fraction) of the words. More precisely, suppose a polynomial-time algorithm either computes an alignment for IBM model 3 that agrees with some Viterbi alignment on \(l/2+l^\epsilon \) words, where l is the length of the English sentence, or produces a decoding with \(l/2+l^\epsilon \) correct words. Then P \(=\) NP. We also present a similar structure inapproximability result for phrase-based alignment. These strong lower bounds apply to the general definitions of the Viterbi alignment and decoding problems. We therefore also consider, from a parameterized complexity perspective, which properties of the input make these problems intractable. As a first step in this direction, we show that Viterbi alignment has a fixed-parameter tractable algorithm when the parameter is the range of words in the target sentence to which a source word can be aligned. By comparison, limiting the maximal fertility, even to three, does not affect the NP-hardness of the problem.
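The following minimal Python sketch makes the approximation criterion concrete. It makes three illustrative assumptions. First, an alignment is a list of length l whose i-th entry is the position to which word i is aligned. Second, the candidate is compared against a supplied set of Viterbi alignments. Third, "agrees on \(l/2+l^\epsilon \) words" means that at least that many positions match. The function names are hypothetical. The sketch does not compute Viterbi alignments. It only measures how close a given candidate comes to the threshold in the hardness result.

```python
from typing import Sequence


def agreement(candidate: Sequence[int], optimum: Sequence[int]) -> int:
    """Count word positions on which two alignments choose the same position."""
    if len(candidate) != len(optimum):
        raise ValueError("alignments must cover the same number of words")
    return sum(1 for a, b in zip(candidate, optimum) if a == b)


def best_agreement(candidate: Sequence[int],
                   optima: Sequence[Sequence[int]]) -> int:
    """Agreement with the closest optimal alignment ("some Viterbi alignment")."""
    return max(agreement(candidate, opt) for opt in optima)


def exceeds_threshold(candidate: Sequence[int],
                      optima: Sequence[Sequence[int]],
                      eps: float) -> bool:
    """True if the candidate matches some optimum on at least l/2 + l**eps words.

    Hypothetical helper. The hardness result says that, unless P = NP, no
    polynomial-time algorithm can guarantee this for IBM model 3.
    """
    l = len(candidate)
    return best_agreement(candidate, optima) >= l / 2 + l ** eps


# Illustrative data (not from the paper): l = 16 and eps = 0.5 give a
# threshold of 8 + 4 = 12 matching positions.
viterbi = [list(range(16))]
good = list(range(12)) + [0] * 4   # wrong values at the last 4 positions
bad = list(range(11)) + [0] * 5    # wrong values at the last 5 positions

print(exceeds_threshold(good, viterbi, 0.5))  # True  (12 >= 12)
print(exceeds_threshold(bad, viterbi, 0.5))   # False (11 < 12)
```

Because there can be several co-optimal alignments, `best_agreement` scores the candidate against whichever one is closest. For decoding, insertions and deletions also count as mistakes. A position-by-position comparison like this one would therefore not suffice there, and an edit-distance-style comparison of the two outputs would be needed.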

Acknowledgments

We are very grateful to the anonymous referees and the editor of the Machine Translation journal for suggesting a more relevant setting in which to apply our techniques, and for pointing us to the literature. We also want to thank Todd Wareham, Valentine Kabanets and Russell Impagliazzo for numerous discussions and suggestions, and Venkat Guruswami for telling us about the then-unpublished work of Sheldon and Young.