Article Structure

Abstract

In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.

Introduction

SMT systems have difficulties translating between distant language pairs such as Chinese and English.

Dependency-based Pre-ordering Rule Set

Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same

Experiments

We used the MOSES PBSMT system (Koehn et al., 2007) in our experiments.

Conclusion

In this paper, we introduced a novel pre-ordering approach based on dependency parsing for a Chinese-English PBSMT system.

Topics

dependency parse

This paper introduces a novel pre-ordering approach based on dependency parsing for Chinese-English SMT.

Page 1, “Abstract”

Since dependency parsing is more concise than constituent parsing in describing sentences, some research has used dependency parsing in pre-ordering approaches for language pairs such as Arabic-English (Habash, 2007), and English-SOV languages (Xu et al., 2009; Katz-Brown et al., 2011).

Page 1, “Introduction”

In contrast, we propose a set of pre-ordering rules for dependency parsers.

Page 1, “Introduction”

(2007) exist, it is almost impossible to automatically convert their rules into rules that are applicable to dependency parsers.

Page 1, “Introduction”

In fact, we abandoned our initial attempts to automatically convert their rules into rules for dependency parsers, and

Page 1, “Introduction”

(b) Stanford typed dependency parse tree

Page 2, “Introduction”

Figure 1: A constituent parse tree and its corresponding Stanford typed dependency parse tree for the same Chinese sentence.

Page 2, “Introduction”

They created a pre-ordering rule set for dependency parsers from English to several SOV languages.

Page 2, “Introduction”

Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same

Page 2, “Dependency-based Pre-ordering Rule Set”

As shown in the figure, the number of nodes in the dependency parse tree (i.e.

Page 2, “Dependency-based Pre-ordering Rule Set”

Because dependency parse trees are generally more concise than the constituent ones, they can conduct long-distance reorderings in a finer way.

Then, syntactic reordering rules are applied to these parse trees with the goal of reordering the source language sentences into the word order of the target language.

Page 1, “Introduction”

(a) A constituent parse tree

Page 2, “Introduction”

(b) Stanford typed dependency parse tree

Page 2, “Introduction”

Figure 1: A constituent parse tree and its corresponding Stanford typed dependency parse tree for the same Chinese sentence.

Page 2, “Introduction”

Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same

Page 2, “Dependency-based Pre-ordering Rule Set”

As shown in the figure, the number of nodes in the dependency parse tree (i.e.

Page 2, “Dependency-based Pre-ordering Rule Set”

9) is much fewer than that in its corresponding constituent parse tree (i.e.

Page 2, “Dependency-based Pre-ordering Rule Set”

Because dependency parse trees are generally more concise than the constituent ones, they can conduct long-distance reorderings in a finer way.

Page 2, “Dependency-based Pre-ordering Rule Set”

1) Search the Chinese dependency parse trees in the corpus and rank all of the structures matching the two types of rules respectively according to their frequencies.

Page 3, “Dependency-based Pre-ordering Rule Set”
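The frequency ranking described in this step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tree encoding (a list of `(head_index, relation)` tuples, with `-1` marking the root) and the choice to count pairs of the head's relation and the dependent's relation are assumptions made for the example, and `rank_relation_pairs` is a hypothetical name.

```python
from collections import Counter

def rank_relation_pairs(corpus):
    """Count (head relation, dependent relation) pairs and rank by frequency.

    Each tree is a list of (head_index, relation) tuples, one per token;
    head_index is -1 for the root. This encoding is an assumption of the sketch.
    """
    counts = Counter()
    for tree in corpus:
        for head, relation in tree:
            if head >= 0:
                head_relation = tree[head][1]
                counts[(head_relation, relation)] += 1
    # most_common() returns pairs sorted from most to least frequent
    return counts.most_common()

# Two toy trees (hypothetical data, not taken from the paper's corpus)
corpus = [
    [(1, "nsubj"), (-1, "root"), (1, "dobj")],
    [(1, "rcmod"), (2, "nsubj"), (-1, "root")],
]
ranked = rank_relation_pairs(corpus)
```

Here `ranked` starts with the most frequent pair, so the structures worth inspecting by hand come first.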

For each kind of structure, we selected some of the sample dependency parse trees that contained it, tried to restructure the parse trees according to the matched rule and judged the reordered Chinese phrases.

Since dependency parsing is more concise than constituent parsing in describing sentences, some research has used dependency parsing in pre-ordering approaches for language pairs such as Arabic-English (Habash, 2007), and English-SOV languages (Xu et al., 2009; Katz-Brown et al., 2011).

Page 1, “Introduction”

They created a set of pre-ordering rules for constituent parsers for Chinese-English PBSMT.

Page 1, “Introduction”

(a) A constituent parse tree

Page 2, “Introduction”

Figure 1: A constituent parse tree and its corresponding Stanford typed dependency parse tree for the same Chinese sentence.

Page 2, “Introduction”

By applying our rules and Wang et al.’s rules, one can use both dependency and constituency parsers for pre-ordering in Chinese-English PBSMT.

Page 2, “Introduction”

Figure 1 shows a constituent parse tree and its Stanford typed dependency parse tree for the same

Page 2, “Dependency-based Pre-ordering Rule Set”

9) is much fewer than that in its corresponding constituent parse tree (i.e.

Page 2, “Dependency-based Pre-ordering Rule Set”

First, we converted the constituent parse trees in the results of the Berkeley Parser into dependency parse trees by employing a tool in the Stanford Parser (Klein and Manning, 2003).

Page 4, “Experiments”

In our opinion, the reason for the great decrease was that the dependency parse trees were more concise than the constituent parse trees in describing sentences and they could also describe the reordering at the sentence level in a finer way.

Page 5, “Experiments”

In contrast, the constituent parse trees were more redundant and they needed more nodes to conduct long-distance reordering.

The purpose of this paper is to introduce a novel dependency-based pre-ordering approach through creating a pre-ordering rule set and applying it to the Chinese-English PBSMT system.

Page 1, “Introduction”

To our knowledge, our manually created pre-ordering rule set is the first Chinese-English dependency-based pre-ordering rule set.

Page 1, “Introduction”

They created a set of pre-ordering rules for constituent parsers for Chinese-English PBSMT.

Page 1, “Introduction”

By applying our rules and Wang et al.’s rules, one can use both dependency and constituency parsers for pre-ordering in Chinese-English PBSMT.

Page 2, “Introduction”

In contrast, our rule set is for Chinese-English PBSMT.

Page 2, “Introduction”

Because there are a lot of language specific decisions that reflect specific aspects of the source language and the language pair combination, our rule set provides a valuable resource for pre-ordering in Chinese-English PBSMT.

Page 2, “Introduction”

Our development set was the official NIST MT evaluation data from 2002 to 2005, consisting of 4476 Chinese-English sentence pairs.

Page 4, “Experiments”

In this paper, we introduced a novel pre-ordering approach based on dependency parsing for a Chinese-English PBSMT system.

Page 5, “Conclusion”

These results indicated that dependency parsing is more effective for conducting pre-ordering for Chinese-English PBSMT.

For training the Berkeley Parser, we used Chinese Treebank (CTB) 7.0.

Page 4, “Experiments”

We conducted our dependency-based pre-ordering experiments on the Berkeley Parser and the Mate Parser (Bohnet, 2010), which were shown to be the two best parsers for Stanford typed dependencies (Che et al., 2012).

Page 4, “Experiments”

First, we converted the constituent parse trees in the results of the Berkeley Parser into dependency parse trees by employing a tool in the Stanford Parser (Klein and Manning, 2003).

Page 4, “Experiments”

Thus, we then extracted the POS information from the results of the Berkeley Parser and used these as the pre-specified POS tags for the Mate Parser.

Page 4, “Experiments”
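As a rough illustration of extracting POS information from a constituent parse, the sketch below pulls `(word, tag)` pairs out of a Penn Treebank-style bracketed tree string. It assumes the parser output is in that bracketed form; the function name `pos_tags` and the sample tree are hypothetical, and how the tags are then handed to the Mate Parser is not shown.

```python
import re

# A preterminal looks like "(TAG word)": a tag and a word with no nested brackets.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")

def pos_tags(bracketed_tree):
    """Return the (word, tag) pairs of a bracketed constituent tree, in sentence order."""
    return [(word, tag) for tag, word in PRETERMINAL.findall(bracketed_tree)]

# Hypothetical two-word example tree
tree = "( (IP (NP (NN 经济)) (VP (VV 发展))) )"
tags = pos_tags(tree)
```

Only innermost brackets match the pattern, so phrase labels such as `IP` or `NP` are skipped and only word-level tags are returned.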

Finally, we applied our dependency-based pre-ordering rule set to the dependency parse trees created from the converted Berkeley Parser and the Mate Parser, respectively.

Page 4, “Experiments”

Table 1 presents a comparison of the system without pre-ordering, the constituent system using WR07 and two dependency systems employing the converted Berkeley Parser and the Mate Parser, respectively.

dependency relation

Here, both x and y are dependency relations (e.g., plmod or lobj in Figure 2).

Page 2, “Dependency-based Pre-ordering Rule Set”

We define the dependency structure of a dependency relation as the structure containing the dependent word (e.g., the word directly indicated by plmod, or “El?” in Figure 2) and the whole subtree under the dependency relation (all of the words that directly or indirectly depend on the dependent word, or the words under “El?” in Figure 2).

Page 2, “Dependency-based Pre-ordering Rule Set”

Further, we define X and Y as the corresponding dependency structures of the dependency relations x and y, respectively.

Page 2, “Dependency-based Pre-ordering Rule Set”

For example, in Figure 2, let x and y denote the plmod and lobj dependency relations; then X represents “El?” and all words under “E'TJ”, Y represents “iii/LEE” and all words under “iii/3E”, and X \Y represents

Page 2, “Dependency-based Pre-ordering Rule Set”
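The definition of a dependency structure (the dependent word plus everything that directly or indirectly depends on it) can be sketched as a subtree collection. This is only an illustration under an assumed encoding: `heads[i]` holds the index of token i's head, `-1` marks the root, and `dependency_structure` is a hypothetical helper name.

```python
from collections import defaultdict

def dependency_structure(heads, dependent):
    """Return the sorted token indices of `dependent` and every token under it.

    heads[i] is the index of token i's head, or -1 for the root
    (an encoding assumed for this sketch).
    """
    children = defaultdict(list)
    for token, head in enumerate(heads):
        if head >= 0:
            children[head].append(token)

    structure, stack = [], [dependent]
    while stack:
        node = stack.pop()
        structure.append(node)
        stack.extend(children[node])  # visit direct and indirect dependents
    return sorted(structure)

# Toy tree with five tokens: token 2 is the root, 0 -> 1 -> 2 and 3 -> 4 -> 2
heads = [1, 2, -1, 4, 2]
X = dependency_structure(heads, 1)  # structure of the relation pointing at token 1
Y = dependency_structure(heads, 4)  # structure of the relation pointing at token 4
```

With structures represented as index lists, a reordering rule amounts to moving one contiguous block of tokens relative to another.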

2) Filter out the structures from which it was almost impossible to derive candidate pre-ordering rules because x or y was an “irrespective” dependency relation, for example, root, conj, cc and so on.

Page 3, “Dependency-based Pre-ordering Rule Set”
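The filtering step can be sketched as below. The set contains only the three relations the text names; the text's "and so on" implies further members that are deliberately left unfilled here. The input format (a list of `((x, y), count)` entries) and the name `filter_candidates` are assumptions of the example.

```python
# Only the relations named in the text; the full list is not given there.
IRRESPECTIVE = frozenset({"root", "conj", "cc"})

def filter_candidates(ranked_pairs, irrespective=IRRESPECTIVE):
    """Drop structures in which x or y is an "irrespective" dependency relation.

    ranked_pairs is a list of ((x, y), count) entries; this format is an
    assumption of the sketch.
    """
    return [
        ((x, y), count)
        for (x, y), count in ranked_pairs
        if x not in irrespective and y not in irrespective
    ]

# Hypothetical ranked structures
ranked_pairs = [(("root", "nsubj"), 12), (("nsubj", "rcmod"), 7), (("dobj", "cc"), 3)]
candidates = filter_candidates(ranked_pairs)
```

The relative order of the surviving entries is preserved, so the frequency ranking still holds after filtering.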

As a result, we obtained eight pre-ordering rules in total, which can be divided into three dependency relation categories.

language pairs

In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.

Page 1, “Abstract”

SMT systems have difficulties translating between distant language pairs such as Chinese and English.

Since dependency parsing is more concise than constituent parsing in describing sentences, some research has used dependency parsing in pre-ordering approaches for language pairs such as Arabic-English (Habash, 2007), and English-SOV languages (Xu et al., 2009; Katz-Brown et al., 2011).

Page 1, “Introduction”

Because there are a lot of language specific decisions that reflect specific aspects of the source language and the language pair combination, our rule set provides a valuable resource for pre-ordering in Chinese-English PBSMT.

word order

In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.

Page 1, “Abstract”

The reason for this is that there are great differences in their word orders.

Page 1, “Introduction”

Then, syntactic reordering rules are applied to these parse trees with the goal of reordering the source language sentences into the word order of the target language.

Page 1, “Introduction”

If the reordering produced a Chinese phrase that had a closer word order to that of the English one, this structure would be a candidate pre-ordering rule.

Page 3, “Dependency-based Pre-ordering Rule Set”

In this example, with the application of an nsubj : rcmod rule, the phrase can be translated into “a senior official close to Sharon say”, which has a word order very close to English.

Page 3, “Dependency-based Pre-ordering Rule Set”

A bilingual speaker of Chinese and English looked at an original Chinese phrase and the pre-ordered one with their corresponding English phrase and judged whether the pre-ordering obtained a Chinese phrase that had a closer word order to the English one.

machine translation

In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.

Page 1, “Abstract”

This is especially important on the point of the system combination of PBSMT systems, because the diversity of outputs from machine translation systems is important for system combination (Cer et al., 2013).

Page 2, “Introduction”

By using both our rules and Wang et al.’s rules, one can obtain diverse machine translation results because the pre-ordering results of these two rule sets are generally different.