Citations

...med translation for both French–English and English–French, automatically evaluating translation performance over all systems in terms of Word-Error Rate (WER), Sentence-Error Rate (SER), BLEU score (=-=Papineni, Roukos, Ward, & Zhu, 2002-=-), and Precision and Recall (Turian, Shen, & Melamed, 2003). Our experiments together with their results are described in more detail in the following sections. 5.1 EBMT vs. PBSMT In order to evaluate...

...ariants. To investigate the effect of adding hybrid sub-sentential fragments, we seeded both baseline systems with chunks from the EBMT system and the PBSMT system (extracted following the method of (=-=Och & Ney, 2003-=-)) to create our various ‘hybrid’ EBMT and PBSMT systems. Again we performed translation in both language directions. Incorporating SMT word alignments into the EBMT system results in an average BLEU ...

...MT. (Vogel & Ney, 2000) automatically derive a hierarchical TM from a parallel corpus, comprising a set of transducers encoding a simple grammar. In a similar manner, (Marcu, 2001) uses an SMT model (=-=Brown et al., 1993-=-) to automatically derive a statistical TM. In addition, he adapts the SMT decoder of (Germann et al., 2001) to avail of both the statistical TM resources and the translation model itself. Unlike the ...

... incorporated both lexical and phrasal information, it is only quite recently that SMT practitioners have obtained higher translation quality via phrase-based models (e.g. (Koehn, Och, & Marcu, 2003; =-=Och, 2003-=-)) compared to the older word-based systems (Brown et al., 1990, 1993). This inclusion of chunks as well as word alignments has been so successful that PBSMT has become, by some distance, the most dom...

... is only quite recently that SMT practitioners have obtained higher translation quality via phrase-based models (e.g. (Koehn, Och, & Marcu, 2003; Och, 2003)) compared to the older word-based systems (=-=Brown et al., 1990-=-, 1993). This inclusion of chunks as well as word alignments has been so successful that PBSMT has become, by some distance, the most dominant approach in MT research today. 4.1 Phrasal Alignment Tech...

.... In this paper, we show that similar gains are to be had from constructing a hybrid ‘statistical EBMT’ system capable of outperforming the baseline system of (Way & Gough, 2005). Using the Europarl (=-=Koehn, 2005-=-) training and test sets we show that this time around, although all ‘hybrid’ variants of the EBMT system fall short of the quality achieved by the baseline PBSMT system, merging elements of the marke...

..., the findings of (Way & Gough, 2005), while interesting, are of rather limited value. Accordingly, in (Groves & Way, 2005), we replicated their experiments using the Pharaoh phrase-based SMT Decoder (=-=Koehn, 2004-=-) instead of the word-based ISI ReWrite Decoder. In general, in (Groves & Way, 2005) we showed that the baseline phrase-based SMT system still fell short of the quality obtained via EBMT for these e...

...f transducers encoding a simple grammar. In a similar manner, (Marcu, 2001) uses an SMT model (Brown et al., 1993) to automatically derive a statistical TM. In addition, he adapts the SMT decoder of (=-=Germann et al., 2001-=-) to avail of both the statistical TM resources and the translation model itself. Unlike the system of (Vogel & Ney, 2000), for which no evaluation is provided, Marcu demonstrates that his hybrid syst...

...matically evaluating translation performance over all systems in terms of Word-Error Rate (WER), Sentence-Error Rate (SER), BLEU score (Papineni, Roukos, Ward, & Zhu, 2002), and Precision and Recall (=-=Turian, Shen, & Melamed, 2003-=-). Our experiments together with their results are described in more detail in the following sections. 5.1 EBMT vs. PBSMT In order to evaluate the performance of PBSMT against our Marker-based EBMT sy...

...isi.edu/∼och/Giza++.html the first place, which may include aligning phrase-structure (sub-)trees (Hearne & Way, 2003) or dependency trees (Watanabe, Kurohashi, & Aramaki, 2003), or using placeables (=-=Brown, 1999-=-) as indicators of chunk boundaries. 3.1 Marker-Based EBMT An alternative approach used in the EBMT system used in our experiments (Gough, 2005; Way & Gough, 2005) is to use a set of closed-class word...

...closed-class words to segment aligned source and target sentences and to derive an additional set of lexical and phrasal resources. This series of research papers is based on the ‘Marker Hypothesis’ (=-=Green, 1979-=-), a universal psycholinguistic constraint which posits that languages are ‘marked’ for syntactic structure at surface level by a closed set of specific lexemes and morphemes. In a pre-processing stag...

...l dictionaries. The recombination process depends on the nature of the examples used in the first place [footnote 3: http://www.isi.edu/∼och/Giza++.html], which may include aligning phrase-structure (sub-)trees (=-=Hearne & Way, 2003-=-) or dependency trees (Watanabe, Kurohashi, & Aramaki, 2003), or using placeables (Brown, 1999) as indicators of chunk boundaries. 3.1 Marker-Based EBMT An alternative approach used in the EBMT system...

...ion memory (TM) resources with SMT. (Vogel & Ney, 2000) automatically derive a hierarchical TM from a parallel corpus, comprising a set of transducers encoding a simple grammar. In a similar manner, (=-=Marcu, 2001-=-) uses an SMT model (Brown et al., 1993) to automatically derive a statistical TM. In addition, he adapts the SMT decoder of (Germann et al., 2001) to avail of both the statistical TM resources and th...

...here the translation obtained by merging the extracted examples with the decoder clearly improved the results obtained by the engine alone”. There also exist previous attempts to link TMs with EBMT. (=-=Carl & Hansen, 1999-=-) show that when the fuzzy match score of a TM falls below 80%, translation quality is likely to be higher using EBMT than with TM. (Planas & Furuse, 2003) extend TMs in the direction of EBMT by allow...

...exist previous attempts to link TMs with EBMT. (Carl & Hansen, 1999) show that when the fuzzy match score of a TM falls below 80%, translation quality is likely to be higher using EBMT than with TM. (=-=Planas & Furuse, 2003-=-) extend TMs in the direction of EBMT by allowing subsentential matches, and providing a multilevel structuring of TMs. However, to our knowledge the first research which sets out in detail a comparis...

...dgroves,away}@computing.dcu.ie Abstract (Way & Gough, 2005) demonstrate that their Marker-based EBMT system is capable of outperforming a word-based SMT system trained on reasonably large data sets. (=-=Groves & Way, 2005-=-) take this a stage further and demonstrate that while the EBMT system also outperforms a phrase-based SMT (PBSMT) system, a hybrid ‘example-based SMT’ system incorporating marker chunks and SMT sub-s...

...amed) commercial systems: the hybrid French–English system translated 58% of a 505-sentence test set perfectly, while the commercial systems did so for only 40–42% of the sentences. In similar work, (=-=Langlais & Simard, 2002-=-) also attempt to merge EBMT and SMT resources. Despite the increase in WER when the SMT system is augmented with TM data, the authors observe “many cases where the translation obtained by merging the...

...avenues for further research in this area. 2 Related Work While not directly related to the work we present here, there exists a body of work which merges translation memory (TM) resources with SMT. (=-=Vogel & Ney, 2000-=-) automatically derive a hierarchical TM from a parallel corpus, comprising a set of transducers encoding a simple grammar. In a similar manner, (Marcu, 2001) uses an SMT model (Brown et al., 1993) to...

...on the Europarl Corpus Declan Groves∗ and Andy Way National Centre for Language Technology School of Computing Dublin City University Dublin 9, Ireland {dgroves,away}@computing.dcu.ie Abstract (=-=Way & Gough, 2005-=-) demonstrate that their Marker-based EBMT system is capable of outperforming a word-based SMT system trained on reasonably large data sets. (Groves & Way, 2005) take this a stage further and demonstr...

... depends on the nature of the examples used in the first place [footnote 3: http://www.isi.edu/∼och/Giza++.html], which may include aligning phrase-structure (sub-)trees (Hearne & Way, 2003) or dependency trees (=-=Watanabe, Kurohashi, & Aramaki, 2003-=-), or using placeables (Brown, 1999) as indicators of chunk boundaries. 3.1 Marker-Based EBMT An alternative approach used in the EBMT system used in our experiments (Gough, 2005; Way & Gough, 2005) i...

...MT system and an EBMT system from which it is built, our primary goal in this paper is to see whether a new hybrid model of ‘statistical EBMT’ can similarly outperform the baseline systems. Finally, (=-=Aue et al., 2004-=-) observe that their approach of merging dependency treelets with phrase-based SMT may be considered as an instance of “the convergence of statistical and example-based machine translation”. By learni...

...t translation from multiple candidates, but do not explicitly report on the actual contribution of the language model and deal with outputs from multiple MT engines rather than from a single system. (=-=Aramaki, Kurohashi, Kashioka, & Kato, 2005-=-) use a language model, but only to re-order the words in the final translation produced by their system, rather than during the re-ranking of translation candidates. French–English Results The result...