Our work differs from these previous studies on topic models for SMT in that we adopt topic-based WSI to obtain word senses rather than generic topics, and we integrate the induced word senses into machine translation.

WSI-Based Broad-Coverage Sense Tagger

We want to extend this hypothesis to machine translation by building a sense-based translation model on HDP-based word sense induction: words with the same meanings tend to be translated in the same way.

We present a hybrid approach to sentence simplification which combines deep semantics and monolingual machine translation to derive simple sentences from complex ones.

Introduction

It is useful as a preprocessing step for a variety of NLP systems such as parsers and machine translation systems (Chandrasekar et al., 1996), summarisation (Knight and Marcu, 2000), sentence fusion (Filippova and Strube, 2008) and semantic

Introduction

Machine Translation systems have been adapted to translate complex sentences into simple ones (Zhu et al., 2010; Wubben et al., 2012; Coster and Kauchak, 2011).

Introduction

First, it combines a model encoding probabilities for splitting and deletion with a monolingual machine translation module which handles reordering and substitution.

Related Work

Zhu et al. (2010) constructed a parallel corpus (PWKP) of 108,016/114,924 complex/simple sentences by aligning sentences from EWKP and SWKP and used the resulting bitext to train a simplification model inspired by syntax-based machine translation (Yamada and Knight, 2001).

Related Work

To account for deletions, reordering and substitution, Coster and Kauchak (2011) trained a phrase based machine translation system on the PWKP corpus while modifying the word alignment output by GIZA++ in Moses to allow for null phrasal alignments.

Related Work

Wubben et al. (2012) use Moses and the PWKP data to train a phrase based machine translation system augmented with a post-hoc reranking procedure designed to rank the outputs based on their dissimilarity from the source.
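A minimal sketch of such dissimilarity-based reranking (the similarity measure here is token-level Jaccard overlap, an illustrative choice; function names are hypothetical):

```python
def rerank(source, candidates):
    """Rerank MT outputs so that candidates more dissimilar from the
    source sentence come first (a toy stand-in for post-hoc reranking)."""
    def jaccard(a, b):
        # Token-level Jaccard similarity between two sentences.
        sa, sb = set(a.split()), set(b.split())
        return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
    # Sort ascending by similarity to the source: most dissimilar first.
    return sorted(candidates, key=lambda c: jaccard(source, c))
```

In practice the dissimilarity score would be interpolated with the decoder's model score rather than used alone, so that fluent but unchanged outputs are penalized without promoting garbage.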

Simplification Framework

We also depart from Coster and Kauchak (2011) who rely on null phrasal alignments for deletion during phrase based machine translation.

Simplification Framework

Second, the simplified sentence(s) s’ is further simplified to s using a phrase based machine translation system (PBMT+LM).

Simplification Framework

where the probabilities p(s’|DC), p(s’|s) and p(s) are given by the DRS simplification model, the phrase based machine translation model and the language model respectively.
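The combination of these three model scores can be sketched in log space (a minimal illustration; function and variable names are hypothetical):

```python
def score_simplification(log_p_drs, log_p_pbmt, log_p_lm):
    """Combine the three component log-probabilities: the DRS
    simplification model, the PBMT model, and the language model.
    Multiplying probabilities corresponds to summing log-probabilities."""
    return log_p_drs + log_p_pbmt + log_p_lm

def best_candidate(candidates):
    """Pick the simplification with the highest combined score.
    `candidates` is a list of (sentence, log_p_drs, log_p_pbmt, log_p_lm)."""
    return max(candidates, key=lambda c: score_simplification(*c[1:]))[0]
```

A real decoder would weight each component (as in a log-linear model) and search over candidates rather than enumerate them, but the scoring principle is the same.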

We evaluate our new topic model, ptLDA, and existing topic models—LDA, pLDA, and tLDA—on their ability to induce domains for machine translation and the resulting performance of the translations on standard machine translation metrics.

Instead of using a parallel corpus, which would need entity/relation alignment information and is thus difficult to obtain, this paper employs an off-the-shelf machine translator to translate both labeled and unlabeled instances from one language into the other language, forming pseudo parallel corpora.

Abstract

Based on a small number of labeled instances and a large number of unlabeled instances in both languages, our method differs from theirs in that we adopt a bilingual active learning paradigm via machine translation and improve the performance for both languages simultaneously.

Abstract

machine translation, which make use of multilingual corpora to decrease human annotation efforts by selecting highly informative sentences for a newly added language in multilingual parallel corpora.

We present experiments in using discourse structure for improving machine translation evaluation.

Abstract

Then, we show that these measures can help improve a number of existing machine translation evaluation metrics both at the segment- and at the system-level.

Experimental Results

In this section, we explore how discourse information can be used to improve machine translation evaluation metrics.
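One simple way to fold discourse information into an existing metric is linear interpolation of the base metric score with a discourse-structure similarity score (a minimal sketch; the function name and the weight value are hypothetical):

```python
def combined_metric(base_score, discourse_score, alpha=0.8):
    """Interpolate an existing MT evaluation metric with a score
    measuring how similar the output's discourse structure is to
    the reference's. `alpha` is a tunable mixing weight."""
    return alpha * base_score + (1 - alpha) * discourse_score
```

Such a combination can be applied at the segment level (per sentence pair) or at the system level (averaged over a test set), with `alpha` tuned against human judgments.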

Experimental Results

Overall, from the experimental results in this section, we can conclude that discourse structure is an important information source to be taken into account in the automatic evaluation of machine translation output.

Introduction

From its foundations, Statistical Machine Translation (SMT) had two defining characteristics: first, translation was modeled as a generative process at the sentence-level.

Introduction

This is demonstrated by the establishment of a recent workshop dedicated to Discourse in Machine Translation (Webber et al., 2013), collocated with the 2013 annual meeting of the Association for Computational Linguistics.

Introduction

The area of discourse analysis for SMT is still nascent and, to the best of our knowledge, no previous research has attempted to use rhetorical structure for SMT or machine translation evaluation.

Related Work

Addressing discourse-level phenomena in machine translation is relatively new as a research direction.

Related Work

The field of automatic evaluation metrics for MT is very active, and new metrics are continuously being proposed, especially in the context of the evaluation campaigns that run as part of the Workshops on Statistical Machine Translation (WMT 2008-2012), and NIST Metrics for Machine Translation Challenge (MetricsMATR), among others.

These have focused on an iterative collaboration between monolingual speakers of the two languages, facilitated with a machine translation system.

Related work

In our setup the poor translations are produced by bilingual individuals who are weak in the target language, and in their experiments the translations are the output of a machine translation system. Another significant difference is that the HCI studies assume cooperative participants.

We use the NiuTrans toolkit, which adopts GIZA++ (Och and Ney, 2003) and MERT (Och, 2003), to train and tune the machine translation system.

Experiments

This tool scores the outputs on several criteria; case-insensitive BLEU-4 (Papineni et al., 2002) is used as the evaluation metric for the machine translation system.
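For reference, case-insensitive sentence-level BLEU-4 can be sketched as follows (single reference, no smoothing; a simplified rendering of Papineni et al., 2002, not the exact scorer used by the toolkit):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Counter of all n-grams of length n in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, reference):
    """Case-insensitive BLEU-4: geometric mean of clipped 1-4 gram
    precisions, multiplied by the brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    log_prec = 0.0
    for n in range(1, 5):
        c_ngr, r_ngr = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(cnt, r_ngr[g]) for g, cnt in c_ngr.items())
        total = max(sum(c_ngr.values()), 1)
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision gives BLEU 0
        log_prec += 0.25 * math.log(clipped / total)
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

Production evaluation would use the toolkit's own scorer (or a standard implementation) with corpus-level statistics and multiple references.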

Experiments

When the top 600k sentence pairs are selected from the general-domain corpus to train machine translation systems, these systems outperform the General-domain baseline trained on the full 16 million sentence pairs of parallel data.

Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task specific metric, such as BLEU in machine translation.

Abstract

Our best results improve a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 2.0 BLEU, and the expected BLEU objective improves over a cross-entropy trained model by up to 0.6 BLEU in a single reference setup.
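An expected-BLEU objective of this kind can be sketched over an n-best list: each hypothesis's BLEU score is weighted by its model probability, here obtained via a softmax over model scores (a minimal illustration; names and the temperature parameter are hypothetical):

```python
import math

def expected_bleu(nbest, temperature=1.0):
    """Expected BLEU over an n-best list.
    `nbest` is a list of (model_score, bleu) pairs; model scores are
    turned into a probability distribution with a numerically stable
    softmax, and BLEU is averaged under that distribution."""
    logits = [s / temperature for s, _ in nbest]
    m = max(logits)                         # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return sum((e / z) * b for e, (_, b) in zip(exps, nbest))
```

Because this expectation is differentiable in the model scores, its gradient can be backpropagated into the neural language model, which is what makes it usable as a training objective rather than only an evaluation quantity.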

Machine Translation - We script the Google translation API to get even more semantic links.

Knowledge Graph Construction

In total, machine translation provides 53.2% of the total links and establishes connections between 3.5 million vertices.

Related Work

The ready availability of machine translation to and from English has prompted efforts to employ translation for sentiment analysis (Bautin et al., 2008).

Related Work

(2008) demonstrate that machine translation can perform quite well when extending subjectivity analysis to a multilingual environment, which motivates replicating their work on lexicon-based sentiment analysis.

This is done using the scripts provided by the Statistical Machine Translation system Moses (Koehn et al., 2007).

Evaluation

In addition to these, the system’s output can be compared against the L2 reference translation(s) using established Machine Translation evaluation metrics.

Introduction

Whereas machine translation generally concerns the translation of whole sentences or texts from one language to the other, this study focusses on the translation of native language (henceforth L1) words and phrases, i.e.

Introduction

the role of the translation model in Statistical Machine Translation (SMT).

System

It has also been used in machine translation studies in which local source context is used to classify source phrases into target phrases, rather than looking them up in a phrase table (Stroppa et al., 2007; Haque et al., 2011).

The first bilingual corpus, OpenMT06, was used in the NIST Open Machine Translation 2006 Evaluation.

Complexity Analysis

PatentMT9 is from the shared task of NTCIR-9 patent machine translation.

Complexity Analysis

For the bilingual tasks, the publicly available Moses system (Koehn et al., 2007) with default settings is employed to perform machine translation, and BLEU (Papineni et al., 2002) is used to evaluate translation quality.

Introduction

For example, in machine translation , there are various parallel corpora such as

In statistical machine translation (SMT), syntax-based pre-ordering of the source language is an effective method for dealing with language pairs where there are great differences in their respective word orders.
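The idea of syntax-based pre-ordering can be illustrated with a toy rule that reorders the children of selected constituent labels before translation (a minimal sketch; the tree encoding, label set, and the single swap rule are all hypothetical stand-ins for hand-written rule sets):

```python
def preorder(tree, swap_labels=frozenset({"VP"})):
    """Recursively reorder a constituency tree: for nodes whose label is
    in `swap_labels`, reverse the child order (e.g. to move a verb after
    its object, mimicking SVO-to-SOV pre-ordering).
    A tree is either a terminal string or a (label, [children]) pair."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    children = [preorder(c, swap_labels) for c in children]
    if label in swap_labels:
        children = list(reversed(children))
    return (label, children)

def yield_terminals(tree):
    """Read off the terminal words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [t for c in tree[1] for t in yield_terminals(c)]
```

Real pre-ordering systems condition on richer context (head direction, POS tags, child labels) rather than a single node label, but the mechanism — parse, permute subtrees by rule, then translate the reordered string — is the same.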

Introduction

This is especially important for system combination of PBSMT systems, because diversity among the outputs of machine translation systems is important for system combination (Cer et al., 2013).

Introduction

By using both our rules and Wang et al.’s rules, one can obtain diverse machine translation results because the pre-ordering results of these two rule sets are generally different.