Abstract

Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task-specific metric, such as BLEU in machine translation. We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion. Furthermore, we tackle the issue of directly integrating a recurrent network into first-pass decoding under an efficient approximation. Our best results improve a phrase-based statistical machine translation system trained on WMT 2012 French-English data by up to 2.0 BLEU, and the expected BLEU objective improves over a cross-entropy trained model by up to 0.6 BLEU in a single-reference setup.