Neural network language models are often trained by optimizing likelihood, but we would prefer to optimize for a task-specific metric, such as BLEU in machine translation.

Abstract

We show how a recurrent neural network language model can be optimized towards an expected BLEU loss instead of the usual cross-entropy criterion.
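
As a hedged sketch of the objective (our notation, not necessarily the paper's): instead of maximizing the likelihood of the reference translation e*, expected BLEU training minimizes

    \mathcal{L}_{\mathrm{xBLEU}}(\theta) = -\sum_{e \in \mathcal{E}(f)} p_\theta(e \mid f)\, \mathrm{BLEU}(e, e^{*})

where \mathcal{E}(f) is, for example, an n-best list of candidate translations of the source sentence f, and p_\theta is the model's translation distribution.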

Expected BLEU Training

We integrate the recurrent neural network language model as an additional feature into the standard log-linear framework of translation (Och, 2003).

Expected BLEU Training

We summarize the weights of the recurrent neural network language model as θ = {U, W, V} and add the model as an additional feature to the log-linear translation model using the simplified notation s_θ(w_t) ≡ s(w_t | w_1 ... w_{t-1}, h_{t-1}):
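
Assuming the standard Elman-style recurrent language model (Mikolov et al., 2010), which matches the weight names U, W, and V, the feature can be read as the following sketch (our notation):

    h_t = \sigma(U\, x_{w_{t-1}} + W\, h_{t-1}), \qquad s_\theta(w_t \mid w_1 \ldots w_{t-1}, h_{t-1}) = \left[\mathrm{softmax}(V\, h_t)\right]_{w_t}

where x_{w_{t-1}} encodes the previous word and h_t is the hidden state carrying the full history.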

In this paper, we propose a novel approach to learning topic representation for parallel data using a neural network architecture, where abundant topical contexts are embedded via topic-relevant monolingual data.

Background: Deep Learning

This technique began attracting public attention in the mid-2000s, after researchers showed how a multilayer feed-forward neural network can be effectively trained.

Introduction

These topic-related documents are utilized to learn a specific topic representation for each sentence using a neural network based approach.

Introduction

Neural networks are an effective technique for learning different levels of data representation.

Introduction

The levels inferred by a neural network correspond to distinct levels of concepts, where high-level representations are obtained from low-level bag-of-words input.

Topic Similarity Model with Neural Network

In this section, we explain our neural network based topic similarity model in detail, as well as how to incorporate the topic similarity features into the SMT decoding procedure.
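
Before the details, here is a minimal sketch of how such a feature could enter decoding, assuming (our names) topic vectors for the source sentence and for each translation rule; the feature is simply their cosine similarity, added to the log-linear model:

    # Minimal sketch (assumed names): a topic-similarity feature for SMT decoding.
    import numpy as np

    def cosine(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

    def topic_similarity_feature(src_topic: np.ndarray, rule_topic: np.ndarray) -> float:
        # One more feature score for the log-linear translation model.
        return cosine(src_topic, rule_topic)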

A model that adopts a more general structure provided by an external parse tree is the Recursive Neural Network (RecNN) (Pollack, 1990; Küchler and Goller, 1996; Socher et al., 2011; Hermann and Blunsom, 2013).

Background

The Recurrent Neural Network (RNN) is a special case of the recursive network in which the structure followed is a simple linear chain (Gers and Schmidhuber, 2001; Mikolov et al., 2011).
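
The relationship is easy to see in code; a minimal sketch (our names, a single shared composition function) shows that folding a sentence left to right is just the recursive composition applied to a chain-shaped "tree":

    import numpy as np

    def compose(left, right, W, b):
        # One RecNN step: parent = tanh(W [left; right] + b)
        return np.tanh(W @ np.concatenate([left, right]) + b)

    def recursive_encode(tree, embed, W, b):
        if isinstance(tree, str):                     # leaf: a word
            return embed[tree]
        left, right = tree                            # binary parse tree
        return compose(recursive_encode(left, embed, W, b),
                       recursive_encode(right, embed, W, b), W, b)

    def recurrent_encode(words, embed, W, b):
        # The linear-chain special case: fold the sentence left to right.
        h = embed[words[0]]
        for w in words[1:]:
            h = compose(h, embed[w], W, b)
        return h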

Introduction

Figure 1: Subgraph of a feature graph induced over an input sentence in a Dynamic Convolutional Neural Network.

Introduction

A central class of models comprises those based on neural networks.

Introduction

These range from basic neural bag-of-words or bag-of-n-grams models to the more structured recursive neural networks and to time-delay neural networks based on convolutional operations (Collobert and Weston, 2008; Socher et al., 2011; Kalchbrenner and Blunsom, 2013b).
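
For concreteness, the simplest member of this family, a neural bag-of-words model, can be sketched as follows (our names; an average of word embeddings followed by a linear classifier):

    import numpy as np

    def nbow_logits(words, embed, W_out, b_out):
        # Order-insensitive sentence vector: average of word embeddings.
        x = np.mean([embed[w] for w in words], axis=0)
        return W_out @ x + b_out                      # class scores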

In this paper, we propose a novel recursive recurrent neural network (R²NN) to model the end-to-end decoding process for statistical machine translation.

Abstract

R²NN is a combination of a recursive neural network and a recurrent neural network, and in turn integrates their respective capabilities: (1) new information can be used to generate the next hidden state, as in recurrent neural networks, so that the language model and translation model can be integrated naturally; (2) a tree structure can be built, as in recursive neural networks, so as to generate the translation candidates in a bottom-up manner.
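
A hedged sketch of a single node in such a combination (our names, not the paper's exact formulation): the parent state is computed from the two children, recursive-style, while also consuming fresh input at that step, recurrent-style:

    import numpy as np

    def r2nn_node(left, right, x, W, U, b):
        # Children drive the bottom-up tree construction; the extra input x
        # lets new information (e.g., a language model signal) enter the state.
        return np.tanh(W @ np.concatenate([left, right]) + U @ x + b)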

Introduction

The Deep Neural Network (DNN), which is essentially a multilayer neural network, has regained increasing attention in recent years.

Introduction

Recurrent neural networks are leveraged to learn language models, keeping history information circulating inside the network for an arbitrarily long time (Mikolov et al., 2010).

Introduction

Recursive neural networks, which have the ability to generate tree-structured output, are applied to natural language parsing (Socher et al., 2011), and they have been extended to recursive neural tensor networks to explore the compositional aspect of semantics (Socher et al., 2013).

The representation is integrated as features into a neural network that serves as a scorer for an easy-first POS tagger.

Abstract

Parameters of the neural network are trained using guided learning in the second phase.

Easy-first POS tagging with Neural Network

The neural network proposed in Section 3 is used for POS disambiguation by the easy-first POS tagger.

Easy-first POS tagging with Neural Network

At each step, the algorithm adopts a scorer, the neural network in our case, to assign a score to each possible word-tag pair (w, t), and then selects the highest-scoring one (ŵ, t̂) to tag (i.e., tags ŵ with t̂).
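
A minimal sketch of that loop (our names; `score` stands in for the neural network):

    def easy_first_tag(words, tagset, score):
        tags = {}
        untagged = set(range(len(words)))
        while untagged:
            # Score every remaining word-tag pair and commit to the most
            # confident decision first.
            i, t, _ = max(((i, t, score(words, tags, i, t))
                           for i in untagged for t in tagset),
                          key=lambda x: x[2])
            tags[i] = t
            untagged.remove(i)
        return [tags[i] for i in range(len(words))]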

Easy-first POS tagging with Neural Network

While previous work (Shen et al., 2007; Zhang and Clark, 2011; Goldberg and Elhadad, 2010) applies guided learning to train a linear classifier using variants of the perceptron algorithm, we are the first to combine guided learning with a neural network, by using a margin loss and a modified back-propagation algorithm.
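
A hedged sketch of the kind of margin loss this implies (our notation): with s(.) the network's score, y the gold structure, and ŷ the highest-scoring violating one,

    \ell = \max\bigl(0,\ \delta - s(y) + s(\hat{y})\bigr)

and the subgradient of \ell, rather than a cross-entropy gradient, is propagated back through the network.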

Introduction

We integrate the learned encoder with a set of well-established features for POS tagging (Ratnaparkhi, 1996; Collins, 2002) in a single neural network, which is applied as a scorer to an easy-first POS tagger.

Introduction

To our knowledge, we are the first to investigate guided learning for neural networks.

Neural Network for POS Disambiguation

We integrate the learned WRRBM into a neural network, which serves as a scorer for POS disambiguation.

Neural Network for POS Disambiguation

The main challenge in designing the neural network structure is the following: on the one hand, we hope that the model can take advantage of the information provided by the learned WRRBM, which reflects general properties of web texts, so that the model generalizes well in the web domain; on the other hand, we also hope to improve the model’s discriminative power by utilizing well-established POS tagging features, such as those of Ratnaparkhi (1996).

Neural Network for POS Disambiguation

Our approach is to leverage the two sources of information in one neural network by combining them through a shared output layer, as shown in Figure 1.
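
A minimal sketch of that architecture as we read it (our names): two hidden layers, one over the learned word representations and one over the sparse hand-crafted features, feed a single shared output layer of tag scores:

    import numpy as np

    def score_tags(dense_repr, sparse_feats, W1, W2, W_out, b_out):
        h1 = np.tanh(W1 @ dense_repr)     # path from the learned WRRBM representation
        h2 = np.tanh(W2 @ sparse_feats)   # path from well-established POS features
        return W_out @ np.concatenate([h1, h2]) + b_out  # shared output layer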

Recently, neural network models for natural language processing tasks have received increasing attention for their ability to alleviate the burden of manual feature engineering.

Abstract

In this paper, we propose a novel neural network model for Chinese word segmentation called Max-Margin Tensor Neural Network (MMTNN).

Abstract

Experiments on the benchmark dataset show that our model achieves better performance than previous neural network models and that it can achieve competitive performance with minimal feature engineering.

Introduction

Recently, neural network models have received increasing attention for their ability to minimize the effort spent on feature engineering.

Introduction

Workable as previous neural network models seem, one limitation worth pointing out is that the tag-tag, tag-character, and character-character interactions are not well modeled.

Introduction

In previous neural network models, however, such interaction effects can hardly be fully captured by relying only on the simple transition score and a single nonlinear transformation (see Section 2).
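
A tensor layer targets exactly this gap. As a hedged sketch (our notation, not necessarily the paper's): with x the concatenation of the relevant tag and character embeddings, each output unit k also receives a bilinear term,

    z_k = f\bigl(x^{\top} V^{[k]} x + W_k\, x + b_k\bigr)

so products of tag and character features enter the score directly, rather than only through a single additive nonlinear transformation.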

Specifically, we develop three neural networks to effectively incorporate the supervision from the sentiment polarity of text (e.g., tweets).

Introduction

To this end, we extend the existing word embedding learning algorithm (Collobert et al., 2011) and develop three neural networks to effectively incorporate the supervision from the sentiment polarity of text (e.g., tweets).

Introduction

• We develop three neural networks to learn sentiment-specific word embedding (SSWE) from massive distant-supervised tweets without any manual annotations;
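
As a hedged sketch of how this distant supervision can enter training (our notation): alongside the usual context-based hinge loss loss_c(t) of Collobert et al. (2011), a sentiment hinge loss loss_s(t) penalizes scores that disagree with the tweet's noisy polarity label, and the two are interpolated:

    loss(t) = \alpha \cdot loss_c(t) + (1 - \alpha) \cdot loss_s(t)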

Related Work

propose Recursive Neural Network (RNN) (2011b), matrix-vector RNN (2012) and Recursive Neural Tensor Network (RNTN) (2013b) to learn the compositionality of phrases of any length based on the representation of each pair of children recursively.

Related Work

(2011) that follow the probabilistic document model (Blei et al., 2003) and give a sentiment predictor function to each word, we develop neural networks and map each n-gram to the sentiment polarity of the sentence.

This model learns the syntax and semantics of the negator’s argument with a recursive neural network.

Experimental results

Furthermore, modeling the syntax and semantics with the state-of-the-art recursive neural network (models 7 and 8) can dramatically improve the performance over model 6.

Experimental results

Note that the two neural network based models incorporate the syntax and semantics by representing each node with a vector.

Experimental results

Note that this is a special case of what the neural network based models can model.

Related work

More recent work (Socher et al., 2012; Socher et al., 2013) proposed models based on recursive neural networks that do not rely on any heuristic rules.

Related work

In principle, a neural network is able to fit very complicated functions (Mitchell, 1997); in this paper, we adapt the state-of-the-art approach described in (Socher et al., 2013) to help understand the behavior of negators specifically.

Semantics-enriched modeling

A recursive neural tensor network (RNTN) is a specific form of feed-forward neural network based on a syntactic (phrase-structure) parse tree to conduct compositional sentiment analysis.

Semantics-enriched modeling

A major difference of the RNTN from the conventional recursive neural network (RNN) (Socher et al., 2012) is the use of the tensor V in order to directly capture the multiplicative interaction of two input vectors, although the matrix W implicitly captures the nonlinear interaction between the input vectors.
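
Concretely, following Socher et al. (2013), the RNTN composes children vectors a and b as

    p = f\!\left( \begin{bmatrix} a \\ b \end{bmatrix}^{\top} V^{[1:d]} \begin{bmatrix} a \\ b \end{bmatrix} + W \begin{bmatrix} a \\ b \end{bmatrix} \right)

where each tensor slice V^{[k]} produces one dimension of the bilinear term, giving the direct multiplicative interaction noted above.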

Semantics-enriched modeling

This is actually an interesting place to extend the current recursive neural network to consider extrinsic knowledge.

The neural network architecture emerged as the best-performing approach, and our qualitative analysis revealed that it induced a categorical organization of concepts.

Conclusion

Given the success of NN, we plan to experiment in the future with more sophisticated neural network architectures inspired by recent work in machine translation (Gao et al., 2013) and multimodal deep learning (Srivastava and Salakhutdinov, 2012).

Experimental Setup

Neural Network (NNet) The last model that we introduce is a neural network with one hidden layer.

Introduction

This is achieved by means of a simple neural network trained to project image-extracted feature vectors to text-based vectors through a hidden layer that can be interpreted as a cross-modal semantic space.
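
A minimal sketch of such a network (our names and loss; the paper's exact objective may differ): one hidden layer maps an image-extracted vector into the text-based space, trained to land near the corresponding text vector:

    import numpy as np

    def project(img_vec, W1, b1, W2, b2):
        h = np.tanh(W1 @ img_vec + b1)    # hidden layer ~ cross-modal semantic space
        return W2 @ h + b2                # prediction in the text-vector space

    def mse_loss(img_vec, text_vec, params):
        pred = project(img_vec, *params)
        return float(np.mean((pred - text_vec) ** 2))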

Autoencoders An autoencoder is an unsupervised neural network which is trained to reconstruct a given input from its latent representation (Bengio, 2009).
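
A minimal sketch of a single autoencoder (our names): encode the input to a latent code, decode it back, and train on reconstruction error; stacking several such layers, as described below, yields a deep network:

    import numpy as np

    def autoencoder_loss(x, W_enc, b_enc, W_dec, b_dec):
        z = np.tanh(W_enc @ x + b_enc)    # latent representation
        x_hat = W_dec @ z + b_dec         # reconstruction of the input
        return float(np.mean((x_hat - x) ** 2))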

Autoencoders for Grounded Semantics

Stacked Autoencoders Several (denoising) autoencoders can be used as building blocks to form a deep neural network (Bengio et al., 2007; Vincent et al., 2010).

Conclusions

To the best of our knowledge, our model is novel in its use of attribute-based input in a deep neural network.

Experimental Setup

Finally, we also compare to the word embeddings obtained using Mikolov et al.’s (2011) recurrent neural network based language model.

Related Work

A large body of work has focused on projecting words and images into a common space using a variety of deep learning methods, ranging from deep and restricted Boltzmann machines (Srivastava and Salakhutdinov, 2012; Feng et al., 2013) to autoencoders (Wu et al., 2013) and recursive neural networks (Socher et al., 2013b).

Related Work

Secondly, our problem setting is different from the former studies, which usually deal with classification tasks and fine-tune the deep neural networks using training data with explicit class labels; in contrast, we fine-tune our autoencoders using a semi-supervised criterion.

Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence.

Conclusion

In this paper we apply recursive neural networks to political ideology detection, a problem where previous work relies heavily on bag-of-words models and hand-designed lexica.

Introduction

Building from those insights, we introduce a recursive neural network (RNN) to detect ideological bias on the sentence level.

In this paper, we strive to effectively address the above two shortcomings and systematically explore the possibility of learning new features using deep (multilayer) neural networks (DNN, usually referred to as Deep Learning) for SMT.

Related Work

(2013) presented a joint language and translation model based on a recurrent neural network which predicts target words based on an unbounded history of both source and target words.

Related Work

(2013) went beyond the log-linear model for SMT and proposed a novel additive neural networks based translation model, which overcomes some of the shortcomings of the log-linear model: linearity and the lack of deep interpretation and representation in features.