
rnn: recurrent neural networks

Note: this repository is deprecated in favor of https://github.com/torch/rnn.

This library includes documentation for the following objects: modules that consider successive calls to forward as different time-steps in a sequence; modules that forward entire sequences through a decorated AbstractRecurrent instance; miscellaneous modules and criterions; and criterions used for handling sequential inputs and targets. To install this repository: … Note that luarocks install rnn now installs https://github.com/torch/rnn instead.

The package also comes with example training scripts. If you use rnn in your work, we'd really appreciate it if you could cite the following paper: Léonard, Nicholas, Sagar Waghmare, Yang Wang, and Jin-Hwa Kim.

Most issues can be resolved by updating the various dependencies, including the CUDA ones if you are using CUDA. Don't forget to update this package as well. If that doesn't fix it, open an issue on GitHub.

The constructor takes a single argument, rho: the maximum number of steps to backpropagate through time (BPTT). Sub-classes can set this to a large number like 99999 (the default) if they want to backpropagate through the entire sequence.

Calling this method makes it possible to pad sequences of different lengths in the same batch with zero vectors. In other words, it is possible to separate unrelated sequences with a masked element.
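The masking idea can be shown in a minimal sketch. It is written in plain Python rather than the library's Lua/Torch API. `masked_rnn` is a hypothetical element-wise recurrence. The sketch assumes that a zero input vector produces a zero output, and that this output, as the next step's previous state, restarts the recurrence.

```python
import math


def masked_rnn(sequence, w_x, w_h):
    """Run a hypothetical element-wise recurrence h = tanh(w_x * x + w_h * h_prev).

    An all-zero input vector is treated as a masked element (an assumption of
    this sketch): its output is a zero vector, and because that output becomes
    the next step's previous state, the next element starts from a clean state.
    """
    size = len(sequence[0])
    h = [0.0] * size
    outputs = []
    for x in sequence:
        if all(v == 0.0 for v in x):
            h = [0.0] * size  # masked step: zero output, recurrence is reset
        else:
            h = [math.tanh(w_x * xi + w_h * hi) for xi, hi in zip(x, h)]
        outputs.append(h)
    return outputs


# Two unrelated sequences separated by a zero (masked) element.
outputs = masked_rnn([[1.0, 2.0], [0.0, 0.0], [1.0, 2.0]], w_x=0.5, w_h=0.9)
```

Because the state is reset at the masked step, the third element produces exactly the same output as the first. The two sequences do not influence each other.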

Calls to backward are made in the reverse order of the calls to forward; this reverse order implements backpropagation through time (BPTT).

This method brings all states back to the start of the sequence buffers, i.e. the module forgets the sequence seen so far.

In training mode, the network remembers the states of the previous rho time-steps.
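This bookkeeping can be sketched with a bounded buffer. `StepLog` is a hypothetical helper, not part of the library. It keeps only the states of the last `rho` time-steps. On `forget`, it clears them and resets `step` to 1.

```python
from collections import deque


class StepLog:
    """Hypothetical helper that remembers the states of the last `rho` time-steps."""

    def __init__(self, rho):
        self.rho = rho
        self.states = deque(maxlen=rho)  # the oldest state is dropped once rho is exceeded
        self.step = 1

    def forward(self, state):
        self.states.append(state)
        self.step += 1

    def forget(self):
        # Bring everything back to the start of the sequence buffers.
        self.states.clear()
        self.step = 1


log = StepLog(rho=3)
for state in ["s1", "s2", "s3", "s4", "s5"]:
    log.forward(state)
```

After five steps with rho = 3, only the states of the last three time-steps are still available for backpropagation.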

The nn.Recurrent(start, input, feedback, [transfer, rho, merge]) constructor takes 6 arguments, the last three being optional. An RNN is used to process a sequence of inputs. Each call to forward keeps a log of the intermediate states (the input and many Module.outputs), and backward must be called in the reverse order of the sequence of calls to forward in order to backpropagate through time.
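The following minimal BPTT sketch illustrates why backward runs in the reverse order of forward. It uses a hypothetical scalar RNN h[t] = tanh(w_x x[t] + w_h h[t-1] + b) in plain Python. It is not the nn.Recurrent implementation, which composes the start, input, feedback, transfer and merge modules. It only shows the logged states being consumed from the last time-step back to the first.

```python
import math


def forward(xs, w_x, w_h, b, h0=0.0):
    """Run the scalar RNN over xs, logging every intermediate state."""
    hs = [h0]  # hs[0] is the initial state, hs[t + 1] the output for xs[t]
    for x in xs:
        hs.append(math.tanh(w_x * x + w_h * hs[-1] + b))
    return hs


def backward(xs, hs, grad_outputs, w_x, w_h):
    """Visit time-steps in reverse order of the forward calls (BPTT).

    grad_outputs[t] is dLoss/dhs[t + 1] coming directly from the loss.
    Returns the gradients with respect to w_x, w_h and b.
    """
    g_wx = g_wh = g_b = 0.0
    g_h = 0.0  # gradient arriving from the following time-step
    for t in reversed(range(len(xs))):
        g_h += grad_outputs[t]
        g_pre = g_h * (1.0 - hs[t + 1] ** 2)  # through the tanh non-linearity
        g_wx += g_pre * xs[t]
        g_wh += g_pre * hs[t]
        g_b += g_pre
        g_h = g_pre * w_h  # pass the gradient on to the previous state
    return g_wx, g_wh, g_b


xs = [0.5, -1.0, 2.0]
w_x, w_h, b = 0.7, 0.4, 0.1
hs = forward(xs, w_x, w_h, b)
# Loss = sum of all outputs, so dLoss/dh[t] = 1 at every step.
grads = backward(xs, hs, [1.0] * len(xs), w_x, w_h)
```

The backward pass needs the state logged at step t-1 to compute the gradient at step t. This is why forward keeps a log and backward walks it in reverse.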

The step attribute is only reset to 1 when a call to the forget method is made.

For a simple, concise example of how to make use of this module, please consult the simple-recurrent-network.lua training script.

This is actually the recommended approach, as it allows RNNs to be stacked and makes the rnn …

The actual implementation corresponds to the following algorithm, where W[s->q] is the weight matrix from s to q, t indexes the time-step, and b[1->q] are the biases leading into q.

The input, forget and output gates, as well as the hidden state, are computed in one fell swoop.
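Computing all gates in one fell swoop can be sketched in plain Python. A single matrix product over the concatenated input and previous hidden state yields every gate pre-activation, which is then split into blocks. The sketch assumes a standard LSTM without peephole connections. The block order (input, forget, output, candidate) is an arbitrary choice, and the library's exact layout may differ.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def fused_lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step with all gate pre-activations from a single matrix product.

    W has 4*H rows and len(x) + H columns; its row blocks are assumed to be
    ordered [input gate, forget gate, output gate, candidate].
    """
    H = len(h_prev)
    pre = [p + bias for p, bias in zip(matvec(W, x + h_prev), b)]  # the one fused product
    i = [sigmoid(v) for v in pre[0:H]]
    f = [sigmoid(v) for v in pre[H:2 * H]]
    o = [sigmoid(v) for v in pre[2 * H:3 * H]]
    g = [math.tanh(v) for v in pre[3 * H:4 * H]]
    c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c_prev, i, g)]
    h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
    return h, c


H = 1
W = [[0.0, 0.0] for _ in range(4 * H)]  # 4*H rows, len(x) + H columns
h, c = fused_lstm_step([1.0], [0.0], [2.0], W, [0.0] * (4 * H))
```

With zero weights, every gate equals sigmoid(0) = 0.5 and the candidate is tanh(0) = 0. The cell therefore becomes 0.5 * 2.0 = 1.0.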

This extends the FastLSTM class to enable faster convergence during training by zero-centering the input-to-hidden and hidden-to-hidden transformations.

The hidden-to-hidden transition of each LSTM cell is normalized by a batch normalizing transform, where hd is a vector of (pre)activations to be normalized, and gamma and beta are model parameters that determine, respectively, the standard deviation and the mean of the normalized activation.

eps is a regularization hyperparameter to keep the division numerically stable and E(hd) and E(σ(hd)) are the estimates of the mean and variance in the mini-batch respectively.
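The batch normalizing transform can be sketched for a mini-batch of scalar (pre)activations as BN(hd) = beta + gamma * (hd - E(hd)) / sqrt(Var(hd) + eps). This plain-Python version estimates the mean and (biased) variance from the batch itself. It ignores the running averages that a training module would also keep. It is an illustration, not the library's implementation.

```python
import math


def batch_norm(hd, gamma, beta, eps=1e-5):
    """beta + gamma * (hd - E(hd)) / sqrt(Var(hd) + eps), statistics taken over the mini-batch."""
    n = len(hd)
    mean = sum(hd) / n
    var = sum((v - mean) ** 2 for v in hd) / n  # biased mini-batch variance
    return [beta + gamma * (v - mean) / math.sqrt(var + eps) for v in hd]


out = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=0.1, beta=0.0, eps=0.0)
shifted = batch_norm([1.0, 2.0, 3.0, 4.0], gamma=0.1, beta=3.0, eps=0.0)
```

The normalized outputs have mean beta and standard deviation gamma. This is why gamma and beta control the scale and shift of the activation.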

The authors recommend initializing gamma to a small value and found 0.1 to be the value that did not cause vanishing gradients.

To turn on batch normalization during training, do: … where momentum is the same as gamma in the equation above (defaults to 0.1), eps is defined above, and affine is a boolean that determines whether the learnable affine transform is turned off (false) or on (true, the default).

The actual implementation corresponds to the following algorithm, where W[s->q] is the weight matrix from s to q, t indexes the time-step, b[1->q] are the biases leading into q, σ() is Sigmoid, x[t] is the input and s[t] is the output of the module.
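A single GRU step can be written as a plain-Python sketch with scalar weights. It uses standard update and reset gates and the interpolation s[t] = (1-z[t])h[t] + z[t]s[t-1]. The dictionary keys (for example "x_z" for W[x->z]) are hypothetical names for this illustration. The module's exact equations may differ in detail.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step(x, s_prev, p):
    """One step of a scalar GRU; p maps hypothetical weight/bias names to floats."""
    z = sigmoid(p["x_z"] * x + p["s_z"] * s_prev + p["b_z"])  # update gate z[t]
    r = sigmoid(p["x_r"] * x + p["s_r"] * s_prev + p["b_r"])  # reset gate r[t]
    # Candidate h[t]: the reset gate scales the previous output element-wise.
    h = math.tanh(p["x_h"] * x + p["sr_h"] * (s_prev * r) + p["b_h"])
    return (1.0 - z) * h + z * s_prev  # s[t] = (1-z[t])h[t] + z[t]s[t-1]


zero = dict.fromkeys(["x_z", "s_z", "b_z", "x_r", "s_r", "b_r", "x_h", "sr_h", "b_h"], 0.0)
s = gru_step(1.0, 0.8, zero)
keep = dict(zero, b_z=50.0)  # a saturated update gate copies s[t-1] through
```

With all weights at zero, z[t] = 0.5 and h[t] = 0, so s[t] is half of s[t-1]. A large update-gate bias makes the step keep s[t-1] almost unchanged.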

examples/s is measured as the training speed over 1 epoch, so it may include a disk I/O bias.

In the benchmark, GRU uses dropout after the LookupTable, while BGRU, which stands for Bayesian GRU, uses dropout on the inner connections (naming as in Ref. …).

To implement GRU, a simple module is added that cannot be built using only nn modules.

It computes y_i = x_i + b, then negates all components if negate is true.

This is used to implement s[t] = (1-z[t])h[t] + z[t]s[t-1] in the GRU (see Equation (4) above).
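The arithmetic behind this trick can be checked with a short sketch. With b = -1 and negate = true, the module maps z to -(z - 1) = 1 - z, which supplies the (1-z[t]) factor. The function name `add_then_negate` is hypothetical, and the sketch uses plain Python lists rather than Torch tensors.

```python
def add_then_negate(x, b, negate=False):
    """y_i = x_i + b, then negate all components if negate is true."""
    y = [xi + b for xi in x]
    return [-yi for yi in y] if negate else y


z = [0.25, 0.75]      # update gate z[t]
h = [1.0, -1.0]       # candidate h[t]
s_prev = [0.5, 0.5]   # previous output s[t-1]
one_minus_z = add_then_negate(z, -1.0, negate=True)
# s[t] = (1-z[t])h[t] + z[t]s[t-1], element-wise
s = [omz * hi + zi * si for omz, hi, zi, si in zip(one_minus_z, h, z, s_prev)]
```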

As in the GRU, the reset gate is computed based on the current input and previous hidden state, and used to compute a new feature vector, where W[a->b] denotes the weight matrix from activation a to b, t denotes the time step, b[1->a] is the bias for activation a, and s[t-1]r[t] is the element-wise multiplication of the two vectors.

Unlike the GRU, which computes a single update gate (z[t]), MuFuRU computes a weighting over an arbitrary number of composition operators.

A composition operator is any differentiable operator that takes two vectors of the same size, namely the previous hidden state and a new feature vector, and returns a new vector representing the new hidden state.

Ref. A proposes 6 additional operators, all of which operate element-wise. The weighting of each operation is computed via a softmax from the current input and previous hidden state, similar to the update gate in the GRU.

The produced hidden state is then the element-wise weighted sum of the output of each operation.

where p[t][j] is the weighting for operation j at time step t, and the sum in equation 5 is over all operators J.
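The weighted combination can be sketched for a hidden state and a new feature vector given as plain lists. The operator set here (keep, replace, max, min, product) is an illustrative assumption, not the exact list from Ref. A. The per-element operator scores are also passed in directly instead of being computed from the current input and previous hidden state.

```python
import math

# Illustrative element-wise composition operators (an assumed subset for this sketch).
OPERATORS = {
    "keep": lambda s, v: s,
    "replace": lambda s, v: v,
    "max": max,
    "min": min,
    "mul": lambda s, v: s * v,
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mufuru_combine(s_prev, v, logits):
    """Element-wise weighted sum of the operator outputs.

    logits[k] holds one score per operator (in OPERATORS order) for element k.
    """
    names = list(OPERATORS)
    s_new = []
    for k, (s_k, v_k) in enumerate(zip(s_prev, v)):
        p = softmax(logits[k])  # weightings p[t][j] for this element
        s_new.append(sum(p_j * OPERATORS[name](s_k, v_k) for p_j, name in zip(p, names)))
    return s_new


uniform = mufuru_combine([0.5], [2.0], [[0.0] * 5])
mostly_keep = mufuru_combine([0.5], [2.0], [[50.0, 0.0, 0.0, 0.0, 0.0]])
```

With equal scores, the new state is the plain average of the operator outputs (0.5, 2.0, 2.0, 0.5, 1.0), i.e. 1.2. A dominant "keep" score leaves the previous state nearly unchanged. The GRU corresponds to using only "keep" and "replace".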

For example, I could use two Sequencers. Using a Recursor, I can make the same model with a single Sequencer. Actually, the Sequencer will wrap any non-AbstractRecurrent module automatically, so …

Each call to forward will increment the self.step attribute by 1, using a shared-parameter clone for each time-step.

… build a Simple RNN for language modeling. Note: we could very well reimplement the LSTM module using the newer …

Ref. A: Regularizing RNNs by Stabilizing Activations. This module implements the norm-stabilization criterion, which regularizes the hidden states of RNNs by minimizing the difference between the L2-norms of consecutive hidden states.
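The criterion can be sketched from its definition in Ref. A (Krueger and Memisevic): a penalty of beta/T times the sum over t = 1..T of (||h[t]|| - ||h[t-1]||)^2. In this plain-Python sketch, the list of hidden states includes the initial state h[0]. The default beta and the exact scaling used by the module are assumptions.

```python
import math


def norm_stabilizer_penalty(hidden_states, beta=1.0):
    """beta / T * sum over t = 1..T of (||h[t]||_2 - ||h[t-1]||_2) ** 2.

    hidden_states holds h[0], h[1], ..., h[T], initial state first.
    """
    norms = [math.sqrt(sum(v * v for v in h)) for h in hidden_states]
    T = len(hidden_states) - 1
    return beta / T * sum((norms[t] - norms[t - 1]) ** 2 for t in range(1, T + 1))


# Norms 5, 5, 10: only the last transition is penalized.
penalty = norm_stabilizer_penalty([[3.0, 4.0], [0.0, 5.0], [6.0, 8.0]], beta=1.0)
```

Only the norms enter the penalty, so hidden states may change direction freely. A jump in magnitude, such as from 5 to 10, is what gets penalized.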

The Sequencer requires inputs and outputs to be of shape seqlen x batchsize x featsize. The opening { and closing } illustrate that the time-steps are elements of a Lua table, although the Sequencer also accepts full Tensors.

featsize is 1, as there is only one feature dimension per character and each such character is of size 1. So the input in this case is a table of seqlen time-steps, where each time-step is represented by a batchsize x featsize Tensor.
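The seqlen x batchsize x featsize layout can be illustrated with plain nested lists standing in for the Lua table of Tensors. `to_time_major` is a hypothetical helper. It turns batchsize sequences, each a list of featsize-long feature lists, into a list of seqlen time-steps, each holding one batchsize x featsize block.

```python
def to_time_major(batch):
    """Reorder batchsize sequences of equal length into seqlen time-steps.

    batch[b][t] is the featsize-long feature list of sequence b at step t;
    in the result, result[t][b] is that same feature list.
    """
    seqlen = len(batch[0])
    return [[sequence[t] for sequence in batch] for t in range(seqlen)]


# Two character sequences of length 3, one feature (a character id) per character.
batch = [[[1], [2], [3]],
         [[4], [5], [6]]]
steps = to_time_major(batch)  # seqlen = 3 time-steps, each a 2 x 1 block
```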

For example, rnn, an instance of nn.AbstractRecurrent, can forward an input sequence one time-step at a time. Equivalently, we can use a Sequencer to forward the entire input sequence at once. We can also forward Tensors instead of Tables. The Sequencer can also take non-recurrent Modules.
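Forwarding one time-step at a time and forwarding the whole sequence at once are equivalent, as this plain-Python sketch shows. `ScalarRNN` is a hypothetical stand-in for an nn.AbstractRecurrent instance, and `sequencer_forward` is a hypothetical stand-in for a Sequencer. Neither reproduces the library's API.

```python
import math


class ScalarRNN:
    """Hypothetical recurrent module: each call to forward is one time-step."""

    def __init__(self, w_x, w_h):
        self.w_x, self.w_h = w_x, w_h
        self.forget()

    def forget(self):
        self.h = 0.0
        self.step = 1

    def forward(self, x):
        self.h = math.tanh(self.w_x * x + self.w_h * self.h)
        self.step += 1
        return self.h


def sequencer_forward(module, sequence):
    """Forward an entire sequence by calling module.forward once per time-step."""
    return [module.forward(x) for x in sequence]


sequence = [1.0, -0.5, 2.0]
rnn = ScalarRNN(0.5, 0.9)
one_at_a_time = [rnn.forward(x) for x in sequence]
all_at_once = sequencer_forward(ScalarRNN(0.5, 0.9), sequence)
```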

When mode='neither' (the default behavior of the class), the Sequencer will additionally call forget before each call to forward.
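The effect of calling forget before each forward can be sketched with a hypothetical `forget_each_call` flag, which here plays the role of mode='neither'. The other modes are not covered by this sketch, and the class names are illustrative.

```python
import math


class ScalarRNN:
    """Hypothetical recurrent module whose state persists across calls."""

    def __init__(self, w_x, w_h):
        self.w_x, self.w_h = w_x, w_h
        self.forget()

    def forget(self):
        self.h = 0.0

    def forward(self, x):
        self.h = math.tanh(self.w_x * x + self.w_h * self.h)
        return self.h


class SketchSequencer:
    """Hypothetical sequencer; forget_each_call=True mimics mode='neither'."""

    def __init__(self, module, forget_each_call=True):
        self.module = module
        self.forget_each_call = forget_each_call

    def forward(self, sequence):
        if self.forget_each_call:
            self.module.forget()  # every call starts from a fresh state
        return [self.module.forward(x) for x in sequence]


fresh = SketchSequencer(ScalarRNN(0.5, 0.9))
first, second = fresh.forward([1.0, 2.0]), fresh.forward([1.0, 2.0])
carry = SketchSequencer(ScalarRNN(0.5, 0.9), forget_each_call=False)
first_carry, second_carry = carry.forward([1.0, 2.0]), carry.forward([1.0, 2.0])
```

With forget called before each forward, repeated calls on the same sequence give identical outputs. Without it, the second call starts from the state left by the first.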