The Gated Recurrent Unit lies somewhere between the LSTM and the
RRNN in complexity. Like the RRNN, its hidden state is
updated at each time step to be a linear interpolation between the previous
hidden state, \(h_{t-1}\), and a “target” hidden state, \(\tilde{h}_t\).
The interpolation is modulated by an “update gate” that serves the same
purpose as the rate gates in the RRNN. Like the LSTM, the
target hidden state can also be reset using a dedicated “reset gate”. All gates in
this layer are activated based on the current input as well as the previous
hidden state.
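
To make this gated interpolation concrete, here is a minimal NumPy sketch
with made-up values; the names h_prev, h_target, and z are illustrative
only and are not part of this layer's API:

    import numpy as np

    # Illustrative values for a 3-unit hidden state (assumed, not taken
    # from the layer itself).
    h_prev = np.array([0.2, -0.5, 0.9])    # previous hidden state h_{t-1}
    h_target = np.array([1.0, 0.0, -1.0])  # "target" hidden state
    z = np.array([0.1, 0.5, 0.9])          # update gate values, in (0, 1)

    # Per-unit linear interpolation: where z is near 0 the old state is
    # kept; where z is near 1 the state moves toward the target.
    h_new = (1 - z) * h_prev + z * h_target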

The update equations in this layer are largely those given by [Chu14], page
4, except for the addition of a hidden bias term. They are:

\[
\begin{aligned}
r_t &= \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r) \\
z_t &= \sigma(x_t W_{xz} + h_{t-1} W_{hz} + b_z) \\
\tilde{h}_t &= g(x_t W_{xh} + (r_t \odot h_{t-1}) W_{hh} + b_h) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
\]

Here, \(g(\cdot)\) is the activation function for the layer, and
\(\sigma(\cdot)\) is the logistic sigmoid, which ensures that the two
gates in the layer are limited to the open interval (0, 1). The symbol
\(\odot\) indicates elementwise multiplication.
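
As a concrete reference for these equations, the following is a minimal
single-step sketch in NumPy. The weight and bias names mirror the equations
above, but the choice of \(\tanh\) for \(g(\cdot)\) and the vector shapes
are assumptions for illustration, not this layer's actual implementation:

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    def gru_step(x_t, h_prev,
                 W_xr, W_hr, b_r,   # reset-gate parameters
                 W_xz, W_hz, b_z,   # update-gate parameters
                 W_xh, W_hh, b_h,   # target-state parameters
                 g=np.tanh):
        """One GRU update; x_t has shape (n_in,), h_prev (n_hid,)."""
        r = sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)  # reset gate
        z = sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)  # update gate
        pre = x_t @ W_xh + (r * h_prev) @ W_hh + b_h   # pre-activation
        hid = g(pre)                                   # target hidden state
        out = (1 - z) * h_prev + z * hid               # interpolated state
        return pre, hid, out

The three return values correspond to the “pre”, “hid”, and “out” outputs
described under Returns below.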

Parameters:

inputs : dict of Theano expressions

Symbolic inputs to this layer, given as a dictionary mapping string
names to Theano expressions. See base.Layer.connect().

Returns:

outputs : dict of Theano expressions

A map from string output names to Theano expressions for the outputs
from this layer. This layer type generates a “pre” output that gives
the unit activity before applying the layer’s activation function, a
“hid” output that gives the post-activation values before applying
the rate mixing, and an “out” output that gives the overall output.

updates : sequence of update pairs

A sequence of updates to apply to this layer’s state inside a Theano
function.
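
For orientation, here is a hedged sketch of how these return values might
be consumed. Only the connect() contract described above is assumed; the
input key name 'x' and the variable layer (an already-constructed instance
of this layer class) are hypothetical:

    import theano
    import theano.tensor as TT

    # Symbolic input; the axis layout depends on the surrounding model.
    x = TT.tensor3('x')

    # `layer` is assumed to be a constructed instance of this layer class,
    # and 'x' an input name it accepts; both names are illustrative.
    outputs, updates = layer.connect({'x': x})

    # The update pairs must be threaded into any compiled Theano function
    # that evaluates the layer's outputs.
    f = theano.function([x], outputs['out'], updates=updates)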