Define the calculations that an LSTM unit performs in a single time step.
This function itself is not a recurrent layer, so it cannot be applied
directly to sequence input. It is always used inside recurrent_group
(see layers.py for more details), typically to implement an attention
mechanism.

Please refer to Generating Sequences With Recurrent Neural Networks
for more details about LSTM. The link is as follows:
.. _Link: https://arxiv.org/abs/1308.0850

lstm_group is a recurrent layer group version of Long Short-Term Memory. It
performs exactly the same calculation as the lstmemory layer (see lstmemory in
layers.py for the math). The key benefit is that the LSTM memory cell states,
or hidden states, of every time step are accessible to the user. This is
especially useful in attention models. If you do not need to access the
internal states of the LSTM, but merely use its outputs, it is recommended
to use lstmemory, which is faster than lstmemory_group.

NOTE: In PaddlePaddle’s implementation, the following input-to-hidden
multiplications, \(W_{xi}x_{t}\), \(W_{xf}x_{t}\), \(W_{xc}x_{t}\), and
\(W_{xo}x_{t}\), are not computed inside lstmemory_unit in order to
speed up the calculation. Consequently, an additional mixed layer with
full_matrix_projection must be included before lstmemory_unit is called.
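
A minimal sketch of this pattern, using the mixed, full_matrix_projection,
lstmemory_unit and recurrent_group helpers referenced in this document;
seq_input stands for a sequence input layer, and the size of 256 and the
4x projection width are illustrative assumptions:

def lstm_step(x):
    # project the input first: the four input-to-hidden multiplications
    # (W_xi, W_xf, W_xc, W_xo) are folded into one full_matrix_projection,
    # hence the assumed 4 * size projection width
    proj = mixed(size=256 * 4,
                 input=[full_matrix_projection(input=x)])
    # lstmemory_unit then performs the single-time-step LSTM calculation
    return lstmemory_unit(input=proj, size=256)

# iterate the step function over the sequence; the output of every time
# step is exposed, e.g. for an attention mechanism
lstm_out = recurrent_group(step=lstm_step, input=seq_input)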

A bidirectional_lstm is a recurrent unit that iterates over the input
sequence in both forward and backward order, and then concatenates the two
outputs to form the final output. Concatenation is not the only way to
combine the two outputs, however; you can also, for example, simply add
them together.

Please refer to Neural Machine Translation by Jointly Learning to Align
and Translate for more details about the bidirectional LSTM.
The link is as follows:
.. _Link: https://arxiv.org/pdf/1409.0473v3.pdf

The example usage is:

bi_lstm = bidirectional_lstm(input=[input1], size=512)

Parameters:

name (basestring) – bidirectional lstm layer name.

input (paddle.v2.config_base.Layer) – input layer.

size (int) – lstm layer size.

return_seq (bool) – If set False, outputs of the last time step are
concatenated and returned.
If set True, the entire output sequences that are
processed in forward and backward directions are
concatenated and returned.
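
For example, under the same call style as the usage example above (layer
names are illustrative):

# return the entire forward and backward output sequences, e.g. to feed an
# attention mechanism
bi_lstm_seq = bidirectional_lstm(input=[input1], size=512, return_seq=True)

# return only the concatenated outputs of the last time step, e.g. as a
# fixed-length encoding of the whole sequence
bi_lstm_last = bidirectional_lstm(input=[input1], size=512, return_seq=False)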

Define the calculations that a gated recurrent unit performs in a single time
step. This function itself is not a recurrent layer, so it cannot be applied
directly to sequence input. It is almost always used inside recurrent_group
(see layers.py for more details) to implement an attention mechanism.

gru_group is a recurrent layer group version of the Gated Recurrent Unit. It
performs exactly the same calculation as the grumemory layer. The key
benefit is that the GRU hidden states of every time step are accessible to
the user. This is especially useful in attention models. If you do not need
to access any internal state, but merely use the outputs of a GRU, it is
recommended to use grumemory, which is faster.
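
An illustrative sketch of using gru_group to expose the per-step hidden
states (the layer names and the size of 256 are assumptions):

# 'proj' is assumed to be a mixed/full_matrix_projection of the raw input,
# since the W x_t multiplication is not computed inside the group
# (see the note below)
gru_states = gru_group(input=[proj], size=256)
# gru_states contains the hidden state of every time step, which can be
# used, e.g., by an attention mechanism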

You may see gru_step and grumemory in layers.py, and gru_unit, gru_group,
and simple_gru in networks.py. The reason there are so many interfaces is
that there are two ways to implement a recurrent neural network. One way is
to use a single complete layer that implements the RNN (including simple RNN,
GRU, and LSTM) over multiple time steps, such as recurrent, lstmemory, and
grumemory. However, the multiplication \(W x_t\) is not computed in these
layers; see the details in their interfaces in layers.py.
The other way is to use a recurrent group, which assembles a series of
layers to compute the RNN step by step. This way is flexible for an
attention mechanism or other complex connections.

gru_step: computes the RNN for only one step. It needs a memory as input
and can be used in a recurrent group.

gru_unit: a wrapper of gru_step with memory.

gru_group: a GRU cell implemented by a combination of multiple layers in a
recurrent group. However, \(W x_t\) is not computed in the group.

grumemory: a GRU cell implemented by one layer, which does the same
calculation as gru_group and is faster than gru_group.
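
A sketch of the complete-layer way, mirroring the LSTM note above; the 3x
projection width for the GRU gates, the size of 256, and the layer names
are illustrative assumptions:

# the W x_t multiplication is done outside grumemory, so project the raw
# sequence input to 3 * size first (update gate, reset gate, candidate state)
proj = mixed(size=256 * 3,
             input=[full_matrix_projection(input=seq_input)])

# grumemory then iterates over the whole sequence in a single, faster layer
gru_out = grumemory(input=proj, size=256)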

encoded_proj (paddle.v2.config_base.Layer) – the attention weight is computed by a feed-forward
neural network that has two inputs: the decoder’s hidden state
from the previous time step and the encoder’s output.
encoded_proj is the output of the feed-forward network applied
to the encoder’s output. Here it is pre-computed outside
simple_attention for speed.

decoder_state (paddle.v2.config_base.Layer) – hidden state of decoder in previous time step
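
A minimal usage sketch of simple_attention with these inputs;
encoded_sequence (the raw encoder output), decoder_size and
decoder_prev_state are illustrative assumptions not described in the
excerpt above:

# pre-compute the projection of the encoder output once, outside
# simple_attention, for speed
encoded_proj = mixed(size=decoder_size,
                     input=[full_matrix_projection(input=encoded_sequence)])

# inside the decoder step: combine the previous decoder state with the
# pre-computed encoder projection to obtain a context vector
context = simple_attention(encoded_sequence=encoded_sequence,
                           encoded_proj=encoded_proj,
                           decoder_state=decoder_prev_state)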