This function computes a stacked uni-directional GRU over sequences.
It takes an initial hidden state \(h_0\), an input
sequence \(x\), weight matrices \(W\), and bias vectors \(b\),
and computes the hidden state \(h_t\) at each time step \(t\)
from the input \(x_t\).
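
This section refers to W_j and b_j "in the equation" without reproducing that equation here. The following is a minimal pure-Python sketch of one time step for a single example. It assumes a specific gate layout, which should be checked against the equation that ws[i][j] and bs[i][j] refer to: the reset gate r uses W_0 and W_3, the update gate z uses W_1 and W_4, and the candidate state uses W_2 and W_5, with r applied to the recurrent term. Following the (I,N) shape given below, an input matrix is multiplied from the left by the input row vector. The names gru_step and vecmat are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]  # a list of rows


def vecmat(v: Sequence[float], m: Matrix) -> Vector:
    """Row vector times matrix: len(v) == len(m); the result has len(m[0]) entries."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gru_step(x: Sequence[float], h: Sequence[float],
             W: Sequence[Matrix], b: Sequence[Vector]) -> Vector:
    """One GRU update for a single example; W holds six matrices, b six vectors.

    Assumed gate layout (not confirmed by this section):
      r  = sigmoid(x W0 + h W3 + b0 + b3)
      z  = sigmoid(x W1 + h W4 + b1 + b4)
      h~ = tanh(x W2 + b2 + r * (h W5 + b5))
      h' = (1 - z) * h~ + z * h
    """
    xr, xz, xn = (vecmat(x, W[j]) for j in range(3))     # input projections, (I,N)
    hr, hz, hn = (vecmat(h, W[j]) for j in range(3, 6))  # recurrent projections, (N,N)
    n = len(h)
    r = [sigmoid(xr[k] + hr[k] + b[0][k] + b[3][k]) for k in range(n)]
    z = [sigmoid(xz[k] + hz[k] + b[1][k] + b[4][k]) for k in range(n)]
    h_tilde = [math.tanh(xn[k] + b[2][k] + r[k] * (hn[k] + b[5][k])) for k in range(n)]
    return [(1.0 - z[k]) * h_tilde[k] + z[k] * h[k] for k in range(n)]
```

Calling gru_step once per time step, feeding each result back in as h, produces the sequence of hidden states \(h_t\). The batched function does the same thing for a whole mini-batch in one call.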

Because the function accepts a whole sequence, it computes \(h_t\) for all
\(t\) in a single call. Each layer requires six weight matrices and six bias
vectors, so with \(S\) layers you need to
prepare \(6S\) weight matrices and \(6S\) bias vectors.
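
The \(6S\) count can be extended to a count of scalar parameters using the shapes listed for ws and bs. Only the first layer's three input matrices are (I,N), and every other matrix is (N,N). The following is a minimal sketch; count_gru_parameters is a hypothetical helper, not part of any library.

```python
def count_gru_parameters(n_layers: int, in_size: int, n_units: int) -> dict:
    """Count weight matrices, bias vectors, and scalar parameters for S stacked layers."""
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    # First layer: three (I, N) input matrices plus three (N, N) recurrent matrices.
    first = 3 * in_size * n_units + 3 * n_units * n_units
    # Each later layer: six (N, N) matrices, because its input has width N.
    later = 6 * n_units * n_units * (n_layers - 1)
    # Six (N,) bias vectors per layer.
    biases = 6 * n_layers * n_units
    return {
        "matrices": 6 * n_layers,
        "vectors": 6 * n_layers,
        "scalars": first + later + biases,
    }
```

For example, two layers with I = 3 and N = 4 need 12 matrices, 12 vectors, and 84 + 96 + 48 = 228 scalars.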

If the number of layers n_layers is greater than \(1\), the input
of the k-th layer is the hidden state h_t of the (k-1)-th layer.
The input to every layer except the first therefore has N columns (the
hidden size) rather than I (the input size), so its shape may differ from
that of the first layer's input.
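
The following sketch shows the layer-to-layer data flow for a single example, with the batch dimension omitted. The run_layer callback is a hypothetical stand-in for one GRU layer. It receives the layer index, the input sequence, and that layer's initial state hx[k], and returns the hidden state at every step together with the final one.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical single-layer runner: (layer index, input sequence, initial state)
# -> (hidden state at every step, final hidden state).
LayerFn = Callable[[int, Sequence[Vector], Vector], Tuple[List[Vector], Vector]]


def run_stacked(xs: Sequence[Vector], hx: Sequence[Vector],
                run_layer: LayerFn) -> Tuple[List[Vector], List[Vector]]:
    """Run len(hx) layers; layer k consumes the hidden states of layer k-1."""
    inputs: List[Vector] = list(xs)  # width I for the first layer
    hy: List[Vector] = []
    for k, h0 in enumerate(hx):      # hx[k] is the initial state of layer k
        outputs, h_last = run_layer(k, inputs, h0)
        hy.append(h_last)
        inputs = outputs             # width N from here on
    return hy, inputs                # inputs now holds the last layer's outputs
```

The pair returned mirrors the (hy, ys) result described under Returns. hy collects each layer's final state, and ys is the last layer's output sequence.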

hx (Variable) – Variable holding the stacked hidden states.
Its shape is (S,B,N), where S is the number of layers and is
equal to n_layers, B is the mini-batch size, and N is the
dimension of the hidden units.

ws (list of list of Variable) – Weight matrices.
ws[i] represents the weights for the i-th layer.
Each ws[i] is a list containing six matrices.
ws[i][j] corresponds to W_j in the equation.
Only ws[0][j] where 0<=j<3 has shape (I,N), where I is the size of
the input, because these are multiplied with the input variables.
All other matrices have shape (N,N).

bs (list of list of Variable) – Bias vectors.
bs[i] represents the biases for the i-th layer.
Each bs[i] is a list containing six vectors.
bs[i][j] corresponds to b_j in the equation.
Each vector has shape (N,), where N is the dimension of the
hidden units.
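
The following sketch builds zero-filled ws and bs with the shapes documented above, using nested Python lists so that the layout is easy to inspect. make_zero_params is a hypothetical helper. The actual function takes Variables or arrays, not lists, so this illustrates the layout only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def make_zero_params(n_layers: int, in_size: int,
                     n_units: int) -> Tuple[List[List[Matrix]], List[List[List[float]]]]:
    """Build ws and bs as nested lists with the documented shapes, filled with 0.0."""
    def zeros(rows: int, cols: int) -> Matrix:
        return [[0.0] * cols for _ in range(rows)]

    ws, bs = [], []
    for i in range(n_layers):
        layer_w = []
        for j in range(6):
            # Only ws[0][0..2] see the raw input, hence (I, N); all others are (N, N).
            rows = in_size if (i == 0 and j < 3) else n_units
            layer_w.append(zeros(rows, n_units))
        ws.append(layer_w)
        bs.append([[0.0] * n_units for _ in range(6)])  # six (N,) vectors per layer
    return ws, bs
```

With n_layers = 2, ws and bs each hold 12 entries in total. This is the \(6S\) count stated earlier.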

xs (list of Variable) – A list of Variable
holding input values. Each element xs[t] holds the input value
for time t. Its shape is (B_t,I), where B_t is the
mini-batch size for time t and I is the size of the input units.
Note that this function supports variable-length sequences.
When sequences have different lengths, sort them in descending
order of length and transpose the sorted sequences.
transpose_sequence() transposes a list
of Variable() objects holding sequences.
Thus xs must satisfy
xs[t].shape[0]>=xs[t+1].shape[0].
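
The following pure-Python sketch illustrates the sort-then-transpose step on plain lists. Each scalar stands in for an input vector of length I. sort_and_transpose is a hypothetical helper that illustrates the idea; it is not transpose_sequence() itself. It also returns the sort order so that results can be mapped back to the original sequences.

```python
from typing import List, Sequence, Tuple


def sort_and_transpose(seqs: Sequence[Sequence[float]]) -> Tuple[List[List[float]], List[int]]:
    """Sort sequences by length (longest first), then regroup them by time step.

    Returns (xs, order): xs[t] holds the t-th element of every sequence long
    enough to have one, and order[k] is the original index of the k-th
    sequence after sorting.
    """
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    sorted_seqs = [seqs[i] for i in order]
    max_len = len(sorted_seqs[0]) if sorted_seqs else 0
    # Because of the sort, the sequences that reach step t form a prefix of the batch.
    xs = [[s[t] for s in sorted_seqs if len(s) > t] for t in range(max_len)]
    return xs, order
```

For the sequences [[1, 2], [3, 4, 5], [6]], this yields xs = [[3, 1, 6], [4, 2], [5]], whose batch sizes 3, 2, 1 satisfy the non-increasing condition above.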

Returns

This function returns a tuple containing two elements,
hy and ys.

hy is the updated hidden state, whose shape is the same as that of hx.

ys is a list of Variable. Each element
ys[t] holds the hidden states of the last layer corresponding
to the input xs[t]. Its shape is (B_t,N), where B_t is the
mini-batch size for time t and N is the size of the hidden
units. Note that B_t is the same value as xs[t].shape[0].
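
Each ys[t] has the same B_t rows as xs[t], in the same sorted order. Per-sequence outputs can therefore be recovered by undoing the transpose and then the sort. The following sketch works on plain lists with a hypothetical untranspose helper. It assumes order is the permutation used when the inputs were sorted, so that order[k] is the original index of the k-th sorted sequence.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def untranspose(ys: Sequence[Sequence[T]], order: Sequence[int]) -> List[List[T]]:
    """Regroup time-major outputs into per-sequence lists in the original order."""
    n = len(ys[0]) if ys else 0
    per_sorted: List[List[T]] = [[] for _ in range(n)]
    for step in ys:
        # len(step) == B_t; its rows are the first B_t sequences in sorted order.
        for k, value in enumerate(step):
            per_sorted[k].append(value)
    result: List[List[T]] = [[] for _ in range(n)]
    for k, original_index in enumerate(order):
        result[original_index] = per_sorted[k]
    return result
```

Applied to [[3, 1, 6], [4, 2], [5]] with order [1, 0, 2], this restores [[1, 2], [3, 4, 5], [6]].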