The network is trained using backpropagation through time (BPTT) with a batch size of 16 and a learning rate of 0.01. During backpropagation, these steps are carried out in reverse; however, the supplied memory is only updated when the backward pass through time reaches the first item in the sequence.
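The sketch below illustrates this scheme under several assumptions that the text does not state. The network is reduced to a single scalar tanh recurrence, the loss is a squared error on the final state, the supplied memory is treated as one learnable initial state `h_0` shared across the batch, and the update is plain SGD. The names `forward`, `bptt` and `train_step` are hypothetical. The sketch shows two things: the memory's gradient only appears at the last step of the backward loop, which is the step for the first item of the sequence, and gradients are averaged over a batch of 16 before an update with learning rate 0.01.

```python
import math


def forward(params, xs, memory):
    """Run a scalar tanh RNN over one sequence, starting from the supplied memory."""
    w, u = params["w"], params["u"]
    hs = [memory]  # hs[0] is the supplied memory (assumed: the initial state h_0)
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def bptt(params, xs, memory, target):
    """Backpropagate L = 0.5 * (h_T - target)^2 through time.

    Returns the loss, gradients for w and u, and the gradient for the memory.
    The memory gradient is only produced when the backward loop reaches t == 0,
    i.e. the first item of the sequence.
    """
    w = params["w"]
    hs = forward(params, xs, memory)
    grads = {"w": 0.0, "u": 0.0}
    dh = hs[-1] - target  # dL/dh_T
    dmemory = 0.0
    for t in reversed(range(len(xs))):  # walk the sequence in reverse
        h_prev, h = hs[t], hs[t + 1]
        dz = dh * (1.0 - h * h)  # derivative through tanh
        grads["w"] += dz * h_prev
        grads["u"] += dz * xs[t]
        dh = dz * w  # gradient flowing back to h_{t-1}
        if t == 0:
            dmemory = dh  # only the first item touches the supplied memory
    loss = 0.5 * (hs[-1] - target) ** 2
    return loss, grads, dmemory


def train_step(params, memory, batch, lr=0.01):
    """One SGD step: average BPTT gradients over the batch, then update params and memory."""
    sum_grads = {k: 0.0 for k in params}
    sum_dmemory = 0.0
    total_loss = 0.0
    for xs, target in batch:
        loss, grads, dmemory = bptt(params, xs, memory, target)
        total_loss += loss
        for k in sum_grads:
            sum_grads[k] += grads[k]
        sum_dmemory += dmemory
    n = len(batch)
    new_params = {k: params[k] - lr * sum_grads[k] / n for k in params}
    new_memory = memory - lr * sum_dmemory / n
    # The returned loss is the mean loss *before* this update.
    return new_params, new_memory, total_loss / n
```

For example, calling `train_step` repeatedly on batches of 16 `(sequence, target)` pairs updates `w`, `u` and the memory together. Because the memory only enters the computation as `h_0`, its gradient is exactly the gradient that arrives at the start of the sequence, which is why it is only available once the backward pass has reached the first item.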