Generally, the error decreases as training proceeds over more epochs,
but it may start to increase on the validation data set as the network
begins to overfit the training data. In the default setup, training
stops after six consecutive increases in validation error, and the
best performance is taken from the epoch with the lowest validation
error.
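
The stopping rule described above can be illustrated with a minimal Python sketch. It is not the toolbox's actual implementation: the function name `early_stopping_epoch`, the parameter name `max_fail`, and the reading of "increase" as "higher than the previous epoch's validation error" are assumptions made for illustration only.

```python
def early_stopping_epoch(val_errors, max_fail=6):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation errors.

    Assumption: an "increase" means the validation error at an epoch is
    strictly higher than at the previous epoch. Epochs are 0-indexed.
    """
    best_epoch = 0
    best_error = val_errors[0]
    fails = 0  # number of consecutive increases seen so far

    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] > val_errors[epoch - 1]:
            fails += 1
        else:
            fails = 0  # any non-increase resets the counter

        # Track the epoch with the lowest validation error seen so far.
        if val_errors[epoch] < best_error:
            best_error = val_errors[epoch]
            best_epoch = epoch

        if fails >= max_fail:
            return epoch, best_epoch  # stop early

    # The error never rose max_fail times in a row: training ran to the end.
    return len(val_errors) - 1, best_epoch


# Hypothetical validation errors: they fall until epoch 3, then rise steadily.
errors = [1.0, 0.8, 0.6, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85]
stop_epoch, best_epoch = early_stopping_epoch(errors)
print(stop_epoch, best_epoch)
```

With these made-up values, the sixth consecutive increase occurs at epoch 9, so training stops there, while the reported performance comes from epoch 3, where the validation error was lowest. In practice the network weights from that best epoch would also need to be saved so they can be restored after stopping.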