There is a lot written on StackExchange about the train-validation-test split of a data set, but I am confused by the following. Assume I train a model using the train set, then choose a model using the validation set, and finally test the model on the test set and report the "error".

The question: after validation, should one retrain the model on the whole train+validation set?

1 Answer

It's typical (though not universal) to retrain on the combined set, provided your model isn't too complex or time-consuming to train, and especially if your data is scarce. This way you don't throw away any (potentially useful) data. If your model is too complex or expensive to train, then it's best to leave the sets as they are.
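A minimal sketch of that workflow with scikit-learn (the data, the 60/20/20 split ratios, and the candidate `C` values are all illustrative assumptions, not part of the question):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Toy data; substitute your own dataset.
X, y = make_classification(n_samples=1000, random_state=0)

# 60/20/20 train/validation/test split.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Model selection: pick the regularization strength that scores best on the validation set.
best_C, best_score = None, -np.inf
for C in [0.01, 0.1, 1.0, 10.0]:
    model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_val, y_val)
    if score > best_score:
        best_C, best_score = C, score

# Retrain the chosen model on train+validation, then report error on the untouched test set.
final = LogisticRegression(C=best_C, max_iter=1000).fit(
    np.vstack([X_train, X_val]), np.concatenate([y_train, y_val])
)
test_error = 1.0 - final.score(X_test, y_test)
```

Note that the test set is used exactly once, at the very end; only the hyper-parameter choice, not the test error, is allowed to depend on the validation set.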

If you're applying k-fold cross-validation, you'll typically train on the whole training set anyway: after hyper-parameter tuning or model selection there is no single static validation set left over, because the validation folds rotate through the training data. So this problem really only arises when you use a static validation set.
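scikit-learn's `GridSearchCV` illustrates this: with `refit=True` (the default), the best model found by cross-validation is automatically retrained on the entire training set. The data and parameter grid below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 5-fold CV over C; refit=True (the default) retrains the best model
# on the entire training set once the search is done.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
    refit=True,
).fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
```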

But there are times when it's better not to retrain. For instance, when training neural networks, the validation set is used for early stopping. If you retrain the network on the whole set, you no longer have a way to check when to stop early.
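One way to see this in code: scikit-learn's `MLPClassifier` with `early_stopping=True` must itself carve a validation fraction out of whatever data it is given, precisely because early stopping needs held-out data to monitor. The network size and stopping parameters below are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, random_state=0)

# early_stopping=True makes the classifier hold out validation_fraction of the
# training data and stop when the validation score stops improving -- so a
# validation set is still required even when "training on everything".
clf = MLPClassifier(
    hidden_layer_sizes=(32,),
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=10,
    max_iter=500,
    random_state=0,
).fit(X, y)
```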