I was listening to Prof. Abu-Mostafa’s lecture on validation, and at one point he mentioned that we can have more than one validation set (if our data permits) to make multiple choices on different models and/or parameters. That got me thinking about how one might go about doing that.

Say we have just enough data to carve out a training set and two validation sets. We have two decisions to make: which order of polynomial to fit the data with (2nd, 3rd, or 5th order), and what value of the regularization parameter to use (lambda = 0.01, 0.1, 1, or 2). What is the best way of doing this? I can think of the following approaches, and would appreciate any feedback on which, if any, might be best.

1. We have a total of 12 combinations of choices. So we work out the 12 candidate hypotheses on the training set, and combine the 2 validation sets into one set to choose the best hypothesis with. This may “contaminate” the validation set a bit more than we want, as the number of hypotheses is not small. We then combine the training and validation sets and retrain to produce the final hypothesis with the best polynomial order and the best lambda.

2. We use the training data to first produce 3 hypotheses based on polynomial order, with a fixed value of lambda chosen from among its four possible values. We decide the best polynomial order using the first validation set. Then we use the training set again with the just-decided polynomial order to produce 4 hypotheses based on the values of lambda. We use the second validation set to decide the best value of lambda. Finally, we combine all the data to come up with the best hypothesis with the chosen polynomial order and the chosen lambda value.

3. We do the same as option 2, but reverse the order of deciding: find lambda first, then the polynomial order.

4. We do the same as options 2 or 3, but after the first decision, we combine the training set and the first validation set to produce a second, bigger training set for the second decision.
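Option 1 can be sketched as a plain grid search. This is a minimal sketch with made-up 1-D data, illustrative split sizes, and a regularized least-squares polynomial fit standing in for whatever learning algorithm is actually used; none of these specifics come from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression data; the split sizes are illustrative.
x = rng.uniform(-1, 1, 100)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(100)
x_tr, y_tr = x[:60], y[:60]      # training set
x_val, y_val = x[60:], y[60:]    # the two validation sets, merged

def fit_ridge_poly(x, y, degree, lam):
    """Regularized least-squares fit of a degree-`degree` polynomial."""
    Z = np.vander(x, degree + 1)  # polynomial feature matrix
    w = np.linalg.solve(Z.T @ Z + lam * np.eye(degree + 1), Z.T @ y)
    return w

def mse(w, x, y, degree):
    """Mean squared error of weights w on (x, y)."""
    return np.mean((np.vander(x, degree + 1) @ w - y) ** 2)

# Option 1: evaluate all 3 x 4 = 12 (degree, lambda) combinations
# on the merged validation set and keep the best one.
d_star, lam_star = min(
    ((d, lam) for d in (2, 3, 5) for lam in (0.01, 0.1, 1, 2)),
    key=lambda c: mse(fit_ridge_poly(x_tr, y_tr, c[0], c[1]),
                      x_val, y_val, c[0]),
)

# Retrain on all the data with the winning combination.
w_final = fit_ridge_poly(x, y, d_star, lam_star)
```

The same helpers could be reused for options 2–4 by searching one parameter at a time against the appropriate validation set.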

These are the four options I can think of. In option 2, I am not sure whether arbitrarily fixing a value of lambda while we are deciding the best polynomial order is wise. I am also not sure whether using the same training set again and again is OK (wouldn't it get horribly contaminated?).

I think I would first try the 3rd-order polynomial model on the training set and then test it on the first "validation" set. If I am satisfied with the result, I stop there and choose the 3rd-order model. If I find the 3rd order is overfitting (low training error, high test error), my next try is 2nd order; if I find it is underfitting (high training error, high test error), my next try is 5th order. As you may notice, I am trying to use a binary search here. Obviously, it is hard to determine whether a result is satisfactory; to me it looks like that depends on the specific application.
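The overfit/underfit decision above could be sketched as a small helper. The tolerance is an assumed, application-specific threshold (as noted, deciding what counts as "satisfactory" is the hard part), and the function name is my own:

```python
def diagnose(e_train, e_val, tol=0.1):
    """Crude accept/overfit/underfit test on training and validation error.

    `tol` is an assumed application-specific error threshold.
    """
    if e_val <= tol:
        return "accept"        # satisfied: stop and keep this model
    if e_train <= tol:
        return "overfitting"   # low training, high validation error -> lower order
    return "underfitting"      # both errors high -> higher order
```

For example, `diagnose(0.01, 0.5)` flags overfitting, so the search would move to the lower-order model.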

Whichever model my next try is, I will obviously train it on all the training data and then validate it on the first validation set (at this point the first validation set is no longer an unbiased test set).

In the case my next try is 2nd order:

If I am satisfied with the result, I will stop here and choose the 2nd-order model.

If I find the 2nd order is overfitting, I will try to regularize the 2nd order.

If I find the 2nd order is underfitting, I will try to regularize the 3rd order.

In the case my next try is 5th order, I can make similar decisions.

Now, if I do have to regularize one of the three models, I will combine the training set with the first validation set and use the binary search described above on the regularization parameters, validating on the second validation set.

In this case, the maximum number of (H, lambda) combinations that need to be validated using binary search is quite small: only 5.
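A binary search over the regularization parameter could look like the sketch below. Note the assumption it rests on: the validation error must be roughly unimodal (U-shaped) in lambda; if that does not hold, a plain grid search over the four candidate values is safer. The function name and signature are mine, not from the post.

```python
def binary_search_lambda(lams, val_error):
    """Pick the lambda minimizing val_error from a sorted candidate list,
    assuming val_error is unimodal in lambda (an assumption!).

    Compares adjacent candidates to halve the search interval.
    """
    lo, hi = 0, len(lams) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if val_error(lams[mid]) <= val_error(lams[mid + 1]):
            hi = mid        # the minimum is at mid or to its left
        else:
            lo = mid + 1    # the minimum is to the right of mid
    return lams[lo]
```

With 4 candidate lambdas this needs at most 2–3 error evaluations instead of 4, which is where the small total count of combinations comes from.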

Finally, I will have what I think is the best choice of model and parameter. It would be great to have a test set to evaluate this final choice on.

--------------------------------------------------------------------

If I have a choice in how to divide the dataset, I will divide it into two parts: one part for training and one part for testing. The test set will be locked away and used only on the single final hypothesis, to report its performance (with a tight bound) to the customer.

Then I will divide the training set into two parts: one part for the first cross-validation, to choose the model, and the second part for the second cross-validation, to choose the regularization parameter. For this process, I will use the same binary search idea described above.
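The splitting scheme described here might be sketched as follows. The dataset size and the exact split proportions are made up for illustration, and I have carved the training portion into a core training set plus the two validation sets discussed earlier in the thread:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                          # illustrative dataset size
idx = rng.permutation(n)         # shuffle before splitting

# First split: lock away a test set, to be touched only once,
# on the single final hypothesis reported to the customer.
test_idx, train_idx = idx[:40], idx[40:]

# Second split: carve the training portion into a core training set
# plus two validation sets (one per decision).
core_idx = train_idx[:100]       # used to fit candidate hypotheses
val1_idx = train_idx[100:130]    # decision 1: polynomial order
val2_idx = train_idx[130:]       # decision 2: regularization lambda
```

Because the test indices never appear in any selection step, the error measured on them at the very end remains an unbiased estimate for the one hypothesis it is applied to.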

--------------------------------------------------------------------

The book also has this statement on validation:

Quote:

In the case of validation, making a choice for few parameters does not overly contaminate the validation estimate of E_out, even if the VC guarantee for these estimates is too weak.

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.