When K is 1, the validation error is not a very good estimate of the out-of-sample error because of the penalty term. Thus, the model chosen from this poor estimate might not be the best one. This explains Expectation[Out-of-Sample Error of g^-_(m*)] < Expectation[Out-of-Sample Error of g_(m*)]. The situation improves somewhat as K increases.

I would also like to check my explanations for the other parts of this exercise.

For part (b), this is what I think:
As K increases, the validation error becomes a better estimate of the out-of-sample error. That explains the initial decrease in Expectation[Out-of-Sample Error of g_(m*)]. Then, as K increases beyond the 'optimal' value, too few points are left for training and the training deteriorates, which explains the rise.

Please let me know if my understanding is correct or not.

For part (a), I can't figure out the initial decrease in Expectation[Out-of-Sample Error of g^-_(m*)]. Any clue on this will be great.

Well, I'm not sure about my understanding, but here is my guess. (If any of it is incorrect, please tell me, especially for (c).)

(a) Because $g^-_{m^*}$ is the hypothesis with the smallest $E_{\text{val}}$ among the $M$ hypotheses, and we already know that $E_{\text{val}}$ is close to $E_{\text{out}}$ for small $M$ and large $K$, increasing $K$ at first makes the selection more reliable, hence the initial decrease. As we set aside more data for validation, we use less data for training, which leads to worse hypotheses, hence the subsequent increase.
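For reference, the "close for small $M$ and large $K$" statement can be written as the usual finite-model validation bound. This is a sketch from memory of the book's model-selection discussion, with $M$ the number of models compared, $K$ the validation set size, and $g^-_{m^*}$ the selected hypothesis trained on $N-K$ points:

```latex
E_{\text{out}}\left(g^-_{m^*}\right) \;\le\; E_{\text{val}}\left(g^-_{m^*}\right) + O\!\left(\sqrt{\frac{\ln M}{K}}\right)
```

The penalty term shrinks as $K$ grows, while $E_{\text{val}}(g^-_{m^*})$ itself tends to grow because fewer points remain for training, which is the trade-off behind the decrease-then-increase shape.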

(b) The reason for the initial decrease is already discussed above. Note that initially $E_{\text{out}}(g_{m^*})$ is very close to $E_{\text{out}}(g^-_{m^*})$, because the size of the training set used to output $g^-_{m^*}$ (namely $N-K$) is very close to the size of the training set used to output $g_{m^*}$ (namely $N$). It then takes rather long for $E_{\text{out}}(g_{m^*})$ to increase again despite the worse hypotheses, because those worse and worse hypotheses $g^-_m$ still lead us to a good enough choice of learning model, until they get so bad that they finally lead us to a worse choice of learning model.

(c) A possible explanation is that when $K$ is small, $g_{m^*}$ and $g^-_{m^*}$ have almost the same training set size and hence almost the same chance of being a good final hypothesis; however, $g^-_{m^*}$ has the guarantee of small $E_{\text{out}}$ through small $E_{\text{val}}$, while $g_{m^*}$ does not have this guarantee. As $K$ increases, though, $g^-_{m^*}$ is trained using less and less data compared to $g_{m^*}$, hence the performance of $g^-_{m^*}$ cannot compete with that of $g_{m^*}$ anymore.
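If anyone wants to check these guesses numerically, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration and is not the setup behind the book's figure: the target sin(pi x), the noise level, the two models (constant vs. line), and the helper names such as `simulate`. It averages the out-of-sample error (measured against the noiseless target) of $g^-_{m^*}$ and of $g_{m^*}$ over many data sets for a given $K$.

```python
import math
import random

# Illustrative assumptions: target sin(pi*x) on [-1, 1], Gaussian noise,
# and two learning models (constant and line). Hypotheses are stored as (a, b)
# meaning h(x) = a*x + b.
GRID = [-1 + 2 * i / 200 for i in range(201)]
TARGET_ON_GRID = [math.sin(math.pi * x) for x in GRID]


def fit_constant(xs, ys):
    """Least-squares constant fit."""
    return (0.0, sum(ys) / len(ys))


def fit_line(xs, ys):
    """Least-squares line fit (falls back to a constant if all x coincide)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0.0:
        return (0.0, my)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return (a, my - a * mx)


MODELS = [fit_constant, fit_line]


def e_out(h):
    """Mean squared error against the noiseless target on a fixed grid."""
    a, b = h
    return sum((a * x + b - f) ** 2 for x, f in zip(GRID, TARGET_ON_GRID)) / len(GRID)


def e_val(h, xs, ys):
    """Mean squared error on the validation points."""
    a, b = h
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def simulate(n, k, trials=500, noise=0.3, seed=0):
    """Return (avg E_out of g^-_{m*}, avg E_out of g_{m*}) for validation size k."""
    if not 1 <= k <= n - 2:
        raise ValueError("need 1 <= k <= n - 2")
    rng = random.Random(seed)
    total_minus = 0.0
    total_full = 0.0
    for _ in range(trials):
        xs = [rng.uniform(-1, 1) for _ in range(n)]
        ys = [math.sin(math.pi * x) + rng.gauss(0, noise) for x in xs]
        x_tr, y_tr = xs[: n - k], ys[: n - k]
        x_va, y_va = xs[n - k:], ys[n - k:]
        # Train each model on N-K points, then pick the one with smallest E_val.
        g_minus = [fit(x_tr, y_tr) for fit in MODELS]
        m_star = min(range(len(MODELS)), key=lambda m: e_val(g_minus[m], x_va, y_va))
        total_minus += e_out(g_minus[m_star])
        # Retrain the selected model on all N points to get g_{m*}.
        total_full += e_out(MODELS[m_star](xs, ys))
    return total_minus / trials, total_full / trials


if __name__ == "__main__":
    for k in (1, 5, 10, 20, 27):
        print(k, simulate(30, k))
```

Running it for a range of $K$ values with $N = 30$ lets you compare the two averages directly. With only two simple models the curves need not look like the ones in the book, so treat it as a way to probe the reasoning in (a) to (c), not as a reproduction of the figure.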

The contents of this forum are to be used ONLY by readers of the Learning From Data book by Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, and participants in the Learning From Data MOOC by Yaser S. Abu-Mostafa. No part of these contents is to be communicated or made accessible to ANY other person or entity.