Exercise

Repeating random trials

In the previous exercise, you implemented a cross validation trial. We call it a trial because it involves randomly assigning cases to the training and testing sets, so the result of the calculation is (somewhat) random.

Since the result of cross validation varies from trial to trial, it's helpful to run many trials so that you can see how much variation there is. As you'll see, this will be a common process as you move through the course.

To simplify things, the cv_pred_error() function in the statisticalModeling package carries out this repetitive process for you. All you need to do is provide one or more models as input to cv_pred_error(); the function will do all the work of creating training and testing sets for each trial and calculating the mean square error for each trial. Easy!
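As a rough sketch of what this looks like, here is one way a call to cv_pred_error() might be set up. The model formula, the dataset, and the ntrials argument name are illustrative assumptions, not details taken from this exercise:

```r
library(statisticalModeling)

# Fit a model on some dataset (formula and data are placeholders)
model <- lm(net ~ age + sex, data = Runners)

# Run several cross-validation trials; each row of the result
# is assumed to hold the mean square error from one trial
trial_errors <- cv_pred_error(model, ntrials = 5)
trial_errors
```

Because each trial re-draws the training/testing split at random, the error values will differ from row to row, which is exactly the variation you want to inspect.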

In this exercise, you'll check whether the prediction error calculated from the training data is consistently different from the cross-validated prediction error. To that end, you'll calculate the in-sample error using only the training data. Then, you'll do the cross validation and use a t-test to see if the in-sample error is statistically different from the cross-validated error.
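The comparison described above might be sketched like this. The column name mse on the cross-validation output is an assumption, as is the use of residuals() to get the in-sample errors:

```r
# In-sample mean square error, computed from the training data only
in_sample_error <- mean(residuals(model)^2)

# Cross-validated errors over several random trials
trial_errors <- cv_pred_error(model)

# One-sample t-test: are the cross-validated errors centered
# somewhere different from the in-sample error?
t.test(trial_errors$mse, mu = in_sample_error)
```

A small p-value here would suggest that the in-sample error systematically understates the error you'd see on new data, which is the usual motivation for cross validation.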