Lecture: Model Evaluation, Error and Inference (Advanced Data Analysis from an Elementary Point of View)

Lecture 3, Model evaluation: error and inference. Statistical models have
three main uses: as ways of summarizing (reducing, compressing) the data; as
scientific models, facilitating actual scientific inference; and as
predictors. Both summarizing and scientific inference are linked to prediction
(though in different ways), so we'll focus on prediction. In particular, for
now we focus on the expected error of prediction, under some particular
measure of error. The distinction between in-sample error and generalization
error, and why the former is almost invariably optimistic about the latter.
Over-fitting. Examples of just how spectacularly one can over-fit really very
harmless data. A brief sketch of the ideas of learning theory and capacity
control. Data-set-splitting as a first attempt at practically controlling
over-fitting. Cross-validation for estimating generalization error and for
model selection. Justifying model-based inferences.
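
As a rough illustration of the gap between in-sample and generalization error, the sketch below fits polynomials of increasing degree to pure noise and compares the in-sample mean squared error with a 5-fold cross-validation estimate. It is not taken from the lecture: the polynomial family, the sample size, the seeds, and the helper names `in_sample_mse` and `kfold_cv_mse` are all assumptions chosen for the example.

```python
import numpy as np

def in_sample_mse(x, y, degree):
    """MSE of a least-squares polynomial evaluated on the data it was fit to."""
    coefs = np.polyfit(x, y, degree)
    return float(np.mean((y - np.polyval(coefs, x)) ** 2))

def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """k-fold cross-validation estimate of the generalization MSE."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    fold_mse = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)          # every index not in this fold
        coefs = np.polyfit(x[train], y[train], degree)
        resid = y[fold] - np.polyval(coefs, x[fold])
        fold_mse.append(np.mean(resid ** 2))     # error on held-out points only
    return float(np.mean(fold_mse))

# Hypothetical "harmless" data: y is pure noise, unrelated to x.
rng = np.random.default_rng(42)
n = 30
x = rng.uniform(-1, 1, n)
y = rng.normal(0, 1, n)

degrees = list(range(10))
in_sample = [in_sample_mse(x, y, d) for d in degrees]
cv = [kfold_cv_mse(x, y, d) for d in degrees]

for d, e_in, e_cv in zip(degrees, in_sample, cv):
    print(f"degree {d}: in-sample MSE {e_in:.3f}, 5-fold CV MSE {e_cv:.3f}")

# Model selection: pick the degree with the smallest cross-validated error.
best_degree = min(degrees, key=lambda d: cv[d])
print("degree selected by cross-validation:", best_degree)
```

Because the polynomial families are nested, the in-sample error can only fall as the degree grows, even though there is nothing to learn. The cross-validated error is computed only on held-out points, so it is not subject to that optimism, which is why it is the quantity used here to choose the degree.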