Look before you leap: Some insights into learner evaluation with cross-validation

Abstract

Machine learning is largely an experimental science, of which the evaluation of predictive models is an important aspect. These days, cross-validation is the most widely used method for this task. There are, however, a number of important points that should be taken into account when using this methodology. First, one should clearly state what one is trying to estimate. Namely, a distinction should be made between the evaluation of a model learned on a single dataset, and that of a learner trained on a random sample from a given data population. Each of these two questions requires a different statistical approach, and the two should not be confused with each other. While this has been noted before, the literature on this topic is generally not very accessible. This paper tries to give an understandable overview of the statistical aspects of these two evaluation tasks. We also posit that, because of the often limited availability of data and the difficulty of selecting an appropriate statistical test, it is in some cases perhaps better to abstain from statistical testing, and instead focus on an interpretation of the immediate results.
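The following is a minimal sketch (not from the paper) of the procedure the abstract refers to: k-fold cross-validation averages the test accuracy over k train/test splits of one dataset, and this average is most naturally read as an estimate of the *learner's* expected accuracy on samples of this size, rather than of any single trained model. The nearest-centroid classifier and the synthetic one-dimensional data below are hypothetical stand-ins chosen only to keep the example self-contained.

```python
import random

def nearest_centroid_fit(X, y):
    # "Train" by computing the per-class mean of the single feature.
    centroids = {}
    for label in set(y):
        vals = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = sum(vals) / len(vals)
    return centroids

def nearest_centroid_predict(centroids, X):
    # Predict the class whose centroid is closest to each point.
    return [min(centroids, key=lambda c: abs(x - centroids[c])) for x in X]

def k_fold_cv(X, y, k=10, seed=0):
    # Shuffle indices once, then partition them into k disjoint folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        model = nearest_centroid_fit([X[j] for j in train],
                                     [y[j] for j in train])
        preds = nearest_centroid_predict(model, [X[j] for j in test])
        accuracies.append(sum(p == y[j] for p, j in zip(preds, test))
                          / len(test))
    # The mean over folds: each fold's model is trained on a *different*
    # subset, so this averages over models and estimates the learner's
    # expected performance, not the accuracy of one fixed model.
    return sum(accuracies) / len(accuracies)

# Synthetic 1-D data: class 0 centered at 0.0, class 1 centered at 1.0.
rng = random.Random(42)
X = ([rng.gauss(0.0, 0.3) for _ in range(50)]
     + [rng.gauss(1.0, 0.3) for _ in range(50)])
y = [0] * 50 + [1] * 50
print(round(k_fold_cv(X, y, k=10), 2))
```

Note that each fold trains a different model, which is exactly why the paper's distinction matters: the cross-validation average does not directly measure the accuracy of the one model you would deploy after training on the full dataset.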

Related Material

@InProceedings{pmlr-v47-vanwinckelen14a,
title = {Look before you leap: Some insights into learner evaluation with cross-validation},
author = {Gitte Vanwinckelen and Hendrik Blockeel},
booktitle = {Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD},
pages = {3--20},
year = {2015},
editor = {Wilhelmiina Hämäläinen and François Petitjean and Geoffrey I. Webb},
volume = {47},
series = {Proceedings of Machine Learning Research},
address = {Nancy, France},
month = {15 Sep},
publisher = {PMLR},
pdf = {http://proceedings.mlr.press/v47/vanwinckelen14a.pdf},
url = {http://proceedings.mlr.press/v47/vanwinckelen14a.html},
abstract = {Machine learning is largely an experimental science, of which the evaluation of predictive models is an important aspect. These days, cross-validation is the most widely used method for this task. There are, however, a number of important points that should be taken into account when using this methodology. First, one should clearly state what they are trying to estimate. Namely, a distinction should be made between the evaluation of a model learned on a single dataset, and that of a learner trained on a random sample from a given data population. Each of these two questions requires a different statistical approach and should not be confused with each other. While this has been noted before, the literature on this topic is generally not very accessible. This paper tries to give an understandable overview of the statistical aspects of these two evaluation tasks. We also pose that because of the often limited availability of data, and the difficulty of selecting an appropriate statistical test, it is in some cases perhaps better to abstain from statistical testing, and instead focus on an interpretation of the immediate results. }
}

%0 Conference Paper
%T Look before you leap: Some insights into learner evaluation with cross-validation
%A Gitte Vanwinckelen
%A Hendrik Blockeel
%B Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD
%C Proceedings of Machine Learning Research
%D 2015
%E Wilhelmiina Hämäläinen
%E François Petitjean
%E Geoffrey I. Webb
%F pmlr-v47-vanwinckelen14a
%I PMLR
%J Proceedings of Machine Learning Research
%P 3--20
%U http://proceedings.mlr.press
%V 47
%W PMLR
%X Machine learning is largely an experimental science, of which the evaluation of predictive models is an important aspect. These days, cross-validation is the most widely used method for this task. There are, however, a number of important points that should be taken into account when using this methodology. First, one should clearly state what they are trying to estimate. Namely, a distinction should be made between the evaluation of a model learned on a single dataset, and that of a learner trained on a random sample from a given data population. Each of these two questions requires a different statistical approach and should not be confused with each other. While this has been noted before, the literature on this topic is generally not very accessible. This paper tries to give an understandable overview of the statistical aspects of these two evaluation tasks. We also pose that because of the often limited availability of data, and the difficulty of selecting an appropriate statistical test, it is in some cases perhaps better to abstain from statistical testing, and instead focus on an interpretation of the immediate results.

TY - CPAPER
TI - Look before you leap: Some insights into learner evaluation with cross-validation
AU - Gitte Vanwinckelen
AU - Hendrik Blockeel
BT - Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD
PY - 2015/11/27
DA - 2015/11/27
ED - Wilhelmiina Hämäläinen
ED - François Petitjean
ED - Geoffrey I. Webb
ID - pmlr-v47-vanwinckelen14a
PB - PMLR
SP - 3
DP - PMLR
EP - 20
L1 - http://proceedings.mlr.press/v47/vanwinckelen14a.pdf
UR - http://proceedings.mlr.press/v47/vanwinckelen14a.html
AB - Machine learning is largely an experimental science, of which the evaluation of predictive models is an important aspect. These days, cross-validation is the most widely used method for this task. There are, however, a number of important points that should be taken into account when using this methodology. First, one should clearly state what they are trying to estimate. Namely, a distinction should be made between the evaluation of a model learned on a single dataset, and that of a learner trained on a random sample from a given data population. Each of these two questions requires a different statistical approach and should not be confused with each other. While this has been noted before, the literature on this topic is generally not very accessible. This paper tries to give an understandable overview of the statistical aspects of these two evaluation tasks. We also pose that because of the often limited availability of data, and the difficulty of selecting an appropriate statistical test, it is in some cases perhaps better to abstain from statistical testing, and instead focus on an interpretation of the immediate results.
ER -

Vanwinckelen, G. & Blockeel, H. (2015). Look before you leap: Some insights into learner evaluation with cross-validation. Proceedings of the Workshop on Statistically Sound Data Mining at ECML/PKDD, in PMLR 47:3-20.