Characterizing rational versus exponential learning curves

Abstract

We consider the standard problem of learning a concept from random examples. Here a learning curve can be defined to be the expected error of a learner's hypotheses as a function of training sample size. Haussler, Littlestone and Warmuth have shown that, in the distribution free setting, the smallest expected error a learner can achieve in the worst case over a concept class C converges rationally to zero error (i.e., Θ(l/t) for training sample size t). However, recently Cohn and Tesauro have demonstrated how exponential convergence can often be observed in experimental settings (i.e., average error decreasing as eΘ(−t)). By addressing a simple non-uniformity in the original analysis, this paper shows how the dichotomy between rational and exponential worst case learning curves can be recovered in the distribution free theory. These results support the experimental findings of Cohn and Tesauro: for finite concept classes, any consistent learner achieves exponential convergence, even in the worst case; but for continuous concept classes, no learner can exhibit sub-rational convergence for every target concept and domain distribution. A precise boundary between rational and exponential convergence is drawn for simple concept Chains. Here we show that somewhere dense chains always force rational convergence in the worst case, but exponential convergence can always be achieved for nowhere dense chains.