Algorithmic learning in a random world

New York: Springer, 2005

The main topic of this book is conformal prediction,
a method of prediction recently developed in machine learning.
Conformal predictors are among the most accurate methods of machine learning,
and, unlike other state-of-the-art methods,
they provide information about their own accuracy and reliability.

The book integrates mathematical theory and revealing experimental work.
It demonstrates mathematically the validity of the reliability claimed by conformal predictors
when they are applied to independent and identically distributed data,
and it confirms experimentally that the accuracy is sufficient for many practical problems.
Later chapters generalize these results to models called repetitive structures,
which originate in the algorithmic theory of randomness and statistical physics.
The approach is flexible enough to incorporate most existing methods of machine learning,
including newer methods such as boosting and support vector machines
and older methods such as nearest neighbors and the bootstrap.

Topics and Features:

Describes how conformal predictors yield accurate and reliable predictions,
complemented with quantitative measures of their accuracy and reliability

Handles both classification and regression problems

Explains how to apply the new algorithms to real-world data sets

Demonstrates the infeasibility of some standard prediction tasks

Explains connections with Kolmogorov's algorithmic randomness,
recent work in machine learning, and older work in statistics

Develops new methods of probability forecasting
and shows how to use them for prediction in causal networks

Researchers in computer science, statistics, and artificial intelligence
will find the book an authoritative and rigorous treatment
of some of the most promising new developments in machine learning.
Practitioners and students in all areas of research
that use quantitative prediction or machine learning
will learn about important new methods.

The book may be purchased directly from Springer
and from many on-line booksellers,
including amazon.com.

Vladimir Vovk and Alex Gammerman
are Professors of Computer Science at Royal Holloway, University of London.
Glenn Shafer is Professor
in the Rutgers School of Business - Newark and New Brunswick.
All three authors are affiliated with the
Computer Learning Research Centre
at Royal Holloway, University of London.

This article is written for statisticians.
We consider the on-line predictive version
of the standard statistical problem of linear regression;
the goal is to predict each consecutive response
given the corresponding explanatory variables
and all the previous observations.
We are mainly interested in prediction intervals rather than point predictions.
The standard treatment of prediction intervals in linear regression analysis
has two drawbacks:
(1) the usual prediction intervals
guarantee that the probability of error
is equal to the nominal significance level epsilon,
but this property per se does not imply that the long-run frequency of error
is close to epsilon;
(2) it is not suitable for prediction of complex systems
as it assumes that the number of observations
exceeds the number of parameters.
We state the book's result
showing that in the on-line protocol
the frequency of error does equal the nominal significance level,
up to statistical fluctuations,
and we describe alternative regression models
in which informative prediction intervals can be found
before the number of observations exceeds the number of parameters.
One of these models,
which only assumes that the observations are independent and identically distributed,
is greatly underused in the statistical theory of regression.
Published in the Annals of Statistics 37:1566–1590 (2009).

This is a written version of the second Computer Journal lecture,
presented at the British Computer Society London Office on 12 June 2006.
The lecture was followed by a discussion
by some of the leading experts in machine learning,
which is also included in the article.
Its definitive version is published
in the Computer Journal 50:151–177 (2007).
This is the description of conformal prediction as given in the article's abstract:

Recent advances in machine learning make it possible
to design efficient prediction algorithms for data sets with huge numbers of parameters.
This article describes a new technique for "hedging" the predictions
output by many such algorithms,
including support vector machines, kernel ridge regression, kernel nearest neighbours,
and by many other state-of-the-art methods.
The hedged predictions for the labels of new objects
include quantitative measures of their own accuracy and reliability.
These measures are provably valid under the assumption of randomness,
traditional in machine learning:
the objects and their labels are assumed to be generated independently
from the same probability distribution.
In particular, it becomes possible to control (up to statistical fluctuations)
the number of erroneous predictions by selecting a suitable confidence level.
Validity being achieved automatically,
the remaining goal of hedged prediction is efficiency:
taking full account of the new objects' features
and other available information to produce as accurate predictions as possible.
This can be done successfully using the powerful machinery of modern machine learning.
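
The idea of hedged prediction described in this abstract can be illustrated with a small sketch.
It is not the article's own code: the nonconformity measure used here
(distance to the nearest other example with the same label),
the function names, and the toy data are all assumptions made for illustration.
For each candidate label the sketch computes a p-value,
the fraction of examples that look at least as strange as the new one,
and the prediction set keeps every label whose p-value exceeds the chosen significance level.

```python
import math


def nonconformity(i, objects, labels):
    """Distance from example i to its nearest other example with the same label.

    Larger values mean the example looks stranger; math.inf if there is no
    same-label neighbour. This measure is an illustrative assumption.
    """
    distances = [
        math.dist(objects[i], objects[j])
        for j in range(len(objects))
        if j != i and labels[j] == labels[i]
    ]
    return min(distances) if distances else math.inf


def conformal_p_values(train_objects, train_labels, new_object, label_space):
    """Return a p-value for each candidate label of new_object."""
    p_values = {}
    for candidate in label_space:
        # Tentatively extend the training set with (new_object, candidate).
        objects = list(train_objects) + [new_object]
        labels = list(train_labels) + [candidate]
        scores = [nonconformity(i, objects, labels) for i in range(len(objects))]
        # Fraction of examples at least as strange as the new one.
        at_least_as_strange = sum(score >= scores[-1] for score in scores)
        p_values[candidate] = at_least_as_strange / len(scores)
    return p_values


def prediction_set(p_values, epsilon):
    """Keep every label whose p-value exceeds the significance level epsilon."""
    return {label for label, p in p_values.items() if p > epsilon}


# Toy data (hypothetical): two well-separated classes on the real line.
train_objects = [(0,), (1,), (2,), (50,), (51,), (52,)]
train_labels = [0, 0, 0, 1, 1, 1]
p_values = conformal_p_values(train_objects, train_labels, (3,), label_space=[0, 1])
print(p_values, prediction_set(p_values, epsilon=0.2))
```

Any other nonconformity measure could be substituted without affecting validity under the randomness assumption;
the choice only affects efficiency, that is, how small the prediction sets are.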

A standard assumption in machine learning is the exchangeability of data,
which is equivalent to assuming that the examples are generated
from the same probability distribution independently.
This paper is devoted to testing the assumption of exchangeability on-line:
the examples arrive one by one,
and after receiving each example we would like to have a valid measure of the degree
to which the assumption of exchangeability has been falsified.
Such measures are provided by exchangeability martingales.
We extend known techniques for constructing exchangeability martingales
and show that our new method is competitive with the martingales introduced before.
Finally we investigate the performance of our testing method on two benchmark datasets,
USPS and Statlog Satellite data;
for the former, the known techniques give satisfactory results,
but for the latter our new more flexible method becomes necessary.
This article appears in the Proceedings of ICML 2012.
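
As a rough illustration of on-line testing with an exchangeability martingale,
the sketch below implements a power martingale, one of the previously known constructions,
and not the new method introduced in the paper.
It assumes that each example is a number and that the number itself serves as its nonconformity score;
the function names and the parameter epsilon = 0.5 are illustrative choices.
The p-values are smoothed with random tie-breaking,
and the martingale is tracked on the logarithmic scale to avoid overflow.

```python
import math
import random


def online_p_values(scores, rng):
    """Smoothed conformal p-values, computed one example at a time."""
    p_values = []
    for n in range(1, len(scores) + 1):
        current = scores[n - 1]
        seen = scores[:n]  # all examples received so far, including the current one
        greater = sum(s > current for s in seen)
        equal = sum(s == current for s in seen)
        theta = rng.random()  # random tie-breaking
        p_values.append((greater + theta * equal) / n)
    return p_values


def log_power_martingale(p_values, epsilon=0.5):
    """Return log M_0, ..., log M_N, where M_n is the product of
    epsilon * p_i ** (epsilon - 1) over i = 1, ..., n."""
    log_values = [0.0]  # M_0 = 1
    for p in p_values:
        p = max(p, 1e-12)  # numerical guard against log(0); an assumption of this sketch
        step = math.log(epsilon) + (epsilon - 1) * math.log(p)
        log_values.append(log_values[-1] + step)
    return log_values


rng = random.Random(0)
# A steadily increasing sequence is clearly not exchangeable:
# every new example is the strangest seen so far, so its p-value is small.
drifting = list(range(1, 101))
log_m = log_power_martingale(online_p_values(drifting, rng))
print(log_m[-1])
```

A large final value of the martingale is evidence against exchangeability;
on exchangeable data the martingale is unlikely ever to become large.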

Conformal predictors are automatically valid
in the sense of having coverage probability equal to or exceeding
a given confidence level.
Inductive conformal predictors are a computationally efficient version of conformal predictors
satisfying the same property of validity.
However, inductive conformal predictors have only been known to control
unconditional coverage probability.
This paper explores various versions of conditional validity
and various ways to achieve them using inductive conformal predictors and their modifications.
The data set
(the Spambase data set
contributed by George Forman
to the UCI Machine Learning Repository,
with column names)
and the R programs used in the experimental section
of the conference version of the paper are available on-line.
The conference version of this article is published
in the Proceedings of ACML 2012
(JMLR: Workshop and Conference Proceedings 25:475–490, 2012).
The journal version is published in Machine Learning
(ACML 2012 Special Issue)
92:349–376 (2013).
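
A minimal sketch of an inductive conformal predictor for regression may help to fix ideas.
It is not taken from the paper (whose experiments use R programs and the Spambase data set);
the underlying algorithm (a least-squares line), the nonconformity score (absolute residual),
and all names below are assumptions.
The training data are split into a proper training set, used to fit the line,
and a calibration set, used to decide how wide the interval must be.
The sketch controls only unconditional coverage.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one explanatory variable; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, mean_y - slope * mean_x


def icp_interval(proper_training, calibration, x_new, epsilon):
    """Prediction interval for the response at x_new at significance level epsilon."""
    slope, intercept = fit_line(*zip(*proper_training))

    def predict(x):
        return slope * x + intercept

    # Nonconformity scores on the calibration set: absolute residuals, sorted.
    scores = sorted(abs(y - predict(x)) for x, y in calibration)
    m = len(scores)
    # Rank of the calibration score that bounds the interval's half-width.
    k = math.ceil((1 - epsilon) * (m + 1))
    if k > m:
        # Too few calibration examples for this significance level.
        return (-math.inf, math.inf)
    half_width = scores[k - 1]
    centre = predict(x_new)
    return (centre - half_width, centre + half_width)


# Hypothetical data: (explanatory variable, response) pairs.
proper_training = [(0, 0), (1, 1), (2, 2), (3, 3)]
calibration = [(4, 4.5), (5, 4.0), (6, 6.25), (7, 7.0)]
print(icp_interval(proper_training, calibration, x_new=10, epsilon=0.5))
```

The underlying algorithm is fitted only once, which is the source of the computational efficiency;
the price is that part of the data is set aside for calibration.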

This note introduces the method of cross-conformal prediction,
which is a hybrid of the methods of inductive conformal prediction and cross-validation,
and studies its validity and predictive efficiency empirically.
To appear in the Special Issue
of Annals of Mathematics
and Artificial Intelligence on Conformal Prediction.
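
The hybrid can be sketched as follows, under stated assumptions.
The training set is split into folds;
for each fold, an underlying model is fitted on the remaining folds
and used to score both the examples in the fold and the new object with a postulated label.
The counts from all folds are then merged into a single p-value.
The underlying model (class means on the real line), the interleaved fold assignment,
and the function names are illustrative choices rather than the note's own.

```python
import math


def class_means(examples):
    """Mean object value for each label; a deliberately simple underlying model."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def score(x, y, means):
    """Nonconformity score: distance from x to the mean of class y."""
    return abs(x - means[y]) if y in means else math.inf


def cross_conformal_p_value(examples, x_new, y_new, n_folds):
    """p-value for the postulated label y_new of the new object x_new."""
    count = 0
    for k in range(n_folds):
        fold = examples[k::n_folds]  # examples whose index is k modulo n_folds
        rest = [ex for i, ex in enumerate(examples) if i % n_folds != k]
        means = class_means(rest)  # fitted without looking at the fold
        new_score = score(x_new, y_new, means)
        # Examples in the fold that are at least as strange as the new one.
        count += sum(score(x, y, means) >= new_score for x, y in fold)
    return (count + 1) / (len(examples) + 1)


# Hypothetical data: (object, label) pairs.
examples = [(0, "a"), (1, "a"), (10, "b"), (11, "b"),
            (2, "a"), (3, "a"), (12, "b"), (13, "b")]
for label in ("a", "b"):
    print(label, cross_conformal_p_value(examples, 1.5, label, n_folds=2))
```

Unlike an inductive conformal predictor, this scheme uses every training example for calibration,
which is why its validity is studied empirically in the note.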

This paper continues the study, both theoretical and empirical,
of the method of Venn prediction, concentrating on binary prediction problems.
Venn predictors produce probability-type predictions for the labels of test objects
which are guaranteed to be well calibrated under the standard assumption
that the observations are generated independently from the same distribution.
We give a simple formalization and proof of this property.
We also introduce Venn-Abers predictors,
a new class of Venn predictors based on the idea of isotonic regression,
and report promising empirical results both for Venn-Abers predictors
and for their more computationally efficient simplified version.
The code that we used for running the experiments
is available on-line.
To appear in the UAI 2014 Proceedings.
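
The role of isotonic regression can be illustrated with a simplified sketch.
It assumes that some scoring classifier (hypothetical here) has already assigned a real-valued score
to each calibration example and to the test object,
which may differ from the exact construction studied in the paper.
The test object is added to the calibration data twice, once with label 0 and once with label 1;
isotonic regression is fitted each time,
and the two fitted values at the test score form a pair of probabilities (p0, p1) for label 1.
The pool-adjacent-violators routine and all names are written for this illustration.

```python
def isotonic_fit(points):
    """Nondecreasing least-squares fit by pool-adjacent-violators.

    points: list of (score, label) pairs with labels 0 or 1.
    Returns a dict mapping each distinct score to its fitted value.
    """
    # Merge tied scores into a single weighted point.
    grouped = {}
    for s, y in points:
        total, weight = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, weight + 1)
    blocks = []  # each block: [sum of labels, number of points, scores covered]
    for s in sorted(grouped):
        total, weight = grouped[s]
        blocks.append([total, weight, [s]])
        # Pool neighbouring blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, weight, scores = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += weight
            blocks[-1][2].extend(scores)
    return {s: total / weight for total, weight, scores in blocks for s in scores}


def venn_abers(calibration, test_score):
    """Return (p0, p1): fitted values at test_score with postulated labels 0 and 1."""
    p0 = isotonic_fit(calibration + [(test_score, 0)])[test_score]
    p1 = isotonic_fit(calibration + [(test_score, 1)])[test_score]
    return p0, p1


# Hypothetical calibration data: (score, label) pairs.
calibration = [(0.1, 0), (0.2, 0), (0.3, 1), (0.4, 0), (0.6, 1), (0.8, 1)]
print(venn_abers(calibration, 0.5))
```

The output is a pair of probabilities rather than a single number;
the gap between p0 and p1 indicates how much the calibration data say about the test object.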

This paper,
published in the COPA 2013 Proceedings,
discusses a transductive version of conformal predictors.
This version is computationally inefficient for big test sets,
but it turns out that apparently crude "Bonferroni predictors"
are about as good in their information efficiency
and vastly superior in computational efficiency.

Conformal prediction can be applied on top of a wide range of prediction algorithms,
which often make strong assumptions about the data-generating mechanism.
For the method to be really useful it is desirable
that in the case where the assumptions of the underlying algorithm are satisfied,
the conformal predictor loses little in efficiency as compared
with the underlying algorithm
(whereas, being a conformal predictor,
it produces valid results even when those assumptions are not satisfied,
provided the data are IID).
This paper
(to appear in the COLT 2014 Proceedings)
explores the degree to which this additional requirement of efficiency
is satisfied in the case of Bayesian ridge regression.

We study optimal conformity measures for various criteria of efficiency
in an idealized setting.
This leads to an important class of criteria of efficiency that we call probabilistic;
it turns out that the most standard criteria of efficiency used in the literature
are not probabilistic.