2002

Learning with Kernels. Adaptive Computation and Machine Learning series, MIT Press, Cambridge, MA, USA, December 2002, 644 pages. Parts of this book, including an introduction to kernel methods, are available for download. (book)

Abstract

In the 1990s, a new type of learning algorithm was developed, based on results from statistical learning theory: the Support Vector Machine (SVM). This gave rise to a new class of theoretically elegant learning machines that use a central concept of SVMs, kernels, for a number of learning tasks. Kernel machines provide a modular framework that can be adapted to different tasks and domains by the choice of the kernel function and the base algorithm. They are replacing neural networks in a variety of fields, including engineering, information retrieval, and bioinformatics.
Learning with Kernels provides an introduction to SVMs and related kernel methods. Although the book begins with the basics, it also includes the latest research. It provides all of the concepts necessary to enable a reader equipped with some basic mathematical knowledge to enter the world of machine learning using theoretically well-founded yet easy-to-use kernel algorithms and to understand and apply the powerful algorithms that have been developed over the last few years.
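To make the central idea concrete, here is a minimal sketch, in Python with scikit-learn (an assumption of convenience, not the book's own code), of how a kernel turns a linear large-margin classifier into a nonlinear one: the Gaussian kernel k(x, x') = exp(-gamma ||x - x'||^2) lets the SVM separate toy data that no hyperplane in input space could.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # circular class boundary

# Gaussian (RBF) kernel: k(x, x') = exp(-gamma * ||x - x'||^2)
clf = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
print("training accuracy:", clf.score(X, y))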

We consider the learning problem of finding a dependency between a general class of objects and another, possibly different, general class of objects. The objects can be, for example, vectors, images, strings, trees, or graphs. Such a task is made possible by employing similarity measures in both input and output spaces, defined via kernel functions, thus embedding the objects into vector spaces. Output kernels also make it possible to encode prior information and/or invariances in the loss function in an elegant way. We experimentally validate our approach on several tasks: mapping strings to strings, pattern recognition, and reconstruction from partial images.
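As a rough illustration of this kernel-in-both-spaces idea, the following simplified Python sketch (scikit-learn and toy data are my assumptions; it omits the pre-image step needed to map predictions back to actual output objects, so it is not the paper's full algorithm) embeds the outputs via an output kernel and learns the dependency from inputs to that embedding:

import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))              # input objects (toy vectors)
Y = np.tanh(X @ rng.normal(size=(5, 3)))   # structured outputs (toy stand-in)

# Embed the outputs into a vector space via an output kernel (kernel PCA).
Z = KernelPCA(n_components=3, kernel="rbf").fit_transform(Y)

# Learn the dependency from inputs to the output embedding with an input kernel.
model = KernelRidge(kernel="rbf", alpha=0.1).fit(X, Z)
print("mean squared fit residual:", np.mean((model.predict(X) - Z) ** 2))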

We construct a geometric framework for Support Vector Machine (SVM) classifiers with arbitrary norms. Within this framework, the separating hyperplanes, dual descriptions, and solutions of SVM classifiers are constructed in a purely geometric fashion. In contrast with the optimization theory usually employed for SVM classifiers, no complicated computations are required; each step in our theory is guided by elegant geometric intuitions.
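One standard instance of this geometric picture, for the Euclidean norm and separable data (a textbook construction stated here for orientation, not the paper's general any-norm result): the maximum-margin hyperplane bisects the segment joining the nearest points of the two convex hulls,

\[
(p^{*}, q^{*}) \;=\; \operatorname*{arg\,min}_{\,p \in \operatorname{conv}(X_{+}),\; q \in \operatorname{conv}(X_{-})} \|p - q\|,
\qquad
w = p^{*} - q^{*}, \qquad b = \tfrac{1}{2}\bigl(\|q^{*}\|^{2} - \|p^{*}\|^{2}\bigr),
\]

so the hyperplane $\{x : \langle w, x \rangle + b = 0\}$ passes through the midpoint of $p^{*}$ and $q^{*}$, and the margin equals $\|p^{*} - q^{*}\|/2$.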

In this paper we investigate connections between statistical learning theory and data compression on the basis of support vector machine (SVM) model selection. Inspired by several generalization bounds, we construct "compression coefficients" for SVMs, which measure the amount by which the training labels can be compressed by some classification hypothesis. The main idea is to relate the coding precision of this hypothesis to the width of the margin of the SVM. The compression coefficients connect well-known quantities such as the radius-margin ratio R^2/rho^2, the eigenvalues of the kernel matrix, and the number of support vectors. To test whether they are useful in practice, we ran model selection experiments on several real-world datasets. As a result we found that compression coefficients can fairly accurately predict the parameters for which the test error is minimized.
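For concreteness, the radius-margin ratio mentioned above can be read off a trained SVM. The following Python sketch (scikit-learn and toy data are my assumptions) estimates R^2/rho^2 from the dual coefficients and the kernel matrix, where rho = 1/||w|| for the canonical hyperplane; it illustrates one ingredient, not the paper's compression coefficients themselves.

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 4))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=150) > 0, 1, -1)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

# ||w||^2 = sum_ij (alpha_i y_i)(alpha_j y_j) k(x_i, x_j); dual_coef_ stores alpha_i y_i.
a = clf.dual_coef_.ravel()
K_sv = rbf_kernel(clf.support_vectors_, clf.support_vectors_, gamma=gamma)
w_norm_sq = a @ K_sv @ a
rho_sq = 1.0 / w_norm_sq                    # squared margin of the canonical hyperplane

# R^2: maximal squared feature-space distance to the mean of the mapped data.
K = rbf_kernel(X, X, gamma=gamma)
dist_sq = np.diag(K) - 2 * K.mean(axis=1) + K.mean()
R_sq = dist_sq.max()

print("radius-margin ratio R^2/rho^2:", R_sq / rho_sq)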

A number of methods for speeding up Gaussian Process (GP) prediction have been proposed, including the Nyström method of Williams and Seeger (2001). In this paper we focus on two issues: (1) the relationship of the Nyström method to the Subset of Regressors method (Poggio and Girosi, 1990; Luo and Wahba, 1997), and (2) understanding in what circumstances the Nyström approximation would be expected to provide a good approximation to exact GP regression.
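The Nyström method itself is compact enough to state in a few lines: approximate the full n x n kernel matrix from m landmark columns as K ≈ K_nm K_mm^+ K_nm^T. A minimal Python sketch follows (toy data and an RBF kernel chosen for illustration; the paper's analysis concerns when such an approximation is adequate for GP regression):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
m = 50
idx = rng.choice(len(X), size=m, replace=False)    # landmark subset

K_nm = rbf_kernel(X, X[idx])                       # n x m cross-kernel
K_mm = rbf_kernel(X[idx], X[idx])                  # m x m landmark kernel
K_approx = K_nm @ np.linalg.pinv(K_mm) @ K_nm.T    # Nyström reconstruction

K_full = rbf_kernel(X, X)
print("relative Frobenius error:",
      np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full))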
