Fisher Discriminant Analysis with Kernels

Abstract

A non-linear classification technique based on Fisher's discriminant is proposed. The main ingredient is the kernel trick, which allows the efficient computation of the Fisher discriminant in feature space. The linear classification in feature space corresponds to a (powerful) non-linear decision function in input space. Large scale simulations demonstrate the competitiveness of our approach.

DISCRIMINANT ANALYSIS

In classification and other data analytic tasks it is often necessary to pre-process the data before applying the algorithm at hand, and it is common to first extract features suitable for the task to be solved. Feature extraction for classification differs significantly from feature extraction for describing data. For example, PCA finds directions which have minimal reconstruction error by describing as much variance of the data as possible with m orthogonal directions. These first directions need not (and in practice often will not) reveal the class structure that we need for proper classification. Discriminant analysis addresses the following question: given a data set with two classes, say, which is the best feature or feature set (either linear or non-linear) to discriminate the two classes? Classical approaches tackle this question by starting with the (theoretically) optimal Bayes classifier; by assuming normal distributions for the classes, standard algorithms like quadratic or linear discriminant analysis, among them the famous Fisher discriminant, can be derived (e.g. [5]). Of course, any model other than a Gaussian could be assumed for the class distributions; this, however, often sacrifices the simple closed form solution. Several modifications towards more general features have been proposed (e.g. [8]); for an introduction and review of existing methods see e.g.
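To make the classical setting concrete, the following is a minimal sketch (not the paper's kernelized method, just the standard linear case) of Fisher's discriminant for two classes: the direction w maximizing the ratio of between-class to within-class scatter has the well-known closed form w = S_W^{-1}(m_1 - m_2), where S_W is the within-class scatter matrix and m_1, m_2 are the class means. The function name and the toy data below are illustrative assumptions.

```python
# Sketch of the classical (linear) Fisher discriminant.
# Closed-form solution for two classes: w = S_W^{-1} (m1 - m2).
import numpy as np

def fisher_discriminant(X1, X2):
    """Return the (unit-norm) Fisher direction for two classes of row vectors."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of centered outer products over both classes.
    S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_W, m1 - m2)
    return w / np.linalg.norm(w)

# Illustrative two-class Gaussian toy data.
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(50, 2))
w = fisher_discriminant(X1, X2)
# Projecting each class onto w yields well-separated 1-D features.
proj1, proj2 = X1 @ w, X2 @ w
```

A classifier is then obtained by thresholding the 1-D projections; the paper's contribution is to compute the analogous direction implicitly in a kernel-induced feature space, where this simple linear recipe becomes a non-linear discriminant in input space.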
In this work we propose to use the kernel idea [1], originally applied in Support Vector Machines [19, 14], kernel PCA [16] and other kernel based algorithms (cf. [14]), to define a non-linear generalization of Fisher's discriminant. Our method uses kernel feature spaces yielding a highly flexible