Motivated by the particular problems involved in communicating with "locked-in" paralysed patients, we aim to develop a brain-computer interface that uses auditory stimuli. We describe a paradigm that allows a user to make a binary decision by focusing attention on one of two concurrent auditory stimulus sequences. Using Support Vector Machine classification and Recursive Channel Elimination on the independent components of averaged event-related potentials, we show that an untrained user's EEG data can be classified with an encouragingly high level of accuracy. This suggests that it is possible for users to modulate EEG signals in a single trial by the conscious direction of attention, well enough to be useful in BCI.

Tangential neurons in the fly brain are sensitive to the typical optic
flow patterns generated during egomotion. In this study, we examine
whether a simplified linear model based on the organization principles
in tangential neurons can be used to estimate egomotion from the optic
flow. We present a theory for the construction of an estimator
consisting of a linear combination of optic flow vectors that
incorporates prior knowledge both about the distance distribution of
the environment, and about the noise and egomotion statistics of the
sensor. The estimator is tested on a gantry carrying an
omnidirectional vision sensor. The experiments show that the proposed
approach leads to accurate and robust estimates of rotation rates,
whereas translation estimates are of reasonable quality, albeit less
reliable.
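
A minimal sketch of how such a linear estimator could be obtained, assuming a spherical flow model and purely synthetic distance, noise and egomotion statistics (this is an illustration of the idea, not the construction used in the paper):

    # Sketch: fit a linear egomotion estimator from simulated optic flow.
    # Assumed flow model: p_i = -omega x d_i - mu_i*(t - (t.d_i)*d_i) + noise,
    # with viewing directions d_i, inverse depths mu_i, rotation omega, translation t.
    import numpy as np

    rng = np.random.default_rng(0)
    n_dirs, n_samples = 100, 5000
    d = rng.normal(size=(n_dirs, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)           # viewing directions on the sphere

    def simulate(n):
        X, F = [], []
        for _ in range(n):
            omega = rng.normal(scale=0.5, size=3)            # assumed egomotion statistics
            t = rng.normal(scale=1.0, size=3)
            mu = rng.uniform(0.2, 1.0, size=n_dirs)          # assumed inverse-depth distribution
            flow = -np.cross(omega, d) - mu[:, None] * (t - (d @ t)[:, None] * d)
            flow += rng.normal(scale=0.05, size=flow.shape)  # assumed sensor noise
            X.append(np.r_[omega, t])
            F.append(flow.ravel())
        return np.array(X), np.array(F)

    X, F = simulate(n_samples)
    W, *_ = np.linalg.lstsq(F, X, rcond=None)                # linear combination of flow vectors
    X_test, F_test = simulate(500)
    err = np.abs(F_test @ W - X_test).mean(axis=0)
    print("mean abs errors (wx, wy, wz, tx, ty, tz):", np.round(err, 3))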

Proceedings of The Royal Society of London A, 460(2501):3283-3297, November 2004 (article)

We describe a fast system for the detection and localization of human faces in images using a nonlinear 'support-vector machine'. We approximate the decision surface in terms of a reduced set of expansion vectors and propose a cascaded evaluation which has the property that the full support-vector expansion is only evaluated on the face-like parts of the image, while the largest part of typical images is classified using a single expansion vector (a simpler and more efficient classifier). As a result, only three reduced-set vectors are used, on average, to classify an image patch. Hence, the cascaded evaluation, presented in this paper, offers a thirtyfold speed-up over an evaluation using the full set of reduced-set vectors, which is itself already thirty times faster than classification using all the support vectors.
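
A minimal sketch of the cascaded evaluation idea, with made-up reduced-set vectors, coefficients and stage thresholds instead of a trained detector (illustrative only):

    # Sketch: cheap partial reduced-set expansions reject most image patches early;
    # the full expansion is evaluated only on the surviving (face-like) patches.
    import numpy as np

    rng = np.random.default_rng(0)
    dim, n_rsv = 400, 30                            # patch dimension, reduced-set size
    Z = rng.normal(size=(n_rsv, dim))               # reduced-set vectors (placeholder)
    beta = rng.normal(size=n_rsv)                   # expansion coefficients (placeholder)
    b, gamma = -0.2, 1.0 / dim
    stages = [(1, -1.0), (5, -0.5), (n_rsv, 0.0)]   # (vectors used so far, threshold)

    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq)

    def detect(patches):
        idx = np.arange(len(patches))
        score = np.zeros(len(patches))
        used = 0
        for m, theta in stages:
            score[idx] += rbf(patches[idx], Z[used:m]) @ beta[used:m]  # add new terms only
            used = m
            idx = idx[score[idx] + b > theta]       # discard patches below the stage threshold
        return idx                                  # indices finally classified as "face"

    patches = rng.normal(size=(10000, dim))
    print("patches accepted:", len(detect(patches)))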

We consider the problem of transductive inference. In many real-world problems, unlabeled data is far easier to obtain than labeled data. Hence transductive inference is very significant in many practical problems. According to Vapnik's point of view, one should predict the function value only on the given points directly rather than a function defined on the whole space, the latter being a more complicated problem. Inspired by this idea, we develop discrete calculus on finite discrete spaces, and then build discrete regularization. A family of transductive algorithms is naturally derived from this regularization framework. We validate the algorithms on both synthetic and real-world data from text/web categorization to bioinformatics problems. A significant by-product of this work is a powerful way of ranking data based on examples including images, documents, proteins and many other kinds of data.
This talk is mainly based on the following contributions: (1) D. Zhou and B. Schölkopf: Transductive Inference with Graphs, MPI Technical Report, August 2004; (2) D. Zhou, B. Schölkopf and T. Hofmann. Semi-supervised Learning on Directed Graphs. NIPS 2004; (3) D. Zhou, O. Bousquet, T.N. Lal, J. Weston and B. Schölkopf. Learning with Local and Global Consistency. NIPS 2003.

34(174.12), 34th Annual Meeting of the Society for Neuroscience (Neuroscience), October 2004 (poster)

In the retina of primates three cone types sensitive to short, middle and long wavelengths of light convert photons into electrical signals. Many investigators have presented evidence that, in color-normal observers, the signals of cones sensitive to short wavelengths of light (S-cones) do not contribute to the perception of brightness of a colored surface when this is alternated with an achromatic reference (flicker brightness). Other studies indicate that humans do use S-cone signals when performing this task. Common to all these studies is the small number of observers whose performance data are reported. Considerable variability in the occurrence of cone types across observers has been found, but, to our knowledge, no cone counts exist from larger populations of humans. We reinvestigated how much the S-cones contribute to flicker brightness. 76 color-normal observers were tested in a simple psychophysical procedure neutral to the cone type occurrence (Teufel & Wehrhahn (2000), JOSA A 17: 994-1006). The data show that, in the majority of our observers, S-cones provide input with a negative sign - relative to L- and M-cone contribution - in the task in question. There is indeed considerable between-subject variability, such that for 20 out of 76 observers the magnitude of this input does not differ significantly from 0. Finally, we argue that the sign of an observer's S-cone contribution to flicker brightness perception cannot be used to infer the relative sign of their contributions to the neuronal signals carrying the information leading to the perception of flicker brightness. We conclude that studies which use only a small number of observers may easily fail to find significant evidence for the small but significant population tendency for the S-cones to contribute to flicker brightness. Our results confirm all earlier results and reconcile their contradictory interpretations.

Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging problem. We consider the spike sorting problem using a generative model, mixtures of factor analysers, which concurrently performs clustering and feature extraction. The most important advantage of this method is that it quantifies the certainty with which the spikes are classified. This can be used as a means for evaluating the quality of clustering and therefore spike isolation. Using this method, nearly simultaneously occurring spikes can also be modelled, which is a hard task for many spike sorting methods. Furthermore, modelling the data with a generative model allows us to generate simulated data.

We compare two approaches to the problem of estimating the depth
of a point in space from observing its image position in two
different cameras: 1. The classical photogrammetric approach
explicitly models the two cameras and estimates their intrinsic
and extrinsic parameters using a tedious calibration procedure;
2. A generic machine learning approach where the mapping from
image to spatial coordinates is directly approximated by a Gaussian Process regression. Our results show that the generic
learning approach, in addition to simplifying the procedure of
calibration, can lead to higher depth accuracies than classical
calibration although no specific domain knowledge is used.
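
A minimal sketch of the learning approach, using scikit-learn's Gaussian process regression and a synthetic two-camera setup in place of the real rig (camera model, focal length and noise level are assumptions made for illustration):

    # Sketch: learn the mapping from stereo image coordinates to 3D position by
    # Gaussian process regression instead of explicit camera calibration.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)

    def project(P, f, cx):                      # toy pinhole camera shifted by cx
        x, y, z = P[:, 0] - cx, P[:, 1], P[:, 2]
        return np.c_[f * x / z, f * y / z]

    P_train = rng.uniform([-1, -1, 2], [1, 1, 4], size=(200, 3))     # known 3D targets
    U_train = np.c_[project(P_train, 500.0, -0.1), project(P_train, 500.0, 0.1)]
    U_train += rng.normal(scale=0.5, size=U_train.shape)             # pixel noise

    gp = GaussianProcessRegressor(kernel=RBF(length_scale=100.0) + WhiteKernel(1e-2),
                                  normalize_y=True).fit(U_train, P_train)

    P_test = rng.uniform([-1, -1, 2], [1, 1, 4], size=(50, 3))
    U_test = np.c_[project(P_test, 500.0, -0.1), project(P_test, 500.0, 0.1)]
    depth_err = np.abs(gp.predict(U_test)[:, 2] - P_test[:, 2]).mean()
    print("mean absolute depth error:", round(float(depth_err), 4))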

Machine Learning has become a key enabling technology for many engineering applications, investigating scientific questions and theoretical problems alike. To stimulate discussions and to disseminate new results, a summer school series was started in February 2002, the documentation of which is published as LNAI 2600.
This book presents revised lectures of two subsequent summer schools held in 2003 in Canberra, Australia, and in Tübingen, Germany. The tutorial lectures included are devoted to statistical learning theory, unsupervised learning, Bayesian inference, and applications in pattern recognition; they provide in-depth overviews of exciting new developments and contain a large number of references.
Graduate students, lecturers, researchers and professionals alike will find this book a useful resource in learning and teaching machine learning.

Considerable progress was recently achieved on semi-supervised
learning, which differs from the traditional supervised learning by
additionally exploring the information of the unlabelled examples.
However, a disadvantage of many existing methods is that they do not
generalize to unseen inputs. This paper investigates learning
methods that effectively make use of both labelled and unlabelled
data to build predictive functions, which are defined on not just
the seen inputs but the whole space. As a nice property, the proposed
method allows efficient training and can easily handle new
test points. We validate the method based on both toy data and
real world data sets.

Modern machine learning techniques are proving to be extremely valuable for the analysis of data in computational biology problems. One branch of machine learning, kernel methods, lends itself particularly well to the difficult aspects of biological data, which include high dimensionality (as in microarray measurements), representation as discrete and structured data (as in DNA or amino acid sequences), and the need to combine heterogeneous sources of information. This book provides a detailed overview of current research in kernel methods and their applications to computational biology.
Following three introductory chapters (an introduction to molecular and computational biology, a short review of kernel methods that focuses on intuitive concepts rather than technical details, and a detailed survey of recent applications of kernel methods in computational biology), the book is divided into three sections that reflect three general trends in current research. The first part presents different ideas for the design of kernel functions specifically adapted to various biological data; the second part covers different approaches to learning from heterogeneous data; and the third part offers examples of successful applications of support vector machine methods.

Motivation: The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based on the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks.
Results: We show that computing the diffusion kernel is equivalent to maximizing the von Neumann entropy, subject to a global constraint on the sum of the Euclidean distances between nodes. This global constraint allows for high variance in the pairwise distances. Accordingly, we propose an alternative, locally constrained diffusion kernel, and we demonstrate that the resulting kernel allows for more accurate support vector machine prediction of protein functional classifications from metabolic and protein-protein interaction networks.
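
A minimal sketch of the standard (globally constrained) diffusion kernel on a toy graph, assuming the usual matrix-exponential form; the locally constrained variant proposed here is not reproduced:

    # Sketch: diffusion kernel K = expm(-beta * L), with L the graph Laplacian.
    # Entry K[i, j] sums weighted paths between nodes i and j; beta sets the scale.
    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0, 1, 1, 0, 0],      # adjacency matrix of a small example graph
                  [1, 0, 1, 0, 0],
                  [1, 1, 0, 1, 0],
                  [0, 0, 1, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    L = np.diag(A.sum(axis=1)) - A      # combinatorial Laplacian
    K = expm(-1.0 * L)                  # positive definite, usable in any kernel method
    print(np.round(K, 3))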

The recent development of graph kernel functions
has made it possible to apply well-established
machine learning methods to graphs.
However, to allow for analyses that yield a graph as a result, it is necessary to solve the so-called pre-image problem: to reconstruct a graph from its feature space representation induced by the kernel. Here, we suggest a practical solution to this problem.

In this paper we define conditional random fields in reproducing kernel Hilbert spaces and show connections to Gaussian Process classification. More specifically, we prove decomposition results for undirected graphical models and we give constructions for kernels. Finally we present efficient means of solving the optimization problem using reduced rank decompositions and we show how stationarity can be exploited efficiently in the optimization process.

In this paper, we propose to combine an efficient image representation based on local descriptors with a Support Vector Machine classifier in order to perform object categorization. For this purpose, we apply kernels defined on sets of vectors. After testing different combinations of kernels and local descriptors, we have been able to identify a combination that performs particularly well.
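
One simple example of a kernel on sets of vectors is the averaged match kernel below; it is shown only to illustrate the setting and is not necessarily the combination found to perform best in the paper:

    # Sketch: a kernel between two sets of local descriptors (mean of all pairwise
    # RBF values), plugged into an SVM through a precomputed Gram matrix.
    # The "descriptors" here are random stand-ins for real local image features.
    import numpy as np
    from sklearn.svm import SVC

    def set_kernel(Xa, Xb, gamma=0.5):
        sq = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * sq).mean()       # average over all descriptor pairs

    rng = np.random.default_rng(0)
    sets = [rng.normal(loc=(i % 2), size=(rng.integers(5, 15), 8)) for i in range(40)]
    labels = np.array([i % 2 for i in range(40)])
    G = np.array([[set_kernel(a, b) for b in sets] for a in sets])

    clf = SVC(kernel="precomputed").fit(G, labels)
    print("training accuracy:", clf.score(G, labels))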

We investigate the problem of defining Hilbertian metrics and, correspondingly, positive
definite kernels on probability measures, continuing previous work. This type of kernel has shown very good
results in text classification and has a wide range of possible
applications. In this paper we extend the two-parameter family of
Hilbertian metrics of Topsoe such that it now includes all
commonly used Hilbertian metrics on probability measures. This
allows us to do model selection among these metrics in an elegant
and unified way. Second we investigate further our approach to
incorporate similarity information of the probability space into
the kernel. The analysis provides a better understanding of these
kernels and gives in some cases a more efficient way to compute
them. Finally we compare all proposed kernels on two text classification
problems and one image classification problem.

This paper gives a survey of results in the mathematical
literature on positive definite kernels and their associated
structures. We concentrate on properties which seem potentially
relevant for Machine Learning and try to clarify some results that
have been misused in the literature. Moreover, we consider
different lines of generalization of positive definite kernels.
Specifically, we deal with operator-valued kernels and present the
general framework of Hilbertian subspaces of Schwartz which we use
to introduce kernels which are distributions. Finally, indefinite
kernels and their associated reproducing kernel spaces are
considered.

There exist many different generalization error bounds for classification. Each of these bounds contains an improvement over the others for certain situations. Our goal is to combine these different improvements into a single bound. In particular we combine the PAC-Bayes approach introduced by McAllester, which is interesting for averaging classifiers, with the optimal union bound provided by the generic chaining technique developed by Fernique and Talagrand. This combination is quite natural since the generic chaining is based on the notion of majorizing measures, which can be considered as priors on the set of classifiers, and such priors also arise in the PAC-Bayesian setting.

We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms.

We generalise the Gaussian process (GP) framework for regression by learning a nonlinear transformation of the GP outputs. This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is
well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with
a fixed transformation.

The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits the global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in a Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a large amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.
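
A minimal sketch of one standard formulation of such manifold ranking (a normalized affinity graph and a closed-form propagation of the query), on synthetic data:

    # Sketch: rank points with respect to manifold structure.  Build an affinity
    # graph, normalize it symmetrically, and spread the query indicator through
    # the graph via f = (I - alpha*S)^{-1} y.
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.r_[rng.normal(0, 0.3, size=(50, 2)), rng.normal(3, 0.3, size=(50, 2))]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * 0.5 ** 2))
    np.fill_diagonal(W, 0.0)                           # no self-affinities
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    y = np.zeros(len(X))
    y[0] = 1.0                                         # point 0 is the query
    alpha = 0.9
    f = np.linalg.solve(np.eye(len(X)) - alpha * S, y)
    print("top-5 ranked points:", np.argsort(-f)[:5])  # neighbours of the query come first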

Designing a Brain Computer Interface (BCI) system one can choose from a variety of
features that may be useful for classifying brain activity during a mental task.
For the special case of classifying EEG signals we propose using the state-of-the-art
feature selection algorithms Recursive Feature Elimination and Zero-Norm Optimization
which are based on the training of Support Vector Machines (SVM).
These algorithms can provide more accurate solutions than standard filter methods for feature selection.
We adapt the methods for the purpose of selecting EEG channels.
For a motor imagery paradigm we show that the number of used channels can be
reduced significantly without increasing the
classification error.
The resulting best channels agree well with the expected underlying cortical activity patterns during the mental tasks.
Furthermore, we show how time-dependent, task-specific information can be visualized.
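
A simplified sketch of recursive channel elimination with a linear SVM on synthetic data; the per-channel relevance score used here (sum of squared weights) is an illustration, not necessarily the exact criterion of the paper:

    # Sketch: each EEG channel contributes a block of features; repeatedly train a
    # linear SVM and drop the channel whose weights contribute least.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    n_trials, n_channels, n_feat = 200, 10, 5
    X = rng.normal(size=(n_trials, n_channels, n_feat))
    y = rng.integers(0, 2, size=n_trials)
    X[y == 1, 2, :] += 1.0                       # channel 2 carries the class information

    channels = list(range(n_channels))
    while len(channels) > 3:
        Xc = X[:, channels, :].reshape(n_trials, -1)
        w = SVC(kernel="linear", C=1.0).fit(Xc, y).coef_[0].reshape(len(channels), n_feat)
        scores = (w ** 2).sum(axis=1)            # relevance of each remaining channel
        channels.pop(int(np.argmin(scores)))     # eliminate the least relevant channel
    print("selected channels:", channels)        # channel 2 should survive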

The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. To analyze the resulting algorithm, we prove several representer theorems. They state that there always exist solutions of the Lipschitz classifier which can be expressed in terms of distance functions to training points. We provide generalization bounds for Lipschitz classifiers in terms of the Rademacher complexities of some Lipschitz function classes. The generality of our approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz classifier, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.

We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional
state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of
much of reinforcement learning.

We consider the general problem of learning from labeled and
unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.

The annual Neural Information Processing (NIPS) conference is the flagship meeting on neural computation. It draws a diverse group of attendees: physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning and control, emerging technologies, and applications. Only thirty percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains all the papers presented at the 2003 conference.

We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We
introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids
difficult and/or unstable numerical optimization, is easy to
implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces.
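
A minimal sketch in the spirit of this technique, using scikit-learn's KernelPCA, which can fit a regression from feature-space projections back to the input space (fit_inverse_transform=True); the toy denoising task below is only an illustration:

    # Sketch: kernel PCA denoising with a learned pre-image map.
    import numpy as np
    from sklearn.decomposition import KernelPCA

    rng = np.random.default_rng(0)
    t = rng.uniform(0, 2 * np.pi, size=300)
    clean = np.c_[np.cos(t), np.sin(t)]                 # points on the unit circle
    noisy = clean + rng.normal(scale=0.15, size=clean.shape)

    kpca = KernelPCA(n_components=4, kernel="rbf", gamma=2.0,
                     fit_inverse_transform=True, alpha=0.1).fit(noisy)
    denoised = kpca.inverse_transform(kpca.transform(noisy))   # project, then pre-image

    dist = lambda Z: np.abs(np.linalg.norm(Z, axis=1) - 1).mean()
    print("mean distance to circle before/after:", dist(noisy), dist(denoised))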

We address in this paper the question of how the knowledge of the marginal distribution $P(x)$ can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.

We attempt to understand visual classification in humans using both psychophysical and machine learning techniques. Frontal views of human faces were used for a gender classification task. Human subjects classified the faces, and their gender judgments, reaction times and confidence
ratings were recorded. Several hyperplane learning algorithms were used on the same classification task using the Principal Components of the texture and flowfield representation of the faces. The classification performance
of the learning algorithms was estimated using the face database with the true gender of the faces as labels, and also with the gender estimated by the subjects. We then correlated the human responses to the distance of the stimuli to the separating hyperplane of the learning algorithms. Our results suggest that human classification can be modeled by some hyperplane algorithms in the feature space we used. For classification, the brain needs more processing for stimuli close to that hyperplane than for those further away.

A common way of denoising an image is to project it onto a subspace of admissible images constructed, for instance, by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose
a new method that identifies the noisy pixels by 1-norm penalization and updates the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be solved efficiently. In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to exploit the prior knowledge that occlusions often appear in contiguous blocks (e.g. sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are able to show the ν-property also for this extended LP, leading to a method which is easy to use. Experimental results impressively demonstrate the power of our approach.
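
A simplified sketch of the basic linear program (the ν-trick and the block penalty for contiguous occlusions are omitted), assuming a toy PCA subspace: we look for the sparsest pixel correction that brings the image into the subspace and update only those pixels:

    # Sketch: minimize ||e||_1 subject to B*c + e = x0 - mean, i.e. the corrected
    # image x0 - e lies in the subspace spanned by B; solved with scipy's linprog.
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    d, k = 64, 3
    B = np.linalg.qr(rng.normal(size=(d, k)))[0]        # orthonormal "PCA" basis (toy)
    mean = np.zeros(d)

    x_clean = mean + B @ rng.normal(size=k)             # an admissible image
    x0 = x_clean.copy()
    x0[:8] += 2.0                                       # occlude the first 8 pixels

    # variables: [c (free), e_plus (>=0), e_minus (>=0)], with e = e_plus - e_minus
    obj = np.r_[np.zeros(k), np.ones(2 * d)]
    A_eq = np.c_[B, np.eye(d), -np.eye(d)]
    b_eq = x0 - mean
    bounds = [(None, None)] * k + [(0, None)] * (2 * d)
    res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

    e = res.x[k:k + d] - res.x[k + d:]
    x_denoised = x0 - e                                 # only corrupted pixels change
    print("pixels changed:", int((np.abs(e) > 1e-6).sum()),
          "reconstruction error:", round(float(np.linalg.norm(x_denoised - x_clean)), 4))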

A key issue in supervised protein classification is the representation of input sequences of amino acids. Recent work using string kernels for protein data has achieved state-of-the-art classification performance. However, such representations are based only on labeled data --- examples with known 3D structures, organized into structural classes --- while in practice, unlabeled data is far more plentiful. In this work, we develop simple and scalable cluster kernel techniques for incorporating unlabeled data into the representation of protein sequences. We show that our methods greatly improve the classification performance of string kernels and outperform standard approaches for using unlabeled data, such as adding close homologs of the positive examples to the training data. We achieve equal or superior performance to previously presented cluster kernel methods while achieving far greater computational efficiency.

This paper presents a method for single-frame image super-resolution
using an unsupervised learning technique. The required prior
knowledge about the high-resolution images is obtained from
Kernel Principal Component Analysis (KPCA). The original form of
KPCA, however, can only be applied to strongly restricted image
classes due to the limited number of training examples that can be
processed. We therefore propose a new iterative method for performing
KPCA, the Kernel Hebbian Algorithm. By kernelizing the
Generalized Hebbian Algorithm, one can iteratively estimate the Kernel
Principal Components with only linear order memory complexity. The
resulting super-resolution algorithm shows performance comparable to
existing supervised methods on images containing faces and natural
scenes.
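
A toy sketch of the kernelized Hebbian update; unlike a memory-efficient KHA implementation, this version precomputes and centres the full kernel matrix, and all parameters are illustrative:

    # Sketch: estimate kernel principal components iteratively with a kernelized
    # Generalized Hebbian Algorithm, then compare with exact kernel PCA.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.2 * sq)
    n = len(K)
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                                    # centred kernel matrix

    r, eta = 3, 0.05
    A = rng.normal(scale=1e-3, size=(r, n))           # expansion coefficients
    for epoch in range(50):
        for i in rng.permutation(n):
            y = A @ Kc[:, i]                          # projections of sample i
            dA = -eta * np.tril(np.outer(y, y)) @ A   # Hebbian decorrelation term
            dA[:, i] += eta * y                       # Hebbian growth term
            A += dA

    evals, evecs = np.linalg.eigh(Kc)
    exact = evecs[:, -r:][:, ::-1].T                  # leading exact eigenvectors
    proj_kha, proj_exact = A @ Kc, exact @ Kc
    for j in range(r):
        c = np.corrcoef(proj_kha[j], proj_exact[j])[0, 1]
        print("component", j, "|corr with exact KPCA| =", round(abs(float(c)), 3))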

Often you want to execute a command not only on one computer, but on several at once. For example, you might want to report the current statistics on a group of managed servers or update all of your web servers at once.
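
One simple way to do this, sketched in Python over ssh; the host names are placeholders and key-based ssh access is assumed:

    # Sketch: run one command on several machines in parallel and collect the output.
    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    HOSTS = ["web01.example.com", "web02.example.com", "web03.example.com"]
    COMMAND = "uptime"

    def run(host):
        result = subprocess.run(["ssh", host, COMMAND],
                                capture_output=True, text=True, timeout=30)
        return host, result.returncode, result.stdout.strip()

    with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
        for host, code, out in pool.map(run, HOSTS):
            print(host, "exit", code, ":", out)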

Functional genomics represents a new and challenging approach to analyzing complex diseases such as osteoarthritis at the molecular level. The characterization of the molecular changes of the cartilage cells, the chondrocytes, enables a better understanding of the pathomechanisms of the disease. In particular, the identification and characterization of new target molecules for therapeutic intervention is of interest. Also, potential molecular markers for diagnosis and monitoring of osteoarthritis contribute to more appropriate patient management. DNA-microarray technology complements (but does not replace) biochemical and biological research into new disease-relevant genes. Large-scale functional genomics will identify molecular networks, including as yet unidentified players in the anabolic-catabolic balance of articular cartilage as well as disease-relevant intracellular signaling cascades so far largely unknown in articular chondrocytes. However, at the moment it is also important to recognize the limitations of microarray technology in order to avoid over-interpretation of the results. Such over-interpretation might lead to misleading conclusions and prevent, to a significant extent, proper use of the potential of this technology in the field of osteoarthritis.

In this paper we investigate connections between statistical learning
theory and data compression on the basis of support vector machine (SVM)
model selection. Inspired by several generalization bounds we construct
"compression coefficients" for SVMs which measure the amount by which the
training labels can be compressed by a code built from the separating
hyperplane. The main idea is to relate the coding precision to geometrical
concepts such as the width of the margin or the shape of the data in the
feature space. The compression coefficients derived in this way combine well-known
quantities such as the radius-margin term R^2/rho^2, the eigenvalues of the
kernel matrix, and the number of support vectors. To test whether they are
useful in practice we ran model selection experiments on benchmark data
sets. As a result we found that compression coefficients can fairly
accurately predict the parameters for which the test error is minimized.

We attempt to understand visual classification in humans using both psychophysical and machine
learning techniques. Frontal views of human faces were used for a gender classification
task. Human subjects classified the faces, and their gender judgment, reaction time (RT) and
confidence rating (CR) were recorded for each face. RTs are longer for incorrect answers than
for correct ones, high CRs are correlated with low classification errors, and RTs decrease as the
CRs increase. These results suggest that patterns difficult to classify need more computation by
the brain than patterns easy to classify. Hyperplane learning algorithms such as Support Vector
Machines (SVM), Relevance Vector Machines (RVM), Prototype learners (Prot) and K-means
learners (Kmean) were used on the same classification task using the Principal Components
of the texture and flowfield representation of the faces. The classification performance of the
learning algorithms was estimated using the face database with the true gender of the faces as
labels, and also with the gender estimated by the subjects. Kmean yields classification performance
close to that of humans, while SVM and RVM are much better. This surprising behaviour
may be due to the fact that humans are trained on real faces during their lifetime while they
were here tested on artificial ones, whereas the algorithms were trained and tested on the same
set of stimuli. We then correlated the human responses to the distance of the stimuli to the
separating hyperplane (SH) of the learning algorithms. On the whole, stimuli far from the SH
are classified more accurately, faster and with higher confidence than those near to the SH if
we pool data across all our subjects and stimuli. We also find three noteworthy results. First,
SVMs and RVMs can learn to classify faces using the subjects' labels but perform much better
when using the true labels. Second, correlating the average response of humans (classification
error, RT or CR) with the distance to the SH on a face-by-face basis using Spearman's rank
correlation coefficients shows that RVMs recreate human performance most closely in every
respect. Third, the mean-of-class prototype, its popularity in neuroscience notwithstanding, is
the least human-like classifier in all cases examined.

We explored several ways to improve the efficiency of measuring psychometric functions without resorting to adaptive procedures. a) Increasing the number m of alternatives in an m-alternative-forced-choice (m-AFC) task improves the efficiency of the method of constant stimuli. b) When alternatives are presented simultaneously at different positions on a screen rather than sequentially, time can be saved and the memory load for the subject can be reduced. c) A touch-screen can further help to make the experimental procedure more intuitive. We tested these ideas in the measurement of contrast sensitivity and compared them to results obtained by sequential presentation in a two-interval-forced-choice (2-IFC) task. Qualitatively, all methods (m-AFC and 2-IFC) recovered the characteristic shape of the contrast sensitivity function in three subjects. The m-AFC paradigm only took about 60% of the time of the 2-IFC task. We tried m = 2, 4, 8 and found 4-AFC to give the best model fits and 2-AFC to have the least bias.

In face detection, support vector machines (SVM) and neural networks (NN) have been shown
to outperform most other classification methods. While both approaches are learning-based,
there are distinct advantages and drawbacks to each method: NNs are difficult to design and
train but can lead to very small and efficient classifiers. In comparison, SVM model selection
and training is rather straightforward, and, more importantly, guaranteed to converge to
a globally optimal (in the sense of training errors) solution. Unfortunately, SVM classifiers
tend to have large representations which are inappropriate for time-critical image processing
applications.
In this work, we examine various existing and new methods for simplifying support vector
decision rules. Our goal is to obtain efficient classifiers (as with NNs) while keeping the numerical
and statistical advantages of SVMs. For a given SVM solution, we compute a cascade
of approximations with increasing complexities. Each classifier is tuned so that the detection
rate is near 100%. At run-time, the first (simplest) detector is evaluated on the whole image.
Then, any subsequent classifier is applied only to those positions that have been classified as
positive throughout all previous stages. The false positive rate at the end equals that of the
last (i.e. most complex) detector. In contrast, since many image positions are discarded by
lower-complexity classifiers, the average computation time per patch decreases significantly
compared to the time needed for evaluating the highest-complexity classifier alone.

During the last 20 years several paradigms for Brain Computer Interfaces have been proposed;
see [1] for a recent review. They can be divided into (a) stimulus-driven paradigms, using e.g.
event-related potentials or visual evoked potentials from an EEG signal, and (b) patient-driven
paradigms such as those that use premotor potentials correlated with imagined action, or slow
cortical potentials (e.g. [2]).
Our aim is to develop a stimulus-driven paradigm that is applicable in practice to patients.
Due to the unreliability of visual perception in locked-in patients in the later stages of disorders
such as Amyotrophic Lateral Sclerosis, we concentrate on the auditory modality. Specifically,
we look for the effects, in the EEG signal, of selective attention to one of two concurrent
auditory stimulus streams, exploiting the increased activation to attended stimuli that is seen
under some circumstances [3].
We present the results of our preliminary experiments on normal subjects. On each of
400 trials, two repetitive stimuli (sequences of drum-beats or other pulsed stimuli) could be
heard simultaneously. The two stimuli were distinguishable from one another by their acoustic
properties, by their source location (one from a speaker to the left of the subject, the other from
the right), and by their differing periodicities. A visual cue preceded the stimulus by 500 msec,
indicating which of the two stimuli to attend to, and the subject was instructed to count the
beats in the attended stimulus stream. There were up to 6 beats of each stimulus: with equal
probability on each trial, all 6 were played, or the fourth was omitted, or the fifth was omitted.
The 40-channel EEG signals were analyzed offline to reconstruct which of the streams was
attended on each trial. A linear Support Vector Machine [4] was trained on a random subset of
the data and tested on the remainder. Results are compared from two types of pre-processing
of the signal: for each stimulus stream, (a) EEG signals at the stream's beat periodicity are
emphasized, or (b) EEG signals following beats are contrasted with those following missing
beats. Both forms of pre-processing show promising results, i.e. that selective attention to
one or the other auditory stream yields signals that are classifiable significantly above chance
performance. In particular, the second pre-processing was found to be robust to reduction in
the number of features used for classification (cf. [5]), helping us to eliminate noise.

In a number of models of depth cue combination the depth percept is constructed via a weighted
average combination of independent depth estimations. The influence of each cue in such an average
depends on the reliability of the source of information [1,5]. In particular, Ernst and Banks
(2002) formulate such combination as that of the minimum variance unbiased estimator that
can be constructed from the available cues. We have observed systematic differences in slant
discrimination performance of human observers when different types of textures were used as
a cue to slant [4]. If the depth percept behaves as described above, our measurements of the
slopes of the psychometric functions provide the predicted weights for the texture cue for the
ranked texture types. However, the results for slant discrimination obtained when combining
these texture types with object motion are difficult to reconcile with the minimum variance
unbiased estimator model [3]. This apparent failure of such a model might be explained by
the existence of a coupling of texture and motion, violating the assumption of independence of
cues. Hillis, Ernst, Banks, and Landy (2002) [2] have shown that while for between-modality
combination the human visual system has access to the single-cue information, for within-modality
combination (visual cues) the single-cue information is lost. This suggests a coupling
between visual cues and independence between visual and haptic cues. Then, in the present
study we combined the different texture types with haptic information in a slant discrimination
task, to test whether in the between-modality condition these cues are combined as predicted
by an unbiased, minimum variance estimator model. The measured weights for the cues were
consistent with a combination rule sensitive to the reliability of the sources of information, but
did not match the predictions of a statistically optimal combination.

A Brain Computer Interface (BCI) system allows the direct interpretation of brain activity
patterns (e.g. EEG signals) by a computer. Typical BCI applications comprise spelling aids
or environmental control systems supporting paralyzed patients that have lost motor control
completely.
The design of an EEG based BCI system requires good answers for the problem of selecting
useful features during the performance of a mental task as well as for the problem of classifying
these features.
For the special case of choosing appropriate EEG channels from several available channels,
we propose the application of variants of the Support Vector Machine (SVM) for both
problems. Although these algorithms do not rely on prior knowledge, they can provide more
accurate solutions than standard filter methods [1] for feature selection, which usually incorporate
prior knowledge about neural activity patterns during the performed mental tasks.
For judging the importance of features we introduce a new relevance measure and apply it
to EEG channels. Although we base the relevance measure for this purpose on the previously
introduced algorithms, it does not in general depend on specific algorithms but can be derived
using arbitrary combinations of feature selectors and
classifiers.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.