Tangential neurons in the fly brain are sensitive to the typical optic
flow patterns generated during egomotion. In this study, we examine
whether a simplified linear model based on the organization principles
in tangential neurons can be used to estimate egomotion from the optic
flow. We present a theory for the construction of an estimator
consisting of a linear combination of optic flow vectors that
incorporates prior knowledge both about the distance distribution of
the environment and about the noise and egomotion statistics of the
sensor. The estimator is tested on a gantry carrying an
omnidirectional vision sensor. The experiments show that the proposed
approach leads to accurate and robust estimates of rotation rates,
whereas translation estimates are of reasonable quality, albeit less
reliable.
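As a rough illustration of such an estimator (a minimal sketch under assumed values, not the authors' implementation: the mean inverse distance mu, the ridge regularizer lam, and the noise level are illustrative stand-ins for the paper's priors):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical omnidirectional sensor: n viewing directions on the sphere.
n = 200
d = rng.normal(size=(n, 3))
d /= np.linalg.norm(d, axis=1, keepdims=True)

mu = 0.5  # assumed mean inverse distance of the environment (prior knowledge)

# Classical motion-field model: flow(d) = -omega x d - mu * (I - d d^T) t.
# A maps the six egomotion parameters (omega, t) to the expected flow.
A = np.zeros((3 * n, 6))
for i, di in enumerate(d):
    skew = np.array([[0, -di[2], di[1]],
                     [di[2], 0, -di[0]],
                     [-di[1], di[0], 0]])
    A[3*i:3*i+3, 0:3] = skew                      # skew(d) @ omega = -omega x d
    A[3*i:3*i+3, 3:6] = -mu * (np.eye(3) - np.outer(di, di))

# Precomputed linear estimator; the ridge term stands in for the paper's
# noise and egomotion priors.
lam = 1e-3
W = np.linalg.solve(A.T @ A + lam * np.eye(6), A.T)

# Simulate one egomotion and recover it from noisy flow measurements.
x_true = rng.normal(size=6)
flow = A @ x_true + 0.01 * rng.normal(size=3 * n)
print(np.round(x_true, 2), np.round(W @ flow, 2))
```

Each estimated parameter is a fixed linear combination of the flow vectors, which is exactly the structure attributed to the tangential neurons.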

Proceedings of the Royal Society of London A, 460(2501):3283-3297, November 2004 (article)

We describe a fast system for the detection and localization of human faces in images using a nonlinear ‘support-vector machine’. We approximate the decision surface in terms of a reduced set of expansion vectors and propose a cascaded evaluation which has the property that the full support-vector expansion is only evaluated on the face-like parts of the image, while the largest part of typical images is classified using a single expansion vector (a simpler and more efficient classifier). As a result, only three reduced-set vectors are used, on average, to classify an image patch. Hence, the cascaded evaluation, presented in this paper, offers a thirtyfold speed-up over an evaluation using the full set of reduced-set vectors, which is itself already thirty times faster than classification using all the support vectors.
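A minimal sketch of the cascaded evaluation (the reduced-set vectors, weights, and per-stage thresholds below are hypothetical placeholders, not the trained values from the paper):

```python
import numpy as np

def gaussian_kernel(x, z, gamma=0.1):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def cascaded_classify(patch, reduced_vectors, weights, thresholds):
    """Evaluate reduced-set expansions of increasing size, rejecting early."""
    score = 0.0
    for z, beta, b in zip(reduced_vectors, weights, thresholds):
        score += beta * gaussian_kernel(patch, z)
        if score + b < 0:       # this stage already says "not a face"
            return -1
    return +1                   # survived every stage: face-like patch

rng = np.random.default_rng(0)
zs = rng.normal(size=(3, 16))   # hypothetical reduced-set vectors
print(cascaded_classify(rng.normal(size=16), zs,
                        weights=[1.0, -0.5, 0.8],
                        thresholds=[-0.2, -0.1, 0.0]))
```

Because most background patches are rejected by the first stage, the average number of kernel evaluations per patch stays close to one.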

Machine Learning has become a key enabling technology both for engineering applications and for investigating scientific questions and theoretical problems. To stimulate discussions and to disseminate new results, a summer school series was started in February 2002, the documentation of which is published as LNAI 2600.
This book presents revised lectures of two subsequent summer schools held in 2003 in Canberra, Australia, and in Tübingen, Germany. The tutorial lectures included are devoted to statistical learning theory, unsupervised learning, Bayesian inference, and applications in pattern recognition; they provide in-depth overviews of exciting new developments and contain a large number of references.
Graduate students, lecturers, researchers and professionals alike will find this book a useful resource in learning and teaching machine learning.

Motivation: The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based on the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks.
Results: We show that computing the diffusion kernel is equivalent to maximizing the von Neumann entropy, subject to a global constraint on the sum of the Euclidean distances between nodes. This global constraint allows for high variance in the pairwise distances. Accordingly, we propose an alternative, locally constrained diffusion kernel, and we demonstrate that the resulting kernel allows for more accurate support vector machine prediction of protein functional classifications from metabolic and protein-protein interaction networks.
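For reference, the diffusion kernel can be computed as the matrix exponential of the negative graph Laplacian; a minimal sketch on a toy graph (beta is an illustrative diffusion parameter):

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta=1.0):
    """K = expm(beta * (A - D)): matrix exponential of the negative Laplacian."""
    D = np.diag(A.sum(axis=1))
    return expm(beta * (A - D))

# Toy graph: a 4-node path; kernel entries decay with distance in the graph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(np.round(diffusion_kernel(A, beta=0.5), 3))
```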

When designing a Brain-Computer Interface (BCI) system, one can choose from a
variety of features that may be useful for classifying brain activity during a
mental task. For the special case of classifying EEG signals, we propose using
the state-of-the-art feature-selection algorithms Recursive Feature Elimination
and Zero-Norm Optimization, which are based on the training of Support Vector
Machines (SVM). These algorithms can provide more accurate solutions than
standard filter methods for feature selection. We adapt the methods to the task
of selecting EEG channels. For a motor imagery paradigm, we show that the
number of channels used can be reduced significantly without increasing the
classification error. The resulting best channels agree well with the expected
underlying cortical activity patterns during the mental tasks. Furthermore, we
show how time-dependent, task-specific information can be visualized.
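A toy sketch of SVM-based Recursive Feature Elimination applied to channel selection (synthetic data; for simplicity each channel is summarized by a single feature, whereas the paper eliminates all features belonging to a channel together):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

# Hypothetical data: 100 trials x 32 channels (one bandpower value each).
X = rng.normal(size=(100, 32))
y = rng.integers(0, 2, size=100)
X[y == 1, 3] += 1.0    # make channels 3 and 17 carry class information
X[y == 1, 17] += 1.0

# RFE driven by the weights of a linear SVM, repeatedly dropping the
# channel with the smallest weight magnitude.
selector = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=8, step=1)
selector.fit(X, y)
print("selected channels:", np.flatnonzero(selector.support_))
```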

The goal of this article is to develop a framework for large margin classification in metric spaces. We want to find a generalization of linear decision functions for metric spaces and define a corresponding notion of margin such that the decision function separates the training points with a large margin. It will turn out that using Lipschitz functions as decision functions, the inverse of the Lipschitz constant can be interpreted as the size of a margin. In order to construct a clean mathematical setup we isometrically embed the given metric space into a Banach space and the space of Lipschitz functions into its dual space. To analyze the resulting algorithm, we prove several representer theorems. They state that there always exist solutions of the Lipschitz classifier which can be expressed in terms of distance functions to training points. We provide generalization bounds for Lipschitz classifiers in terms of the Rademacher complexities of some Lipschitz function classes. The generality of our approach can be seen from the fact that several well-known algorithms are special cases of the Lipschitz classifier, among them the support vector machine, the linear programming machine, and the 1-nearest neighbor classifier.
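As a concrete special case, the 1-nearest-neighbor rule can be written as a difference of distance functions to the two training classes, which is a Lipschitz decision function (with Lipschitz constant at most 2 in the given metric); a minimal sketch with hypothetical helper names:

```python
import numpy as np

def lipschitz_decision(x, X_pos, X_neg, metric):
    """f(x) = d(x, X_neg) - d(x, X_pos); sign(f) is the 1-NN prediction."""
    d_pos = min(metric(x, p) for p in X_pos)
    d_neg = min(metric(x, q) for q in X_neg)
    return d_neg - d_pos    # positive value => predict class +1

euclid = lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
print(lipschitz_decision([0.2, 0.1], X_pos=[[0, 0]], X_neg=[[1, 1]],
                         metric=euclid))
```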

The annual Neural Information Processing Systems (NIPS) conference is the flagship meeting on neural computation. It draws a diverse group of attendees: physicists, neuroscientists, mathematicians, statisticians, and computer scientists. The presentations are interdisciplinary, with contributions in algorithms, learning theory, cognitive science, neuroscience, brain imaging, vision, speech and signal processing, reinforcement learning and control, emerging technologies, and applications. Only thirty percent of the papers submitted are accepted for presentation at NIPS, so the quality is exceptionally high. This volume contains all the papers presented at the 2003 conference.

Functional genomics represents a new and challenging approach to analyzing complex diseases such as osteoarthritis at the molecular level. Characterizing the molecular changes in the cartilage cells, the chondrocytes, enables a better understanding of the pathomechanisms of the disease. In particular, the identification and characterization of new target molecules for therapeutic intervention is of interest. Potential molecular markers for the diagnosis and monitoring of osteoarthritis also contribute to more appropriate patient management. The DNA-microarray technology complements (but does not replace) biochemical and biological research into new disease-relevant genes. Large-scale functional genomics will identify molecular networks, such as as-yet-unidentified players in the anabolic-catabolic balance of articular cartilage, as well as disease-relevant intracellular signaling cascades so far largely unknown in articular chondrocytes. At the moment, however, it is also important to recognize the limitations of the microarray technology in order to avoid over-interpretation of the results, which might lead to misleading conclusions and significantly hinder proper use of the technology's potential in the field of osteoarthritis.

In this paper we investigate connections between statistical learning
theory and data compression on the basis of support vector machine (SVM)
model selection. Inspired by several generalization bounds we construct
"compression coefficients" for SVMs which measure the amount by which the
training labels can be compressed by a code built from the separating
hyperplane. The main idea is to relate the coding precision to geometrical
concepts such as the width of the margin or the shape of the data in the
feature space. The compression coefficients derived in this way combine
well-known quantities such as the radius-margin term R^2/rho^2, the eigenvalues of the
kernel matrix, and the number of support vectors. To test whether they are
useful in practice we ran model selection experiments on benchmark data
sets. As a result we found that compression coefficients can fairly
accurately predict the parameters for which the test error is minimized.

Usually, noise is considered to be destructive. We present a new method that constructively injects noise to assess the reliability and the grouping structure of empirical ICA component estimates. Our method can be viewed as a Monte-Carlo-style approximation of the curvature of some performance measure at the solution. Simulations show that the true root-mean-squared angle distances between the real sources and the source estimates can be approximated well by our method. In a toy experiment, we see that we are also able to reveal the underlying grouping structure of the extracted ICA components. Furthermore, an experiment with fetal ECG data demonstrates that our approach is useful for exploratory data analysis of real-world data.
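A Monte-Carlo sketch of the idea on toy data (FastICA, the noise level sigma, and the matching rule are illustrative choices, not the paper's exact procedure):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Toy sources mixed into two observed channels.
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(7 * t), np.sign(np.sin(3 * t))]
X = S @ rng.normal(size=(2, 2)).T

A0 = FastICA(n_components=2, random_state=0).fit(X).mixing_  # baseline

def angles_to_baseline(X, A0, sigma, n_runs=20):
    """Re-run ICA on noise-injected copies and collect the angles between
    each baseline mixing column and its best match in the new estimate."""
    angles = []
    for seed in range(n_runs):
        Xn = X + sigma * rng.normal(size=X.shape)
        A = FastICA(n_components=2, random_state=seed).fit(Xn).mixing_
        for a0 in A0.T:
            # match by maximal |cos| to absorb permutation and sign flips
            cos = np.abs(A.T @ a0) / (np.linalg.norm(A, axis=0)
                                      * np.linalg.norm(a0))
            angles.append(np.arccos(np.clip(cos.max(), -1.0, 1.0)))
    return np.degrees(angles)

ang = angles_to_baseline(X, A0, sigma=0.1)
print(f"mean angular deviation: {np.mean(ang):.2f} deg")  # small => reliable
```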

In Support Vector (SV) regression, a parameter ν controls the number of Support Vectors and the number of points that come to lie outside of the so-called ε-insensitive tube. For various noise models and SV parameter settings, we experimentally determine the values of ν that lead to the lowest generalization error. We find good agreement with the values that had previously been predicted by a theoretical argument based on the asymptotic efficiency of a simplified model of SV regression. As a side effect of the experiments, valuable information about the generalization behavior of the remaining SVM parameters and their dependencies is gained. The experimental findings are valid even for complex real-world data sets. Based on our results on the role of the ν-SVM parameters, we discuss various model selection methods.
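The role of ν is easy to observe with an off-the-shelf ν-SVR implementation; a small sketch on synthetic data (all parameter values are illustrative):

```python
import numpy as np
from sklearn.svm import NuSVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
y = np.sinc(X).ravel() + 0.1 * rng.normal(size=200)

# nu lower-bounds the fraction of support vectors and upper-bounds the
# fraction of training points lying outside the tube.
for nu in (0.1, 0.5, 0.9):
    model = NuSVR(nu=nu, C=1.0, kernel="rbf", gamma=1.0).fit(X, y)
    print(f"nu={nu:.1f}: fraction of support vectors = "
          f"{len(model.support_) / len(X):.2f}")
```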

Proceedings of the National Academy of Sciences, 101(17):6559-6563, 2004 (article)

Biologists regularly search databases of DNA or protein sequences for evolutionary or functional relationships to a given query sequence. We describe a ranking algorithm that exploits the entire network structure of similarity relationships among proteins in a sequence database by performing a diffusion operation on a pre-computed, weighted network. The resulting ranking algorithm, evaluated using a human-curated database of protein structures, is efficient and provides significantly better rankings than a local network search algorithm such as PSI-BLAST.

We measure the performance of five subjects in a slant-discrimination task for differently textured planes. As textures we used uniform lattices, randomly displaced lattices, circles (polka dots), Voronoi tessellations, plaids, 1/f noise, coherent noise and a leopard-skin-like texture. Our results show: (1) performance improves with larger slants for all textures; (2) consequently, performance is not symmetrical around a particular orientation; (3) for orientations sufficiently slanted, the different textures do not elicit major differences in performance, (4) while for orientations closer to the vertical plane there are marked differences between them; (5) these differences allow a rank order of textures to be formed according to their helpfulness, that is, how easy the discrimination task is when a particular texture is mapped onto the plane. Polka dots tend to allow the best slant-discrimination performance, noise patterns the worst. Two additional experiments were conducted to test the generality of the obtained rank order. First, the tilt of the planes was rotated to break the axis of gravity present in the original discrimination experiment. Second, the task was changed to a slant-report task via probe adjustment. The results of both control experiments confirmed the texture-based rank order previously obtained. We comment on the importance of these results for depth perception research in general, and in particular on the implications our results have for studies of cue combination (sensor fusion) using texture as one of the cues involved.

Remote homology detection between protein sequences is a central problem in computational biology. Discriminative methods involving support vector machines (SVM) are currently the most effective methods for the problem of superfamily recognition in the SCOP database. The performance of SVMs depends critically on the kernel function used to quantify the similarity between sequences. We propose new kernels for strings adapted to biological sequences, which we call local alignment kernels. These kernels measure the similarity between two sequences by summing up scores obtained from local alignments with gaps of the sequences. When tested in combination with SVM on their ability to recognize SCOP superfamilies on a benchmark dataset, the new kernels outperform state-of-the-art methods for remote homology detection.
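In the notation commonly used for these kernels, the local alignment kernel sums the contributions of all local alignments π of the two sequences, with a parameter β controlling how strongly high-scoring alignments dominate:

```latex
K_{\mathrm{LA}}^{(\beta)}(x, y) \;=\; \sum_{\pi \in \Pi(x, y)} \exp\bigl(\beta \, s(x, y, \pi)\bigr)
```

Here Π(x, y) is the set of all local alignments with gaps and s(x, y, π) is the alignment score; as β grows, (1/β) log K_LA approaches the Smith-Waterman score of the best single alignment.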

The retrieval of wind vectors from satellite scatterometer observations is a non-linear inverse problem. A common approach to solving inverse problems is to adopt a Bayesian framework and to infer the posterior distribution of the parameters of interest given the observations by using a likelihood model relating the observations to the parameters, and a prior distribution over the parameters. We show how Gaussian process priors can be used efficiently with a variety of likelihood models, using local forward (observation) models and direct inverse models for the scatterometer. We present an enhanced Markov chain Monte Carlo method to sample from the resulting multimodal posterior distribution. We go on to show how the computational complexity of the inference can be controlled by using a sparse, sequential Bayes algorithm for estimation with Gaussian processes. This helps to overcome the most serious barrier to the use of probabilistic, Gaussian process methods in remote sensing inverse problems, which is the prohibitively large size of the data sets. We contrast the sampling results with the approximations that are found by using the sparse, sequential Bayes algorithm.

DNA microarray analysis was used to investigate the molecular phenotype of one of the first human chondrocyte cell lines, C-20/A4, derived from juvenile costal chondrocytes by immortalization with origin-defective simian virus 40 large T antigen. Clontech Human Cancer Arrays 1.2 and quantitative PCR were used to examine gene expression profiles of C-20/A4 cells cultured in the presence of serum in monolayer and alginate beads. In monolayer cultures, genes involved in cell proliferation were strongly upregulated compared to those expressed by human adult articular chondrocytes in primary culture. Of the cell cycle-regulated genes, only two, the CDK regulatory subunit and histone H4, were downregulated after culture in alginate beads, consistent with the ability of these cells to proliferate in suspension culture. In contrast, the expression of several genes that are involved in pericellular matrix formation, including MMP-14, COL6A1, fibronectin, biglycan and decorin, was upregulated when the C-20/A4 cells were transferred to suspension culture in alginate. Also, nexin-1, vimentin, and IGFBP-3, which are known to be expressed by primary chondrocytes, were differentially expressed in our study. Consistent with the proliferative phenotype of this cell line, few genes involved in matrix synthesis and turnover were highly expressed in the presence of serum. These results indicate that immortalized chondrocyte cell lines, rather than substituting for primary chondrocytes, may serve as models for extending findings on chondrocyte function in ways not achievable with primary chondrocytes.

We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm, one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
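The 1-SVM starting point is available off the shelf; a minimal usage sketch (the kernel parameters and the outlier fraction ν below are illustrative):

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 2))            # samples from the distribution
X_test = np.array([[0.1, -0.2], [4.0, 4.0]])   # an inlier and an outlier

# nu bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_train)
print(clf.predict(X_test))  # +1: likely from the training distribution, -1: not
```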

Motivation: Large-scale gene expression data are often analysed by clustering genes based on gene expression data alone, even though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably.
Results: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure is computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.
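A toy sketch of the combined distance function (the convex combination with weight alpha is an illustrative simplification of the paper's construction, and the data are synthetic):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.sparse.csgraph import shortest_path

rng = np.random.default_rng(0)

# Hypothetical inputs: expression profiles for 6 genes plus a small
# metabolic-network adjacency matrix over the same genes.
expr = rng.normal(size=(6, 10))
adj = np.array([[0, 1, 1, 0, 0, 0],
                [1, 0, 1, 0, 0, 0],
                [1, 1, 0, 1, 0, 0],
                [0, 0, 1, 0, 1, 1],
                [0, 0, 0, 1, 0, 1],
                [0, 0, 0, 1, 1, 0]], dtype=float)

# Correlation-based distance on expression; shortest-path distance on the graph.
d_expr = squareform(pdist(expr, metric="correlation"))
d_graph = shortest_path(adj, method="D", unweighted=True)

alpha = 0.5   # tuning choice, not taken from the paper
d = alpha * d_graph / d_graph.max() + (1 - alpha) * d_expr / d_expr.max()

Z = linkage(squareform(d, checks=False), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))  # joint gene clusters
```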

The authors used a recognition memory paradigm to assess the influence of color information on visual memory for images of natural scenes. Subjects performed 5-10% better for colored than for black-and-white images independent of exposure duration. Experiment 2 indicated little influence of contrast once the images were suprathreshold, and Experiment 3 revealed that performance worsened when images were presented in color and tested in black and white, or vice versa, leading to the conclusion that the surface property color is part of the memory representation. Experiments 4 and 5 exclude the possibility that the superior recognition memory for colored images results solely from attentional factors or saliency. Finally, the recognition memory advantage disappears for falsely colored images of natural scenes: The improvement in recognition memory depends on the color congruence of presented images with learned knowledge about the color gamut found within natural scenes. The results can be accounted for within a multiple memory systems framework.

Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification
problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines,
provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the
lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than
previous SVM methods.
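One of the reviewed methods, the virtual support vector method, is easy to sketch: train once, transform only the support vectors with a known invariance (here one-pixel translations), and retrain on the augmented set. All data and parameter values below are illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def shift(img, dx, dy):
    """Translate a 2-D image by (dx, dy) pixels (toy invariance transform)."""
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def train_with_virtual_svs(X, y, side=8):
    """Virtual-SV method: train, translate only the SVs, then retrain."""
    base = SVC(kernel="rbf", gamma=0.02, C=10.0).fit(X, y)
    sv_imgs = X[base.support_].reshape(-1, side, side)
    virtual, vy = [], []
    for img, label in zip(sv_imgs, y[base.support_]):
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            virtual.append(shift(img, dx, dy).ravel())
            vy.append(label)
    X_aug = np.vstack([X, np.array(virtual)])
    y_aug = np.concatenate([y, np.array(vy)])
    return SVC(kernel="rbf", gamma=0.02, C=10.0).fit(X_aug, y_aug)

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 64))            # toy 8x8 "digit" patches
y = np.array([0] * 20 + [1] * 20)
print(train_with_virtual_svs(X, y).n_support_)
```

Only the support vectors are transformed because they alone determine the decision surface, which keeps the augmented training set small.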

Model selection is an important ingredient of many machine
learning algorithms, in particular when the sample size is
small, in order to strike the right trade-off between overfitting
and underfitting. Previous classical results for linear regression
are based on an asymptotic analysis. We present a new
penalization method for performing model selection for
regression that is appropriate even for small samples.
Our penalization is based on an accurate estimator of the
ratio of the expected training error and the expected
generalization error, in terms of the expected eigenvalues
of the input covariance matrix.

The detectability of contrast increments was measured as a function of the contrast of a masking or pedestal grating at a number of different spatial frequencies ranging from 2 to 16 cycles per degree of visual angle. The pedestal grating always had the same orientation, spatial frequency and phase as the signal. The shape of the contrast-increment threshold versus pedestal contrast (TvC) function depends on the performance level used to define the threshold, but when both axes are normalized by the contrast corresponding to 75% correct detection at each frequency, the TvC functions at a given performance level are identical. Confidence intervals on the slope of the rising part of the TvC functions are so wide that it is not possible with our data to reject Weber's law.

We introduce new concentration inequalities for functions on product spaces.
They allow one to obtain a Bennett-type deviation bound for suprema of
empirical processes indexed by upper-bounded functions.
The result is an improvement on Rio's version (Rio, 2001) of Talagrand's
inequality (Talagrand, 1996) for equidistributed variables.

In this article we describe a new code for evolving axisymmetric isolated systems in general relativity. Such systems are described by asymptotically flat space-times, which have the property that they admit a conformal extension. We work directly in the extended conformal manifold and numerically solve Friedrich's conformal field equations, which state that Einstein's equations hold in the physical space-time. Because of the compactness of the conformal space-time, the entire space-time can be calculated on a finite numerical grid. We describe the numerical scheme in detail, especially the treatment of the axisymmetry and the boundary.

We define notions of stability for learning algorithms
and show
how to use these notions to derive generalization error bounds
based on the empirical error and the leave-one-out error. The
methods we use can be applied in the regression framework as well
as in the classification one when the classifier is obtained by
thresholding a real-valued function. We study the stability
properties of large classes of learning algorithms such as
regularization based algorithms. In particular we focus on Hilbert
space regularization and Kullback-Leibler regularization. We
demonstrate how to apply the results to SVM for regression and
classification.
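For a learning algorithm with uniform stability β and a loss bounded by M, a representative form of the resulting bound (the constants differ among the variants derived in the paper) states that, with probability at least 1 − δ over a training set of size m,

```latex
R(A_S) \;\le\; \hat{R}_{\mathrm{emp}}(A_S) \;+\; 2\beta
        \;+\; \bigl(4m\beta + M\bigr)\sqrt{\frac{\ln(1/\delta)}{2m}} .
```

The bound is non-trivial whenever β decreases faster than 1/√m, which is the case for the regularization-based algorithms studied here.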

The quantification of perfusion using dynamic susceptibility contrast MR imaging requires deconvolution to obtain the residual impulse-response function (IRF). Here, a method using a Gaussian process for deconvolution (GPD) is proposed. The fact that the IRF is smooth is incorporated as a constraint in the method. The GPD method, which automatically estimates the noise level in each voxel, has the advantage that model parameters are optimized automatically. GPD is compared to singular value decomposition (SVD) using a common threshold for the singular values, and to SVD using a threshold optimized according to the noise level in each voxel. The comparison is carried out using artificial data as well as data from healthy volunteers. It is shown that GPD is comparable to SVD with a variably optimized threshold when determining the maximum of the IRF, which is directly related to perfusion. GPD provides a better estimate of the entire IRF. As the signal-to-noise ratio or the time resolution of the measurements increases, GPD is shown to be superior to SVD. This is also found for large distribution volumes.
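Because the forward model is linear and both the GP prior and the noise are Gaussian, the posterior mean of the IRF has a closed form; a toy numpy sketch (the AIF, kernel length-scale, and noise level are illustrative assumptions, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretized convolution: tissue curve c = C @ r + noise, with C built
# from the arterial input function (AIF) and r the residual IRF.
n, dt = 40, 1.0
t = np.arange(n) * dt
aif = t * np.exp(-t / 4.0)                           # toy AIF
C = dt * np.array([[aif[i - j] if i >= j else 0.0
                    for j in range(n)] for i in range(n)])

r_true = np.exp(-t / 8.0)                            # toy smooth IRF
sigma = 0.02
c_obs = C @ r_true + sigma * rng.normal(size=n)

# Smoothness prior on the IRF: squared-exponential GP covariance.
ell = 3.0
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell**2)

# Posterior mean of the linear-Gaussian model (the GP deconvolution estimate).
r_hat = K @ C.T @ np.linalg.solve(C @ K @ C.T + sigma**2 * np.eye(n), c_obs)
print(f"max IRF: true={r_true.max():.2f}, estimated={r_hat.max():.2f}")
```

The maximum of the estimated IRF is the quantity directly related to perfusion, which is what the comparison with SVD in the paper focuses on.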

In this paper, we examine on-line learning problems in which the target
concept is allowed to change over time. In each trial a master algorithm
receives predictions from a large set of n experts. Its goal is to predict
almost as well as the best sequence of such experts chosen off-line by
partitioning the training sequence into k+1 sections and then choosing
the best expert for each section. We build on methods developed by
Herbster and Warmuth and consider an open problem posed by
Freund where the experts in the best partition are from a small
pool of size m.
Since k >> m, the best expert shifts back and forth
between the experts of the small pool.
We propose algorithms that solve
this open problem by mixing the past posteriors maintained by the master
algorithm. We relate the number of bits needed for encoding the best
partition to the loss bounds of the algorithms.
Instead of paying log n for
choosing the best expert in each section we first pay log (n choose m)
bits in the bounds for identifying the pool of m experts
and then log m bits per new section.
In the bounds we also pay twice for encoding the
boundaries of the sections.
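A compact sketch of the mixing step (uniform mixing over all past posteriors; the paper analyzes several mixing schemes, and the parameter values here are illustrative):

```python
import numpy as np

def mix_past_posteriors(losses, eta=1.0, alpha=0.05):
    """Master algorithm that mixes past posteriors into each weight update.

    losses: T x n array of per-trial expert losses (hypothetical input).
    After the usual exponential-weights update, a small amount of the
    average past posterior is mixed in, so an expert that was good earlier
    can regain weight quickly when it becomes good again.
    """
    T, n = losses.shape
    w = np.full(n, 1.0 / n)
    past = [w.copy()]
    total = 0.0
    for t in range(T):
        total += w @ losses[t]                 # master's (linear) loss
        v = w * np.exp(-eta * losses[t])       # loss update
        v /= v.sum()
        past.append(v.copy())
        w = (1 - alpha) * v + alpha * np.mean(past, axis=0)
    return total

losses = np.random.default_rng(0).uniform(size=(100, 5))
print(f"master loss: {mix_past_posteriors(losses):.2f}")
```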

Detection performance was measured with sinusoidal and pulse-train gratings. Although the 2.09-c/deg pulse-train, or line, gratings contained at least 8 harmonics, all at equal contrast, they were no more detectable than their most detectable component. The addition of broadband pink noise designed to equalize the detectability of the components of the pulse train made the pulse train about a factor of four more detectable than any of its components. However, in contrast-discrimination experiments, with a pedestal or masking grating of the same form and phase as the signal and a contrast of 15%, the noise did not affect the discrimination performance of the pulse train relative to that obtained with its sinusoidal components. We discuss the implications of these observations for models of early vision, in particular for possible sources of internal noise.

The problem of automatically tuning multiple parameters for pattern recognition Support Vector Machines (SVM) is considered. This is done by minimizing some estimate of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. Usual methods for choosing parameters, based on exhaustive search, become intractable as soon as the number of parameters exceeds two. Some experimental results assess the feasibility of our approach for a large number of parameters (more than 100) and demonstrate an improvement of generalization performance.
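The paper follows the gradient of smooth error estimates; as a rough, derivative-free stand-in for that procedure, the sketch below minimizes cross-validation error over two log-scale hyperparameters (with more parameters one would differentiate a smooth bound instead, as the paper does):

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def cv_error(log_params):
    """Cross-validation error as a function of (log C, log gamma)."""
    C, gamma = np.exp(log_params)
    return 1.0 - cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

res = minimize(cv_error, x0=np.log([1.0, 0.1]), method="Nelder-Mead")
print("tuned C, gamma:", np.round(np.exp(res.x), 3))
```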

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.