We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm, one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.
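
As a hedged illustration of the 1-SVM setting that serves as the starting point here, the sketch below uses scikit-learn's OneClassSVM (an off-the-shelf 1-SVM, not the one-class leveraging algorithm itself); all data and parameter choices are illustrative.

```python
# A minimal sketch of the 1-SVM setting the abstract starts from, using
# scikit-learn's OneClassSVM (not the authors' one-class leveraging algorithm).
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                       # samples from the "normal" distribution
X_test = np.vstack([rng.normal(size=(10, 2)),             # likely inliers
                    rng.uniform(-6, 6, size=(10, 2))])    # likely outliers

clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.1).fit(X_train)
print(clf.predict(X_test))   # +1 = consistent with training distribution, -1 = outlier
```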

Motivation: Large-scale gene expression data are often analysed by clustering genes based on the expression measurements alone, even though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably.
Results: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure are computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data, and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.
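
A minimal sketch of the combined-distance idea, assuming a convex combination d = alpha * d_network + (1 - alpha) * d_expression; the weighting scheme and the toy data below are illustrative, not the paper's actual construction.

```python
# Combine a graph distance with a correlation-based expression distance,
# then cluster hierarchically. All specifics here are illustrative.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

n_genes = 6
rng = np.random.default_rng(1)
expr = rng.normal(size=(n_genes, 8))                  # expression profiles (genes x conditions)
d_expr = pdist(expr, metric="correlation")            # 1 - Pearson correlation
d_net = pdist(rng.uniform(0, 1, size=(n_genes, 1)))   # stand-in for a graph distance

alpha = 0.5
d_combined = alpha * d_net / d_net.max() + (1 - alpha) * d_expr / d_expr.max()

Z = linkage(d_combined, method="average")             # hierarchical clustering
print(fcluster(Z, t=3, criterion="maxclust"))         # cut into (up to) 3 clusters
```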

The authors used a recognition memory paradigm to assess the influence of color information on visual memory for images of natural scenes. Subjects performed 5-10% better for colored than for black-and-white images, independent of exposure duration. Experiment 2 indicated little influence of contrast once the images were suprathreshold, and Experiment 3 revealed that performance worsened when images were presented in color and tested in black and white, or vice versa, leading to the conclusion that the surface property color is part of the memory representation. Experiments 4 and 5 excluded the possibility that the superior recognition memory for colored images results solely from attentional factors or saliency. Finally, the recognition memory advantage disappeared for falsely colored images of natural scenes: the improvement in recognition memory depends on the color congruence of the presented images with learned knowledge about the color gamut found within natural scenes. The results can be accounted for within a multiple memory systems framework.

Practical experience has shown that in order to obtain the best possible performance, prior knowledge about the invariances of the classification problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines, provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is the lowest test error reported to date on the well-known MNIST digit recognition benchmark, achieved with SVM training times significantly faster than those of previous SVM methods.
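
One standard method in this family is the virtual support vector (VSV) approach: train an SVM, apply invariance transformations to its support vectors, and retrain on the augmented set. The sketch below illustrates it for one-pixel translations; the data set, kernel, and shift set are assumptions for illustration, not the paper's experimental setup.

```python
# A minimal sketch of the virtual support vector (VSV) method for translation
# invariance: train an SVM, translate its support vectors by one pixel, and
# retrain on the augmented set. Data, kernel and shifts are illustrative.
import numpy as np
from scipy.ndimage import shift
from sklearn.svm import SVC
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)          # 8x8 digit images, flattened to 64 dims
clf = SVC(kernel="rbf", gamma=0.001, C=10).fit(X, y)

sv, sv_y = X[clf.support_], y[clf.support_]
virtual, virtual_y = [], []
for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:   # one-pixel translations
    for img, label in zip(sv, sv_y):
        virtual.append(shift(img.reshape(8, 8), (dy, dx), order=0).ravel())
        virtual_y.append(label)

X_aug = np.vstack([sv, np.array(virtual)])
y_aug = np.concatenate([sv_y, np.array(virtual_y)])
clf_vsv = SVC(kernel="rbf", gamma=0.001, C=10).fit(X_aug, y_aug)  # retrained, more invariant SVM
```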

Model selection is an important ingredient of many machine learning algorithms, in particular when the sample size is small, in order to strike the right trade-off between overfitting and underfitting. Previous classical results for linear regression are based on an asymptotic analysis. We present a new penalization method for performing model selection for regression that is appropriate even for small samples. Our penalization is based on an accurate estimator of the ratio of the expected training error to the expected generalization error, in terms of the expected eigenvalues of the input covariance matrix.
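
A schematic sketch of the overall recipe, selecting the model that minimizes the training error multiplied by a penalty; the `penalty` below is a placeholder, not the paper's estimator (which is derived from the expected eigenvalues of the input covariance matrix).

```python
# Penalization-based model selection: pick the nested model minimizing
# (training error) x (estimated generalization/training-error ratio).
import numpy as np

def training_error(X, y, w):
    return np.mean((X @ w - y) ** 2)

def penalty(X):
    # Hypothetical stand-in: a simple d/n correction reminiscent of classical
    # penalties; the actual method uses covariance eigenvalues instead.
    n, d = X.shape
    return (1 + d / n) / (1 - d / n)

rng = np.random.default_rng(2)
X_full = rng.normal(size=(30, 10))
y = X_full[:, :3] @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=30)

scores = {}
for d in range(1, 10):                  # nested models of increasing dimension
    Xd = X_full[:, :d]
    w, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    scores[d] = training_error(Xd, y, w) * penalty(Xd)
print(min(scores, key=scores.get))      # selected model dimension
```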

The detectability of contrast increments was measured as a function of the contrast of a masking or pedestal grating at a number of different spatial frequencies ranging from 2 to 16 cycles per degree of visual angle. The pedestal grating always had the same orientation, spatial frequency and phase as the signal. The shape of the contrast increment threshold versus pedestal contrast (TvC) functions depends on the performance level used to define the threshold, but when both axes are normalized by the contrast corresponding to 75% correct detection at each frequency, the TvC functions at a given performance level are identical. Confidence intervals on the slope of the rising part of the TvC functions are so wide that it is not possible, with our data, to reject Weber's law.

We introduce new concentration inequalities for functions on product spaces. They allow one to obtain a Bennett-type deviation bound for suprema of empirical processes indexed by upper-bounded functions. The result improves on Rio's version \cite{Rio01b} of Talagrand's inequality \cite{Talagrand96} for equidistributed variables.
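
For illustration, a Bennett-type bound of this kind, stated in its now-standard form (not necessarily with the paper's exact constants): for i.i.d. random variables $X_1,\dots,X_n$ and a countable class $\mathcal{F}$ of functions with $\mathbb{E}f(X_i) = 0$, $f \le 1$ and $\mathbb{E}f^2(X_i) \le \sigma^2$, let $Z = \sup_{f \in \mathcal{F}} \sum_{i=1}^n f(X_i)$. Then for all $t > 0$,
\[
\Pr\left( Z \ge \mathbb{E}Z + \sqrt{2\,(n\sigma^2 + 2\,\mathbb{E}Z)\,t} + \tfrac{t}{3} \right) \le e^{-t}.
\]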

We describe in this article a new code for evolving axisymmetric isolated systems in general relativity. Such systems are described by asymptotically flat space-times, which have the property that they admit a conformal extension. We work directly in the extended conformal manifold and numerically solve Friedrich's conformal field equations, which imply that Einstein's equations hold in the physical space-time. Because of the compactness of the conformal space-time, the entire space-time can be calculated on a finite numerical grid. We describe the numerical scheme in detail, especially the treatment of the axisymmetry and the boundary.

We define notions of stability for learning algorithms and show how to use these notions to derive generalization error bounds based on the empirical error and the leave-one-out error. The methods we use can be applied in the regression framework as well as in the classification one when the classifier is obtained by thresholding a real-valued function. We study the stability properties of large classes of learning algorithms, such as regularization-based algorithms. In particular, we focus on Hilbert space regularization and Kullback-Leibler regularization. We demonstrate how to apply the results to SVMs for regression and classification.
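
A representative bound of this type, stated for illustration in the usual uniform-stability form (see the paper for the precise statement): if an algorithm has uniform stability $\beta$ with respect to a loss function bounded by $M$, then for any $\delta \in (0,1)$, with probability at least $1-\delta$ over the random draw of a training set of size $n$,
\[
R \;\le\; R_{\mathrm{emp}} + 2\beta + \left(4 n \beta + M\right) \sqrt{\frac{\ln(1/\delta)}{2n}},
\]
where $R$ and $R_{\mathrm{emp}}$ denote the generalization and empirical errors.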

The quantification of perfusion using dynamic susceptibility contrast MR imaging requires deconvolution to obtain the residual impulse-response function (IRF). Here, a method using a Gaussian process for deconvolution (GPD) is proposed. The fact that the IRF is smooth is incorporated as a constraint in the method. The GPD method, which automatically estimates the noise level in each voxel, has the advantage that model parameters are optimized automatically. GPD is compared to singular value decomposition (SVD) using a common threshold for the singular values, and to SVD using a threshold optimized according to the noise level in each voxel. The comparison is carried out using artificial data as well as data from healthy volunteers. It is shown that GPD is comparable to SVD with a variable optimized threshold when determining the maximum of the IRF, which is directly related to the perfusion, and that GPD provides a better estimate of the entire IRF. As the signal-to-noise ratio or the time resolution of the measurements increases, GPD is shown to be superior to SVD. This is also found for large distribution volumes.
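
As a hedged sketch of the underlying idea, deconvolution with a Gaussian-process smoothness prior can be written as a linear-Gaussian inverse problem; everything below (kernel, toy arterial input function, fixed noise level) is illustrative and not the paper's implementation, which additionally estimates the noise level per voxel.

```python
# GP-based deconvolution for the linear model y = A r + noise, where A
# convolves the residue function r with an arterial input function (AIF).
import numpy as np

n, dt = 50, 1.0
t = np.arange(n) * dt
aif = t * np.exp(-t / 3.0)                       # toy arterial input function
A = dt * np.array([[aif[i - j] if i >= j else 0.0
                    for j in range(n)] for i in range(n)])   # convolution matrix

r_true = np.exp(-t / 8.0)                        # toy residue function (IRF)
rng = np.random.default_rng(3)
y = A @ r_true + 0.02 * rng.normal(size=n)       # noisy tissue concentration curve

ell, sigma_n = 4.0, 0.02                         # GP length scale and noise level
K = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / ell ** 2)   # smoothness prior
r_hat = K @ A.T @ np.linalg.solve(A @ K @ A.T + sigma_n ** 2 * np.eye(n), y)
print(r_hat.max())                               # peak of the IRF, related to perfusion
```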

In this paper, we examine on-line learning problems in which the target
concept is allowed to change over time. In each trial a master algorithm
receives predictions from a large set of n experts. Its goal is to predict
almost as well as the best sequence of such experts chosen off-line by
partitioning the training sequence into k+1 sections and then choosing
the best expert for each section. We build on methods developed by
Herbster and Warmuth and consider an open problem posed by
Freund where the experts in the best partition are from a small
pool of size m.
Since k >> m, the best expert shifts back and forth
between the experts of the small pool.
We propose algorithms that solve
this open problem by mixing the past posteriors maintained by the master
algorithm. We relate the number of bits needed for encoding the best
partition to the loss bounds of the algorithms.
Instead of paying log n bits for choosing the best expert in each section, we first pay log(n choose m) bits in the bounds for identifying the pool of m experts, and then log m bits per new section.
In the bounds we also pay twice for encoding the
boundaries of the sections.
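
A minimal sketch of the mixing idea under simplifying assumptions (a uniform mixture over all past posteriors and fixed parameters; not the paper's exact update schemes):

```python
# Exponential-weights update followed by mixing the new posterior with a
# distribution over all past posteriors, so experts used earlier can be
# recovered cheaply when the best expert shifts back to them.
import numpy as np

def mix_past_posteriors(losses, eta=1.0, alpha=0.05):
    """losses: (T, n) array of per-trial expert losses."""
    T, n = losses.shape
    w = np.full(n, 1.0 / n)
    past = [w.copy()]                      # all posteriors seen so far
    for t in range(T):
        v = w * np.exp(-eta * losses[t])   # multiplicative (loss) update
        v /= v.sum()
        # mix: mostly keep v, but put mass alpha on the average past posterior
        w = (1 - alpha) * v + alpha * np.mean(past, axis=0)
        past.append(v.copy())
    return w

rng = np.random.default_rng(4)
print(mix_past_posteriors(rng.uniform(size=(100, 5))))
```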

Detection performance was measured with sinusoidal and pulse-train gratings. Although the 2.09-c/deg pulse-train (or line) gratings contained at least 8 harmonics, all at equal contrast, they were no more detectable than their most detectable component. The addition of broadband pink noise designed to equalize the detectability of the components of the pulse train made the pulse train about a factor of four more detectable than any of its components. However, in contrast-discrimination experiments with a pedestal or masking grating of the same form and phase as the signal, at 15% contrast, the noise did not affect the discrimination performance of the pulse train relative to that obtained with its sinusoidal components. We discuss the implications of these observations for models of early vision, in particular for possible sources of internal noise.

The problem of automatically tuning multiple parameters for pattern recognition support vector machines (SVMs) is considered. This is done by minimizing some estimates of the generalization error of SVMs using a gradient descent algorithm over the set of parameters. The usual methods for choosing parameters, based on exhaustive search, become intractable as soon as the number of parameters exceeds two. Some experimental results assess the feasibility of our approach for a large number of parameters (more than 100) and demonstrate an improvement of generalization performance.
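
As a hedged sketch of the overall loop, the snippet below runs gradient descent on a held-out error estimate with respect to a log-scale kernel parameter; the paper instead minimizes differentiable estimates of the generalization error, so the error proxy and finite-difference gradient here are stand-ins for the analytic gradients it uses.

```python
# Gradient descent over an SVM hyperparameter: descend on a held-out
# misclassification-error proxy w.r.t. the log of the RBF kernel width.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

def err(log_gamma):
    clf = SVC(kernel="rbf", gamma=np.exp(log_gamma), C=1.0).fit(X_tr, y_tr)
    return 1.0 - clf.score(X_va, y_va)

theta, lr, h = np.log(0.1), 0.5, 0.1
for _ in range(20):                       # plain gradient descent
    grad = (err(theta + h) - err(theta - h)) / (2 * h)   # finite-difference gradient
    theta -= lr * grad
print("tuned gamma:", np.exp(theta))
```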

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.