2005

We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent.
We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information.
Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal
kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
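A minimal numerical sketch of a dependence score in the spirit of the constrained covariance: the largest eigenvalue of the product of centred Gram matrices, computed with Gaussian (universal) kernels. The normalisation and bandwidth choices here are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    # Gaussian kernel Gram matrix for samples in the rows of x.
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-d2 / (2.0 * sigma ** 2))

def constrained_covariance(x, y, sigma=1.0):
    # Dependence score: sqrt of the top eigenvalue of the product of
    # centred Gram matrices; near zero when x and y are independent.
    n = len(x)
    H = np.eye(n) - np.ones((n, n)) / n            # centring matrix
    Kc = H @ rbf_gram(x, sigma) @ H
    Lc = H @ rbf_gram(y, sigma) @ H
    lam = np.max(np.real(np.linalg.eigvals(Kc @ Lc)))
    return np.sqrt(max(lam, 0.0)) / n

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
print(constrained_covariance(x, x ** 2))                     # dependent pair
print(constrained_covariance(x, rng.normal(size=(200, 1))))  # ~independent pair
```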

We provide a new unifying view, including all existing proper probabilistic
sparse approximations for Gaussian process regression. Our approach relies on
expressing the effective prior which the methods are using. This
allows new insights to be gained, and highlights the relationship between
existing methods. It also allows for a clear theoretically justified ranking
of the closeness of the known approximations to the corresponding full GPs.
Finally, we point directly to designs of new, better sparse approximations,
combining the best of the existing strategies, within attractive
computational constraints.
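As a rough illustration of the effective-prior view, the sketch below computes the low-rank prior covariance Q = K_fu K_uu^{-1} K_uf induced by a set of inducing inputs, the quantity that subset-of-regressors/DTC-style approximations substitute for the full covariance K (a FITC-style variant restores the diagonal). The kernel, inducing-point placement and jitter are assumptions of this sketch.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    d2 = np.sum(a ** 2, 1)[:, None] + np.sum(b ** 2, 1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * d2 / ell ** 2)

def effective_prior_cov(X, Xu, ell=1.0, jitter=1e-8):
    # Q = K_fu K_uu^{-1} K_uf: the rank-m prior covariance implied by
    # conditioning on m inducing inputs Xu (subset-of-regressors form).
    Kfu = rbf(X, Xu, ell)
    Kuu = rbf(Xu, Xu, ell) + jitter * np.eye(len(Xu))
    return Kfu @ np.linalg.solve(Kuu, Kfu.T)

X = np.linspace(-3, 3, 100)[:, None]
Xu = np.linspace(-3, 3, 10)[:, None]          # m = 10 inducing inputs
Q = effective_prior_cov(X, Xu)
K = rbf(X, X)
Q_fitc = Q + np.diag(np.diag(K - Q))          # FITC-style diagonal correction
print(np.linalg.norm(K - Q), np.linalg.norm(K - Q_fitc))
```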

Journal of Computer and System Sciences, 71(3):333-359, October 2005 (article)

Abstract

In order to apply the maximum margin method in arbitrary metric
spaces, we suggest embedding the metric space into a Banach or
Hilbert space and performing linear classification in this space.
We propose several embeddings and recall that an isometric embedding
in a Banach space is always possible while an isometric embedding in
a Hilbert space is only possible for certain metric spaces. As a
result, we obtain a general maximum margin classification
algorithm for arbitrary metric spaces (whose solution is
approximated by an algorithm of Graepel et al.).
Interestingly enough, the embedding approach, when applied to a metric
which can be embedded into a Hilbert space, yields the SVM
algorithm, which emphasizes the fact that its solution depends on
the metric and not on the kernel. Furthermore we give upper bounds
of the capacity of the function classes corresponding to both
embeddings in terms of Rademacher averages. Finally we compare the
capacities of these function classes directly.
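A hedged sketch of the embedding idea: represent every point by its vector of metric distances to the training points (a Kuratowski-style map, isometric for the sup-norm) and run an ordinary linear maximum margin classifier on these vectors. The Euclidean toy metric and the sklearn solver are assumptions of the illustration, not the paper's algorithm.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Any metric works here; rows of the pairwise distance matrix serve as
# coordinates of the empirical (Kuratowski-style) embedding.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

clf = LinearSVC(C=1.0, max_iter=20000).fit(D, y)  # linear max margin in embedding space
print(clf.score(D, y))
```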

Gaussian process priors can be used to define flexible, probabilistic classification models. Unfortunately exact Bayesian inference is analytically intractable and various approximation techniques have been proposed. In this work we review and compare Laplace's method and Expectation Propagation for approximate Bayesian inference in the binary Gaussian process classification model. We present a comprehensive comparison of the approximations, their predictive performance and marginal likelihood estimates to results obtained by MCMC sampling. We explain theoretically and corroborate empirically the advantages of Expectation Propagation compared to Laplace's method.
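For concreteness, here is a minimal Newton iteration for the Laplace approximation in binary GP classification with a logistic likelihood, following the standard stabilised recursion; the kernel choice and toy data are assumptions of the sketch (Expectation Propagation requires a separate site-update loop not shown here).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_mode(K, y, n_iter=20):
    # Newton iterations for the mode of p(f | y) with labels y in {-1, +1}.
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(f)
        t = (y + 1) / 2.0                    # labels mapped to {0, 1}
        W = pi * (1.0 - pi)                  # negative log-likelihood curvature
        sW = np.sqrt(W)
        B = np.eye(n) + sW[:, None] * K * sW[None, :]
        L = np.linalg.cholesky(B)
        b = W * f + (t - pi)
        v = np.linalg.solve(L, sW * (K @ b))
        a = b - sW * np.linalg.solve(L.T, v)
        f = K @ a                            # new latent mean estimate
    return f

X = np.linspace(-3, 3, 20)[:, None]
K = np.exp(-0.5 * (X - X.T) ** 2) + 1e-6 * np.eye(20)
y = np.sign(X[:, 0])
print(laplace_mode(K, y)[:5])
```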

Several large scale data mining applications, such as text categorization and gene expression analysis, involve high-dimensional data that is also inherently directional in nature. Often such data is L2 normalized so that it lies on the surface of a unit hypersphere. Popular models such as (mixtures of) multi-variate Gaussians are inadequate for characterizing such data. This paper proposes a generative mixture-model approach to clustering directional data based on the von Mises-Fisher (vMF) distribution, which arises naturally for data distributed on the unit hypersphere. In particular, we derive and analyze two variants of the Expectation Maximization (EM) framework for estimating the mean and concentration parameters of this mixture. Numerical estimation of the concentration parameters is non-trivial in high dimensions since it involves functional inversion of ratios of Bessel functions. We also formulate two clustering algorithms corresponding to the variants of EM that we derive. Our approach provides a
theoretical basis for the use of cosine similarity that has been widely employed by the information retrieval community, and obtains the spherical kmeans algorithm (kmeans with cosine similarity) as a special case of both variants. Empirical results on clustering of high-dimensional text and gene-expression data based on a mixture of vMF distributions show that the ability to estimate the concentration parameter for each vMF component, which is not present in existing approaches, yields superior results, especially for difficult clustering tasks in high-dimensional spaces.
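The numerically delicate step is estimating the concentration kappa without inverting Bessel-function ratios; a widely used closed-form approximation of the kind analyzed in the paper is sketched below for a single vMF component. The toy data generation is an assumption of the example.

```python
import numpy as np

def estimate_kappa(X):
    # X: (n, d) unit vectors drawn from one vMF component.
    n, d = X.shape
    r_bar = np.linalg.norm(X.sum(axis=0)) / n       # mean resultant length
    # closed-form approximation that avoids Bessel-ratio inversion
    return (r_bar * d - r_bar ** 3) / (1.0 - r_bar ** 2)

rng = np.random.default_rng(0)
X = rng.normal(loc=[3.0, 0.0, 0.0], size=(500, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)       # project onto the sphere
print(estimate_kappa(X))
```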

We propose statistical learning methods for approximating implicit surfaces and computing dense 3D deformation fields. Our approach is based on Support Vector (SV) Machines, which are state of the art in machine learning. It is straightforward to implement and computationally competitive; its parameters can be automatically set using standard machine learning methods.
The surface approximation is based on a modified Support Vector regression. We present applications to 3D head reconstruction, including automatic removal of outliers and hole filling.
In a second step, we build on our SV representation to compute dense 3D deformation fields between two objects.
The fields are computed using a generalized SV Machine enforcing correspondence between the previously learned implicit SV object representations, as well as correspondences between feature points if such points are available.
We apply the method to the morphing of 3D heads and other objects.

Entropy index monitoring, based on spectral entropy of the electroencephalogram,
is a promising new method to measure the depth of anaesthesia. We examined the
association between spectral entropy and regional cerebral blood flow in healthy
subjects anaesthetised with 2%, 3% and 4% end-expiratory concentrations of
sevoflurane and 7.6, 12.5 and 19.0 µg/ml plasma drug concentrations of
propofol. Spectral entropy from the frequency band 0.8-32 Hz was calculated and
cerebral blood flow assessed using positron emission tomography and
[15O]-labelled water at baseline and at each anaesthesia level. Both drugs
induced significant reductions in spectral entropy and cortical and global
cerebral blood flow. Midfrontal-central spectral entropy was associated with
individual frontal and whole brain blood flow values across all conditions,
suggesting that this novel measure of anaesthetic depth can depict global
changes in neuronal activity induced by the drugs. The cortical areas of the
most significant associations were remarkably similar for both drugs.
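For readers unfamiliar with the measure, a minimal sketch of band-limited spectral entropy follows; the windowing, PSD estimator and normalisation are common-practice assumptions, not the exact monitor implementation.

```python
import numpy as np
from scipy.signal import welch

def spectral_entropy(eeg, fs, band=(0.8, 32.0)):
    # Shannon entropy of the normalised power spectrum within `band`,
    # rescaled to [0, 1] by the entropy of a flat spectrum.
    f, psd = welch(eeg, fs=fs, nperseg=min(len(eeg), 1024))
    sel = (f >= band[0]) & (f <= band[1])
    p = psd[sel] / psd[sel].sum()
    p = np.clip(p, 1e-12, None)             # guard against log(0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

fs = 128
t = np.arange(0, 10, 1.0 / fs)
print(spectral_entropy(np.sin(2 * np.pi * 10 * t), fs))  # narrowband: low entropy
rng = np.random.default_rng(0)
print(spectral_entropy(rng.normal(size=t.size), fs))     # broadband: high entropy
```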

Support vector machines (SVM) have been successfully used to classify proteins into functional categories.
Recently, to integrate multiple data sources, a semidefinite programming (SDP) based SVM method was introduced by Lanckriet et al. (2004). In SDP/SVM, multiple kernel matrices corresponding to each of the data sources are combined with
weights obtained by solving an SDP. However, when trying to apply SDP/SVM to large problems, the computational cost can become prohibitive, since both converting the data to a kernel matrix for the SVM and solving the SDP are time and memory demanding. Another application-specific drawback arises when some of the data sources are protein networks. A common method of converting the network to a kernel matrix is the diffusion kernel method, which has
time complexity of O(n^3), and produces a dense matrix of size n x n. We propose an efficient method of protein classification using multiple protein networks. Available protein networks, such as a physical interaction network or a
metabolic network, can be directly incorporated. Vectorial data can also be incorporated after conversion into a network by means of neighbor point connection. Similarly to the SDP/SVM method, the combination weights are obtained by convex optimization. Due to the sparsity of network edges, the computation time is nearly linear in the number of edges
of the combined network. Additionally, the combination weights provide information useful for discarding noisy or irrelevant networks. Experiments on function prediction of 3588 yeast proteins show promising results: the computation time is enormously reduced, while the accuracy is still comparable to the SDP/SVM method.

Motivation: We tackle the problem of finding regularities in microarray data. Various data mining tools, such as clustering, classification, Bayesian networks and association rules, have been applied so far to gain insight into gene-expression data. Association rule mining techniques used so far work on discretizations of the data and cannot account for cumulative effects. In this paper, we investigate the use of quantitative association rules that can operate directly on numeric data and represent cumulative effects of variables. Technically speaking, this type of quantitative association rules based on half-spaces can find non-axis-parallel regularities.
Results: We performed a variety of experiments testing the utility of quantitative association rules for microarray data. First of all, the results should be statistically significant and robust against fluctuations in the data. Next, the approach should be scalable in the number of variables, which is important for such high-dimensional data. Finally, the rules should make sense biologically and be sufficiently different from rules found in regular association rule mining working with discretizations. In all of these dimensions, the proposed approach performed satisfactorily. Therefore, quantitative association rules based on half-spaces should be considered as a tool for the analysis of microarray gene-expression data.

In recent years, Kernel Principal Component Analysis (KPCA) has been suggested for various image processing tasks requiring an image model, such as denoising or compression. The original form of KPCA, however, can only be applied to strongly restricted image classes due to the limited number of training examples that can be processed. We therefore propose a new iterative method for performing KPCA, the Kernel Hebbian Algorithm, which iteratively estimates the Kernel Principal Components with only linear order memory complexity. In our experiments, we compute models for complex image classes such as faces and natural images which require a large number of training examples. The resulting image models are tested in single-frame super-resolution and denoising applications. The KPCA model is not specifically tailored to these tasks; in fact, the same model can be used in super-resolution with variable input resolution, or denoising with unknown noise characteristics. In spite of this, both super-resolution and denoising performance are comparable to existing methods.
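A compact sketch of the Kernel Hebbian Algorithm idea: generalised Hebbian updates performed on kernel expansion coefficients, so only an r x n coefficient matrix is stored. The learning rate, initialisation, and the assumption that the Gram matrix is precomputed and centred are simplifications for illustration (the full algorithm works sample-by-sample without storing K).

```python
import numpy as np

def kha(K, r=3, eta=0.05, epochs=100, seed=0):
    # K: centred Gram matrix (n, n). Returns A (r, n): expansion
    # coefficients whose rows estimate the leading kernel PCs.
    n = K.shape[0]
    A = np.random.default_rng(seed).normal(scale=1e-2, size=(r, n))
    for _ in range(epochs):
        for t in range(n):
            y = A @ K[:, t]                     # component outputs for pattern t
            upd = -np.tril(np.outer(y, y)) @ A  # Gram-Schmidt-like decay term
            upd[:, t] += y                      # Hebbian term
            A += eta * upd
    return A

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, -1))
H = np.eye(50) - np.ones((50, 50)) / 50
A = kha(H @ K @ H, r=3)                         # rows span the leading KPCs
```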

Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.

In order to understand the cellular disease mechanisms of osteoarthritic cartilage degeneration it is of primary importance to understand both the anabolic and the catabolic processes going on in parallel in the diseased tissue. In this study, we have applied cDNA-array technology (Clontech) to study gene expression patterns of primary human normal adult articular chondrocytes isolated from one donor cultured under anabolic (serum) and catabolic (IL-1beta) conditions. Significant differences between the different in vitro cultures tested were detected. Overall, serum and IL-1beta significantly altered gene expression levels of 102 and 79 genes, respectively. IL-1beta stimulated the matrix metalloproteinases-1, -3, and -13 as well as members of its intracellular signaling cascade, whereas serum increased the expression of many cartilage matrix genes. Comparative gene expression analysis with previously published in vivo data (normal and osteoarthritic cartilage) showed significant differences of all in vitro stimulations compared to the changes detected in osteoarthritic cartilage in vivo. This investigation allowed us to characterize gene expression profiles of two classical anabolic and catabolic stimuli of human adult articular chondrocytes in vitro. No in vitro model appeared to be adequate to study overall gene expression alterations in osteoarthritic cartilage. Serum-stimulated in vitro cultures largely reflected only those results consistent with the anabolic activation seen in osteoarthritic chondrocytes. In contrast, IL-1beta did not appear to be a good model for mimicking catabolic gene alterations in degenerating chondrocytes.

Gene expression profiling of three chondrosarcoma derived cell lines (AD, SM, 105KC) showed an increased proliferative activity and a reduced expression of chondrocytic-typical matrix products compared to primary chondrocytes. The incapability to maintain an adequate matrix synthesis as well as a notable proliferative activity at the same time is comparable to neoplastic chondrosarcoma cells in vivo which cease largely cartilage matrix formation as soon as their proliferative activity increases. Thus, the investigated cell lines are of limited value as substitute of primary chondrocytes but might have a much higher potential to investigate the behavior of neoplastic chondrocytes, i.e. chondrosarcoma biology.

We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.

International Journal of Imaging Systems and Technology, 15(1):48-55, July 2005 (article)

Abstract

This paper proposes a new independent component analysis (ICA) method which is able to unmix overcomplete mixtures of sparse or structured signals like speech, music or images. Furthermore, the method is designed to be robust against outliers, which is a favorable feature for ICA algorithms since most of them are extremely sensitive to outliers. Our approach is based on a simple outlier index. However, instead of robustifying an existing algorithm by some outlier rejection technique we show how this index can be used directly to solve the ICA problem for super-Gaussian sources. The resulting inlier-based ICA (IBICA) is outlier-robust by construction and can be used for standard ICA as well as for overcomplete ICA (i.e. more source signals than observed signals).

This paper addresses the problem of choosing a kernel suitable for
estimation with a Support Vector
Machine, hence further automating machine learning.
This goal is achieved by defining a Reproducing Kernel Hilbert
Space on the space of kernels itself. Such a formulation leads to a
statistical estimation problem similar to the problem of minimizing
a regularized risk functional.
We state the equivalent
representer theorem for the choice of kernels and present a
semidefinite programming formulation of the resulting optimization
problem. Several recipes for constructing hyperkernels are provided, as
well as the details of common machine learning problems. Experimental
results for classification, regression and novelty
detection on UCI data show the feasibility of our approach.

One way of denoising an image is to project it onto the subspace of admissible images derived, for instance, by PCA. However, a major drawback of this method is that all pixels are updated by the projection, even when only a few pixels are corrupted by noise or occlusion. We propose a new method to identify the noisy pixels by l1-norm penalization and to update the identified pixels only. The identification and updating of noisy pixels are formulated as one linear program which can be efficiently solved. In particular, one can apply the ν-trick to directly specify the fraction of pixels to be reconstructed. Moreover, we extend the linear program to be able to exploit prior knowledge that occlusions often appear in contiguous blocks (e.g., sunglasses on faces). The basic idea is to penalize boundary points and interior points of the occluded area differently. We are also able to show the ν property for this extended LP, leading to a method which is easy to use. Experimental results demonstrate the power of our approach.
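A minimal sketch of the l1 identification step as a linear program (solved here with scipy's HiGHS backend): fit the image to a PCA subspace under an l1 residual, flag pixels with large residuals, and update only those. The fixed residual threshold stands in for the ν-trick, and the basis and threshold values are assumptions of the example.

```python
import numpy as np
from scipy.optimize import linprog

def l1_project(y, V, tau=0.5):
    # Minimise ||y - V c||_1 via the LP: min 1'(u+ + u-)
    # s.t. V c + u+ - u- = y, with u+, u- >= 0 and c free.
    d, r = V.shape
    cost = np.concatenate([np.zeros(r), np.ones(2 * d)])
    A_eq = np.hstack([V, np.eye(d), -np.eye(d)])
    bounds = [(None, None)] * r + [(0, None)] * (2 * d)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    c = res.x[:r]
    noisy = np.abs(y - V @ c) > tau          # flagged (noisy/occluded) pixels
    out = y.copy()
    out[noisy] = (V @ c)[noisy]              # update only the flagged pixels
    return out, noisy

rng = np.random.default_rng(0)
V = np.linalg.qr(rng.normal(size=(64, 5)))[0]   # orthonormal 5-dim "PCA" basis
y = V @ rng.normal(size=5)
y[:6] += 3.0                                    # corrupt a few pixels
rec, noisy = l1_project(y, V)
print(np.flatnonzero(noisy))                    # ideally the corrupted indices
```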

We address the problem of learning a symmetric positive definite matrix. The central issue is to design
parameter updates that preserve positive definiteness. Our updates are motivated by the von
Neumann divergence. Rather than treating the most general case, we focus on two key applications
that exemplify our methods: on-line learning with a simple square loss, and finding a symmetric
positive definite matrix subject to linear constraints. The updates generalize the exponentiated gradient
(EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite
matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite
matrix with trace one). The generalized updates use matrix logarithms and exponentials to
preserve positive definiteness. Most importantly, we show how the derivation and the analyses of
the original EG update and AdaBoost generalize to the non-diagonal case. We apply the resulting
matrix exponentiated gradient (MEG) update and DefiniteBoost to the problem of learning a kernel
matrix from distance measurements.
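A small sketch of the matrix exponentiated gradient step under the stated trace-one constraint: the update happens in the matrix-log domain, which keeps the parameter symmetric positive definite. The square loss on a single quadratic-form measurement and all step sizes are assumptions of the toy example.

```python
import numpy as np
from scipy.linalg import expm, logm

def meg_step(W, grad, eta=0.1):
    # One MEG step: move in the matrix-log domain, re-normalise to
    # trace one; W stays symmetric positive definite throughout.
    S = 0.5 * (grad + grad.T)                 # symmetrise the gradient
    M = expm(logm(W) - eta * S)
    return M / np.trace(M)

# toy usage: drive W toward matching a target quadratic-form value
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W = np.eye(3) / 3.0                           # trace-one initial parameter
target = 0.2
for _ in range(100):
    grad = 2 * (x @ W @ x - target) * np.outer(x, x)  # d/dW of the square loss
    W = meg_step(W, grad, eta=0.05)
print(x @ W @ x, np.trace(W))
```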

Motivation: Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs.
Results: Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information into our graph model, our classifier yields significantly higher accuracy levels than the vector models. Hyperkernels allow us to select and to optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively.

Journal of the Optical Society of America A, 22(5):801-809, May 2005 (article)

Abstract

A number of models of depth cue combination suggest that the final depth percept results from a weighted average of independent depth estimates based on the different cues available. The weight of each cue in such an average is thought to depend on the reliability of each cue. In principle, such a depth estimation could be statistically optimal in the sense of producing the minimum variance unbiased estimator that can be constructed from the available information. Here we test such models using visual and haptic depth information. Different texture types produce differences in slant discrimination performance, providing a means for testing a reliability-sensitive cue combination model using texture as one of the cues to slant. Our results show that the weights for the cues were generally sensitive to their reliability, but fell short of statistically optimal combination: we find reliability-based re-weighting, but not statistically optimal cue combination.
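The statistically optimal model being tested is the standard minimum-variance combination: if each cue i yields an unbiased estimate \hat{s}_i with variance \sigma_i^2, the optimal weights and the resulting variance are

```latex
\hat{s} \;=\; \sum_i w_i\,\hat{s}_i,
\qquad
w_i \;=\; \frac{1/\sigma_i^2}{\sum_j 1/\sigma_j^2},
\qquad
\operatorname{Var}(\hat{s}) \;=\; \Bigl(\sum_i 1/\sigma_i^2\Bigr)^{-1}.
```

Reliability-based re-weighting means the measured weights track 1/\sigma_i^2, even when the combined variance does not reach this lower bound.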

In psychophysical studies, the psychometric function is used to model the relation between physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. In this study, we propose the use of Bayesian inference to extract the information contained in experimental data to estimate the parameters of psychometric functions. Because Bayesian inference cannot be performed analytically, we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition, we discuss the parameterization of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generated data and in a case study for real experimental data. Furthermore, we compare our approach with traditional methods based on maximum likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation and find the Bayesian approach to be superior.

Regulatory regions of plant genes tend to be more compact than those of animal genes, but the complement of transcription factors encoded in plant genomes is as large or larger than that found in those of animals. Plants therefore provide an opportunity to study how transcriptional programs control multicellular development. We analyzed global gene expression during development of the reference plant Arabidopsis thaliana in samples covering many stages, from embryogenesis to senescence, and diverse organs. Here, we provide a first analysis of this data set, which is part of the AtGenExpress expression atlas. We observed that the expression levels of transcription factor genes and signal transduction components are similar to those of metabolic genes. Examining the expression patterns of large gene families, we found that they are often more similar than would be expected by chance, indicating that many gene families have been co-opted for specific developmental processes.

We obtained tomograms of isolated mammalian excitatory synapses by cryo-electron tomography. This method allows the investigation of biological material in the frozen-hydrated state, without staining, and can therefore provide reliable structural information at the molecular level. We developed an automated procedure for the segmentation of molecular complexes present in the synaptic cleft based on thresholding and connectivity, and calculated several morphological characteristics of these complexes. Extensive lateral connections along the synaptic cleft are shown to form a highly connected structure with a complex topology. Our results are essentially parameter-free, i.e., they do not depend on the choice of certain parameter values (such as threshold). In addition, the results are not sensitive to noise; the same conclusions can be drawn from the analysis of both nondenoised and denoised tomograms.
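A toy version of the threshold-plus-connectivity segmentation is easy to state; the connectivity rule, threshold and morphological measures below are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy import ndimage

def segment_complexes(tomogram, threshold):
    # Voxels above the threshold are grouped into connected components;
    # per-component size and centroid are then measured.
    mask = tomogram > threshold
    labels, n = ndimage.label(mask)
    idx = range(1, n + 1)
    sizes = ndimage.sum(mask, labels, index=idx)
    centroids = ndimage.center_of_mass(mask, labels, index=idx)
    return labels, np.asarray(sizes), centroids

rng = np.random.default_rng(0)
vol = ndimage.gaussian_filter(rng.normal(size=(40, 40, 40)), sigma=2)
labels, sizes, centroids = segment_complexes(vol, vol.mean() + vol.std())
print(len(sizes), int(sizes.max()))
```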

If the training pattern set is large, it takes a large memory and a long time to train a support vector machine (SVM). Recently, we proposed the neighborhood property based pattern selection algorithm (NPPS), which selects only the patterns that are likely to be near the decision boundary ahead of SVM training [Proc. of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Lecture Notes in Artificial Intelligence (LNAI 2637), Seoul, Korea, pp. 376-387]. NPPS tries to identify those patterns that are likely to become support vectors in feature space. Preliminary reports show its effectiveness: SVM training time was reduced by two orders of magnitude with almost no loss in accuracy for various datasets. It has to be noted, however, that the decision boundary of the SVM and the support vectors are all defined in feature space, while NPPS described above operates in input space. If the neighborhood relation in input space is not preserved in feature space, NPPS may not always be effective. In this paper, we show that the neighborhood relation is invariant under the input to feature space mapping. The result assures that the patterns selected by NPPS in input space are likely to be located near the decision boundary in feature space.
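The selection heuristic can be sketched as follows: keep only patterns whose k nearest neighbors carry mixed labels, i.e. those plausibly near the decision boundary. This is a simplification of NPPS (which expands through neighbors instead of scanning all patterns); k and the toy data are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_boundary_patterns(X, y, k=5):
    # Keep patterns whose k nearest neighbors have mixed labels:
    # these are the likely support vector candidates.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)
    keep = np.array([len(np.unique(y[nbrs[1:]])) > 1 for nbrs in idx])
    return np.where(keep)[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)
sel = select_boundary_patterns(X, y)
print(len(sel), "of", len(X), "patterns kept for SVM training")
```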

Increased availability of large repositories of chemical compounds is creating new
challenges and opportunities for the application of machine learning methods to
problems in computational chemistry and chemical informatics. Because chemical
compounds are often represented by the graph of their covalent bonds, machine
learning methods in this domain must be capable of processing graphical structures
with variable size. Here we first briefly review the literature on graph kernels and
then introduce three new kernels (Tanimoto, MinMax, Hybrid) based on the idea
of molecular fingerprints and counting labeled paths of depth up to d using depth-first
search from each possible vertex. The kernels are applied to three classification
problems to predict mutagenicity, toxicity, and anti-cancer activity on three publicly
available data sets. The kernels achieve performances at least comparable, and most
often superior, to those previously reported in the literature reaching accuracies of
91.5% on the Mutag dataset, 65-67% on the PTC (Predictive Toxicology Challenge)
dataset, and 72% on the NCI (National Cancer Institute) dataset. Properties and
tradeoffs of these kernels, as well as other proposed kernels that leverage 1D or 3D
representations of molecules, are briefly discussed.
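On fingerprint count vectors, the first two kernels have simple closed forms, sketched below; the toy counts stand in for the labeled-path fingerprints of depth up to d.

```python
import numpy as np

def tanimoto_kernel(x, y):
    # Tanimoto kernel: <x,y> / (<x,x> + <y,y> - <x,y>).
    xy = np.dot(x, y)
    return xy / (np.dot(x, x) + np.dot(y, y) - xy)

def minmax_kernel(x, y):
    # MinMax kernel on non-negative count vectors:
    # sum of element-wise minima over sum of element-wise maxima.
    return np.minimum(x, y).sum() / np.maximum(x, y).sum()

# toy count fingerprints, e.g. counts of labeled paths up to depth d
a = np.array([3, 0, 1, 2, 0])
b = np.array([2, 1, 1, 0, 0])
print(tanimoto_kernel(a, b), minmax_kernel(a, b))
```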

The linear mixture model has been investigated in most articles tackling the problem of blind source separation. Recently, several articles have addressed a more complex model: blind source separation (BSS) of post-nonlinear (PNL) mixtures. These mixtures are assumed to be generated by applying an unknown invertible nonlinear distortion to linear instantaneous mixtures of some independent sources. The Gaussianization technique for BSS of PNL mixtures emerged based on the assumption that the distribution of the linear mixture of independent sources is Gaussian. In this letter, we review the Gaussianization method and then extend it to PNL mixtures in which the linear mixture is close to Gaussian. Our proposed method approximates the linear mixture using the Cornish-Fisher expansion. We choose the mutual information as the independence measure to develop a learning algorithm to separate PNL mixtures. This method provides better applicability and accuracy. We then discuss the sufficient condition for the method to be valid. The characteristics of the nonlinearity do not affect the performance of this method. With only a few parameters to tune, our algorithm has a comparatively low computational cost. Finally, we present experiments to illustrate the efficiency of our method.

The last few years have witnessed important new developments in the theory and practice
of pattern classification. We intend to survey some of the main new ideas that have led to these
important recent developments.

A general method for obtaining moment inequalities for functions
of independent random variables is presented. It is a
generalization of the entropy method which has been used to
derive concentration inequalities for such functions \cite{BoLuMa01},
and is based on a generalized tensorization inequality due to
Latała and Oleszkiewicz \cite{LaOl00}.
The new inequalities prove to be a versatile tool in a
wide range of applications.
We illustrate the power of the method by showing how
it can be used to effortlessly re-derive classical
inequalities including
Rosenthal and Kahane-Khinchine-type inequalities for sums
of independent random variables, moment inequalities for suprema
of empirical processes, and moment inequalities for Rademacher chaos
and $U$-statistics. Some of these corollaries are apparently new.
In particular, we generalize Talagrand's exponential inequality
for Rademacher chaos of order two to any order.
We also discuss applications for other complex functions
of independent random variables, such as suprema of boolean polynomials
which include, as special cases, subgraph counting problems in
random graphs.

As the number of complete genomes rapidly increases, accurate methods to automatically predict the subcellular location of proteins are increasingly useful to help their functional annotation. In order to improve the predictive accuracy of the many prediction methods developed to date, a novel representation of protein sequences is proposed. This representation involves local compositions of amino acids and twin amino acids, and local frequencies of distance between successive (basic, hydrophobic, and other) amino acids. For calculating the local features, each sequence is split into three parts: N-terminal, middle, and C-terminal. The N-terminal part is further divided into four regions to consider ambiguity in the length and position of signal sequences. We tested this representation with support vector machines on two data sets extracted from the SWISS-PROT database. Through fivefold cross-validation tests, overall accuracies of more than 87% and 91% were obtained for eukaryotic and prokaryotic proteins, respectively. It is concluded that considering the respective features in the N-terminal, middle, and C-terminal parts is helpful to predict the subcellular location.
Keywords: subcellular location; signal sequence; amino acid composition; distance frequency; support vector machine; predictive accuracy
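A stripped-down version of the feature construction: per-region amino-acid composition for the N-terminal, middle and C-terminal parts. The split fractions are illustrative, and the paper's further N-terminal subdivision, twin-amino-acid compositions and distance frequencies are omitted.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"

def region_compositions(seq, n_frac=0.25, c_frac=0.25):
    # Split into N-terminal / middle / C-terminal parts and compute a
    # 20-dim amino-acid composition per part (60-dim feature vector).
    n = len(seq)
    parts = [seq[: int(n * n_frac)],
             seq[int(n * n_frac): int(n * (1 - c_frac))],
             seq[int(n * (1 - c_frac)):]]
    feats = []
    for p in parts:
        counts = np.array([p.count(a) for a in AA], dtype=float)
        feats.append(counts / max(len(p), 1))
    return np.concatenate(feats)

print(region_compositions("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").shape)
```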

Most EEG-based Brain Computer Interface (BCI)
paradigms come along with specific electrode positions, e.g.~for a
visual based BCI electrode positions close to the primary visual
cortex are used. For new BCI paradigms
it is usually not known where task relevant activity can be
measured from the scalp. For individual subjects, Lal et al. showed that recording positions can
be found without the use of prior knowledge about the paradigm used. However, it remains unclear to what extent their
method of Recursive Channel Elimination (RCE)
can be generalized across subjects.
In this paper we transfer channel rankings from a group of subjects
to a new subject.
For motor imagery tasks the results are promising, although cross-subject channel
selection does not quite achieve the performance of channel selection on data of single subjects.
Although the RCE method was not provided with prior knowledge about the
mental task, channels that are
well known to be important (from a physiological point of view)
were consistently selected whereas task-irrelevant channels
were reliably disregarded.

We show via an equivalence of mathematical programs that a support vector (SV) algorithm can be translated into an equivalent boosting-like algorithm and vice versa. We exemplify this translation procedure for a new algorithm, one-class leveraging, starting from the one-class support vector machine (1-SVM). This is a first step toward unsupervised learning in a boosting framework. Building on so-called barrier methods known from the theory of constrained optimization, it returns a function, written as a convex combination of base hypotheses, that characterizes whether a given test point is likely to have been generated from the distribution underlying the training data. Simulations on one-class classification problems demonstrate the usefulness of our approach.

Motivation: Large scale gene expression data are often analysed by clustering genes based on gene expression data alone, though a priori knowledge in the form of biological networks is available. The use of this additional information promises to improve exploratory analysis considerably.
Results: We propose constructing a distance function which combines information from expression data and biological networks. Based on this function, we compute a joint clustering of genes and vertices of the network. This general approach is elaborated for metabolic networks. We define a graph distance function on such networks and combine it with a correlation-based distance function for gene expression measurements. A hierarchical clustering and an associated statistical measure is computed to arrive at a reasonable number of clusters. Our method is validated using expression data of the yeast diauxic shift. The resulting clusters are easily interpretable in terms of the biochemical network and the gene expression data and suggest that our method is able to automatically identify processes that are relevant under the measured conditions.
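A hedged sketch of the combination idea: rescale a network distance (e.g. shortest-path lengths) and a correlation-based expression distance to a common range, mix them convexly, and feed the result to hierarchical clustering. The convex combination, the mixing weight and the random toy data are assumptions of this illustration, not necessarily the paper's exact construction.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def combined_distance(expr, D_graph, lam=0.5):
    # Correlation-based expression distance mixed with a precomputed
    # network distance; both rescaled to [0, 1] before combining.
    D_expr = 1.0 - np.corrcoef(expr)                 # genes in rows
    D_expr = (D_expr - D_expr.min()) / np.ptp(D_expr)
    D_graph = (D_graph - D_graph.min()) / np.ptp(D_graph)
    return lam * D_graph + (1.0 - lam) * D_expr

rng = np.random.default_rng(0)
expr = rng.normal(size=(30, 12))                     # 30 genes, 12 conditions
D_graph = rng.integers(0, 6, size=(30, 30)).astype(float)
D_graph = (D_graph + D_graph.T) / 2.0
np.fill_diagonal(D_graph, 0.0)

D = combined_distance(expr, D_graph)
Z = linkage(squareform(D, checks=False), method="average")
print(fcluster(Z, t=4, criterion="maxclust"))        # joint cluster labels
```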

The authors used a recognition memory paradigm to assess the influence of color information on visual memory for images of natural scenes. Subjects performed 5-10% better for colored than for black-and-white images independent of exposure duration. Experiment 2 indicated little influence of contrast once the images were suprathreshold, and Experiment 3 revealed that performance worsened when images were presented in color and tested in black and white, or vice versa, leading to the conclusion that the surface property color is part of the memory representation. Experiments 4 and 5 exclude the possibility that the superior recognition memory for colored images results solely from attentional factors or saliency. Finally, the recognition memory advantage disappears for falsely colored images of natural scenes: The improvement in recognition memory depends on the color congruence of presented images with learned knowledge about the color gamut found within natural scenes. The results can be accounted for within a multiple memory systems framework.

Practical experience has shown that in order to obtain the best possible performance, prior knowledge about invariances of a classification
problem at hand ought to be incorporated into the training procedure. We describe and review all known methods for doing so in support vector machines,
provide experimental results, and discuss their respective merits. One of the significant new results reported in this work is our recent achievement of the
lowest reported test error on the well-known MNIST digit recognition benchmark task, with SVM training times that are also significantly faster than
previous SVM methods.

Model selection is an important ingredient of many machine
learning algorithms, in particular when the sample size is
small, in order to strike the right trade-off between overfitting
and underfitting. Previous classical results for linear regression
are based on an asymptotic analysis. We present a new
penalization method for performing model selection for
regression that is appropriate even for small samples.
Our penalization is based on an accurate estimator of the
ratio of the expected training error and the expected
generalization error, in terms of the expected eigenvalues
of the input covariance matrix.

The detectability of contrast increments was measured as a function of the contrast of a masking or pedestal grating at a number of different spatial frequencies ranging from 2 to 16 cycles per degree of visual angle. The pedestal grating always had the same orientation, spatial frequency and phase as the signal. The shape of the contrast increment threshold versus pedestal contrast (TvC) functions depends on the performance level used to define the threshold, but when both axes are normalized by the contrast corresponding to 75% correct detection at each frequency, the TvC functions at a given performance level are identical. Confidence intervals on the slope of the rising part of the TvC functions are so wide that it is not possible with our data to reject Weber's Law.

We introduce new concentration inequalities for functions on product spaces.
They allow one to obtain a Bennett-type deviation bound for suprema of
empirical processes indexed by upper bounded functions.
The result is an improvement on Rio's version \cite{Rio01b} of Talagrand's
inequality \cite{Talagrand96} for equidistributed variables.

We describe in this article a new code for evolving
axisymmetric isolated systems in general relativity. Such systems are described by asymptotically flat space-times, which have the property that they admit a conformal extension. We are working directly in the extended conformal manifold and solve numerically Friedrich's conformal field equations, which state that Einstein's equations hold in the physical space-time. Because of the compactness of the conformal space-time the entire space-time can be calculated on a finite numerical grid. We describe in detail the numerical scheme, especially the treatment of the axisymmetry and the boundary.

We define notions of stability for learning algorithms
and show
how to use these notions to derive generalization error bounds
based on the empirical error and the leave-one-out error. The
methods we use can be applied in the regression framework as well
as in the classification one when the classifier is obtained by
thresholding a real-valued function. We study the stability
properties of large classes of learning algorithms such as
regularization based algorithms. In particular we focus on Hilbert
space regularization and Kullback-Leibler regularization. We
demonstrate how to apply the results to SVM for regression and
classification.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.