Although the use of clustering methods has rapidly become one of the standard computational approaches in the literature of microarray gene expression data, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture models provide a non-parametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper we present a case study of the application of non-parametric Bayesian clustering methods to the clustering of high-dimensional non-time series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a Dirichlet process mixture model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of this data.
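The pairwise similarity described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes that posterior draws of cluster labels are already available (the `samples` list below is made-up toy data), and it takes the dissimilarity to be one minus the co-clustering probability, which is only one natural choice.

```python
def coclustering_probability(samples):
    """Fraction of posterior draws in which each pair of genes shares a cluster.

    `samples` is a list of label vectors, one per draw (hypothetical input format).
    """
    n = len(samples[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    draws = len(samples)
    return [[c / draws for c in row] for row in counts]


def dissimilarity(prob):
    """Assumed dissimilarity: 1 - P(same cluster)."""
    return [[1.0 - p for p in row] for row in prob]


# Toy label draws for four genes (illustrative only).
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
    [0, 0, 1, 1],
]
P = coclustering_probability(samples)
D = dissimilarity(P)
```

The matrix `D` is symmetric with a zero diagonal, so it could be handed to any standard linkage routine for visualization as a dendrogram.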

A basic task of information processing is information transfer (flow).
Here we study a pair of Brownian particles each coupled to a thermal bath
at temperatures T1 and T2. The information flow in such a system is defined
via the time-shifted mutual information. The information flow nullifies at
equilibrium, and its efficiency is defined as the ratio of the flow to the total
entropy production in the system. For a stationary state the information flows
from higher to lower temperatures, and its efficiency is bounded from above by
max[T1, T2]/|T1 − T2|. This upper bound is imposed by the second law and
it quantifies the thermodynamic cost for information flow in the present class
of systems. It can be reached in the adiabatic situation, where the particles
have widely different characteristic times. The efficiency of heat flow, defined
as the heat flow over the total amount of dissipated heat, is limited from above
by the same factor. There is a complementarity between heat and information
flow: the set-up which is most efficient for the former is the least efficient for the
latter and vice versa. The above bound for the efficiency can be (transiently)
overcome in certain non-stationary situations, but the efficiency is still limited
from above. We study yet another measure of information processing (transfer
entropy) proposed in the literature. Though this measure does not require any
thermodynamic cost, the information flow and transfer entropy are shown to be
intimately related for stationary states.
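As a small numerical illustration of the stationary-state bound max[T1, T2]/|T1 − T2| quoted above, the following sketch evaluates it for a few temperature pairs. The function name and the choice to return infinity when T1 = T2 are assumptions made here for illustration; the abstract only states that the flow itself vanishes at equilibrium.

```python
def efficiency_bound(t1, t2):
    """Upper bound max(T1, T2) / |T1 - T2| for two positive bath temperatures."""
    if t1 <= 0 or t2 <= 0:
        raise ValueError("temperatures must be positive")
    if t1 == t2:
        # Assumption: report a diverging bound when the baths are at equilibrium.
        return float("inf")
    return max(t1, t2) / abs(t1 - t2)


# The bound grows as the two temperatures approach each other.
for pair in [(2.0, 1.0), (1.0, 3.0), (300.0, 299.0)]:
    print(pair, efficiency_bound(*pair))
```

For positive temperatures the bound is never below 1, since |T1 − T2| is always smaller than max[T1, T2].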

Kernel methods are among the most successful tools in machine learning and are used in challenging data analysis problems in many disciplines. Here we provide examples where kernel methods have proven to be powerful tools for analyzing behavioral data, especially for identifying features in categorization experiments. We also demonstrate that kernel methods relate to perceptrons and exemplar models of categorization. Hence, we argue that kernel methods have neural and psychological plausibility, and theoretical results concerning their behavior are therefore potentially relevant for human category learning. In particular, we believe kernel methods have the potential to provide explanations ranging from the implementational via the algorithmic to the computational level.

Creating autonomous robots that can learn to act in unpredictable environments has been a long-standing goal of robotics, artificial intelligence, and the cognitive sciences. In contrast, current commercially available industrial and service robots mostly execute fixed tasks and exhibit little adaptability. To bridge this gap, machine learning offers a myriad of methods, some of which have already been applied with great success to robotics problems. As a result, there is an increasing interest in machine learning and statistics within the robotics community. At the same time, there has been growing interest in the learning community in using robots as motivating applications for new algorithms and formalisms. Considerable evidence of this exists in the use of learning in high-profile competitions such as RoboCup and the Defense Advanced Research Projects Agency (DARPA) challenges, and the growing number of research programs funded by governments around the world.

Foundations and Trends in Computer Graphics and Vision, 4(3):193-285, September 2009 (article)

Abstract

Over the last few years, kernel methods have established themselves as powerful tools for computer vision researchers as well as for practitioners. In this tutorial, we give an introduction to kernel methods in computer vision from a geometric perspective, introducing not only the ubiquitous support vector machines, but also lesser-known techniques for regression, dimensionality reduction, outlier detection and clustering. Additionally, we give an outlook on very recent, non-classical techniques for the prediction of structured data, for the estimation of statistical dependency and for learning the kernel function itself. All methods are illustrated with examples of successful application from the recent computer vision research literature.

Recent approaches to independent component analysis (ICA) have used kernel independence measures to obtain highly accurate solutions, particularly where classical methods experience difficulty (for instance, sources with near-zero kurtosis). FastKICA (fast HSIC-based kernel ICA) is a new optimization method for one such kernel independence measure, the Hilbert-Schmidt Independence Criterion (HSIC). The high computational efficiency of this approach is achieved by combining geometric optimization techniques, specifically an approximate Newton-like method on the orthogonal group, with accurate estimates of the gradient and Hessian based on an incomplete Cholesky decomposition. In contrast to other efficient kernel-based ICA algorithms, FastKICA is applicable to any twice differentiable kernel function. Experimental results for problems with large numbers of sources and observations indicate that FastKICA provides more accurate solutions at a given cost than gradient descent on HSIC. Compared with other recently published ICA methods, FastKICA is competitive in terms of accuracy, relatively insensitive to local minima when initialized far from independence, and more robust towards outliers. An analysis of the local convergence properties of FastKICA is provided.
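To make the independence measure concrete, here is a rough sketch of a biased empirical HSIC estimate for two scalar samples. It is not FastKICA and uses none of its optimization machinery (no Newton-like method, no incomplete Cholesky decomposition); the Gaussian kernel, the bandwidth `sigma`, and the 1/n² normalization are assumptions chosen for simplicity.

```python
import math
import random


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def center(gram):
    """Return H K H with H = I - (1/n) 1 1^T, for a symmetric Gram matrix."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic_biased(xs, ys, sigma=1.0):
    """Biased estimate trace(K H L H) / n^2 (one common normalization)."""
    n = len(xs)
    k_centered = center(gaussian_gram(xs, sigma))
    l_gram = gaussian_gram(ys, sigma)
    return sum(k_centered[i][j] * l_gram[i][j]
               for i in range(n) for j in range(n)) / n ** 2


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(60)]
noise = [rng.gauss(0.0, 1.0) for _ in range(60)]
hsic_dependent = hsic_biased(xs, xs)       # y is a copy of x
hsic_independent = hsic_biased(xs, noise)  # y drawn independently of x
```

An ICA method built on this criterion would search for an unmixing rotation that drives the pairwise HSIC of the recovered sources towards zero.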

Many motor skills in humanoid robotics can be learned using parametrized motor primitives from demonstrations. However, most interesting motor learning problems require self-improvement often beyond the reach of current reinforcement learning methods due to the high dimensionality of the state-space. We develop an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it to motor learning problems and show that it can learn the complex Ball-in-a-Cup task using a real Barrett WAM robot arm.

The pedestal effect is the improvement in the detectability of a sinusoidal grating in the presence of another grating of the same orientation, spatial frequency, and phase—usually called the pedestal. Recent evidence has demonstrated that the pedestal effect is differently modified by spectrally flat and notch-filtered noise: The pedestal effect is reduced in flat noise but virtually disappears in the presence of notched noise (G. B. Henning & F. A. Wichmann, 2007). Here we consider a network consisting of units whose contrast response functions resemble those of the cortical cells believed to underlie human pattern vision and demonstrate that, when the outputs of multiple units are combined by simple weighted summation—a heuristic decision rule that resembles optimal information combination and produces a contrast-dependent weighting profile—the network produces contrast-discrimination data consistent with psychophysical observations: The pedestal effect is present without noise, reduced in broadband noise, but almost disappears in notched noise. These findings follow naturally from the normalization model of simple cells in primary visual cortex, followed by response-based pooling, and suggest that in processing even low-contrast sinusoidal gratings, the visual system may combine information across neurons tuned to different spatial frequencies and orientations.

Journal for General Philosophy of Science, 40(1):51-58, July 2009 (article)

Abstract

We compare Karl Poppers ideas concerning the falsifiability of a theory with similar notions from the part of statistical learning theory known as VC-theory. Poppers notion of the dimension of a theory is contrasted with the apparently very similar VC-dimension. Having located some divergences, we discuss how best to view Poppers work from the perspective of statistical learning theory, either as a precursor or as aiming to capture a different learning activity.

Three simple and explicit procedures for testing the independence of two multi-dimensional random
variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical
distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based
independence measure. Two kinds of tests are provided. Distribution-free strongly consistent tests are derived on the
basis of large deviation bounds on the test statistics: these tests make almost surely no Type I or Type II error after
a random sample size. Asymptotically alpha-level tests are obtained from the limiting distribution of the test statistics.
For the latter tests, the Type I error converges to a fixed non-zero value alpha, and the Type II error drops to zero, for
increasing sample size. All tests reject the null hypothesis of independence if the test statistics become large. The
performance of the tests is evaluated experimentally on benchmark data.
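A minimal sketch of the partition-based L1 statistic may help: it sums, over all pairs of cells, the absolute difference between the empirical joint frequency and the product of the empirical marginal frequencies. The cell-assignment functions `cell_x` and `cell_y` are hypothetical placeholders for whatever finite partitions are chosen, and no rejection threshold is given here because the abstract does not state one.

```python
from collections import Counter


def l1_statistic(xs, ys, cell_x, cell_y):
    """Sum over cells |joint(A x B) - marginal_x(A) * marginal_y(B)|."""
    n = len(xs)
    a = [cell_x(x) for x in xs]
    b = [cell_y(y) for y in ys]
    joint = Counter(zip(a, b))
    marg_x = Counter(a)
    marg_y = Counter(b)
    return sum(
        abs(joint[(u, v)] / n - (marg_x[u] / n) * (marg_y[v] / n))
        for u in marg_x
        for v in marg_y
    )


def identity(value):
    """Trivial partition: each distinct value is its own cell."""
    return value


# Perfectly dependent versus perfectly balanced toy samples.
stat_dependent = l1_statistic([0, 0, 1, 1], [0, 0, 1, 1], identity, identity)
stat_balanced = l1_statistic([0, 0, 1, 1], [0, 1, 0, 1], identity, identity)
```

Large values of the statistic speak against independence, which matches the rejection rule stated in the abstract.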

We present a geometric method to determine confidence sets for the
ratio E(Y)/E(X) of the means of random variables X and Y. This
method reduces the problem of constructing confidence sets for the
ratio of two random variables to the problem of constructing
confidence sets for the means of one-dimensional random variables. It
is valid in a large variety of circumstances. In the case of normally
distributed random variables, the so constructed confidence sets
coincide with the standard Fieller confidence sets. Generalizations of
our construction lead to definitions of exact and conservative
confidence sets for very general classes of distributions, provided
the joint expectation of (X,Y) exists and the linear combinations of
the form aX + bY are well-behaved. Finally, our geometric method
allows us to derive a very simple bootstrap approach for constructing
conservative confidence sets for ratios which perform favorably in
certain situations, in particular in the asymmetric heavy-tailed
regime.

Kernel Canonical Correlation Analysis is a very general technique for subspace learning that incorporates
PCA and LDA as special cases. Functional magnetic resonance imaging (fMRI) acquired data is naturally
amenable to these techniques as data are well aligned. fMRI data of the human brain is a particularly interesting
candidate. In this study we implemented various supervised and semi-supervised versions of KCCA on human
fMRI data, with regression to single- and multi-variate labels (corresponding to video content subjects viewed
during the image acquisition). In each variate condition, the semi-supervised variants of KCCA performed better
than the supervised variants, including a supervised variant with Laplacian regularization. We additionally analyze
the weights learned by the regression in order to infer brain regions that are important to different types of visual
processing.

The human visual system is foveated, that is, outside the central visual field resolution and acuity drop rapidly. Nonetheless much of a visual scene is perceived after only a few saccadic eye movements, suggesting an effective strategy for selecting saccade targets. It has been known for some time that local image structure at saccade targets influences the selection process. However, the question of what the most relevant visual features are is still under debate. Here we show that center-surround patterns emerge as the optimal solution for predicting saccade targets from their local image structure. The resulting model, a one-layer feed-forward network, is surprisingly simple compared to previously suggested models which assume much more complex computations such as multi-scale processing and multiple feature channels. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

The ambiguous restraint for iterative assignment (ARIA) approach for NMR structure calculation is evaluated for symmetric homodimeric proteins by assessing the effect of several data analysis and assignment methods on the structure quality. In particular, we study the effects of network anchoring and spin-diffusion correction. The spin-diffusion correction improves the protein structure quality systematically, whereas network anchoring enhances the assignment efficiency by speeding up the convergence and coping with highly ambiguous data. For some homodimeric folds, network anchoring has been proved essential for unraveling both chain and proton assignment ambiguities.

Spatial filtering (SF) constitutes an integral part of building EEG-based brain-computer interfaces (BCIs). Algorithms frequently used for SF, such as common spatial patterns (CSPs) and independent component analysis, require labeled training data for identifying filters that provide information on a subject's intention, which renders these algorithms susceptible to overfitting on artifactual EEG components. In this study, beamforming is employed to construct spatial filters that extract EEG sources originating within predefined regions of interest within the brain. In this way, neurophysiological knowledge on which brain regions are relevant for a certain experimental paradigm can be utilized to construct unsupervised spatial filters that are robust against artifactual EEG components. Beamforming is experimentally compared with CSP and Laplacian spatial filtering (LP) in a two-class motor-imagery paradigm. It is demonstrated that beamforming outperforms CSP and LP on noisy datasets, while CSP and beamforming perform almost equally well on datasets with few artifactual trials. It is concluded that beamforming constitutes an alternative method for SF that might be particularly useful for BCIs used in clinical settings, i.e., in an environment where artifact-free datasets are difficult to obtain.

In this brief, a novel method that constructs a sparse kernel machine is proposed. The proposed method generates attractors as sparse solutions from a built-in kernel machine via a dynamical system framework. By readjusting the corresponding coefficients and bias terms, a sparse kernel machine that approximates a conventional kernel machine is constructed. The simulation results show that the constructed sparse kernel machine improves the efficiency of the testing phase while maintaining comparable test error.

We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest-neighbor or symmetric k-nearest-neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (of the order n rather than of the order log n) to maximize the probability of cluster identification. Secondly, the major difference between the mutual and the symmetric k-nearest-neighbor graph occurs when one attempts to detect the most significant cluster only.
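The two graph types compared above differ only in how directed nearest-neighbor relations are turned into undirected edges. The sketch below is a plain brute-force construction on toy data (function names and the Euclidean metric are choices made here, not taken from the paper): the mutual graph keeps an edge when both points list each other, the symmetric graph when at least one does.

```python
def knn_lists(points, k):
    """Set of the k nearest neighbours of every point (Euclidean, excluding itself)."""
    neighbours = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])),
        )
        neighbours.append(set(order[:k]))
    return neighbours


def knn_graph(points, k, kind="mutual"):
    """Undirected edge set of the mutual or symmetric k-nearest-neighbour graph."""
    nn = knn_lists(points, k)
    edges = set()
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            i_to_j, j_to_i = j in nn[i], i in nn[j]
            keep = (i_to_j and j_to_i) if kind == "mutual" else (i_to_j or j_to_i)
            if keep:
                edges.add((i, j))
    return edges


def connected_components(n, edges):
    """Connected components of an undirected graph on nodes 0..n-1."""
    adjacency = {i: set() for i in range(n)}
    for i, j in edges:
        adjacency[i].add(j)
        adjacency[j].add(i)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


points = [(0.0,), (1.0,), (3.0,), (10.0,)]
mutual = knn_graph(points, 1, kind="mutual")
symmetric = knn_graph(points, 1, kind="symmetric")
```

On this toy example with k = 1 the mutual graph splits into three components while the symmetric graph is connected, which shows how strongly the choice of graph type and k affects which clusters can be read off as components.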

We reveal the presence of refractory and overlap effects in the event-related potentials in visual P300 speller datasets, and we show their negative impact on the performance of the system. This finding has important implications for how to encode the letters that can be selected for communication. However, we show that such effects are dependent on stimulus parameters: an alternative stimulus type based on apparent motion suffers less from the refractory effects and leads to an improved letter prediction performance.

Clustering is often formulated as a discrete optimization problem. The objective is to
find, among all partitions of the data set, the best one according to some quality measure.
However, in the statistical setting where we assume that the finite data set has been sampled
from some underlying space, the goal is not to find the best partition of the given
sample, but to approximate the true partition of the underlying space. We argue that the
discrete optimization approach usually does not achieve this goal, and instead can lead to
inconsistency. We construct examples which provably have this behavior. As in the case
of supervised learning, the cure is to restrict the size of the function classes under consideration.
For appropriate small function classes we can prove very general consistency
theorems for clustering optimization schemes. As one particular algorithm for clustering
with a restricted function space we introduce nearest neighbor clustering. Similar to the
k-nearest neighbor classifier in supervised learning, this algorithm can be seen as a general
baseline algorithm to minimize arbitrary clustering objective functions. We prove that it
is statistically consistent for all commonly used clustering objective functions.

In bioinformatics, there exist multiple descriptions of graphs for the same set of genes or proteins. For instance, in yeast systems, graph edges can represent different relationships such as protein-protein interactions, genetic interactions, or co-participation in a protein complex. Relying on similarities between nodes, each graph can be used independently for prediction of protein function. However, since different graphs contain partly independent and partly complementary information about the problem at hand, one can enhance the total information extracted by combining all graphs. In this paper, we propose a method for integrating multiple graphs within a framework of semi-supervised learning. The method alternates between minimizing the objective function with respect to network output and with respect to combining weights. We apply the method to the task of protein functional class prediction in yeast. The proposed method performs significantly better than the same algorithm trained on any single graph.

Reinforcement learning (RL) and optimal control of systems with continuous
states and actions require approximation techniques in most interesting
cases. In this article, we introduce Gaussian process dynamic programming
(GPDP), an approximate value-function based RL algorithm. We consider
both a classic optimal control problem, where problem-specific prior knowledge
is available, and a classic RL problem, where only very general priors
can be used. For the classic optimal control problem, GPDP models the
unknown value functions with Gaussian processes and generalizes dynamic
programming to continuous-valued states and actions. For the RL problem,
GPDP starts from a given initial state and explores the state space using
Bayesian active learning. To design a fast learner, available data has to be
used efficiently. Hence, we propose to learn probabilistic models of the a
priori unknown transition dynamics and the value functions on the fly. In
both cases, we successfully apply the resulting continuous-valued controllers
to the under-actuated pendulum swing up and analyze the performances of
the suggested algorithms. It turns out that GPDP uses data very efficiently
and can be applied to problems where classic dynamic programming would
be cumbersome.

European Journal of Nuclear Medicine and Molecular Imaging, 36(Supplement 1):93-104, March 2009 (article)

Abstract

Introduction Positron emission tomography (PET) is a fully quantitative technology for imaging metabolic pathways and dynamic processes in vivo. Attenuation correction of raw PET data is a prerequisite for quantification and is typically based on separate transmission measurements. In PET/CT, however, attenuation correction is performed routinely based on the available CT transmission data.
Objective Recently, combined PET/magnetic resonance (MR) has been proposed as a viable alternative to PET/CT. Current concepts of PET/MRI do not include CT-like transmission sources and, therefore, alternative methods of PET attenuation correction must be found. This article reviews existing approaches to MR-based attenuation correction (MR-AC). Most groups have proposed MR-AC algorithms for brain PET studies and more recently also for torso PET/MR imaging. Most MR-AC strategies require the use of complementary MR and transmission images, or morphology templates generated from transmission images. We review and discuss these algorithms and point out challenges for using MR-AC in clinical routine.
Discussion MR-AC is work-in-progress with potentially promising results from a template-based approach applicable to both brain and torso imaging. While efforts are ongoing in making clinically viable MR-AC fully automatic, further studies are required to realize the potential benefits of MR-based motion compensation and partial volume correction of the PET data.

Spike trains recorded from populations of neurons can exhibit substantial pairwise correlations between neurons and rich temporal structure. Thus, for the realistic simulation and analysis of neural systems, it is essential to have efficient methods for generating artificial spike trains with specified correlation structure. Here we show how correlated binary spike trains can be simulated by means of a latent multivariate Gaussian model. Sampling from the model is computationally very efficient and, in particular, feasible even for large populations of neurons. The entropy of the model is close to the theoretical maximum for a wide range of parameters. In addition, this framework naturally extends to correlations over time and offers an elegant way to model correlated neural spike counts with arbitrary marginal distributions.
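The thresholding idea behind a latent Gaussian model can be sketched for two neurons as follows. This is a simplified illustration under stated assumptions: the latent correlation `latent_corr` is supplied directly, whereas matching a prescribed correlation between the binary variables would require solving for the latent value, a step that is omitted here. All function and parameter names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_binary_pair(rate1, rate2, latent_corr, n_bins, rng):
    """Threshold a unit-variance bivariate Gaussian to obtain two binary spike trains.

    rate1, rate2: desired spike probabilities per bin (strictly between 0 and 1).
    latent_corr: correlation of the latent Gaussian, not of the binary output.
    """
    # Thresholds chosen so that P(z > threshold) equals the desired spike probability.
    thr1 = NormalDist().inv_cdf(1.0 - rate1)
    thr2 = NormalDist().inv_cdf(1.0 - rate2)
    spikes = []
    for _ in range(n_bins):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        z1 = u
        z2 = latent_corr * u + math.sqrt(1.0 - latent_corr ** 2) * v
        spikes.append((int(z1 > thr1), int(z2 > thr2)))
    return spikes


rng = random.Random(1)
spikes = sample_binary_pair(0.2, 0.3, 0.8, 20000, rng)
mean1 = sum(s[0] for s in spikes) / len(spikes)
mean2 = sum(s[1] for s in spikes) / len(spikes)
covariance = sum(s[0] * s[1] for s in spikes) / len(spikes) - mean1 * mean2
```

Each bin costs two Gaussian draws and two comparisons, which gives a feel for why sampling stays cheap as the population grows.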

Background: Treatment of neurodegenerative diseases is likely to be most beneficial in the very early, possibly preclinical stages of degeneration. We explored the usefulness of fully automatic structural MRI classification methods for detecting subtle degenerative change. The availability of a definitive genetic test for Huntington disease (HD) provides an excellent metric for judging the performance of such methods in gene mutation carriers who are free of symptoms.
Methods: Using the gray matter segment of MRI scans, this study explored the usefulness of a multivariate support vector machine to automatically identify presymptomatic HD gene mutation carriers (PSCs) in the absence of any a priori information. A multicenter data set of 96 PSCs and 95 age- and sex-matched controls was studied. The PSC group was subclassified into three groups based on time from predicted clinical onset, an estimate that is a function of DNA mutation size and age.
Results: Subjects with at least a 33% chance of developing unequivocal signs of HD in 5 years were correctly assigned to the PSC group 69% of the time. Accuracy improved to 83% when regions affected by the disease were selected a priori for analysis. Performance was at chance when the probability of developing symptoms in 5 years was less than 10%.
Conclusions: Presymptomatic Huntington disease gene mutation carriers close to estimated diagnostic onset were successfully separated from controls on the basis of single anatomic scans, without additional a priori information. Prior information is required to allow separation when degenerative changes are either subtle or variable.

Motivation: Modern systems biology aims at understanding how the different molecular components of a biological cell interact. Often, cellular functions are performed by complexes consisting of many different proteins. The composition of these complexes may change according to the cellular environment, and one protein may be involved in several different processes. The automatic discovery of functional complexes from protein interaction data is challenging. While previous approaches use approximations to extract dense modules, our approach exactly solves the problem of dense module enumeration. Furthermore, constraints from additional information sources such as gene expression and phenotype data can be integrated, so we can systematically mine for dense modules with interesting profiles.
Results: Given a weighted protein interaction network, our method discovers all protein sets that satisfy a user-defined minimum density threshold. We employ a reverse search strategy, which allows us to exploit the density criterion in an efficient way. Our experiments show that the novel approach is feasible and produces biologically meaningful results. In comparative validation studies using yeast data, the method achieved the best overall prediction performance with respect to confirmed complexes. Moreover, by enhancing the yeast network with phenotypic and phylogenetic profiles and the human network with tissue-specific expression data, we identified condition-dependent complex variants.
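To make the density criterion tangible, the following sketch enumerates dense node sets in a tiny weighted network by brute force. It is deliberately not the reverse search strategy mentioned above, and it assumes that density means the average pairwise interaction weight within a node set, with missing edges counted as zero; the toy network and all names are hypothetical.

```python
from itertools import combinations


def density(nodes, weights):
    """Average edge weight over all node pairs (assumed definition of density)."""
    pairs = list(combinations(sorted(nodes), 2))
    if not pairs:
        return 0.0
    total = sum(weights.get(frozenset(pair), 0.0) for pair in pairs)
    return total / len(pairs)


def dense_modules_bruteforce(all_nodes, weights, threshold, min_size=2):
    """Exhaustively list node sets whose density reaches the threshold.

    Exponential in the number of nodes; only meant for toy examples.
    """
    found = []
    for size in range(min_size, len(all_nodes) + 1):
        for subset in combinations(sorted(all_nodes), size):
            if density(subset, weights) >= threshold:
                found.append(set(subset))
    return found


# Toy weighted interaction network (illustrative values only).
weights = {
    frozenset(("a", "b")): 1.0,
    frozenset(("b", "c")): 1.0,
    frozenset(("a", "c")): 0.5,
    frozenset(("c", "d")): 0.2,
}
modules = dense_modules_bruteforce({"a", "b", "c", "d"}, weights, threshold=0.8)
```

An exact method must return the same sets as this exhaustive search while avoiding most of the subsets, which is where an efficient use of the density criterion matters.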

We shed light on the discrimination between patterns belonging to two different classes by casting this decoding problem into a generalized prototype framework. The discrimination process is then separated into two stages: a projection stage that reduces the dimensionality of the data by projecting it on a line and a threshold stage where the distributions of the projected patterns of both classes are separated. For this, we extend the popular mean-of-class prototype classification using algorithms from machine learning that satisfy a set of invariance properties. We report a simple yet general approach to express different types of linear classification algorithms in an identical and easy-to-visualize formal framework using generalized prototypes where these prototypes are used to express the normal vector and offset of the hyperplane. We investigate nonmargin classifiers such as the classical prototype classifier, the Fisher classifier, and the relevance vector machine. We then study hard and soft margin classifiers such as the support vector machine and a boosted version of the prototype classifier. Subsequently, we relate mean-of-class prototype classification to other classification algorithms by showing that the prototype classifier is a limit of any soft margin classifier and that boosting a prototype classifier yields the support vector machine. While giving novel insights into classification per se by presenting a common and unified formalism, our generalized prototype framework also provides an efficient visualization and a principled comparison of machine learning classification.
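The classical mean-of-class prototype classifier, the starting point of the framework above, fits in a few lines. In this sketch the normal vector is the difference of the two class means and the threshold sits at their midpoint; the midpoint offset is an assumption made for illustration, and the generalized prototypes of the paper are not reproduced.

```python
def mean_vector(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fit_prototype_classifier(positives, negatives):
    """Return (w, b): w joins the class means, b puts the threshold at their midpoint."""
    proto_pos = mean_vector(positives)
    proto_neg = mean_vector(negatives)
    w = [a - b for a, b in zip(proto_pos, proto_neg)]
    midpoint = [(a + b) / 2.0 for a, b in zip(proto_pos, proto_neg)]
    return w, -dot(w, midpoint)


def predict(w, b, x):
    """Projection stage (dot product with w) followed by the threshold stage."""
    return 1 if dot(w, x) + b > 0 else -1


positives = [(2.0, 2.0), (3.0, 3.0)]
negatives = [(-1.0, 0.0), (0.0, -1.0)]
w, b = fit_prototype_classifier(positives, negatives)
```

The two stages named in the abstract are visible in `predict`: the dot product projects a pattern onto a line, and the comparison with zero applies the threshold.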

The DICS database is a dynamic web repository of computationally predicted functional modules from the human protein-protein interaction network. It provides references to the CORUM, DrugBank, KEGG and Reactome pathway databases. DICS can be accessed for retrieving sets of overlapping modules and protein complexes that are significantly enriched in a gene list, thereby providing valuable information about the functional context.

We present a highly accurate gene-prediction system for eukaryotic genomes, called mGene. It combines in an unprecedented manner the flexibility of generalized hidden Markov models (gHMMs) with the predictive power of modern machine learning methods, such as Support Vector Machines (SVMs). Its excellent performance was proved in an objective competition based on the genome of the nematode Caenorhabditis elegans. Considering the average of sensitivity and specificity, the developmental version of mGene exhibited the best prediction performance on nucleotide, exon, and transcript level for ab initio and multiple-genome gene-prediction tasks. The fully developed version shows superior performance in 10 out of 12 evaluation criteria compared with the other participating gene finders, including Fgenesh++ and Augustus. An in-depth analysis of mGene's genome-wide predictions revealed that approximately 2200 predicted genes were not contained in the current genome annotation. Testing a subset of 57 of these genes by RT-PCR and sequencing, we confirmed expression for 24 (42%) of them. mGene missed 300 annotated genes, out of which 205 were unconfirmed. RT-PCR testing of 24 of these genes resulted in a success rate of merely 8%. These findings suggest that even the gene catalog of a well-studied organism such as C. elegans can be substantially improved by mGene's predictions. We also provide gene predictions for the four nematodes C. briggsae, C. brenneri, C. japonica, and C. remanei. Comparing the resulting proteomes among these organisms and to the known protein universe, we identified many species-specific gene inventions. In a quality assessment of several available annotations for these genomes, we find that mGene's predictions are most accurate.

The proteasome forms the core of the protein quality control system in archaea and eukaryotes and also occurs in one bacterial lineage, the Actinobacteria. Access to its proteolytic compartment is controlled by AAA ATPases, whose N-terminal domains (N domains) are thought to mediate substrate recognition. The N domains of an archaeal proteasomal ATPase, Archaeoglobus fulgidus PAN, and of its actinobacterial homolog, Rhodococcus erythropolis ARC, form hexameric rings, whose subunits consist of an N-terminal coiled coil and a C-terminal OB domain. In ARC-N, the OB domains are duplicated and form separate rings. PAN-N and ARC-N can act as chaperones, preventing the aggregation of heterologous proteins in vitro, and this activity is preserved in various chimeras, even when these include coiled coils and OB domains from unrelated proteins. The structures suggest a molecular mechanism for substrate processing based on concerted radial motions of the coiled coils relative to the OB rings.

For simple visual patterns under the experimenter's control we impose which information, or features, an observer can use to solve a given perceptual task. For natural vision tasks, however, there are typically a multitude of potential features in a given visual scene which the visual system may be exploiting when analyzing it: edges, corners, contours, etc. Here we describe a novel non-linear system identification technique based on modern machine learning methods that allows the critical features an observer uses to be inferred directly from the observer's data. The method neither requires stimuli to be embedded in noise nor is it limited to linear perceptive fields (classification images). We demonstrate our technique by deriving the critical image features observers fixate in natural scenes (bottom-up visual saliency). Unlike previous studies where the relevant structure is determined manually (e.g., by selecting Gabors as visual filters), we do not make any assumptions in this regard, but numerically infer their number and properties from the eye-movement data. We show that center-surround patterns emerge as the optimal solution for predicting saccade targets from local image structure. The resulting model, a one-layer feed-forward network with contrast gain-control, is surprisingly simple compared to previously suggested saliency models. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

We describe mGene.web, a web service for the genome-wide prediction of protein coding genes from eukaryotic DNA sequences. It offers pre-trained models for the recognition of gene structures including untranslated regions in an increasing number of organisms. With mGene.web, users have the additional possibility to train the system with their own data for other organisms at the push of a button, a functionality that will greatly accelerate the annotation of newly sequenced genomes. The system is built in a highly modular way, such that individual components of the framework, like the promoter prediction tool or the splice site predictor, can be used autonomously. The underlying gene finding system mGene is based on discriminative machine learning techniques and its high accuracy has been demonstrated in an international competition on nematode genomes. mGene.web is available at http://www.mgene.org/web; it is free of charge and can be used for eukaryotic genomes of small to moderate size (several hundred Mbp).

2007

In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. At first glance spectral clustering appears slightly mysterious, and it is not obvious why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.
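The idea behind graph-Laplacian clustering can be illustrated with a minimal sketch. It is not taken from the tutorial itself: the function name `spectral_bipartition`, the fully connected Gaussian similarity graph, the bandwidth `sigma`, and the restriction to two clusters (splitting on the sign of the second eigenvector instead of running k-means on several eigenvectors) are all simplifying assumptions made here for illustration.

```python
import numpy as np

def spectral_bipartition(X, sigma=1.0):
    """Split the rows of X into two clusters via the unnormalized graph Laplacian.

    Hypothetical helper for illustration; sigma is an assumed bandwidth.
    """
    # Fully connected similarity graph with Gaussian weights.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix.
    D = np.diag(W.sum(axis=1))
    L = D - W

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest eigenvalue changes sign between the two groups.
    _, eigvecs = np.linalg.eigh(L)
    second = eigvecs[:, 1]
    return (second > 0).astype(int)

# Two well-separated point clouds as toy data.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.3, size=(20, 2)),
    rng.normal(3.0, 0.3, size=(20, 2)),
])
labels = spectral_bipartition(X)
```

Which of the two groups receives label 0 is arbitrary, because an eigenvector is only determined up to sign.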

The abilities to learn and to categorize are fundamental for cognitive systems, be it animals or machines, and therefore have attracted attention from engineers and psychologists alike. Modern machine learning methods and psychological models of categorization are remarkably similar, partly because these two fields share a common history in artificial neural networks and reinforcement learning. However, machine learning is now an independent and mature field that has moved beyond psychologically or neurally inspired algorithms towards providing foundations for a theory of learning that is rooted in statistics and functional analysis. Much of this research is potentially interesting for psychological theories of learning and categorization but also hardly accessible for psychologists. Here, we provide a tutorial introduction to a popular class of machine learning tools, called kernel methods. These methods are closely related to perceptrons, radial-basis-function neural networks and exemplar theories of categorization. Recent theoretical advances in machine learning are closely tied to the idea that the similarity of patterns can be encapsulated in a positive definite kernel. Such a positive definite kernel can define a reproducing kernel Hilbert space which allows one to use powerful tools from functional analysis for the analysis of learning algorithms. We give basic explanations of some key concepts (the so-called kernel trick, the representer theorem and regularization) which may open up the possibility that insights from machine learning can feed back into psychology.

Background: For splice site recognition, one has to solve two classification problems:
discriminating true from decoy splice sites for both acceptor and donor sites. Gene finding systems
typically rely on Markov Chains to solve these tasks.
Results: In this work we consider Support Vector Machines for splice site recognition. We employ
the so-called weighted degree kernel which turns out well suited for this task, as we will illustrate in
several experiments where we compare its prediction accuracy with that of recently proposed
systems. We apply our method to the genome-wide recognition of splice sites in Caenorhabditis
elegans, Drosophila melanogaster, Arabidopsis thaliana, Danio rerio, and Homo sapiens. Our
performance estimates indicate that splice sites can be recognized very accurately in these genomes
and that our method outperforms many other methods including Markov Chains, GeneSplicer and
SpliceMachine. We provide genome-wide predictions of splice sites and a stand-alone prediction
tool ready to be used for incorporation in a gene finder.
Availability: Data, splits, additional information on the model selection, the whole genome
predictions, as well as the stand-alone prediction tool are available for download at
http://www.fml.mpg.de/raetsch/projects/splice.
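
As a rough illustration of the weighted degree kernel mentioned above, the following sketch counts substrings of length 1 up to `degree` that match at the same position in two equal-length sequences. The function name and the weighting beta_d = 2(D - d + 1) / (D(D + 1)) are assumptions made here (the weighting is a common choice in the literature, not something stated in the abstract), and the plain loops ignore the efficient implementations needed at genome scale.

```python
def weighted_degree_kernel(x, y, degree=3):
    """Weighted degree kernel between two equal-length sequences.

    Hypothetical helper for illustration; the weights beta_d are an
    assumed, commonly used choice that favours shorter substrings.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")

    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        # Count positions where the length-d substrings of x and y coincide.
        matches = sum(
            1 for i in range(len(x) - d + 1) if x[i:i + d] == y[i:i + d]
        )
        total += beta * matches
    return total

same = weighted_degree_kernel("ACGT", "ACGT", degree=2)
one_mismatch = weighted_degree_kernel("ACGT", "ACGA", degree=2)
```

A single mismatch lowers the kernel value more than once, because it breaks every longer substring that covers the mismatching position.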

Recently, Udwadia (Proc. R. Soc. Lond. A 2003:1783-1800, 2003) suggested deriving tracking controllers for mechanical systems with redundant degrees-of-freedom (DOFs) using a generalization of Gauss' principle of least constraint. This method allows reformulating control problems as a special class of optimal controllers. In this paper, we take this line of reasoning one step further and demonstrate that several well-known and also novel nonlinear robot control laws can be derived from this generic methodology. We show experimental verifications on a Sarcos Master Arm robot for some of the derived controllers. The suggested approach offers a promising unification and simplification of nonlinear control law design for robots obeying rigid body dynamics equations, with or without external constraints, with over-actuation or underactuation, as well as open-chain and closed-chain kinematics.

Open source tools have recently reached a level of maturity which makes them suitable for building large-scale real-world systems. At the same time, the field of machine learning has developed a large body of powerful learning algorithms for diverse applications. However, the true potential of these methods is not realized, since existing implementations are not openly shared, resulting in software with low usability and weak interoperability. We argue that this situation can be significantly improved by increasing incentives for researchers to publish their software under an open source model. Additionally, we outline the problems authors are faced with when trying to publish algorithmic implementations of machine learning methods. We believe that a resource of peer reviewed software accompanied by short articles would be highly valuable to both the machine learning and the general scientific community.

Journal of the Optical Society of America A, 24(10):3233-3241, October 2007 (article)

Abstract

There are 8 cycle/deg ripples or oscillations in performance as a function of location near Mach bands in experiments measuring Mach bands masking effects on random polarity signal bars. The oscillations with increments are 180 degrees out of phase with those for decrements. The oscillations, much larger than the measurement error, appear to relate to the weighting function of the spatial-frequency-tuned channel detecting the broadband signals. The ripples disappear with step maskers and become much smaller at durations below 25 ms, implying either that the site of masking has changed or that the weighting function and hence spatial-frequency tuning is slow to develop.

Human immunodeficiency virus type 1 (HIV-1) evolves in the human body,
and its exposure to a drug often causes mutations that enhance
the resistance against the drug.
To design an effective pharmacotherapy for an individual patient,
it is important to accurately predict the drug resistance
based on genotype data.
Notably, the resistance is not just
the simple sum of the effects of all mutations.
Structural biological studies suggest that
the association of mutations is crucial:
Even if mutations A or B alone do not affect the resistance,
a significant change might happen
when the two mutations occur together.
Linear regression methods cannot take the associations into account,
while decision tree methods can reveal only limited associations.
Kernel methods and neural networks implicitly use all possible
associations for prediction, but cannot select salient associations
explicitly.
Our method, itemset boosting, performs linear regression
in the complete space of power sets of mutations.
It implements a forward feature selection procedure where,
in each iteration, one mutation combination is
found by an efficient branch-and-bound search.
This method uses all possible combinations,
and salient associations are explicitly shown.
In experiments, our method worked particularly well for predicting the
resistance of nucleotide reverse transcriptase inhibitors
(NRTIs). Furthermore, it successfully recovered many mutation
associations known in biological literature.

Abstract. This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial
positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite
(c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary
derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm
for arbitrary c.p.d. kernels. For the thin-plate kernel this leads to a classifier with only one parameter (the
amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even
though the Gaussian involves a second parameter (the length scale).

(TR-07-47), University of Texas, Austin, TX, USA, September 2007 (techreport)

Abstract

Several important machine learning problems can be modeled and solved via semidefinite programs. Often, researchers invoke off-the-shelf software for the associated optimization, which can be inappropriate for many applications due to computational and storage requirements. In this paper, we introduce the use of convex perturbations for semidefinite programs (SDPs). Using a particular perturbation function, we arrive
at an algorithm for SDPs that has several advantages over existing techniques: a) it is simple, requiring only a few lines of MATLAB, b) it is a first-order method which makes it scalable, c) it can easily exploit the structure of a particular SDP to gain efficiency (e.g., when the constraint matrices are low-rank). We demonstrate on several machine learning applications that the proposed algorithm is effective in finding fast approximations to large-scale SDPs.

Electrophysiological signals of the developing fetal brain and heart can be investigated by fetal magnetoencephalography (fMEG). During such investigations, the fetal heart activity and that of the mother should be monitored continuously to provide an important indication of current well-being. Due to physical constraints of an fMEG system, it is not possible to use clinically established heart monitors for this purpose. Considering this constraint, we developed a real-time heart monitoring system for biomagnetic measurements and showed its reliability and applicability in research and for clinical examinations. The developed system consists of real-time access to fMEG data, an algorithm based on Independent Component Analysis (ICA), and a graphical user interface (GUI). The algorithm extracts the current fetal and maternal heart signal from a noisy and artifact-contaminated data stream in real-time and is able to adapt automatically to continuously varying environmental parameters. This algorithm has been named Adaptive Real-time ICA (ARICA) and is applicable to real-time artifact removal as well as to related blind signal separation problems.

Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their
computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs
fixed. We generalise this for the case of a Gaussian covariance function by basing our computations on m Gaussian
basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis
functions and any given criterion, this additional flexibility permits approximations no worse and typically better
than was previously possible. Although we focus on g.p. regression, the central idea is applicable to all kernel
based algorithms, such as the support vector machine. We perform gradient based optimisation of the marginal
likelihood, which costs O(m^2 n) time where n is the number of data points, and compare the method to various
other sparse g.p. methods. Our approach outperforms the other methods, particularly for the case of very few basis
functions, i.e. a very high sparsity ratio.

Recent years have seen huge advances in object recognition from images. Recognition rates beyond 95% are the rule rather than the exception on many datasets. However, most state-of-the-art methods can only decide if an object is present or not. They are not able to provide information on the object location or extent within the image.
We report on a simple yet powerful scheme that extends many existing recognition methods to also perform localization of object bounding boxes. This is achieved by maximizing the classification score over all possible subrectangles in the image. Despite the impression that this would be computationally intractable, we show that in many situations efficient algorithms exist which solve a generalized maximum subrectangle problem.
We show how our method is applicable to a variety of object detection frameworks and demonstrate its performance by applying it to the popular bag of visual words model, achieving competitive results on the PASCAL VOC 2006 dataset.
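
To make the notion of a maximum subrectangle search concrete, here is a small sketch of the classic version of the problem: given a grid of per-cell scores, find the axis-aligned box with the largest total. It is only an illustration under assumptions made here. The grid of scores stands in for per-location classifier contributions, the function name is hypothetical, and the simple row-pair scan combined with a one-dimensional maximum-subarray pass is a textbook algorithm, not necessarily the generalized method the abstract refers to.

```python
def max_subrectangle(scores):
    """Return (best_sum, (top, left, bottom, right)) with inclusive bounds.

    Hypothetical helper: scans all pairs of rows and solves a 1-D
    maximum-subarray problem on the column sums between them.
    """
    rows, cols = len(scores), len(scores[0])
    best_sum, best_box = float("-inf"), None

    for top in range(rows):
        col_sums = [0.0] * cols
        for bottom in range(top, rows):
            # Extend the band of rows [top, bottom] by one row.
            for c in range(cols):
                col_sums[c] += scores[bottom][c]

            # 1-D maximum subarray (Kadane's algorithm) over the column sums.
            run_sum, run_start = 0.0, 0
            for c in range(cols):
                if run_sum <= 0:
                    run_sum, run_start = col_sums[c], c
                else:
                    run_sum += col_sums[c]
                if run_sum > best_sum:
                    best_sum = run_sum
                    best_box = (top, run_start, bottom, c)
    return best_sum, best_box

# Toy score map: a positive 2x2 region surrounded by negative evidence.
score_map = [
    [-1, -1, -1, -1],
    [-1,  2,  3, -1],
    [-1,  1,  2, -1],
    [-1, -1, -1, -1],
]
best_sum, best_box = max_subrectangle(score_map)
```

For a grid with h rows and w columns this scan takes on the order of h^2 * w steps, far fewer than scoring every possible box independently.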

The final properties of sophisticated products can
be affected by many unapparent dependencies within the manufacturing
process, and the product's integrity can often only be
checked in a final measurement. Troubleshooting can therefore
be very tedious if not impossible in large assembly lines.
In this paper we show that Feature Selection is an efficient tool for
serial-grouped lines to reveal causes for irregularities in product
attributes. We compare the performance of several methods for
Feature Selection on real-world problems in mass-production of
semiconductor devices.
Note to Practitioners: We present a data-based procedure
to localize flaws in large production lines: using the results of
final quality inspections and information about which machines
processed which batches, we are able to identify machines which
cause low yield.

Motivation: Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between different methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert–Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems.
Results: In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable.
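
The combination of the Hilbert-Schmidt independence criterion with backward elimination can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses the biased empirical HSIC estimate, a linear kernel on both features and labels, and removes one feature per round, whereas a practical variant might remove several features at once and tune kernel parameters. The function names and the toy data are assumptions.

```python
import numpy as np

def hsic(K, L):
    """Biased empirical HSIC estimate from two kernel matrices."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.trace(K @ H @ L @ H) / (m - 1) ** 2

def backward_elimination(X, y):
    """Rank features from least to most relevant (hypothetical helper).

    In each round, drop the feature whose removal leaves the highest
    dependence between the remaining features and the labels.
    """
    L = np.outer(y, y)  # linear kernel on the labels
    remaining = list(range(X.shape[1]))
    removed = []
    while len(remaining) > 1:
        scores = []
        for j in remaining:
            keep = [f for f in remaining if f != j]
            K = X[:, keep] @ X[:, keep].T  # linear kernel on kept features
            scores.append(hsic(K, L))
        drop = remaining[int(np.argmax(scores))]
        remaining.remove(drop)
        removed.append(drop)
    return removed + remaining

# Toy data: only feature 0 carries information about the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.sign(X[:, 0])
ranking = backward_elimination(X, y)
```

The last entry of `ranking` is the feature that survives elimination longest; swapping in a non-linear kernel for `K` would only require changing the line that builds it.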
