2009

Mini-Symposia on Assistive Machine Learning for People with Disabilities at NIPS (AMD), December 2009 (talk)

Abstract

Brain-computer interfaces (BCI) aim to be the ultimate in assistive technology: decoding a user's intentions directly from brain signals without involving any muscles or peripheral nerves. Thus, some classes of BCI potentially offer hope for users with even the most extreme cases of paralysis, such as in late-stage Amyotrophic Lateral Sclerosis, where nothing else currently allows communication of any kind. Other lines in BCI research aim to restore lost motor function in as natural a way as possible, reconnecting and in some cases re-training motor-cortical areas to control prosthetic, or previously paretic, limbs. Research and development are progressing on both invasive and non-invasive fronts, although BCI has yet to make a breakthrough to widespread clinical application.
The high-noise, high-dimensional nature of brain signals, particularly in non-invasive approaches and in patient populations, makes robust decoding techniques a necessity. Generally, the approach has been to use relatively simple feature extraction techniques, such as template matching and band-power estimation, coupled to simple linear classifiers. This has led to a prevailing view among applied BCI researchers that (sophisticated) machine learning is irrelevant, since "it doesn't matter what classifier you use once you've done your preprocessing right and extracted the right features." I shall show a few examples of how this runs counter to both the empirical reality and the spirit of what needs to be done to bring BCI into clinical application. Along the way I'll highlight some of the interesting problems that remain open for machine learners.

Clustering is a widely used tool for exploratory data analysis. However, the theoretical understanding of clustering is very limited. We still do not have a well-founded answer to the seemingly simple question of
"how many clusters are present in the data?", and furthermore a formal comparison of clusterings based on different optimization objectives is far beyond our abilities. The lack of good theoretical support gives rise to multiple heuristics that confuse practitioners and stall the development of the field. We suggest that the ill-posed nature of clustering problems is caused by the fact that clustering is often taken out of its subsequent application context. We argue that one does not cluster the data just for the sake of clustering it, but rather to facilitate the solution of some higher-level task. By evaluating the clustering's contribution to the solution of the higher-level task, it is possible to compare different clusterings, even those obtained with different optimization objectives. In preceding work it was shown that such an approach can be applied to the evaluation and design of co-clustering solutions. Here we suggest that this approach can be extended to other settings where clustering is applied.
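The evaluation principle can be made concrete in a few lines. In the following sketch (a toy example of our own, not taken from the cited work), two candidate clusterings are scored by the accuracy of a majority-vote predictor for the higher-level task, so that clusterings produced by different objectives become directly comparable:

```python
import numpy as np

def downstream_value(cluster_id, task_label):
    """Score a clustering by how well a majority-vote predictor built on
    its clusters solves the higher-level task (here: label prediction)."""
    correct = 0
    for c in np.unique(cluster_id):
        members = task_label[cluster_id == c]
        correct += np.bincount(members).max()   # majority vote per cluster
    return correct / len(task_label)

# Toy higher-level task: predict which of three groups a point came from.
rng = np.random.default_rng(0)
task_label = np.repeat([0, 1, 2], 50)
x = np.concatenate([rng.normal(0, 1, 50),
                    rng.normal(5, 1, 50),
                    rng.normal(10, 1, 50)])

clustering_a = (x > 5).astype(int)            # a 2-cluster solution
clustering_b = np.digitize(x, [2.5, 7.5])     # a 3-cluster solution

# The 3-cluster solution is better *for this task*, regardless of which
# optimization objective produced either clustering.
print(downstream_value(clustering_a, task_label),
      downstream_value(clustering_b, task_label))
```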

Kernel Canonical Correlation Analysis (KCCA) is a general technique for subspace learning that incorporates
principal components analysis (PCA) and Fisher linear discriminant analysis (LDA) as special
cases. By finding directions that maximize correlation, KCCA learns representations tied more closely
to the underlying process generating the data and can ignore high-variance noise directions. However,
for data where acquisition in a given modality is expensive or otherwise limited, KCCA may suffer from
small sample effects. We propose to use semi-supervised Laplacian regularization to utilize data that are
present in only one modality. This manifold learning approach is able to find highly correlated directions
that also lie along the data manifold, resulting in a more robust estimate of correlated subspaces. Data acquired by functional magnetic resonance imaging (fMRI) are naturally amenable to subspace techniques, as they are well aligned, and such data of the human brain are a particularly interesting candidate. In this
study we implemented various supervised and semi-supervised versions of KCCA on human fMRI data,
with regression to single and multivariate labels (corresponding to video content subjects viewed during
the image acquisition). In both the single- and multivariate conditions, Laplacian regularization improved performance, with the semi-supervised variants of KCCA yielding the best results. We additionally analyze the weights
learned by the regression in order to infer brain regions that are important during different types of visual processing.
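As a pared-down sketch of the underlying machinery (linear, ridge-regularized CCA rather than the kernelized, Laplacian-regularized variant used in the study; all parameter choices are illustrative), the first pair of correlated directions can be obtained from the whitened cross-covariance:

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-2):
    """First pair of canonical directions for two data views, with
    Tikhonov regularization against small-sample effects."""
    X = X - X.mean(0); Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # Whiten both views, then take the top singular pair of the whitened
    # cross-covariance: its singular value is the canonical correlation.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, S, Vt = np.linalg.svd(M)
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0])
    return wx, wy, S[0]

# Two noisy views of a shared latent signal.
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = np.c_[z + 0.3 * rng.normal(size=200), rng.normal(size=200)]
Y = np.c_[z + 0.3 * rng.normal(size=200), rng.normal(size=200)]
wx, wy, r = cca_first_pair(X, Y)
```

The learned directions concentrate on the shared signal and ignore the independent noise dimension in each view.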

We propose a novel probabilistic method for detection of signals and reconstruction
of images in the presence of random noise. The method uses results from percolation
and random graph theories (see Grimmett (1999)). We address the problem of
detection and estimation of signals in situations where the signal-to-noise ratio is
particularly low.
We present an algorithm that detects objects of various shapes in
noisy images. The algorithm has linear complexity and exponential accuracy. Our
algorithm differs substantially from wavelet-based algorithms (see Arias-Castro
et al. (2005)). Moreover, we present an algorithm that produces a crude estimate
of an object based on the noisy picture. This algorithm also has linear complexity
and is appropriate for real-time systems. We prove results on consistency and algorithmic
complexity of our procedures.
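As a toy illustration of why percolation arguments help (our own connected-component sketch, not the authors' algorithm): above a suitable threshold, signal pixels form one large connected cluster while pure-noise pixels form only small ones, so a single flood-fill pass over the pixels, i.e. linear time, already separates the two.

```python
import numpy as np
from collections import deque

def detect_object(noisy, threshold, min_size):
    """Return a mask of the largest connected cluster of above-threshold
    pixels (4-neighbour connectivity); linear time in the number of pixels."""
    mask = noisy > threshold
    H, W = mask.shape
    seen = np.zeros((H, W), dtype=bool)
    best = []
    for i in range(H):
        for j in range(W):
            if mask[i, j] and not seen[i, j]:
                # Flood-fill one cluster with breadth-first search.
                queue, comp = deque([(i, j)]), [(i, j)]
                seen[i, j] = True
                while queue:
                    a, b = queue.popleft()
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        na, nb = a + da, b + db
                        if 0 <= na < H and 0 <= nb < W \
                                and mask[na, nb] and not seen[na, nb]:
                            seen[na, nb] = True
                            queue.append((na, nb))
                            comp.append((na, nb))
                if len(comp) > len(best):
                    best = comp
    out = np.zeros((H, W), dtype=bool)
    if len(best) >= min_size:          # below min_size: report no object
        for a, b in best:
            out[a, b] = True
    return out

# A square object buried in Gaussian noise.
rng = np.random.default_rng(1)
image = rng.normal(0.0, 0.3, (40, 40))
image[10:30, 10:30] += 1.0
detected = detect_object(image, threshold=0.5, min_size=20)
```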

The acquisition and self-improvement of novel motor skills is among the most important problems in robotics. Motor primitives offer one of the most promising frameworks for the application of machine learning techniques in this context. Employing the Dynamic Systems Motor Primitives originally introduced by Ijspeert et al. (2003), we present appropriate learning algorithms for a concerted approach combining imitation and reinforcement learning. Using these algorithms, new motor skills, namely Ball-in-a-Cup, Ball-Paddling and Dart-Throwing, are learned.
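A one-dimensional sketch of the imitation half (an Ijspeert-style dynamical-system primitive whose forcing term is fitted by locally weighted regression; all parameter values are illustrative, and the reinforcement-learning part is omitted):

```python
import numpy as np

ALPHA, BETA, ALPHA_X = 25.0, 6.25, 3.0   # standard critically damped gains

def dmp_imitate(y_demo, dt, n_basis=20):
    """Fit the forcing-term weights of a discrete motor primitive to a
    demonstrated trajectory (one degree of freedom)."""
    T = len(y_demo)
    t = np.arange(T) * dt
    tau, y0, g = t[-1], y_demo[0], y_demo[-1]
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)
    x = np.exp(-ALPHA_X * t / tau)                      # canonical phase
    f_target = tau**2 * ydd - ALPHA * (BETA * (g - y_demo) - tau * yd)
    c = np.exp(-ALPHA_X * np.linspace(0, 1, n_basis))   # basis centres
    h = n_basis / c                                     # basis widths
    psi = np.exp(-h * (x[:, None] - c)**2)
    s = x * (g - y0)
    # Locally weighted regression, one weight per basis function.
    w = (psi * (s * f_target)[:, None]).sum(0) / \
        ((psi * (s**2)[:, None]).sum(0) + 1e-10)
    return w, c, h, y0, g, tau

def dmp_rollout(w, c, h, y0, g, tau, dt, T):
    """Integrate the primitive forward with Euler steps."""
    y, yd, x = y0, 0.0, 1.0
    traj = []
    for _ in range(T):
        psi = np.exp(-h * (x - c)**2)
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)
        ydd = (ALPHA * (BETA * (g - y) - tau * yd) + f) / tau**2
        yd += ydd * dt
        y += yd * dt
        x += -ALPHA_X * x / tau * dt
        traj.append(y)
    return np.array(traj)

# Imitate a minimum-jerk reach from 0 to 1, then reproduce it.
t = np.linspace(0, 1, 101)
demo = 10 * t**3 - 15 * t**4 + 6 * t**5
params = dmp_imitate(demo, dt=0.01)
traj = dmp_rollout(*params, dt=0.01, T=101)
```

Because the forcing term vanishes with the canonical phase, the reproduced movement is guaranteed to converge to the goal even if the fit is imperfect.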

Secondary metabolic pathways in plants are important for finding druggable candidate enzymes. However, there are many enzymes whose functions are still undiscovered, especially in organism-specific metabolic pathways. We propose reaction graph kernels for automatically assigning EC numbers to unknown enzymatic reactions in a metabolic network. Experiments are carried out on the KEGG/REACTION database, and our method successfully predicted the first three digits of the EC number with 83% accuracy. We also exhaustively predicted missing enzymatic functions in the plant secondary metabolism pathways, and evaluated the biochemical validity of our results.

At the heart of many important bioinformatics problems, such as gene finding and function prediction, is the classification of biological sequences, above all of DNA and proteins. In many cases, the most accurate classifiers are obtained by training SVMs with complex sequence kernels, for instance for transcription starts or splice sites. However, an often criticized downside of SVMs with complex kernels is that it is very hard for humans to understand the learned decision rules and to derive biological insights from them. To close this gap, we introduce the concept of positional oligomer importance matrices (POIMs) and develop an efficient algorithm for their computation. We demonstrate how they overcome the limitations of sequence logos, and how they can be used to find relevant motifs for different biological phenomena in a straightforward way. Note that the concept of POIMs is not limited to interpreting SVMs, but is applicable to general k-mer based scoring systems.
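The defining quantity is easy to state: the importance of oligomer z at position j is the expected score of a sequence given that it carries z at j, minus the unconditional expected score. The paper computes this exactly and efficiently; the sketch below (with an invented toy scorer) merely estimates it by Monte Carlo to make the definition concrete:

```python
import numpy as np
from itertools import product

def poim_mc(score, alphabet, L, k, n_samples=500, seed=0):
    """Monte Carlo positional oligomer importance matrix:
    Q[z, j] = E[score(X) | X[j:j+k] = z] - E[score(X)],
    with X drawn uniformly i.i.d. over the alphabet."""
    rng = np.random.default_rng(seed)
    kmers = [''.join(p) for p in product(alphabet, repeat=k)]
    idx = rng.integers(0, len(alphabet), size=(n_samples, L))
    seqs = [''.join(alphabet[i] for i in row) for row in idx]
    mean_score = np.mean([score(s) for s in seqs])
    Q = np.zeros((len(kmers), L - k + 1))
    for zi, z in enumerate(kmers):
        for j in range(L - k + 1):
            planted = [s[:j] + z + s[j + k:] for s in seqs]
            Q[zi, j] = np.mean([score(p) for p in planted]) - mean_score
    return Q, kmers

# Toy k-mer scoring system that rewards 'GA' at position 3.
score = lambda s: 2.0 if s[3:5] == 'GA' else 0.0
Q, kmers = poim_mc(score, 'ACGT', L=8, k=2)
zi, j = np.unravel_index(np.argmax(Q), Q.shape)   # recovers the planted motif
```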

Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. We propose a new class of protein sequence kernels which considers all motifs including motifs with gaps. This class of kernels allows the inclusion of pairwise amino acid distances into their computation. We utilize an extension of the multiclass support vector machine (SVM) method which directly solves protein subcellular localization without resorting to the common approach of splitting the problem into several binary classification problems. To automatically search over families of possible amino acid motifs, we optimize over multiple kernels at the same time. We compare our automated approach to four other predictors on three different datasets, and show that we perform better than the current state of the art. Furthermore, our method provides some insights as to which features are most useful for determining subcellular localization, which are in agreement with biological reasoning.

Invited keynote talk at the launch of BrainGain, the Dutch BCI research consortium, November 2007 (talk)

Abstract

I'll present a perspective on Brain-Computer Interface development from Tübingen. Some of the benefits promised by BCI technology lie in the foreseeable future, and some further away. Our motivation is to make BCI technology feasible, soon, for the people who could most benefit from what it has to offer: namely, people in the "completely locked-in" state. I'll mention some of the challenges of working with this user group, and explain the specific directions they have motivated us to take in developing experimental methods, algorithms, and software.

2nd Workshop on Machine Learning and Optimization at the ISM, October 2007 (talk)

Abstract

Many problems in unsupervised learning require the analysis of features of probability distributions. At the most fundamental level, we might wish to determine whether two distributions are the same, based on samples from each - this is known as the two-sample or homogeneity problem. We use kernel methods to address this problem, by mapping probability distributions to elements in a reproducing kernel Hilbert space (RKHS). Given a sufficiently rich RKHS, these representations are unique: thus comparing feature space representations allows us to compare distributions without ambiguity. Applications include testing whether cancer subtypes are distinguishable on the basis of DNA microarray data, and whether low frequency oscillations measured at an electrode in the cortex have a different distribution during a neural spike.
A more difficult problem is to discover whether two random variables drawn from a joint distribution are independent. It turns out that any dependence between pairs of random variables can be encoded in a cross-covariance operator between appropriate RKHS representations of the variables, and we may test independence by looking at a norm of the operator. We demonstrate this independence test by establishing dependence between an English text and its French translation, as opposed to French text on the same topic but otherwise unrelated. Finally, we show that this operator norm is itself a difference in feature means.
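The two-sample statistic reduces to computable kernel sums. Below is a numpy sketch of the unbiased estimate of the squared MMD with an RBF kernel (bandwidth fixed by hand; the permutation-based null threshold needed for an actual hypothesis test is omitted):

```python
import numpy as np

def mmd2_unbiased(X, Y, gamma):
    """Unbiased estimate of the squared maximum mean discrepancy between
    samples X and Y, using the RBF kernel k(a,b) = exp(-gamma ||a-b||^2)."""
    def k(A, B):
        return np.exp(-gamma * ((A[:, None, :] - B[None, :, :])**2).sum(-1))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
    # Diagonal terms are excluded to make the estimate unbiased.
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (200, 1))
Y_same = rng.normal(0, 1, (200, 1))       # same distribution as X
Y_shift = rng.normal(1, 1, (200, 1))      # mean-shifted distribution
```

On these samples the statistic is near zero for the matched pair and clearly positive for the shifted one.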

PET/MR combines the high soft tissue contrast of Magnetic Resonance Imaging (MRI) and the functional information of Positron Emission Tomography (PET). For quantitative PET information, correction of tissue photon attenuation is mandatory. Usually in conventional PET, the attenuation map is obtained from a transmission scan, which uses a rotating source, or from the CT scan in the case of combined PET/CT. In the case of a PET/MR scanner, there is insufficient space for the rotating source, and ideally one would want to calculate the attenuation map from the MR image instead. Since MR images provide information about the proton density of the different tissue types rather than about photon attenuation, it is not trivial to use these data for PET attenuation correction. We present a method for predicting the PET attenuation map from a given MR image, using a combination of atlas registration and recognition of local patterns.
Using leave-one-out cross-validation, we show on a database of 16 MR-CT image pairs that our method reliably estimates the CT image from the MR image. Subsequently, as in PET/CT, the PET attenuation map can be predicted from the CT image. On an additional dataset of MR/CT/PET triplets we quantitatively validate that our approach allows PET quantification with an error smaller than what would be clinically significant.
We demonstrate our approach on T1-weighted human brain scans. However, the presented methods are more general and current research focuses on applying the established methods to human whole body PET/MRI applications.
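To make the "recognition of local patterns" idea concrete, here is a deliberately naive stand-in (our own toy, not the method of the paper): predict each pseudo-CT voxel by matching its MR neighbourhood against a co-registered MR/CT training pair with a nearest-neighbour search.

```python
import numpy as np

def pseudo_ct_knn(mr_train, ct_train, mr_test, patch=3):
    """Predict a pseudo-CT image by nearest-neighbour matching of local
    MR patches against a co-registered MR/CT training pair."""
    r = patch // 2
    H, W = mr_train.shape
    feats, vals = [], []
    for i in range(r, H - r):
        for j in range(r, W - r):
            feats.append(mr_train[i - r:i + r + 1, j - r:j + r + 1].ravel())
            vals.append(ct_train[i, j])
    feats, vals = np.array(feats), np.array(vals)
    out = np.zeros_like(mr_test)
    Ht, Wt = mr_test.shape
    for i in range(r, Ht - r):
        for j in range(r, Wt - r):
            p = mr_test[i - r:i + r + 1, j - r:j + r + 1].ravel()
            out[i, j] = vals[np.argmin(((feats - p)**2).sum(1))]
    return out

# Sanity check on a synthetic pair with a fixed MR-to-CT intensity relation.
rng = np.random.default_rng(0)
mr = rng.random((12, 12))
ct = 2.0 * mr + 1.0
pred = pseudo_ct_knn(mr, ct, mr)
```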

My principal interest is in applying machine-learning methods to the
development of Brain-Computer Interfaces (BCI). This involves the
classification of a user's intentions or mental states, or regression
against some continuous intentional control signal, using brain signals
obtained for example by EEG, ECoG or MEG. The long-term aim is to
develop systems that a completely paralysed person (such as someone
suffering from advanced Amyotrophic Lateral Sclerosis) could use to
communicate.
Such systems have the potential to improve the lives of many people who
would be otherwise completely unable to communicate, but they are still
very much in the research and development stages.

Mixture of factor analysers (MFA) is a well-known model that combines the dimensionality reduction technique of Factor Analysis (FA) with mixture modeling. The key issue in MFA is deciding on the latent dimension and the number of mixture components to be used. The Bayesian treatment of MFA has been considered by Beal and Ghahramani (2000) using variational approximation and by Fokoué and Titterington (2003) using birth-and-death Markov chain Monte Carlo (MCMC). Here, we present a nonparametric MFA model utilizing a Dirichlet process (DP) prior on the component parameters (that is, the factor loading matrix and the mean vector of each component) and describe an MCMC scheme for inference. The clustering property of the DP provides automatic selection of the number of mixture components. The latent dimensionality of each component is inferred by automatic relevance determination (ARD). Identifying the action potentials of individual neurons from extracellular recordings, known as spike sorting, is a challenging clustering problem. We apply our model to cluster waveforms recorded from the cortex of a macaque monkey.
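The "automatic selection of the number of mixture components" comes from the Chinese restaurant process view of the DP prior: each item joins an existing cluster with probability proportional to its size, or opens a new one with probability proportional to the concentration alpha. A minimal sampler (illustrative only, not the full MCMC scheme) makes this rich-get-richer behaviour visible:

```python
import numpy as np

def crp_partition(n, alpha, rng):
    """Sample a partition of n items from the Chinese restaurant process."""
    counts, assign = [], []
    for _ in range(n):
        # Probability of each existing cluster, plus one new cluster.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)     # open a new cluster
        else:
            counts[k] += 1
        assign.append(k)
    return assign, counts

assign, counts = crp_partition(500, alpha=2.0, rng=np.random.default_rng(0))
# The number of occupied clusters grows only logarithmically with n.
```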

Invited talk at the PASCAL Workshop on Methods of Data Analysis in Computational Neuroscience and Brain Computer Interfaces, June 2007 (talk)

Abstract

When considering Brain-Computer Interface (BCI) development for patients in the most severely paralysed states, there is considerable motivation to move away from BCI systems based on either motor cortex activity, or on visual stimuli. Together these account for most of current BCI research. I present the results of our recent exploration of new auditory- and tactile-stimulus-driven BCIs.
The talk includes a tutorial on the construction and interpretation of classifiers which extract spatio-temporal features from event-related potential data. The effects and implications of whitening are discussed, and preliminary results on the effectiveness of a low-rank constraint (Tomioka and Aihara 2007) are shown.

We study the problem of learning kernel machines transductively for structured output variables. Transductive learning can be reduced to combinatorial optimization problems over all possible labelings of the unlabeled data. In order to scale transductive learning to structured variables, we transform the corresponding non-convex, combinatorial, constrained optimization problems into continuous, unconstrained optimization problems. The discrete optimization parameters are eliminated and the resulting differentiable problems can be optimized efficiently. We study the effectiveness of the generalized TSVM on multiclass classification and label-sequence learning problems empirically.
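The flavour of the continuous reformulation can be conveyed with a much simpler toy than the structured-output case treated in the paper (this sketch is ours, not the authors' construction): for a binary linear TSVM, replace the discrete labels of the unlabeled data with the smooth low-density loss exp(-3 f(x)^2), so the whole objective becomes differentiable in the weights and can be minimized by plain gradient descent.

```python
import numpy as np

def tsvm(Xl, yl, Xu, lam=0.01, Cu=0.5, lr=0.05, iters=2000):
    """Linear transductive SVM via gradient descent on a continuous
    surrogate: squared hinge on labeled points, exp(-3 f^2) on unlabeled
    points (which pushes the boundary into low-density regions)."""
    w, b = np.zeros(Xl.shape[1]), 0.0
    for _ in range(iters):
        fl, fu = Xl @ w + b, Xu @ w + b
        ml = np.maximum(0.0, 1.0 - yl * fl)      # labeled margin violations
        gl = (-2 * yl * ml) @ Xl / len(Xl)       # grad of squared hinge
        gu = Cu * (-6 * fu * np.exp(-3 * fu**2)) @ Xu / len(Xu)
        w -= lr * (lam * w + gl + gu)
        b -= lr * ((-2 * yl * ml).mean()
                   + Cu * (-6 * fu * np.exp(-3 * fu**2)).mean())
    return w, b

# Four labeled points, one hundred unlabeled points from two blobs.
rng = np.random.default_rng(0)
Xl = np.array([[-2.0, 0.0], [-2.2, 0.3], [2.0, 0.0], [2.1, -0.2]])
yl = np.array([-1.0, -1.0, 1.0, 1.0])
Xu = np.vstack([rng.normal([-2, 0], 0.5, (50, 2)),
                rng.normal([2, 0], 0.5, (50, 2))])
w, b = tsvm(Xl, yl, Xu)
yu = np.r_[-np.ones(50), np.ones(50)]
accuracy = np.mean(np.sign(Xu @ w + b) == yu)
```

The discrete labeling over all unlabeled points never appears as an optimization variable; it is read off from the sign of the learned decision function.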

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.