Computational models of spatial vision typically make use of a (rectified) linear filter, a nonlinearity and dominant late noise to account for human contrast discrimination data. Linear–nonlinear cascade models predict an improvement in observers' contrast detection performance when low, subthreshold levels of external noise are added (i.e., stochastic resonance). Here, we address the question of whether a single contrast gain-control model of early spatial vision can account for both the pedestal effect, i.e., the improved detectability of a grating in the presence of a low-contrast masking grating, and stochastic resonance. We measured contrast discrimination performance without noise and in both weak and moderate levels of noise. Using a full quantitative description of our data with few parameters, combined with comprehensive model-selection assessments, we show that the pedestal effect is reduced more in the presence of weak noise than in moderate noise. This reduction rules out independent, additive sources of performance improvement and, together with a simulation study, supports the parsimonious explanation that a single mechanism underlies the pedestal effect and stochastic resonance in contrast perception.
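
As a rough illustration of the stochastic-resonance mechanism invoked here, the toy simulation below (my construction, not the model fitted in the paper; the threshold, signal level, and 2AFC decision rule are illustrative assumptions) shows how a hard-threshold transducer turns a subthreshold signal from undetectable to detectable as weak noise is added, with performance falling again for stronger noise:

```python
import numpy as np

rng = np.random.default_rng(0)

def percent_correct(signal, sigma, theta=1.0, trials=20000):
    """2AFC with a hard-threshold transducer: the observer picks the
    interval with the larger rectified response (ties broken at random)."""
    s = signal + sigma * rng.standard_normal(trials)   # signal interval
    n = sigma * rng.standard_normal(trials)            # noise-only interval
    r_s = np.maximum(s - theta, 0.0)                   # rectified responses
    r_n = np.maximum(n - theta, 0.0)
    wins = (r_s > r_n) + 0.5 * (r_s == r_n)
    return wins.mean()

# A subthreshold signal (0.6 < theta = 1.0): chance performance without
# noise, a peak for weak noise, and a decline again as the noise grows.
for sigma in [0.0, 0.2, 0.5, 1.0, 2.0]:
    print(f"sigma = {sigma:.1f}: P(correct) = {percent_correct(0.6, sigma):.3f}")
```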

International Journal of Neuroscience, 118(11):1534-1546, November 2008 (article)

Abstract

Objective: This study investigated the influence of mutual information (MI) on temporal and dipole reconstruction based on independent components (ICs) derived from independent component analysis (ICA). Method: Artificial electroencephalogram (EEG) datasets were created by means of a neural mass model simulating cortical activity of two neural sources within a four-shell spherical head model. Mutual information between neural sources was systematically varied. Results: Increasing spatial error for reconstructed locations of ICs with increasing MI was observed. By contrast, the reconstruction error for the time course of source activity was largely independent of MI but varied systematically with Gaussianity of the sources. Conclusion: Independent component analysis is a viable tool for analyzing the temporal activity of EEG/MEG (magnetoencephalography) sources even if the underlying neural sources are mutually dependent. However, if ICA is used as a preprocessing algorithm for source localization, mutual information between sources introduces a bias in the reconstructed locations of the sources. Significance: Studies using ICA-algorithms based on MI have to be aware of possible errors in the spatial reconstruction of sources if these are coupled with other neural sources.

Graph mining methods enumerate frequently appearing subgraph patterns, which can be used as features for subsequent classification or regression. However, frequent patterns are not necessarily informative for the given learning problem. We propose a mathematical programming boosting method (gBoost) that progressively collects informative patterns. Compared to AdaBoost, gBoost can build the prediction rule with fewer iterations. To apply the boosting method to graph data, a branch-and-bound pattern search algorithm is developed based on the DFS code tree. The constructed search space is reused in later iterations to minimize the computation time. Our method can learn more efficiently than the simpler method based on frequent substructure mining, because the output labels are used as an extra information source for pruning the search space. Furthermore, by engineering the mathematical program, a wide range of machine learning problems can be solved without modifying the pattern search algorithm.

Autonomous robots that can adapt to novel situations have been a long-standing vision of robotics, artificial intelligence, and
the cognitive sciences. Early approaches to this goal during the heyday of artificial intelligence research in the late 1980s,
however, made it clear that an approach purely based on reasoning or human insights would not be able to model all the
perceptuomotor tasks of future robots. Instead, new hope was placed in the growing field of machine learning, which promised fully
adaptive control algorithms that learn both by observation and by trial-and-error. However, to date, learning techniques have yet
to fulfill this promise, as only a few methods manage to scale to the high-dimensional domains of manipulator and humanoid
robotics, and scaling has usually been achieved only in precisely pre-structured domains. We have investigated the ingredients of
a general approach to motor skill learning in order to get one step closer to human-like performance. To do so, we
study two major components of such an approach: first, a theoretically well-founded general approach to representing
the required control structures for task representation and execution and, second, appropriate learning algorithms that can
be applied in this setting.

Many common machine learning methods such as Support Vector Machines or Gaussian process
inference make use of positive definite kernels, reproducing kernel Hilbert spaces, Gaussian processes, and
regularization operators. In this work these objects are presented in a general, unifying framework, and
interrelations are highlighted.
With this in mind we then show how linear stochastic differential equation models can be incorporated
naturally into the kernel framework. And vice versa, many kernel machines can be interpreted in terms of
differential equations. We focus especially on ordinary differential equations, also known as dynamical
systems, and it is shown that standard kernel inference algorithms are equivalent to Kalman filter methods
based on such models.
In order not to cloud qualitative insights with heavy mathematical machinery, we restrict ourselves to finite
domains, implying that differential equations are treated via their corresponding finite difference equations.
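
The stated equivalence is easy to check numerically. The sketch below (my construction under the abstract's finite-domain assumption, not code from the paper) compares GP regression with the exponential (Ornstein-Uhlenbeck) kernel against a Kalman filter plus RTS smoother for the matched first-order difference equation; the two posterior means agree to machine precision:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 80)          # finite domain (regular grid)
ell, sig2, noise = 0.7, 1.0, 0.1       # OU length-scale, variance, obs. noise
y = np.sin(2 * t) + np.sqrt(noise) * rng.standard_normal(t.size)

# --- GP regression with the OU (exponential) kernel --------------------
K = sig2 * np.exp(-np.abs(t[:, None] - t[None, :]) / ell)
gp_mean = K @ np.linalg.solve(K + noise * np.eye(t.size), y)

# --- Kalman filter + RTS smoother for the matched AR(1) model ----------
# x_k = a x_{k-1} + w_k,  w_k ~ N(0, q),  with a, q chosen so the state
# covariance reproduces the OU kernel on the grid.
dt = t[1] - t[0]
a, q = np.exp(-dt / ell), sig2 * (1 - np.exp(-2 * dt / ell))
m_f, P_f = np.zeros(t.size), np.zeros(t.size)
m, P = 0.0, sig2                                   # stationary prior
for k in range(t.size):
    if k > 0:
        m, P = a * m, a * a * P + q                # predict
    S = P + noise                                  # update
    m, P = m + P / S * (y[k] - m), P - P * P / S
    m_f[k], P_f[k] = m, P
m_s = m_f.copy()
for k in range(t.size - 2, -1, -1):                # RTS backward pass
    G = P_f[k] * a / (a * a * P_f[k] + q)
    m_s[k] = m_f[k] + G * (m_s[k + 1] - a * m_f[k])

print(np.max(np.abs(gp_mean - m_s)))   # ~1e-13: identical posterior means
```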

Plasmodium knowlesi is an intracellular malaria parasite whose natural vertebrate host is Macaca fascicularis (the 'kra' monkey); however, it is now increasingly recognized as a significant cause of human malaria, particularly in southeast Asia [1, 2]. Plasmodium knowlesi was the first malaria parasite species in which antigenic variation was demonstrated [3], and it has a close phylogenetic relationship to Plasmodium vivax [4], the second most important species of human malaria parasite (reviewed in ref. 4). Despite their relatedness, there are important phenotypic differences between them, such as host blood cell preference, absence of a dormant liver stage or 'hypnozoite' in P. knowlesi, and length of the asexual cycle (reviewed in ref. 4). Here we present an analysis of the P. knowlesi (H strain, Pk1(A+) clone [5]) nuclear genome sequence. This is the first monkey malaria parasite genome to be described, and it provides an opportunity for comparison with the recently completed P. vivax genome [4] and other sequenced Plasmodium genomes [6-8]. In contrast to other Plasmodium genomes, putative variant antigen families are dispersed throughout the genome and are associated with intrachromosomal telomere repeats. One of these families, the KIRs [9], contains sequences that collectively match over one-half of the host CD99 extracellular domain, which may represent an unusual form of molecular mimicry.

Proceedings of the National Academy of Sciences of the United States of America, 105(41):15641-15642 , October 2008 (article)

Abstract

The brain is never inactive. Neurons fire at leisurely rates most of the time, even in sleep (1), although occasionally they fire more intensely, for example, when presented with certain stimuli. Coordinated changes in the activity and excitability of many neurons underlie spontaneous fluctuations in the electroencephalogram (EEG), first observed almost a century ago. These fluctuations can be very slow (infraslow oscillations, <0.1 Hz; slow oscillations, <1 Hz; and slow waves or delta waves, 1–4 Hz), intermediate (theta, 4–8 Hz; alpha, 8–12 Hz; and beta, 13–20 Hz), and fast (gamma, >30 Hz). Moreover, slower fluctuations appear to group and modulate faster ones (1, 2). The BOLD signal underlying functional magnetic resonance imaging (fMRI) also exhibits spontaneous fluctuations at the timescale of tens of seconds (infraslow, <0.1 Hz), which occur at all times, during task-performance as well as during quiet wakefulness, rapid eye movement (REM) sleep, and non-REM sleep (NREM). Although the precise mechanism underlying the BOLD signal is still being investigated (3–5), it is becoming clear that spontaneous BOLD fluctuations are not just noise, but are tied to fluctuations in neural activity. In this issue of PNAS, He et al. (6) have been able to directly investigate the relationship between BOLD fluctuations and fluctuations in the brain's electrical activity in human subjects.
He et al. (6) took advantage of the seminal observation by Biswal et al. (7) that spontaneous BOLD fluctuations in regions belonging to the same functional system are strongly correlated. As expected, He et al. saw that fMRI BOLD fluctuations were strongly correlated among regions within the sensorimotor system, but much less between sensorimotor regions and control regions (nonsensorimotor). The twist was that they did the fMRI recordings in subjects who had been implanted with intracranial electrocorticographic (ECoG) electrodes to record regional EEG signals (to localize epileptic foci). In a separate session, He et al. examined correlations in EEG signals between different regions. They found that, just like the BOLD fluctuations, infraslow and slow fluctuations in the EEG signal from sensorimotor-sensorimotor pairs of electrodes were positively correlated, whereas signals from sensorimotor-control pairs were not. Moreover, the correlation persisted across arousal states: in waking, NREM, and REM sleep. Finally, using several statistical approaches, they found a remarkable correspondence between regional correlations in the infraslow BOLD signal and regional correlations in the infraslow-slow EEG signal (<0.5 Hz or 1–4 Hz). Notably, another report has just appeared showing that mirror sites of auditory cortex across the two hemispheres, which show correlated BOLD activity, also show correlated infraslow EEG fluctuations recorded with ECoG electrodes (8). In this case, the correlated fluctuations reflected infraslow changes in EEG power in the gamma range [however, no significant correlations were found for slow ECoG frequencies (1–4 Hz)].

Proceedings of the National Academy of Sciences of the United States of America, 105(40):15370-15375, October 2008 (article)

Abstract

The voltage-dependent anion channel (VDAC), also known as mitochondrial porin, is the most abundant protein in the mitochondrial outer membrane (MOM). VDAC is the channel known to guide the metabolic flux across the MOM and plays a key role in mitochondrially induced apoptosis. Here, we present the 3D structure of human VDAC1, which was solved conjointly by NMR spectroscopy and x-ray crystallography. Human VDAC1 (hVDAC1) adopts a β-barrel architecture composed of 19 β-strands with an α-helix located horizontally midway within the pore. Bioinformatic analysis indicates that this channel architecture is common to all VDAC proteins and is adopted by the general import pore TOM40 of mammals, which is also located in the MOM.

For quantitative PET information, correction of tissue photon attenuation is mandatory. Generally in conventional PET, the attenuation map is obtained from a transmission scan, which uses a rotating radionuclide source, or from the CT scan in a combined PET/CT scanner. In the case of PET/MRI scanners currently under development, insufficient space for the rotating source exists; the attenuation map can be calculated from the MR image instead. This task is challenging because MR intensities correlate with proton densities and tissue-relaxation properties, rather than with attenuation-related mass density. METHODS: We used a combination of local pattern recognition and atlas registration, which captures global variation of anatomy, to predict pseudo-CT images from a given MR image. These pseudo-CT images were then used for attenuation correction, as the process would be performed in a PET/CT scanner. RESULTS: For human brain scans, we show on a database of 17 MR/CT image pairs that our method reliably enables estimation of a pseudo-CT image from the MR image alone. On additional datasets of MRI/PET/CT triplets of human brain scans, we compare MRI-based attenuation correction with CT-based correction. Our approach enables PET quantification with a mean error of 3.2% for predefined regions of interest, which we found not to be clinically significant. However, our method is not specific to brain imaging, and we show promising initial results on 1 whole-body animal dataset. CONCLUSION: This method allows reliable MRI-based attenuation correction for human brain scans. Further work is necessary to validate the method for whole-body imaging.

We provide a comprehensive overview of many recent algorithms for approximate inference in
Gaussian process models for probabilistic binary classification. The relationships between several
approaches are elucidated theoretically, and the properties of the different algorithms are
corroborated by experimental results. We examine both 1) the quality of the predictive distributions and
2) the suitability of the different marginal likelihood approximations for model selection (selecting
hyperparameters) and compare to a gold standard based on MCMC. Interestingly, some methods
produce good predictive distributions although their marginal likelihood approximations are poor.
Strong conclusions are drawn about the methods: The Expectation Propagation algorithm is almost
always the method of choice unless the computational budget is very tight. We also extend
existing methods in various ways, and provide unifying code implementing all approaches.
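
One of the compared approximations, the Laplace method, fits in a short sketch. The code below follows the standard stable Newton scheme (as in Rasmussen and Williams' GPML, Algorithm 3.1) and is my illustration rather than the unifying code released with the paper:

```python
import numpy as np

def laplace_gp_classify(K, y, iters=20):
    """Laplace approximation for GP binary classification with the logistic
    likelihood (y in {-1, +1}).  Returns the posterior mode f and the
    approximate log marginal likelihood used for model selection."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = (y + 1) / 2                              # targets in {0, 1}
    f = np.zeros(n)
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-f))            # sigmoid(f)
        W = pi * (1 - pi)                        # -Hessian of log-likelihood
        sqrtW = np.sqrt(W)
        B = np.eye(n) + sqrtW[:, None] * K * sqrtW[None, :]
        L = np.linalg.cholesky(B)
        b = W * f + (t - pi)                     # Newton step, stable form
        v = np.linalg.solve(L, sqrtW * (K @ b))
        a = b - sqrtW * np.linalg.solve(L.T, v)
        f = K @ a
    pi = 1.0 / (1.0 + np.exp(-f))
    loglik = np.sum(t * np.log(pi) + (1 - t) * np.log(1 - pi))
    return f, -0.5 * a @ f + loglik - np.log(np.diag(L)).sum()
```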

The use of generous distance bounds has been the hallmark of NMR structure determination. However, bounds necessitate the estimation of data quality before the calculation, reduce the information content, introduce human bias, and allow for major errors in the structures. Here, we propose a new rapid structure calculation scheme based on Bayesian analysis. The minimization of an extended energy function, including a new type of distance restraint and a term depending on the data quality, results in an estimation of the data quality in addition to coordinates. This allows for the determination of the optimal weight on the experimental information. The resulting structures are of better quality and closer to the X-ray crystal structure of the same molecule. With the new calculation approach, the analysis of discrepancies from the target distances becomes meaningful. The strategy may be useful in other applications, for example, in homology modeling.

Similarity is used as an explanatory construct throughout psychology and multidimensional scaling (MDS) is the most popular way to assess similarity. In MDS, similarity is intimately connected to the idea of a geometric representation of stimuli in a perceptual space. Whilst connecting similarity and closeness of stimuli in a geometric representation may be intuitively plausible, Tversky and Gati [Tversky, A., Gati, I. (1982). Similarity, separability, and the triangle inequality. Psychological Review, 89(2), 123-154] have reported data which are inconsistent with the usual geometric representations that are based on segmental additivity. We show that similarity measures based on Shepard's universal law of generalization [Shepard, R. N. (1987). Toward a universal law of generalization for psychological science. Science, 237(4820), 1317-1323] lead to an inner product representation in a reproducing kernel Hilbert space. In such a space stimuli are represented by their similarity to all other stimuli. This representation, based on Shepard's law, has a natural metric that does not have additive segments whilst still retaining the intuitive notion of connecting similarity and distance between stimuli. Furthermore, this representation has the psychologically appealing property that the distance between stimuli is bounded.
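
Concretely, with an exponential-decay (Shepard-style) similarity the induced RKHS distance is bounded and lacks additive segments, as a few lines verify (the kernel and stimulus values are my illustrative choices):

```python
import numpy as np

def k(x, y):
    """Exponential-decay similarity; a valid (Laplacian) kernel."""
    return np.exp(-np.abs(x - y))

def d(x, y):
    """Distance between stimuli in the induced RKHS feature space."""
    return np.sqrt(k(x, x) + k(y, y) - 2 * k(x, y))   # = sqrt(2 - 2 k(x,y))

# Bounded: d never exceeds sqrt(2), however dissimilar the stimuli.
print(d(0.0, 100.0))                  # ~1.41421 = sqrt(2)

# No segmental additivity: for collinear stimuli a < b < c,
# d(a, c) is strictly less than d(a, b) + d(b, c).
a, b, c = 0.0, 1.0, 2.0
print(d(a, c), d(a, b) + d(b, c))     # ~1.315 < ~2.249
```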

Pattern recognition methods have shown that functional magnetic resonance imaging (fMRI) data can reveal significant information about brain activity. For example, in the debate of how object categories are represented in the brain, multivariate analysis has been used to provide evidence of a distributed encoding scheme [Science 293:5539 (2001) 2425-2430]. Many follow-up studies have employed different methods to analyze human fMRI data with varying degrees of success [Nature reviews 7:7 (2006) 523-534]. In this study, we compare four popular pattern recognition methods: correlation analysis, support-vector machines (SVM), linear discriminant analysis (LDA) and Gaussian naïve Bayes (GNB), using data collected at high field (7 Tesla) with higher resolution than usual fMRI studies. We investigate prediction performance on single trials and for averages across varying numbers of stimulus presentations. The performance of the various algorithms depends on the nature of the brain activity being categorized: for
several tasks, many of the methods work well, whereas for others, no method performs above chance level. An important factor in overall classification performance is careful preprocessing of the data, including dimensionality reduction, voxel selection and outlier elimination.
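
A minimal stand-in for such a comparison on synthetic data (three of the four classifiers; the scikit-learn pipeline, parameters, and data are my assumptions, not the study's 7-Tesla recordings or preprocessing):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for voxel patterns: many noisy features, few informative.
X, y = make_classification(n_samples=200, n_features=500, n_informative=20,
                           random_state=0)

classifiers = {
    "SVM (linear)": LinearSVC(C=1.0, dual=False),
    "LDA (shrinkage)": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
    "Gaussian naive Bayes": GaussianNB(),
}
for name, clf in classifiers.items():
    pipe = make_pipeline(StandardScaler(), clf)   # preprocessing matters
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name:22s} accuracy = {scores.mean():.2f} +/- {scores.std():.2f}")
```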

In measurement-based quantum computation, quantum algorithms are implemented via sequences of measurements. We describe a translationally invariant finite-range interaction on a one-dimensional qudit chain and prove that a single-shot measurement of the energy of an appropriate computational basis state with respect to this Hamiltonian provides the output of any quantum circuit. The required measurement accuracy scales inverse polynomially with the size of the simulated quantum circuit. This shows that the implementation of energy measurements on generic qudit chains is as hard as the realization of quantum computation. Here, a 'measurement' is any procedure that samples from the spectral measure induced by the observable and the state under consideration. As opposed to measurement-based quantum computation, the post-measurement state is irrelevant.

Braincomputer interfaces (BCIs) can be used for communication in writing without muscular activity or for learning to control seizures by voluntary regulation of brain signals such as the electroencephalogram (EEG). Three of five patients with epilepsy were able to spell their names with electrocorticogram (ECoG) signals derived from motor-related areas within only one or two training sessions. Imagery of finger or tongue movements was classified with support-vector classification of autoregressive coefficients derived from the ECoG signals. After training of the classifier, binary classification responses were used to select letters from a computer-generated menu. Offline analysis showed increased theta activity in the unsuccessful patients, whereas the successful patients exhibited dominant sensorimotor rhythms that they could control. The high spatial resolution and increased signal-to-noise ratio in ECoG signals, combined with short training periods, may offer an alternative for communication in complete paralysis, locked-in syndrome, and motor restoration.

Predicting the phenotype of an organism from its genotype is a central question in genetics. Most importantly, we would like to find out if the perturbation of a single gene may be the cause of a disease. However, our current ability to predict the phenotypic effects of perturbations of individual genes is limited. Network models of genes are one tool for tackling this problem. In a recent study (Lee et al.), it was shown that network models covering the majority of genes of an organism can be used for accurately predicting phenotypic effects of gene perturbations in multicellular organisms.

We address two shortcomings of the common spatial patterns (CSP) algorithm for spatial filtering in the context of brain--computer interfaces (BCIs) based on electroencephalography/magnetoencephalography (EEG/MEG): First, the question of optimality of CSP in terms of the minimal achievable classification error remains unsolved. Second, CSP has been initially proposed for two-class paradigms. Extensions to multiclass paradigms have been suggested, but are based on heuristics. We address these shortcomings in the framework of information theoretic feature extraction (ITFE). We show that for two-class paradigms, CSP maximizes an approximation of mutual information of extracted EEG/MEG components and class labels. This establishes a link between CSP and the minimal classification error. For multiclass paradigms, we point out that CSP by joint approximate diagonalization (JAD) is equivalent to independent component analysis (ICA), and provide a method to choose those independent components (ICs) that approximately
maximize mutual information of ICs and class labels. This eliminates the need for heuristics in multiclass CSP, and allows incorporating prior class probabilities. The proposed method is applied to the dataset IIIa of the third BCI competition, and is shown to increase the mean classification accuracy by 23.4% in comparison to multiclass CSP.
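
For reference, standard two-class CSP itself reduces to a generalized eigendecomposition of the class-conditional covariance matrices; a minimal sketch (array conventions and the number of filters are my choices, and this is the classical algorithm rather than the paper's ITFE extension):

```python
import numpy as np
from scipy.linalg import eigh

def csp(trials_a, trials_b, n_filters=3):
    """Common spatial patterns for a two-class paradigm.  Each trials_x has
    shape (n_trials, n_channels, n_samples).  Solves the generalized
    eigenproblem  C_a w = lambda (C_a + C_b) w  and keeps the extreme
    eigenvectors, which discriminate the two classes best."""
    cov = lambda trials: np.mean([x @ x.T / x.shape[1] for x in trials], axis=0)
    C_a, C_b = cov(trials_a), cov(trials_b)
    evals, evecs = eigh(C_a, C_a + C_b)            # ascending eigenvalues
    return np.hstack([evecs[:, :n_filters], evecs[:, -n_filters:]])

def log_var_features(W, trial):
    """Standard CSP features: log-variance of the spatially filtered trial."""
    return np.log(np.var(W.T @ trial, axis=1))
```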

The analysis of extra-cellular neural recordings typically begins with careful spike sorting and all analysis
of the data then rests on the correctness of the resulting spike trains. In many situations this is
unproblematic as experimental and spike sorting procedures often focus on well isolated units. There is
evidence in the literature, however, that errors in spike sorting can occur even with carefully collected
and selected data. Additionally, chronically implanted electrodes and arrays with fixed electrodes cannot
be easily adjusted to provide well isolated units. In these situations, multiple units may be recorded and
the assignment of waveforms to units may be ambiguous. At the same time, analysis of such data may
be both scientifically important and clinically relevant. In this paper we address this issue using a novel
probabilistic model that accounts for several important sources of uncertainty and error in spike sorting.
In lieu of sorting neural data to produce a single best spike train, we estimate a probabilistic model of
spike trains given the observed data. We show how such a distribution over spike sortings can support
standard neuroscientific questions while providing a representation of uncertainty in the analysis. As a
representative illustration of the approach, we analyzed primary motor cortical tuning with respect to
hand movement in data recorded with a chronic multi-electrode array in non-human primates. We found
that the probabilistic analysis generally agrees with human sorters but suggests the presence of tuned
units not detected by humans.

Gene expression maps for model organisms, including Arabidopsis thaliana, have typically been created using gene-centric expression arrays. Here, we describe a comprehensive expression atlas, Arabidopsis thaliana Tiling Array Express (At-TAX), which is based on whole-genome tiling arrays. We demonstrate that tiling arrays are accurate tools for gene expression analysis and identify more than 1,000 unannotated transcribed regions. Visualizations of gene expression estimates, transcribed regions, and tiling probe measurements are accessible online at the At-TAX homepage.

BACKGROUND: The Ambiguous Restraints for Iterative Assignment (ARIA) approach is widely used for NMR structure determination. It is based on simultaneously calculating structures and assigning NOEs through an iterative protocol. The final solution consists of a set of conformers and a list of most probable assignments for the input NOE peak list. RESULTS: ARIA was extended with a series of graphical tools to facilitate a detailed analysis of the intermediate and final results of the ARIA protocol. These additional features provide (i) an interactive contact map, serving as a tool for the analysis of assignments, and (ii) graphical representations of structure quality scores and restraint statistics. The interactive contact map between residues can be clicked to obtain information about the restraints and their contributions. Profiles of quality scores are plotted along the protein sequence, and contact maps provide information of the agreement with the data on a residue pair level. CONCLUSIONS: The graphical tools and outputs described here significantly extend the validation and analysis possibilities of NOE assignments given by ARIA as well as the analysis of the quality of the final structure ensemble. These tools are included in the latest version of ARIA, which is available at http://aria.pasteur.fr. The Web site also contains an installation guide, a user manual and example calculations.

We review machine learning methods employing positive definite kernels. These methods formulate learning and estimation problems in a reproducing kernel Hilbert space (RKHS) of functions defined on the data domain, expanded in terms of a kernel. Working in linear spaces of functions has the benefit of facilitating the construction and analysis of learning algorithms while at the same time allowing large classes of functions. The latter include nonlinear functions as well as functions defined on nonvectorial data.

We propose a highly efficient framework for penalized likelihood kernel methods applied
to multi-class models with a large, structured set of classes. As opposed to many previous
approaches which try to decompose the fitting problem into many smaller ones, we focus
on a Newton optimization of the complete model, making use of model structure and
linear conjugate gradients in order to approximate Newton search directions. Crucially,
our learning method is based entirely on matrix-vector multiplication primitives with the
kernel matrices and their derivatives, allowing straightforward specialization to new kernels,
and focusing code-optimization efforts on these primitives only.
Kernel parameters are learned automatically, by maximizing the cross-validation log
likelihood in a gradient-based way, and predictive probabilities are estimated. We demonstrate
our approach on large scale text classification tasks with hierarchical structure on
thousands of classes, achieving state-of-the-art results in an order of magnitude less time
than previous work.
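
The matvec-only design can be imitated with an implicit linear operator and conjugate gradients. Below, a toy sketch (mine, not the paper's system) solves a regularized kernel system using only matrix-vector products, with a linear kernel standing in for any kernel whose matvec can be computed implicitly:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
n, lam = 2000, 1.0
X = rng.standard_normal((n, 20))

def kernel_matvec(v):
    """Multiply by K + lam*I without forming K; here K = X X^T (linear
    kernel), so the product costs O(n d) instead of O(n^2)."""
    return X @ (X.T @ v) + lam * v

A = LinearOperator((n, n), matvec=kernel_matvec)
b = rng.standard_normal(n)
sol, info = cg(A, b)             # CG touches A only through matvecs
print(info, np.linalg.norm(kernel_matvec(sol) - b))   # info 0: converged
```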

This paper introduces a time- and state-dependent measure of integrated information, φ, which captures the repertoire of causal states available to a system as a whole. Specifically, φ quantifies how much information is generated (uncertainty is reduced) when a system enters a particular state through causal interactions among its elements, above and beyond the information generated independently by its parts. Such mathematical characterization is motivated by the observation that integrated information captures two key phenomenological properties of consciousness: (i) there is a large repertoire of conscious experiences so that, when one particular experience occurs, it generates a large amount of information by ruling out all the others; and (ii) this information is integrated, in that each experience appears as a whole that cannot be decomposed into independent parts. This paper extends previous work on stationary systems and applies integrated information to discrete networks as a function of their dynamics and causal architecture. An analysis of basic examples indicates the following: (i) φ varies depending on the state entered by a network, being higher if active and inactive elements are balanced and lower if the network is inactive or hyperactive. (ii) φ varies for systems with identical or similar surface dynamics depending on the underlying causal architecture, being low for systems that merely copy or replay activity states. (iii) φ varies as a function of network architecture. High φ values can be obtained by architectures that conjoin functional specialization with functional integration. Strictly modular and homogeneous systems cannot generate high φ because the former lack integration, whereas the latter lack information. Feedforward and lattice architectures are capable of generating high φ but are inefficient. (iv) In Hopfield networks, φ is low for attractor states and neutral states, but increases if the networks are optimized to achieve tension between local and global interactions. These basic examples appear to match well against neurobiological evidence concerning the neural substrates of consciousness. More generally, φ appears to be a useful metric to characterize the capacity of any physical system to integrate information.

AbstractBayesian nonparametric models are widely and successfully
used for statistical prediction. While posterior consistency properties are
well studied in quite general settings, results have been proved using abstract
concepts such as metric entropy, and they come with subtle conditions
which are hard to validate and not intuitive when applied to concrete
models. Furthermore, convergence rates are difficult to obtain.
By focussing on the concept of information consistency for Bayesian
Gaussian process (GP) models, consistency results and convergence rates
are obtained via a regret bound on cumulative log loss. These results
depend strongly on the covariance function of the prior process, thereby
giving a novel interpretation to penalization with reproducing kernel
Hilbert space norms and to commonly used covariance function classes
and their parameters. The proof of the main result employs elementary
convexity arguments only. A theorem of Widom is used in order to obtain
precise convergence rates for several covariance functions widely used in
practice.
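
The regret bound has the following flavor (a hedged reconstruction in my notation, after the worst-case online bounds of Kakade, Seeger, and Foster; the paper's exact constants and conditions may differ):

```latex
% Cumulative log-loss regret of the GP predictor against any fixed f in
% the RKHS H_k, for inputs x_1,...,x_n and noise variance sigma^2:
\mathrm{Regret}_n(f) \;\le\;
  \tfrac{1}{2}\,\|f\|_{\mathcal{H}_k}^{2}
  \;+\; \tfrac{1}{2}\,\log\det\!\bigl(I + \sigma^{-2} K_n\bigr),
\qquad K_n = \bigl[k(x_i, x_j)\bigr]_{i,j=1}^{n}.
% The log-det capacity term depends only on the covariance function, and
% Widom's theorem gives its precise growth rate for common kernels.
```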

Consider a Hamiltonian system that consists of a slow subsystem S and a fast subsystem F. The autonomous dynamics of S is driven by an effective Hamiltonian, but its thermodynamics is unexpected. We show that a well-defined thermodynamic arrow of time (second law) emerges for S whenever there is a well-defined causal arrow from S to F and the back-action is negligible. This is because the back-action of F on S is described by a non-globally Hamiltonian Born-Oppenheimer term that violates the Liouville theorem, and makes the second law inapplicable to S. If S and F are mixing, under the causal arrow condition they are described by microcanonical distributions P(S) and P(S|F). Their structure supports a causal inference principle proposed recently in machine learning.

Exemplar theories of categorization depend on similarity for explaining subjects' ability to
generalize to new stimuli. A major criticism of exemplar theories concerns their lack of abstraction
mechanisms and thus, seemingly, generalization ability. Here, we use insights from
machine learning to demonstrate that exemplar models can actually generalize very well. Kernel
methods in machine learning are akin to exemplar models and very successful in real-world
applications. Their generalization performance depends crucially on the chosen similarity measure.
While similarity plays an important role in describing generalization behavior, it is not
the only factor that controls generalization performance. In machine learning, kernel methods
are often combined with regularization techniques to ensure good generalization. These same
techniques are easily incorporated in exemplar models. We show that the Generalized Context
Model (Nosofsky, 1986) and ALCOVE (Kruschke, 1992) are closely related to a statistical
model called kernel logistic regression. We argue that generalization is central to the enterprise
of understanding categorization behavior and suggest how insights from machine learning can
offer some guidance.

Keywords: kernel, similarity, regularization, generalization, categorization.
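
A minimal kernel logistic regression with an exemplar-style exponential similarity makes the correspondence concrete (the toy data, kernel parameters, and step size are my choices, not a reimplementation of GCM or ALCOVE):

```python
import numpy as np

def kernel_logistic_regression(X, y, lam=0.1, gamma=1.0, iters=500, lr=0.3):
    """Kernel logistic regression with a Shepard/GCM-style exponential
    similarity.  The penalty lam * alpha^T K alpha is the regularizer
    that controls generalization, separately from the similarity itself."""
    D = np.abs(X[:, None, :] - X[None, :, :]).sum(-1)   # city-block distance
    K = np.exp(-gamma * D)                              # exponential similarity
    alpha = np.zeros(len(y))
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(K @ alpha)))          # class probabilities
        grad = K @ (p - y) + 2 * lam * (K @ alpha)      # penalized log-loss
        alpha -= lr * grad / len(y)
    return alpha, K

# Toy categorization task: two categories split at zero on one dimension.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(60, 1))
y = (X[:, 0] > 0).astype(float)
alpha, K = kernel_logistic_regression(X, y)
print("training accuracy:", ((K @ alpha > 0) == (y > 0)).mean())
```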

We consider testing statistical hypotheses about densities of signals in deconvolution models, and propose a new approach to this problem. We construct score tests for deconvolution density testing with a known noise density, and efficient score tests for the case of an unknown noise density. The tests are combined with model selection rules that choose reasonable model dimensions automatically from the data. Consistency of the tests is proved.

We present a generalization of thin-plate splines for interpolation and approximation of manifold-valued data, and
demonstrate its usefulness in computer graphics with several applications from different fields. The cornerstone
of our theoretical framework is an energy functional for mappings between two Riemannian manifolds which
is independent of parametrization and respects the geometry of both manifolds. If the manifolds are Euclidean,
the energy functional reduces to the classical thin-plate spline energy. We show how the resulting optimization
problems can be solved efficiently in many cases. Our example applications range from orientation interpolation
and motion planning in animation, through geometric modelling tasks, to color interpolation.
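
For reference, in the Euclidean case the energy functional above reduces to the classical thin-plate spline energy, which for a scalar mapping f : R² → R reads:

```latex
E(f) \;=\; \int_{\mathbb{R}^2}
  \Bigl( f_{xx}^{2} + 2 f_{xy}^{2} + f_{yy}^{2} \Bigr)\, dx\, dy .
```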

Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a nearest set of distances that satisfy the properties of a metric, principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
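
A stripped-down triangle-fixing sketch conveys the idea; the paper's algorithms add Dykstra-style correction terms for exact convergence to the nearest metric, which this bare version (my simplification) omits:

```python
import numpy as np

def triangle_fix(D, iters=100):
    """Repeatedly project a symmetric dissimilarity matrix onto violated
    triangle-inequality constraints d_ij <= d_ik + d_kj (the l2 projection
    spreads each violation equally over the three edges)."""
    D = D.astype(float).copy()
    n = D.shape[0]
    for _ in range(iters):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                for k in range(n):
                    if k == i or k == j:
                        continue
                    v = D[i, j] - D[i, k] - D[k, j]   # violation amount
                    if v > 1e-12:
                        D[i, j] -= v / 3; D[j, i] = D[i, j]
                        D[i, k] += v / 3; D[k, i] = D[i, k]
                        D[k, j] += v / 3; D[j, k] = D[k, j]
                        changed = True
        if not changed:
            break
    return D

# Nonmetric input: d(0,2) = 5 far exceeds d(0,1) + d(1,2) = 2.
D = np.array([[0., 1., 5.], [1., 0., 1.], [5., 1., 0.]])
print(triangle_fix(D))        # all triangle inequalities now hold
```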

The linear model with sparsity-favouring prior on the coefficients has important applications in many different domains. In machine learning, most methods to date search for maximum a posteriori sparse solutions and neglect to represent posterior uncertainties. In this paper, we address problems of Bayesian optimal design (or experiment planning), for which accurate estimates of uncertainty are essential. To this end, we employ expectation propagation approximate inference for the linear model with Laplace prior, giving new insight into numerical stability properties and proposing a robust algorithm. We also show how to estimate model hyperparameters by empirical Bayesian maximisation of the marginal likelihood, and propose ideas in order to scale up the method to very large underdetermined problems.
We demonstrate the versatility of our framework on the application of gene regulatory network identification from micro-array expression data, where both the Laplace prior and the active experimental design approach are shown to result in significant improvements. We also address the problem of sparse coding of natural images, and show how our framework can be used for compressive sensing tasks.

Consistency is a key property of statistical algorithms when the data
is drawn from some underlying probability distribution. Surprisingly,
despite decades of work, little is known about consistency of most
clustering algorithms. In this paper we investigate consistency of
the popular family of spectral clustering algorithms, which clusters
the data with the help of eigenvectors of graph Laplacian matrices. We
develop new methods to establish that for increasing sample size,
those eigenvectors converge to the eigenvectors of certain limit
operators. As a result we can prove that one of the two major classes
of spectral clustering (normalized clustering) converges under very
general conditions, while the other (unnormalized clustering) is only
consistent under strong additional assumptions, which are not always
satisfied in real data. We conclude that our analysis provides strong
evidence for the superiority of normalized spectral clustering.
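
A compact implementation of the normalized variant (Ng-Jordan-Weiss-style embedding; the similarity graph, its bandwidth, and the test data are my choices):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def normalized_spectral_clustering(X, n_clusters=2, sigma=1.0):
    """Embed the data with the bottom eigenvectors of the normalized graph
    Laplacian I - D^{-1/2} W D^{-1/2}, then run k-means in the embedding."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (2 * sigma ** 2))                 # Gaussian similarity
    d = W.sum(axis=1)
    L_sym = np.eye(len(X)) - W / np.sqrt(np.outer(d, d))
    _, U = eigh(L_sym, subset_by_index=[0, n_clusters - 1])
    U /= np.linalg.norm(U, axis=1, keepdims=True)      # row-normalize
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)

# Two concentric rings: the normalized embedding separates them cleanly.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.repeat([1.0, 3.0], 100) + 0.1 * rng.standard_normal(200)
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
print(np.bincount(normalized_spectral_clustering(X, sigma=0.5)))  # ~100/100
```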

Classification of plants according to their echoes is an elementary component of bat behavior that plays an important role in spatial orientation and food acquisition. Vegetation echoes are, however, highly complex stochastic signals: from an acoustical point of view, a plant can be thought of as a three-dimensional array of leaves reflecting the emitted bat call. The received echo is therefore a superposition of many reflections. In this work we suggest that the classification of these echoes might not be such a troublesome routine for bats as formerly thought. We present a rather simple approach to classifying signals from a large database of plant echoes that were created by ensonifying plants with a frequency-modulated bat-like ultrasonic pulse. Our algorithm uses the spectrogram of a single echo from which it only uses features that are undoubtedly accessible to bats. We used a standard machine learning algorithm (SVM) to automatically extract suitable linear combinations of time and frequency cues from
the spectrograms such that classification with high accuracy is enabled. This demonstrates that ultrasonic echoes are highly informative about the species membership of an ensonified plant, and that this information can be extracted with rather simple, biologically plausible analysis. Thus, our findings provide a new explanatory basis for the poorly understood observed abilities of bats in classifying vegetation and other complex objects.

We propose a method to quantify the complexity of conditional probability measures by a Hilbert space seminorm of the logarithm of their densities. The concept of reproducing kernel Hilbert spaces (RKHSs) is a flexible tool to define such a seminorm by choosing an appropriate kernel. We present several examples with artificial data sets where our kernel-based complexity measure is consistent with our intuitive understanding of complexity of densities. The intention behind the complexity measure is to provide a new approach to inferring causal directions. The idea is that the
factorization of the joint probability measure P(effect, cause) into P(effect|cause)P(cause) leads typically to "simpler" and "smoother" terms than the factorization into P(cause|effect)P(effect). Since the conventional constraint-based approach of causal discovery is not able to determine the causal direction between only two variables, our inference principle can in particular be useful when combined with other existing methods. We provide several simple examples with real-world data where the true causal directions indeed lead to simpler (conditional) densities.

In this paper, we suggest a novel reinforcement learning architecture, the Natural
Actor-Critic. The actor updates are achieved using stochastic policy gradients em-
ploying Amaris natural gradient approach, while the critic obtains both the natural
policy gradient and additional parameters of a value function simultaneously by lin-
ear regression. We show that actor improvements with natural policy gradients are
particularly appealing as these are independent of coordinate frame of the chosen
policy representation, and can be estimated more efficiently than regular policy gra-
dients. The critic makes use of a special basis function parameterization motivated
by the policy-gradient compatible function approximation. We show that several
well-known reinforcement learning methods such as the original Actor-Critic and
Bradtkes Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms.
Empirical evaluations illustrate the effectiveness of our techniques in comparison to
previous methods, and also demonstrate their applicability for learning control on
an anthropomorphic robot arm.
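
The core update is Amari's natural gradient, which premultiplies the vanilla policy gradient by the inverse Fisher information of the policy (standard form; notation mine):

```latex
\theta_{t+1} \;=\; \theta_t \;+\; \alpha\, G^{-1}(\theta_t)\,
                 \nabla_{\theta} J(\theta_t),
\qquad
G(\theta) \;=\; \mathbb{E}\!\left[ \nabla_{\theta}\log\pi_{\theta}(a\mid s)\,
                 \nabla_{\theta}\log\pi_{\theta}(a\mid s)^{\top} \right].
% With compatible function approximation, this natural gradient coincides
% with the advantage parameters w obtained by the critic's regression.
```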

International Journal of Mathematics, 19(3):339-367, March 2008 (article)

Abstract

The moduli space of G-bundles on an elliptic curve with additional flag structure admits a Poisson structure. The bivector can be defined using double loop group, loop group and sheaf cohomology constructions. We investigate the links between these methods and, for the case of SL2, perform explicit computations, describing the bracket and its leaves in detail.

We investigated whether it is possible to
infer spike trains solely on the basis of the underlying local field
potentials (LFPs). Using support vector machines and linear regression
models, we found that in the primary visual cortex (V1) of
monkeys, spikes can indeed be inferred from LFPs, at least with
moderate success. Although there is a considerable degree of variation
across electrodes, the low-frequency structure in spike trains (in the
100-ms range) can be inferred with reasonable accuracy, whereas
exact spike positions are not reliably predicted. Two kinds of features
of the LFP are exploited for prediction: the frequency power of bands
in the high gamma-range (40-90 Hz) and information contained in low-frequency
oscillations (<10 Hz), where both phase and power modulations
are informative. Information analysis revealed that both
features code (mainly) independent aspects of the spike-to-LFP relationship,
with the low-frequency LFP phase coding for temporally
clustered spiking activity. Although both features and prediction
quality are similar during seminatural movie stimuli and spontaneous
activity, prediction performance during spontaneous activity degrades
much more slowly with increasing electrode distance. The general
trend of data obtained with anesthetized animals is qualitatively
mirrored in that of a more limited data set recorded in V1 of non-anesthetized
monkeys. In contrast to the cortical field potentials, thalamic LFPs
(e.g., LFPs derived from recordings in the dorsal lateral geniculate
nucleus) hold no useful information for predicting spiking activity.

SUMMARY: The conventional approach to calculating biomolecular structures from nuclear magnetic resonance (NMR) data is often viewed as subjective due to its dependence on rules of thumb for deriving geometric constraints and suitable values for theory parameters from noisy experimental data. As a result, it can be difficult to judge the precision of an NMR structure in an objective manner. The Inferential Structure Determination (ISD) framework, which has been introduced recently, addresses this problem by using Bayesian inference to derive a probability distribution that represents both the unknown structure and its uncertainty. It also determines additional unknowns, such as theory parameters, that normally need to be chosen empirically. Here we give an overview of the ISD software package, which implements this methodology. AVAILABILITY: The program is available at http://www.bioc.cam.ac.uk/isd

Molecular structures are usually calculated from experimental data with some method of energy minimisation or non-linear optimisation. Key aims of a structure calculation are to estimate the coordinate uncertainty, and to provide a meaningful measure of the quality of the fit to the data. We discuss approaches to optimally combine prior information and experimental data and the connection to probability theory. We analyse the appropriate statistics for NOEs and NOE-derived distances, and the related question of restraint potentials. Finally, we will discuss approaches to determine the appropriate weight on the experimental evidence and to obtain in this way an estimate of the data quality from the structure calculation. Whereas objective estimates of coordinates and their uncertainties can only be obtained by a full Bayesian treatment of the problem, standard structure calculation methods continue to play an important role. To obtain the full benefit of these methods, they should be founded on a rigorous Bayesian analysis.

Due to its wide applicability, the problem of semi-supervised classification is attracting increasing attention in machine learning. Semi-Supervised Support Vector Machines (S3VMs) are based on applying the margin maximization principle to both labeled and unlabeled examples. Unlike SVMs, their formulation leads to a non-convex optimization problem. A suite of algorithms has recently been proposed for solving S3VMs. This paper reviews key ideas in this literature. The performance and behavior of various S3VM algorithms are studied together, under a common experimental setting.
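
The objective behind these algorithms extends the hinge-loss margin to unlabeled points (standard S3VM formulation; notation mine):

```latex
% Hinge loss on the l labeled points, symmetrized hinge on the u unlabeled
% points, which pushes the decision function away from unlabeled examples:
\min_{\mathbf{w},\,b}\;\; \tfrac{1}{2}\|\mathbf{w}\|^{2}
 \;+\; C \sum_{i=1}^{l} \max\!\bigl(0,\; 1 - y_i f(\mathbf{x}_i)\bigr)
 \;+\; C^{*}\!\! \sum_{j=l+1}^{l+u} \max\!\bigl(0,\; 1 - |f(\mathbf{x}_j)|\bigr),
\qquad f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b .
% The |f| term makes the problem non-convex, which is what the reviewed
% algorithms tackle in different ways.
```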

Nonnegative matrix approximation (NNMA) is a popular matrix decomposition technique that has proven to be useful across a diverse variety of fields with applications ranging from document analysis and image processing to bioinformatics and signal processing. Over the years, several algorithms for NNMA have been proposed, e.g. Lee and Seung's multiplicative updates, alternating least squares (ALS), and gradient descent-based procedures. However, most of these procedures suffer from either slow convergence, numerical instability, or at worst, serious theoretical drawbacks. In this paper, we develop a new and improved algorithmic framework for the least-squares NNMA problem, which is not only theoretically well-founded, but also overcomes many deficiencies of other methods. Our framework readily admits powerful optimization techniques and as concrete realizations we present implementations based on the Newton, BFGS and conjugate gradient methods. Our algorithms provide numerical results superior to both Lee and Seung's method as well as to the alternating least squares heuristic, which was reported to work well in some situations but has no theoretical guarantees [1]. Our approach extends naturally to include regularization and box-constraints without sacrificing convergence guarantees. We present experimental results on both synthetic and real-world datasets that demonstrate the superiority of our methods, both in terms of better approximations as well as computational efficiency.
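
For contrast, the Lee-Seung multiplicative baseline that the proposed Newton, BFGS, and conjugate-gradient methods improve upon fits in a few lines (a reference sketch, not the paper's algorithms):

```python
import numpy as np

def nmf_multiplicative(V, rank, iters=500, eps=1e-9):
    """Lee-Seung multiplicative updates for least-squares NNMA: simple and
    monotone in ||V - W H||_F^2, but often slow to converge."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.uniform(0.1, 1.0, (m, rank))
    H = rng.uniform(0.1, 1.0, (rank, n))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # nonnegativity is preserved
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).standard_normal((50, 40)))
W, H = nmf_multiplicative(V, rank=5)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))   # relative residual
```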

Residual dipolar couplings provide complementary information to the nuclear Overhauser effect measurements that are traditionally used in biomolecular structure determination by NMR. In a de novo structure determination, however, lack of knowledge about the degree and orientation of molecular alignment complicates the analysis of dipolar coupling data. We present a probabilistic framework for analyzing residual dipolar couplings and demonstrate that it is possible to estimate the atomic coordinates, the complete molecular alignment tensor, and the error of the couplings simultaneously. As a by-product, we also obtain estimates of the uncertainty in the coordinates and the alignment tensor. We show that our approach encompasses existing methods for determining the alignment tensor as special cases, including least squares estimation, histogram fitting, and elimination of an explicit alignment tensor in the restraint energy.

A new technique, Serial Block Face Scanning Electron Microscopy (SBFSEM), allows for automatic
sectioning and imaging of biological tissue with a scanning electron microscope. Image
stacks generated with this technology have a resolution sufficient to distinguish different cellular
compartments, including synaptic structures, which should make it possible to obtain detailed
anatomical knowledge of complete neuronal circuits. Such an image stack contains several thousands
of images and is recorded with a minimal voxel size of 10-20 nm in the x- and y-directions and 30 nm
in the z-direction. Consequently, a tissue block of 1 mm³ (the approximate volume of the Calliphora
vicina brain) will produce several hundred terabytes of data. Therefore, highly automated 3D
reconstruction algorithms are needed. As a first step in this direction we have developed semiautomated
segmentation algorithms for a precise contour tracing of cell membranes. These
algorithms were embedded into an easy-to-operate user interface, which allows direct 3D observation
of the extracted objects during the segmentation of image stacks. Compared to purely
manual tracing, processing time is greatly accelerated.

We propose an extension of Gaussian mixture models from the statistical-mechanical point of view. Conventional Gaussian mixture models are formulated to divide all points in a given data set into classes. We introduce quantum states constructed by superposing conventional classes in linear combinations. Our extension provides a new classification algorithm for data by means of linear-response formulas from statistical mechanics.

It is well known that solutions to the nonlinear independent component analysis (ICA) problem are highly non-unique. In this paper we propose the "minimal nonlinear distortion" (MND) principle for tackling the ill-posedness of nonlinear ICA problems. MND prefers the nonlinear ICA solution with the estimated mixing procedure as close as possible to linear, among all possible solutions. It also helps to avoid local optima in the solutions. To achieve MND, we exploit a regularization term to minimize the mean square error between the nonlinear mixing mapping and the best-fitting linear one. The effect of MND on the inherent trivial and non-trivial indeterminacies in nonlinear ICA solutions is investigated. Moreover, we show that local MND is closely related to the smoothness regularizer penalizing large curvature, which provides another useful regularization condition for nonlinear ICA. Experiments on synthetic data show the usefulness of the MND principle for separating various nonlinear mixtures. Finally, as an application, we use nonlinear ICA with MND to separate daily returns of a set of stocks in Hong Kong, and the linear causal relations among them are successfully discovered. The resulting causal relations give some interesting insights into the stock market. Such a result cannot be achieved by linear ICA. Simulation studies also verify that when doing causality discovery, sometimes one should not ignore the nonlinear distortion in the data generation procedure, even if it is weak.
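
In symbols, the MND regularizer described above penalizes the deviation of the estimated mixing mapping g from its best-fitting linear map (notation mine):

```latex
R_{\mathrm{MND}}(g) \;=\; \min_{A,\,\mathbf{b}}\;
  \mathbb{E}\,\bigl\| g(\mathbf{s}) - (A\mathbf{s} + \mathbf{b}) \bigr\|^{2},
% where s are the recovered sources and (A, b) range over linear maps;
% adding R_MND to the ICA objective prefers solutions whose mixing
% procedure is as close as possible to linear.
```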

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.