Learning by imitation has been shown to be a powerful paradigm for automated learning in autonomous robots. This paper presents a general framework of learning by imitation for stochastic and partially observable systems. The model is a Predictive Policy Representation (PPR) whose goal is to represent the teacher's policies without any reference to states; it is described entirely in terms of actions and observations. We show how this model can efficiently learn the personal behavior and preferences of the user of an assistive robot.

We study the stereo matching problem of reconstructing the locations of 3D points on an unknown surface patch from two calibrated identical cameras, without using any a priori information about the pointwise correspondences. We assume that the camera parameters and the pose between the cameras are known. Our approach follows earlier work for coplanar cameras, where a gradient flow algorithm was proposed to match associated Gramians. Here we extend this method by allowing arbitrary poses for the cameras. We introduce an intrinsic Riemannian Newton algorithm that achieves local quadratic convergence, and we also present a closed-form solution. The efficiency of both algorithms is demonstrated by numerical experiments.

We present a new technique for structured prediction that works in a hybrid generative/discriminative way, using a one-class support vector machine to model the joint probability of (input, output)-pairs in a joint reproducing kernel Hilbert space. Compared to discriminative techniques, like conditional random fields or structured output SVMs, the proposed method has the advantage that its training time depends only on the number of training examples, not on the size of the label space. Due to its generative aspect, it is also very tolerant against ambiguous, incomplete or incorrect labels. Experiments on realistic data show that our method works efficiently and robustly in situations for which discriminative techniques have computational or statistical problems.

While data mining in chemoinformatics has studied graph data with dozens of nodes, systems biology and the Internet are now generating graph data with thousands or even millions of nodes. Data mining therefore faces the algorithmic challenge of coping with this significant increase in graph size: classic algorithms for data analysis are often too expensive and too slow on large graphs. While one strategy to overcome this problem is to design novel efficient algorithms, the other is to 'reduce' the size of the large graph by sampling. This is the scope of this paper: we present novel Metropolis algorithms for sampling a 'representative' small subgraph from the original large graph, where 'representative' means that the sample preserves crucial graph properties of the original graph. In our experiments, we improve over the pioneering work of Leskovec and Faloutsos (KDD 2006) by producing representative subgraph samples that are both smaller and of higher quality than those produced by other methods from the literature.
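The Metropolis idea can be sketched in a few lines: walk over k-node subsets, proposing to swap one node in the sample for one outside it, and accept the swap according to how well the induced subgraph preserves a chosen graph statistic. The sketch below is illustrative only: it matches mean degree rather than the richer property sets discussed in the paper, and the `temp` parameter and swap proposal are our own assumptions.

```python
import math
import random

def induced_degrees(adj, nodes):
    """Degrees of `nodes` within the subgraph they induce."""
    s = set(nodes)
    return [sum(1 for v in adj[u] if v in s) for u in nodes]

def metropolis_sample(adj, k, steps=2000, temp=0.1, seed=0):
    """Metropolis chain over k-node subsets of the graph `adj`
    (dict: node -> neighbor list). The 'energy' is the gap between
    the sample's mean induced degree and the full graph's mean degree,
    a stand-in for richer representativeness criteria."""
    rng = random.Random(seed)
    nodes = list(adj)
    target = sum(len(v) for v in adj.values()) / len(adj)
    state = rng.sample(nodes, k)

    def energy(s):
        ds = induced_degrees(adj, s)
        return abs(sum(ds) / k - target)

    e = energy(state)
    for _ in range(steps):
        out = rng.randrange(k)                       # node to evict
        cand = rng.choice([n for n in nodes if n not in state])
        prop = state[:]
        prop[out] = cand
        e2 = energy(prop)
        # Metropolis acceptance: always accept improvements,
        # accept worsenings with probability exp(-(e2 - e) / temp).
        if e2 <= e or rng.random() < math.exp((e - e2) / temp):
            state, e = prop, e2
    return set(state)
```

On a ring graph, for instance, the chain tends toward samples whose induced mean degree is close to 2, the mean degree of the ring.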

Discovery of knowledge from geometric graph databases is of particular importance in chemistry and biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In such applications, scientists are not interested in the statistics of the whole database; instead they need information about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph which are frequent geometric $\epsilon$-subgraphs under the entire class of rigid geometric transformations in a database. By using geometric $\epsilon$-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small minimum support.

In this paper we present new algorithms for non-negative matrix approximation (NMA), commonly known as the NMF problem. Our methods improve upon the well-known methods of Lee & Seung~\cite{lee00} for both the Frobenius norm and the Kullback-Leibler divergence versions of the problem. For the latter problem, our results are especially interesting because it seems to have witnessed much less algorithmic progress than the Frobenius norm NMA problem. Our algorithms are based on a particular \textbf{block-iterative} acceleration technique for EM, which preserves the multiplicative nature of the updates and also ensures monotonicity. Furthermore, our algorithms also naturally apply to the Bregman-divergence NMA algorithms of~\cite{suv.nips}. Experimentally, we show that our algorithms outperform the traditional Lee/Seung approach most of the time.
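For reference, the Lee & Seung baseline that the paper improves upon is the pair of multiplicative updates for the Frobenius-norm objective min ||V - WH||_F with W, H nonnegative; each update is monotone non-increasing in the objective. A minimal sketch (the small `eps` guard against division by zero is our own addition):

```python
import numpy as np

def nmf_frobenius(V, r, iters=200, eps=1e-9, seed=0):
    """Lee & Seung multiplicative updates for the Frobenius-norm NMF:
        H <- H * (W^T V) / (W^T W H)
        W <- W * (V H^T) / (W H H^T)
    Nonnegativity is preserved because every factor is nonnegative."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The multiplicative form is exactly what the block-iterative acceleration in the paper sets out to preserve while speeding up convergence.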

Time-series segmentation in the fully unsupervised scenario, in which the number of segment types is a priori unknown, is a fundamental problem in many applications. We propose a Bayesian approach to a segmentation model based on the switching linear Gaussian state-space model that enforces a sparse parametrization, so that only a small number of the a priori available dynamics is used to explain the data. This enables us to estimate the number of segment types within the model, in contrast to previous non-Bayesian approaches, where training and comparing several separate models was required. As the resulting model is computationally intractable, we introduce a variational approximation in which a reformulation of the problem enables the use of efficient inference algorithms.

In this paper we build upon the Multiple Kernel Learning (MKL) framework
and in particular on [1] which generalized it to infinitely many
kernels. We rewrite the problem in the standard MKL formulation which
leads to a Semi-Infinite Program. We devise a new algorithm to solve it
(Infinite Kernel Learning, IKL). The IKL algorithm is applicable to both
the finite and infinite case and we find it to be faster and more stable
than SimpleMKL [2]. Furthermore we present the first large scale
comparison of SVMs to MKL on a variety of benchmark datasets, also
comparing IKL. The results show two things: a) for many datasets there
is no benefit in using MKL/IKL instead of the SVM classifier, thus the
flexibility of using more than one kernel seems to be of no use, b) on
some datasets IKL yields massive increases in accuracy over SVM/MKL due
to the possibility of using a largely increased kernel set. For those
cases parameter selection through Cross-Validation or MKL is not applicable.

High dimensionality of the belief space in partially observable Markov decision processes (POMDPs) is one of the major causes that severely restrict the applicability of this model. Previous studies have demonstrated that the dimensionality of a POMDP can be reduced by transforming it into an equivalent predictive state representation (PSR). In this paper, we address the problem of finding an approximate and compact PSR model corresponding to a given POMDP model. We formulate this problem in an optimization framework. Our algorithm aims to minimize the potential error that missing some core tests may cause. We also present an empirical evaluation on benchmark problems, illustrating the performance of this approach.

Graph mining methods enumerate frequent subgraphs efficiently, but these are not necessarily good features for machine learning due to high correlation among them. Thus it makes sense to perform principal component analysis to reduce the dimensionality and create decorrelated features. We present a novel iterative mining algorithm that captures informative patterns corresponding to major entries of the top principal components. It repeatedly calls weighted substructure mining, where the example weights are updated in each iteration. The Lanczos algorithm, a standard algorithm for eigendecomposition, is employed to update the weights. In experiments, our patterns are shown to approximate the principal components obtained by frequent mining.

We provide a novel framework for very fast model-based reinforcement learning in continuous state and action spaces. The framework requires probabilistic models that explicitly characterize their levels of confidence. Within this framework, we use flexible, non-parametric models to describe the world based on previously collected experience. We demonstrate learning on the cart-pole problem in a setting where we provide very limited prior knowledge about the task. Learning progresses rapidly, and a good policy is found after only a handful of iterations.

Policy learning approaches are among the best suited methods for high-dimensional, continuous control systems such as anthropomorphic robot arms and humanoid robots. In this paper, we make two contributions: first, we present a unified perspective which allows us to derive several policy learning algorithms from a common point of view, i.e., policy gradient algorithms, natural-gradient algorithms and EM-like policy learning. Second, we present several applications to both robot motor primitive learning and robot control in task space. Results from simulation and several different real robots are shown.

In International Conference on Control, Automation and Systems (ICCAS), pages 1284-1289, IEEE, Piscataway, NJ, USA, October 2008 (inproceedings)

There is a trade-off between stability and performance in haptic control systems. In this paper, a stability and performance analysis is presented for a scaled teleoperation system, in an effort to increase the performance of the system while maintaining stability. Stability is quantitatively defined as a metric using Llewellyn's absolute stability criterion. Position tracking and kinesthetic perception are used as the performance indices. The analysis is carried out using various scaling factors and impedances of the human and the environment. A two-channel position-position (PP) controller and a two-channel force-position (FP) controller are applied for the analysis and simulation.

Sliding window classifiers are among the most successful and widely applied techniques for object localization. However, training is typically done in a way that is not specific to the localization task. First a binary classifier is trained using a sample of positive and negative examples, and this classifier is subsequently applied to multiple regions within test images. We propose instead to treat object localization in a principled way by posing it as a problem of predicting structured data: we model the problem not as binary classification, but as the prediction of the bounding box of objects located in images. The use of a joint-kernel framework allows us to formulate the training procedure as a generalization of an SVM, which can be solved efficiently. We further improve computational efficiency by using a branch-and-bound strategy for localization during both training and testing. Experimental evaluation on the PASCAL VOC and TU Darmstadt datasets shows that the structured training procedure improves performance over binary training as well as over the best previously published scores.

We aim to automatically colorize greyscale images, without any manual intervention. The color proposition can then be interactively corrected by user-provided color landmarks if necessary. Automatic colorization is nontrivial since there is usually no one-to-one correspondence between color and local texture. The contribution of our framework is that we deal directly with multimodality and estimate, for each pixel of the image to be colored, the probability distribution of all possible colors, instead of choosing the most probable color at the local level. We also predict the expected variation of color at each pixel, thus defining a nonuniform spatial coherency criterion. We then use graph cuts to maximize the probability of the whole colored image at the global level. We work in the L-a-b color space in order to approximate the human perception of distances between colors, and we use machine learning tools to extract as much information as possible from a dataset of colored examples. The resulting algorithm is fast, designed to be robust to texture noise, and is above all able to deal with ambiguity, in contrast to previous approaches.

Three simple and explicit procedures for testing the independence of two multi-dimensional random variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based independence measure. All tests reject the null hypothesis of independence if the test statistics become large. The large deviation and limit distribution properties of all three test statistics are given. From these results, distribution-free strongly consistent tests of independence are derived, as are asymptotically alpha-level tests. The performance of the tests is evaluated experimentally on benchmark data.
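The first, partition-based statistic can be sketched very compactly: restrict the variables to a finite partition (here, the observed discrete cells) and compute the L1 distance between the empirical joint distribution and the product of the empirical marginals; the test rejects independence when this distance is large. The threshold choice and partition refinement are omitted from this sketch.

```python
from collections import Counter

def l1_independence(xs, ys):
    """Empirical L1 distance between the joint distribution of (xs, ys)
    and the product of their marginals, over the observed partition.
    Equals 0 iff the empirical variables are exactly independent."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    cells = {(a, b) for a in px for b in py}
    return sum(abs(joint.get(c, 0) / n - px[c[0]] * py[c[1]] / n ** 2)
               for c in cells)
```

With a product design (each x-cell paired equally often with each y-cell) the statistic is exactly zero, while a deterministic relation such as ys = xs drives it toward its maximum.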

This paper presents a fully automated algorithm for reconstructing
a textured 3D model of a face from a single
photograph or a raw video stream. The algorithm is based
on a combination of Support Vector Machines (SVMs) and
a Morphable Model of 3D faces. After SVM face detection,
individual facial features are detected using a novel
regression- and classification-based approach, and probabilistically
plausible configurations of features are selected
to produce a list of candidates for several facial feature positions.
In the next step, the configurations of feature points
are evaluated using a novel criterion that is based on a
Morphable Model and a combination of linear projections.
To make the algorithm robust with respect to head orientation,
this process is iterated while the estimate of pose is
refined. Finally, the feature points initialize a model-fitting
procedure of the Morphable Model. The result is a highresolution
3D surface model.

We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.

We study a pattern classification algorithm which has recently been proposed by Vapnik and coworkers. It builds on a new inductive principle which assumes that in addition to positive and negative data, a third class of data is available, termed the Universum. We assay the behavior of the algorithm by establishing links with Fisher discriminant analysis and oriented PCA, as well as with an SVM in a
projected subspace (or, equivalently, with a data-dependent reduced kernel). We also provide experimental results.

This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite (c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm for arbitrary c.p.d. kernels. For the thin-plate kernel this leads to a classifier with only one parameter (the amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even though the Gaussian involves a second parameter (the length scale).

It has been a long-standing goal in the adaptive control community to reduce the generically difficult, general reinforcement learning (RL) problem to simpler problems solvable by supervised learning. While this approach is today's standard for value function-based methods, fewer approaches are known that apply similar reductions to policy search methods. Recently, it has been shown that immediate RL problems can be solved by reward-weighted regression, and that the resulting algorithm is an expectation maximization (EM) algorithm with strong guarantees. In this paper, we extend this algorithm to the episodic case and show that it can be used in the context of LSTM recurrent neural networks (RNNs). The resulting RNN training algorithm is equivalent to a weighted self-modeling supervised learning technique. We focus on partially observable Markov decision problems (POMDPs) where it is essential that the policy is nonstationary in order to be optimal. We show that this new reward-weighted logistic regression, used in conjunction with an RNN architecture, can solve standard benchmark POMDPs with ease.

High dimensionality of the belief space in DEC-POMDPs is one of the major causes that make the optimal joint policy computation intractable. The belief state for a given agent is a probability distribution over the system states and the policies of other agents. Belief compression is an efficient POMDP approach that speeds up planning algorithms by projecting the belief state space to a low-dimensional one. In this paper, we introduce a new method for solving DEC-POMDP problems, based on the compression of the policy belief space. The reduced policy space contains sequences of actions and observations that are linearly independent. We tested our approach on two benchmark problems, and the preliminary results confirm that the Dynamic Programming algorithm scales up better when the policy belief is compressed.

High performance and compliant robot control require accurate dynamics models which cannot be obtained analytically for sufficiently complex robot systems. In such cases, machine learning offers a promising alternative for approximating the robot dynamics using measured data. This approach offers a natural framework to incorporate unknown nonlinearities as well as to continually adapt online to changes in the robot dynamics. However, the most accurate regression methods, e.g. Gaussian process regression (GPR) and support vector regression (SVR), suffer from exceptionally high computational complexity, which has so far prevented their use with large numbers of samples or for online learning. Inspired by locally linear regression techniques, we propose an approximation to standard GPR using local Gaussian process models. Due to the reduced computational cost, local Gaussian processes (LGP) can be applied to larger sample sizes and to online learning. Comparisons with other nonparametric regression methods, e.g. standard GPR, nu-SVR and locally weighted projection regression (LWPR), show that LGP achieves accuracy higher than LWPR and close to that of standard GPR and nu-SVR, while being sufficiently fast for online learning.
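The local models in LGP are ordinary Gaussian process regressors fitted to subsets of the data. A minimal sketch of that building block (full GPR with an RBF kernel) is below; LGP's computational gain comes from running this cubic-cost solve on many small local sets, assigned by proximity to local centers, rather than once on all n samples. The kernel choice and hyperparameters here are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    """Squared-exponential (RBF) kernel matrix between row-sets A and B."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d / ell ** 2)

def gp_predict(X, y, Xs, ell=1.0, noise=1e-4):
    """GP posterior mean at test inputs Xs:
        mean = K(Xs, X) (K(X, X) + noise*I)^{-1} y
    The O(n^3) cost of this solve is what LGP keeps small by using
    local data sets instead of the full training set."""
    K = rbf(X, X, ell) + noise * np.eye(len(X))
    alpha = np.linalg.solve(K, y)
    return rbf(Xs, X, ell) @ alpha
```

In an LGP-style scheme, each training point would be routed to the nearest local model and this predictor evaluated per local set, with predictions blended by kernel distance to the centers.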

Maximum variance unfolding (MVU) is an effective heuristic for dimensionality reduction. It produces a low-dimensional representation of the data by maximizing the variance of their embeddings while preserving the local distances of the original data. We show that MVU also optimizes a statistical dependence measure which aims to retain the identity of individual observations under the distance-preserving constraints. This general view allows us to design "colored" variants of MVU, which produce low-dimensional representations for a given task, e.g. subject to class labels or other side information.

A straightforward nonlinear extension of Granger's concept of causality in the kernel framework is suggested. The kernel-based approach to assessing nonlinear Granger causality in multivariate time series enables us to determine, in a model-free way, whether the causal relation between two time series is present or not and whether it is direct or mediated by other processes. The trace norm of the so-called covariance operator in feature space is used to measure the prediction error. Relying on this measure, we test the improvement of predictability between time series by subsampling-based multiple testing. The distributional properties of the resulting p-values reveal the direction of Granger causality. Experiments with simulated and real-world data show that our method provides encouraging results.

We discuss the problem of policy representation in stochastic and partially observable systems, and address the case where the policy is a hidden parameter of the planning problem. We propose an adaptation of Predictive State Representations (PSRs) to this problem by introducing tests (sequences of actions and observations) on policies. The new model, called Predictive Policy Representations (PPRs), is potentially more compact than other representations, such as decision trees or Finite-State Controllers (FSCs). In this paper, we show how PPRs can be used to improve the performance of a point-based algorithm for DEC-POMDPs.

Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model that makes this type of analysis feasible for very high-dimensional data: the model parameters can be derived in closed form and sampling is easy. We demonstrate its usefulness by studying a simple neural representation model of natural images. For the first time, we are able to directly compare predictions from a pairwise maximum entropy model not only in small groups of neurons, but also in larger populations of more than a thousand units. Our results indicate that in such larger networks interactions exist that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics extremely well up to the limit of dimensionality where estimation of the full joint distribution is feasible.

Dynamic system-based motor primitives [1] have enabled robots to learn complex tasks ranging from tennis swings to locomotion. However, to date there have been only a few extensions which have incorporated perceptual coupling to variables of external focus, and these modifications have relied upon handcrafted solutions. Humans learn how to couple their movement primitives with external variables; clearly, such a solution is needed in robotics. In this paper, we propose an augmented version of the dynamic systems motor primitives which incorporates perceptual coupling to an external variable. The resulting perceptually driven motor primitives include the previous primitives as a special case and can inherit some of their interesting properties. We show that these motor primitives can perform complex tasks such as the Ball-in-a-Cup (Kendama) task even with large variances in the initial conditions, where a skilled human player would be challenged. To do so, we initialize the motor primitives in the traditional way by imitation learning without perceptual coupling. Subsequently, we improve the motor primitives using a novel reinforcement learning method which is particularly well-suited for motor primitives.

Stimulus selectivity of sensory neurons is often characterized by estimating their receptive field properties such as orientation selectivity. Receptive fields are usually derived from the mean (or covariance) of the spike-triggered stimulus ensemble. This approach treats each spike as an independent message but does not take into account that information might be conveyed through patterns of neural activity that are distributed across space or time. Can we find a concise description for the processing of a whole population of neurons analogous to the receptive field for single neurons? Here, we present a generalization of the linear receptive field which is not bound to be triggered on individual spikes but can be meaningfully
linked to distributed response patterns. More precisely, we seek to identify those stimulus features and the corresponding patterns of neural activity that are most
reliably coupled. We use an extension of reverse-correlation methods based on canonical correlation analysis. The resulting population receptive fields span the
subspace of stimuli that is most informative about the population response. We evaluate our approach using both neuronal models and multi-electrode recordings from rabbit retinal ganglion cells. We show how the model can be extended to capture nonlinear stimulus-response relationships using kernel canonical correlation analysis, which makes it possible to test different coding mechanisms. Our technique can also be used to calculate receptive fields from multi-dimensional neural measurements such as those obtained from dynamic imaging methods.

Protein subcellular localization is a crucial ingredient to many important
inferences about cellular processes, including prediction of protein function
and protein interactions. While many predictive computational tools have been
proposed, they tend to have complicated architectures and require many design
decisions from the developer.
Here we utilize the multiclass support vector machine (m-SVM) method to directly
solve protein subcellular localization without resorting to the common approach
of splitting the problem into several binary classification problems. We
further propose a general class of protein sequence kernels which considers all
motifs, including motifs with gaps. Instead of heuristically selecting one or a few
kernels from this family, we utilize a recent extension of SVMs that optimizes
over multiple kernels simultaneously. This way, we automatically search over
families of possible amino acid motifs.
We compare our automated approach to three other predictors on four different
datasets, and show that we perform better than the current state of the art. Further, our method provides some insights as to which sequence motifs are most useful for determining subcellular localization, which are in agreement with biological reasoning.

We present a theoretical study of the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant and new insights into the nature of this subspace selection procedure. Based on this equivalence relationship, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present a nonlinear extension of DisKmeans using kernels. We show that the learning of the kernel matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets.
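The equivalence result reduces the iterative procedure to kernel K-means with a particular Gram matrix. A minimal kernel K-means sketch is below, using the identity ||phi(x_i) - mu_c||^2 = K_ii - (2/|C|) sum_{j in C} K_ij + (1/|C|^2) sum_{j,l in C} K_jl; note that the Gram matrix here is generic, not the specific DisKmeans matrix, and the initialization is left to the caller.

```python
import numpy as np

def kernel_kmeans(K, k, init, iters=50):
    """Kernel K-means on a precomputed Gram matrix K (n x n).
    `init` is an initial label vector in {0, ..., k-1}^n."""
    n = len(K)
    labels = np.asarray(init).copy()
    for _ in range(iters):
        D = np.full((n, k), np.inf)
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if len(idx) == 0:
                continue
            # Squared feature-space distance of every point to centroid c.
            D[:, c] = (np.diag(K)
                       - 2 * K[:, idx].mean(1)
                       + K[np.ix_(idx, idx)].mean())
        new = D.argmin(1)
        if (new == labels).all():
            break
        labels = new
    return labels
```

With a linear kernel K = X X^T this recovers ordinary K-means, which makes the two-blob sanity check below easy to reason about.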

Generalized linear models are the most commonly used tools to describe the stimulus selectivity of sensory neurons. Here we present a Bayesian treatment of such models. Using the expectation propagation algorithm, we are able to approximate the full posterior distribution over all weights. In addition, we use a Laplacian prior to favor sparse solutions. Therefore, stimulus features that do not critically influence neural activity will be assigned zero weights and thus be effectively excluded by the model. This feature selection mechanism facilitates both the interpretation of the neuron model as well as its predictive abilities. The posterior distribution can be used to obtain confidence intervals which makes it possible to assess the statistical significance of the solution. In neural data analysis, the available amount of experimental measurements is often limited whereas the parameter space is large. In such a situation, both regularization by a sparsity prior and uncertainty estimates for the model parameters are essential.
We apply our method to multi-electrode recordings of retinal ganglion cells and use our uncertainty estimate to test the statistical significance of functional couplings between neurons. Furthermore, we use the sparsity of the Laplace prior to select those filters from a spike-triggered covariance analysis that are most informative about the neural response.

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. For several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite difference methods and population based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.

We present an independence-based method for learning Bayesian network (BN) structure without making any assumptions on the probability distribution of the domain. This is mainly useful for continuous domains. Even mixed continuous-categorical domains and structures containing vectorial variables can be handled. We address the problem by developing a non-parametric conditional independence test based on the so-called kernel dependence measure, which can be readily used by any existing independence-based BN structure learning algorithm. We demonstrate the structure learning of graphical models in continuous and mixed domains from real-world data without distributional assumptions. We also experimentally show that our test is a good alternative, in particular in case of small sample sizes, compared to existing tests, which can only be used in purely categorical or continuous domains.

Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal. As an alternative, we suggest the paradigm of "nearest neighbor clustering". Instead of selecting the best out of all partitions of the sample, it only considers partitions in some restricted function class. Using tools from statistical learning theory we prove that nearest neighbor clustering is statistically consistent. Moreover, its worst case complexity is polynomial by construction, and it can be implemented with small average case complexity using branch and bound.
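The restriction to a small function class can be illustrated by a brute-force variant: only partitions induced by labeling a handful of seed points are considered, with every other point inheriting the label of its nearest seed. This is a sketch under our own simplifications (the paper's branch-and-bound implementation replaces the exhaustive loop, and the quality function here is just an example):

```python
import numpy as np
from itertools import product

def wcss(X, labels):
    """Within-cluster sum of squares, used as an example quality function."""
    return sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
               for c in np.unique(labels))

def nn_clustering(X, seeds_idx, k, quality=wcss):
    """Search only over partitions induced by labeling the seed points;
    each remaining point takes the label of its nearest seed. Exhaustive
    over the k^len(seeds_idx) seed labelings (illustrative sketch only)."""
    seeds = X[seeds_idx]
    # nearest seed for every data point
    nearest = np.argmin(
        np.linalg.norm(X[:, None, :] - seeds[None, :, :], axis=2), axis=1)
    best_q, best_labels = np.inf, None
    for seed_labels in product(range(k), repeat=len(seeds_idx)):
        labels = np.asarray(seed_labels)[nearest]
        q = quality(X, labels)
        if q < best_q:
            best_q, best_labels = q, labels
    return best_labels
```

Because only seed labelings are enumerated, the number of candidate partitions is fixed in advance, which is what makes the worst-case complexity polynomial in the sample size.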

Whereas kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table-based tests. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
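The biased empirical HSIC statistic with Gaussian kernels can be computed in O(m^2) as below. This sketch shows only the statistic itself, not the calibration of the null distribution that the actual test requires; the kernel bandwidth is an assumption:

```python
import numpy as np

def hsic_statistic(X, Y, sigma=1.0):
    """Biased empirical Hilbert-Schmidt independence criterion with
    Gaussian kernels: trace(K H L H) / m^2, where H centers the Gram
    matrices. Cost is O(m^2) in the sample size m. Sketch only."""
    m = X.shape[0]
    def gram(Z):
        sq = np.sum(Z ** 2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
        return np.exp(-d2 / (2.0 * sigma ** 2))
    K, L = gram(X), gram(Y)
    H = np.eye(m) - np.ones((m, m)) / m   # centering matrix
    return np.trace(K @ H @ L @ H) / m ** 2
```

For independent samples the statistic is small (of order 1/m) while strong dependence inflates it; a test then compares the observed value against a threshold from the null distribution.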

We present Fitness Expectation Maximization (FEM), a novel method for black box function optimization. FEM searches the fitness landscape of an objective function using an instantiation of the well-known Expectation Maximization algorithm, producing search points so as to match a sample distribution weighted according to higher expected fitness. FEM updates both the candidate solution parameters and the search policy, which is represented as a multinormal distribution. Inheriting EM's stability and strong guarantees, the method is both elegant and competitive with some of the best heuristic search methods in the field, and performs well on a number of unimodal and multimodal benchmark tasks. To illustrate the potential practical applications of the approach, we also show experiments on finding the parameters of a controller for the challenging non-Markovian double pole balancing task.
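One EM-style update of a multinormal search distribution can be sketched as follows: sample, weight each point by its (shifted) fitness, and refit the mean and covariance to the weighted sample. The fitness shaping used here (subtracting the worst sample) is our own illustrative choice, not the paper's scheme:

```python
import numpy as np

def fem_step(f, mu, cov, n=60, rng=None):
    """One fitness-weighted EM update of a multinormal search
    distribution N(mu, cov). Illustrative sketch only."""
    rng = np.random.default_rng(rng)
    xs = rng.multivariate_normal(mu, cov, size=n)
    fs = np.array([f(x) for x in xs])
    w = fs - fs.min()                 # nonnegative weights, best point largest
    if w.sum() == 0.0:                # flat fitness: fall back to uniform
        w = np.ones(n)
    w = w / w.sum()
    new_mu = w @ xs                   # M-step: weighted maximum likelihood
    diff = xs - new_mu
    new_cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(len(mu))
    return new_mu, new_cov
```

Iterating this step concentrates the search distribution on high-fitness regions; the small diagonal term keeps the refit covariance positive definite.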

Attributed graphs are increasingly more common in many application
domains such as chemistry, biology and text processing.
A central issue in graph mining is how to collect informative subgraph
patterns for a given learning task.
We propose an iterative mining method based on
partial least squares regression (PLS).
To apply PLS to graph data, a sparse version of PLS is developed first
and then it is combined with a weighted pattern mining algorithm.
The mining algorithm is iteratively called with different weight
vectors, creating one latent component per mining call.
Our method, graph PLS, is efficient and easy to implement, because the
weight vector is updated with elementary matrix calculations.
In experiments, our graph PLS algorithm showed
competitive prediction accuracies on many chemical datasets, and its
efficiency was significantly superior to graph boosting (gboost) and to a
naive method based on frequent graph mining.

In International Conference of the IEEE Engineering in Medicine and Biology Society, pages: 1939-1942, IEEE, Piscataway, NJ, USA, 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), August 2008 (inproceedings)

Abstract

A new control scheme is developed in this paper for a bilateral teleoperation system for microsurgical applications. The main objective of the proposed control scheme is to enhance the kinesthetic perception of the operator. First, kinesthetic perception, based on psychophysics, is classified into three metrics: detection, sensitivity of detection, and discrimination. Additionally, a new performance index is introduced as a combination of these three metrics to quantify kinesthetic performance. Second, a modified macro-micro bilateral control system using an impedance-shaping method is proposed. The proposed controller can increase kinesthetic perception by shaping and magnifying the impedance transmitted to the operator. Finally, the performance of the proposed controller is verified in comparison with the two-channel position-position (PP) controller, the two-channel force-position (FP) controller, and the four-channel transparency-optimized controller.

Kernel canonical correlation analysis (KCCA) is a dimensionality
reduction technique for paired data. By finding directions that
maximize correlation, KCCA learns representations that are more closely
tied to the underlying semantics of the data rather than noise. However,
meaningful directions are not only those that have high correlation to another
modality, but also those that capture the manifold structure of the
data. We propose a method that is simultaneously able to find highly
correlated directions that are also located on high variance directions
along the data manifold. This is achieved by the use of semi-supervised
Laplacian regularization of KCCA. We show experimentally that Laplacian
regularized training improves class separation over KCCA with only
Tikhonov regularization, while causing no degradation in the correlation
between modalities. We propose a model selection criterion based on
the Hilbert-Schmidt norm of the semi-supervised Laplacian regularized
cross-covariance operator, which we compute in closed form.

Accurate models of the robot dynamics allow the design of significantly more precise, energy-efficient and compliant computed torque control for robots. However, in some cases the accuracy of rigid-body models does not suffice for sound control performance due to unmodeled nonlinearities such as hydraulic cables, complex friction, or actuator dynamics. In such cases, learning the models from data poses an interesting alternative, and estimating the dynamics model using regression techniques becomes an important problem. However, the most accurate regression methods, e.g. Gaussian process regression (GPR) and support vector regression (SVR), suffer from exceptionally high computational complexity, which to date prevents their use for large numbers of samples or for online learning. We propose an approximation to standard GPR using local Gaussian process models. Due to the reduced computational cost, local Gaussian processes (LGP) are capable of online learning. Comparisons with other nonparametric regressions, e.g. standard GPR, SVR and locally weighted projection regression (LWPR), show that LGP achieves higher accuracy than LWPR, comes close to the performance of standard GPR and SVR, and is sufficiently fast for online learning.
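The local-model idea can be sketched in a few lines: training data are split among local models by nearest center, each model solves only its own small kernel system, and a query is answered by the model nearest to it. This is an illustrative simplification; the actual LGP method also handles incremental insertion of points and blends the predictions of several nearby local models:

```python
import numpy as np

def fit_local_gps(X, y, centers, sigma_f=1.0, ell=1.0, noise=1e-2):
    """Fit one exact GP per local region (nearest-center partition) with a
    squared-exponential kernel; returns a predictor that queries only the
    nearest local model. Sketch only; hyperparameters are assumptions."""
    def k(A, B):  # squared-exponential (Gaussian) kernel
        d2 = (np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :]
              - 2.0 * A @ B.T)
        return sigma_f ** 2 * np.exp(-0.5 * d2 / ell ** 2)
    assign = np.argmin(
        np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), axis=1)
    models = []
    for c in range(len(centers)):
        Xc, yc = X[assign == c], y[assign == c]
        K = k(Xc, Xc) + noise * np.eye(len(Xc))
        models.append((Xc, np.linalg.solve(K, yc)))  # precompute K^-1 y
    def predict(x):
        c = np.argmin(np.linalg.norm(centers - x, axis=1))
        Xc, alpha = models[c]
        return float(k(x[None, :], Xc) @ alpha)
    return predict
```

Because each local Gram matrix stays small, fitting and prediction avoid the cubic cost of a single global GP on all samples, which is what makes online operation feasible.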

In International Conference of the IEEE Engineering in Medicine and Biology Society, pages: 3241-3244, IEEE, Piscataway, NJ, USA, 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), August 2008 (inproceedings)

Abstract

Success of telesurgical operations depends on the position tracking ability of the slave device. Improved position tracking of the slave device can lead to safer and less strenuous telesurgical operations. The two-channel force-position control architecture is widely used for its good position tracking ability. This architecture requires force sensors for direct force feedback. Force sensors may not be a good choice in the telesurgical environment because of their inherent noise and the limitations on where they can be placed. Hence, environment force estimation is developed using the concept of the robot function parameter matrix and a recursive least squares method. Simulation results show the efficacy of the proposed method. The slave device successfully tracks the position of the master device, and the estimation error quickly becomes negligible.

This paper presents a novel protocol for the accuracy assessment of thematic maps obtained by the classification of very high resolution images. As the thematic accuracy alone is not sufficient to adequately characterize the geometrical properties of classification maps, we propose a novel protocol that is based on the analysis of two families of indexes: (i) the traditional thematic accuracy indexes, and (ii) a set of geometric indexes that characterize different geometric properties of the objects recognized in the map. These indexes can be used in the training phase of a classifier for identifying the parameter values that optimize classification results on the basis of a multi-objective criterion. Experimental results obtained on Quickbird images show the effectiveness of the proposed protocol in selecting classification maps characterized by a better tradeoff between thematic and geometric accuracy with respect to standard accuracy measures.

A recent trend in exemplar based unsupervised
learning is to formulate the learning
problem as a convex optimization problem.
Convexity is achieved by restricting the set
of possible prototypes to training exemplars.
In particular, this has been done for clustering,
vector quantization and mixture model
density estimation. In this paper we propose
a novel algorithm that is theoretically and
practically superior to these convex formulations.
This is possible by posing the unsupervised
learning problem as a single convex
"master problem" with non-convex subproblems.
We show that for the above learning
tasks the subproblems are extremely well-behaved
and can be solved efficiently.

This paper presents a novel approach to feature selection for the classification of hyperspectral images. The proposed approach aims at selecting a subset of the original set of features that exhibits two main properties: (i) high capability to discriminate among the considered classes, (ii) high invariance (stationarity) in the spatial domain of the investigated scene. The feature selection is accomplished by defining a multi-objective criterion that considers two terms: (i) a term that assesses the class separability, (ii) a term that evaluates the spatial invariance of the selected features. The multi-objective problem is solved by an evolutionary algorithm that estimates the Pareto-optimal solutions. Experiments carried out on a hyperspectral image acquired by the Hyperion sensor confirmed the effectiveness of the proposed technique.

The central issue in representing graph-structured data instances in learning algorithms is designing features which are invariant to permuting the numbering of the vertices. We present a new system of invariant graph features which we call the skew spectrum of graphs. The skew spectrum is based on mapping the adjacency matrix of any (weighted, directed, unlabeled) graph to a function on the symmetric group and computing bispectral invariants. The reduced form of the skew spectrum is computable in O(n^3) time, and experiments show that on several benchmark datasets it can outperform state of the art graph kernels.

In this paper, we investigate stability-based methods
for cluster model selection, in particular to select
the number K of clusters. The scenario under
consideration is that clustering is performed
by minimizing a certain clustering quality function,
and that a unique global minimizer exists. On
the one hand we show that stability can be upper
bounded by certain properties of the optimal clustering,
namely by the mass in a small tube around
the cluster boundaries. On the other hand, we provide
counterexamples which show that a reverse
statement is not true in general. Finally, we give
some examples and arguments why, from a theoretic
point of view, using clustering stability in a
high sample setting can be problematic. It can be
seen that distribution-free guarantees bounding the
difference between the finite sample stability and
the true stability cannot exist, unless one makes
strong assumptions on the underlying distribution.

We relate compressed sensing (CS) with Bayesian experimental design and provide a novel efficient approximate method for the latter, based on expectation propagation.
In a large comparative study about linearly measuring natural images, we show that the simple standard heuristic of measuring wavelet coefficients top-down systematically
outperforms CS methods using random measurements; the sequential projection optimisation approach of (Ji & Carin, 2007) performs even worse. We also show that our
own approximate Bayesian method is able to learn measurement filters on full images efficiently which outperform the wavelet heuristic. To our knowledge, ours is
the first successful attempt at "learning compressed sensing" for images of realistic size. In contrast to common CS methods, our framework is not restricted to sparse signals, but can
readily be applied to other notions of signal complexity or noise models. We give concrete ideas how our method can be scaled up to large signal representations.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.