2015

An event-based state estimation approach for reducing communication in a networked control system is proposed. Multiple distributed sensor-actuator-agents observe a dynamic process and sporadically exchange their measurements and inputs over a bus network. Based on these data, each agent estimates the full state of the dynamic system, which may exhibit arbitrary inter-agent couplings. Local event-based protocols ensure that data is transmitted only when necessary to meet a desired estimation accuracy. This event-based scheme is shown to mimic a centralized Luenberger observer design up to guaranteed bounds, and stability is proven in the sense of bounded estimation errors for bounded disturbances. The stability result extends to the distributed control system that results when the local state estimates are used for distributed feedback control. Simulation results highlight the benefit of the event-based approach over classical periodic ones in reducing communication requirements.
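
A minimal sketch of the event-triggering idea: an agent runs a local Luenberger-style observer and transmits a measurement only when the innovation exceeds a threshold. The scalar dynamics, gains, and threshold below are illustrative assumptions, not the paper's multi-agent protocol or its guaranteed bounds.

```python
import numpy as np

A, C = 0.95, 1.0   # assumed scalar dynamics: x+ = A*x + w, y = C*x + v
L = 0.5            # observer gain (assumed)
delta = 0.3        # event threshold: transmit only when the innovation is large

rng = np.random.default_rng(0)
x, x_hat, sent = 1.0, 0.0, 0
for t in range(200):
    x = A * x + 0.05 * rng.standard_normal()      # true process with disturbance
    y = C * x + 0.02 * rng.standard_normal()      # local measurement
    innovation = y - C * x_hat
    if abs(innovation) > delta:                   # event: data needed for accuracy
        x_hat = A * x_hat + L * innovation        # correct with transmitted data
        sent += 1
    else:
        x_hat = A * x_hat                         # prediction only, no communication
print(f"transmitted {sent} of 200 samples")
```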

2008

Discovery of knowledge from geometric graph databases is of particular importance in chemistry and
biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In
such applications, scientists are not interested in the statistics of the whole database. Instead, they need information
about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay
algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph
which are frequent geometric epsilon-subgraphs under the entire class of rigid geometric transformations in a database.
By using geometric epsilon-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed
algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number
of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per
pattern is larger than for non-geometric graph mining, the total time remains reasonable even for small
minimum support.
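
The geometric core of epsilon-matching can be illustrated with the Kabsch algorithm: two coordinate sets match under a rigid transformation if, after the optimal rotation and translation, every point lies within epsilon. This is only a sketch of the matching test under assumed point correspondences; the paper's enumeration machinery is not shown, and the function name is hypothetical.

```python
import numpy as np

def rigid_epsilon_match(P, Q, eps):
    """True if a rotation+translation maps P onto Q within eps per point
    (assumes P and Q are corresponding, equally sized 3D point sets)."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)       # centre both point sets
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)         # Kabsch algorithm
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T                          # optimal proper rotation
    return np.max(np.linalg.norm(Pc @ R.T - Qc, axis=1)) <= eps

P = np.random.default_rng(1).normal(size=(5, 3))
c, s = np.cos(0.7), np.sin(0.7)
Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
print(rigid_epsilon_match(P, P @ Rz.T + 2.0, eps=1e-6))    # True
```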

We investigate an implicit method to compute a piecewise linear representation of a surface from a
set of sample points. As implicit surface functions we use weighted sums of piecewise linear kernel functions.
For such a function we can partition R^d in such a way that the function is linear on each subset of the partition.
On each subset of the partition we can then compute the zero level set of the function exactly, as the intersection of
a hyperplane with the subset.
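
A small sketch of the key observation: where the implicit function is linear, its zero level set is obtained exactly by linear interpolation along the edges of the cell (here a triangle in R^2 with assumed vertex values; the function name is illustrative).

```python
import numpy as np

def zero_set_on_triangle(V, f):
    """V: (3, 2) vertex coordinates; f: (3,) values of a function linear on V."""
    pts = []
    for i, j in [(0, 1), (1, 2), (2, 0)]:
        if f[i] * f[j] < 0:                     # sign change along the edge
            t = f[i] / (f[i] - f[j])            # exact crossing for a linear f
            pts.append(V[i] + t * (V[j] - V[i]))
    return np.array(pts)                        # segment of the zero level set

V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(zero_set_on_triangle(V, f=np.array([-1.0, 1.0, 1.0])))   # [[0.5 0], [0 0.5]]
```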

We introduce a family of unsupervised algorithms, numerical taxonomy clustering, which simultaneously
cluster data and learn a taxonomy that encodes the relationships between the clusters. The algorithms
work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a
more informative visualization of complex data than simple clustering; in addition, taking into account the relations
between different clusters is shown to substantially improve the quality of the clustering when compared
with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence-maximization
approach). We demonstrate our algorithms on image and text data.

In this report we present new algorithms for non-negative matrix approximation (NMA),
commonly known as the NMF problem. Our methods improve upon the well-known methods of Lee &
Seung [19] for both the Frobenius norm and the Kullback-Leibler divergence versions of the problem.
Our results for the latter problem are especially interesting because it seems to have witnessed much
less algorithmic progress than the Frobenius norm NMA problem. Our algorithms are
based on a particular block-iterative acceleration technique for EM, which preserves the multiplicative
nature of the updates and also ensures monotonicity. Furthermore, our algorithms also naturally apply
to the Bregman-divergence NMA algorithms of Dhillon and Sra [8]. Experimentally, we show that our
algorithms outperform the traditional Lee/Seung approach most of the time.
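
For reference, a minimal version of the Lee-Seung multiplicative updates for the Frobenius-norm problem, the baseline that the report's block-iterative acceleration builds upon; the rank and iteration count below are arbitrary choices.

```python
import numpy as np

def nmf(A, k, iters=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W, H = rng.random((m, k)), rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # multiplicative, monotone updates
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

A = np.abs(np.random.default_rng(1).normal(size=(30, 20)))
W, H = nmf(A, k=5)
print(np.linalg.norm(A - W @ H))               # approximation error
```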

The Euclidean K-means problem is fundamental to clustering, and over the years it has been
intensely investigated. More recently, generalizations such as Bregman k-means [8], co-clustering [10],
and tensor (multi-way) clustering [40] have also gained prominence. A well-known computational difficulty
encountered by these clustering problems is the NP-hardness of the associated optimization task,
and commonly used methods guarantee at most local optimality. Consequently, approximation algorithms
of varying degrees of sophistication have been developed, though largely for the basic Euclidean
K-means (or ℓ1-norm K-median) problem. In this paper we present approximation algorithms for several
Bregman clustering problems by building upon the recent paper of Arthur and Vassilvitskii [5]. Our algorithms
obtain objective values within a factor O(log K) for Bregman k-means, Bregman co-clustering,
Bregman tensor clustering, and weighted kernel k-means. To our knowledge, except for some special
cases, approximation algorithms have not been considered for these general clustering problems. There
are several important implications of our work: (i) under the same assumptions as Ackermann et al. [1]
it yields a much faster algorithm (non-exponential in K, unlike [1]) for information-theoretic clustering,
(ii) it answers several open problems posed by [4], including generalizations to Bregman co-clustering
and tensor clustering, and (iii) it provides practical and easy-to-implement methods, in contrast to several
other common approximation approaches.
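
A sketch of the Arthur-Vassilvitskii seeding that the paper generalizes: the next centre is sampled with probability proportional to the divergence to the nearest centre chosen so far, shown here with a pluggable divergence (squared Euclidean and generalised KL as illustrative examples; the function names are hypothetical).

```python
import numpy as np

def seeding(X, K, div, seed=0):
    rng = np.random.default_rng(seed)
    centres = [X[rng.integers(len(X))]]
    for _ in range(K - 1):
        d = np.min([div(X, c) for c in centres], axis=0)  # divergence to nearest centre
        centres.append(X[rng.choice(len(X), p=d / d.sum())])
    return np.array(centres)

sq_euclid = lambda X, c: np.sum((X - c) ** 2, axis=1)            # Bregman div. of ||x||^2
gen_kl = lambda X, c: np.sum(X * np.log(X / c) - X + c, axis=1)  # generalised KL

X = np.abs(np.random.default_rng(1).normal(size=(100, 4))) + 0.1
print(seeding(X, K=3, div=gen_kl).round(2))
```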

We study the question of activity classification in videos and present a novel approach for recognizing
human action categories by combining information from appearance and motion of human body parts.
Our approach uses a tracking step which involves Particle Filtering and a local non-parametric clustering step.
The motion information is provided by the trajectory of the cluster modes of a local set of particles. The statistical
information about the particles of that cluster over a number of frames provides the appearance information. Later
we use a Bag of Words model to build one histogram per video sequence from the set of these robust appearance
and motion descriptors. These histograms provide characteristic information that helps us discriminate
among various human actions and thus classify them correctly.
We tested our approach on the standard KTH and Weizmann human action datasets, and the results were comparable
to the state of the art. Additionally, our approach is able to distinguish activities that involve
motion of the complete body from those in which only certain body parts move. In other words, our method discriminates
well between activities with gross motion, such as running and jogging, and those with local motion, such as waving
and boxing.

This paper proposes a framework for single-image super-resolution and JPEG artifact removal.
The underlying idea is to learn a map from input low-quality images (suitably preprocessed low-resolution or
JPEG encoded images) to target high-quality images based on example pairs of input and output images. To
keep the complexity of the resulting learning problem at a moderate level, a patch-based approach is taken such
that kernel ridge regression (KRR) scans the input image with a small window (patch) and produces a patch-valued
output for each output pixel location. These constitute a set of candidate images, each of which reflects
different local information. An image output is then obtained as a convex combination of candidates for each
pixel based on estimated confidences of candidates. To reduce the time complexity of training and testing for
KRR, a sparse solution is found by combining the ideas of kernel matching pursuit and gradient descent. As a
regularized solution, KRR leads to better generalization than simply storing the examples, as is done
in existing example-based super-resolution algorithms, and results in much less noisy images. However, this may
introduce blurring and ringing artifacts around major edges as sharp changes are penalized severely. A prior model
of a generic image class which takes into account the discontinuity property of images is adopted to resolve this
problem. Comparison with existing super-resolution and JPEG artifact removal methods shows the effectiveness
of the proposed method. Furthermore, the proposed method is generic in that it has the potential to be applied to
many other image enhancement applications.
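
A minimal kernel ridge regression fit, the per-patch regressor at the heart of the method, shown on toy 1D data; the Gaussian kernel, bandwidth, and regularizer are illustrative assumptions, and the paper's sparse matching-pursuit solution is not reproduced.

```python
import numpy as np

def krr_fit(X, y, lam=1e-2, gamma=10.0):
    K = np.exp(-gamma * (X[:, None] - X[None, :]) ** 2)   # Gaussian kernel
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)  # ridge solution
    return lambda Xq: np.exp(-gamma * (Xq[:, None] - X[None, :]) ** 2) @ alpha

X = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * X) + 0.1 * np.random.default_rng(0).standard_normal(50)
f = krr_fit(X, y)
print(f(np.array([0.25]))[0])   # close to sin(pi/2) = 1
```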

Unsupervised time-series segmentation in the general scenario in which the number of segment-types
and segment boundaries are a priori unknown is a fundamental problem in many applications and requires an accurate segmentation model as well as a way of determining an appropriate number of segment-types.
In most approaches, segmentation and the determination of the number of segment-types are addressed
in two separate steps, since the segmentation model assumes a predefined number of segment-types.
The number of segment-types is thus determined by training and comparing several separate models. In this paper, we take a Bayesian approach to a segmentation model based on linear Gaussian state-space models to achieve structure selection within the model. An appropriate prior distribution on the parameters is used to enforce a sparse parametrization, such that the model automatically selects the smallest number of underlying dynamical systems that explain the data well, and a parsimonious structure for each dynamical system. As the resulting model is computationally intractable, we introduce a variational approximation in which a reformulation of the problem enables the use of an efficient inference algorithm.

This report summarizes the theory and some main applications of a new non-monotonic algorithm for
maximizing a Poisson likelihood, which for Positron Emission Tomography (PET) is equivalent to minimizing
the associated Kullback-Leibler divergence, and for Transmission Tomography is similar to maximizing the dual
of a maximum entropy problem. We call our method non-monotonic maximum likelihood (NMML) and show
its application to different problems such as tomography and image restoration. We discuss theoretical
properties of our algorithm, such as convergence. Our experimental results indicate that the speedups obtained via our
non-monotonic methods are substantial.
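
For context, the classic monotonic MLEM (Richardson-Lucy) update for Poisson likelihood maximization, the kind of baseline the non-monotonic NMML method is designed to speed up; the random system below is purely illustrative.

```python
import numpy as np

def mlem(A, y, iters=100, eps=1e-12):
    """Maximise the Poisson likelihood of y ~ Poisson(A x) over x >= 0."""
    x = np.ones(A.shape[1])
    col_sums = A.sum(axis=0)
    for _ in range(iters):
        x *= (A.T @ (y / (A @ x + eps))) / (col_sums + eps)   # multiplicative step
    return x

rng = np.random.default_rng(0)
A = rng.random((50, 10))                   # system matrix (toy)
x_true = rng.random(10)
y = rng.poisson(A @ x_true).astype(float)  # Poisson counts
print(np.round(mlem(A, y), 2))
```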

We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time
approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
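
A minimal sketch of the quadratic-time statistic with a Gaussian RKHS kernel (the biased estimate of squared MMD; the bandwidth below is a hand-picked assumption):

```python
import numpy as np

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD with a Gaussian kernel; O(n^2)."""
    k = lambda A, B: np.exp(-gamma * np.sum((A[:, None] - B[None, :]) ** 2, -1))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))              # shifted distribution
print(mmd2(X, X[::-1]), mmd2(X, Y))                  # ~0 vs. clearly positive
```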

This technical report is an extended version of the appendix of Steinke et al., "Manifold-valued Thin-Plate Splines with Applications in
Computer Graphics" (2008), containing the complete proofs,
which had to be omitted there due to space restrictions. The report requires a basic knowledge of differential
geometry; apart from that requirement, however, it is self-contained.

Real-time control of the end-effector of a humanoid robot in external coordinates requires
computationally efficient solutions of the inverse kinematics problem. In this context, this
paper investigates methods of resolved motion rate control (RMRC) that employ optimization
criteria to resolve kinematic redundancies. In particular, we focus on two established techniques,
the pseudo-inverse with explicit optimization and the extended Jacobian method. We prove that
the extended Jacobian method includes pseudo-inverse methods as a special case. In terms of
computational complexity, however, pseudo-inverse and extended Jacobian differ significantly in
favor of pseudo-inverse methods. Employing numerical estimation techniques, we introduce a
computationally efficient version of the extended Jacobian with performance comparable to the
original version. Our results are illustrated in simulation studies with a multiple degree-of-freedom
robot, and were evaluated on an actual 30 degree-of-freedom full-body humanoid robot.
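
A generic sketch of pseudo-inverse RMRC with an explicit optimization term projected into the Jacobian's null space; the task dimensions, posture cost, and gain below are assumptions, not the paper's humanoid setup.

```python
import numpy as np

def rmrc_step(J, dx, q, q_rest, alpha=0.1):
    J_pinv = np.linalg.pinv(J)                  # Moore-Penrose pseudo-inverse
    N = np.eye(J.shape[1]) - J_pinv @ J         # null-space projector
    dq_task = J_pinv @ dx                       # realises the task-space motion
    dq_opt = -alpha * (q - q_rest)              # gradient of a posture cost
    return dq_task + N @ dq_opt                 # redundancy resolution

J = np.random.default_rng(0).normal(size=(3, 7))   # 3D task, 7-DoF kinematics
dq = rmrc_step(J, dx=np.array([0.01, 0.0, 0.0]),
               q=np.zeros(7), q_rest=0.1 * np.ones(7))
print(J @ dq)   # equals the commanded task-space velocity
```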

2007

This paper considers kernels invariant to translation, rotation and dilation. We show that no non-trivial
positive definite (p.d.) kernels exist which are radial and dilation invariant, only conditionally positive definite
(c.p.d.) ones. Accordingly, we discuss the c.p.d. case and provide some novel analysis, including an elementary
derivation of a c.p.d. representer theorem. On the practical side, we give a support vector machine (s.v.m.) algorithm
for arbitrary c.p.d. kernels. For the thin-plate kernel this leads to a classifier with only one parameter (the
amount of regularisation), which we demonstrate to be as effective as an s.v.m. with the Gaussian kernel, even
though the Gaussian involves a second parameter (the length scale).
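
The thin-plate kernel is easy to inspect numerically: its Gram matrix has zero trace and hence a mixed-sign spectrum, illustrating why it is only c.p.d. (a sketch using the 2D form k(x, y) = r^2 log r):

```python
import numpy as np

def thin_plate(X, Y, eps=1e-12):
    r2 = np.sum((X[:, None] - Y[None, :]) ** 2, axis=-1)
    return 0.5 * r2 * np.log(r2 + eps)          # r^2 log r = 0.5 r^2 log r^2

X = np.random.default_rng(0).normal(size=(5, 2))
K = thin_plate(X, X)
print(np.round(np.linalg.eigvalsh(K), 3))       # zero trace: mixed-sign spectrum
```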

(TR-07-47), University of Texas, Austin, TX, USA, September 2007 (techreport)

Several important machine learning problems can be modeled and solved via semidefinite programs. Often, researchers invoke off-the-shelf software for the associated optimization, which can be inappropriate for many applications due to computational and storage requirements. In this paper, we introduce the use of convex perturbations for semidefinite programs (SDPs). Using a particular perturbation function, we arrive
at an algorithm for SDPs that has several advantages over existing techniques: a) it is simple, requiring only a few lines of MATLAB, b) it is a first-order method which makes it scalable, c) it can easily exploit the structure of a particular SDP to gain efficiency (e.g., when the constraint matrices are low-rank). We demonstrate on several machine learning applications that the proposed algorithm is effective in finding fast approximations to large-scale SDPs.

Most existing sparse Gaussian process (g.p.) models seek computational advantages by basing their
computations on a set of m basis functions that are the covariance function of the g.p. with one of its two inputs
fixed. We generalise this for the case of the Gaussian covariance function, by basing our computations on m Gaussian
basis functions with arbitrary diagonal covariance matrices (or length scales). For a fixed number of basis
functions and any given criterion, this additional flexibility permits approximations no worse and typically better
than was previously possible. Although we focus on g.p. regression, the central idea is applicable to all kernel-based
algorithms, such as the support vector machine. We perform gradient-based optimisation of the marginal
likelihood, which costs O(m^2 n) time, where n is the number of data points, and compare the method to various
other sparse g.p. methods. Our approach outperforms the other methods, particularly for the case of very few basis
functions, i.e. a very high sparsity ratio.

Recent years have seen huge advances in object recognition from images. Recognition rates beyond 95% are the rule rather than the exception on many datasets. However, most state-of-the-art methods can only decide if an object is present or not. They are not able to provide information on the object location or extent within the image.
We report on a simple yet powerful scheme that extends many existing recognition methods to also perform localization of object bounding boxes. This is achieved by maximizing the classification score over all possible subrectangles in the image. Despite the impression that this would be computationally intractable, we show that in many situations efficient algorithms exist which solve a generalized maximum subrectangle problem.
We show how our method is applicable to a variety of object detection frameworks and demonstrate its performance by applying it to the popular bag of visual words model, achieving competitive results on the PASCAL VOC 2006 dataset.
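
When each pixel contributes an additive score (as for a bag-of-visual-words classifier with per-feature weights), the best box is a maximum-sum subrectangle; below is a brute-force Kadane-style sketch in O(h^2 w), illustrating the problem rather than the paper's faster algorithms.

```python
import numpy as np

def best_box(S):
    """Maximum-sum subrectangle of score map S, O(h^2 w)."""
    h, w = S.shape
    best, box = -np.inf, None
    for top in range(h):
        col = np.zeros(w)
        for bot in range(top, h):
            col += S[bot]                   # column sums over rows top..bot
            cur, left = 0.0, 0              # 1D Kadane over the column sums
            for right in range(w):
                cur += col[right]
                if cur > best:
                    best, box = cur, (top, left, bot, right)
                if cur < 0:
                    cur, left = 0.0, right + 1
    return best, box

S = np.random.default_rng(0).normal(-0.1, 1.0, size=(20, 30))  # per-pixel scores
print(best_box(S))
```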

Assume we are given a sample of points from some underlying
distribution which contains several distinct clusters. Our goal is
to construct a neighborhood graph on the sample points such that
clusters are "identified": that is, the subgraph induced by points
from the same cluster is connected, while subgraphs corresponding to
different clusters are not connected to each other. We derive bounds
on the probability that cluster identification is successful, and
use them to predict "optimal" values of k for the mutual and
symmetric k-nearest-neighbor graphs. We point out different
properties of the mutual and symmetric nearest-neighbor graphs
related to the cluster identification problem.
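
A sketch of the objects under study: build a mutual k-nearest-neighbour graph on a toy two-cluster sample and check the identification criterion via connected components (graph construction only; the probability bounds and the choice of k are the paper's contribution, and the function name is illustrative).

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def mutual_knn(X, k):
    D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]               # k nearest neighbours
    A = np.zeros((len(X), len(X)), dtype=bool)
    for i, js in enumerate(nn):
        A[i, js] = True
    return A & A.T                                  # mutual: edge iff both agree

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)),         # cluster 1
               rng.normal(5, 0.3, (50, 2))])        # cluster 2, well separated
n_comp, labels = connected_components(mutual_knn(X, k=10))
print(n_comp)                                       # 2 iff identification succeeds
```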

We describe two related models to cluster multidimensional time-series under the assumption of an underlying linear Gaussian dynamical process. In the first model, time-series are assigned to the same cluster when they show global similarity in their dynamics, while in the second model time-series are assigned to the same cluster when they show simultaneous similarity. Both models are based on Dirichlet Mixtures of Bayesian Linear Gaussian State-Space Models in order to (semi-)automatically determine an appropriate number of components in the mixture, and to additionally bias the components towards a parsimonious parameterization. The resulting models are formally intractable, and to deal with this we describe a deterministic approximation based on a novel implementation of Variational Bayes.

This paper presents a fully automated algorithm for reconstructing a textured 3D model of a face from a single photograph or a raw video stream. The algorithm is based on a combination of Support Vector Machines (SVMs) and a Morphable Model of 3D faces. After SVM face detection, individual facial features are detected using a novel regression- and classification-based approach, and probabilistically plausible configurations of features are selected to produce a list of candidates for several facial feature positions. In the next step, the configurations of feature points are evaluated using a novel criterion that is based on a Morphable Model and a
combination of linear projections. Finally, the feature points initialize a model-fitting procedure of the Morphable Model. The result is a high-resolution 3D surface model.

This technical report describes a cute idea of how to create new policy search approaches. It directly relates to the Natural Actor-Critic methods but allows the derivation of one-shot solutions. Future work may include the application to interesting problems.

We introduce a modified Kalman filter that performs robust, real-time outlier detection, without the need for manual parameter tuning by the user. Systems that rely on high-quality sensory data (for instance, robotic systems) can be sensitive to data containing outliers. The standard Kalman filter is not robust to outliers, and other variations of the Kalman filter have been proposed to overcome this issue. However, these methods may require manual parameter tuning, the use of heuristics, or complicated parameter estimation procedures. Our Kalman filter uses a weighted least squares-like approach by introducing weights for each data sample. A data sample with a smaller weight has a weaker contribution when estimating the current time step's state. Using an incremental variational Expectation-Maximization framework, we learn the weights and system dynamics. We evaluate our Kalman filter algorithm on data from a robotic dog.
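
A minimal sketch of the weighting idea: a sample's weight w in (0, 1] inflates its measurement noise covariance to R/w, so suspected outliers barely move the estimate. The paper learns the weights by variational EM; here the weight is a fixed function of the normalised innovation, purely an assumption for illustration, and the function name is hypothetical.

```python
import numpy as np

def robust_kf_step(x, P, y, A, C, Q, R):
    x_pred, P_pred = A @ x, A @ P @ A.T + Q
    r = y - C @ x_pred                                # innovation
    S = C @ P_pred @ C.T + R
    maha = float(r @ np.linalg.solve(S, r))           # normalised innovation
    w = 9.0 / maha if maha > 9.0 else 1.0             # downweight beyond ~3 sigma
    S_w = C @ P_pred @ C.T + R / w                    # weighted least-squares view
    K = P_pred @ C.T @ np.linalg.inv(S_w)
    return x_pred + K @ r, (np.eye(len(x)) - K @ C) @ P_pred

A = np.array([[1.0, 1.0], [0.0, 1.0]])                # constant-velocity model
C = np.array([[1.0, 0.0]])
x, P = np.zeros(2), np.eye(2)
x, P = robust_kf_step(x, P, np.array([100.0]), A, C,
                      0.01 * np.eye(2), np.array([[0.1]]))
print(x)   # the gross outlier y = 100 moves the estimate far less than a standard update
```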

Discussions about different graph Laplacians, mainly the
normalized and unnormalized versions, have been lively with
respect to various methods in clustering and graph-based
semi-supervised learning. Previous research on graph Laplacians
investigated their convergence properties to Laplacian operators
on continuous manifolds. There is still no strong proof of
convergence for the normalized Laplacian. In this paper, we
analyze different variants of graph Laplacians directly from the
ways they solve the original graph partitioning problem. The graph
partitioning problem is a well-known combinatorial NP-hard
optimization problem. The spectral solutions provide evidence that
the normalized Laplacian encodes more reasonable considerations for
graph partitioning. We also provide some examples to show their
differences.
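
The variants under discussion, computed from an adjacency matrix W (a minimal sketch):

```python
import numpy as np

def laplacians(W):
    d = W.sum(axis=1)                              # vertex degrees
    L = np.diag(d) - W                             # unnormalised Laplacian
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = D_isqrt @ L @ D_isqrt                  # normalised (symmetric)
    L_rw = np.diag(1.0 / d) @ L                    # normalised (random walk)
    return L, L_sym, L_rw

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
for M in laplacians(W):
    print(np.round(np.sort(np.linalg.eigvals(M).real), 3))   # smallest is 0
```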

In many applications, relationships among objects of interest are more complex than pairwise. Simply approximating complex relationships as pairwise ones can lead to loss of information. An alternative for these applications is to analyze complex relationships among the data directly, without first reducing them to pairwise relationships. A natural way to describe complex relationships is to use hypergraphs. A
hypergraph is a graph in which edges can connect more than two vertices. Thus we consider learning from a hypergraph, and develop a general framework which is applicable to classification and clustering for complex relational data. We have applied our framework to real-world web classification problems and obtained encouraging results.
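
One standard construction for such a framework is the normalised hypergraph Laplacian, built from the vertex-edge incidence matrix H and hyperedge weights w; whether this matches the paper's exact formulation is an assumption of this sketch.

```python
import numpy as np

def hypergraph_laplacian(H, w):
    """Normalised hypergraph Laplacian from incidence matrix H (|V| x |E|)."""
    Dv = H @ w                          # vertex degrees (incident edge weights)
    De = H.sum(axis=0)                  # hyperedge degrees (vertices per edge)
    Dv_isqrt = np.diag(1.0 / np.sqrt(Dv))
    Theta = Dv_isqrt @ H @ np.diag(w / De) @ H.T @ Dv_isqrt
    return np.eye(H.shape[0]) - Theta

H = np.array([[1, 0],                   # vertex 0 in hyperedge 0
              [1, 1],                   # vertices 1, 2 in both hyperedges
              [1, 1],
              [0, 1]], dtype=float)     # vertex 3 in hyperedge 1
print(np.round(hypergraph_laplacian(H, w=np.ones(2)), 2))
```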

We propose an independence criterion based on the eigenspectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates.
Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
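
The biased empirical estimate is short enough to state in full: HSIC = trace(KHLH)/(n-1)^2 with centring matrix H = I - (1/n)11^T (a sketch with Gaussian kernels and a hand-picked bandwidth):

```python
import numpy as np

def hsic(X, Y, gamma=1.0):
    n = len(X)
    k = lambda A: np.exp(-gamma * np.sum((A[:, None] - A[None, :]) ** 2, axis=-1))
    H = np.eye(n) - np.ones((n, n)) / n             # centring matrix
    return np.trace(k(X) @ H @ k(Y) @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
Y_dep = np.sin(3 * X) + 0.1 * rng.normal(size=(200, 1))   # dependent on X
Y_ind = rng.normal(size=(200, 1))                          # independent of X
print(hsic(X, Y_dep), hsic(X, Y_ind))                      # first is clearly larger
```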

Gaussian process (GP) priors have been successfully used in non-parametric Bayesian regression and classification models. Inference can be performed analytically only for the regression model with Gaussian noise. For all other likelihood models inference is intractable and various approximation techniques have been proposed. In recent years
expectation-propagation (EP) has been developed as a general method for approximate inference. This article provides a general summary of how expectation-propagation can be used for approximate
inference in Gaussian process models. Furthermore, we present a case study describing its implementation for a new robust variant of
Gaussian process regression. To gain further insights into the quality of the EP approximation we present experiments in which we compare to results obtained by Markov chain Monte Carlo (MCMC) sampling.

In this paper we are concerned with the optimal combination of features of possibly different types for detection and estimation tasks in machine vision. We propose to combine features such that the resulting classifier maximizes the margin between classes. In
contrast to existing approaches, which are non-convex and/or generative, we propose to use a discriminative model, leading to a convex problem formulation and complexity control.
Furthermore, we assert that decision functions should not compare apples and oranges by comparing features of different types directly. Instead, we propose to combine different similarity measures, one for each
feature type. Furthermore, we argue that the question "Which feature type is more discriminative for task X?" is ill-posed, and show empirically that the answer to this question might depend on the complexity of the decision function.

Presented at the PASCAL workshop on clustering, London, 2005 (techreport)

The goal of this paper is to discuss statistical aspects of clustering
in a framework where the data to be clustered has been sampled
from some unknown probability distribution. Firstly, the clustering of
the data set should reveal some structure of the underlying data rather
than model artifacts due to the random sampling process. Secondly, the
more sample points we have, the more reliable the clustering should be.
We discuss which methods can and cannot be used to tackle those problems.
In particular we argue that generalization bounds as they are used
in statistical learning theory of classification are unsuitable in a general
clustering framework. We suggest that the main replacements for generalization
bounds should be convergence proofs and stability considerations.
This paper should be considered a road map which identifies important
questions and potentially fruitful directions for future research
about statistical clustering. We do not attempt to present a complete
statistical theory of clustering.

In psychophysical studies, the psychometric function is used to model the relation between the physical stimulus intensity and the observer's ability to detect or discriminate between stimuli of different intensities. In this report we propose the use of Bayesian inference to extract the information contained in experimental data and estimate the parameters of psychometric functions. Since Bayesian inference cannot be performed analytically, we describe how a Markov chain Monte Carlo method can be used to generate samples from the posterior distribution over parameters. These samples are used to estimate Bayesian confidence intervals and other characteristics of the posterior distribution. In addition, we discuss the
parameterisation of psychometric functions and the role of prior distributions in the analysis. The proposed approach is exemplified using artificially generated data and in a case study with real experimental data. Furthermore, we compare our approach with traditional methods based on maximum-likelihood parameter estimation combined with bootstrap techniques for confidence interval estimation. The appendix describes an implementation for the R environment for statistical computing and provides the code for reproducing the results discussed in the experiment section.
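
A minimal Metropolis sketch of the approach on toy data; the logistic shape, flat priors, proposal scale, and data below are illustrative assumptions, not the report's exact parameterisation or its R implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])   # stimulus intensities (toy)
n = np.full(6, 40)                              # trials per intensity
k = np.array([4, 10, 22, 33, 38, 40])           # correct responses (toy)

def log_post(theta):
    """Log posterior: binomial likelihood, flat prior with width > 0."""
    m, w = theta
    if w <= 0:
        return -np.inf
    p = 1.0 / (1.0 + np.exp(-(x - m) / w))      # logistic psychometric function
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return np.sum(k * np.log(p) + (n - k) * np.log(1 - p))

theta, lp, chain = np.array([1.0, 1.0]), -np.inf, []
for _ in range(5000):
    prop = theta + 0.1 * rng.standard_normal(2)  # random-walk proposal
    lp_prop = log_post(prop)
    if np.log(rng.random()) < lp_prop - lp:      # Metropolis accept/reject
        theta, lp = prop, lp_prop
    chain.append(theta)
chain = np.array(chain)[1000:]                   # discard burn-in
print(chain.mean(0))                             # posterior mean of (m, w)
print(np.percentile(chain, [2.5, 97.5], axis=0)) # Bayesian confidence intervals
```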

The relation between the BOLD signal and neural activity is still poorly understood. The general linear model (GLM) is broadly used in fMRI data analysis for recovering the underlying neural activity. Although the GLM has proved to be a very useful tool for analyzing fMRI data, it cannot describe the complex biophysical process of neural metabolism. In this technical report we make use of a system of stochastic differential equations (SDEs), based on the Buxton model [1], to describe the underlying computational principles of the hemodynamic process. Based on this SDE system we build a Kalman filter estimator to estimate the induced neural signal as well as the blood inflow under physiologic and sensor noise. The performance of the Kalman filter estimator is investigated under different physiologic noise characteristics and measurement frequencies.

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.