2012

Subjects operating a brain-computer interface (BCI) based on sensorimotor rhythms exhibit large variations in performance over the course of an experimental session. Here, we show that high-frequency gamma-oscillations, originating in fronto-parietal networks, predict such variations on a trial-to-trial basis. We interpret this finding as empirical support for an influence of attentional networks on BCI performance via modulation of the sensorimotor rhythm.

We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integrating the Hoeffding-Azuma inequality into PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multi-armed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and the many other fields where martingales and limited feedback are encountered.
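The limited-feedback setting can be made concrete with a small, generic sketch (the arms, probabilities, and estimator below are illustrative, not taken from the paper): under bandit feedback only the played arm's reward is observed, yet importance weighting still yields unbiased estimates of every arm's expected reward.

```python
import random

def iw_estimate(true_rewards, probs, n_rounds, rng):
    """Estimate each arm's expected reward from bandit (limited) feedback.
    The importance-weighted estimate r * 1[A == a] / p_a is unbiased."""
    K = len(true_rewards)
    sums = [0.0] * K
    for _ in range(n_rounds):
        # sample an arm from the (known) exploration distribution
        a = rng.choices(range(K), weights=probs)[0]
        r = true_rewards[a]        # only the played arm's reward is observed
        sums[a] += r / probs[a]    # reweight by the inverse play probability
    return [s / n_rounds for s in sums]

rng = random.Random(0)
est = iw_estimate([0.2, 0.8, 0.5], [1/3, 1/3, 1/3], 200000, rng)
```

Despite never observing more than one reward per round, the three estimates concentrate around the true means 0.2, 0.8 and 0.5.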

Taking a sharp photo at several-megapixel resolution traditionally
relies on high-grade lenses. In this paper, we present an approach to alleviate
image degradations caused by imperfect optics. We rely on a calibration step
to encode the optical aberrations in a space-variant point spread function and
obtain a corrected image by non-stationary deconvolution. By including the
Bayer array in our image formation model, we can perform demosaicing as part
of the deconvolution.
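The Bayer component of the image formation model can be sketched as follows (the RGGB layout is one common Bayer pattern, and the space-variant PSF step is omitted here; both are simplifying assumptions, not the paper's exact model):

```python
import numpy as np

def bayer_mosaic(rgb):
    """Apply an RGGB Bayer sampling mask: each sensor pixel records exactly
    one colour channel. Illustrative forward model only; the calibrated
    space-variant PSF would be applied before this sampling step."""
    h, w, _ = rgb.shape
    mosaic = np.zeros((h, w))
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 0]  # red sites
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 1]  # green sites (even rows)
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 1]  # green sites (odd rows)
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 2]  # blue sites
    return mosaic

img = np.random.default_rng(0).random((4, 4, 3))
m = bayer_mosaic(img)
```

Including this sampling operator in the forward model is what lets demosaicing be folded into the deconvolution rather than done as a separate step.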

We present a probabilistic viewpoint on multiple kernel learning, unifying well-known regularised risk approaches and recent advances in approximate Bayesian inference relaxations. The framework proposes a general objective function, suitable for regression, robust regression and classification, that is a lower bound of the marginal likelihood and contains many regularised risk approaches as special cases. Furthermore, we derive an efficient and provably convergent optimisation algorithm.

We study statistical detection of grayscale objects in noisy images. The object of
interest is of unknown shape and has an unknown intensity that can vary over the object
and can be negative. No boundary shape constraints are imposed on the object; only a weak bulk
condition for the object's interior is required. We propose an algorithm that can be used to detect
grayscale objects of unknown shapes in the presence of nonparametric noise of unknown level. Our
algorithm is based on a nonparametric multiple testing procedure.
We establish the limit of applicability of our method via an explicit, closed-form, non-asymptotic
and nonparametric consistency bound. This bound is valid for a wide class of nonparametric noise
distributions. We achieve this by proving an uncertainty principle for percolation on finite lattices.

Within the unmanageably large class of nonconvex optimization, we consider the rich subclass of nonsmooth problems having composite objectives (this includes the extensively studied convex, composite objective problems as
a special case). For this subclass, we introduce a powerful new framework that permits asymptotically non-vanishing perturbations. In particular, we develop perturbation-based batch and incremental (online-like) nonconvex proximal splitting algorithms. To our knowledge, this is the first time that such perturbation-based nonconvex splitting algorithms have been proposed and analyzed. While the main contribution of the paper is the theoretical framework, we complement our results by presenting some empirical results on matrix factorization.

In the series of our earlier papers on the subject, we proposed a novel statistical
hypothesis testing method for detection of objects in noisy images. The method uses results from
percolation theory and random graph theory. We developed algorithms that allowed us to detect
objects of unknown shapes in the presence of nonparametric noise of unknown level and of
unknown distribution. No boundary shape constraints were imposed on the objects; only a weak
bulk condition for the object's interior was required. Our algorithms have linear complexity and
exponential accuracy. In the present paper, we describe an implementation of our nonparametric
hypothesis testing method. We provide a program that can be used for statistical experiments in
image processing. This program is written in the statistical programming language R.

We propose a novel algorithm to solve the expectation propagation relaxation of Bayesian inference for continuous-variable graphical models. In contrast to most previous algorithms, our method is provably convergent. By marrying convergent EP ideas from (Opper & Winther, 2005) with covariance decoupling techniques (Wipf & Nagarajan, 2008; Nickisch & Seeger, 2009), it runs at least an order of magnitude faster than the most commonly used EP solver.

We formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. This formulation enables practical and theoretical comparison of different approaches to graph clustering as well as comparison of graph clustering with other possible ways to model the graph. We adapt the PAC-Bayesian analysis of co-clustering (Seldin and Tishby, 2008; Seldin, 2009) to derive a PAC-Bayesian generalization bound for graph clustering. The bound shows that graph clustering should optimize a trade-off between empirical data fit and the mutual information that clusters preserve on the graph nodes. A similar trade-off derived from information-theoretic considerations was already shown to produce state-of-the-art results in practice (Slonim et al., 2005; Yom-Tov and Slonim, 2009). This paper supports the empirical evidence by providing a better theoretical foundation, suggesting formal generalization guarantees, and offering
a more accurate way to deal with finite sample issues. We derive a bound minimization algorithm and show that it provides good results in real-life problems and that the derived PAC-Bayesian bound is reasonably tight.

We introduce several new formulations for sparse nonnegative matrix approximation. Subsequently,
we solve these formulations by developing generic algorithms. Further, to help select a particular sparse formulation,
we briefly discuss the interpretation of each formulation. Finally, preliminary experiments are presented
to illustrate the behavior of our formulations and algorithms.

We propose a novel statistical hypothesis testing method for detection of objects
in noisy images. The method uses results from percolation theory and random graph theory.
We present an algorithm that allows us to detect objects of unknown shapes in the presence of
nonparametric noise of unknown level and of unknown distribution. No boundary shape constraints
are imposed on the object, only a weak bulk condition for the object's interior is required. The
algorithm has linear complexity and exponential accuracy and is appropriate for real-time systems.
In this paper, we develop further the mathematical formalism of our method and explore
important connections to the mathematical theory of percolation and statistical physics. We prove
results on consistency and algorithmic complexity of our testing procedure. In addition, we address
not only the asymptotic behavior of the method, but also the finite-sample performance of our test.
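The percolation idea behind the test can be illustrated with a toy sketch (the threshold level and the cluster-size statistic below are generic illustrative choices, not the paper's exact procedure): after thresholding, noise pixels form only small connected clusters, while an object produces one large cluster.

```python
import numpy as np
from collections import deque

def largest_cluster(img, thresh):
    """Size of the largest 4-connected cluster of pixels exceeding `thresh`.
    A cluster far larger than expected under pure noise signals an object."""
    mask = img > thresh
    seen = np.zeros_like(mask, dtype=bool)
    best = 0
    h, w = mask.shape
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                size, q = 0, deque([(i, j)])  # BFS over one cluster
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w \
                                and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                best = max(best, size)
    return best

rng = np.random.default_rng(1)
noise = rng.random((40, 40))
signal = noise.copy()
signal[10:20, 10:20] += 1.0   # a bright 10x10 object
```

On the pure-noise image the exceedance probability is subcritical and clusters stay tiny; the object image contains one cluster of at least 100 pixels.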

Many problems of low-level computer vision and image processing, such as denoising, deconvolution, tomographic reconstruction or super-resolution, can be addressed by maximizing the posterior distribution of a sparse linear model (SLM). We show how higher-order Bayesian decision-making problems, such as optimizing image acquisition in magnetic resonance scanners, can be addressed by querying the SLM posterior covariance, unrelated to the density's mode. We propose a scalable algorithmic framework, with which SLM posteriors over full, high-resolution images can be approximated for the first time, solving a variational optimization problem which is convex iff posterior mode finding is convex. These methods successfully drive the optimization of sampling trajectories for real-world magnetic resonance imaging through Bayesian experimental design, which has not been attempted before. Our methodology provides new insight into similarities and differences between sparse reconstruction and approximate Bayesian inference, and has important implications for compressive sensing of real-world images.

(UWEETR-1020-0003), University of Washington, Seattle, WA, USA, August 2010 (techreport)


We propose a novel framework for graph-based cooperative regularization that uses submodular costs on graph edges. We introduce an efficient iterative algorithm to solve the resulting hard discrete optimization problem, and show that it has a guaranteed approximation factor. The edge-submodular formulation is amenable to the same extensions as standard graph-cut approaches, and applicable to a range of problems. Here, we apply it to introduce a discount for homogeneous boundaries in binary image segmentation on very difficult images, namely long thin objects and color and grayscale images with a shading gradient. The experiments show that significant portions of previously truncated objects are now preserved.

We derive a number of methods to efficiently solve simple optimization problems subject to a total-variation
(TV) regularization, under different norms of the TV operator, both for the case of 1-dimensional and
2-dimensional data. In spite of the non-smooth, non-separable nature of the TV terms considered, we show that
a dual formulation with strong structure can be derived. Taking advantage of this structure, we develop adaptations
of existing algorithms from the optimization literature, resulting in efficient methods for the problem at hand.
Experimental results show that for 1-dimensional data the proposed methods achieve convergence within good
accuracy levels in practically linear time, both for the L1 and L2 norms. For the more challenging 2-dimensional case,
a performance of order O(N^2 log^2 N) for N × N inputs is achieved when using the L2 norm. A final section
suggests possible extensions and lines of further work.
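For the 1-dimensional case, the structure of the dual can be illustrated with a standard Chambolle-style projected gradient on the dual variable (a generic sketch, not the paper's specific algorithms; the step size 0.25 is 1 over the spectral bound of the difference operator):

```python
import numpy as np

def tv1d_prox(y, lam, n_iter=2000, tau=0.25):
    """Solve min_x 0.5*||x - y||^2 + lam * sum_i |x[i+1] - x[i]| via
    projected gradient ascent on the dual: u lives in the box [-lam, lam]."""
    y = np.asarray(y, dtype=float)
    u = np.zeros(len(y) - 1)
    def primal(u):
        # x = y - D^T u, where (Dx)_i = x[i+1] - x[i]
        return y - (np.concatenate(([0.0], u)) - np.concatenate((u, [0.0])))
    for _ in range(n_iter):
        x = primal(u)
        u = np.clip(u + tau * np.diff(x), -lam, lam)  # ascent + box projection
    return primal(u)
```

The non-smooth TV term turns into a simple box constraint in the dual, which is what makes such first-order schemes, and the faster specialized methods of the paper, effective.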

Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent Variable Model (GPLVM) has successfully been used to find low dimensional manifolds in a variety of complex data. The GPLVM consists of a set of points in a low dimensional latent space, and a stochastic map to the observed space. We show how it can be interpreted as a density model in the observed space. However, the GPLVM is not trained as a density model and therefore yields bad density estimates. We propose a new training strategy and obtain improved generalisation performance and better density estimates in comparative evaluations on several benchmark data sets.

We discuss generalized proximity operators (GPO) and their associated generalized projection problems.
On inputs of size n, we show how to efficiently apply GPOs and generalized projections for separable
norms and distance-like functions to accuracy ε in O(n log(1/ε)) time. We also derive projection algorithms that
run theoretically in O(n log n log(1/ε)) time but can, for suitable parameter ranges, empirically outperform the
O(n log(1/ε)) projection method. The proximity and projection tasks are either separable, and solved directly, or
are reduced to a single root-finding step. We highlight that, as a byproduct, our analysis also yields an O(n log(1/ε))
(weakly linear-time) procedure for Euclidean projections onto the l1,∞-norm ball; previously only an O(n log n)
method was known. We provide empirical evaluation to illustrate the performance of our methods, noting that
for the l1,∞-norm projection, our implementation is more than two orders of magnitude faster than the previously
known method.

We introduce a problem we call cooperative cut, in which the goal is to find a minimum-cost graph cut, but where a submodular function defines the cost of a subset of edges. That is, the cost of an edge added to the current cut set C depends on the edges already in C. This generalization of the cost in the standard min-cut
problem to a submodular cost function immediately makes the problem harder. Not only do we prove NP-hardness even for nonnegative submodular costs, but we also show a lower bound of Omega(|V|^(1/3)) on the approximation factor for the problem. On the positive side, we propose and compare four approximation algorithms with an overall approximation factor of min{ |V|/2, |C*|, O(sqrt(|E|) log |V|), |P_max| }, where C* is the optimal solution and P_max
is the longest s-t path across the cut between given s, t. We also introduce additional heuristics for the problem which have attractive properties from the perspective of practical applications and implementations, in that existing
fast min-cut libraries may be used as subroutines. Both our approximation algorithms and our heuristics appear
to do well in practice.

2009

Many successful applications of computer vision to image or video manipulation are interactive by nature. However, parameters of such systems are often trained neglecting the user. Traditionally, interactive systems have been treated in the same manner as their fully automatic counterparts. Their performance is evaluated by computing the accuracy of their solutions under some fixed set of user interactions. This paper proposes a new evaluation and learning method which brings the user in the loop. It is based on the use of an active robot user - a simulated model of a human user. We show how this approach can be used to evaluate and learn parameters of state-of-the-art interactive segmentation systems. We also show how simulated user models can be integrated into the popular max-margin method for parameter learning and propose an algorithm to solve the resulting optimisation problem.

We propose a novel probabilistic method for detection of objects in noisy images.
The method uses results from percolation and random graph theories. We present an algorithm
that allows us to detect objects of unknown shapes in the presence of random noise. Our procedure
substantially differs from wavelets-based algorithms. The algorithm has linear complexity and exponential
accuracy and is appropriate for real-time systems. We prove results on consistency and
algorithmic complexity of our procedure.

We develop an incremental generalized expectation maximization (GEM) framework to model the multiframe blind deconvolution problem. A simplistic version of this problem was recently studied by Harmeling et al. (2009). We solve a more realistic version of this problem which includes the following major features: (i) super-resolution ability despite noise and unknown blurring; (ii) saturation correction, i.e., handling of overexposed pixels that can otherwise confound the image processing; and (iii) simultaneous handling of color channels. These features are seamlessly integrated into our incremental GEM framework to yield simple but efficient multiframe blind deconvolution algorithms. We present technical details concerning critical steps of our algorithms, especially to highlight how all operations can be written using matrix-vector multiplications. We apply our algorithm to real-world images from astronomy and super-resolution tasks. Our experimental results show that our methods yield improved resolution and deconvolution at the same time.

Motivated ultimately by facilitating space-variant blind deconvolution, we present a class of linear transformations that are expressive enough for space-variant filters, yet especially designed for efficient matrix-vector multiplications. Successful results on astronomical imaging through atmospheric turbulence and on noisy magnetic resonance images of constantly moving objects demonstrate the practical significance of our approach.

Many inference problems involving questions of optimality ask for the maximum or the minimum of a finite set of unknown quantities. This technical report derives the first two posterior moments of the maximum of two correlated Gaussian variables and the first two posterior moments of the two generating variables (corresponding to Gaussian approximations minimizing relative entropy). It is shown how this can be used to build a heuristic approximation to the maximum relationship over a finite set of Gaussian variables, allowing approximate inference by Expectation Propagation on such quantities.
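The closed forms underlying the report are Clark's classic expressions for the first two moments of the maximum of two jointly Gaussian variables; the report builds its posterior-moment derivation on these quantities. A minimal sketch (assuming theta > 0, i.e. the two variables are not perfectly coupled with equal variance):

```python
import math

def max_gauss_moments(mu1, s1, mu2, s2, rho):
    """First two moments of max(X, Y) for jointly Gaussian X, Y with
    means mu1, mu2, std devs s1, s2 and correlation rho (Clark's formulas)."""
    phi = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # N(0,1) pdf
    Phi = lambda z: 0.5 * (1 + math.erf(z / math.sqrt(2)))          # N(0,1) cdf
    theta = math.sqrt(s1**2 + s2**2 - 2 * rho * s1 * s2)
    a = (mu1 - mu2) / theta
    m1 = mu1 * Phi(a) + mu2 * Phi(-a) + theta * phi(a)
    m2 = ((mu1**2 + s1**2) * Phi(a) + (mu2**2 + s2**2) * Phi(-a)
          + (mu1 + mu2) * theta * phi(a))
    return m1, m2
```

For two independent standard normals this recovers the known values E[max] = 1/sqrt(pi) and E[max^2] = 1.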

Three simple and explicit procedures for testing the independence of two multi-dimensional random
variables are described. Two of the associated test statistics (L1, log-likelihood) are defined when the empirical
distribution of the variables is restricted to finite partitions. A third test statistic is defined as a kernel-based
independence measure. Two kinds of tests are provided. Distribution-free strong consistent tests are derived on the
basis of large deviation bounds on the test statistics: these tests make almost surely no Type I or Type II error after
a random sample size. Asymptotically alpha-level tests are obtained from the limiting distribution of the test statistics.
For the latter tests, the Type I error converges to a fixed non-zero value alpha, and the Type II error drops to zero, for
increasing sample size. All tests reject the null hypothesis of independence if the test statistics become large. The
performance of the tests is evaluated experimentally on benchmark data.
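The kernel-based statistic can be sketched with a biased empirical HSIC estimator (the Gaussian kernel and its bandwidth are illustrative choices; the paper's exact estimator and thresholds may differ):

```python
import numpy as np

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC with Gaussian kernels: trace(K H L H) / n^2.
    Larger values indicate stronger dependence between the two samples."""
    x = np.asarray(x, float).reshape(len(x), -1)
    y = np.asarray(y, float).reshape(len(y), -1)
    n = len(x)
    def gram(z):
        sq = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2 * sigma**2))
    K, L = gram(x), gram(y)
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    return np.trace(K @ H @ L @ H) / n**2

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
dep = hsic(x, x**2)                       # strongly dependent pair
ind = hsic(x, rng.standard_normal(100))   # independent pair
```

The test rejects independence when this statistic exceeds a threshold; on the dependent pair the statistic is clearly larger than on the independent one.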

Kernel Canonical Correlation Analysis is a very general technique for subspace learning that incorporates
PCA and LDA as special cases. Data acquired with functional magnetic resonance imaging (fMRI) are naturally
amenable to these techniques, as the data are well aligned; fMRI data of the human brain are a particularly interesting
candidate. In this study we implemented various supervised and semi-supervised versions of KCCA on human
fMRI data, with regression to single- and multi-variate labels (corresponding to video content subjects viewed
during the image acquisition). In each variate condition, the semi-supervised variants of KCCA performed better
than the supervised variants, including a supervised variant with Laplacian regularization. We additionally analyze
the weights learned by the regression in order to infer brain regions that are important to different types of visual
processing.

We consider three general classes of data-driven statistical tests. Neyman's smooth
tests, data-driven score tests and data-driven score tests for statistical inverse problems serve as
important special examples for the classes of tests under consideration. Our tests are additionally
incorporated with model selection rules. The rules are based on the penalization idea. Most of the
optimal penalties, derived in statistical literature, can be used in our tests.
We prove general consistency theorems for the tests from those classes. Our proofs make use of
large deviations inequalities for deterministic and random quadratic forms.
The paper shows that the tests can be applied for simple and composite parametric, semi- and
nonparametric hypotheses. Applications to testing in statistical inverse problems and statistics for
stochastic processes are also presented.

2008

Discovery of knowledge from geometric graph databases is of particular importance in chemistry and
biology, because chemical compounds and proteins are represented as graphs with 3D geometric coordinates. In
such applications, scientists are not interested in the statistics of the whole database. Instead they need information
about a novel drug candidate or protein at hand, represented as a query graph. We propose a polynomial-delay
algorithm for geometric frequent subgraph retrieval. It enumerates all subgraphs of a single given query graph
which are frequent geometric epsilon-subgraphs under the entire class of rigid geometric transformations in a database.
By using geometric epsilon-subgraphs, we achieve tolerance against variations in geometry. We compare the proposed
algorithm to gSpan on chemical compound data, and we show that for a given minimum support the total number
of frequent patterns is substantially limited by requiring geometric matching. Although the computation time per
pattern is larger than for non-geometric graph mining, the total time is within a reasonable level even for small
minimum support.

We investigate an implicit method to compute a piecewise linear representation of a surface from a
set of sample points. As implicit surface functions we use the weighted sum of piecewise linear kernel functions.
For such a function we can partition R^d in such a way that these functions are linear on the subsets of the partition.
For each subset in the partition we can then compute the zero level set of the function exactly as the intersection of
a hyperplane with the subset.

We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously
cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms
work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a
more informative visualization of complex data than simple clustering; in addition, taking into account the relations
between different clusters is shown to substantially improve the quality of the clustering, when compared
with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization
approach). We demonstrate our algorithm on image and text data.

In this paper we consider the problem of automatically learning the kernel from general kernel
classes. Specifically we build upon the Multiple Kernel Learning (MKL) framework and in particular on the work
of (Argyriou, Hauser, Micchelli, & Pontil, 2006). We will formulate a Semi-Infinite Program (SIP) to solve the
problem and devise a new algorithm to solve it (Infinite Kernel Learning, IKL). The IKL algorithm is applicable
to both the finite and infinite case and we find it to be faster and more stable than SimpleMKL (Rakotomamonjy,
Bach, Canu, & Grandvalet, 2007) for cases of many kernels. In the second part we present the first large scale
comparison of SVMs to MKL on a variety of benchmark datasets, also comparing IKL. The results show two
things: a) for many datasets there is no benefit in linearly combining kernels with MKL/IKL instead of the SVM
classifier, thus the flexibility of using more than one kernel seems to be of no use, b) on some datasets IKL yields
impressive increases in accuracy over SVM/MKL due to the possibility of using a largely increased kernel set. In
those cases, IKL remains practical, whereas both cross-validation and standard MKL are infeasible.

In this report we present new algorithms for non-negative matrix approximation (NMA),
commonly known as the NMF problem. Our methods improve upon the well-known methods of Lee &
Seung [19] for both the Frobenius norm as well the Kullback-Leibler divergence versions of the problem.
For the latter problem, our results are especially interesting because it seems to have witnessed much
less algorithmic progress than the Frobenius norm NMA problem. Our algorithms are
based on a particular block-iterative acceleration technique for EM, which preserves the multiplicative
nature of the updates and also ensures monotonicity. Furthermore, our algorithms also naturally apply
to the Bregman-divergence NMA algorithms of Dhillon and Sra [8]. Experimentally, we show that our
algorithms outperform the traditional Lee/Seung approach most of the time.
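The Lee & Seung baseline for the Frobenius case can be sketched in a few lines (initialization, iteration count and the small epsilon guarding against division by zero are illustrative choices):

```python
import numpy as np

def nmf_frobenius(V, k, n_iter=200, seed=0, eps=1e-9):
    """Lee & Seung multiplicative updates for min ||V - W H||_F^2 with
    W, H >= 0. The updates are multiplicative, so nonnegativity is
    preserved automatically, and the objective is non-increasing."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + 0.1
    H = rng.random((k, n)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(1).random((20, 15))
W, H = nmf_frobenius(V, 5)
```

The block-iterative acceleration discussed in the report keeps exactly this multiplicative form while speeding up convergence.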

The Euclidean K-means problem is fundamental to clustering and over the years it has been
intensely investigated. More recently, generalizations such as Bregman k-means [8], co-clustering [10],
and tensor (multi-way) clustering [40] have also gained prominence. A well-known computational difficulty
encountered by these clustering problems is the NP-Hardness of the associated optimization task,
and commonly used methods guarantee at most local optimality. Consequently, approximation algorithms
of varying degrees of sophistication have been developed, though largely for the basic Euclidean
K-means (or l1-norm K-median) problem. In this paper we present approximation algorithms for several
Bregman clustering problems by building upon the recent paper of Arthur and Vassilvitskii [5]. Our algorithms
obtain objective values within a factor O(log K) for Bregman k-means, Bregman co-clustering,
Bregman tensor clustering, and weighted kernel k-means. To our knowledge, except for some special
cases, approximation algorithms have not been considered for these general clustering problems. There
are several important implications of our work: (i) under the same assumptions as Ackermann et al. [1]
it yields a much faster algorithm (non-exponential in K, unlike [1]) for information-theoretic clustering,
(ii) it answers several open problems posed by [4], including generalizations to Bregman co-clustering,
and tensor clustering, (iii) it provides practical and easy-to-implement methods, in contrast to several
other common approximation approaches.
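The D^2 seeding of Arthur and Vassilvitskii, on which the paper builds, can be sketched in the Euclidean case (the Bregman variants replace the squared Euclidean distance with a Bregman divergence; the fallback branch guards against floating-point rounding):

```python
import random

def dsq_seeding(points, k, rng):
    """k-means++ (D^2) seeding: the first center is uniform, and each new
    center is sampled with probability proportional to its squared distance
    to the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        d2 = [min(sum((p - c) ** 2 for p, c in zip(pt, ctr))
                  for ctr in centers)
              for pt in points]
        r, acc = rng.random() * sum(d2), 0.0
        for pt, w in zip(points, d2):
            acc += w
            if acc >= r:          # sample proportional to d2
                centers.append(pt)
                break
        else:
            centers.append(points[-1])  # rounding fallback
    return centers

rng = random.Random(0)
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0), (-10.0, 5.0)]
centers = dsq_seeding(pts, 3, rng)
```

Because already-chosen centers have zero weight, well-separated clusters each tend to receive a seed, which is what drives the O(log K) guarantee.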

Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments, and to use this understanding to design future systems.