We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.
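For reference, the two quality measures are defined in standard notation, for a partition of the vertex set into a cluster $A$ and its complement $\bar{A}$, as

$$\mathrm{Ncut}(A) = \mathrm{cut}(A,\bar{A})\Big(\frac{1}{\mathrm{vol}(A)}+\frac{1}{\mathrm{vol}(\bar{A})}\Big), \qquad \mathrm{Cheeger}(A) = \frac{\mathrm{cut}(A,\bar{A})}{\min\big(\mathrm{vol}(A),\mathrm{vol}(\bar{A})\big)},$$

where $\mathrm{cut}(A,\bar{A})$ is the total weight of the edges between $A$ and $\bar{A}$ and $\mathrm{vol}(A)$ is the sum of the degrees of the vertices in $A$.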

2010

We study nonparametric regression between Riemannian manifolds based on regularized empirical risk minimization. Regularization functionals for mappings between manifolds should respect the geometry of the input and output manifolds and be independent of the chosen parametrization of the manifolds. We define and analyze the three simplest regularization functionals with these properties and present a rather general scheme for solving the resulting optimization problem. As application examples we discuss interpolation on the sphere, fingerprint processing, and correspondence computations between three-dimensional surfaces. We conclude by characterizing interesting and sometimes counterintuitive implications and new open problems that are specific to learning between Riemannian manifolds and are not encountered in multivariate regression in Euclidean space.
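Schematically, the learning problem has the usual regularized form, only with manifold-valued losses and penalties; here $d_N$ denotes the geodesic distance on the output manifold $N$ and $S$ is one of the parametrization-invariant smoothness functionals:

$$\hat{f} = \operatorname*{arg\,min}_{f\colon M \to N} \; \frac{1}{n}\sum_{i=1}^{n} d_N\big(f(x_i), y_i\big)^2 + \lambda\, S(f).$$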

The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. We study the
behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take
into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects.
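A minimal sketch of the quantities involved, assuming an undirected weighted graph given by a dense adjacency matrix: the commute distance can be computed from the pseudoinverse of the graph Laplacian, and the structure-blind limit expression the abstract refers to depends only on the vertex degrees.

```python
import numpy as np

def commute_distance(W):
    """Commute distance C_ij = vol(G) * (L+_ii + L+_jj - 2*L+_ij),
    with L+ the pseudoinverse of the unnormalized Laplacian L = D - W."""
    d = W.sum(axis=1)                  # degrees
    L = np.diag(d) - W                 # unnormalized graph Laplacian
    Lp = np.linalg.pinv(L)             # Moore-Penrose pseudoinverse
    dg = np.diag(Lp)
    return d.sum() * (dg[:, None] + dg[None, :] - 2.0 * Lp)

def degree_limit(W):
    """Degree-only expression vol(G) * (1/d_i + 1/d_j) which the raw
    commute distance approaches on large graphs."""
    d = W.sum(axis=1)
    return d.sum() * (1.0 / d[:, None] + 1.0 / d[None, :])
```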

Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data is often not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can already be observed on toy and real data for rather small sample sizes.
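To make the two graph types concrete, here is a hedged sketch of their construction and of evaluating Ncut for a given bipartition; the scikit-learn helpers and all parameter values are illustrative assumptions, not part of the paper.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

def build_graphs(X, k=10, r=0.5):
    """Unweighted kNN graph (symmetrized) and r-neighborhood graph."""
    knn = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    knn = knn.maximum(knn.T)  # edge if either point is among the kNN of the other
    rng = radius_neighbors_graph(X, radius=r, mode='connectivity')
    return knn, rng

def ncut(W, labels):
    """Normalized cut of the bipartition given by the boolean array `labels`."""
    W = np.asarray(W.todense())
    d = W.sum(axis=1)
    A = labels.astype(bool)
    cut = W[A][:, ~A].sum()
    return cut * (1.0 / d[A].sum() + 1.0 / d[~A].sum())
```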

We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, the mutual k-nearest-neighbor graph or the symmetric k-nearest-neighbor graph? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (of the order n rather than of the order log n) to maximize the probability of cluster identification. Second, the major difference between the mutual and the symmetric k-nearest-neighbor graph occurs when one attempts to detect the most significant cluster only.
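The two graph types differ only in the symmetrization rule, and the cluster identification criterion is a statement about connected components; a minimal sketch, assuming scipy/scikit-learn and a known ground-truth labeling:

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import kneighbors_graph

def knn_graphs(X, k):
    """Symmetric vs. mutual kNN graph from the directed kNN relation."""
    A = kneighbors_graph(X, n_neighbors=k, mode='connectivity')
    sym = A.maximum(A.T)   # edge if i is a kNN of j OR j is a kNN of i
    mut = A.minimum(A.T)   # edge if i is a kNN of j AND j is a kNN of i
    return sym, mut

def identifies_clusters(G, true_labels):
    """True iff graph components and true clusters induce the same partition."""
    n_comp, comp = connected_components(G, directed=False)
    pairs = set(zip(comp, true_labels))
    return len(pairs) == n_comp == len(set(true_labels))
```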

This paper discusses non-parametric regression between Riemannian manifolds. This learning problem arises frequently in many application areas, ranging from signal processing and computer vision over robotics to computer graphics. We present a new algorithmic scheme for the solution of this general learning problem based on regularized empirical risk minimization. The regularization functional takes into account the geometry of the input and output manifolds, and we show that it implements a particularly natural prior. Moreover, we demonstrate that our algorithm performs well in a difficult surface registration problem.

2008

We present a generalization of thin-plate splines for interpolation and approximation of manifold-valued data, and
demonstrate its usefulness in computer graphics with several applications from different fields. The cornerstone
of our theoretical framework is an energy functional for mappings between two Riemannian manifolds which
is independent of parametrization and respects the geometry of both manifolds. If the manifolds are Euclidean,
the energy functional reduces to the classical thin-plate spline energy. We show how the resulting optimization
problems can be solved efficiently in many cases. Our example applications range from orientation interpolation and motion planning in animation, through geometric modelling tasks, to color interpolation.
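In the Euclidean special case mentioned above, the energy functional reduces to the classical thin-plate spline energy, which for a map $f\colon R^2 \to R$ reads

$$E_{\mathrm{TPS}}(f) = \int_{R^2} f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \; dx\, dy,$$

i.e. the integrated squared second derivatives; the manifold-valued generalization replaces these by covariant second derivatives of the map.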

This technical report is an extended version of the appendix of Steinke et al., "Manifold-valued Thin-Plate Splines with Applications in Computer Graphics" (2008), containing the complete proofs, which had to be omitted there due to space restrictions. The report requires a basic knowledge of differential geometry; apart from that requirement it is self-contained.

With the help of differential geometry we describe a framework to define a thin-plate-spline-like energy for maps between arbitrary Riemannian manifolds. The so-called Eells energy only depends on the intrinsic geometry of the input and output manifolds, but not on their respective representation. The energy can then be used for regression between manifolds; we present results for cases where the outputs are rotations, sets of angles, or points on 3D surfaces. In the future we plan to also target regression where the output is an element of "shape space", understood as a Riemannian manifold. One could also further explore the meaning of the Eells energy when applied to diffeomorphisms between shapes, especially with regard to its potential use as a distance measure between shapes that does not depend on the embedding or the parametrisation of the shapes.

2007

Given a sample from a probability measure with support on a submanifold in Euclidean space, one can construct a neighborhood graph which can be seen as an approximation of the submanifold. The graph Laplacian of such a graph is used in several machine learning methods such as semi-supervised learning, dimensionality reduction, and clustering. In this paper we determine the pointwise limit of three different graph Laplacians used in the literature as the sample size increases and the neighborhood size approaches zero. We show that for a uniform measure on the submanifold all graph Laplacians have the same limit up to constants. However, in the case of a non-uniform measure on the submanifold, only the so-called random walk graph Laplacian converges to the weighted Laplace-Beltrami operator.
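The three graph Laplacians in question are, in standard notation with weight matrix $W$ and degree matrix $D$,

$$L_{\mathrm{un}} = D - W, \qquad L_{\mathrm{rw}} = I - D^{-1} W, \qquad L_{\mathrm{sym}} = I - D^{-1/2} W D^{-1/2},$$

the unnormalized, random walk, and symmetrically normalized graph Laplacian, respectively.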

We consider the problem of denoising a noisily sampled submanifold $M$ in $R^d$, where the submanifold $M$
is a priori unknown and we are only given a noisy point sample. The presented denoising algorithm is based
on a graph-based diffusion process of the point sample. We analyze this diffusion process using recent results about
the convergence of graph Laplacians. In the experiments we show that our method is capable of dealing with
non-trivial high-dimensional noise. Moreover, using the denoising algorithm as a pre-processing method, we can improve the results of a semi-supervised learning algorithm.
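A minimal sketch of one step of such a graph-based diffusion of the point sample, under illustrative assumptions (Gaussian edge weights on a symmetrized kNN graph and an explicit Euler step; the paper's exact update rule and parameter choices may differ):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def diffusion_step(X, k=10, sigma=1.0, dt=0.5):
    """Move every sample point toward the weighted mean of its graph
    neighbors: X <- X - dt * (I - D^{-1} W) X, a random walk diffusion step."""
    D2 = np.asarray(kneighbors_graph(X, n_neighbors=k, mode='distance').todense()) ** 2
    W = np.where(D2 > 0, np.exp(-D2 / sigma**2), 0.0)   # Gaussian edge weights
    W = np.maximum(W, W.T)                              # symmetrize the kNN graph
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # transition matrix
    return X - dt * (X - P @ X)
```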

A natural representation of data is given by the parameters which generated the data. If the parameter space is continuous, we can regard it as a manifold. In practice we usually do not know this manifold; we just have some representation of the data, often in a very high-dimensional feature space. Since the number of internal parameters does not change with the representation, the data will effectively lie on a low-dimensional submanifold in feature space. Due to measurement errors this data is usually corrupted by noise, which, particularly in high-dimensional feature spaces, makes it almost impossible to find the manifold structure. This paper reviews a method called Manifold Denoising, which projects the data onto the submanifold using a diffusion process on a graph generated by the data. We demonstrate that the method is capable of dealing with non-trivial high-dimensional noise. Moreover, we show that using the method as a preprocessing step one can significantly improve the results of a semi-supervised learning algorithm.

Assume we are given a sample of points from some underlying distribution which contains several distinct clusters. Our goal is to construct a neighborhood graph on the sample points such that clusters are "identified": that is, the subgraph induced by points from the same cluster is connected, while subgraphs corresponding to different clusters are not connected to each other. We derive bounds on the probability that cluster identification is successful, and use them to predict "optimal" values of k for the mutual and symmetric k-nearest-neighbor graphs. We point out different properties of the mutual and symmetric nearest-neighbor graphs related to the cluster identification problem.

The regularization functional induced by the graph Laplacian of a random neighborhood graph built on the data is adaptive in two ways: first, it adapts to an underlying manifold structure and, second, to the density of the data-generating probability measure. In this paper we identify the limit of the regularizer and show uniform convergence over the space of Hölder functions. As an intermediate step we derive upper bounds on the covering numbers of Hölder functions on compact Riemannian manifolds, which are of independent interest for the theoretical analysis of manifold-based learning methods.
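The regularization functional in question is the quadratic form of the graph Laplacian,

$$S_n(f) = \langle f, L f\rangle = \frac{1}{2}\sum_{i,j=1}^{n} w_{ij}\,\big(f(X_i)-f(X_j)\big)^2,$$

whose (suitably rescaled) limit is a Dirichlet-type energy $\int_M \|\nabla f\|^2\, p^{s}\, dV$ weighted by a power $s$ of the density that depends on the chosen normalization of the weights.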

We present a new method to estimate the intrinsic dimensionality of a submanifold M in Euclidean space from random samples. The method is based on the convergence rates of a certain U-statistic on the manifold. We at least partially resolve the question of how to choose the scale of the data. Moreover, the proposed method is easy to implement, can handle large data sets, and performs very well even for small sample sizes. We compare the proposed method to two standard estimators on several artificial as well as real data sets.
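A simplified, correlation-dimension-style sketch of the underlying idea (not the paper's exact statistic or scale selection): the fraction of point pairs within distance $h$ scales like $h^d$ for small $h$, so a log-log slope estimates the intrinsic dimension $d$.

```python
import numpy as np
from scipy.spatial.distance import pdist

def intrinsic_dimension(X, h1, h2):
    """Estimate d from the scaling U(h) ~ h^d of the pair-counting
    U-statistic U(h) = fraction of point pairs with distance <= h."""
    dists = pdist(X)                 # all pairwise Euclidean distances
    u1 = np.mean(dists <= h1)
    u2 = np.mean(dists <= h2)
    return (np.log(u2) - np.log(u1)) / (np.log(h2) - np.log(h1))
```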

Journal of Computer and System Sciences, 71(3):333-359, October 2005 (article)

In order to apply the maximum margin method in arbitrary metric spaces, we suggest embedding the metric space into a Banach or Hilbert space and performing linear classification in this space. We propose several embeddings and recall that an isometric embedding into a Banach space is always possible, while an isometric embedding into a Hilbert space is only possible for certain metric spaces. As a result, we obtain a general maximum margin classification algorithm for arbitrary metric spaces (whose solution is approximated by an algorithm of Graepel et al.). Interestingly enough, the embedding approach, when applied to a metric which can be embedded into a Hilbert space, yields the SVM algorithm, which emphasizes the fact that its solution depends on the metric and not on the kernel. Furthermore, we give upper bounds on the capacity of the function classes corresponding to both embeddings in terms of Rademacher averages. Finally, we compare the capacities of these function classes directly.
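The Hilbert space case can be made concrete with a small sketch: by Schoenberg's theorem, a metric embeds isometrically into a Hilbert space exactly when $-d^2$ is conditionally positive definite, in which case the double-centered matrix $-\frac{1}{2} J D^2 J$ is a valid Gram matrix and a standard SVM performs the linear maximum margin separation. The scikit-learn usage and the Euclidean metric are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.svm import SVC

def metric_max_margin(X, y):
    """Maximum margin classification from a metric alone: turn -d^2/2
    into a PSD Gram matrix by double centering (valid iff -d^2 is CPD)."""
    D2 = squareform(pdist(X)) ** 2        # replace by any metric with -d^2 CPD
    n = len(X)
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K = -0.5 * J @ D2 @ J                 # Gram matrix of an isometric embedding
    return SVC(kernel='precomputed').fit(K, y)
```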

In the machine learning community it is generally believed that graph Laplacians corresponding to a finite sample of data points converge to a continuous Laplace operator as the sample size increases. Even though this assertion serves as a justification for many Laplacian-based algorithms, so far only some aspects of this claim have been rigorously proved. In this paper we close this gap by establishing the strong pointwise consistency of a family of graph Laplacians with data-dependent weights, which converge to a weighted Laplace operator. Our investigation also includes the important case where the data lies on a submanifold of $R^d$.

We investigate the problem of defining Hilbertian metrics and positive definite kernels on probability measures, continuing previous work. This type of kernel has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsøe such that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second, we further investigate our approach to incorporate similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and gives in some cases a more efficient way to compute them. Finally, we compare all proposed kernels in two text and two image classification problems.
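As an illustration, two standard members of this class of Hilbertian metrics on probability measures, computed for discrete histograms (a hedged sketch; the paper's two-parameter family is more general):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance, a Hilbertian metric on probability vectors."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def sqrt_jensen_shannon(p, q):
    """Square root of the Jensen-Shannon divergence, also a Hilbertian metric."""
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    m = 0.5 * (p + q)
    return np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m))
```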

2004

We investigate the problem of defining Hilbertian metrics and positive definite kernels on probability measures, continuing previous work. This type of kernel has shown very good results in text classification and has a wide range of possible applications. In this paper we extend the two-parameter family of Hilbertian metrics of Topsøe such that it now includes all commonly used Hilbertian metrics on probability measures. This allows us to do model selection among these metrics in an elegant and unified way. Second, we further investigate our approach to incorporate similarity information of the probability space into the kernel. The analysis provides a better understanding of these kernels and gives in some cases a more efficient way to compute them. Finally, we compare all proposed kernels in two text and one image classification problem.

This paper gives a survey of results in the mathematical literature on positive definite kernels and their associated structures. We concentrate on properties which seem potentially relevant for machine learning and try to clarify some results that have been misused in the literature. Moreover, we consider different lines of generalization of positive definite kernels: we deal with operator-valued kernels and present the general framework of Hilbertian subspaces of Schwartz, which we use to introduce kernels which are distributions. Finally, indefinite kernels and their associated reproducing kernel spaces are considered.

In this article we construct a maximal margin classification algorithm for arbitrary metric spaces. First we show that the Support Vector Machine (SVM) is a maximal margin algorithm for the class of metric spaces where the negative squared distance is conditionally positive definite (CPD). This means that the metric space can be isometrically embedded into a Hilbert space, where one performs linear maximal margin separation. We show that the solution only depends on the metric, but not on the kernel. Following the framework we develop for the SVM, we construct an algorithm for maximal margin classification in arbitrary metric spaces. The main difference compared with the SVM is that we no longer embed isometrically into a Hilbert space, but into a Banach space. We further give an estimate of the capacity of the function class involved in this algorithm via Rademacher averages. We recover an algorithm of Graepel et al. [6].

We address in this paper the question of how the knowledge of the marginal distribution $P(x)$ can be incorporated in a learning algorithm. We suggest three theoretical methods for taking into account this distribution for regularization and provide links to existing graph-based semi-supervised learning algorithms. We also propose practical implementations.
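A concrete instance of the graph-based link mentioned above, written schematically: with $n$ labeled and $m$ unlabeled points, the marginal $P(x)$ enters through a data-dependent graph penalty on all $n+m$ points,

$$\min_{f\in R^{\,n+m}} \; \sum_{i=1}^{n}\big(f_i - y_i\big)^2 \;+\; \lambda \sum_{i,j=1}^{n+m} w_{ij}\,\big(f_i-f_j\big)^2,$$

which is the standard graph-based semi-supervised objective; the weights $w_{ij}$ are computed from the (mostly unlabeled) data and thus encode the marginal distribution.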

2002

We describe in this article a new code for evolving axisymmetric isolated systems in general relativity. Such systems are described by asymptotically flat space-times, which have the property that they admit a conformal extension. We work directly in the extended conformal manifold and numerically solve Friedrich's conformal field equations, which state that Einstein's equations hold in the physical space-time. Because of the compactness of the conformal space-time, the entire space-time can be calculated on a finite numerical grid. We describe in detail the numerical scheme, especially the treatment of the axisymmetry and the boundary.
