We study the scenario of graph-based clustering algorithms such as spectral clustering. Given a set of data points, one first has to construct a graph on the data points and then apply a graph clustering algorithm to find a suitable partition of the graph. Our main question is if and how the construction of the graph (choice of the graph, choice of parameters, choice of weights) influences the outcome of the final clustering result. To this end we study the convergence of cluster quality measures such as the normalized cut or the Cheeger cut on various kinds of random geometric graphs as the sample size tends to infinity. It turns out that the limit values of the same objective function are systematically different on different types of graphs. This implies that clustering results systematically depend on the graph and can be very different for different types of graph. We provide examples to illustrate the implications on spectral clustering.
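
For reference, the two cluster quality measures named above are usually defined as follows (standard textbook definitions; the paper may use a slightly different normalization). For a partition of the vertex set into A and its complement \bar A, with cut(A, \bar A) the total weight of edges crossing the partition and vol(A) the sum of the degrees of the vertices in A:

    \mathrm{Ncut}(A,\bar A) = \mathrm{cut}(A,\bar A)\left(\frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(\bar A)}\right),
    \qquad
    \mathrm{CheegerCut}(A,\bar A) = \frac{\mathrm{cut}(A,\bar A)}{\min\{\mathrm{vol}(A),\,\mathrm{vol}(\bar A)\}}.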

2013


We examine whether the quality of different clustering algorithms can be compared by a general, scientifically sound procedure which is independent of particular clustering algorithms. We argue that the major obstacle is the difficulty in evaluating a clustering algorithm without taking into account the context: why does the user cluster his data in the first place, and what does he want to do with the clustering afterwards? We argue that clustering should not be treated as an application-independent mathematical problem, but should always be studied in the context of its end-use. Different techniques to evaluate clustering algorithms have to be developed for different uses of clustering. To simplify this procedure we argue that it will be useful to build a "taxonomy of clustering problems" to identify clustering applications which can be treated in a unified way and that such an effort will be more fruitful than attempting the impossible: developing "optimal" domain-independent clustering algorithms or even classifying clustering algorithms in terms of how they work.


We study the family of p-resistances on graphs for p ≥ 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p = 1 the p-resistance coincides with the shortest path distance, for p = 2 it coincides with the standard resistance distance, and for p → ∞ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds p^* and p^** such that if p < p^*, then the p-resistance depends on meaningful global properties of the graph, whereas if p > p^**, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p^* = 1 + 1/(d-1) and p^** = 1 + 1/(d-2), where d is the dimension of the underlying space (we believe that the small gap between p^* and p^** is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest using q-Laplacians as regularizers, where q satisfies 1/p^* + 1/q = 1.
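
For orientation, a common flow-based formulation of the p-resistance in the literature is the following (the paper's exact normalization may differ); here r_e = 1/w_e denotes the resistance of edge e and the minimum is taken over unit flows i from s to t:

    R_p(s,t) \;=\; \min\Big\{ \sum_{e \in E} r_e\, |i_e|^p \;:\; i \text{ is a unit flow from } s \text{ to } t \Big\}.

For p = 2 this recovers the standard effective resistance (Thomson's principle); for p = 1 the optimal flow uses a single cheapest path, which gives the shortest path distance with edge lengths r_e.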

Statistical learning theory provides the theoretical basis for many of today's machine learning algorithms and is arguably one of the most beautifully developed branches of artificial intelligence in general. It originated in Russia in the 1960s and gained wide popularity in the 1990s following the development of the so-called Support Vector Machine (SVM), which has become a standard tool for pattern recognition in a variety of domains ranging from computer vision to computational biology. Providing the basis of new learning algorithms, however, was not the only motivation for developing statistical learning theory. It was just as much a philosophical one, attempting to answer the question of what it is that allows us to draw valid conclusions from empirical data. In this article we attempt to give a gentle, non-technical overview of the key ideas and insights of statistical learning theory. We do not assume that the reader has a deep background in mathematics, statistics, or computer science. Given the nature of the subject matter, however, some familiarity with mathematical concepts and notations and some intuitive understanding of basic probability is required. There exist many excellent references to more technical surveys of the mathematics of statistical learning theory: the monographs by one of the founders of statistical learning theory ([Vapnik, 1995], [Vapnik, 1998]), a brief overview of statistical learning theory in Section 5 of [Schölkopf and Smola, 2002], more technical overview papers such as [Bousquet et al., 2003], [Mendelson, 2003], [Boucheron et al., 2005], [Herbrich and Williamson, 2002], and the monograph [Devroye et al., 1996].

Nearest neighbor (k-NN) graphs are widely used in machine learning and data mining applications, and our aim is to better understand what they reveal about the cluster structure of the unknown underlying distribution of points. Moreover, is it possible to identify spurious structures that might arise due to sampling variability? Our first contribution is a statistical analysis that reveals how certain subgraphs of a k-NN graph form a consistent estimator of the cluster tree of the underlying distribution of points. Our second and perhaps most important contribution is the following finite sample guarantee. We carefully work out the tradeoff between aggressive and conservative pruning and are able to guarantee the removal of all spurious cluster structures while at the same time guaranteeing the recovery of salient clusters. This is the first such finite sample result in the context of clustering.
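
A rough illustration of the kind of estimator involved, assuming a k-NN density estimate and the symmetric k-NN graph (the function name, the thresholding rule, and the constants below are ours; the paper's subgraphs and in particular its pruning step are more refined than this sketch):

    import numpy as np

    def knn_cluster_tree_level(X, k=10, level=0.5):
        """Sketch: connected components of a k-NN graph restricted to 'high density'
        points, as a crude estimate of one level of the cluster tree. Generic
        illustration only, not the paper's exact estimator or pruning rule."""
        n, d = X.shape
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        order = np.argsort(D, axis=1)
        r_k = D[np.arange(n), order[:, k]]          # distance to the k-th neighbor
        density = k / (n * r_k ** d)                # simple k-NN density estimate (up to constants)
        keep = np.where(density >= level * density.max())[0]
        # adjacency of the symmetric k-NN graph restricted to the kept points
        knn = np.zeros((n, n), dtype=bool)
        knn[np.arange(n)[:, None], order[:, 1:k + 1]] = True
        adj = (knn | knn.T)[np.ix_(keep, keep)]
        # connected components by depth-first search
        labels = -np.ones(len(keep), dtype=int)
        comp = 0
        for start in range(len(keep)):
            if labels[start] >= 0:
                continue
            stack = [start]
            labels[start] = comp
            while stack:
                v = stack.pop()
                for u in np.where(adj[v])[0]:
                    if labels[u] < 0:
                        labels[u] = comp
                        stack.append(u)
            comp += 1
        return keep, labels

Varying the threshold `level` and tracking how components split gives the (estimated) cluster tree; the contribution of the paper is which subgraphs to use and how to prune spurious components away.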

We derive a generalized notion of f-divergences, called (f,l)-divergences. We show that this generalization enjoys many of the nice properties of f-divergences, although it is a richer family. It also provides alternative definitions of standard divergences in terms of surrogate risks. As a first practical application of this theory, we derive a new estimator for the Kullback-Leibler divergence that we use for clustering sets of vectors.
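
For context, the classical f-divergence that this family generalizes is defined, for a convex function f with f(1) = 0 and distributions P, Q with densities p, q, as

    D_f(P \,\|\, Q) \;=\; \int q(x)\, f\!\left(\frac{p(x)}{q(x)}\right) dx,

with f(t) = t log t yielding the Kullback-Leibler divergence; the (f,l)-divergences themselves and their surrogate-risk characterization are defined in the paper.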

The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects.
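
A minimal sketch of the quantities involved, assuming an undirected graph given by a weighted adjacency matrix W: the commute distance can be computed from the pseudoinverse of the graph Laplacian via the standard identity C(u,v) = vol(G) R(u,v), and the degenerate expression it approaches on large geometric graphs depends only on the two degrees. The amplified commute distance adds further correction terms and is not reproduced here.

    import numpy as np

    def commute_distance(W, u, v):
        """Commute distance between vertices u and v of an undirected graph with
        weighted adjacency matrix W, via the pseudoinverse of the graph Laplacian."""
        d = W.sum(axis=1)                      # degrees
        L = np.diag(d) - W                     # unnormalized graph Laplacian
        L_pinv = np.linalg.pinv(L)
        e = np.zeros(len(d)); e[u], e[v] = 1.0, -1.0
        resistance = e @ L_pinv @ e            # effective resistance R(u, v)
        return d.sum() * resistance            # C(u, v) = vol(G) * R(u, v)

    def degenerate_limit(W, u, v):
        """Degree-only expression vol(G) * (1/d_u + 1/d_v) that the commute distance
        approaches on large graphs, which is why the raw distance becomes uninformative."""
        d = W.sum(axis=1)
        return d.sum() * (1.0 / d[u] + 1.0 / d[v])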

Foundations and Trends in Machine Learning, 2(3):235-274, July 2010 (article)

A popular method for selecting the number of clusters is based on stability arguments: one chooses the number of clusters such that the corresponding clustering results are "most stable". In recent years, a series of papers has analyzed the behavior of this method from a theoretical point of view. However, the results are very technical and difficult to interpret for non-experts. In this paper we give a high-level overview about the existing literature on clustering stability. In addition to presenting the results in a slightly informal but accessible way, we relate them to each other and discuss their different implications.

We consider the problem of local graph clustering where the aim is to discover the local cluster corresponding to a point of interest. The most popular algorithms to solve this problem start a random walk at the point of interest and let it run until some stopping criterion is met. The vertices visited are then considered the local cluster. We suggest a more powerful alternative, the multi-agent random walk. It consists of several agents connected by a fixed rope of length l. All agents move independently like a standard random walk on the graph, but they are constrained to have distance at most l from each other. The main insight is that for several agents it is harder to simultaneously travel over the bottleneck of a graph than for just one agent. Hence, the multi-agent random walk has less tendency to mistakenly merge two different clusters than the original random walk. In our paper we analyze the multi-agent random walk theoretically and compare it experimentally to the major local graph clustering algorithms from the literature. We find that our multi-agent random walk consistently outperforms these algorithms.
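
A toy sketch of the idea, assuming an unweighted graph given as an adjacency-list dict and using graph distance for the rope constraint; the function names are ours, and the paper's precise transition rule and stopping criterion differ from this simplified rejection scheme:

    import random
    from collections import deque

    def graph_dist(adj, s, t, cutoff):
        """BFS distance between s and t, or cutoff + 1 if it exceeds cutoff."""
        if s == t:
            return 0
        seen, frontier = {s}, deque([(s, 0)])
        while frontier:
            v, d = frontier.popleft()
            if d >= cutoff:
                continue
            for u in adj[v]:
                if u == t:
                    return d + 1
                if u not in seen:
                    seen.add(u)
                    frontier.append((u, d + 1))
        return cutoff + 1

    def multi_agent_walk(adj, start, n_agents=3, rope=2, steps=1000, seed=0):
        """Multi-agent random walk sketch: agents start at the point of interest and
        take single-agent random-walk steps, but a proposed step is rejected whenever
        it would put two agents at graph distance greater than `rope`.
        Returns visit counts, which can be thresholded to read off a local cluster."""
        rng = random.Random(seed)
        agents = [start] * n_agents
        visits = {start: n_agents}
        for _ in range(steps):
            i = rng.randrange(n_agents)                      # pick an agent
            proposal = rng.choice(list(adj[agents[i]]))      # propose a neighbor move
            ok = all(graph_dist(adj, proposal, agents[j], rope) <= rope
                     for j in range(n_agents) if j != i)
            if ok:                                           # accept only if the rope allows it
                agents[i] = proposal
                visits[proposal] = visits.get(proposal, 0) + 1
        return visits

The rejection step is exactly what makes it hard for the agents to cross a bottleneck: all of them would have to be near the bottleneck at the same time.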

Clustering is often formulated as a discrete optimization problem. The objective is to find, among all partitions of the data set, the best one according to some quality measure. However, in the statistical setting where we assume that the finite data set has been sampled from some underlying space, the goal is not to find the best partition of the given sample, but to approximate the true partition of the underlying space. We argue that the discrete optimization approach usually does not achieve this goal, and instead can lead to inconsistency. We construct examples which provably have this behavior. As in the case of supervised learning, the cure is to restrict the size of the function classes under consideration. For appropriate small function classes we can prove very general consistency theorems for clustering optimization schemes. As one particular algorithm for clustering with a restricted function space we introduce nearest neighbor clustering. Similar to the k-nearest neighbor classifier in supervised learning, this algorithm can be seen as a general baseline algorithm to minimize arbitrary clustering objective functions. We prove that it is statistically consistent for all commonly used clustering objective functions.
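
A brute-force sketch of the restricted-function-class idea, assuming a small number of randomly chosen seed points and within-cluster sum of squares as the objective; the function name and parameters are ours, the paper treats general objectives, and it uses a more careful optimization than exhaustive search over seed labellings:

    import numpy as np
    from itertools import product

    def nearest_neighbor_clustering(X, n_clusters=2, n_seeds=6, seed=0):
        """Sketch of nearest neighbor clustering: candidate clusterings are constant
        on the nearest-neighbor (Voronoi) cells of a few seed points, and we pick the
        seed labelling that minimizes the objective (here: within-cluster sum of squares)."""
        rng = np.random.default_rng(seed)
        seeds = rng.choice(len(X), size=n_seeds, replace=False)
        # assign every point to its nearest seed (this defines the restricted function class)
        dist_to_seeds = np.linalg.norm(X[:, None, :] - X[seeds][None, :, :], axis=-1)
        cell = dist_to_seeds.argmin(axis=1)
        best_labels, best_obj = None, np.inf
        for labelling in product(range(n_clusters), repeat=n_seeds):
            labels = np.asarray(labelling)[cell]
            counts = np.bincount(labels, minlength=n_clusters)
            if counts.min() == 0:
                continue                                     # skip labellings with an empty cluster
            obj = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
                      for c in range(n_clusters))
            if obj < best_obj:
                best_obj, best_labels = obj, labels
        return best_labels

Restricting the search to functions that are constant on the seed cells is what plays the role of a small function class in the consistency argument.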

Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in the form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different from Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes.

We study clustering algorithms based on neighborhood graphs on a random sample of data points. The question we ask is how such a graph should be constructed in order to obtain optimal clustering results. Which type of neighborhood graph should one choose, mutual k-nearest-neighbor or symmetric k-nearest-neighbor? What is the optimal parameter k? In our setting, clusters are defined as connected components of the t-level set of the underlying probability distribution. Clusters are said to be identified in the neighborhood graph if connected components in the graph correspond to the true underlying clusters. Using techniques from random geometric graph theory, we prove bounds on the probability that clusters are identified successfully, both in a noise-free and in a noisy setting. Those bounds lead to several conclusions. First, k has to be chosen surprisingly high (of the order n rather than of the order log n) to maximize the probability of cluster identification. Secondly, the major difference between the mutual and the symmetric k-nearest-neighbor graph occurs when one attempts to detect the most significant cluster only.
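
For concreteness, the two graph types compared above can be built as follows (a small sketch working directly from a distance matrix; the function name is ours):

    import numpy as np

    def knn_graphs(X, k):
        """Build the symmetric and the mutual k-nearest-neighbor graph on the sample X.
        Symmetric kNN: edge {i, j} if i is among the k nearest neighbors of j OR vice versa.
        Mutual kNN:    edge {i, j} only if each is among the k nearest neighbors of the other."""
        n = len(X)
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        order = np.argsort(D, axis=1)
        directed = np.zeros((n, n), dtype=bool)
        directed[np.arange(n)[:, None], order[:, 1:k + 1]] = True   # skip self at column 0
        symmetric_knn = directed | directed.T
        mutual_knn = directed & directed.T
        return symmetric_knn, mutual_knn

The mutual graph is always a subgraph of the symmetric one, which is why the two behave most differently when only the most significant cluster has to be detected.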

We generalize traditional goals of clustering towards distinguishing components in a non-parametric mixture model. The clusters are not necessarily based on point locations, but on higher order criteria. This framework can be implemented by embedding probability distributions in a Hilbert space. The corresponding clustering objective is very general and relates to a range of common clustering concepts.
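
One standard way to realize such an embedding (not necessarily the construction used in the paper) is the kernel mean map: for a positive definite kernel k with reproducing kernel Hilbert space H, a distribution P is represented by the element

    \mu_P \;=\; \mathbb{E}_{x \sim P}\,[k(x,\cdot)] \;\in\; \mathcal{H},

and the distance \|\mu_P - \mu_Q\|_{\mathcal{H}} (the maximum mean discrepancy) then quantifies how different two mixture components are beyond their point locations.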

We present a geometric method to determine confidence sets for the ratio E(Y)/E(X) of the means of random variables X and Y. This method reduces the problem of constructing confidence sets for the ratio of two random variables to the problem of constructing confidence sets for the means of one-dimensional random variables. It is valid in a large variety of circumstances. In the case of normally distributed random variables, the confidence sets constructed in this way coincide with the standard Fieller confidence sets. Generalizations of our construction lead to definitions of exact and conservative confidence sets for very general classes of distributions, provided the joint expectation of (X,Y) exists and the linear combinations of the form aX + bY are well-behaved. Finally, our geometric method allows us to derive a very simple bootstrap approach for constructing conservative confidence sets for ratios, which performs favorably in certain situations, in particular in the asymmetric heavy-tailed regime.
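
As a point of comparison, a plain percentile bootstrap for the ratio of means looks as follows (a generic sketch assuming paired samples; the paper's geometric construction and its conservative bootstrap variant are different and come with guarantees this naive version lacks):

    import numpy as np

    def bootstrap_ratio_ci(x, y, alpha=0.05, n_boot=10000, seed=0):
        """Naive percentile-bootstrap confidence interval for E(Y)/E(X),
        resampling (X, Y) pairs jointly. Assumes paired samples of equal length."""
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        n = len(x)
        idx = rng.integers(0, n, size=(n_boot, n))           # bootstrap resamples of indices
        ratios = y[idx].mean(axis=1) / x[idx].mean(axis=1)   # ratio of means per resample
        return np.quantile(ratios, [alpha / 2, 1 - alpha / 2])

Such a naive interval can behave poorly when E(X) is close to zero or the tails are heavy, which is exactly the regime the geometric construction is designed to handle.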

In this paper, we investigate stability-based methods for cluster model selection, in particular to select the number K of clusters. The scenario under consideration is that clustering is performed by minimizing a certain clustering quality function, and that a unique global minimizer exists. On the one hand we show that stability can be upper bounded by certain properties of the optimal clustering, namely by the mass in a small tube around the cluster boundaries. On the other hand, we provide counterexamples which show that a reverse statement is not true in general. Finally, we give some examples and arguments why, from a theoretical point of view, using clustering stability in a large sample setting can be problematic. It can be seen that distribution-free guarantees bounding the difference between the finite sample stability and the true stability cannot exist, unless one makes strong assumptions on the underlying distribution.
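
For readers unfamiliar with the protocol, a generic instability estimate for a candidate K looks roughly like this (a sketch using k-means and the adjusted Rand index as the agreement measure; the papers analyzed here use their own quality functions and distances between clusterings):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score

    def instability(X, n_clusters, n_pairs=20, subsample=0.8, seed=0):
        """Estimate clustering instability for a candidate number of clusters:
        cluster pairs of random subsamples, compare the two labellings on the
        points they share, and average the disagreement (1 - adjusted Rand index)."""
        rng = np.random.default_rng(seed)
        n = len(X)
        m = int(subsample * n)
        scores = []
        for _ in range(n_pairs):
            a = rng.choice(n, size=m, replace=False)
            b = rng.choice(n, size=m, replace=False)
            shared = np.intersect1d(a, b)
            if len(shared) < n_clusters:
                continue
            la = KMeans(n_clusters, n_init=10, random_state=0).fit(X[a]).predict(X[shared])
            lb = KMeans(n_clusters, n_init=10, random_state=0).fit(X[b]).predict(X[shared])
            scores.append(1.0 - adjusted_rand_score(la, lb))
        return float(np.mean(scores))

One would then pick the K with the smallest instability; the results discussed above explain why this heuristic can be misleading, especially when the sample size is large.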
