Characterization of Linkage Based Clustering

Slides

Related Open Educational Resources

Related content

Report a problem or upload files

If you have found a problem with this lecture or would like to send us extra material, articles, exercises, etc., please use our ticket system to describe your request and upload the data.
Enter your e-mail into the 'Cc' field, and we will keep you updated with your request's status.

Description

There are a wide variety of clustering algorithms that, when run on the same data, often produce
very different clusterings. Yet there is no principled method to guide the selection of a clustering
algorithm. The choice of an appropriate clustering is, of course, task dependent. As such, we must
rely on domain knowledge. The challenge is to communicate such knowledge between the domain
expert and the algorithm designer. One approach to providing guidance to clustering users in the
selection of a clustering algorithm is to identify important properties that a user may want an algorithm
to satisfy, and determine which algorithms satisfy each of these properties. Clustering users
can then utilize prior knowledge to determine the properties that make sense for their application.

Ultimately, there would be a sufficiently rich set of properties that would provide detailed enough
guidelines for a wide variety of clustering users. For a property to be useful, a user needs to be
able to easily determine the desirability of the property. Such a description of clustering algorithms
would yield principled guidelines for clustering algorithm selection by answering a series of simple
questions. Bosagh Zadeh and Ben-David [1] make progress in this direction by providing a set
of abstract properties that characterize single linkage. In this work, we give another result in the
same direction by characterizing a family of clustering algorithms. These are initial steps toward the
ambitious program of developing broad guidelines for clustering algorithm selection.

Linkage-based clustering is one of the most commonly-used and widely-studied clustering
paradigms. We provide a surprisingly simple set of properties that uniquely identify linkage-based
clustering algorithms. Our characterization highlights how linkage-based algorithms compare to
other clustering algorithms.

Combining previously proposed properties with our newly proposed ones, we show how these properties
partition the space of commonly-used clustering algorithms. Specifically, we show which of
these properties are satisfied by common linkage-based, centroid-based, and spectral clustering algorithms.
We hope that this analysis, as well as our characterization of linkage-based clustering,
will provide useful guidelines for users in selecting clustering algorithms.