Divisive clustering

So far we have only looked at
agglomerative clustering, but a cluster hierarchy can also
be generated top-down. This variant of hierarchical
clustering is called top-down clustering or
divisive clustering . We start at the top with all
documents in one cluster. The cluster is split using a flat
clustering algorithm. This procedure is applied recursively
until each document is in its own singleton cluster.

Top-down clustering is conceptually more complex than
bottom-up clustering since we need a second, flat clustering
algorithm as a ``subroutine''. It has the advantage of being
more efficient if we do not generate a complete hierarchy
all the way down to individual document leaves. For a fixed
number of top levels, using an efficient flat algorithm like
-means, top-down algorithms are linear in the number of
documents and clusters. So they run much faster than HAC
algorithms, which are at least quadratic.

There is evidence that divisive algorithms produce more accurate
hierarchies than bottom-up algorithms in some
circumstances. See the references on
bisecting -means in Section 17.9 . Bottom-up methods
make clustering decisions based on local patterns without
initially taking into account the global distribution. These
early decisions cannot be undone. Top-down clustering
benefits from complete information about the global
distribution when making top-level partitioning decisions.