How to Explain
Hierarchical Clustering

Given a set of N items to be clustered, and an NxN distance (or similarity) matrix, the
basic process of Johnson's (1967) hierarchical clustering is this:

Start by assigning each item to its own cluster, so that if you have N items, you now
have N clusters, each containing just one item. Let the distances (similarities) between
the clusters equal the distances (similarities) between the items they contain.

Find the closest (most similar) pair of clusters and merge them into a single cluster,
so that now you have one less cluster.

Compute distances (similarities) between the new cluster and each of the old clusters.

Repeat steps 2 and 3 until all items are clustered into a single cluster of size N.

Step 3 can be done in different ways, which is what distinguishes single-link
from complete-link and average-link clustering. In single-link
clustering (also called the connectedness or minimum method), we
consider the distance between one cluster and another cluster to be equal to the shortest
distance from any member of one cluster to any member of the other cluster. If the data
consist of similarities, we consider the similarity between one cluster and another
cluster to be equal to the greatest similarity from any member of one cluster to
any member of the other cluster. In complete-link clustering (also called the diameter
or maximum method), we consider the distance between one cluster and another
cluster to be equal to the longest distance from any member of one cluster to any
member of the other cluster. In average-link clustering, we consider the distance
between one cluster and another cluster to be equal to the average distance from
any member of one cluster to any member of the other cluster. A variation on average-link
clustering is the UCLUS method of D'Andrade (1978) which uses the median distance.

Example. The following pages trace a hierarchical clustering of
distances in miles between U.S. cities. The method of clustering is single-link.

Input distance matrix:

BOS

NY

DC

MIA

CHI

SEA

SF

LA

DEN

BOS

0

206

429

1504

963

2976

3095

2979

1949

NY

206

0

233

1308

802

2815

2934

2786

1771

DC

429

233

0

1075

671

2684

2799

2631

1616

MIA

1504

1308

1075

0

1329

3273

3053

2687

2037

CHI

963

802

671

1329

0

2013

2142

2054

996

SEA

2976

2815

2684

3273

2013

0

808

1131

1307

SF

3095

2934

2799

3053

2142

808

0

379

1235

LA

2979

2786

2631

2687

2054

1131

379

0

1059

DEN

1949

1771

1616

2037

996

1307

1235

1059

0

The nearest pair of cities is BOS and NY, at distance 206. These are merged into a
single cluster called "BOS/NY".

Then we compute the distance from this new compound object to all other objects. In
single link clustering the rule is that the distance from the compound object to another
object is equal to the shortest distance from any member of the cluster to the outside
object. So the distance from "BOS/NY" to DC is chosen to be 233, which is the
distance from NY to DC. Similarly, the distance from "BOS/NY" to DEN is chosen
to be 1771.

After merging BOS with NY:

BOS/NY

DC

MIA

CHI

SEA

SF

LA

DEN

BOS/NY

0

223

1308

802

2815

2934

2786

1771

DC

223

0

1075

671

2684

2799

2631

1616

MIA

1308

1075

0

1329

3273

3053

2687

2037

CHI

802

671

1329

0

2013

2142

2054

996

SEA

2815

2684

3273

2013

0

808

1131

1307

SF

2934

2799

3053

2142

808

0

379

1235

LA

2786

2631

2687

2054

1131

379

0

1059

DEN

1771

1616

2037

996

1307

1235

1059

0

The nearest pair of objects is BOS/NY and DC, at distance 223. These are merged into a
single cluster called "BOS/NY/DC". Then we compute the distance from this new
cluster to all other clusters, to get a new distance matrix:

After merging DC with BOS-NY:

BOS/NY/DC

MIA

CHI

SEA

SF

LA

DEN

BOS/NY/DC

0

1075

671

2684

2799

2631

1616

MIA

1075

0

1329

3273

3053

2687

2037

CHI

671

1329

0

2013

2142

2054

996

SEA

2684

3273

2013

0

808

1131

1307

SF

2799

3053

2142

808

0

379

1235

LA

2631

2687

2054

1131

379

0

1059

DEN

1616

2037

996

1307

1235

1059

0

Now, the nearest pair of objects is SF and LA, at distance 379. These are merged into a
single cluster called "SF/LA". Then we compute the distance from this new
cluster to all other objects, to get a new distance matrix:

After merging SF with LA:

BOS/

NY/DC

MIA

CHI

SEA

SF/LA

DEN

BOS/NY/DC

0

1075

671

2684

2631

1616

MIA

1075

0

1329

3273

2687

2037

CHI

671

1329

0

2013

2054

996

SEA

2684

3273

2013

0

808

1307

SF/LA

2631

2687

2054

808

0

1059

DEN

1616

2037

996

1307

1059

0

Now, the nearest pair of objects is CHI and BOS/NY/DC, at distance 671. These are
merged into a single cluster called "BOS/NY/DC/CHI". Then we compute the
distance from this new cluster to all other clusters, to get a new distance matrix:

After merging CHI with BOS/NY/DC:

BOS/NY/DC/

CHI

MIA

SEA

SF/LA

DEN

BOS/NY/DC/CHI

0

1075

2013

2054

996

MIA

1075

0

3273

2687

2037

SEA

2013

3273

0

808

1307

SF/LA

2054

2687

808

0

1059

DEN

996

2037

1307

1059

0

Now, the nearest pair of objects is SEA and SF/LA, at distance 808. These are merged
into a single cluster called "SF/LA/SEA". Then we compute the distance from this
new cluster to all other clusters, to get a new distance matrix:

After merging SEA with SF/LA:

BOS/NY/DC/CHI

MIA

SF/LA/SEA

DEN

BOS/NY/DC/CHI

0

1075

2013

996

MIA

1075

0

2687

2037

SF/LA/SEA

2054

2687

0

1059

DEN

996

2037

1059

0

Now, the nearest pair of objects is DEN and BOS/NY/DC/CHI, at distance 996. These are
merged into a single cluster called "BOS/NY/DC/CHI/DEN". Then we compute the
distance from this new cluster to all other clusters, to get a new distance matrix:

After merging DEN with BOS/NY/DC/CHI:

BOS/NY/DC/CHI/DEN

MIA

SF/LA/SEA

BOS/NY/DC/CHI/DEN

0

1075

1059

MIA

1075

0

2687

SF/LA/SEA

1059

2687

0

Now, the nearest pair of objects is BOS/NY/DC/CHI/DEN and SF/LA/SEA, at distance 1059.
These are merged into a single cluster called "BOS/NY/DC/CHI/DEN/SF/LA/SEA".
Then we compute the distance from this new compound object to all other objects, to get a
new distance matrix:

After merging SF/LA/SEA with BOS/NY/DC/CHI/DEN:

BOS/NY/DC/CHI/DEN/SF/LA/SEA

MIA

BOS/NY/DC/CHI/DEN/SF/LA/SEA

0

1075

MIA

1075

0

Finally, we merge the last two clusters at level 1075. This process is summarized by
the clustering diagram printed by many software packages:

In the diagram, the columns are associated with the items and the rows are associated
with levels (stages) of clustering. An 'X' is placed between two columns in a given row if
the corresponding items are merged at that stage in the clustering.