Description

Given $n$ objects with $p$ variables measured on each object, $x_{ij}$, for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, p$, nag_mv_cluster_kmeans (g03ef) allocates each object to one of $K$ groups or clusters to minimize the within-cluster sum of squares:

$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} \left(x_{ij} - \bar{x}_{kj}\right)^2,$$

where $S_k$ is the set of objects in the $k$th cluster and $\bar{x}_{kj}$ is the mean for the variable $j$ over cluster $k$. This is often known as $K$-means clustering.
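As an illustration of the criterion defined above, the following minimal Python sketch evaluates the within-cluster sum of squares for a given allocation. The function name `within_cluster_ss`, the 0-based cluster labels, and the list-of-rows data layout are assumptions made for this example; they are not part of the NAG interface.

```python
def within_cluster_ss(x, labels, n_clusters):
    """Within-cluster sum of squares for data rows x and 0-based labels."""
    p = len(x[0])
    total = 0.0
    for k in range(n_clusters):
        members = [row for row, lab in zip(x, labels) if lab == k]
        if not members:
            continue  # an empty cluster contributes nothing
        # mean of each variable j over cluster k
        means = [sum(row[j] for row in members) / len(members) for j in range(p)]
        total += sum((row[j] - means[j]) ** 2 for row in members for j in range(p))
    return total


x = [[1.0, 2.0], [3.0, 4.0], [10.0, 10.0]]
labels = [0, 0, 1]
print(within_cluster_ss(x, labels, 2))  # 4.0
```

The first two objects share a cluster with mean (2, 3) and contribute 4 in total; the third object is alone in its cluster and contributes 0.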

In addition to the data matrix, a $K$ by $p$ matrix giving the initial cluster centres for the $K$ clusters is required. The objects are then initially allocated to the cluster with the nearest cluster mean. Given the initial allocation, the procedure is to iteratively search for the $K$-partition with locally optimal within-cluster sum of squares by moving points from one cluster to another.
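The two stages described above can be sketched in Python as follows. This is a deliberately naive illustration, not the routine's actual implementation: it allocates each object to the nearest initial centre, then repeatedly tries moving single points between clusters and keeps a move only if it lowers the within-cluster sum of squares, which is recomputed from scratch each time. All names (`initial_allocation`, `transfer_search`, and so on) are hypothetical.

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def initial_allocation(x, centres):
    """Assign each object to the nearest initial centre (ties go to the lower index)."""
    return [min(range(len(centres)), key=lambda k: sq_dist(row, centres[k]))
            for row in x]


def within_ss(x, labels, n_clusters):
    total = 0.0
    for k in range(n_clusters):
        members = [row for row, lab in zip(x, labels) if lab == k]
        means = [sum(col) / len(members) for col in zip(*members)]
        total += sum(sq_dist(row, means) for row in members)
    return total


def transfer_search(x, labels, n_clusters, maxit=100):
    """Move single points between clusters while that lowers the sum of squares."""
    labels = list(labels)
    best = within_ss(x, labels, n_clusters)
    for _ in range(maxit):
        improved = False
        for i in range(len(x)):
            for k in range(n_clusters):
                home = labels[i]
                if k == home or labels.count(home) == 1:
                    continue  # never empty a cluster
                labels[i] = k
                trial = within_ss(x, labels, n_clusters)
                if trial < best - 1e-12:
                    best, improved = trial, True  # keep the move
                else:
                    labels[i] = home  # undo the move
        if not improved:
            break  # no single transfer helps: locally optimal
    return labels, best


x = [[0.0], [1.0], [9.0], [10.0]]
centres = [[0.0], [1.0]]  # deliberately poor initial centres
start = initial_allocation(x, centres)
final, ss = transfer_search(x, start, 2)
print(start, final, ss)
```

With these poor initial centres the initial allocation is `[0, 1, 1, 1]`; one transfer then yields `[0, 0, 1, 1]`. A production implementation would update the cluster means incrementally rather than recomputing the criterion for every trial move.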

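Optionally, weights for each object, $w_i$, can be used so that the clustering is based on within-cluster weighted sums of squares:

$$\sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{p} w_i \left(x_{ij} - \tilde{x}_{kj}\right)^2,$$

where $\tilde{x}_{kj}$ is the weighted mean for variable $j$ over cluster $k$.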
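A minimal Python sketch of the weighted criterion follows, taking each cluster centre to be the weighted mean of its members. The function name `weighted_within_ss` and the data layout are assumptions for illustration only.

```python
def weighted_within_ss(x, w, labels, n_clusters):
    """Weighted within-cluster sum of squares with weighted cluster means."""
    p = len(x[0])
    total = 0.0
    for k in range(n_clusters):
        idx = [i for i, lab in enumerate(labels) if lab == k]
        wsum = sum(w[i] for i in idx)
        if wsum == 0:
            continue  # empty (or zero-weight) cluster contributes nothing
        # weighted mean of each variable j over cluster k
        means = [sum(w[i] * x[i][j] for i in idx) / wsum for j in range(p)]
        total += sum(w[i] * (x[i][j] - means[j]) ** 2
                     for i in idx for j in range(p))
    return total


x = [[0.0], [4.0], [10.0]]
w = [1.0, 3.0, 2.0]
labels = [0, 0, 1]
print(weighted_within_ss(x, w, labels, 2))  # 12.0
```

In the first cluster the weighted mean is (1·0 + 3·4)/4 = 3, so the contribution is 1·9 + 3·1 = 12. With all weights equal to 1 the function reduces to the unweighted criterion.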

On entry, at least one cluster is empty after the initial assignment. Try a different set of initial cluster centres in cmeans and also consider decreasing the value of k. The empty clusters may be found by examining the values in nic.
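To see how this condition can arise, the sketch below counts how many objects fall nearest to each initial centre and lists the clusters that receive none. The helper `cluster_counts` is hypothetical and only plays the role that nic plays in the routine's output; note that the Python indices are 0-based.

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def cluster_counts(x, centres):
    """Number of objects whose nearest initial centre is each cluster."""
    counts = [0] * len(centres)
    for row in x:
        nearest = min(range(len(centres)), key=lambda k: sq_dist(row, centres[k]))
        counts[nearest] += 1
    return counts


x = [[0.0], [1.0], [2.0]]
centres = [[0.0], [1.0], [100.0]]  # the third centre is far from all the data
counts = cluster_counts(x, centres)
empty = [k for k, c in enumerate(counts) if c == 0]
print(counts, empty)  # [1, 2, 0] [2]
```

Here the third centre attracts no objects, so either a closer initial centre or a smaller number of clusters would be needed.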

Convergence has not been achieved within the maximum number of iterations given by maxit. Try increasing maxit and, if possible, use the returned values in cmeans as the initial cluster centres.
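The restart strategy suggested above can be sketched as follows. A plain Lloyd-style iteration (reassign to the nearest centre, then recompute means) stands in for the routine here purely to show the pattern; the function `lloyd` and its return values are hypothetical and do not reflect the routine's algorithm or interface.

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def lloyd(x, centres, maxit):
    """Return (centres, labels, converged) after at most maxit iterations."""
    centres = [list(c) for c in centres]
    labels = None
    for _ in range(maxit):
        new_labels = [min(range(len(centres)), key=lambda k: sq_dist(row, centres[k]))
                      for row in x]
        if new_labels == labels:
            return centres, labels, True  # allocation unchanged: converged
        labels = new_labels
        for k in range(len(centres)):
            members = [row for row, lab in zip(x, labels) if lab == k]
            if members:
                centres[k] = [sum(col) / len(members) for col in zip(*members)]
    return centres, labels, False


x = [[0.0], [1.0], [9.0], [10.0]]
centres = [[0.0], [1.0]]

# The first attempt runs out of iterations.
centres, labels, ok_first = lloyd(x, centres, maxit=2)

# Restart from the returned centres rather than from the original ones.
centres, labels, ok_second = lloyd(x, centres, maxit=2)
print(ok_first, ok_second, labels)
```

Restarting from the returned centres preserves the progress already made, so the second call only has to confirm that the allocation no longer changes.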

Accuracy

nag_mv_cluster_kmeans (g03ef) produces clusters that are locally optimal: the within-cluster sum of squares cannot be decreased by transferring a single point from one cluster to another, but other partitions may have the same or a smaller within-cluster sum of squares.
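The distinction between local and global optimality can be checked by brute force on a small one-variable example. In the Python sketch below, `is_locally_optimal` is a hypothetical helper that tries every single-point transfer (without emptying a cluster) and reports whether any of them lowers the within-cluster sum of squares.

```python
def within_ss(x, labels, n_clusters):
    """Within-cluster sum of squares for one-variable data."""
    total = 0.0
    for k in range(n_clusters):
        members = [v for v, lab in zip(x, labels) if lab == k]
        mean = sum(members) / len(members)
        total += sum((v - mean) ** 2 for v in members)
    return total


def is_locally_optimal(x, labels, n_clusters):
    """True if no single-point transfer lowers the within-cluster sum of squares."""
    current = within_ss(x, labels, n_clusters)
    for i in range(len(x)):
        if labels.count(labels[i]) == 1:
            continue  # moving this point would empty its cluster
        for k in range(n_clusters):
            if k == labels[i]:
                continue
            trial = list(labels)
            trial[i] = k
            if within_ss(x, trial, n_clusters) < current - 1e-12:
                return False
    return True


x = [0.0, 0.0, 10.0, 10.0, 21.0, 21.0]
a = [0, 0, 0, 0, 1, 1]  # {0, 0, 10, 10} and {21, 21}
b = [0, 0, 1, 1, 1, 1]  # {0, 0} and {10, 10, 21, 21}

print(within_ss(x, a, 2), is_locally_optimal(x, a, 2))  # 100.0 True
print(within_ss(x, b, 2), is_locally_optimal(x, b, 2))  # 121.0 True
```

Both partitions pass the single-transfer test, yet the second has a larger sum of squares (121 against 100). A search that ends at the second partition is therefore locally optimal without being the best available, which is why trying several sets of initial centres can be worthwhile.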