Note added on April 14, 2010: Do not initialize k-means with two
identical centers. Otherwise, this Matlab code may terminate
prematurely with an error message.

The version of the KDD'98 dataset used in the paper is available as cup98b.data.zip. This file has 95413
rows with 56 columns each. For k
= 20, starting with furthest-first initialization as described in the
paper, k-means converges in
260 iterations. The final mean squared distance of each point to
its nearest center is 12880. The Matlab software above performs
approximately 7.26 million distance calculations, instead of about 496
million without optimization. With a 3GHz Pentium IV processor,
it finishes in about 120 seconds,
compared to about 1280 seconds for a k-means
implementation with similar low-level optimizations that does not
eliminate distance calculations.