K-means clustering

K-means clustering tries to partition the set of document vectors into k randomly initialized clusters.

Again, the results change from run to run, but a cursory examination of the clusters suggests it does a better job on the dataset than plain k-means.

Although the projected document vectors are now of length 512 (down from around 1,500), the result is much the same as the initial k-means clustering, while the clustering computation is reduced by two-thirds.

The final result is arguably better as well, with LSA finding some interesting sets of documents that were missed by vanilla k-means. In this case the latent document vectors are of length 256, so the k-means step is now twice as fast as it is on the random projections.

We haven't formally evaluated the results in this tutorial, but a cursory examination of the four sets of results shows that NNMF is well suited to text clustering, while k-means in its three variants gives good but somewhat varied results.
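The four variants compared above can be sketched as follows. This is a minimal illustration, assuming scikit-learn and a random matrix standing in for the corpus's document vectors (the tutorial's own data and code are not shown here); the dimensions (around 1,500 features, 512 for the random projection, 256 for LSA) follow the text.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.random_projection import GaussianRandomProjection
from sklearn.decomposition import TruncatedSVD, NMF

rng = np.random.default_rng(0)
# Hypothetical stand-in for ~1,500-dimensional document vectors.
docs = rng.random((300, 1500))
k = 5  # illustrative cluster count; the tutorial's k is not stated here

# 1. Vanilla k-means on the full-length document vectors.
vanilla = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(docs)

# 2. Random projection down to 512 dimensions, then k-means.
projected = GaussianRandomProjection(
    n_components=512, random_state=0).fit_transform(docs)
rp_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(projected)

# 3. LSA (truncated SVD) down to 256 latent dimensions, then k-means.
latent = TruncatedSVD(n_components=256, random_state=0).fit_transform(docs)
lsa_labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(latent)

# 4. NNMF: factorize into k non-negative components and assign each
#    document to its strongest component.
w = NMF(n_components=k, init="nndsvd", max_iter=400,
        random_state=0).fit_transform(docs)
nmf_labels = w.argmax(axis=1)

# Cluster sizes per variant (contents vary from run to run with
# different seeds, as the text notes).
for name, labels in [("k-means", vanilla), ("random projection", rp_labels),
                     ("LSA", lsa_labels), ("NNMF", nmf_labels)]:
    print(name, np.bincount(labels, minlength=k))
```

Note how each reduction shrinks the vectors k-means has to iterate over (1,500 to 512 to 256 dimensions), which is where the two-thirds and two-times speedups come from, while NNMF produces a clustering directly from its factorization rather than via k-means.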