Last week, version 0.4.2 of the R package toaster became available on CRAN. It addressed a few bugs in kmeans, enhanced the map visualization function, and included several other minor features and fixes.

Kmeans with toaster

In Aster, clustering is provided by several functions, of which kmeans appears to be the most widely used. toaster's kmeans family of functions streamlines the workflow by combining Aster's rich set of SQL/MR and SQL functions with R scripting on the desktop. It covers data preparation, the clustering itself, and model evaluation and analysis, and it provides visualization functions for centroids, cluster model quality, cluster metrics and properties, and more.

Kmeans family of functions

Because kmeans is sensitive to differences in variation (scale) across its dimensions, normalizing (scaling) the model variables first is highly recommended in most, though not all, cases. The function computeKmeans therefore does this automatically unless instructed otherwise. After the clustering is performed in Aster, the function returns a standard R kmeans object with extra information.
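To make the point about scaling concrete, here is a minimal sketch that uses only base R (stats::kmeans and scale) on synthetic data. It does not call toaster or Aster; the variable names rate and salary and all numbers are assumptions made up for illustration.

```r
# Minimal sketch with base R only (not toaster); the data is synthetic.
set.seed(42)
n <- 100

# Two true groups that differ only in 'rate' (small numeric range).
# 'salary' carries no group information but has a much larger range.
df <- data.frame(
  rate   = c(rnorm(n, mean = 0.2, sd = 0.05), rnorm(n, mean = 0.8, sd = 0.05)),
  salary = rnorm(2 * n, mean = 50000, sd = 10000)
)
truth <- rep(1:2, each = n)

# Without scaling, 'salary' dominates the Euclidean distance.
fit_raw <- kmeans(df, centers = 2, nstart = 10)

# With scaling (zero mean, unit variance), both variables contribute comparably.
fit_scaled <- kmeans(scale(df), centers = 2, nstart = 10)

# Cluster labels are arbitrary, so agreement is measured up to relabeling.
agreement <- function(cluster, truth) {
  acc <- mean(cluster == truth)
  max(acc, 1 - acc)
}

agree_raw    <- agreement(fit_raw$cluster, truth)
agree_scaled <- agreement(fit_scaled$cluster, truth)

# Cross-tabulate clusters against the known groups.
print(table(truth, raw = fit_raw$cluster))
print(table(truth, scaled = fit_scaled$cluster))
```

In this setup the unscaled fit tends to split the data along salary, while the scaled fit should recover the two groups defined by rate. Both results are ordinary kmeans objects, so components such as $centers, $cluster and $withinss are available for further analysis.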

The function createCentroidPlot visualizes the cluster centroids in the multi-dimensional space of the kmeans model.
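For readers without an Aster connection, the idea of a centroid plot can be approximated with base R graphics. This is only a rough sketch on the built-in iris data, not the output or interface of createCentroidPlot; the choice of a grouped bar chart is an assumption.

```r
# Rough sketch with base R only (not toaster): plot kmeans centroids per variable.
set.seed(1)
fit <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 10)

# One row per cluster, one column per (scaled) variable.
centers <- fit$centers

# Grouped bars: one group per variable, one bar per cluster.
barplot(centers, beside = TRUE,
        legend.text = paste("cluster", rownames(centers)),
        ylab = "scaled centroid value")
```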

Going deeper, the pair of functions computeClusterSample and createClusterPairsPlot drills into the resulting cluster structure by sampling the clustered data and visualizing pairwise relationships between variables within and across kmeans clusters.
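The sample-then-plot idea can be sketched locally with base R. The example below is an illustration on the built-in iris data, not the toaster implementation: the per-cluster sample size of 20 and the use of pairs() are assumptions.

```r
# Sketch with base R only (not toaster): sample rows per cluster, then plot pairs.
set.seed(1)
fit <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 10)
dat <- data.frame(iris[, 1:4], cluster = factor(fit$cluster))

# Draw up to 20 rows from each cluster, without replacement.
per_cluster <- 20
idx <- unlist(lapply(
  split(seq_len(nrow(dat)), dat$cluster),
  function(i) i[sample.int(length(i), min(per_cluster, length(i)))]
))
smp <- dat[idx, ]

# Scatterplot matrix of the sampled rows, colored by cluster.
pairs(smp[, 1:4], col = as.integer(smp$cluster), pch = 19)
```

Sampling keeps the plot readable and, in the Aster setting, avoids pulling a full table into R just to look at cluster shapes.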