1. What is k-Means Clustering

It is a clustering method that tends to partition your data into partitions called clusters. Let's say you have you have n data points, the k-means algorithms would assign each of the data point to the cluster with the nearest mean.

Note: the k in k-means represents the number of clusters, that is k clusters.

2. How it works

Suppose we have a set of observations {x1, x2,... xn} which consists in a set of N random variable x (x is a D d-dimensional real vector). The goal is to partition the data set into some number K of clusters, where the value of K is known.
A cluster is a group of data points whose inter-point distances are minimal when compare with distance to points outside the cluster.
The first step is to find the mk, for k = 1,..., K, in which mkis the mean associated to the kth cluster.
We now assign each of the data points to clusters, such that the sum of squares of the distances of each data point to its closest mean mk is minimum.

3. Steps of the k-Means Algorithm

Step 1: For each unit, x, in the input space, place it in the cluster whose current centroid it is nearest to.Step 2: After all points have been assigned (x1,...xn), adjust the locations of the centroids of the k clustersStep 3: Reassign all the points to their nearest centroidRepeat steps 2 and 3 until convergence
Convergence occurs when the points not move between clusters and the centroids are stabilized

This algorithm would become clearer when we consider an example later in this article.
Let's now look at a little more details on how the clusters are assigned

4. Assignment of Clusters

(based on "Pattern Recognition and Machine Learning" by Christopher M. Bishop)
For each data point xn, we would use a corresponding set of binary variables we denoted as rnk ={0,1} where k =1,...,K.rnkdescribes which of the K clusters the data point xn is assigned to. If rnk is assigned to k, then rnk = 1 and rnj = 0 for j not equal to k.

Note the the two superscript in indicates that there would be two loops:Loop 1(1-N): Iterate through all the data pointsLoop 2(1-K): Iterate through all the cluster

Now, for each iteration, you need to calculate a value for J which is called the distortion measure and is given by the sum of squares function:

5. Lets try to understand this formula

First note that this formulais sometimes called distortion measure since it represent how much each data point say xk is separated from the mean mk of that cluster.
In other words, J represents the sum of squares of the distance of each data point to its assigned vector mk.

N is to total number of data points,

K is the number of clusters

xn is the vector of measurement n

mk is the mean for cluster k

rnk is an indicator variable that indicates whether to assign xn to k

We need to determine the value of {rnk} and mk that gives the least value of J.
This is achieved by the use of k-Means Algorithm

6. More Explanation of the k=Means Algorithm

This algorithm is an iterative procedure involving steps:
Input set of points x1,...xn
Choose a value for K
Place m1,..., mk(let's call this centroid) are random locations within your data space
Repeat the following steps until convergence
for each point x

find the nearest centroid, cj (compute the distance between xi, compute the distance between xi and mj for every centroid mj, and pick the cluster with the minimum distance)

find the nearest centroid

assign this point xj to the cluster of the nearest centroid

For each cluster j = 1,..., K, and recompute the centroid position

Take all the data points that fall into this cluster

new centroid cj = mean of all points x

assign to cluster j in the previous step

Stop when none of the cluster assignments change

7. Example of k-Means Clustering

The example below is based on
K = 2 and N = 14
Take some time to examine the progression through the steps and make sure you understand how it works