Question

In this lesson, we learned about the K-Nearest Neighbors classification technique. However, is K-Nearest Neighbors similar to K-Means Clustering?

Answer

No. Although their names are somewhat similar, they are conceptually quite different machine learning methods. Some important differences are as follows.

K-Nearest Neighbors is a supervised classification algorithm, where K is the number of neighboring points we look at to classify each data point. Because it is supervised, we are given the labels of the training data points, and we use those to predict the labels of new data points.

The concept behind this algorithm is that, for a new point, we find its K nearest neighbors by distance and assign the point the most common label among them. The algorithm only needs a single pass over the labeled data for each prediction, unlike K-Means Clustering, which iterates multiple times until convergence is reached.
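
To make the idea concrete, here is a minimal sketch of a K-Nearest Neighbors prediction in plain Python. The function name knn_predict, the sample points, and the "red"/"blue" labels are made up for illustration, and Euclidean distance with a simple majority vote is only one common choice (ties here fall to whichever label was counted first).

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` from its k nearest labeled neighbors."""
    # One pass over the labeled data: distance from the query to every point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical labeled data: two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
train_labels = ["red", "red", "red", "blue", "blue", "blue"]

prediction = knn_predict(train_points, train_labels, (2.0, 2.0), k=3)
print(prediction)
```

Notice that the labels are required up front, and that there is no repeated refinement step: each prediction is a single distance computation, sort, and vote.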

K-Means Clustering is an unsupervised algorithm, where K is the number of centroids, or clusters, that the algorithm will produce. Because it is unsupervised, we are not given any labels. Instead, we use the features of the data points to group similar points together and find the clusters.

The concept behind this algorithm is that we try to find the locations of the K centers, where each center is the average, or mean, of the data points in its cluster. The algorithm assigns each data point to its nearest center, recalculates each center from the points assigned to it, and iterates multiple times until convergence is reached, which happens when the newly computed center locations stop changing.
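
For comparison, here is a minimal sketch of that K-Means loop in plain Python. The function name kmeans, the sample points, and the hand-picked starting centers are assumptions for illustration; real implementations usually choose the starting centers randomly or with a smarter initialization scheme.

```python
import math

def kmeans(points, initial_centers, max_iters=100):
    """Group unlabeled points around K centers (K = len(initial_centers))."""
    centers = list(initial_centers)
    clusters = []
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(center)
        # Convergence: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical unlabeled data and starting centers (K = 2).
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, clusters = kmeans(points, initial_centers=[(1.0, 1.0), (9.0, 8.0)])
print(centers)
```

Unlike the K-Nearest Neighbors approach, no labels appear anywhere in this code, and the loop repeats the assign-and-recompute steps until the centers settle.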