Source: EdurekakNN is very simple to implement and is most widely used as a first step in any machine learning setup.

It is often used as a benchmark for more complex classifiers such as Artificial Neural Networks (ANN) and Support Vector Machines (SVM).

Despite its simplicity, k-NN can outperform more powerful classifiers and is used in a variety of applications such as economic forecasting, data compression and genetics.

GeneticsA Regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous…As a variety of functional genomic and proteomic techniques become available, there is an increasing need for…bmcbioinformatics.

orgAs with most technological progress in the early 1900s, KNN algorithm was also born out of research done for the armed forces.

Two offices of USAF School of Aviation Medicine — Fix and Hodges (1951) wrote a technical report introducing a non-parametric method for pattern classification that has since become popular as the k-nearest neighbor (kNN) algorithm.

Source: DTIC — The Declassified Paper in full.

How does it work?Let’s say we have a dataset with two kinds of points — Label 1 and Label 2.

Now given a new point in this dataset we want to figure out its label.

The way it is done in kNN is by taking a majority vote of its k nearest neighbours.

k can take any value between 1 and infinity but in most practical cases k is less than 30.

Blue Circles v/s Orange TrianglesLet’s say we have two groups of points — blue-circles and orange-triangles.

We want to classify the Test Point = black circle with a question mark, as either a blue circle or an orange triangle.

Goal: To label the black circle.

For K = 1 we will look at the first nearest neighbour.

Since we take majority vote and there is only 1 voter we assign its label to our black test point.

We can see that the test point will be classified as a blue circle for k=1.

Expanding our search radius to K=3 also keeps the result same, except that this time it is not an absolute majority, it’s 2 out of 3.

Still with k=3 test point is predicted to have the class blue-circle →because the majority of points are blue.

Let’s see how k=5 and K =9 do.

To look at the nearest neighbors we draw circle with test point at the centre and stop when 5 points fall inside the circle.

When we look at the 5 and subsequently at K = 9, the majority of the closest neighbors of our test point are orange-triangles.

That indicates that the test point must be an orange triangle.

Left→K=5, Majority = Orange | Right→K =9, Majority = OrangeNow that we have labelled this one test point we repeat the same process over all the unknown points (i.

e.

the test set).

Once all test points are labelled using k-NN we try separating them using a decision boundary.

Decision boundary shows how well the training set is separated.

k-NN Live!That’s the gist of how k-NN happens.

Let’s see it from the point of view of a machine learning engineer’s brain.

First they would choose k.

We already saw above that a bigger k takes a vote of larger number of points.

This means higher chances of being correct.

But at what cost, you say?Getting the k nearest neighbours means sorting through the distances.

That is a costly operation.

A very high processing power is needed which translates to either longer processing time or costlier processor.

Higher the K costlier the whole procedure.

But too low a k would result in overfitting.

A very low k will fail to generalize.

A very high k is costly.

K = 1, 5, 15.

Source code below.

As we go to higher K’s the boundaries become smooth.

Blue and red regions are broadly separated.

Some blue and red soldiers are left behind the enemy lines.

They are collateral damage.

They account for loss in training accuracy but lead to better generalisation and high test accuracy i.

e.

high accuracy of correct labelling for new points.

A graph of validation error when plotted against K would typically look like this.

We can see that around K = 8 the error is minimum.

It goes up on either side.

Left : Training error increases with K | Right : Validation error is best for K = 8 | Source: AnalyticsVidhyaThen they would try to find distances between points.

How do we decide which neighbours are near and which are not?Euclidean Distance — Most common distance metricChebyshev Distance — L∞ DistanceManhattan Distance — L1 Distance : Sum of the (absolute) differences of their coordinates.

In chess, the distance between squares on the chessboard for rooks is measured in Manhattan distance; kings and queens use Chebyshev distance — WikipediaAll 3 distance metrics with their mathematical formulae.

Source: LyfatOnce we know how to compare points based on distance we would like to train our model.

The best part about k-NN is that there is no explicit training step for it.

We already know all that is to know about our dataset — its labels.

In essence the training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.

The Tinkersons, December 2, 2017K-NN is a lazy learner because it doesn’t learn a discriminative function from the training data but memorizes the training dataset instead.

An eager learner has a model fitting or training step.

A lazy learner does not have a training phase.

Finally, why k-NN?xkcdQuick to implement : Which is why it is popular as a benchmarking algorithm.

Less training time: Faster turn around timeComparable accuracies: Its prediction accuracy as indicated in a lot of research papers is fairly high for a lot of applications.

k-NN is a life saver when one has to quickly deliver a solution with fairly accurate results.

In most tools like MATLAB, python, R it is given as a single line command.