Machine Learning (ML) should not be a hard read.

Similarity and Distance Functions

Let’s go back to our posts on distance functions. The initial idea was to calculate the distance between two points, or between one point and a set of points, in a 2D space. Within that discussion, we often stated that one point is more “similar” to the set than another point. So maybe what we are actually trying to calculate is the similarity between two entities, each being either a point or a set. The problem is that similarity is a subjective matter, so we need an objective measure, and that is why we chose distance.

In other words, we aim to calculate the similarity, and we do so indirectly through the distance function. The inverse relationship between the two is clear: as one increases, the other should decrease, and vice versa.

So we can simply take the inverse of the distance function and call it the similarity function:

$$s(a, b) = \frac{1}{d(a, b)}$$

where $s(a, b)$ is the similarity between points $a$ and $b$, and $d(a, b)$ is the distance between them. Let’s look at the behavior of the function and see what we have achieved:

The inverse similarity function.
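
To make this behavior concrete, here is a minimal sketch of the inverse similarity between two 2D points. It assumes the Euclidean distance as the distance function; the names `euclidean_distance` and `inverse_similarity` are hypothetical and chosen only for this illustration.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two 2D points (an assumed choice of distance)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def inverse_similarity(a, b):
    """Similarity as the reciprocal of the distance; undefined when a == b."""
    d = euclidean_distance(a, b)
    if d == 0:
        raise ZeroDivisionError("inverse similarity is undefined at zero distance")
    return 1.0 / d

origin = (0.0, 0.0)
near = (0.3, 0.4)  # about 0.5 away from the origin
far = (3.0, 4.0)   # 5.0 away from the origin

print(inverse_similarity(origin, near))  # roughly 2.0
print(inverse_similarity(origin, far))   # roughly 0.2
```

The closer point gets the larger similarity, as intended. Note, however, that the value grows without bound as the distance approaches zero and is undefined when the two points coincide.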

The problem with this approach lies in the integral of this function: taken over all the values the distance can take, the integral goes to infinity. This might seem a bit out of scope for now, but we will see later that a finite integral is essential for obtaining a well-formed probability density function.
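
The divergence can be checked with a short sketch. It treats the distance as a single non-negative number d and uses the closed form of the integral of 1/d between two positive limits, which is ln(upper / lower). The function name `inverse_similarity_integral` is hypothetical.

```python
import math

def inverse_similarity_integral(lower, upper):
    """Closed-form integral of 1/d from lower to upper, for 0 < lower < upper."""
    return math.log(upper / lower)

# Widening the upper limit: the area keeps growing.
for upper in (1e1, 1e3, 1e6):
    print(upper, inverse_similarity_integral(1.0, upper))

# Shrinking the lower limit toward zero: the area also keeps growing.
for lower in (1e-1, 1e-3, 1e-6):
    print(lower, inverse_similarity_integral(lower, 1.0))
```

Both loops print ever larger values, so the area under the curve is unbounded at both ends of the range, near zero distance and at large distances.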

A function commonly used for this purpose is the exponential function. It is defined as:

The exponential similarity function.

This function does not suffer from the problem of the inverse similarity function. Note that it is not the only function that can be used for this purpose. For example, another possibility is:

The polynomial similarity function.

In fact, these two functions look very much alike, but there are many more possibilities for the similarity function, and they do not necessarily result in the same shape. We will see later that different choices of similarity function result in different probability density functions.
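
As a rough sketch of how two such functions compare, the code below assumes the exponential similarity takes the form exp(-d) and the polynomial similarity the form 1 / (1 + d^2), with d a non-negative distance. Both forms are assumptions chosen for illustration and may differ from the exact definitions intended here; the helper `area_under` is likewise hypothetical and uses a simple midpoint rule.

```python
import math

def exponential_similarity(d):
    """Assumed exponential form: exp(-d)."""
    return math.exp(-d)

def polynomial_similarity(d):
    """Assumed polynomial form: 1 / (1 + d^2)."""
    return 1.0 / (1.0 + d * d)

def area_under(f, upper, steps=100_000):
    """Midpoint-rule estimate of the integral of f over [0, upper]."""
    h = upper / steps
    return h * sum(f((i + 0.5) * h) for i in range(steps))

# Both functions equal 1 at zero distance and decay as the distance grows.
for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(d, exponential_similarity(d), polynomial_similarity(d))

# Unlike the inverse function, the areas settle as the upper limit grows.
for upper in (10.0, 100.0, 1000.0):
    print(upper,
          area_under(exponential_similarity, upper),
          area_under(polynomial_similarity, upper))
```

Under these assumed forms, the area of the exponential function approaches 1 and that of the polynomial function approaches π/2, so both integrals stay finite. Both functions are also well defined at zero distance, where the inverse function is not.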