Supervised learning in disguise: the truth about unsupervised learning

One of the first lessons you’ll receive in machine learning is that there are two broad categories: supervised and unsupervised learning. Supervised learning is usually explained as the kind in which you provide the correct answers as training data, and the machine learns patterns that it can then apply to new data. Unsupervised learning is (apparently) where the machine figures out the correct answer on its own.

Supposedly, unsupervised learning can discover something new that has not been found in the data before. Supervised learning cannot do that.

The problem with definitions

It’s true that there are two classes of machine learning algorithm, and each is applied to different types of problems, but is unsupervised learning really free of supervision?

In fact, this type of learning also involves a whole lot of supervision, but the supervision steps are hidden from the user. This is because the supervision is not explicitly presented in the data; you can only find it within the algorithm.

To understand this, let us first consider supervised learning. A prototypical supervised method is regression. Here, both the input and the output values, named X and Y respectively, are provided to the algorithm. The learning algorithm then adjusts the model’s parameters so that the model predicts the outputs (Y) for new inputs (X) as accurately as possible.

In other words, supervised learning finds a function: Y’ = f(X)

Supervised learning success

Supervised learning success is assessed by seeing how close Y’ is to Y, i.e. by computing an error function such as the mean squared difference between the two.
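As a minimal sketch of this idea, the Python snippet below fits a straight line Y’ = a*X + b by ordinary least squares and then measures the error against the externally supplied Y. The data values and the helper names fit_line and mean_squared_error are made up for illustration, and mean squared error is just one possible choice of error function.

```python
from statistics import mean

def fit_line(xs, ys):
    """Closed-form ordinary least squares for y' = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return a, b

def mean_squared_error(ys, preds):
    """Average squared gap between the provided Y and the predicted Y'."""
    return mean((y - p) ** 2 for y, p in zip(ys, preds))

# Illustrative (made-up) training data: X with externally provided answers Y.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.1, 2.9, 5.2, 7.1, 8.8]

a, b = fit_line(X, Y)
Y_pred = [a * x + b for x in X]
error = mean_squared_error(Y, Y_pred)
print(f"a={a:.2f}, b={b:.2f}, error={error:.4f}")
```

The key point is that the error cannot be computed without Y: the supervisor is visible in the data itself.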

This general principle of supervision underlies logistic regression, support vector machines, decision trees, deep learning networks and many other techniques.

In contrast, unsupervised learning does not provide Y for the algorithm – only X is provided. Thus, for each given input we do not explicitly provide a correct output. The machine’s task is to “discover” Y on its own.

A common example is cluster (or clustering) analysis. Before a clustering analysis, there aren’t known clusters for the data points within the inputs, and yet the machine finds those clusters after the analysis. It’s almost as if the machine is creative – discovering something new in the data.

Nothing new

In fact, there is nothing new; the machine discovers only what it has been told to discover. Every unsupervised algorithm specifies what needs to be found in the data.

There must be a criterion that says what success is. We don’t let algorithms do whatever they want, or ask machines to perform random analyses. There is always a goal to be accomplished, and that goal is carefully formulated as a constraint within the algorithms.

For example, a clustering algorithm may require the distances between cluster centroids to be maximized, while the distances between data points belonging to the same cluster are minimized. In effect, for each data set there is an implicit Y: a target that the algorithm derives from the data itself, for example the requirement to maximize the ratio of between-cluster distance to within-cluster distance.
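To make the hidden criterion concrete, here is a rough Python sketch of one possible between/within score. The function name separation_ratio, the example points and the exact formula (mean centroid-to-centroid distance divided by mean point-to-centroid distance) are assumptions chosen for illustration; real clustering algorithms use a variety of related criteria. The sketch assumes at least two clusters and a non-zero within-cluster spread.

```python
from itertools import combinations
from math import dist
from statistics import mean

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(mean(coord) for coord in zip(*points))

def separation_ratio(points, labels):
    """Mean distance between centroids divided by mean distance to own centroid."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    centroids = {lab: centroid(ps) for lab, ps in clusters.items()}
    within = mean(dist(p, centroids[lab]) for p, lab in zip(points, labels))
    between = mean(dist(c1, c2) for c1, c2 in combinations(centroids.values(), 2))
    return between / within

# Illustrative (made-up) data: two visibly separate groups of points.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
sensible = [0, 0, 0, 1, 1, 1]   # groups nearby points together
scrambled = [0, 1, 0, 1, 0, 1]  # mixes the two groups

print(separation_ratio(points, sensible) > separation_ratio(points, scrambled))
```

No Y was supplied, yet the score still ranks one labelling above the other. That ranking is the internal supervisor at work.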

Therefore, the lack of supervision in these algorithms is nothing like the metaphorical “unsupervised child in a porcelain shop”, as this would not give us particularly useful machine learning. Instead, what we have is more akin to letting adults enter a porcelain shop without having to send a nanny too. The reason for our trust in adults is that they have already been supervised during childhood and have since (hopefully) internalized some of the rules.

Something similar happens with unsupervised machine learning algorithms; supervision has been internalized, as these methods come equipped with built-in rules that specify what counts as good or bad model behaviour. Just as (most) adults have an internal voice telling them not to smash every item in the shop, unsupervised machine learning methods possess internal machinery that dictates what constitutes good behaviour.

Supervised vs. unsupervised

Fundamentally, the difference between supervised and unsupervised learning boils down to whether the computation of error utilizes an externally provided Y, or whether Y is internally computed from input data (X).
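One way to see an internally computed Y in action is a bare-bones, one-dimensional k-means loop, sketched below in Python. This is an illustrative toy under stated assumptions (made-up data, hand-picked starting centroids, a fixed number of iterations, and the hypothetical function name kmeans_1d), not a production clustering routine. In each pass, the nearest centroid plays the role of the target for each point, and the centroids are then moved to reduce the squared error against those targets.

```python
from statistics import mean

def kmeans_1d(xs, centroids, iterations=10):
    """Toy 1-D k-means: the 'correct answer' for each point is derived from X alone."""
    for _ in range(iterations):
        # Internal Y: label each point with the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            for x in xs
        ]
        # Move each centroid to the mean of its members (keep it if it has none).
        updated = []
        for k in range(len(centroids)):
            members = [x for x, lab in zip(xs, labels) if lab == k]
            updated.append(mean(members) if members else centroids[k])
        centroids = updated
    # Error measured against the internally derived targets, not an external Y.
    error = mean((x - centroids[lab]) ** 2 for x, lab in zip(xs, labels))
    return centroids, labels, error

# Illustrative (made-up) inputs; note that no Y is provided anywhere.
X = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centroids, labels, error = kmeans_1d(X, centroids=[0.0, 5.0])
print(centroids, labels, error)
```

Structurally this mirrors a supervised fit: there is a target, a prediction and an error to minimize. The only difference is where the target comes from.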

In both cases there is a form of supervision.

As all unsupervised learning is actually supervised, the main differentiator becomes the frequency at which intervention takes place. For example, do we intervene for each data point or just once, when the algorithm for computing Y out of X is designed?

Hence, within the so-called unsupervised methods, supervision is present, but hidden (it is disguised) because no special effort is required from the end user to supply supervision data. The algorithm seems to be magically supervised without an apparent supervisor. However, this does not mean that someone hasn’t gone through the pain of setting up the proper equations to implement an internal supervisor.

Consequently, unsupervised learning methods don’t truly discover anything new in any way that would overshadow the “discoveries” of supervised methods.