Low-Dimensional Embeddings for Visualization

Representation learning is a hot area in machine learning. In natural language processing, for example, learning dense vectors for words has proven quite effective on several tasks. Often, these representations have several hundred dimensions. To perform a qualitative analysis of the learned representations, it helps to visualize them. Thus, we need a principled approach to map from this high-dimensional space to a lower-dimensional one (like $ \mathbb{R}^2 $, for instance).

In this blog post, I will discuss the multidimensional scaling (MDS) algorithm - a manifold learning algorithm that recovers (or at least tries to recover) the underlying manifold that contains the data.

MDS aims to find a configuration of points in a lower-dimensional space that preserves the pairwise distances between points in the data. In this post, I will provide intuition for the algorithm and an implementation in Clojure (Incanter).

The end result of running MDS on a dataset is a spatial arrangement of points whose pairwise distances match those in the original data as closely as possible.
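To make "preserves distances as closely as possible" precise, metric MDS is often framed as minimizing a stress objective; one standard formulation (a common choice, since the post does not pin down a particular variant) is

$$ \mathrm{Stress}(x_1, \ldots, x_n) = \sqrt{\sum_{i &lt; j} \left( d_{ij} - \lVert x_i - x_j \rVert \right)^2 } $$

where $ d_{ij} $ is the given distance between data points $i$ and $j$, and $ x_i \in \mathbb{R}^k $ are the low-dimensional coordinates being sought.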

For example, say you learn a representation for words - like word2vec. These representations are continuous vectors with 300 elements (at least the default vectors are). They exhibit some striking linguistic phenomena. For instance, $ R(\text{man}) - R(\text{woman}) \approx R(\text{king}) - R(\text{queen}) $, and words that are semantically similar tend to have similar representations.
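As a toy illustration of this analogy arithmetic (sketched in Python for brevity; the vectors below are hand-picked 2-D values, not real 300-dimensional word2vec embeddings):

```python
import numpy as np

# Hand-picked toy 2-D "embeddings" (not real word2vec vectors)
# chosen so that the king/queen analogy holds exactly.
vecs = {
    "man":   np.array([1.0, 0.0]),
    "woman": np.array([1.0, 1.0]),
    "king":  np.array([5.0, 0.0]),
    "queen": np.array([5.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# R(king) - R(man) + R(woman) should land near R(queen).
target = vecs["king"] - vecs["man"] + vecs["woman"]
best = max((w for w in vecs if w != "king"),
           key=lambda w: cosine(vecs[w], target))
# best == "queen"
```

With real learned vectors the match is approximate rather than exact, which is why the nearest neighbor is taken by cosine similarity rather than by equality.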

So how does one go about visualizing phenomena like these? Clearly, you want to be able to view these points in two dimensions (on a plot, say) while also preserving some of their properties (i.e. the similarity of semantically similar words).

MDS operates on pairwise distances between data points and retrieves an embedding of those points in a lower-dimensional space that preserves the distances. Thus, we can visualize learned representations by using MDS to drop down to two dimensions. Ideally, the distances are preserved exactly. Perfect.
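The distance-preserving recipe above has a closed-form variant, classical MDS, which double-centers the squared-distance matrix and reads coordinates off its top eigenpairs. Here is a minimal sketch in Python/NumPy for illustration (the post's own implementation is in Clojure), assuming a precomputed matrix of pairwise Euclidean distances:

```python
import numpy as np

def classical_mds(d, k=2):
    """Embed n points in R^k from an (n, n) matrix of pairwise
    Euclidean distances, preserving those distances as well as possible."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n        # centering matrix J = I - 11'/n
    b = -0.5 * j @ (d ** 2) @ j                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)       # eigh returns ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:k]        # keep the k largest
    lam = np.clip(eigvals[idx], 0.0, None)     # guard against tiny negatives
    return eigvecs[:, idx] * np.sqrt(lam)      # coordinates, shape (n, k)

# Four corners of a unit square, and their exact pairwise distances.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

emb = classical_mds(d, k=2)
d2 = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
# d2 matches d up to rotation/reflection of the recovered layout.
```

Because the input distances here are exact Euclidean distances from 2-D points, the recovered configuration reproduces them exactly (up to a rigid transformation); with higher-dimensional inputs like word vectors, the 2-D embedding only approximates them.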

Here is an implementation in Clojure (using core.matrix - I had to set the implementation to vectorz, since svd does not seem to be available in the default implementation):

Looking at this plot, we observe that the resulting configuration places semantically similar words close together. For instance, {dog, cat, animal}, {home, actor, doctor}, {lake, river}, and {city, town} form clusters, and the individual clusters are fairly far from one another.