Monday, 13 January 2014

Convert distance matrix to 2D projection with Python

In my continuing quest to never use R again, I've been trying to figure out how to embed points described by a distance matrix into 2D. This can be done with several of the manifold embedding methods provided by scikit-learn. The diagram below was generated using metric multidimensional scaling (MDS) on a matrix of pairwise distances between European cities (docs here and here).
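A minimal sketch of the precomputed-distance route follows. The city distance matrix isn't reproduced here, so this assumes a hypothetical stand-in: ten random 2D points whose pairwise distances play the role of the input matrix. The parameter names (`dissimilarity`, `n_init`, `random_state`) are those of the scikit-learn `MDS` class I know; newer releases may rename or deprecate some of them.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Hypothetical stand-in for the city distance matrix: 10 random points in 2D.
rng = np.random.default_rng(42)
points = rng.uniform(0, 1000, size=(10, 2))
dist_matrix = squareform(pdist(points))  # square, symmetric, zero diagonal

# Metric MDS on a precomputed distance matrix, embedded into 2D.
mds = MDS(n_components=2, dissimilarity="precomputed", n_init=4, random_state=0)
coords = mds.fit_transform(dist_matrix)  # one (x, y) row per point

# The embedded pairwise distances should closely track the input distances.
embedded = pdist(coords)
print(np.corrcoef(pdist(points), embedded)[0, 1])
```

With real data you would load your own square distance matrix in place of `dist_matrix` and plot `coords[:, 0]` against `coords[:, 1]`, labelling each point.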

Notes: If you don't specify a random_state, a slightly different embedding may be generated each time (with an arbitrary rotation in the 2D plane). If it's slow, you can use multiple CPUs via n_jobs=N.
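To illustrate both notes, here is a small sketch on hypothetical random 2D points. It assumes the scikit-learn behaviour I'm aware of: a fixed `random_state` makes the result repeatable, and `n_jobs` parallelises the `n_init` restarts. A different seed gives different coordinates (rotated or reflected), but the pairwise distances it reproduces should be essentially the same.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

# Hypothetical data: pairwise distances between 8 random 2D points.
rng = np.random.default_rng(0)
dist_matrix = squareform(pdist(rng.uniform(size=(8, 2))))

def embed(seed):
    # n_jobs=1 keeps this single-process; set it to N (or -1 for all CPUs)
    # to run the n_init restarts in parallel.
    mds = MDS(n_components=2, dissimilarity="precomputed",
              n_init=4, random_state=seed, n_jobs=1)
    return mds.fit_transform(dist_matrix)

a = embed(0)
b = embed(0)  # same seed: identical coordinates
c = embed(1)  # different seed: coordinates may be rotated or reflected

print(np.allclose(a, b))
print(np.abs(a - c).max())                       # coordinates can differ a lot
print(np.corrcoef(pdist(a), pdist(c))[0, 1])     # but distances agree
```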

The input to 'fit' depends on the choice of the 'dissimilarity' parameter. If it is 'precomputed', you pass a distance matrix; if it is 'euclidean', you pass a set of feature vectors and it uses the Euclidean distances between them. (To my mind, this is just confusing.)
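One way to see that the two settings describe the same computation is to run both on hypothetical feature vectors: pass the vectors directly with 'euclidean', or compute the Euclidean distance matrix yourself with `sklearn.metrics.euclidean_distances` and pass it with 'precomputed'. This sketch assumes that, with the same `random_state`, scikit-learn feeds the same distance matrix to the same optimiser in both cases, so the embeddings should match.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import euclidean_distances

# Hypothetical feature vectors: 12 samples with 5 features each.
rng = np.random.default_rng(1)
features = rng.normal(size=(12, 5))

# Option 1: pass feature vectors; MDS computes Euclidean distances itself.
emb_euclid = MDS(n_components=2, dissimilarity="euclidean",
                 n_init=4, random_state=0).fit_transform(features)

# Option 2: compute the distance matrix yourself and mark it as precomputed.
dist_matrix = euclidean_distances(features)
emb_precomp = MDS(n_components=2, dissimilarity="precomputed",
                  n_init=4, random_state=0).fit_transform(dist_matrix)

print(np.allclose(emb_euclid, emb_precomp))
```

If your data is already a distance matrix (as with the cities), 'precomputed' is the only sensible choice; 'euclidean' is just a shortcut for the case where you start from feature vectors.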

Hi Noel, very nice post, and I found something I was looking for. I have used PCA for my analysis and would like to know if you have any idea what the difference between PCA and NMDS is. How do I choose between them?

The method described here tries to reproduce a distance matrix in a lower dimension. PCA leaves the points where they are (at exactly the same distances from each other; many people seem unaware of this) but rotates the axes so that the first one points along the direction of greatest variance, the second one along the direction of next-greatest variance, and so on.
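A quick check of the PCA claim, on hypothetical correlated 3D data: keeping all components, scikit-learn's `PCA` only centres and rotates the points, so every pairwise distance is unchanged, and the explained variance comes out sorted from largest to smallest. Dropping components afterwards (keeping the first two columns for a plot) is a projection, so distances can only shrink. That is the contrast with MDS, which instead chooses 2D positions to match the original distances as closely as it can.

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.decomposition import PCA

# Hypothetical correlated 3D data.
rng = np.random.default_rng(2)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.5, 0.2, 0.3]])
X = rng.normal(size=(50, 3)) @ mixing

pca = PCA()  # keep all components: centring plus an orthogonal rotation
X_rot = pca.fit_transform(X)

print(np.allclose(pdist(X), pdist(X_rot)))   # distances preserved
print(pca.explained_variance_ratio_)         # largest variance first

# Keeping only the first two axes is a projection: distances can only shrink.
X_2d = X_rot[:, :2]
print(np.all(pdist(X_2d) <= pdist(X) + 1e-9))
```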