The basic premise of the work is that words with similar meanings repeatedly occur close to each other in text (a property known as co-occurrence). For example, in a large corpus of text one would expect the words mountain, valley and river to appear close to each other often. The same might be true for mouse, cat and dog.

One could now create a square matrix for a text, in which each of the n unique words is represented by one row and one column. We then read through the text, and every time we read a word we look up its row vector in the matrix. We take the x words to its left and the x words to its right and, for each of them, increment the corresponding column in that row vector. This is a simple sliding-window parse. We can also account for the closeness of a word by incrementing by a larger number for words nearer to the centre word, e.g. a word next to the centre word could result in an increment of 5 in its column, and a word 4 words away in an increment of 2.
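The sliding-window procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not Lund and Burgess's implementation: the function name `build_cooccurrence`, the `window` parameter and the linear weighting `window - dist + 1` are assumptions chosen so that, with a window of 5, an adjacent word adds 5 and a word 4 positions away adds 2, as in the example in the text.

```python
def build_cooccurrence(tokens, window=5):
    """Return (vocab, matrix) for a list of tokens.

    matrix[i][j] accumulates distance-weighted counts of vocab[j]
    appearing within `window` positions of vocab[i].
    The weighting scheme is an assumption made for illustration.
    """
    vocab = sorted(set(tokens))
    index = {word: i for i, word in enumerate(vocab)}
    n = len(vocab)
    matrix = [[0] * n for _ in range(n)]

    for pos, word in enumerate(tokens):
        row = matrix[index[word]]
        for dist in range(1, window + 1):
            # Closer neighbours get a larger increment.
            weight = window - dist + 1
            # Look `dist` positions to the left and to the right.
            for neighbour_pos in (pos - dist, pos + dist):
                if 0 <= neighbour_pos < len(tokens):
                    row[index[tokens[neighbour_pos]]] += weight
    return vocab, matrix


# Toy sentence, invented for this example.
vocab, matrix = build_cooccurrence("the cat chased the mouse".split(), window=2)
for word, row in zip(vocab, matrix):
    print(f"{word:>7}: {row}")
```

Because the window is applied on both sides of every word, the resulting matrix is symmetric. For a real corpus a sparse representation would be more practical than nested lists, since most word pairs never co-occur.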

Naive HAL Matrix

As a result, words that co-occur with the same neighbours end up with similar rows. If we look at the simplified example in the above matrix, we can see that mountain, valley and river have similar rows, and so do mouse, cat and dog. These rows can be interpreted as vectors with n dimensions. The “distance” between vectors then becomes a proxy for the similarity of meaning of the words the vectors represent. The “distance” is often measured as the cosine of the angle between two vectors, where a larger cosine means greater similarity. Identical vectors, and more generally vectors pointing in the same direction, have an angle of 0 degrees and a cosine of 1. Unrelated vectors are orthogonal, with an angle of 90 degrees and a cosine of 0. To simplify the cosine calculation, the rows of the matrix are often normalised to unit length, so that the cosine of two rows is simply their dot product.
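The cosine measure and the effect of row normalisation can be made concrete with a short sketch. The helper names `cosine` and `normalise` are hypothetical, and the vectors at the end are made-up numbers used only to check the two boundary cases mentioned in the text.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero rows
    return dot / (norm_u * norm_v)


def normalise(row):
    """Scale a row to unit length; all-zero rows are returned unchanged."""
    length = math.sqrt(sum(x * x for x in row))
    return [x / length for x in row] if length else list(row)


# Same direction, different length: angle 0 degrees, cosine 1.
same_direction = cosine([1, 2, 3], [2, 4, 6])
# Orthogonal vectors: angle 90 degrees, cosine 0.
orthogonal = cosine([1, 0], [0, 1])

# After normalisation the cosine reduces to a plain dot product.
u, v = normalise([3, 4, 0]), normalise([4, 3, 0])
dot_of_unit_rows = sum(a * b for a, b in zip(u, v))
```

Normalising every row once up front means that each later comparison costs only a dot product, which matters when many word pairs are compared.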

The example also shows that even words that do not directly co-occur can share meaning. Dog, for example, does not appear close to mouse, but because it shares meaning with cat, it also shares meaning with mouse. As a result, one can easily group words by meaning even if they share it only indirectly.
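The indirect relationship can be illustrated with a small hand-made matrix. The counts below are hypothetical and are not taken from the figure or from any corpus; they are chosen only so that dog and mouse never co-occur directly while both co-occur with cat.

```python
import math

# Column order for every row (hypothetical toy vocabulary).
columns = ["mountain", "valley", "river", "mouse", "cat", "dog"]

# Invented co-occurrence counts, for illustration only.
rows = {
    "mountain": [0, 4, 3, 0, 0, 0],
    "mouse":    [0, 0, 0, 0, 5, 0],
    "cat":      [0, 0, 0, 5, 0, 4],
    "dog":      [0, 0, 0, 0, 4, 0],
}


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0


# dog and mouse have a zero in each other's column...
dog_near_mouse = rows["dog"][columns.index("mouse")]
mouse_near_dog = rows["mouse"][columns.index("dog")]

# ...yet their rows point in the same direction, because both co-occur with cat.
dog_vs_mouse = cosine(rows["dog"], rows["mouse"])
# mouse shares no context with mountain, so the rows are orthogonal.
mouse_vs_mountain = cosine(rows["mouse"], rows["mountain"])
```

In this toy setting the similarity between dog and mouse comes entirely from the shared cat column, which is the indirect sharing of meaning the paragraph describes.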

Similar experiments had been done before Lund and Burgess published their work, but it was still a great breakthrough. Their approach is completely automated and, unlike earlier work, does not rely on humans selecting dimensions and training semantic vector spaces. Only the information in the corpus is used to create the matrix and the resulting vector space, so the result carries no external bias introduced by human actors.