Personal thoughts on maths and programming, often illustrated with code snippets

Unscrambling shuffled text

A story that surfaced a few years ago, and met with considerable success in the press and on the internet, claimed that Cambridge University had been conducting research on some of the most amazing faculties of the human brain. According to the supposed study, the order in which the letters of a word are laid out matters very little, provided the first and last letters are kept in place: this conclusion was supported by a short excerpt of shuffled text, which anyone could easily decipher [1]. As a short example, consider the following sentence:

Narlmloy, radneig tihs shdulon’t be too hrad.

As many commentators pointed out at the time, the trick works well because the words used are relatively short; the following passage should be much harder to understand:

Unshuffling scrambled text

Deciphering a shuffled message is non-trivial because one word can be shuffled in many different ways: "Cadgmrbie", "Cgamdbire", "Cmbiagrde" and "Cgrbimdae" are all acceptable representations of the word "Cambridge".
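To get a feel for how many scrambled forms a single word can take, here is a minimal sketch of the scrambling step itself. The function name `scramble` and its optional `rng` parameter are my own choices for illustration; the only rule taken from the story is that the first and last letters stay in place. "Cambridge" has seven distinct inner letters, so it admits 7! = 5040 different orderings.

```python
import random

def scramble(word, rng=random):
    """Shuffle the inner letters of a word, keeping the first and last in place."""
    if len(word) <= 3:
        # Nothing to shuffle: at most one inner letter.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)  # in-place shuffle of the inner letters
    return word[0] + "".join(middle) + word[-1]

# Passing a seeded generator makes the result reproducible.
print(scramble("Cambridge", random.Random(0)))
```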

To overcome this difficulty, we define a shuffling-independent signature — a function that maps each word to a signature that does not change when the word is shuffled. The following snippet implements such a signature by isolating the first and last letter of each word and sorting the central part alphabetically.
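A minimal sketch of such a signature function could look as follows. It lowercases the word first (an assumption on my part, chosen so that the results match the lowercase signature "cabdgimre" quoted in this article) and leaves words of three letters or fewer untouched, since they have nothing to shuffle.

```python
def signature(word):
    """Return a signature that is identical for all shufflings of a word."""
    word = word.lower()
    if len(word) <= 3:
        # Zero or one inner letter: the word is its own signature.
        return word
    # Keep the first and last letters, sort the inner letters alphabetically.
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

print(signature("Cambridge"))
print(signature("Cadgmrbie"))
```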

Using this function, we can now compute a normalized version of the word "Cambridge", which does not change when its letters are shuffled: signature("Cambridge"), signature("Cadgmrbie"), signature("Cgamdbire"), signature("Cmbiagrde") and signature("Cgrbimdae") are all equal to "cabdgimre".

We then compute a signature-to-word mapping (possibly multi-valued, since different words may map to the same signature) using a word list [2] (english.txt).
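Building the mapping might be sketched like this. I am assuming that english.txt holds one word per line; the helper names `build_dictionary` and `load_dictionary` are hypothetical. This simple version keeps a single word per signature, so when two words collide the later one silently wins.

```python
def signature(word):
    """First letter + sorted inner letters + last letter, lowercased."""
    word = word.lower()
    if len(word) <= 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

def build_dictionary(words):
    """Map each signature to one clear-text word (last word wins on collision)."""
    dictionary = {}
    for word in words:
        word = word.strip()  # drop the trailing newline, if any
        if word:
            dictionary[signature(word)] = word
    return dictionary

def load_dictionary(path="english.txt"):
    """Read a word list (assumed: one word per line) and build the mapping."""
    with open(path, encoding="utf-8") as word_file:
        return build_dictionary(word_file)
```

With a word list at hand, `load_dictionary()` returns a plain `dict` that can be queried with `dictionary[signature(scrambled_word)]`.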

This gives us a signature-to-word dictionary, which we can then use to decipher the shuffled message. For this, we first compute the signature of each scrambled word in the input, and then retrieve the corresponding clear-text words from the dictionary.
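The deciphering step can be sketched as below. Several choices here are mine rather than taken from the original snippet: words are located with a plain `[A-Za-z]+` regular expression (so an apostrophe splits a word in two), words missing from the dictionary are left as they are, and a capitalised first letter is restored after the lookup.

```python
import re

def signature(word):
    """First letter + sorted inner letters + last letter, lowercased."""
    word = word.lower()
    if len(word) <= 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

def decipher(text, dictionary):
    """Replace each scrambled word in text by its dictionary entry, if any."""
    def replace(match):
        scrambled = match.group(0)
        # Fall back to the scrambled word when the signature is unknown.
        clear = dictionary.get(signature(scrambled), scrambled)
        if scrambled[0].isupper():
            clear = clear[0].upper() + clear[1:]
        return clear
    return re.sub(r"[A-Za-z]+", replace, text)

# A tiny hand-made dictionary stands in for the full word list here.
words = ["normally", "reading", "this", "be", "too", "hard"]
dictionary = {signature(w): w for w in words}
print(decipher("Narlmloy, radneig tihs be too hrad.", dictionary))
```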

Further thoughts

Handling signature collisions

The code can be modified to allow for signature collisions (collisions happen when multiple words have the same signature): for this, instead of using a dictionary as a signature-to-word mapping, we use a dictionary as a signature-to-list-of-words mapping:
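One way to sketch this modification uses `collections.defaultdict`, so that every signature starts out with an empty list. The names `build_multi_dictionary` and `candidates` are hypothetical, and returning the scrambled word itself when nothing matches is my own convention.

```python
from collections import defaultdict

def signature(word):
    """First letter + sorted inner letters + last letter, lowercased."""
    word = word.lower()
    if len(word) <= 3:
        return word
    return word[0] + "".join(sorted(word[1:-1])) + word[-1]

def build_multi_dictionary(words):
    """Map each signature to the list of all words sharing it."""
    dictionary = defaultdict(list)
    for word in words:
        word = word.strip()
        if word and word not in dictionary[signature(word)]:
            dictionary[signature(word)].append(word)
    return dictionary

def candidates(scrambled, dictionary):
    """Return every clear-text word that the scrambled word could stand for."""
    # .get avoids creating an empty entry for unknown signatures.
    return dictionary.get(signature(scrambled), [scrambled])

# "calm" and "clam" share a signature, so both are returned.
dictionary = build_multi_dictionary(["calm", "clam", "hard"])
print(candidates("clam", dictionary))
```

Choosing among several candidates is left to the caller; a deciphering routine could, for instance, print all of them and let the reader pick.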

[1] Matt Davis has a fascinating write-up on this. Snopes also has some reference material; the full text was something like: "Aoccdrnig to rscheearch at Cmabrigde uinervtisy, it deosn’t mttaer waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteres are at the rghit pclae. The rset can be a tatol mses and you can sitll raed it wouthit a porbelm. Tihs is bcuseae we do not raed ervey lteter by itslef but the wrod as a wlohe."