Solving Ancient Riddles with Neural Networks

Could the Voynich Manuscript just be early world building? Image in public domain.

I, like just about everyone who has ever heard of it, have been fascinated by the Voynich manuscript for years. The idea of an eldritch textbook, written in an encrypted script and filled with baffling, otherworldly diagrams and drawings, is ripe for all manner of conspiracy and conjecture. That it is around six centuries old and has managed to survive relatively intact just fuels those fires.

Personally, though, I don’t subscribe to any of the extra-terrestrial, spiritual or religious interpretations for the book. Honestly, whilst any of those would be groundbreaking and revolutionary to our understanding of the universe, they also dismiss something far more interesting. The Voynich manuscript could be some of the earliest genre fiction on the planet! That idea is genuinely more exciting to me: to find definitive proof that people 600 years ago were just as happy inventing fictional worlds, in their entirety, as I am today.

Of course, ancient fiction is well documented. We have plenty of examples from much earlier in our history, but the Voynich manuscript is somehow more interesting than the likes of Lucian of Samosata’s True History, at least in one particular aspect. It doesn’t tell the tale of some great mythology or legend, at least not one which is still known, and it appears to have been created by a single author, given how consistent the writing and art style is throughout the 200+ pages. To me, that would make this veritable tome less “just a story” and more a body of work akin to Tolkien’s Middle-Earth; early proof of actual world-building, for no reason other than fun. That we may have found a 15th-century Tolkien or Martin* is far more exciting to me, personally, than any of the more grandiose theories.

With that said, it may not be that long before the Voynich manuscript finally gives up some secrets. In one of the more interesting applications of neural-network AI I’ve seen, researchers at the University of Alberta have recently been targeting the text of the manuscript. The big issue with decoding the text isn’t just the encryption; with computers we should be able to make some headway on that front. However, you can only crack an encryption system if you have some idea of what the unencrypted message will look like. For that, we need to at least know the language which was initially encrypted. Of course, if the manuscript truly was created by a 15th-century Tolkien then it may have been written in a fictional tongue, making the whole exercise fruitless. Still, this is the very mystery that Greg Kondrak, with the aforementioned AI in tow, may have managed to crack.

Once the AI had been trained to find linguistic patterns within text (not actually deriving meaning, but recognising the mathematical models that make up a language’s semantics and syntax), it was fed the Voynich manuscript. The result: Hebrew, with a high degree of certainty, appears to be the underlying language. That’s completely amazing to me. That a computer can teach itself enough about human linguistics that it can derive the language of a block of text based on glyph placement and frequency is astonishing, but that it can then use that same logic to trace gibberish back to its underlying root language is mind-blowing.
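To give a feel for what "deriving the language from glyph frequency" can mean, here is a minimal sketch in Python. It is not Kondrak's method, which I haven't seen in detail; it simply compares letter-frequency profiles using cosine similarity. The names `profile`, `cosine` and `closest_language`, and the tiny `REFERENCES` samples, are all my own assumptions for illustration.

```python
from collections import Counter
from math import sqrt

# Toy reference samples (hypothetical, far too small for real use).
REFERENCES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "spanish": "el rapido zorro marron salta sobre el perro perezoso y luego el perro duerme al sol",
}

def profile(text):
    """Relative frequency of each alphabetic character in `text`."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}

def cosine(p, q):
    """Cosine similarity between two frequency profiles (1.0 = same shape)."""
    dot = sum(value * q.get(ch, 0.0) for ch, value in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

def closest_language(text, references):
    """Name of the reference sample whose letter profile best matches `text`."""
    target = profile(text)
    return max(references, key=lambda lang: cosine(target, profile(references[lang])))

print(closest_language("the dog sleeps over the fox", REFERENCES))
```

A nice property of a frequency profile is that it ignores letter order entirely, so shuffling the letters inside each word leaves the result unchanged. A real system would need far larger samples and more than single-letter counts.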

Of course, the result does come with some major caveats, the first being that this is still a best guess. The AI has found a pattern, that is certain; a pattern which closely matches archaic Hebrew. But until we can decode the text and find that it makes sense in Hebrew, it doesn’t get us much closer. Here, Kondrak has also made some headway, believing that the text might be using a simple alphagram system. An alphagram takes a word and reorders the letters into alphabetical order, a fairly simple form of pseudo-encryption. For example, the word “example” would become “aeelmpx”; not instantly recognisable but not the hardest riddle to solve either.
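For the curious, an alphagram is about as short as code gets. This is just a sketch in Python; the function name `alphagram` is my own, and lowercasing the word first is an assumption I've made so that capital letters don't sort ahead of everything else.

```python
def alphagram(word):
    """Return the letters of `word` rearranged into alphabetical order."""
    return "".join(sorted(word.lower()))

print(alphagram("example"))  # aeelmpx
```

Note that the process throws information away: "listen" and "silent" both end up as the same alphagram, so turning one back into a word is guesswork unless you have a vocabulary to check against.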

Based on that hunch, the AI has run the text through a decryption algorithm and, again, seems to have hit pay dirt. A large number of the output words are recognisably Hebrew, with more than 80% matching known Hebrew text. Unfortunately, the AI is no good at translating ancient Hebrew, and the few sentences they have tried don’t make much sense, but it’s a promising start. Hopefully, with the help of scholars more versed in the ancient language, some light may finally be shed on just what, exactly, the Voynich manuscript is or was. And that would be pretty darn awesome!
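If you are wondering how you would even begin to undo an alphagram, one simple approach is a dictionary lookup: sort the letters of every known word, then see which words share a scrambled token's sorted form. The sketch below is in Python and uses a toy English vocabulary as a stand-in. It is only an illustration under those assumptions, not the algorithm the Alberta team used, and the names `build_index` and `decode` are hypothetical.

```python
from collections import defaultdict

def alphagram(word):
    """Return the letters of `word` rearranged into alphabetical order."""
    return "".join(sorted(word.lower()))

def build_index(vocabulary):
    """Map each alphagram to every vocabulary word that produces it."""
    index = defaultdict(list)
    for word in vocabulary:
        index[alphagram(word)].append(word)
    return index

def decode(token, index):
    """Return all known words whose letters match `token`, in any order."""
    return index.get(alphagram(token), [])

# Toy vocabulary standing in for a real word list.
index = build_index(["example", "listen", "silent", "riddle"])

print(decode("aeelmpx", index))  # ['example']
print(decode("eilnst", index))   # ['listen', 'silent']
print(decode("zzz", index))      # []
```

The second lookup shows why a high match rate alone doesn't settle things: one scrambled token can map to several real words, so picking the right one still takes context and a human who knows the language.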