Archive

Knox took the time to plot the frequency distributions from this post, where I looked at the theory that the VMs words are phonetic codes. Here are his results:

Where not included in the title, comparisons are to the Herbal Sections. VMs is in blue-black.

Comparison of phonetic code frequencies between VMs sections and various known texts.

With only 40 words to translate, there cannot be a meaningful series, but it would be interesting to see the actual words in position anyway. If this only shows the power of Genetic Algorithms to match something regardless of significance, why does the old Latin Herbal make the best matches to the Herbal and Astrological sections?

Current Status

This is my personal summary of where I am at the moment, in particular which theories I’ve rejected (for better or worse!)

Theory: VMs words are anagrams of a plaintext that has been enciphered into the VMs glyphs

Attempts to find solutions with many mappings (1-, 2- and 3-grams) and various languages/dictionaries fail to find even mediocre matches

Unusual prevalence of e.g. “8am 8am 8am” not explained by this theory
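The kind of test this theory calls for can be sketched as follows. This is a minimal illustration only: the glyph mapping, the tiny dictionary, and all function names are hypothetical, and only a 1-gram mapping is handled (2- and 3-gram mappings would first need the VMs word split into glyph groups).

```python
from collections import defaultdict


def signature(word):
    """Letters of a word in sorted order; anagrams share the same signature."""
    return "".join(sorted(word))


def build_anagram_index(dictionary):
    """Map each signature to the set of dictionary words that have it."""
    index = defaultdict(set)
    for word in dictionary:
        index[signature(word)].add(word)
    return index


def decipher(vms_word, mapping):
    """Apply a 1-gram glyph-to-letter mapping; None if any glyph is unmapped."""
    try:
        return "".join(mapping[glyph] for glyph in vms_word)
    except KeyError:
        return None


def anagram_candidates(vms_word, mapping, index):
    """Dictionary words that are anagrams of the deciphered VMs word."""
    plain = decipher(vms_word, mapping)
    if plain is None:
        return set()
    return set(index.get(signature(plain), ()))


def match_rate(vms_words, mapping, index):
    """Fraction of VMs words with at least one anagram in the dictionary."""
    if not vms_words:
        return 0.0
    hits = sum(1 for w in vms_words if anagram_candidates(w, mapping, index))
    return hits / len(vms_words)


# Hypothetical mapping and dictionary, purely for illustration.
mapping = {"8": "e", "a": "t", "m": "s"}
index = build_anagram_index(["est", "et", "ad"])
print(anagram_candidates("8am", mapping, index))  # {'est'}
```

A search over many candidate mappings would score each one with something like `match_rate` and keep the best.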

Theory: VMs words are in fact pieces of plaintext words, that need to be a) combined and b) deciphered

Trials with delimiters like VMs “o” and “9” and with many mappings and languages/dictionaries fail to find good matches

But this would explain “8am 8am 8am” at a stretch

Theory: VMs words contain numeric codes, that use a Selenus type code table, with e.g. gallows characters used as multipliers

There are too many VMs characters for this to work: only, say, 4 gallows characters and ten digits are needed for a minimal implementation, so what are all the rest for?

Doesn’t explain “8am 8am 8am”

Theory: VMs words are phonetic codes for a reading of the manuscript

Mapping the words to Soundex or Double Metaphone and comparing with plaintexts produces a poor frequency match (but is this a good test? See e.g. Robert Firth’s notes)

This could explain “8am 8am 8am”

Theory: The text is produced by a polyalphabetic cipher with rotating/repeating sequences (a la Strong)

Multiple attempts to fit this theory using various alphabet lengths and sequence lengths fail to find a convincing match, although plausible results can be generated

Would explain “8am 8am 8am”
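For readers unfamiliar with the idea, a generic polyalphabetic cipher driven by a repeating shift sequence can be sketched as below. This is not a reconstruction of Strong's scheme: the 26-letter alphabet, the shift sequence, and the function names are all assumptions chosen for illustration.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet; other lengths work too


def shift_text(text, sequence, direction=1):
    """Shift each alphabet letter by the next entry of a repeating sequence."""
    out = []
    position = 0  # counts enciphered letters only, so spaces do not advance it
    for ch in text:
        if ch in ALPHABET:
            shift = sequence[position % len(sequence)]
            new_index = (ALPHABET.index(ch) + direction * shift) % len(ALPHABET)
            out.append(ALPHABET[new_index])
            position += 1
        else:
            out.append(ch)  # leave spaces and other characters untouched
    return "".join(out)


def encipher(text, sequence):
    return shift_text(text, sequence, direction=1)


def decipher(text, sequence):
    return shift_text(text, sequence, direction=-1)


sequence = [1, 2, 3]  # hypothetical rotating sequence
cipher = encipher("herba herba", sequence)
print(cipher, "->", decipher(cipher, sequence))
```

Fitting the theory then amounts to searching over alphabet lengths and sequences for a decipherment that reads as language.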

Procedure: since the cipher/code/whatever it is changes at least between sections, and possibly between folios (and maybe even within a folio), examining large quantities of VMs text for statistical properties is very misleading. Only text within a single side of a folio should be tackled for decryption.

A good question, to which the answer is probably “no”. But, like most questions that the Voynich provokes, it’s fun to try to answer. I’ve detailed the approach and result on my web site.

The procedure is to take each word in a body of plaintext and convert it to a phonetic code using the Soundex and Double Metaphone algorithms. Then accumulate the frequency distribution of the phonetic codes in the plaintext and compare with the frequency distribution of word counts in the VMs. If the VMs “words” are phonetic abbreviations of plaintext words, then one might expect the frequency distributions to match in form and level.
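The comparison step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the results: `consonant_skeleton` is a crude stand-in for Soundex or Double Metaphone, the distance measure is one arbitrary choice among many, and the VMs tokens other than "8am" are made up.

```python
from collections import Counter


def rank_frequencies(tokens):
    """Relative frequencies of the distinct tokens, most common first."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return [n / total for _, n in counts.most_common()]


def consonant_skeleton(word, length=4):
    """Stand-in phonetic code: first letter plus the consonants that follow."""
    w = word.lower()
    tail = "".join(c for c in w[1:] if c.isalpha() and c not in "aeiouy")
    return (w[:1] + tail)[:length]


def distribution_distance(p, q):
    """Sum of absolute differences between two rank-frequency curves.

    The shorter curve is padded with zeros; 0.0 means identical curves.
    """
    n = max(len(p), len(q))
    p = list(p) + [0.0] * (n - len(p))
    q = list(q) + [0.0] * (n - len(q))
    return sum(abs(a - b) for a, b in zip(p, q))


plaintext = "in principio erat verbum et verbum erat apud deum".split()
vms_words = "8am 8am 8am xxx yyy xxx zzz 8am".split()  # placeholder tokens

plain_curve = rank_frequencies(consonant_skeleton(w) for w in plaintext)
vms_curve = rank_frequencies(vms_words)
print(distribution_distance(plain_curve, vms_curve))
```

Comparing curves by rank rather than by code value matters here, because nothing is assumed about which VMs word corresponds to which phonetic code.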

(Of course, Soundex is designed for English, so may not make much practical sense when applied to other languages. However, it is one means of expressing phonetic content, and thus allows us to generate a phonetic description for any foreign-language word, albeit in an English pronunciation! Double Metaphone is designed to cope with words from several different languages.)

Example Soundex

To illustrate the Soundex compression, here is a Latin phrase and its Soundex equivalent:
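A minimal sketch of standard American Soundex in Python, applied to an arbitrary Latin phrase chosen purely for illustration (the phrase and the function name are assumptions, not taken from the original post):

```python
# Digit codes for consonant groups; vowels, H, W and Y have no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word):
    """Return the 4-character American Soundex code of a word."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # but its code still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code                    # a vowel resets prev, allowing repeats
    return ("".join(out) + "000")[:4]  # pad with zeros, truncate to 4


phrase = "in principio erat verbum"
print(" ".join(soundex(w) for w in phrase.split()))
# prints: I500 P652 E630 V615
```

Note how much is discarded: vowels vanish and everything after the third coded consonant is dropped, so many different plaintext words collapse onto the same code.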

A Caution

"Students who have approached the Voynich text from the point of view of the professional cryptanalyst have been led on at first by a deceptive surface appearance of simplicity, only to bog down sooner or later in an exasperating quagmire of paradoxes and enigmas that reveal themselves one by one as the analysis proceeds."
- Mary d'Imperio