
The Shannon Entropy of a string of text measures the information content of the text. For text that is completely random, i.e. where any character is as likely to appear as any other, the entropy (or “disorder”) is high. For a text consisting of, say, a long string of identical characters, the entropy is low.

Mathematically, the Shannon Entropy is defined as:

$$\text{Entropy} = -\sum_{i=1}^{N} \text{prob}_i \,\log(\text{prob}_i)$$

where prob_i is the relative frequency of the i’th character in the text, and the sum runs over all N distinct characters.
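As an illustration, the calculation can be sketched in a few lines of Python. This is only a minimal sketch, not the method used to produce the table below: it assumes base-2 logarithms (entropy in bits per character) and does no preprocessing of the text.

```python
from collections import Counter
from math import log2

def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string, in bits per character (base-2 log assumed)."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    # prob_i is the relative frequency of each distinct character;
    # the sum runs over all distinct characters in the text.
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Entropy is zero for a long run of identical characters.
print(shannon_entropy("a" * 1000))          # 0.0 (may print as -0.0)
# Even a short English sentence comes out above 4 bits per character.
print(shannon_entropy("the quick brown fox jumps over the lazy dog"))
```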

If the Voynich text were randomly created (by whatever means), we’d expect it to have high entropy (i.e. to be very disordered). What we in fact find is that the text is ordered, with low entropy, and is rather more ordered than English, for example. The results of comparing the Voynich text with several other texts in different languages are shown in the table below.

Language        Source                  Entropy
Voynich         GC’s Transcription      3.73
French          Text from 1367          3.97
Latin           Cantus Planus           4.05
Spanish         Medina 1543             4.09
German          Kochbuch 1553           4.15
English         Thomas Hardy            4.21
Early Italian   Divine Comedy 1300      4.23
None            Random characters       6.01

The last entry in the table shows the entropy of a random text, which is getting on for double the entropy of the Voynich.
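A figure of around 6 is roughly what one would expect: if every one of N distinct characters is equally likely, the entropy reduces to log(N). The character set used for the random comparison isn’t stated, so the sketch below assumes a hypothetical 64-symbol alphabet, for which the entropy of a long random string approaches log2(64) = 6 bits per character, close to the 6.01 in the table.

```python
import random
import string
from collections import Counter
from math import log2

def shannon_entropy(text: str) -> float:
    counts = Counter(text)
    return -sum((n / len(text)) * log2(n / len(text)) for n in counts.values())

# Hypothetical 64-symbol alphabet (the set actually used for the table isn't stated):
# 52 upper- and lower-case letters, 10 digits, a space and a full stop.
alphabet = string.ascii_letters + string.digits + " ."
assert len(alphabet) == 64

sample = "".join(random.choice(alphabet) for _ in range(100_000))
print(shannon_entropy(sample))   # close to log2(64) = 6.0 bits per character
```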

A Caution

"Students who have approached the Voynich text from the point of view of the professional cryptanalyst have been led on at first by a deceptive surface appearance of simplicity, only to bog down sooner or later in an exasperating quagmire of paradoxes and enigmas that reveal themselves one by one as the analysis proceeds."
- Mary d'Imperio