New study suggests Voynich text is not a hoax

Comparison of the Voynich manuscript and different information-carrying sequences. A) Information in word distribution as a function of the scale for the Voynich manuscript compared to five other language and symbolic sequences (F: Fortran; C: Chinese; V: Voynich; E: English; L: Latin; Y: yeast DNA). The number of words in all sequences was equal to that of the Voynich text; if the original sequence was longer, the additional words were not considered. B) Scale of maximal information for the sequences considered in A. Credit: doi:10.1371/journal.pone.0066344.g001

(Phys.org) Theoretical physicist Marcelo Montemurro and colleague Damián H. Zanette have published a paper in the journal PLOS ONE claiming that the Voynich text is likely not a hoax, as some have suggested. The two researchers, along with others at the University of Manchester in the U.K., analyzed a digital copy of the text and say that computer-assisted analyses of the "book" suggest it does harbor meaning, though what that might be is still a mystery.

The Voynich text is a book made up of 104 folios; each page has graphemes (arrays of characters) and drawings on it. It first came to light in 1912 when Wilfrid Voynich claimed to have found it in an Italian monastery. The graphemes suggest words made up of characters that do not appear in any other known language. Since the time of its discovery, various researchers have sought to determine if the text is written in an unknown language, or if it is instead a book created by someone as a hoax. Adding to the mystery of the text are the drawings of plants on most of the pages, none of which are known to exist in nature. Carbon dating of the parchment suggests it was created sometime in the 1400s, but that doesn't prove that the writing on the parchment was done during that period, leaving some to suggest it was Voynich himself who created the characters and drawings. To date, no one has been able to prove whether the text has meaning or if it is simply pages of gibberish. To learn more, Montemurro and his team turned to advanced computer analysis.

To analyze the text, researchers assign modern-language letters to the characters, which allows algorithms to be applied. In this case, the team looked at global patterns of "words" that appear throughout the text. This process represents a novel way to view the semantics. One measure of pattern distribution, known as "entropy," allows researchers to compare documents to one another using a computer. The method offers a single number that describes the complexity of the text. The Voynich text received a score of 805, compared to 728 for text samples written in English and 580 for those written in Chinese. A comparison of the Voynich score to yeast DNA samples (25) and a program written in Fortran (285) suggests the Voynich text is more complicated than simple gibberish.
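The idea of "information in word distribution as a function of the scale" from the figure caption can be illustrated with a minimal Python sketch. This is not the authors' estimator: it only measures the mutual information between a word and the equal-sized part of the text it falls in, then subtracts the value for a shuffled copy, on the assumption that topical words cluster in some parts of a meaningful text. The function names `word_part_information` and `structure_gain`, and the toy word lists, are hypothetical.

```python
import math
import random
from collections import Counter


def word_part_information(words, part_size):
    """Mutual information in bits between a word and the part it appears in."""
    n_parts = len(words) // part_size
    if n_parts < 1:
        raise ValueError("text is shorter than one part")
    words = words[: n_parts * part_size]  # drop the incomplete tail
    total = len(words)
    word_counts = Counter(words)
    # Count how often each word occurs in each part of the text.
    joint = Counter((w, i // part_size) for i, w in enumerate(words))
    p_part = 1 / n_parts  # all parts have the same length
    info = 0.0
    for (w, _part), n in joint.items():
        p_joint = n / total
        p_word = word_counts[w] / total
        info += p_joint * math.log2(p_joint / (p_word * p_part))
    return info


def structure_gain(words, part_size, seed=0):
    """Information in the original order minus that of a shuffled copy."""
    shuffled = list(words)
    random.Random(seed).shuffle(shuffled)
    return (word_part_information(words, part_size)
            - word_part_information(shuffled, part_size))


# Toy data (hypothetical): topical words clustered versus evenly mixed.
clustered = ["plant"] * 50 + ["star"] * 50
mixed = ["plant", "star"] * 50
print(word_part_information(clustered, 50))
print(word_part_information(mixed, 50))
print(structure_gain(clustered, 50))
```

Repeating the calculation for several values of `part_size` would give a curve over scale, loosely analogous to panel A of the figure. The shuffled baseline matters because short texts show some apparent word-part information by chance alone.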

The team notes that the text also conforms to Zipf's law, which states that the frequency of a word in a real language is roughly inversely proportional to its rank in a frequency table. Taken together, the researchers conclude that the Voynich text most likely contains real information and thus is not a hoax.
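As a rough illustration of what conforming to Zipf's law means, the sketch below ranks words by frequency and fits a straight line to log(frequency) against log(rank); a slope near -1 is the classic Zipf pattern. The function name `zipf_slope` and the synthetic corpus are assumptions made for this example, and a plain least-squares fit is only one of several ways to check the law.

```python
import math
from collections import Counter


def zipf_slope(words):
    """Least-squares slope of log(frequency) against log(rank)."""
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic corpus (hypothetical): the word of rank k occurs 120 / k times.
corpus = []
for k in range(1, 7):
    corpus.extend([f"w{k}"] * (120 // k))

print(zipf_slope(corpus))
```

On a transcription of a real text the slope would only approximate -1, and the fit is usually poorest for the rarest words.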

Abstract: The Voynich manuscript has remained so far as a mystery for linguists and cryptologists. While the text written on medieval parchment, using an unknown script system, shows basic statistical patterns that bear resemblance to those from real languages, there are features that suggested to some researchers that the manuscript was a forgery intended as a hoax. Here we analyse the long-range structure of the manuscript using methods from information theory. We show that the Voynich manuscript presents a complex organization in the distribution of words that is compatible with those found in real language sequences. We are also able to extract some of the most significant semantic word-networks in the text. These results, together with some previously known statistical features of the Voynich manuscript, give support to the presence of a genuine message inside the book.

26 comments

The theory that rang with the most truth, to me, was that it was an expensive prop for a traveling "healer," drawn by someone who was illiterate but familiar with writing. The prop gives legitimacy to the healer's claims of "ancient and secret wisdom." Being familiar with writing, the artist emulated features of authentic writing.

Actually that test for complexity just says it is not random strings of gibberish. If you look at that complexity measured on works of fiction, you get a high number, as there is plenty of complexity. However that doesn't mean deliberately concocted works of untruth are somehow true, just because they are complicated. Where I come from a "tightly woven plot" means tons of complexity, but it doesn't imply true.

Structure is not evidence for there being a message. For example if you used a real language to help keep track of what nonsensical symbols you are writing you would still pass this structure test without necessarily having meaning or being a workable translation of whatever you used as a guide.

Without knowing the details of their algorithm I would think that too high a score could indicate the author of the text was lazy and reused the same pattern repeatedly. Which, given human nature, is much more likely than someone putting down symbols in a random pattern.

It is a big leap from "not random" to "genuine message".

You can never prove that a one-off text isn't a personal cipher that you can't decode. The imaginary plants are what actually point to it being a hoax.

A comparison of the Voynich score to yeast DNA samples (25) and a program written in Fortran (285) suggests the Voynich text is more complicated than simple gibberish.

As programs go one written in Fortran is certainly a good standard for gibberish (couldn't resist)

Well, "useful" is not a term used in the article. They claim "genuine message" and "real information". Both terms are obvious analogues of "not gibberish".

To be fair though there is a difference between information (as defined by information theory) and meaningful information. You can create many sentences in English that have correct grammar (and hence would show up under such analysis as "information carrying") but which mean absolutely nothing.

IMO it's a list of prohibited herbal drugs, i.e. plants whose use was prohibited in medieval times for various "witchcraft" purposes, to such a degree that even their description had to remain secret.

Probably just an obscure work of fiction by somebody with a talent for languages or ciphers. Still would be really fun to actually translate and figure out what it says. I don't think anyone is expecting life altering secrets to emerge from it.

In my opinion, this is a work done by a partially deaf, illiterate person with a severe learning disability and a serious speech impediment, who had very little or no exposure to the printed word, or any form of education for that matter, because he would have been considered an idiot by the community's lay population. Armed nonetheless with a very high degree of intelligence, he created his own written language with its own inexplicable rules, as he would have yet been provided with the means to record things on paper in some form, as the authorities (clerics) in those days thought these people to have been 'touched', and that they were 'holy', and therefore that somehow God might communicate through them. The work may yet be some form of ordered nonsense, with a simple but undecipherable pattern. In this way, the author might acquire some modicum of respect, in spite of his disabilities.

A test I like to do when examining encrypted data is a compressibility test. The basic idea behind encryption is to scramble the data so much that frequency analysis is useless. Compressing data makes use of frequency analysis, regardless of the algorithm you use. For example ZIP is merely a Huffman tree, i.e. a binary tree loaded via frequency analysis. So if data has been encrypted, rather than pure gibberish, the encrypted text will have less compressibility. Its "DISORDER" measure will be very high if it is encrypted well. So any text may have more than one level of "intelligence" invoked on it. The first can be plain text, i.e. a story. The next level is then encryption. This can go on and on. Of course if it isn't encrypted (high compressibility), then we know there are many repeated constructs. Then it is up to a semantic processor to see if it matches the constructs of any known language. It's a form of "pattern" recognition if you like.
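The compressibility test described in this comment can be sketched in a few lines of Python using the standard `zlib` module, whose DEFLATE format combines LZ77 string matching with Huffman coding. This is only an illustration of the commenter's idea, not anything used in the paper; the function name `compression_ratio` and the sample inputs are hypothetical.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more repetitive)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(zlib.compress(data, 9)) / len(data)


# Hypothetical inputs: a highly repetitive "text" and random bytes that
# stand in for well-encrypted data of the same length.
repetitive = b"qokeedy chedy " * 500
random_like = os.urandom(len(repetitive))

print(compression_ratio(repetitive))
print(compression_ratio(random_like))
```

A ratio close to 1 (or slightly above it) suggests the data has little exploitable structure, while a very low ratio points to heavy repetition. Ordinary prose typically falls somewhere in between.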

Decode this:XY<-gacuu . If you "experts" can't, I'd say all they are doing are just guessing and if they fail, they would blame "hoax" for their ignorance..., pride, stupidity, study grants, tenures??>

I'm not sure I understand why they think high entropy is indicative of a real language. High entropy is indicative that it's a random sequence or encrypted (such as substitutions made from a random table). The result would match English where individual characters have been substituted from a small table, leading to a slight increase in entropy, but retaining the overall structure.

Ahh, the good old Voynich text. Great book to cozy up in bed with. Such an interesting plot with all of the little twists and artwork to keep the reader's interest (satire of course). In all seriousness, I'm a coder and an artistic type so this book fascinated me when I first saw it. I was hoping that chapter by chapter you could pick up broad meaning from the pictures in each: one section is about plants, another about women birthing, another looks like it's about astronomy and another looks to be about math. Every chapter seems to have a code-wheel that to me looks like the key to the cipher. So each chapter would have its own cipher. It's all hand painted page by page, and the whole book had to take years to make.

That's what makes me believe it's not a hoax either. It looks like a diary of what a monk might write as observations of subjects without the church's influences. For example, the taboo subject of women in what appear to be birthing tubs. Fascinating and weird.

Statistical analysis of text is hardly high-end science. I am surprised it took 100 years to perform such a basic test.

If you look at the manuscript you may notice that it's sometimes difficult to identify what makes up a 'letter', or a word. Getting at the basic units of the text is already very hard.

You need LOTS of text to analyse to gain any sort of meaning.

And you also need some sort of crib or partial translation. Otherwise you just have the 'Chinese room' problem. Yes, you will eventually be able to produce valid 'Voynich-sentences' (and be able to distinguish valid from non-valid ones) - but it still won't tell you what they mean.

But in all seriousness: It looks like a (much better) version of the fantasy doodles of animals and star maps I used to draw as a kid. And the 'information' could just be the natural tendency to repeat pleasing shapes.

Misery Monkeys can't be bothered to pay-attention (lend credence) to demonstrable facts invading in their very faces, so how could anyone give a flaccid rat's ass what some mythical voodoo con-man crafted centuries ago? The 'good news' is things that cannot continue indefinitely DON'T.

People sure get mixed up between information and meaning. A purely random string of characters is incompressible and as such contains more information (and higher entropy) than an equally long proper sentence. The latter has meaning, the former does not. The article's use of entropy seems the reverse of what I expected. Lower entropy implies more structure and compressibility, and, possibly, meaning.
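The point about random strings having higher entropy than ordinary sentences can be checked with a small Python sketch of per-character Shannon entropy. It is a simple illustration of the commenter's argument, not the measure reported in the article; the function name `char_entropy` and the sample strings are assumptions.

```python
import math
import random
import string
from collections import Counter


def char_entropy(text):
    """Shannon entropy in bits per character of the character distribution."""
    total = len(text)
    if total == 0:
        raise ValueError("text must not be empty")
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical samples: an English sentence and a random string of the
# same length drawn uniformly from lowercase letters plus space.
sentence = "the manuscript has remained a mystery for linguists and cryptologists"
rng = random.Random(0)
alphabet = string.ascii_lowercase + " "
noise = "".join(rng.choice(alphabet) for _ in sentence)

print(char_entropy(sentence))
print(char_entropy(noise))
```

This measure looks only at single characters, so it ignores word order entirely. That is one reason a single entropy figure cannot separate meaning from structured nonsense.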

Higher entropy implies less structure/predictability, so unless the text is scrambled or compressed it would appear to be gibberish. I'm not sure what measure they are using such that yeast DNA and Fortran get lower numbers. They called it "entropy" with quotes.
