Low Level Word Structure

In two earlier articles I proposed that the characters <y> and <a> are equivalent, and that <y> is not expressed in a middle position after <e>. (If you have not read those articles, it is best to read them before this one.) Taken together, these findings mean that <y> and its different expressions occur in most of the same environments as <o>, and that when one is swapped for the other the result is often a valid word. No other character can be swapped with either of them and reliably result in valid words. Because of this, <y> and <o> must have some strong relationship.

Unlike the relationship between <y> and <a>, where the two characters occur in different environments which never overlap, the characters <y> and <o> occur in environments which significantly overlap. The writer of the Voynich text did not have to choose between <y> and <a> because the environment dictates which one is needed and the other is not possible. But they must have had a reason for choosing <y> or <o>, because either one would have made a valid word. So these two characters must have contrasted in the mind of the writer (they made the difference between one word and another), but they were obviously still similar enough to go in the same environments.

Thus <y> and <o> can be considered the same class of character: they work in the same way but do not have the same value. And we must immediately note that almost all words contain at least one character, or reflex of a character, from this class. Given that most words contain one or more examples of these two characters, and thus they must be essential to the structure of words, I would like to label this class of characters Primes.

In this article I would like to investigate the low level word structure using this newfound class of Primes as a starting point. In this view the simplest word consists of one Prime and all the other characters are structured around it. More complex words have more than one Prime, and the characters must relate to one Prime or the other. Words can thus be broken up into sections, each one of which contains a single Prime to which all the other characters in that section relate.

It is the goal of this article to discover if this kind of analysis can properly model or describe the low level structure of Voynich words. If so, the typical section should be regular and predictable. The high level structure of a word, or how characters relate over the whole word, will be the subject of a later article.

Let us look at a few words so that the kind of analysis I am proposing will be clearer.

The word <daiin> contains only one Prime <a>, which is a reflex of <y>, and thus the whole word is a single section. The body of the section is made of the Prime and everything before it, so in this case it is <dy>. The section’s tail is everything which comes after the Prime, so here it is <iin>. For another simple word <chol>: the <o> is the Prime, so the body is <cho> and the tail is <l>.

A more complex word is <qokaiin>. It has two Primes <o> and <y> (here expressed as <a>), and thus two sections. For the purposes of breaking words into more than one section, I will take everything before a Prime as part of that Prime’s section, until another Prime is reached, and everything after the Prime if it is the final one of a word. Thus all non-final sections consist of bodies, and only final sections have tails. So the word <qokaiin> has the sections <qo>, which is only a body, and <kaiin>, which has a body of <ky> and a tail of <iin>.

A yet more complex word would be <otedy>. Because of <y> deletion in a middle position, we should consider this word to have a Prime between the <e> and <d>. We could mark this as <y>, but for the purpose of this article I will mark it as a null with the symbol <Ø>. The word then has three Primes and three sections. The first is <o>, the second is <teØ>, and the third is <dy>. All three are body only, as the last section has no tail.

We can quickly recap the structure of these words as laid out below. The + indicates the link between two sections, and the — indicates the link between the body and tail of a single section.

<daiin> = <dy> — <iin>

<chol> = <cho> — <l>

<qokaiin> = <qo> + <ky> — <iin>

<otedy> = <o> + <teØ> + <dy>

With this breakdown of words we can already see that the bodies and tails of sections in different words have repeating structures. Both <daiin> and <otedy> contain the section body <dy>, while <daiin> and <qokaiin> contain the section tail <iin>. This is encouraging, and it is hoped that the majority of words will be composed of a finite set of such parts which can be reduced to a simple pattern.
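The breakdown used for these four words can be sketched in a few lines of Python. This is only an illustration of the procedure described above, not a tool used to produce the results in this article. The function names are hypothetical. The sketch assumes that a null Prime is marked after any run of <e> which is followed by a character other than <o>, <y>, or <a>, and that a word-final run of <e> is left unmarked.

```python
import re

# Characters treated as Primes: <o>, <y>, its reflex <a>, and the null <Ø>.
PRIMES = {"o", "y", "a", "Ø"}


def mark_null_primes(word):
    """Insert Ø after a run of <e> that is followed by a non-Prime character."""
    # Assumption: a word-final run of <e> is left unmarked.
    return re.sub(r"(e+)(?=[^oyae])", r"\1Ø", word)


def segment(word):
    """Return (bodies, tail): one body per Prime, plus the tail of the last section."""
    bodies = []
    current = ""
    for ch in mark_null_primes(word):
        current += ch
        if ch in PRIMES:
            # A Prime closes the body; <a> is written as <y>, as in the recap above.
            bodies.append(current.replace("a", "y"))
            current = ""
    # Whatever follows the last Prime is the tail of the final section.
    return bodies, current


for w in ["daiin", "chol", "qokaiin", "otedy"]:
    print(w, segment(w))
```

A word with no Prime at all comes back with an empty list of bodies and the whole word as its tail, which makes such words easy to set aside.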

The rest of this article is concerned with showing which section bodies and tails are common and how they are structured. First comes the greater part, dealing with section bodies, and then the smaller part, dealing with section tails. The following is based upon a list of all words in the Voynich text with ten or more occurrences. Although it will not be exhaustive, it should provide insight into the typical word structure.

By definition, the rightmost character in the body of a section is a Prime, and must always be present: either <o>, <y>, or one of the two reflexes of <y>, <a> and <Ø>. Both <y> and <o> may occur alone as the complete section, either as single letter words or as the first section in a word. The occurrence of <y> or <o> as the only character in the first section in a word is very common.

All the other characters in the body are optional. Words which do not meet the basic criteria of having at least one Prime are short, typically one or two characters in length. These words make up only a small number of the total words, and are not included in this analysis.

To the immediate left of the Prime there may occur an <e> group: a string of <e> from one to three characters long. However, only <e> and <ee> are common. When the Prime is <Ø> an <e> group must by definition occur, and it is the rightmost character in the section to be expressed. It is thus possible for an <e> group to constitute the whole written body of a section, with neither an expressed Prime nor any other character. If an <e> group occurs in a section it must be to the immediate left of the Prime position and nowhere else, by definition.

To the left of an <e> group, or of the Prime should an <e> group not occur, there may be either of the characters <ch, sh>. The character <ch> is more common than <sh>, but it would seem that both are equally valid. Its occurrence can be dependent on the surrounding characters, and it must be present in some situations. More will be said about this below.

To the left of <ch, sh>, any <e> group, and the Prime, comes the widest selection of characters which may occur in the body of a section: <k, t, p, f, ckh, cth, cph, cfh, d, s, l, r>. These characters work in different ways, and can be grouped according to how they interact with the two possible characters to the right of them, which have already been discussed.

The first and simplest group is <ckh, cth, cph, cfh> which will not take <ch, sh> to their right, but will take an <e> group. The next is <k, t, d, s, l, r> which will all take any combination of <ch, sh> and an <e> group. The third is <p, f> which will take <ch, sh> without an <e> group, but will not take an <e> group without <ch, sh>. All characters readily take only a Prime to their right with no other characters in between.

The only character which can occur further to the left in most cases is either of <ch, sh>, and only if one of the selection just mentioned above is also present. However, the occurrence of this further <ch, sh> is complicated. The characters <ckh, cth, cph, cfh> take these characters readily. The characters <k, t, p, f> will take them only in low numbers, and almost not at all if they already have a <ch, sh> to their right. For the characters <d, s, l, r> the situation is quite complex. The character <d> readily takes <ch, sh> before it (though a further pattern noted below may account for this), and <s> does so to a lesser extent. However, <l, r> only rarely take <ch, sh> before them, if at all.

The patterns set out above account for a great many permissible section bodies, at least among common words. Sequences such as <keo>, <tchy>, <lo>, <dshe>, <ckheey> are all found with frequencies which suggest they are perfectly valid according to the rules governing Voynich word formation. Sequences such as <kdchy>, <tlo>, <cthcho>, <chrey>, <dfy> should not occur, and do not but for the rare exception.
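The general pattern described so far can be written as a single regular expression, which makes it easy to test a candidate body against it. What follows is a rough sketch under several assumptions of mine, not a definitive statement of the model. The null Prime must be written explicitly as Ø, so <dshe> becomes dsheØ. The further <ch, sh> on the left is simply allowed before every leading character except <l, r>, so the frequency differences described above are not modelled. A <p, f> followed by both <ch, sh> and an <e> group is accepted, which is one reading of the rule. The name `is_regular_body` is hypothetical, and the particular patterns discussed afterwards are not covered.

```python
import re

BODY = re.compile(
    r"""
    (?:
        (?:ch|sh)? (?:ckh|cth|cph|cfh) (?:e{1,3})?     # benched gallows: e group only
      | (?:ch|sh)? [ktds] (?:ch|sh)? (?:e{1,3})?       # any mix of ch/sh and e group
      | [lr] (?:ch|sh)? (?:e{1,3})?                    # l, r: no ch/sh on the left
      | (?:ch|sh)? [pf] (?:(?:ch|sh) (?:e{1,3})?)?     # p, f: e group needs ch/sh
      | (?:ch|sh)? (?:e{1,3})?                         # no leading character
    )
    (?:[oya] | (?<=e)Ø)                                # expressed Prime, or null after e
    """,
    re.VERBOSE,
)


def is_regular_body(body):
    """True if the whole string fits the general section body pattern."""
    return BODY.fullmatch(body) is not None


for body in ["keo", "tchy", "lo", "dsheØ", "ckheey"]:
    print(body, is_regular_body(body))   # sequences given above as valid

for body in ["kdchy", "tlo", "cthcho", "chrey", "dfy"]:
    print(body, is_regular_body(body))   # sequences given above as not occurring
```

Because the expression only encodes which positions may be filled, it says nothing about how common a body is. It marks the boundary of the pattern and gives no ranking within it.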

However, there are three more parts to the general pattern of section bodies which apply to only particular characters.

The first of these is that <d> may occur in the place of an <e> group, so long as <ch, sh> is present to the left (<ldy> occurs, but that may be related to yet another pattern below). Section bodies such as <chdy>, <shdy>, <kchdy>, <tchdy>, <pchdy>, <lchdy> all occur, though none with the Prime <o>. The fact that <dy> regularly appears as the final section in a word, while <do> does not, makes this pattern rather curious. It seems as though there may be another process at work.

The next particular pattern is that <l> may occur before some characters at the beginning of a section body. The most common is <k>, though <t> and <d> may occur in much smaller numbers. Indeed, the sequence <lk> is so common, and with no other explanation or clear generalization, that it might be considered a specific quirk of the Voynich text. I believe it could be evidence that <lk> has a particular value as a digraph.

The third is well known and needs no real explanation. Any body which contains only the Prime <o>, and is the first section of a word, can add <q> to the left and no further characters. This makes it always the first letter of any word it occurs in, and accounts for practically all occurrences of <q>.

The section tails are much easier to describe and generally much shorter. Unlike section bodies, however, which were not determined in any way by their Prime, tails are significantly constrained by the Prime which comes before them. The Prime is not counted as part of the tail, but for this reason must be considered. Because <y> is seldom found before a tail, and <a> is conditioned by the following character, three Primes must thus be distinguished: <a>, <o>, and <Ø>.

Many section tails are simple and consist of a single character. The Prime <Ø> may only take <d, s> and nothing else. For <a> and <o> the tail is often one character: for <o> it can be <d, s, l, r, m>, and for <a> it can be <l, r, m, n>. However, both can also take <i> sequences, which are one to three occurrences of the character <i> followed by <l, r, m, n> (although <o> is more limited in which it can take).

The only rare or untypical part of the tail pattern is the occasional occurrence of two characters (that is, apart from <i>). The only one which occurs with any frequency, however, is <ls>.
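The tail rules are simple enough to put in a small lookup keyed on the Prime. The sketch below is illustrative only and rests on assumptions of mine. The <i> sequences are allowed in full after <o>, even though <o> is in fact more limited, so the check accepts too much there. The Prime <y> is treated as taking no tail at all. The rare <ls> tail is left out, since nothing here says which Prime it follows. The name `tail_allowed` is hypothetical.

```python
import re

# One to three <i> followed by one of <l, r, m, n>.
I_SEQUENCE = r"i{1,3}[lrmn]"

# Permitted tails for each Prime that is distinguished for tails.
TAIL_PATTERNS = {
    "Ø": re.compile(r"[ds]"),
    "o": re.compile(rf"[dslrm]|{I_SEQUENCE}"),   # over-accepts i sequences
    "a": re.compile(rf"[lrmn]|{I_SEQUENCE}"),
}


def tail_allowed(prime, tail):
    """True if the tail fits the pattern for the Prime that precedes it."""
    if tail == "":
        return True          # a section may consist of a body only
    pattern = TAIL_PATTERNS.get(prime)
    return pattern is not None and pattern.fullmatch(tail) is not None


print(tail_allowed("a", "iin"))   # tail of <daiin>
print(tail_allowed("o", "l"))     # tail of <chol>
print(tail_allowed("Ø", "l"))     # not among <d, s>
```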

I have sought to show in the table below what has been said in the text. It should make the overall structure of a Voynich word section clearer. Bear in mind this is a generalized model: it neither explains all possible sections, nor are all the sections it suggests common. Think of it as a fence around the likeliest section structures, with those outside being unlikely or impossible.

The model pattern of Voynich words I have laid out is tentative and subject to change. I present it here as a first attempt with the hope that it provides a base to build upon. I will soon look at the high level structure of Voynich words and I expect that afterwards I will want to revisit the model.

There is also the question of how well the model pattern fits less common words. Exceptions may prove useful in refining the model, or even disproving its validity.