run4classes sheet: this sheet shows the tokens. On the left are the lengths 2 up to 8, sorted alphabetically within each length.
On the right you find exactly the same information, but there the columns are sorted by usage.
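
The two layouts of the sheet can be sketched in a few lines of Python. This is only an illustration: the function name group_by_length and the sample counts are made up here and are not taken from the real sheet or from the software that produced it.

```python
from collections import defaultdict

def group_by_length(token_counts, min_len=2, max_len=8):
    """Return {length: (alphabetical list, usage-sorted list)}.

    token_counts maps a token to how often it is used.
    Tokens shorter than min_len or longer than max_len are skipped.
    """
    by_len = defaultdict(list)
    for tok in token_counts:
        if min_len <= len(tok) <= max_len:
            by_len[len(tok)].append(tok)

    result = {}
    for length, toks in by_len.items():
        alphabetical = sorted(toks)
        # Highest usage first; ties are broken alphabetically.
        by_usage = sorted(toks, key=lambda t: (-token_counts[t], t))
        result[length] = (alphabetical, by_usage)
    return result

# Hypothetical counts, for illustration only.
sample_counts = {"ol": 9, "ar": 5, "dal": 3, "cbhy": 7, "aiin": 7,
                 "o": 40, "qokeedaiin": 1}
columns = group_by_length(sample_counts)
```

Here columns[2] holds both views for the tokens of length 2: the left (alphabetical) and the right (usage) column.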

Sheets a to y show the hierarchical trees of the tokens: the dendrograms of the letters in the Voynich manuscript.

The last sheet contains ‘tokens not in tree’, which are the tokens that could be integrated in the future. At this point they were of little significance.

The coloring:

A last note on the difference between “word tokens” and “ngram tokens”. As you can see, the ngram tokens are artificial tokens: they do not exist as separate words, but the software defines them as a part of a word that is of importance. The “word tokens” are real words in the text; they exist as presented and were upgraded to tokens as explained in the text on this page.

The term “tokens” is the group name and refers to both flavours in general.

The following dendrograms are in the 21 sheets, based on the start letter:

a b c d e f g h i k o l m n p q r s t x y

Which dendrograms are relevant?

The letter b is a fake letter: it always lies between c and h (c_h), and checking the dendrogram confirms this. cbh is always the correct configuration for that letter.
The letters g and x show only a handful of hits as first letter (countable on two hands) and are not of interest. The same holds for the dendrograms on first letter m and n. The remaining 16 letters will be discussed now: a c d e f h i k o l p q r s t y.
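
The claim that b only occurs inside cbh is easy to verify on a word list. A minimal sketch, assuming the words are plain strings in the transcription alphabet used on this page; the function name and the sample words are hypothetical:

```python
def b_always_in_cbh(words):
    """Return True if every b in every word sits directly between c and h."""
    for w in words:
        for i, ch in enumerate(w):
            if ch != "b":
                continue
            # A b at the very start or end of a word cannot be inside cbh.
            if i == 0 or i == len(w) - 1:
                return False
            if w[i - 1] != "c" or w[i + 1] != "h":
                return False
    return True

# Made-up sample words, not an actual transcription.
ok_sample = ["cbhy", "ocbhe", "dal"]
bad_sample = ["cbhy", "bal"]
```

Running the same check on the full transcription would show whether the cbh rule holds without exception.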

Dendrogram h

The most obvious question here is: can we lay this dendrogram over the dendrogram cbh?

The letter h is not a real start letter of any word (two occurrences were detected, and both are flaws); it always appears at Bpos (second letter) or at a later position in the word.
That immediately rules out an independent existence of the letter at the start of a word, so it can be ignored as well. 15 sheets remain.
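
To see at which positions a letter actually occurs, one can tally its 1-based positions over all words, so that position 1 is Apos and position 2 is Bpos. The sketch below is an illustration under assumptions: position_counts is a hypothetical helper and the sample words are made up.

```python
from collections import Counter

def position_counts(letter, words):
    """Count how often `letter` occurs at each 1-based position in the words."""
    counts = Counter()
    for w in words:
        for pos, ch in enumerate(w, start=1):
            if ch == letter:
                counts[pos] += 1
    return counts

# Made-up sample words, not an actual transcription.
sample = ["cbhy", "chol", "ochey", "shol"]
h_positions = position_counts("h", sample)
lowest = min(h_positions)  # lowest position at which h was seen
```

If the lowest position is 2 or more, the letter never opens a word in the sample.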

Dendrograms on letters which are not start letters

In fact, any letter that is not an Apos (start letter) can be ignored, because all parts of that dendrogram will occur in the dendrograms of the other letters. But they might give us a quick insight into the structure of the tokens for that letter!

Any letter shown here in descending order of usage (c o q y d l t s k p a r e f i x g h m v z) can be the start letter of a word, except b and n.
Note: only words of two letters or longer are taken into account here, not “words” of 1 letter.
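
Such a ranking of start letters can be reproduced with a simple count. A minimal sketch, assuming a flat list of words; the function name and the sample words are hypothetical, and one-letter “words” are skipped as described in the note:

```python
from collections import Counter

def start_letter_usage(words, min_len=2):
    """Return (letter, count) pairs for start letters, most used first."""
    counts = Counter(w[0] for w in words if len(w) >= min_len)
    # Descending by count; ties are broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Made-up sample words, not an actual transcription.
sample = ["chol", "cbhy", "ol", "okar", "chey", "dal", "o", "y"]
ranking = start_letter_usage(sample)
letters_in_order = " ".join(letter for letter, _ in ranking)
```

Any letter of the alphabet that is missing from the ranking never starts a word of two letters or longer in the sample.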

This word should probably have been dal. Look three lines below, just one word to the left. There you see three 8’s (VMS d’s) beneath each other. Compare the last two, with the stroke of the pen to the bottom, with this g in gal.

The character g is tilted to the right and does not really resemble a g, nor does it resemble any other letter. Perhaps the writer intended to write cSh?

It is difficult to say that the letter g does not exist as a start letter, because the evidence is not decisive.
However, in many cases it is very doubtful that the character g was intended. In most cases it is probably the letter d.

The letter g occurs only about 63 times in the total text, of which 52 times as the last letter of a word.
Let’s ignore this letter as a start letter in the token research.
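
Counts like these (total occurrences versus occurrences as first or last letter) can be gathered as follows. This is a sketch under assumptions: letter_profile is a hypothetical helper, and the sample words are made up and do not reproduce the figures above.

```python
def letter_profile(letter, words):
    """Count total occurrences of `letter` and how often it opens or closes a word."""
    total = sum(w.count(letter) for w in words)
    as_first = sum(1 for w in words if w.startswith(letter))
    as_last = sum(1 for w in words if w.endswith(letter))
    return {"total": total, "first": as_first, "last": as_last}

# Made-up sample words, not an actual transcription.
sample = ["dag", "gal", "olg", "dal"]
g_profile = letter_profile("g", sample)
```

A letter whose "last" count dominates its "total", as with g, behaves mostly as a word-final character.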

The letter x

As the start of a word, this letter can be found in:

xar xor xol xsl xdar xoiin xoltedy xaloeees xasacbhe

xar (2x: f112r and f55r); also 6 times as part of a word (f111r, f66r, f112r…)