The offset for the final word in each group points to the next word in
the following group. The offset for the final word and soundslike
group in the dictionary is 0.

There is some provisions for additional info to be stored with the word
but for simplicity, it's left out here. If soundslike data is not used
then the soundslike block it not used.

This format makes it easy to iterate over the data without using the
hash table.

Each soundslike group can be a maximum of about 256 bytes. If this
limit is reached then the soundslike group is split. Using 2 bytes for
the soundslike offset would of solved this problem however 256 bytes
is normally sufficient, thus I would of wasted some space by using an
extra byte. More importantly, Using 2 bytes means I would of had to
worry about alignment issues.

The soundslike groups are sorted in more or less alphabetic order.

The hash table is a simple open address hash table. The key is the
dictionary word in all lowercase form with all accents removed (what is
known as the "clean" form of the word). The value stored in the table
is a 32-bit offset to the beginning of the word. A 32-bit integer
offset is used rather than a pointer so that the compiled dictionary
can be mmaped to make loading the dictionary very fast and so that the
memory can be shared between processed, and on 64 bit platforms using
pointers would have doubled the size of the hash table.