9/30/2011

Finally for completeness, some of the matchers from Tornado in FreeArc. These are basically all standard "cache table" style
matchers, originally due to LZRW, made popular by LZP and LZO. The various Tornado settings select different amounts of hash rows
and ways.

As they should, they run in near-constant time per byte, which goes up pretty steadily from Tornado -3 to -7, because there's a constant number of hash probes per match attempt.

There may be something wrong with my Tornado wrapper as the -3 matcher actually finds the longest total length. I dunno.
The speeds look reasonable. I don't really care much about these approximate matchers because the loss is hard to
quantify, so there you go (normally when I see an anomaly like that I would investigate it to make sure I understand
why it's happening).

Conclusion : I've got to get off string matching so this is probably the end of posts on this topic.

MMC looks promising but has some flaws. There are some cases that trigger a slowness spike in it. Also it
has some bad O(N^2) behavior with unbounded match length ("MMC2"), so I have to run it with a limit ("MMC1") which removes
some of its advantage over LzFind and Hash1 and other approximate matchers. (without the limit it has the
advantage of being exact). It's also GPL at the moment which is a killer.

LzFind doesn't have anything going for it really.

For approximate/small-window matching I don't see any reason to not use the classic Zip hash chain method. I tried a few variants
of this, like doing a hash chain to match the first 4 bytes and then link listing off that, and all the variants were worse than
the classic way.

For large window / exact matching / optimal parsing, a correct O(N) matcher is the way to go. The suffix-array based matcher is by
far the easiest for your kids to implement at home.

Okay, finally on to greedy parsing. Note with greedy parsing the average match length per byte is always <= 1.0
(it's actually the % of bytes matched in this case).

Two charts for each : the first is clocks per byte, the second is average match length. Note that Suffix5 is just for reference and
is neither windowed nor greedy.

[charts : clocks per byte and average match length, for window_bits = 16, 17, and 18]

Commentary :

Okay, finally MMC beats Suffix Trie and LzFind, this is what it's good at. Both MMC and LzFind get slower as the
window gets larger. Surprisingly, the good old Zip-style Hash1 is significantly faster and finds almost all the matches
on these files. (note that LzFind1 and Hash1 both have search limits but MMC does not)

Still doing "optimal" (non-greedy) parsing, but now let's move on to windowed & non-exact matching.

Windowed, possibly approximate matching.

Note : I will include the Suffix matchers for reference, but they are not windowed.

[16 bit window : charts of clocks per byte and average match length]

This is what LzFind is designed for and it's okay at it. It does crap out pretty badly on the rather degenerate "particles.max" file, and it also fails to find a
lot of matches. (LzFind1 has a maximum match length of 256 and a maximum of 32 search steps, which are the defaults in the LZMA code; LzFind2, which we saw before,
has those limits removed, and would DNF on many of these files).

"lztest" is not a stress test set, it's stuff I've gathered that I think is roughly reflective of what games
actually compress. It's interesting that this data set causes lots of DNF's (did not finish) for MMC and LzFind.

Suffix5 (the real suffix trie) is generally slightly faster than the suffix array. It should be, of course, if I
didn't do a bonehead trie implementation, since the suffix array method basically builds a trie in the sort, then
reads it out to sorted indexes, and then I convert the sorted indexes back to match lengths.

I won't be showing results on CCC for the most part because it's not very reflective of real world modern data, but I wanted
to run on a set where MMC and LzFind don't DNF too much to compare their speed when they do succeed. Suffix Trie is
almost always very close to the fastest except on paper4 & paper5 which are very small files.

Test_Hash3 : uses cblib::hash_table (a reprobing hash table, aka "open addressing" or "closed hashing"; I prefer the term
reprobing) to hash the first 4 bytes, then a linked list. I was surprised to find that this is almost the same speed as Hash1 and
sometimes faster, even though it's a totally generic template hash table (that is not particularly well suited to this
usage).

Conclusion : SuffixArray2 and Suffix5 both actually work and are correct with no blowup cases.

SuffixArray1 looks good on the majority of files (and is slightly faster than SuffixArray2 on those files), but "stress_suffix_forward"
clearly calls it out and shows the breakdown case.

Suffix2 almost works except on the degenerate tests, due to failure to get some details of follows quite right (see here).

Suffix3 just shows that a Suffix Trie without follows is some foolishness.

We won't show SuffixArray1 or Suffix2 or Suffix3 again.

MMC2 and LzFind2 both have bad failure cases. Both are simply not usable if you want to find the longest match at every byte. We will
revisit them later in other usages though and see that they are good for what they're designed for.

I've not included any of the hash chain type matchers in this test because they all obviously crap their pants in this scenario.

I was hoping to make some charts and graphs, but it's just not that interesting. Anyhoo, let's get into it.

What am I testing? String matching for an LZ-type compressor. Matches must start before current pos but can run past current pos.
I'm string matching only, not compressing. I'm counting the total time and total length of matches found.

I'm testing match length >= 4. Matches of length 2 & 3 can be found trivially by table lookup (though on small files this is not a good
way to do it). Most of the matchers can handle arbitrary min lengths, but this is just easier/fairer for comparison.
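
For example, the length-2 lookup really is just a single table access; a minimal sketch (the names and the 16-bit packing are my own illustration, and length 3 is the same idea with a [1<<24] table):

```cpp
#include <vector>

// Sketch : find the most recent length-2 match with a direct table lookup.
// lastPos2[v] holds the last position whose first two bytes pack into v.
// Caller must ensure pos+1 is a valid index.
struct Len2Matcher {
    std::vector<int> lastPos2;
    Len2Matcher() : lastPos2(1 << 16, -1) {}
    // returns the most recent earlier position matching 2 bytes at pos, or -1
    int findAndInsert(const unsigned char* buf, int pos) {
        unsigned v = (buf[pos] << 8) | buf[pos + 1];
        int match = lastPos2[v];
        lastPos2[v] = pos;
        return match;
    }
};
```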

I'm testing both "greedy" (when you find a match step ahead its length) and "optimal" (find matches at every position). Some matchers
like the suffix tree ones don't really support greedy parsing, since they have to do all the work at every position even if you don't
want the match there.

I'm testing windowed and non-windowed matchers.

I'm testing approximate and non-approximate (exact) matchers. Exact matchers find all matches possible, approximate matchers find some
amount less. I'm not sure the best way to show the approximation vs. speed trade off. I guess you want a "pareto frontier" type of graph,
but what should the axes be?

Also, while I'm at it, god damn it!

MAKE YOUR CODE FREE PEOPLE !!

(and GPL is not free). And some complicated personal license is a pain in the ass. I used to do this
myself, I know it's tempting. Don't fucking do it. If you post code just make it 100% free for all uses.
BSD license is an okay choice.

Matchers I'm having trouble with :

Tornado matchers from FreeArc - seem to be GPL (?)
MMC - GPL

LzFind from 7zip appears to be public domain. divsufsort is free. Larsson's slide is free.

A very common missed optimization is letting the OS zero large chunks of memory for you.

Everybody just writes code like this :

U32 * bigTable = (U32 *) malloc(20<<20);
memset(bigTable,0,20<<20);

but that's a huge waste. (eg. for large hash table on small files the memset can dominate your time).

Behind your back, the operating system is actually running a low-priority thread all the time (the zero page thread in the
System process) which grabs free pages and writes them with zero bytes and puts them on the zero'ed page list.

When you call VirtualAlloc, it just grabs a page from the zeroed page list and hands it to you.
(if there are none available it zeroes it immediately).

!!! Memory you get back from VirtualAlloc is always already zeroed ; you don't need to memset it !!!

The OS does this for security, so you can never see some other app's bytes, but you can also use it to get
zero'ed tables quickly.

(I'm not sure if any stdlib has a fast path to this for "calloc" ; if so that might be a reason to prefer that
to malloc/memset; in any case it's safer just to talk to the OS directly).
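
A hedged sketch of the calloc path : calloc *may* satisfy a large request with fresh OS pages that are already zero, skipping the explicit memset (whether it actually does depends on your CRT; on Windows, calling VirtualAlloc with MEM_COMMIT directly is the sure thing, since committed pages are always zeroed):

```cpp
#include <cstdlib>

// Sketch : prefer calloc (or the OS allocator directly) over malloc+memset
// for large zeroed tables. A large calloc can be backed by fresh zeroed OS
// pages, so no memset pass over the memory is needed. Whether this fast
// path is taken is CRT-dependent (an assumption, not a guarantee).
unsigned* allocZeroedTable(size_t count) {
    return (unsigned*)calloc(count, sizeof(unsigned));
}
```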

ADDENDUM : BTW to be fair none of my string matchers do this, because other people's don't and I don't want
to win from cheap technicalities like that. But all string match hash tables should use this.

9/29/2011

Some subtle things that it took me a few days to track down. Writing for my reference.

1. Follows updates should be a bit "lazy". With path compression you aren't making all the nodes on a suffix. So when you
match at length 5, the follow at length 4 might not exist. (I made a
small note on the consequences of this previously.)
Even if the correct follow node doesn't exist, you should still link in to the next longest follow node possible (eg. length 3 if a 4 doesn't exist).
Later on the correct follow might get made, and then if possible you want to update it. So you should consider the follows links to be
constantly under lazy update; just because a follow link exists it might not be the right one, so you may want to update it.

eg. say you match 4 bytes of suffix [abcd](ef..) at the current spot. You want to follow to [bcd] but there is no length 3 node of that suffix currently.
Instead you follow to [bc] (the next best follow available) , one of whose children is [dxy], you now split the [dxy] to [d][xy] and add [ef] under [d]. You can
then update the follow from the previous node ([abcd]) to point at the new [bc][d] node.

2. It appears that you only need to update one follow per byte to get O(N). I don't see that this is obvious from a theoretical standpoint,
but all my tests pass. Say you trace down a long suffix. You may encounter several nodes that don't have fully up to date follow pointers.
You do not have to track them all and update them all at the next byte. It seems you can just update the deepest one (not the deepest node,
but the deepest node that needs an update). (*)

3. Even if your follow is not up to date, you can still use the guaranteed (lastml-1) match len to good advantage. This was a big one that I
missed. Say you match 4096 bytes and you take the follow pointer, and it takes you to a node of depth 10. You've lost a lot of depth -
you know you must match at least 4095 bytes and you only have 10 of them. But you still have an advantage. You can descend the tree and
skip all string compares up to 4095 bytes. In particular, when you get to a leaf you can immediately jump to matching 4095 of the leaf pointer.

4. Handling of EOF in suffix algorithms is annoying; it needs to act like a value outside the [0,255] range. The most annoying case is
when you have a degenerate suffix like aaaa...aaaEOF , because the "follow" for that suffix might be itself (eg. what follows aaa... is aa..)
depending on how you handle EOF. This can only happen with the degenerate RLE case so just special casing the RLE-to-EOF case avoids some
pain.

(* = #2 is the thing I have the least confidence in; I wonder if there could be a case where the single node update doesn't work, or if
maybe you could get non-O(N) behavior unless you have a more clever/careful update node selection algorithm)

9/28/2011

A suffix sorter (such as the excellent divsufsort by Yuta Mori) provides a list of the suffix positions
in an array in sorted order. Eg. sortedSuffixes[i] is the ith suffix in order.

You can easily invert this table to make sortLookup such that sortLookup[ sortedSuffix[i] ] == i . eg.
sortLookup[i] is the sort order for position i.

Now at this point, for each suffix sort position i, you know that the longest match with another suffix is
either at i-1 or i+1.

Next we need the neighboring pair match lengths for the suffix sort. This can be done in O(N) as
previously described here. So we now have a sortSameLen[] array such that sortSameLen[i] tells
you the match length between (sorted order) elements i and i+1.

Using just these you can find all the match lengths for any suffix in the array thusly :

For a suffix starting at index pos :
    Find its sort order : sortIndex = sortLookup[pos]
    In each direction (+1 and -1) :
        current_match_len = infinite
        step to the next sort index
        current_match_len = MIN(current_match_len, sortSameLen[sort index])
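
A sketch of this lookup in code, using a naive O(N^2 log N) suffix sort in place of divsufsort and Kasai's algorithm for the O(N) adjacent-pair lengths (the SuffixMatcher name is my own; this is just the basic machinery, without the preceding-index handling):

```cpp
#include <algorithm>
#include <climits>
#include <string>
#include <vector>

struct SuffixMatcher {
    std::string s;
    std::vector<int> sortedSuffix, sortLookup, sortSameLen;
    explicit SuffixMatcher(const std::string& str) : s(str) {
        int n = (int)s.size();
        // naive suffix sort (divsufsort stand-in)
        sortedSuffix.resize(n);
        for (int i = 0; i < n; i++) sortedSuffix[i] = i;
        std::sort(sortedSuffix.begin(), sortedSuffix.end(),
            [&](int a, int b) { return s.compare(a, n, s, b, n) < 0; });
        // invert : sortLookup[pos] = sort order of the suffix at pos
        sortLookup.resize(n);
        for (int i = 0; i < n; i++) sortLookup[sortedSuffix[i]] = i;
        // Kasai : match length between sorted neighbors i and i+1, in O(N)
        sortSameLen.assign(n, 0);
        int h = 0;
        for (int i = 0; i < n; i++) {
            if (sortLookup[i] + 1 < n) {
                int j = sortedSuffix[sortLookup[i] + 1];
                while (i + h < n && j + h < n && s[i + h] == s[j + h]) h++;
                sortSameLen[sortLookup[i]] = h;
                if (h > 0) h--;
            } else h = 0;
        }
    }
    // longest match between the suffixes at pos and otherPos (pos != otherPos) :
    // MIN over the adjacent-pair lengths between their sort positions
    int matchLen(int pos, int otherPos) const {
        int a = sortLookup[pos], b = sortLookup[otherPos];
        if (a > b) std::swap(a, b);
        int ml = INT_MAX;
        for (int i = a; i < b; i++) ml = std::min(ml, sortSameLen[i]);
        return ml;
    }
};
```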

When matching strings for LZ and such, we don't want the longest match in the array, we want the longest
match that occurs earlier. Handled naively this ruins the great O() performance of suffix array string matching.
But you can do better.

Run the algorithm "Next Index with Lower Value" on the sortedSuffix[] array. This provides an array
nextSuffixPreceding[]. This is exactly what you need - it provides the next closest suffix with a
preceding index.

Now instead of the longest match being at +1 and -1, the longest match is at nextSuffixPreceding[i] and
priorSuffixPreceding[i].

There's one last problem - if my current suffix is at position pos, and I look up si = sortIndex[pos] and
from that nextSuffixPreceding[si] - I need to walk up to that position one by one doing MIN() on the
adjacent pair match lengths (sortSameLen). That ruins my O() win.

But there's a solution - simply build the match length as well when you run "next index with lower value".
This can be done easily by tracking the match length back to the preceding "fence". This adds no complexity
to the algorithm.

That is, the match length lookup is a very simple O(1) per position (or O(N) for all positions).

One minor annoyance remains, which is that the suffix array string searcher does not provide the lowest
offset for a given length of match. It gives you the closest in suffix order, which is not what you want.

For each i, find the next entry j (j > i) such that the value is lower (A[j] < A[i]).

Fill out B[i] = j for all i.

For array size N this can be done in O(N).

Here's how :

I'll call this algorithm "stack of fences". Walk the array A[] from start to finish in one pass.

At i, if the next entry (A[i+1]) is lower than the current (A[i]) then you have the ordering you want
immediately and you just assign B[i] = i+1.

If not, then you have a "fence", a value A[i] which is seeking a lower value. You don't go looking for
it immediately, instead you just set the current fence_value to A[i] and move on via i++.

At each position you visit when you have a fence, you check if the current A[i] < fence_value ? If so,
you set B[fence_pos] = i ; you have found the successor to that fence.

If you have a fence and find another value which needs to be a fence (because it's lower than its successor)
you push the previous fence on a stack, and set the current one as the active fence.
Then when you find a value that satisfies the new fence, you pop
off the fence stack and also check that fence to see if it was satisfied as well.
This stack can be stored in place in the B[] array, because the B[] is not yet
filled out for positions that are fences.
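
A sketch of the stack-of-fences pass (B[i] = next j > i with A[j] < A[i]; I use n as the "no lower value" sentinel, my own choice, and thread the pending-fence stack through the unfilled B[] slots as described):

```cpp
#include <vector>

// "Stack of fences" : one O(N) pass. Every position becomes the active
// fence; a later lower value satisfies it (and possibly older fences too).
// The stack of pending fences lives in the not-yet-filled B[] slots.
std::vector<int> nextIndexWithLowerValue(const std::vector<int>& A) {
    int n = (int)A.size();
    std::vector<int> B(n, n);
    int topFence = -1; // most recent unsatisfied fence position
    for (int i = 0; i < n; i++) {
        // A[i] may satisfy any number of pending fences : pop them
        while (topFence >= 0 && A[i] < A[topFence]) {
            int prev = B[topFence]; // this slot held the next fence down
            B[topFence] = i;        // found the fence's successor
            topFence = prev;
        }
        // i becomes the new active fence; stash the old top in its B[] slot
        B[i] = topFence;
        topFence = i;
    }
    // fences never satisfied have no lower value to their right
    while (topFence >= 0) { int prev = B[topFence]; B[topFence] = n; topFence = prev; }
    return B;
}
```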

9/27/2011

There are three general classes of how string matchers respond to a case like "twobooks" :

1. No problemo. Time per byte is roughly constant no matter what you throw at it (for both greedy and
non-greedy parsing).
This class is basically only made up of matchers that have a correct "follows" implementation.

2. Okay with greedy parsing. This class craps out in some kind of O(N^2) way if you ask them to match at
every position, but if you let them do greedy matching they are okay. This class does not have a correct
"follows" implementation, but does otherwise avoid O(N^2) behavior. For example MMC seems to fall into this
class, as does a suffix tree without "follows".

Any matcher with a small constant number of maximum compares
can fall into this performance class, but at the cost of an unknown amount of match quality.

3. Craps out even with greedy parsing. This class fails to avoid the O(N^2) trap that happens when you have a long
match and also many ways to make it. For example simple hash chains without an "amortize" limit fall in this
class. (with non-greedy parsing they are O(N^3) on degenerate cases like a file that's all the same char).

Two other interesting stress tests I'm using are :

Inspired by ryg, "stress_suffix_forward" :

4k of aaaaa...
then paper1
then 64k of aaaa...

obviously when you first reach the second part of "aaaa..." you need to find the beginning of the file,
but a naive suffix sort will have to look through 64k of following a's before it finds it.

Another useful one to check on the "amortize" behavior is "stress_search_limit" :

book1
then, 1000 times :
128 random bytes
the first 128 bytes of book1
book1 again

obviously when you encounter all of book1 for the second time, you should match the whole book1 at the
head of the file, but matchers which use some kind of simple search limit (eg. amortized hashing)
will see the 128 byte matches first and may never get back to the really long one.

(their paper also contains a proof of O(N)'ness , though it is obvious if you think about it a bit; see
comments on previous post about this).

Doing Judy-ish stuff for a suffix tree is exactly analogous to the "introspective" stuff that's done in
good suffix array sorters like divsufsort.

By Judy-ish I mean using a variety of tree structures and selecting one for the local area based on its
properties. (eg. nodes with > 100 children switch to just using a radix array of 256 direct links to kids).

Suffix tries are annoying because it's easy to slide the head (adding nodes) but hard to slide the tail
(removing nodes). Suffix arrays are even worse in that they don't slide at all.

The normal way to adapt suffix arrays to LZ string matching is just to use chunks of arrays (possibly
a power-of-2 cascade). There are two problems I haven't found a good solution to. One is how to look up
a string in the chunk that it is not a member of (eg. a chunk that's behind you). The other is how
to deal with offsets that are in front of you.

If you just put your whole file in one suffix array, I believe that is unboundedly bad. If you were
allowed to match forwards, then finding the best match would be O(1) - you only have to look at the
two slots before you and after you in the sort order. But since we can't match forward, you have to
scan. The pseudocode is like this :

do both forward and backward :
    start at the sort position of the string I want to match
    walk to the next closest in sort order (this is an O(1) table lookup)
    if it's a legal match (eg. behind me) - I'm done, it's the best
    if not, keep walking

the problem is the walk is unbounded. When you are somewhere early in the array, there can be an arbitrary
number (by which I mean O(N)) of invalid matches between you and your best match in the sort order.

Other than these difficulties, suffix arrays provide a much simpler way of getting the advantages of suffix
tries.

Suffix arrays also have implementation advantages. Because you separate the suffix string work from the rest of
your coder it makes it easier to optimize each one in isolation, you get better cache use and better register
allocation. Also, the suffix array can use more memory during the sort, or use scratch space, while a trie has
to hold its structure around all the time. For example some suffix sorts will do things like use a 2-byte radix
in parts of the sort where that makes sense (and then they can get rid of it and use it on another part of the
sort), and that's usually impossible for a tree that you're holding in
memory as you scan.

9/25/2011

This might be a series until I get angry at myself and move on to more important todos.

Some notes :

1. All LZ string matchers have to deal with this annoying problem of small files vs. large ones (and
small windows vs. large windows). You really want very different solutions, or at least different
tweaks. For example, the size of the accelerating hash needs to be tuned for the size of the data, or you
can spend all your time initializing a 24 bit hash to find matches in a 10 byte file.

2. A common trivial case degeneracy is runs of the same character. You can of course add special case
handling of this to any string matcher. It does help a lot on benchmarks of course, because this case is
common, but it doesn't help your worst case in theory because there are still bad degenerate cases.
It's just very rare to have long degenerate matches that aren't simple runs.

One easy way to do this is to special case just matches that start with a degenerate char. Have a special
index of [256] slots which correspond to starting with >= 4 of that char.
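
A minimal sketch of that [256]-slot index (the names are my own illustration; a real matcher would also track run lengths so it can extend matches cheaply):

```cpp
// Sketch : one slot per byte value, holding the most recent position that
// begins a run of >= 4 of that byte. Matches starting in a run are then a
// single lookup instead of a degenerate tree/chain walk.
struct RunIndex {
    int slot[256];
    RunIndex() { for (int i = 0; i < 256; i++) slot[i] = -1; }
    bool startsRun4(const unsigned char* buf, int pos, int size) const {
        return pos + 3 < size && buf[pos] == buf[pos + 1] &&
               buf[pos] == buf[pos + 2] && buf[pos] == buf[pos + 3];
    }
    // returns the previous run-start position for this byte (or -1), then updates
    int findAndInsert(const unsigned char* buf, int pos, int size) {
        if (!startsRun4(buf, pos, size)) return -1;
        int prev = slot[buf[pos]];
        slot[buf[pos]] = pos;
        return prev;
    }
};
```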

3. A general topic that I've never seen explored well is the idea of approximate string matching.

Almost every LZ string matcher is approximate; they consider less than the full set of matches. Long ago
someone referred to this as "amortized hashing", which refers to the specific implementation of a hash chain
(hash -> linked list) in which you simply stop searching after visiting some # of links. (amortize = minimize
the damage from the worst case).

Another common form of approximate string searching is to use "cache tables" (that is, hash tables with
overwrites). Many people use cache tables with a few "ways".
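
For concreteness, a sketch of a cache table with ways (the row count, way count, and round-robin replacement here are arbitrary illustration choices; real implementations vary):

```cpp
#include <vector>

// Sketch : hash the first 4 bytes to a row of NUM_WAYS candidate positions.
// Insertion overwrites round-robin, so older candidates are simply lost -
// which is exactly what makes this matcher approximate.
struct CacheTable {
    static const int ROW_BITS = 12, NUM_WAYS = 4;
    std::vector<int> rows;
    std::vector<unsigned char> next; // per-row round-robin cursor
    CacheTable() : rows((1 << ROW_BITS) * NUM_WAYS, -1), next(1 << ROW_BITS, 0) {}
    static unsigned hash4(const unsigned char* p) {
        unsigned x = p[0] | (p[1] << 8) | (p[2] << 16) | ((unsigned)p[3] << 24);
        return (x * 2654435761u) >> (32 - ROW_BITS); // Knuth multiplicative hash
    }
    // return the best (longest) match among the ways, then insert pos
    int findAndInsert(const unsigned char* buf, int pos, int size, int* pMatchLen) {
        unsigned h = hash4(buf + pos);
        int best = -1, bestLen = 0;
        for (int w = 0; w < NUM_WAYS; w++) {
            int cand = rows[h * NUM_WAYS + w];
            if (cand < 0) continue;
            int len = 0;
            while (pos + len < size && buf[cand + len] == buf[pos + len]) len++;
            if (len > bestLen) { bestLen = len; best = cand; }
        }
        rows[h * NUM_WAYS + next[h]] = pos;
        next[h] = (unsigned char)((next[h] + 1) % NUM_WAYS);
        *pMatchLen = bestLen;
        return best;
    }
};
```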

The problem with both these approaches is that the penalty is *unbounded*. The approximate match can be
arbitrarily worse than the best match. That sucks.

What would be ideal is some kind of tuneable and boundable approximate string match. You want to set some
amount of loss you can tolerate, and get more speedup for more loss.

(there are such data structures for spatial search, for example; there are nice approximate-nearest-neighbors
and high-dimensional-kd-trees and things like that which let you set the amount of slop you tolerate, and
you get more speedup for more slop. So far as I know there is nothing comparable for strings).

Anyhoo, the result is that algorithms with approximations can look very good in some tests, because they find
99% of the match length but do so much faster. But then on another test they suddenly fail to find even 50%
of the match length.

With path compression, only the first character is used for selecting between siblings, but then you may need to step multiple characters to get to the next
branch point.

(BTW I just thought of an interesting alternative way to do suffix tries in a b-tree/judy kind of way. Make your node always have 256 slots.
Instead of always matching the first character to find your child, match N. That way for sparse parts of the tree N will be large and you
will have many levels of the tree in one 256-slot chunk. In dense parts of the tree N becomes small, down to 1, in which case you get a
radix array). Anyhoo..

So there are substrings that don't correspond to any specific node. For example "abx" is between "ab" and "abxy" which have definite spots in
the tree. If you want to add "abxr" you have to first break the "xy" and then add the new node.

Okay, this is all trivial and just tree management, but there's something interesting about it :

If you have a "follow" pointer and the length you want does not correspond to a specific node (ie it's one of those between lengths), then
there can be no longer match possible.

So, you had a previous match of length "lastml". You step to the next position, you know the best match is at least >= lastml-1. You use
a follow pointer to jump into the tree and find the node for the following suffix. You see that the node does not have length "lastml-1",
but some other length. You are done! No more tree walking is needed, you know the best match length is simply lastml-1.

Why is this? Consider if there was a longer match possible. Let's say our string was "sabcdt..." at the last position we matched 5 ("sabcd"). So we
now have "abcdt..." and know match is >= 4. We look up the follow node for "abcd" and find there is no length=4 node in the tree. That
means that the only path in the tree had "dt" in it - there has been no character other than "t" after "d" or there would be a branching node
there. But I know that I cannot match "t" because if I did then the previous match would have been longer. Therefore there is no longer
match possible.

This turns out to be very common. I'm sure if I actually spent a month or so on suffix tries I would learn lots of useful properties (there
are lots of papers on this topic).

To make terminology clear I'm going to use "trie" to mean a tree in which as you descend the length of character match always gets longer, and "suffix trie" to indicate the special
case where a trie is made from all suffixes *and* there are "follow" pointers (more on this later).

Just building a trie for LZ string searching is pretty easy. Using the linked-list method (which certainly has disadvantages), internal nodes only need a child & sibling
pointer, and some bit of data. If you always descend one char at a time that data is just one char. If you want to do "path compression" (multi-char steps in a single link)
you need some kind of pointer + length.

(it's actually much easier to write the code with path compression, since when you add a new string you only have to find the
deepest match in the tree then add one node; with single char steps you may have to add many nodes).

So for a file of length N, internal nodes are something like 10 bytes, and you need at most N nodes. Leaves can be smaller or even implicit.
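
A sketch of that node layout with single-char steps (no path compression, so this is the easy-but-naive variant; the names and the lastPos field for recovering match positions are my own illustration):

```cpp
#include <string>
#include <vector>

// Sketch : each node has a child link, a sibling link, one char, and the
// most recent position that passed through it. findAndInsert walks as deep
// as it can, reports the depth reached and the most recent earlier position
// sharing that path, then adds nodes for the rest of the string.
struct TrieNode { int child, sibling, lastPos; unsigned char c; };
struct Trie {
    std::vector<TrieNode> nodes; // node 0 is the root
    Trie() { nodes.push_back({-1, -1, -1, 0}); }
    // returns match length; *pMatchPos = most recent pos sharing that prefix
    int findAndInsert(const std::string& buf, int pos, int* pMatchPos) {
        int cur = 0, len = 0, matchPos = -1;
        while (pos + len < (int)buf.size()) {
            unsigned char c = (unsigned char)buf[pos + len];
            int n = nodes[cur].child;
            while (n >= 0 && nodes[n].c != c) n = nodes[n].sibling;
            if (n < 0) { // no child for c : add the rest of the string
                for (int i = pos + len; i < (int)buf.size(); i++) {
                    nodes.push_back({-1, nodes[cur].child, pos,
                                     (unsigned char)buf[i]});
                    nodes[cur].child = (int)nodes.size() - 1;
                    cur = (int)nodes.size() - 1;
                }
                break;
            }
            matchPos = nodes[n].lastPos; // deepest shared node's last visitor
            nodes[n].lastPos = pos;      // this suffix is now the most recent
            cur = n; len++;
        }
        *pMatchPos = matchPos;
        return len;
    }
};
```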

With just a normal trie, you have a nice advantage for optimal parsing, which is that when you find the longest match, you also automatically walk past all shorter matches.
At each node you could store the most recent position that that substring was seen, so you can find the lowest offset for each length of match for free. (this requires more
storage in the nodes plus a lot more memory writes, but I think those memory writes are basically free since they are to nodes in cache anyway).

The Find and Insert operations are nearly identical so they of course should be done together.

A trie could be given a "lazy update". What you do is on Insert you just tack the nodes on somewhere low down in the tree. Then on Find, when you encounter nodes that have
not been fully inserted you pick them up and carry them with you as you descend. Whenever you take a path that your baggage can't take, you leave that baggage behind. This
could have advantages under certain usage patterns, but I haven't actually tried it.

But it's only when you get the "follow" pointers that a suffix trie really makes a huge difference.

A follow pointer is a pointer in the tree from any node (substring) to the location in the tree of the substring without the first character. That is, if you are at
"banana" in the tree, the follow pointer should point at the location of "anana" in the tree.

When you're doing LZ compression and you find a match at pos P of length L, you know that at pos P+1 there must be a match of at least length L-1 , simply by using the same
offset and matching one less character. (there could be a longer match, though). So, if you know the suffix node that was used to find the match of length L at pos P,
then you can jump in directly to match of length L-1 at the next position.

This is huge. Consider for example the fully degenerate case, a file of length N of all the same character. (yes obviously there are special case solutions to the fully
degenerate case, but that doesn't fix the problem, it just makes it more complex to create the problem). A naive string matcher is actually O(N^3) !!

For each position in the file (*N)
    Consider all potential matches (*N)
        Compare all the characters in that potential match (*N)

A normal trie makes this O(N^2) , because the comparing characters in the string is combined with finding all potential matches, so the tree descent + string compares
combined are just O(N).

But a true suffix trie with follow pointers is only O(N) for the whole parse. Somewhere early on would find a match of length O(N) and then each subsequent one just
finds a match of L-1 in O(1) time using the follow pointer. (the O(N) whole parse only works if you are just finding the longest length at each position; if you are doing
the optimal parse where you find the lowest offset for each length it's O(N^2))

Unfortunately, it seems that when you introduce the follow pointer this is when the code for the suffix trie gets rather tricky.
It goes from 50 lines of code to 500 lines of code, and it's hard to do without introducing parent pointers and lots more
tree maintenance. It also makes it way harder to do a sliding window.

(I'm playing a bit loose with the term "suffix tree" as most people do;
in fact a suffix tree is a very special construction that uses the all-suffixes property and internal pointers
to have O(N) construction time; really what I'm talking about is a radix string tree or patricia type tree).
(also I guess these trees are tries)

Some background first. You want to match strings for LZ compression. Say you decide to use a suffix tree.
At each level of the tree, you have already matched L characters of the search string;
you just look up your next character and descend that part of the tree that has that character as a prefix. eg. to
look up string str, if you've already descended to level L, you find the child for character str[L] (if it exists)
and descend into that part of the tree.
One way to implement this is to use a linked list for all the characters that have been seen at a given level
(and thus point to children at level +1).

Okay, pretty simple. This structure is not used much in data compression because we generally want sliding
windows, and removal of strings as they fall out of the sliding window is difficult.

(Larsson and others have shown that it is possible to do a true sliding suffix tree, but the complexity
has prevented use in practice; this would be a nice project if someone wants to make an actual fast
implementation of the sliding suffix trie)

Now let's look at the standard way you do a hash table for string matching in the LZ sliding window case.

The standard thing is to use a fixed size hash to a linked list of all strings that share that hash.
The linked list can just be an array of positions where that hash value last occurred. So :

pos = hashTable[h] contains the position where h last occurred
chain[pos] contains the last position before pos where that same hash h occurred

the nice thing about this is that chain[] can just be an array of the size of the sliding window, and you
modulo the lookup into it. In particular :

note that the links can point outside the sliding window (eg. either hashTable[] or chain[] may contain
values that go outside the window), but we detect those and know our walk is done. (the key aspect here
is that the links are sorted by position, so that when a link goes out of the window we are done with
the walk; this means that you can't do anything like MTF on the list because it ruins the position sort
order). Also note that there's no check for null needed because we can just initialize the hash table with a
negative value so that null is just a position outside the window.

To add to the hash table when we slide the window we just tack onto the list : chain[pos & window_mask] = hashTable[h]; then hashTable[h] = pos;

and there's the sort of magic bit - we also removed a node right there. We actually popped the node off
the back of the sliding window. That was okay because it must have been the last node on its list, so
we didn't corrupt any of our lists.

That's it for hash-chain review. It's really nice how simple the add/remove is, particularly for "Greedy"
type LZ parsers where you do Insert much more often than you do Find. (there are two general classes of
LZ parsers - "Optimal" which generally do a Find & Insert at every position, and "Greedy" which when they find
a match, step ahead by the match len and only do Inserts).
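
The whole scheme just reviewed can be sketched like this (the hash width, window size, and names are arbitrary illustration choices):

```cpp
#include <string>
#include <vector>

// Sketch of the classic hash-chain matcher : hashTable[h] holds the last
// position with hash h, chain[pos & mask] the one before it. Links are
// sorted by position, so the walk stops at the first link out of the window.
struct HashChain {
    static const int HASH_BITS = 15, WINDOW_BITS = 16;
    std::vector<int> hashTable, chain;
    HashChain() : hashTable(1 << HASH_BITS, -1), chain(1 << WINDOW_BITS, -1) {}
    static unsigned hash3(const std::string& b, int pos) {
        unsigned x = (unsigned char)b[pos] | ((unsigned char)b[pos + 1] << 8)
                   | ((unsigned char)b[pos + 2] << 16);
        return (x * 2654435761u) >> (32 - HASH_BITS);
    }
    void insert(const std::string& buf, int pos) {
        unsigned h = hash3(buf, pos);
        chain[pos & ((1 << WINDOW_BITS) - 1)] = hashTable[h]; // tack onto list
        hashTable[h] = pos;
    }
    // longest match at pos against positions still inside the window
    int find(const std::string& buf, int pos, int* pMatchPos) const {
        int windowLow = pos - (1 << WINDOW_BITS);
        int bestLen = 0; *pMatchPos = -1;
        for (int m = hashTable[hash3(buf, pos)]; m >= 0 && m > windowLow;
             m = chain[m & ((1 << WINDOW_BITS) - 1)]) {
            int len = 0;
            while (pos + len < (int)buf.size() && buf[m + len] == buf[pos + len]) len++;
            if (len > bestLen) { bestLen = len; *pMatchPos = m; }
        }
        return bestLen;
    }
};
```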

So, can we get the advantages of hash chains and suffix trees?

Well, we need another idea, and that is "lazy updates". The idea is that we let our tree get out of sorts
a bit, and then fix it the next time we visit it. This is a very general idea and can be applied to almost
any tree type. I think the first time I encountered it was in the very cool old SurRender Umbra product,
where they used lazy updates of their spatial tree structures. When objects moved or spawned they got put
on a list on a node. When you descend the tree later on looking for things, if a node has child nodes you
would take the list of objects on the node and push them to the children - but then you only descend to
the child that you care about. This can save a lot of work under certain usage patterns; for example
if objects are spawning off in some part of the tree that you don't visit, they just get put in a high up
node and never pushed down to the leaves.

Anyhoo, so our suffix tree requires a node with two links. Like the hash table we will implement our links
just as positions :

struct SuffixNode { int sibling; int child; }

like the hash table, our siblings will be in order of occurrence, so when we see a position that's out of
the window we know we are done walking.

Now, instead of maintaining the suffix tree when we add a node, we're just going to tack the new node on the front
of the list. We will then percolate in an update the next time we visit that part of the tree. So when you
search the tree, you can first encounter some unmaintained nodes before you get to the maintained section.

For example, say we had "bar" and "band" in our tree, and we add "bang" at level 2 ; we just stick it on the head
of the sibling list and don't descend the tree to put it in the right place.

now the next time we visit the "ba" part of the tree in a retrieval, we also do some maintenance. We remember the
first time we see each character (using a [256] array), and if we see that same character again we know that it's
because part of the tree was not maintained.

Say we come in looking for "bank". If we see a node with an "n" (that's a maintained n) we know we are done and we go to
the child link - there can't be any more n's behind that node. If we see an "N" (no child link), we remember it but
we have to keep walking siblings. We might see more "N"s, and we are done if we see an "n". Then we update the links.
We remove the "n" (of band) from the sibling link and connect it to the "N" instead :

b
| (child links are vertical)
a
|
n-r
|
g---d

And this is the essence of MMC (lazy update suffix trie = LUST).

A few more details are significant. Like the simple hash chain, we always add nodes to the front of the list. The lazy update
also always adds nodes to the head - that is, the branch that points to more children is always at the most recent occurrence of
that substring. eg. if you see "danger" then "dank" then "danish" you know that the "dan" node is either unmaintained, or points
at the most recent occurrence of "dan" (the one in "danish"). What this means is that the simple node removal method of the hash
chain works - when the window slides, we just let nodes fall out of the range that we consider valid and they drop off the end
of the tree. We don't have to worry about those nodes being an internal node to the tree that we are removing, because they are
always the last one on a list.

In practice the MMC incremental update becomes complex because you may be updating multiple levels of the tree at once as you scan.
When you first see the "NG" you haven't seen the "n" yet and you don't want to scan ahead in the list right away, you want to process it when you see it;
so you initially promote NG to a maintained node, but link it to a temporary invalid link that points back to the previous level.
Then you keep walking the list and when you see the "n" you fix up that link to complete the maintenance.

It does appear that MMC is a novel and interesting way of doing a suffix trie for a sliding window.