Hans-Peter Diettrich <DrDiettrich1@aol.com> writes:
>nathan.mccauley@gmail.com wrote:
>
>> I'm trying to put together a list of various methods to do code generate
>> for switch statements. ...
>> Has anyone seen tries used for the string-based switch statements?
...
>[If there's a lot of strings, a patricia trie might be faster than
>hashing or binary search since it doesn't require repeated scans of
>the string during the match process. -John]

It's very doubtful that it's faster than hashing. In hashing you
traverse the string once for computing the hash function and once for
the final match. All other match attempts (and there is typically
fewer than one of those per lookup) will typically mismatch at the
first character, so no complete scan is necessary. You typically have
to fetch one cache line per match attempt (i.e., typically 1-2; if you
use external chaining, add another cache line fetch for the initial
table lookup).
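
To make the access pattern concrete, here is a minimal sketch of such a
lookup in C. It assumes open addressing with linear probing and FNV-1a
as the hash function; the names (table, insert, lookup) and the table
size are made up for illustration, not taken from any particular
compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TABLE_SIZE 16   /* hypothetical size; power of two so we can mask */

struct entry { const char *key; int value; };

static struct entry table[TABLE_SIZE];   /* key == NULL marks an empty slot */

/* FNV-1a: the one pass over the string for computing the hash. */
static uint32_t hash(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* The caller must keep at least one slot empty, or probing never ends. */
static void insert(const char *key, int value)
{
    uint32_t i = hash(key) & (TABLE_SIZE - 1);
    while (table[i].key != NULL)            /* linear probing */
        i = (i + 1) & (TABLE_SIZE - 1);
    table[i].key = key;
    table[i].value = value;
}

/* Returns the value stored for key, or -1 if key is absent. */
static int lookup(const char *key)
{
    assert(key != NULL);
    uint32_t i = hash(key) & (TABLE_SIZE - 1);
    while (table[i].key != NULL) {
        /* A non-matching entry usually differs in the first character,
           so strcmp returns early; only the final match scans fully. */
        if (strcmp(table[i].key, key) == 0)
            return table[i].value;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return -1;
}
```

With linear probing the neighbouring slots often share a cache line,
but each stored key is still reached through a pointer, so every match
attempt may touch one more line for the key itself.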

In contrast, in the trie method you have to chase a pointer for each
substring you look at; if there are a lot of strings, it's probably at
every character for the first few characters. And for each pointer
you chase, you will have to fetch a cache line.
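
For comparison, a sketch of the pointer chasing. Note that this is a
plain one-character-per-node trie, not a patricia trie (which would
collapse chains of single-child nodes); it is only meant to show the
dependent loads. The struct layout and function names are assumptions
for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    struct node *child[256];   /* one pointer per possible next byte */
    int value;                 /* -1 if no key ends at this node */
};

static struct node *new_node(void)
{
    /* Assumes all-bits-zero is a null pointer, as on mainstream platforms. */
    struct node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->value = -1;
    return n;
}

static void trie_insert(struct node *root, const char *key, int value)
{
    struct node *n = root;
    for (; *key; key++) {
        unsigned char c = (unsigned char)*key;
        if (n->child[c] == NULL)
            n->child[c] = new_node();
        n = n->child[c];
    }
    n->value = value;
}

/* Returns the value stored for key, or -1 if key is absent. */
static int trie_lookup(const struct node *root, const char *key)
{
    const struct node *n = root;
    for (; *key; key++) {
        /* Each step is a load that depends on the previous one,
           and each node may live in a different cache line. */
        n = n->child[(unsigned char)*key];
        if (n == NULL)
            return -1;
    }
    return n->value;
}
```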

Ok, if there is little competition for the D-cache, the first few
levels of the trie might end up in the D-cache. OTOH, if a few words
are looked up often, all the lines necessary for their lookup will be
in the D-cache with a hash table, too.

Overall, I guess you can find cases where the patricia trie is faster
(an application that does little but look up lots of different,
relatively short strings), but in most cases I think that hashing is
faster, assuming good implementations of each data structure.

For our switch statement, we have to consider another issue: the data
structures can be hard-coded; then at each node of the trie there will
be a conditional or an indirect branch; for the hash table we have an
indirect jump after computing the hash function, and then one or two
conditional branches afterward. Depending on how repetitive and
predictable the strings coming in are, the branch predictors may
predict very well or very badly (i.e., 50% mispredicts for conditional
branches, 100% for indirect branches).
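
As a rough illustration of the two hard-coded shapes, here is what a
compiler might emit (written back as C) for a switch over the strings
"if", "int" and "else". The hash (length plus first character, masked
to two bits) was picked by hand so that these three keys do not collide
and assumes ASCII; it and the function names are assumptions for this
sketch. Whether the outer switch becomes a jump table (an indirect
jump) or a chain of conditional branches is up to the compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hard-coded hash table: one dispatch on the hash, then one comparison. */
static int switch_hash(const char *s)
{
    size_t len = strlen(s);
    switch ((len + (unsigned char)s[0]) & 3) {
    case 3:  return strcmp(s, "if") == 0   ? 1 : -1;   /* 'i' + 2 = 107 */
    case 0:  return strcmp(s, "int") == 0  ? 2 : -1;   /* 'i' + 3 = 108 */
    case 1:  return strcmp(s, "else") == 0 ? 3 : -1;   /* 'e' + 4 = 105 */
    default: return -1;
    }
}

/* Hard-coded trie: a branch at every node along the path. */
static int switch_trie(const char *s)
{
    switch (s[0]) {
    case 'i':
        switch (s[1]) {
        case 'f': return s[2] == '\0' ? 1 : -1;
        case 'n': return (s[2] == 't' && s[3] == '\0') ? 2 : -1;
        default:  return -1;
        }
    case 'e':
        /* && short-circuits, so we never read past the terminating NUL */
        return (s[1] == 'l' && s[2] == 's' && s[3] == 'e' && s[4] == '\0')
               ? 3 : -1;
    default:
        return -1;
    }
}
```

Both functions return the case number, or -1 for the default case. The
hashed version has a fixed number of branches per lookup; in the trie
version the count grows with the depth of the path taken.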

Anyway, the more branches there are, the worse the effect of bad
predictability will be (hitting the trie harder). It may be better
not to hard-code the data structure to avoid some of the
mispredictions; in the end, though, you will probably still perform one
indirect jump (to the code of the selected case) if you don't
hard-code the data structure; I think that
hard-coding the hash table will not be worse than not hard-coding it,
but hard-coding the trie can be worse (depending on circumstances).
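
A sketch of the non-hard-coded variant: the lookup runs over a data
table, and the only indirect jump is the final call through a function
pointer. The linear search here is just a stand-in for whichever table
lookup (hash or trie) is used, and the handler names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(void);

/* Hypothetical case bodies; each returns its case number. */
static int do_if(void)      { return 1; }
static int do_int(void)     { return 2; }
static int do_else(void)    { return 3; }
static int do_default(void) { return -1; }

struct case_entry { const char *key; handler_fn handler; };

static const struct case_entry cases[] = {
    { "if", do_if }, { "int", do_int }, { "else", do_else },
};

/* Data-driven lookup; stand-in for a hash table or trie walk. */
static handler_fn find_handler(const char *s)
{
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++)
        if (strcmp(cases[i].key, s) == 0)
            return cases[i].handler;
    return do_default;
}

static int dispatch(const char *s)
{
    return find_handler(s)();   /* the one indirect jump */
}
```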