make performance, low-hanging fruit

*If you manage to rewrite make*, that would be interesting.
In the meantime, you should be aware that I did a lot of performance
tweaking in OpenBSD's make a few years ago.
Since the two source trees have diverged somewhat, most of it
will probably require some effort to adapt to NetBSD, but still:
- the magic variables in each node take a *huge* toll. Adding them to
a "generic" hash table is very, very expensive: it also means you have
to create hash entries for each and every magic variable.
"Meshing" things together helps a lot. Basically, we do two-step hashing:
compute a hash value first. If it matches the hash of a magic variable,
we do the strcmp, and on a match we look directly in the very small array
associated with the node. Only then do we look in one of the generic
hash tables.
This requires a slightly more specific abstraction for the hash tables.
You will also gain a little elsewhere, since you hash once up front and
then reuse the hash key in every context along the chain, instead of
hashing again and again.
- we use hash tables everywhere, for instance for keyword lookup.
That's faster than the binary search you currently use...
We use a hashing scheme that produces 32-bit values, and I have a small
program that figures out the smallest modulo that differentiates the keywords.
So, for instance, magic variable lookup goes something like this:
	*pk = ohash_interval(name, enamePtr);
	len = *enamePtr - name;
	/* substitute short version for long local name */
	switch (*pk % MAGICSLOTS1) {	/* MAGICSLOTS1 should be the  */
	case K_LONGALLSRC % MAGICSLOTS1:/* smallest constant yielding */
					/* distinct case values       */
		if (*pk == K_LONGALLSRC && len == strlen(LONGALLSRC) &&
		    strncmp(name, LONGALLSRC, len) == 0)
			return ALLSRC_INDEX;
		break;
	...
- lst.lib uses a dynamic allocation scheme. You can actually separate
the allocation part from the actual list use. This is huge, as it divides
the number of memory allocations by a factor of 10. Just look at targ.c.
Imagine storing the list headers directly in the new node instead of
allocating each one of them separately... apart from not allocating so
much memory, your data structures will have improved locality.
I see that somebody already applied a similar optimisation to buf.c, for
the exact same reasons.
I also noticed you already set up a fast track through the parser for
comment lines. ;-)
- we also specialized some lists into growable arrays. As I recall, this
gained a wee little bit, but not that much.
Granted, your make is different from ours, but from what I recall, those
optimizations more than doubled the speed of our make. I don't expect
anything different to happen in yours, and it's much simpler to implement
than rewriting most of the internals... :)