
The Cost Of ELF Symbol Hashing

Linux and Solaris are ELF cousins. Both systems use the same base
ELF standard, though we've both made our own OS specific extensions over
the years. Recently, the GNU linker folks made some changes to how Linux
does symbol hashing in their ELF objects. I've been learning
about what they've done, and that in turn caused me to consider the
bigger picture of ELF hashing overhead.

History and Trends

In an ELF based system, the runtime linker looks up symbols
within executables and sharable objects.
The available symbols are found in the dynamic symbol
table. The lookup is done using a pre-computed hash
table section associated with that symbol table. The SVR4 ELF ABI
defines the layout of these hash sections, and the hash function.
These are all original ELF features, dating back to the
late 1980s when ELF was designed. This aspect of ELF
has been static since that time.
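
For reference, the SVR4 hash function is small and simple. In C,
essentially as the ABI presents it:

unsigned long
elf_hash(const unsigned char *name)
{
    unsigned long h = 0, g;

    while (*name) {
        h = (h << 4) + *name++;
        if ((g = h & 0xf0000000) != 0)
            h ^= g >> 24;
        h &= ~g;
    }
    return (h);
}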

The runtime linker maintains a list of objects currently
mapped into a program's address space. To find a desired
symbol, it starts at the head of this list and searches each
one in turn using a hash lookup operation, until the symbol is found
(success) or the end of the list is hit (failure).
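
In outline, the search looks something like the following sketch. The
types and names here are hypothetical, and the real runtime linker is
considerably more involved:

#include <stddef.h>

typedef struct sym Sym;             /* a dynamic symbol table entry */

typedef struct obj {
    struct obj *next;               /* next object on the list */
    /* ... hash table, symbol table, string table ... */
} Obj;

/* One hash operation: look name up in obj's hash table. */
extern Sym *hash_lookup(Obj *obj, const char *name);

/* Search each mapped object in turn until the symbol is found
 * or the end of the list is reached. */
Sym *
lookup_symbol(Obj *head, const char *name)
{
    Obj *obj;

    for (obj = head; obj != NULL; obj = obj->next) {
        Sym *sym = hash_lookup(obj, name);

        if (sym != NULL)
            return (sym);           /* success */
    }
    return (NULL);                  /* failure: every lookup missed */
}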

The per-symbol cost of symbol lookup hashing grows with:

The number of objects in a process.

The number of symbols in those objects.

The length of the symbol names. In particular, the C++ language
encodes class names and argument types into symbol names, which makes
the strings extremely long. Even worse from a string comparison point
of view is that these strings tend to share long prefixes,
differing only in the final characters.

These items all grow over time.
In the days when ELF was originally defined, a process
had one or two sharable objects at most, and a few hundred
symbols, all of which had short names. More recently, it has become common for
a program to have
tens of sharable objects and thousands of symbols, and symbol names have grown longer.
This trend shows no sign of abating. It is
easy to imagine a near future with hundreds of objects and hundreds
of thousands of symbols. In fact, we have already seen a program with
almost 1000 objects.

In the past, when the list of objects in a program was very short,
it was not necessary to search many objects before a given symbol
was found. Most symbols were in the first, often only, object. Hence,
most hash operations were successful, and hash overhead was not a
significant concern. In modern programs, however,
failed hash operations dominate. It is usually necessary to perform
one or more failing hash operations before getting to the object that
has a desired symbol. The more objects, the larger the percentage of
failing hashes.

Unless somehow mitigated, the per-symbol cost of hashing
will continue to grow as programs grow larger, possibly to a level
where the user can feel the effect.
There are several ways in which this overhead can be
reduced:

Eliminate unnecessary symbols.

Eliminate the O(n) search of the object list to
reduce the amount of hashing required.

Make each hashing operation cost less.

Eliminate Unnecessary Symbols

Most objects contain global symbols that exist for the object's
own use, but are not intended to be accessed by outside code.
One common example would be that of a helper function called within
multiple files that are compiled into a sharable object. Such
a function needs to be global within the object so that it can
be called from multiple files. However, it is not intended to
be something the users of the library call directly.

ELF versioning allows symbols
to be assigned to versions, thereby creating interfaces that can be
maintained for backward compatibility as the object evolves.
In a version definition, the scope reduction operator can be used to
tell the link-editor that any global symbols not explicitly assigned
to a version should not be visible outside the object. For example,
the following exports the symbol foo from a version named VERSION_1.0,
while reducing the scope of all other global symbols to local:

VERSION_1.0 {
    global:
        foo;
    local:
        *;
};
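
Assuming the definition above is stored in a file named mapfile, it
would be supplied to the Solaris link-editor via the -M option when
the object is built.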

Some language compilers offer symbol visibility keywords that
have a similar effect.
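
For example, with compilers that support GCC-style visibility
attributes (Solaris compilers offer comparable linker-scoping
keywords), the helper function described earlier might be hidden
at its declaration:

/* Callable from other files in the object, but not exported. */
__attribute__((visibility("hidden")))
int helper(int value);

/* Part of the object's public interface. */
__attribute__((visibility("default")))
int foo(int value);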

Eliminating unnecessary symbols from the hash table reduces the
average length of the hash chains, and speeds the lookup.
In addition, hiding unnecessary symbols from an object's external
interface prevents accidental interposition, in which a function
that a library exports only for its own use ends up interposing on
a symbol of the same name in a different object.

Eliminate O(n) Object Searching

There are different strategies employed in modern operating systems
to minimize the need for symbol hashing:

Prelink

The Linux operating system emphasizes
its
prelink system. Prelink analyzes all the executables
on the system, and the libraries they depend on. Non-conflicting
addresses are assigned for all the libraries, and then the
executables and libraries are pre-bound to be loaded at those
addresses. In effect, the work normally done by the
runtime linker for each object at process startup is instead
done once. The runtime linker, recognizing a prelinked object
at process startup, will map it to its assigned location,
and immediately return rather than do the usual relocation
processing.

Prelinking is a complex per-system operation, though
Linux does a good job of hiding and managing this complexity.
Changes to the system can require the prelink operation
to be redone.

Prelinking pares the cost of ELF runtime processing to
the absolute minimum. As part of that, it completely
eliminates symbol hashing at startup. A further advantage
is that it does not require any changes to the objects
in question. All of the complexity is kept in the prelink
system itself.

Prelinking will not prevent hashing from occurring if:

The object is loaded via dlopen().

The object is not prelinked.

The object is prelinked for one system, but is
then used by another, either as a copy
or via a network filesystem. Prelinking is a
per-system concept. A system can
use an object prelinked for a different system,
but the benefits of prelinking may be lost.

The objects on the system have changed, altering
the prelinking needed. In this case, prelinking
can be recomputed.
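
In practice, prelinking is driven by the prelink command; for
example, prelink -a is typically used to process all of the
executables and libraries on a system, and is rerun as the
installed objects change.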

Direct Binding

The Solaris operating system employs direct binding,
preferably in conjunction with lazy loading
and lazy binding. In non-direct
binding, an object records the symbols it requires
from other objects. In direct binding, each such symbol
also records the object expected to provide that symbol.

Direct bindings were developed with several goals in mind:

They harden large programs against the effects
of unintended symbol interposition, when the wrong
library provides a given symbol due to the
order in which libraries get loaded. This makes it
easier to reliably build larger programs.

They allow multiple instances of a named
interface to be referenced by different objects,
further expanding the ability of complex programs
to interface with multiple, perhaps incompatible,
external interfaces. For instance, when a program depends
on many objects that are developed independently
by others, sometimes those objects
have dependencies on different incompatible versions
of some other object.

Performance: At runtime, the runtime linker uses
the direct binding information
to skip the O(n) search of all the objects
in the process, and instead to go directly to
the right library for each symbol, carrying out a single
successful hash operation to locate it within the object.

The first two items were the
driving issues that led to the invention of direct bindings,
reflecting real problems we were encountering in the field.

Lazy binding, which is the default, reduces the amount of
hashing done even further,
by delaying the binding operation until the first use
of a given symbol by the
program. If a given symbol is not needed during the
lifetime of a process, it is never looked up, and the hashing
that would otherwise occur is eliminated. Most programs
do not use all of the symbols available to them in a given
execution, so lazy binding amplifies the benefits of direct
binding.
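
(Lazy binding can be disabled, for testing or debugging, by setting
the LD_BIND_NOW environment variable, which forces the runtime linker
to resolve all symbol bindings at process startup.)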

Direct bindings do not eliminate hashing, but they
eliminate the unproductive failure cases, leaving a
single successful hash lookup per symbol.
In a sense, they bring us back to the performance of early
ELF systems where everything was found in a single library
and most hash operations were successful.
Direct bindings have a relatively simple implementation.
They work with dlopen(), and across network filesystems.
However, many objects do not use direct bindings.
Unlike prelink, direct bindings require explicit action
to be taken when the object is built.
Converting an existing non-direct object to use them
can require some analysis by the programmer, to preserve
any desired interposition that direct bindings would
otherwise prevent.
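
For reference, direct bindings are requested when an object is built:
the Solaris link-editor records them for all dependencies when given
the -B direct option, or selectively via -z direct for the
dependencies that follow that option on the command line.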

Prelinking and direct bindings are very different solutions that
attack the problem of hashing overhead along different axes. Systems
will see greatly reduced symbol hash
overhead with either strategy, but symbol hashing will still occur.
As such, the cost of the hash lookup operation is
still of interest.

Making Symbol Hashing Cheaper

The existing SVR4 ELF hash function and hash table sections
are fixed. Improving their performance requires introducing
a new hash function and hash section. Recently, the developers
of the GNU linkers
have done this. These new hash sections can coexist in an
object with the older SVR4 hash sections, allowing for backward
compatibility with older systems.
The GNU hash reduces hash overhead in the following ways:

The dynamic symbol table is sorted into hash order,
such that memory access tends to be adjacent and monotonically
increasing, which can help cache behavior. (Note that the Solaris
link-editor does a similar sort, although the specific details
differ.)

The dynamic symbol table contains some
symbols that are never looked up via the hash table.
These symbols are left out of
the hash table, reducing its size and hash chain lengths.

Perhaps most significantly, the GNU hash section
includes a
Bloom filter.
This filter is consulted prior to hash lookup to determine whether the
symbol can possibly be found in the object.
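
The hash function used with these new sections also differs from the
SVR4 original. It is the well-known DJB-style string hash, which is
cheap to compute and distributes typical symbol names well:

#include <stdint.h>

uint32_t
gnu_hash(const char *name)
{
    uint32_t h = 5381;

    while (*name != '\0')
        h = h * 33 + (unsigned char)*name++;
    return (h);
}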

Bloom filters are used to test whether a given item is part of a
given set. They use a compact bitmask representation, and are fast to
query. Bloom filters
are probabilistic: false positives are possible, but false negatives
are not. The size of the bitmask used to represent the filter, and the
quality and number of
hash functions, determine the rate of false positives.

A Bloom filter is never wrong when it says an item does not
exist in a set. Applied to ELF hash tables,
the runtime linker can test a symbol name against a Bloom filter, and if
rejected, immediately move on to the next object. Since
most symbol lookups end in failure
as discussed above, this has the potential to eliminate a large number
of unnecessary hash operations. It is possible for a Bloom filter
to incorrectly indicate that an item exists in the set when it doesn't.
When this happens, it is caught by the following hash lookup,
so correct linker operation is not affected. Since false positives
are rare, this does not significantly affect performance.
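
To illustrate, here is a sketch of the filter test as it might look
for a 64-bit object. The structure and field names are hypothetical,
but the two derived hash values and the power-of-two word count
follow the GNU hash design:

#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t maskwords;     /* number of filter words; a power of two */
    uint32_t shift2;        /* shift used to derive the second hash */
    const uint64_t *bloom;  /* the filter's bitmask words */
} BloomFilter;

/* Test a symbol's hash against the filter. A false return means the
 * symbol is definitely not in the object; a true return means it may
 * be, and the full hash lookup decides. */
bool
bloom_may_contain(const BloomFilter *bf, uint32_t h1)
{
    uint32_t h2 = h1 >> bf->shift2;
    uint64_t word = bf->bloom[(h1 / 64) % bf->maskwords];
    uint64_t mask = ((uint64_t)1 << (h1 % 64)) |
        ((uint64_t)1 << (h2 % 64));

    return ((word & mask) == mask);
}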

It is interesting to note that the use of a Bloom filter
makes a successful symbol lookup slightly more expensive than
it would be otherwise. The hash table alone can be used
to determine if a symbol exists or not, so the Bloom filter
is pure overhead in the success case. Despite that, the
Bloom filter is a winning optimization, because it is very
cheap to compute compared to a hash table lookup, and because
most hash operations are against libraries that end up not containing
the desired symbol.

It is also worth noting that the runtime linker is free to
skip the use of the Bloom filter and proceed directly to
the hash lookup operation. This may be worthwhile in situations
where the runtime linker has other reasons to believe the
lookup will be successful. In particular, if the runtime
linker is directed to a given object
via a direct binding, the odds of a failed symbol lookup
should be zero, so there is no need to screen before
the hash lookup.

Conclusions

Tweaking the performance of an
existing algorithm has its place, particularly within the
inner loops of a busy program.
However, the big wins are usually the result of using a better
algorithm. In Solaris, direct bindings have been
our algorithmic approach to reducing hash overhead. We've made a
conscious effort to develop and deploy direct bindings in preference
to making improvements to the traditional hash process. We've been pleased with
the results: direct bindings are faster, and
combined with their other attributes, they allow programs to scale larger.

Nonetheless, the GNU hash embodies several worthwhile ideas:

A better hash function.

Symbols that the runtime linker doesn't care about are
left out of the hash table.

A Bloom filter cheaply eliminates most failing hash operations.

In particular, the use of a Bloom filter to cheaply filter
away unproductive hash operations stands out as a very interesting idea.

It is clear that hash overhead can be measured and reduced.
By and large however, we have not found ELF hash overhead
to be a hot spot for real programs. It seems that the other
things that go on in a program
generally dominate the hit that comes from ELF symbol hashing.
The core Solaris OS is now built using direct bindings, which
has allowed us to harden and simplify aspects of its design.
Interestingly enough, measurements do not reflect a significant
resulting change in system performance. This does not prove
that hash overhead should be ignored, but it does tend to suggest
that one has to look towards extremely large non-direct
bound programs in order to demonstrate the issue.

It's all food for thought, and perhaps time for some
experimentation and measurement.