Direct Bindings and Interposition

Interposition can occur when multiple instances of a symbol, having the same
name, exist in different dynamic objects that have been loaded into a process.
Under the default search model, symbol references are bound to the first
definition that is found in the series of dependencies that have been loaded.
This first symbol is said to interpose on the other symbols of
the same name.

Direct bindings can circumvent any implicit interposition. Because a directly bound reference
is searched for in the dependency associated with the reference, the default
symbol search model that enables interposition is bypassed. In a directly bound environment,
bindings can be established to different definitions of a symbol that have
the same name.
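The difference between the two search models can be sketched with a toy lookup. This is a minimal model, not the runtime linker's implementation: the object names, the symbol lists, and the functions bind_default() and bind_direct() are all hypothetical, and the model ignores explicit interposers and any fallback search.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy "object": a name plus the symbols it defines (hypothetical data). */
struct object {
    const char *name;
    const char *symbols[4];     /* NULL-terminated list of symbol names */
};

/* Load order of the process: the executable first, then its dependencies. */
static const struct object load_order[] = {
    { "prog",   { "main", NULL } },
    { "A.so.1", { "error", "action", NULL } },
    { "B.so.1", { "error", "inspect", NULL } },
};
static const size_t nobj = sizeof load_order / sizeof load_order[0];

/* Return 1 if obj defines sym, otherwise 0. */
static int defines(const struct object *obj, const char *sym)
{
    for (size_t i = 0; i < 4 && obj->symbols[i] != NULL; i++)
        if (strcmp(obj->symbols[i], sym) == 0)
            return 1;
    return 0;
}

/* Default model: the first definition found in load order is bound to. */
const char *bind_default(const char *sym)
{
    assert(sym != NULL);
    for (size_t i = 0; i < nobj; i++)
        if (defines(&load_order[i], sym))
            return load_order[i].name;
    return NULL;
}

/* Direct binding: search only the dependency recorded with the reference. */
const char *bind_direct(const char *sym, const char *dependency)
{
    assert(sym != NULL && dependency != NULL);
    for (size_t i = 0; i < nobj; i++)
        if (strcmp(load_order[i].name, dependency) == 0)
            return defines(&load_order[i], sym) ? load_order[i].name : NULL;
    return NULL;
}
```

In this model, bind_default("error") always selects the definition in A.so.1, whereas bind_direct("error", "B.so.1") selects the definition in B.so.1, so two references to the same name can reach different definitions.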

The ability to bind to different definitions of a symbol that have
the same name is a feature of direct binding that can be
very useful. However, should an application depend upon an instance of interposition, the
use of direct bindings can subvert the application's expected execution. Before deciding
to use direct bindings with an existing application, the application should be
analyzed to determine whether interposition exists.

To determine whether interposition is possible within an application, use lari(1). By
default, lari reports only the information that it considers interesting. This information originates from multiple instances of
a symbol definition, which in turn can lead to interposition.

Interposition only occurs when one instance of the symbol is bound to.
Multiple instances of a symbol that are called out by lari might
not be involved in interposition. Other multiple instance symbols can exist, but might
not be referenced. These unreferenced symbols are still candidates for interposition, as
future code development might result in references to these symbols. All instances of
multiply defined symbols should be analyzed when considering the use of direct
bindings.

If multiple instances of a symbol of the same name exist, especially
if interposition is observed, one of the following actions should be performed.

Localize symbol instances to remove namespace collision.

Remove the multiple instances to leave one symbol definition.

Define any interposition requirement explicitly.

Identify symbols that can be interposed upon to prevent these symbols from being directly bound to.

The following sections explore these actions in greater detail.

Localizing Symbol Instances

Multiply defined symbols of the same name that provide different implementations should
be isolated to avoid accidental interposition. The simplest way to remove a
symbol from the interfaces that are exported by an object is to reduce
the symbol to local. Demoting a symbol to local can be achieved
by defining the symbol as “static”, or possibly through the use of symbol attributes
provided by the compilers.
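The two source-level techniques might look like the following sketch. The function names are hypothetical, and the visibility attribute shown is the GCC-compatible spelling, which is an assumption about the compiler in use. Other compilers provide their own equivalents.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * File-scope static: the symbol is local to this compilation unit and
 * is never part of the object's exported interface.
 */
static int format_error(char *buf, size_t len, const char *msg)
{
    return snprintf(buf, len, "error: %s", msg);
}

/*
 * Assumption: a GCC-compatible compiler. Hidden visibility keeps the
 * symbol usable from other compilation units of the same object, but
 * removes the symbol from the interfaces the object exports.
 */
__attribute__((visibility("hidden")))
int log_error(char *buf, size_t len, const char *msg)
{
    assert(buf != NULL && msg != NULL);
    return format_error(buf, len, msg);
}

/* The only interface this object intends to export. */
int report(char *buf, size_t len, const char *msg)
{
    return log_error(buf, len, msg);
}
```

Only report() remains a candidate for binding from other objects, so a second object that defines its own format_error() or log_error() cannot collide with these implementations.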

A symbol can also be reduced to local by using the link-editor
and a mapfile. The following example shows a mapfile that reduces the
global function error() to a local symbol by using the local scoping
directive.

Although individual symbols can be reduced to locals using explicit mapfile definitions,
defining the entire interface family through symbol versioning is recommended. See Chapter 5, Interfaces and Versioning.

Versioning is a useful technique typically employed to identify the interfaces that
are exported from shared objects. Similarly, dynamic executables can be versioned to
define their exported interfaces. A dynamic executable need only export the interfaces that
must be made available for the dependencies of the object to bind
to. Frequently, the code that you add to a dynamic executable need export
no interfaces.

The removal of exported interfaces from a dynamic executable should take into
account any symbol definitions that have been established by the compiler drivers.
These definitions originate from auxiliary files that the compiler drivers add to the
final link-edit. See Using a Compiler Driver.

The following example mapfile exports a common set of symbol definitions that
a compiler driver might establish, while demoting all other global definitions to
local.

You should determine the symbol definitions that your compiler driver establishes. Any
of these definitions that are used within the dynamic executable should remain
global.

By removing any exported interfaces from a dynamic executable, the executable is
protected from future interposition issues that might occur as the object's dependencies
evolve.

Removing Multiply Defined Symbols of the Same Name

Multiply defined symbols of the same name can be problematic within a
directly bound environment if the implementation associated with the symbol maintains state.
Data symbols are the typical offenders in this regard. However, functions that maintain
state can also be problematic.

In a directly bound environment, multiple instances of the same symbol can
be bound to. Therefore, different binding instances can manipulate different state variables
that were originally intended to be a single instance within a process.

For example, suppose that two shared objects contain the same data item
errval. Suppose also that two functions, action() and inspect(), exist in different shared
objects. These functions expect to write and read the value errval, respectively.

With the default search model, one definition of errval would interpose on
the other definition. Both functions action() and inspect() would be bound to the
same instance of errval. Therefore, if an error code was written to
errval by action(), then inspect() could read and act upon this error condition.

However, suppose the objects containing action() and inspect() were bound to different
dependencies that each defined errval. Within a directly bound environment, these functions are
bound to different definitions of errval. An error code can be written
to one instance of errval by action() while inspect() reads the other,
uninitialized definition of errval. The outcome is that inspect() detects no
error condition to act upon.
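The errval scenario can be modeled in a single file. This sketch is illustrative only: the two static variables stand in for the definitions of errval in two shared objects, the struct binding type models the address that a reference was bound to, and run_default() and run_direct() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Two definitions of errval, standing in for copies in two shared objects. */
static int errval_in_A;
static int errval_in_B;

/* Models the definition that a caller's reference to errval was bound to. */
struct binding {
    int *errval;
};

/* action() records an error code through its own binding. */
static void action(const struct binding *b, int code)
{
    assert(b != NULL && b->errval != NULL);
    *b->errval = code;
}

/* inspect() reads the error code through its own binding. */
static int inspect(const struct binding *b)
{
    assert(b != NULL && b->errval != NULL);
    return *b->errval;
}

/* Default search model: both references bind to the first definition. */
int run_default(int code)
{
    struct binding act = { &errval_in_A };
    struct binding ins = { &errval_in_A };

    errval_in_A = errval_in_B = 0;
    action(&act, code);
    return inspect(&ins);
}

/* Direct bindings: each reference binds to its own dependency's definition. */
int run_direct(int code)
{
    struct binding act = { &errval_in_A };
    struct binding ins = { &errval_in_B };

    errval_in_A = errval_in_B = 0;
    action(&act, code);
    return inspect(&ins);
}
```

With the default model, the value written by action() is the value that inspect() reads. With the directly bound model, inspect() reads the other instance and sees no error.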

Multiple instances of data symbols typically occur when the symbols are declared
in headers.

int bar;

This data declaration results in a data item being produced by each
compilation unit that includes the header. The resulting tentative data item can
result in multiple instances of the symbol being defined in different dynamic objects.

However, by explicitly defining the data item as external, references to the
data item are produced for each compilation unit that includes the header.

extern int bar;

These references can then be resolved to one data instance at runtime.
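For the references to resolve, one compilation unit still has to provide the definition. The following sketch collapses three hypothetical files (bar.h, bar.c, and user.c) into one listing, and the function next_bar() is an illustrative name rather than part of any existing interface.

```c
/* bar.h (hypothetical header): declaration only, no data item is produced. */
extern int bar;

/* bar.c (hypothetical): the one compilation unit that defines the data item. */
int bar = 0;

/* user.c (hypothetical): includes bar.h and references the single instance. */
int next_bar(void)
{
    return ++bar;
}
```

Every compilation unit that includes the header now produces a reference to bar, and only the compilation unit that contains the definition contributes a data item.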

Occasionally, the interface for a symbol implementation that you want to remove should
be preserved. Multiple instances of the same interface can be vectored to
one implementation, while preserving any existing interface. This model can be achieved by
creating individual symbol filters by using a FILTER mapfile keyword. This keyword
is described in SYMBOL_SCOPE / SYMBOL_VERSION Directives.

Creating individual symbol filters is useful when dependencies expect to find a
symbol in an object where the implementation for that symbol has been
removed.

For example, suppose the function error() exists in two shared objects, A.so.1
and B.so.1. To remove the symbol duplication, you want to remove the implementation
from A.so.1. However, other dependencies are relying on error() being provided from
A.so.1. The following example shows the definition of error() in A.so.1. A mapfile
is then used to allow the removal of the error() implementation, while leaving
a filter for this symbol that is directed to B.so.1.

The function error() is global, and remains an exported interface of A.so.2.
However, any runtime binding to this symbol is vectored to the filtee
B.so.1. The letter “F” indicates the filter nature of this symbol.

This model of preserving existing interfaces, while vectoring to one implementation, has
been used in several Oracle Solaris libraries. For example, a number of
math interfaces that were once defined in libc.so.1 are now vectored to the
preferred implementation of the functions in libm.so.2.

Defining Explicit Interposition

The default search model can result in instances of the same named
symbol interposing on later instances of the same name. Even without any explicit
labelling, interposition still occurs, so that one symbol definition is bound to
from all references. This implicit interposition occurs as a consequence of the symbol
search, not because of any explicit instruction the runtime linker has been
given. This implicit interposition can be circumvented by direct bindings.

Although direct bindings work to resolve a symbol reference directly to an
associated symbol definition, explicit interposition is processed prior to any direct binding search.
Therefore, even within a direct binding environment, interposers can be designed, and
be expected to interpose on any direct binding associations. Interposers can be
explicitly defined using the following techniques.

With the LD_PRELOAD environment variable.

With the link-editor's -z interpose option.

With the INTERPOSE mapfile keyword.

As a consequence of a singleton symbol definition.

The interposition facilities of the LD_PRELOAD environment variable and the -z interpose option
have been available for some time. See Runtime Interposition. As these objects are
explicitly defined to be interposers, the runtime linker inspects these objects before processing
any direct binding.

Interposition that is established for a shared object applies to all the
interfaces of that dynamic object. This object interposition is established when an
object is loaded using the LD_PRELOAD environment variable. Object interposition is also
established when an object that has been built with the -z interpose option
is loaded. This object model is important when techniques such as dlsym(3C) with
the special handle RTLD_NEXT are used. An interposing object should always have
a consistent view of the next object.

A dynamic executable has additional flexibility, in that the executable can define individual
interposing symbols using the INTERPOSE mapfile keyword. Because a dynamic executable is
the first object loaded in a process, the executable's view of the next
object is always consistent.

The following example shows an application that explicitly wants to interpose on
the exit() function.

The letter “I” indicates the interposing nature of this symbol. Presumably, the
implementation of this exit() function directly references the system function _exit(), or
calls through to the system function exit() using dlsym() with the RTLD_NEXT handle.
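An interposing exit() of the kind described might be written as follows. This is a sketch under stated assumptions: the helper next_exit() and the diagnostic message are hypothetical, the _GNU_SOURCE definition is needed only on systems whose dlfcn.h hides RTLD_NEXT by default, and some systems also require linking with -ldl.

```c
#define _GNU_SOURCE     /* assumption: exposes RTLD_NEXT on glibc systems */
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

typedef void (*exit_fn)(int);

/* Locate the next definition of exit() after this object in the search order. */
exit_fn next_exit(void)
{
    return (exit_fn)dlsym(RTLD_NEXT, "exit");
}

/* The application's own exit(), intended to interpose on the system exit(). */
void exit(int status)
{
    exit_fn real_exit = next_exit();

    (void) fprintf(stderr, "exiting with status %d\n", status);
    if (real_exit != NULL)
        real_exit(status);      /* call through to the system exit() */
    _exit(status);              /* fallback: terminate without cleanup */
}
```

The call through RTLD_NEXT preserves the normal exit processing, while the direct call to _exit() acts only as a fallback should no further definition of exit() be found.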

At first, you might consider identifying this object using the -z interpose option.
However, this technique is rather heavyweight, because all of the interfaces exported
by the application would act as interposers. A better alternative would be
to localize all of the symbols provided by the application except for
the interposer, together with using the -z interpose option.

However, use of the INTERPOSE mapfile keyword provides greater flexibility. The use of
this keyword allows an application to export several interfaces while selecting those
interfaces that should act as interposers.

Symbols that are assigned the STV_SINGLETON visibility effectively provide a form of interposition.
See Table 12-20. These symbols can be assigned by the compilation system to
an implementation that might become multiply instantiated in a number of objects within
a process. All references to a singleton symbol are bound to the
first occurrence of a singleton symbol within a process.