Making Shared Libraries

Shared libraries allow you to efficiently reuse code between
different applications.


1.- The process of generating a program. Introduction.

The way programs are generated in today's development
environments is the fruit of an evolution in the habits and
experience of programmers and designers.

This process consists of the following steps:

Creation of the source code in a high level language with
a text editor. Very large programs can be hard to handle if
we try to make them fit in a single file. For this reason,
the source code is divided into functional modules, which are
formed by one or more files of source code. The source code
in these modules does not necessarily have to be written in
the same language, since some languages are more
appropriate for a given task than others.

After creating the files of source code for the program,
they must be translated into segments of code executable by
the machine. This code is usually referred to as object
code. This code performs the same operations as the
source code except that it is in a special language that is
directly executable by the machine. The process of
translating the source code into object code is known as
compilation. Compilation is carried out in units: a
compilation session usually covers (depending on the
compiler) only part of the program, in general one or a
few files. The compiled object code contains a program, a
subroutine, variables, etc. -- in general, the parts of a
program that have already been translated and that can be
delivered to the next stage.

After all the files with machine code for the program are
generated, one proceeds to put them together or link them
through a process performed by a special utility known as the
linker. In this process, all the references that the
code of a module makes to code belonging to another module
are "resolved" (such as calls to subroutines or references to
variables belonging to or defined in other modules). The
resulting product is a program that normally can be loaded
and run directly.

The execution of a program is performed by a special
piece of software that is an essential part of the operating
system, which in the case of Linux is the system call
exec(). This function finds the file, assigns memory
to the process, loads specific parts of the file content
(those containing the code and the initial values of the
variables) and transfers control to a point in
the program 'text' that is usually indicated in the
executable file itself.

2.- Brief history of the process of program generation.

The process of program generation has undergone constant
evolution, always in search of the most efficient
execution or the best use of system resources.

Initially, programs were written directly in machine code.
Later, it was realized that writing a program in a higher level
language and its subsequent translation to machine code could
be automated due to the systematic nature of the translation.
This increased the productivity of software development.

Once the compilation of programs had been achieved (I have
simplified the evolution of compilation; this was actually a
very difficult step to take because it is a very complex
process), the process of program generation consisted of
generating a file with the program source, compiling it and,
as a final step, executing it.

It was soon noticed, however, that the process of
compilation was very expensive and took too many resources,
including CPU time, and that many functions included by those
programs were used over and over in various programs. Moreover,
when somebody modified part of a program, this meant compiling
the whole source again, including translating once again a
whole bunch of identical code in order to compile the new code
inserted.

That was the reason for introducing compilation by
modules. This consists of separating the main program
on one side from, on the other, those functions that are
used over and over, which were already compiled
and archived in a special place (we will call it the precursor
of a library).

One could then develop programs supported by those functions
without expending additional effort introducing their code
again and again. Even then, the process was complex because
when linking the program it was necessary to join all the
pieces, and these had to be identified by the programmer (with
the additional cost that a known function might use or
need other, unknown functions).

3.- What is a Library?

The above problems led to the creation of libraries. A library is
nothing but a special type of file (more precisely an archive,
in the style of tar(1) or cpio(1)) with the peculiar
characteristic that the linker understands its format, and when
we specify a library archive, THE LINKER SELECTS ONLY THOSE
MODULES THAT THE PROGRAM NEEDS, and excludes everything else. A
new advantage came into play: now we could develop programs
that used large libraries of functions and the programmer did
not have to know all the dependencies of the functions in the
library.

The library as we have discussed it so far has not
evolved much more than this. It has only acquired a new special
file, that often appears at the beginning of the archive and
that contains a description of the modules and the identifiers
that the linker has to resolve without having to read the whole
library (and thus removing the need to read the library several
times). This process (adding the table of symbols to the
library archive) is performed under Linux by the command
ranlib(1). The libraries described thus far are known as STATIC
LIBRARIES.

An advance occurred after the introduction of the first
multitasking systems: the sharing of code. If, in the
same system, two copies of the same program were launched, it
seemed attractive for the two processes to share the code,
because normally a program does not modify its own code. This
idea eliminates the need for multiple copies in memory,
which saves large amounts of memory on huge multi-user
systems.

Taking this last innovation one step further, someone (I do
not know who he/she was but the idea was great ;-) observed that
quite often many programs used the same library but, being
different programs, the portion of the library used by one
program did not have to be the same as the portion used by
another. Moreover, the main code was not the same (they
were different programs), so their text was not shared. Well,
our person reasoned that if different programs using the
same library could share the code of that library, we could
save some memory. Now different programs share the library
code without having identical program text.

However, now the process is more complex. The executable
program is not fully linked; instead, the resolution of
references to identifiers in the library is postponed until
program loading. The linker (ld(1) in the case of Linux) recognizes
that it is dealing with a shared library and does not include
its code in the program. The system itself, the kernel, when
executing exec() recognizes that it is launching a
program using shared libraries and runs special code for
loading the shared libraries (assigning shared memory for their
text, assigning private memory for the variables of the library,
etc.). This process is now performed when loading an executable,
and the whole procedure is much more complex.

Of course, when the linker is faced with a normal library
it continues to behave as before.

The shared library is not an archive of files containing
object code, but rather a single file that itself contains
object code. When linking a program with a shared library, the
linker does not look inside the library to decide which modules
must be added to the program and which not. It only makes sure
that the unresolved references get resolved and detects which
new references must be added to the list by the inclusion of
the library. One could make an ar(1) archive of all the shared
libraries, but this is not often done because a shared library
is often the result of linking various modules, and the
library is needed later, at run-time. Perhaps the name
shared library is not the most appropriate and it would be
clearer to call it shared object (nevertheless, we will not use
this other term in order to be understood).

4.- Types of Libraries.

As we already mentioned, under Linux there are two types of
libraries: static and shared. The static libraries are
collections of modules included in an archive with the utility
ar(1) and indexed with the utility ranlib(1). These modules are
often stored in a file whose name terminates in .a by
convention (I will not use the term extension because under
Linux the concept of extension of a file does not apply). The
linker recognizes the termination .a in a file and starts the
search for the modules as if it was a static library, selecting
and adding to the program those modules that resolve the
unresolved references.

The shared libraries, by contrast, are not archives but
relocatable objects, marked by a special code (that identifies
them as shared libraries). The linker ld(1), as mentioned, does
not add the modules to the program code, but marks the
identifiers provided by the library as resolved, adds those
needed by the library itself and continues without adding any
code, pretending the code in question has already been added to
the main code. The linker ld(1) recognizes a shared library by
its termination .so (not .so.xxx.yyy; we will come
back to this point).

5.- Linking Operation under Linux.

Every program consists of object modules linked to form an
executable. This operation is performed by ld(1), which is the
Linux linker.

ld(1) supports several options that modify its
behavior, but we will restrict ourselves here to those options
related to the use of libraries in general. ld(1) is
usually not invoked directly by the user but by the compiler
gcc(1) in its final stage. A superficial knowledge
of its modus operandi will help us understand
the use of libraries under Linux.

ld(1) requires for its proper functioning the list
of objects that are going to be linked to the program. These
objects can be given and called in any order(*) as long as we
follow the previous convention, as mentioned, that a shared
library is indicated by a termination .so (not .so.xx.yy) and a
static library by .a (and of course, simple object files are
those whose names terminate in .o).

(*) This is not completely true. ld(1) includes only
those modules that resolve references pending at the moment
the library is processed. A reference originated by a module
that is included later is not yet known at that moment, so it
can remain unresolved. This can cause
the order of inclusion of the libraries to be
significant.

On the other hand, ld(1) allows the inclusion of
standard libraries thanks to the options -l and -L.

But... what do we understand by a standard library, and what is
the difference? None. Only that ld(1) searches for the
standard libraries in predetermined locations, while those
appearing as objects in the list of parameters are located
by their filename.

The libraries are searched for by default in the directories
/lib and /usr/lib (although I have heard that,
depending on the version or implementation of ld(1),
there could be additional places). -L allows us to add
directories to those used for the normal search of libraries.
It is used by writing one -Ldirectory for
each directory we want to add. The standard libraries
are specified with the option -l Name (where
Name specifies the library to be loaded) and
ld(1) will search, in order, in the corresponding
directories for a file named libName.so. If it is not found,
it will try libName.a, its static version.
If ld(1) finds a libName.so file, it links it as
a shared library, while if it finds a file libName.a, it
will link the modules obtained from it if they resolve any
of the unresolved references.

6.- Dynamic Linking and Loading Shared Libraries

Dynamic linking is performed at the moment of loading the
executable by a special module (in fact, this special module is
a shared library itself) called /lib/ld-linux.so.

Actually there are two modules for linking with dynamic
libraries: /lib/ld.so (for libraries using the old
a.out format) and /lib/ld-linux.so (for libraries
using the new ELF format).

These modules are special in that they must be loaded each
time a dynamically linked program is run. Their names are standard
(which is the reason they must not be moved from the directory
/lib, nor their names modified). If we
changed the name of /lib/ld-linux.so, we would
immediately break every program that uses shared
libraries, because this module takes charge of resolving all the
references not yet resolved at run-time.

The latter module is helped by the existence of a file,
/etc/ld.so.cache, which indicates, for every library,
the most appropriate executable file that contains the library.
We will return to this issue later.

7.- soname. Versions of Shared Libraries. Compatibility.

We now enter the most treacherous issue related to shared
libraries: their versions.

A message often received is 'library libX11.so.3
not found,' leaving us with the frustration of having the
library libX11.so.6 and incapable of doing anything.
How is it possible that ld.so(8) recognizes as
interchangeable the libraries libpepe.so.45.0.1 and
libpepe.so.45.22.3 and does not recognize
libpepe.so.46.22.3?

Under Linux (and all the operating systems implementing the
ELF format) the libraries are identified by a sequence of
characters that distinguish them: the soname.

The soname is included inside the library itself and the
sequence is determined when linking the objects forming the
library. When the shared library is created, one has to pass to
ld(1) an option (-soname <name_of_the_library>),
to give a value to this character string.

This sequence of characters is used by the dynamic loader to
identify the shared library that must be loaded and to identify
the executable. The process is like this: ld-linux.so detects
that the program requires a library and determines its soname.
Then it consults /etc/ld.so.cache with that name and obtains
the name of the file containing it. Next it compares the soname
requested with the name recorded in the library, and if they are
identical, that's it! If they are not, it will continue
searching until it finds a match or, if it cannot, it reports an
error.

The soname makes it possible to detect whether a library is
the appropriate one to be loaded, because ld-linux.so makes
sure that the soname requested coincides with that of the file
found. In case of disagreement we obtain the famous
'libXXX.so.Y not found'. What it is looking for is the soname,
and the error given refers to the soname.

This can cause a lot of confusion when we change the name of
a library and the problem persists. But it is not a good idea
to access the soname and change it because there is a
convention in the Linux community for assigning soname:

The soname of a library, by convention, must identify the
appropriate library and the INTERFACE of that library. If we
make modifications to a library that only affect its
internal functioning, but leave the whole interface intact (number
of functions, variables, parameters of the functions), then the
two libraries will be interchangeable and, in general, we will
say that the modifications introduced are minor (both libraries
are compatible and we can replace one with the other). When this
happens, the minor number (which does not appear in the soname)
is often modified and the library can be replaced without
major problems.

However, when we add functions, remove functions, and in
general, MODIFY THE INTERFACE of the library, then it is no
longer possible to claim that the library is interchangeable with
the previous one (for example, replacing libX11.so.3
with libX11.so.6 is part of the upgrade from X11R5 to
X11R6, which defines new functions and therefore modifies the
interface). The change from X11R6-v3.1.2 to X11R6-v3.1.3
probably will not include changes in the interface and the
library will have the same soname, although in order to
preserve the old one we must give it a different file name (for
this reason the version number appears complete in the name of
the library while only the major number shows in the soname).

8.- ldconfig(8)

As we mentioned earlier, /etc/ld.so.cache allows
ld-linux.so to convert the soname into the name of the file
containing the library. This is a binary file, for more
efficiency, and is created with the utility ldconfig(8).
ldconfig(8) generates, for each dynamic library found
in the directories specified by /etc/ld.so.conf, a
symbolic link named after the soname of the library. It does this
so that when ld.so is going to obtain the name of the file,
what it really does is select in the directory list a file
with the soname sought, and in this fashion there is no need to
execute ldconfig(8) each time we add a library. We run
ldconfig only when we add a directory to the list.

9.- I Want to Make a Dynamic Library.

Before making a dynamic library we must consider whether it is
really going to be useful. Dynamic libraries impose an
overhead on the system for several reasons:

The loading of a program is performed in several stages:
one for loading the main program, and another for each
dynamic library that the program uses (we will see that for
an appropriate dynamic library this last item
ceases to be an inconvenience and becomes an
advantage).

The dynamic libraries must contain relocatable code, since
the address allocated within the virtual address space
of the process will not be known until loading time. The
compiler is then forced to reserve a register to hold
the loading position of the library, and as a result we have
one register less for the optimization of code. This is
a minor ill, since the overhead it introduces does
not amount to more than 5% in most cases.

For a dynamic library to be appropriate it must be in use most of
the time by some program (this avoids the problem of reloading
the text of the library after the death of the process that
started it; while other processes are still using modules of
the library, it remains in memory).

The shared library is fully loaded in memory (not only the
modules needed); therefore, to be useful, it must be useful in
its totality. The worst example of a dynamic library is one where
only one function is used and 90% of the library is hardly ever
used.

A good example of dynamic library is the C standard library
(it is used by all the programs written in C). On average all
the functions are used here and there.

In a static library, by contrast, it does no harm to include
functions whose usage is infrequent: as long as those functions are
contained in their own module, they will not be linked into
those programs that do not require them.

9.1.- Compiling the Sources

The compilation of the sources is carried out in the same
fashion as for a normal source, except that we
will use the option -fPIC (Position Independent Code) to
generate code that can be loaded at different positions within
the virtual address space of a process.

This step is fundamental because in a statically linked
program the positions of the library objects are resolved at
link-time, and are therefore fixed. In the old a.out
executables it was impossible to perform this step,
resulting in each shared library getting placed at a fixed
position in the virtual address space. As a consequence,
there were conflicts any time a program wanted to use two
libraries that were loaded in overlapping regions of virtual
memory. This meant that a list had to be maintained where,
whenever someone wanted to make a library dynamic, they would
declare the range of addresses used so that nobody else would
use it.

Well, as we mentioned, registering a dynamic library in an
official list is no longer necessary, because when the library is
loaded it goes to positions determined at that instant;
this is precisely why the code must be relocatable.

9.2.- Linking Objects in the Library

After compiling all the objects, it is necessary to link them
with a special option which generates an object which is
dynamically loadable.

gcc -shared -o libName.so.xxx.yyy.zzz -Wl,-soname,libName.so.xxx

As the reader can appreciate, it looks like a normal link
operation, except for the introduction of a series of options
that will lead to the generation of a shared library. Let us
explain them one by one:

-shared.
This option tells the linker that it must generate
a shared library, so the output file will contain a
special kind of executable corresponding to the
library.

-o libName.so.xxx.yyy.zzz.
This is the name of the output file. It is not necessary to
follow the naming convention, but if we want this library to be
standard for future developments, it is convenient to follow
it.

-Wl,-soname,libName.so.xxx.
The option -Wl tells gcc(1) that the next
options (separated by comma) are for the linker. This is
the mechanism used by gcc(1) to pass options to
ld(1). Above we are passing the following options
to the linker:

-soname libName.so.xxx

This option fixes the soname of the library so that it can
only be invoked by those programs that require a library
with the soname specified.

9.3.- Installing the Library

Well we already have the corresponding executable. Now we must
install it in the appropriate place in order to be able to use
it.

To compile a program that requires our new library, one
would use the following line:

gcc -o program libName.so.xxx.yyy.zzz

or, if the library has been installed in the appropriate place
(/usr/lib), it would be sufficient to use:

gcc -o program -lName

(if the library were in /usr/local/lib instead, it would be
sufficient to add the option '-L/usr/local/lib'). To
install the library, do the following:

Copy the library to the directory /lib or
/usr/lib. If you decide to copy it to a different
location (for example /usr/local/lib), you cannot be
certain that the linker ld(1) will find it
automatically when linking programs.

Execute ldconfig(8) to make the symbolic link
libName.so.xxx, which points to
libName.so.xxx.yyy.zzz. This step will tell us if we have
completed all the previous steps correctly and the library is
recognized as a dynamic library. The way programs get linked
is not affected by this step; only the loading of the
libraries at run-time is affected.

Make a symbolic link named libName.so that points to
libName.so.xxx.yyy.zzz (or to libName.so.xxx, the soname),
in order to allow the linker to find the
library with the -l option. For this mechanism to work it is
necessary that the name of the link fits the pattern
libName.so.

10.- Making a static library

If on the other hand, one would like to make a static library
(or two versions are needed to be able to offer statically
linked copies) then one has to proceed as follows:

Note: The linker, in its search for libraries, looks
first for a file called libName.so, and then for
libName.a. If we give the two libraries (the static
and dynamic versions) the same name, we cannot
choose, in general, which of the two will get linked in
each case (the dynamic one always gets linked when it is
found first).

For this reason it is always recommended that if the two
versions of the same library are needed, the static one be
named as libName_s.a, while the dynamic is named
libName.so. When linking, therefore, one would do:

gcc -o program -lName_s

to link with the static version, while in the case of the
dynamic one:

gcc -o program -lName

10.1.- Compiling the Sources

To compile the sources, we will not take any special measures.
Since the positions of the objects are decided
during the linking step, it is not necessary to compile with
the -fPIC option (although it is possible to continue using
it).

10.2.- Linking the Objects in the Library

In the case of static libraries, there is no linking step. All
the objects are archived in the library file by the command
ar(1). Next, in order to resolve the symbols quickly,
one is advised to execute the command ranlib(1) on the
library. Although this is not strictly necessary, omitting it
may leave modules unresolved in the executable, because when
the linker processes the library not all the indirect
dependencies between modules are resolved immediately: if a
module is required by another module that appears later
in the archive, the linker would have to pass several times
through the same library until all the references get
resolved.

10.3.- Installing the Library

The static libraries will be named following the format
libName.a if one is only interested in maintaining a static
library. In the case of two types of libraries, I would
recommend naming them libName_s.a, so that it will be easier
to control whether to load a static or dynamic library.

The process of linking allows the introduction of the option
-static. This option controls the loading of the module
/lib/ld-linux.so, and does not affect the search order
of the libraries, so that if one writes -static and
ld(1) finds a dynamic library, it will continue to
work with it (instead of continuing looking for its static
counterpart). This leads to errors at run-time due to the
invocation of routines in a library that do not belong to the
executable -- the module for automatic dynamic loading is not
linked and therefore this process cannot be carried out.

11.- Static versus Dynamic Linking

Let us suppose we wish to distribute a program that makes use
of a library which we are authorized to distribute only if
included statically in a program and not in any other form (an
example of this case is the applications developed with
Motif).

To produce this kind of software there are two options. The
first is making a statically linked executable (using only .a
libraries and avoiding the use of the dynamic loader). These
kinds of programs are loaded only once and do not require
having any library installed in the system (not even
/lib/ld-linux.so). However, they have the disadvantage
of carrying all the necessary software within the binary file
and therefore they are usually huge files. The second option is
to make a dynamically linked program, meaning that the
environment where our application will run should provide all
the corresponding dynamic libraries. The executable may be very
small, although sometimes it is not possible to have all the
libraries available (for example, there are people who do not
have Motif).

There is a third option, a mixed distribution, in which some
libraries are linked dynamically and others statically. In this
case, logically one would choose the problematic library in its
static form and all the others in their dynamic form. This
option is a very convenient form of software distribution.

For example, one could compile three different versions of a
program as follows:

In the third case, only the Motif library
(-lXm_s) gets linked statically, and all the others
are linked dynamically. The environment where the program runs
must provide the appropriate versions of the libraries
libm.so.xx, libXt.so.xx, libX11.so.xx, libXmu.so.xx and
libXpm.so.xx.

For more information: