Nomenclature

GNU is a computer operating system composed entirely of free software. Its name is a recursive acronym for GNU's Not Unix: GNU is Unix-like but, unlike Unix, it is free and contains no Unix code. The development of GNU was initiated by Richard Stallman and was the original focus of the Free Software Foundation (FSF). Because GNU's official kernel, GNU Hurd, is still incomplete, not all GNU components run on it. Instead, they typically run on the third-party Linux kernel, and some have been ported to other operating systems, such as Microsoft Windows, the BSD variants, Solaris and Mac OS. As a result, GNU came to be known as a family of free software products, often ported to different operating systems, rather than as an operating system per se.

GCC stands for the GNU Compiler Collection, which includes front ends (see below) for C, C++, Objective-C, Fortran, Java and Ada, as well as libraries for these languages (libstdc++, libgcj, etc.). (Java support, provided by gcj and libgcj, was removed in GCC 7.) However, GCC originally stood for GNU C Compiler, and the GNU C and C++ compilers remain the most frequently used members of the collection. Many people still mean the GNU C Compiler when they say GCC.

Front end is the part of a compiler that is specific to a particular language, such as C or C++.

Back end is the part of the compiler that generates machine code for a particular target processor; unlike a front end, it is shared across all the languages.

The GNU Compiler Collection's language-independent component (which is also, confusingly, sometimes referred to as GCC!) is shared among the compilers for all supported languages. The language-independent component of GCC includes the majority of the optimizers, as well as the back ends that generate machine code for various processors.

gcc, which is often /usr/bin/gcc, is the command that executes the GNU C Compiler.

g++, which is often /usr/bin/g++, is the command that executes the GNU C++ Compiler. We shall be using the g++ command a lot in the sequel.

In practice, g++ is the same compiler driver as gcc with C++-specific defaults: it treats even *.c files as C++ source and links in the C++ standard library, libstdc++. In older versions of GCC it was a bash script that passed a certain set of command line arguments to gcc. Now it is a binary executable, but it still does essentially the same thing (as explained by andres9606).

cc, on Solaris, is the Sun C Compiler. It is not part of the GNU Compiler Collection but of Sun Studio. (On most Linux systems, cc is simply a link to gcc.)

CC, on Solaris, is the Sun C++ Compiler. It is likewise not part of the GNU Compiler Collection but of Sun Studio.

In order to understand static libraries, dynamic libraries and linking properly, we write a simple C++ program that uses a third-party library, log4cxx. The source file, proto-log4cxx.cpp, looks as follows:

#include <log4cxx/logger.h>
#include <log4cxx/basicconfigurator.h>

using namespace log4cxx;

LoggerPtr logger(Logger::getRootLogger()); // the root logger

int main(int argc, char* argv[])
{
    BasicConfigurator::configure(); // send log output to the console

    LOG4CXX_INFO(logger, "Hello World!");
}

Compiling

Let's try to compile this into an executable called proto-log4cxx (no file extension; the output file name is specified with -o):

$ g++ proto-log4cxx.cpp -o proto-log4cxx

We get error messages like the ones below:

proto-log4cxx.cpp:1:28: log4cxx/logger.h: No such file or directory
proto-log4cxx.cpp:2:39: log4cxx/basicconfigurator.h: No such file or directory
proto-log4cxx.cpp:4: namespace `log4cxx' undeclared
proto-log4cxx.cpp:6: `Logger' was not declared in this scope
proto-log4cxx.cpp:6: syntax error before `::' token
proto-log4cxx.cpp: In function `int main(int, char**)':
proto-log4cxx.cpp:10: `BasicConfigurator' undeclared (first use this function)
proto-log4cxx.cpp:10: (Each undeclared identifier is reported only once for
each function it appears in.)
proto-log4cxx.cpp:10: syntax error before `::' token
proto-log4cxx.cpp:12: `logger' undeclared (first use this function)
proto-log4cxx.cpp:12: `LOG4CXX_INFO' undeclared (first use this function)

Why? In addition to the library files themselves, we need access to the associated header files, which are included with

#include <log4cxx/logger.h>
#include <log4cxx/basicconfigurator.h>

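We have to tell the g++ compiler where they are located. They happen to be in the directory ~/dev/apache-log4cxx-0.10.0/src/main/include, so we try the following:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include

The header-related errors are gone, but the linker now reports undefined references to the log4cxx classes and functions that our program uses. This is because by default GCC attempts to link the program as well as compile it. In order to link it, it needs access to the library files used by the program so they can be linked in.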

We could tell it to skip linking. In this case it produces an object file, proto-log4cxx.o, which can then be linked separately, but no executable. The option -c (compile) tells it to do just that:

$ g++ -c proto-log4cxx.cpp -I ~/dev/apache-log4cxx-0.10.0/src/main/include

But we do want to link this program and produce an executable. So in addition to telling the compiler about the header files, we need to make it link in the libraries that the code is using.
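The role of -I can be pictured as a lookup over an ordered list of directories. The following is a minimal sketch under stated assumptions, not GCC's implementation: it assumes that for #include <...> the -I directories are tried first, in command-line order, followed by the system directories, and it ignores refinements such as -iquote and -isystem. The function findHeader, the simulated set of existing files and any directories passed to it are hypothetical.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of how the preprocessor resolves #include <name>:
// the -I directories are tried first, in command-line order, then the
// system directories. `existingFiles` stands in for the real file system.
std::optional<std::string> findHeader(const std::string& name,
                                      const std::vector<std::string>& includeDirs,
                                      const std::vector<std::string>& systemDirs,
                                      const std::set<std::string>& existingFiles) {
    std::vector<std::string> searchOrder = includeDirs;
    searchOrder.insert(searchOrder.end(), systemDirs.begin(), systemDirs.end());
    for (const auto& dir : searchOrder) {
        const std::string candidate = dir + "/" + name;
        if (existingFiles.count(candidate) != 0) {
            return candidate;  // the first match wins
        }
    }
    return std::nullopt;  // would be reported as "No such file or directory"
}
```

Called with an empty list of -I directories, findHeader("log4cxx/logger.h", ...) finds no match, which corresponds to the "No such file or directory" errors we saw; adding the log4cxx include directory to the list makes the same lookup succeed.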

Linking

Now we need to decide: shall we use static linking or dynamic linking? Let's compare the two.

The required code from libraries that are linked statically is physically copied into the executable at link time. This makes the executable bigger (because of all the library code included in it), but the executable is standalone: it does not need those libraries at runtime.

The code from libraries that are linked dynamically is not included in the executable, so the executable is smaller. However, whoever runs it has to make sure that the so-called shared objects, *.so, are available at runtime (these are similar to DLLs on Windows systems).

In general, static linking is much slower than dynamic linking; this refers to the time it takes to link, not to the speed of the resulting program. It may be a factor in big projects that take a long time to build.

It is possible to use a mixture of static and dynamic linking.

Static linking

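In order to use static linking, we need to tell the linker which library files to use:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs -L ~/dev/apr-util-1.3.9/.libs -L ~/dev/apr-util-1.3.9/xml/expat/lib/.libs -L ~/dev/apr-1.3.8/.libs -Wl,-Bstatic -llog4cxx -laprutil-1 -lexpat -lapr-1 -Wl,-Bdynamic -pthread

Apart from the g++ command itself, the command line consists of the following: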

proto-log4cxx.cpp — specify the source file. If our program were more complex, we could list several source files here separated by spaces.

-o proto-log4cxx — place the output in this file. If the compilation and linking succeed, an executable named proto-log4cxx will be created (no file extension).

-I ~/dev/apache-log4cxx-0.10.0/src/main/include — tell the preprocessor to also look for header files in this directory. Our header files are ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/logger.h and ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/basicconfigurator.h. We can refer to them simply as log4cxx/logger.h and log4cxx/basicconfigurator.h because this directory is now on the include search path.

-L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs — tell the linker to look for *.a libraries in ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs. In fact liblog4cxx.a is located in this directory. We shall later tell the linker to link this library in using the -l option.

-L ~/dev/apr-util-1.3.9/.libs — tell the linker to look for *.a libraries in ~/dev/apr-util-1.3.9/.libs. In fact libaprutil-1.a is located in this directory. We shall later tell the linker to link this library in using the -l option.

-L ~/dev/apr-util-1.3.9/xml/expat/lib/.libs — tell the linker to look for *.a libraries in ~/dev/apr-util-1.3.9/xml/expat/lib/.libs. In fact libexpat.a is located in this directory. We shall later tell the linker to link this library in using the -l option.

-L ~/dev/apr-1.3.8/.libs — tell the linker to look for *.a libraries in ~/dev/apr-1.3.8/.libs. In fact libapr-1.a is located in this directory. We shall later tell the linker to link this library in using the -l option.

-Wl,-Bstatic — -Wl, passes an option to the linker. -Bstatic tells the linker to link in the libraries that follow this option statically rather than dynamically.

-llog4cxx — tell the linker to link in the liblog4cxx.a library. Note that lib is assumed, so we specify -llog4cxx, not -lliblog4cxx (which wouldn't work!). The file extension is also assumed.

-laprutil-1 — tell the linker to link in the libaprutil-1.a library.

-lexpat — tell the linker to link in the libexpat.a library.

-lapr-1 — tell the linker to link in the libapr-1.a library.

-Wl,-Bdynamic — -Wl, passes an option to the linker. -Bdynamic tells the linker to link in the libraries that follow this option dynamically rather than statically. (In fact we won't list any, but it seems right to revert to this default.)

-pthread — this option is similar to -lpthread but does more. It links in the POSIX threads library for multithreading, and it also sets flags for the preprocessor (for example, it defines _REENTRANT) and for the linker. The POSIX threads library is required by some of the code in the other libraries that we are linking in.

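You may ask: does the order in which you specify the -L options matter? In our case, no. Each of them simply tells the linker "also consider this directory when looking for lib*.a files (if using static linking; or *.so files if using dynamic linking)". The directories are searched in the order given, so the order would only matter if a library with the same name existed in more than one of them.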
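The way the -L, -l and -Wl,-Bstatic/-Wl,-Bdynamic options interact can be modelled in a few lines. This is a simplified sketch under stated assumptions, following the behaviour documented for GNU ld: all -L directories apply to every -l option regardless of where they appear, and they are searched in order; in dynamic mode each directory is checked for lib<name>.so before lib<name>.a, while in static mode only lib<name>.a is accepted. The names resolveLibraries, findLibraryFile and Resolution are hypothetical, the file system is simulated by a set of paths, and the linker's default search directories are left out.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Resolution {
    std::string option;  // e.g. "-llog4cxx"
    std::string path;    // file the linker would use, or "" if none was found
};

// Look for lib<name>.so (dynamic mode only) and then lib<name>.a in each directory.
static std::string findLibraryFile(const std::string& libBase, bool staticOnly,
                                   const std::vector<std::string>& searchDirs,
                                   const std::set<std::string>& existingFiles) {
    for (const auto& dir : searchDirs) {
        if (!staticOnly && existingFiles.count(dir + "/" + libBase + ".so") != 0) {
            return dir + "/" + libBase + ".so";
        }
        if (existingFiles.count(dir + "/" + libBase + ".a") != 0) {
            return dir + "/" + libBase + ".a";
        }
    }
    return "";
}

// Simplified model of GNU ld's handling of -L, -l and -Bstatic/-Bdynamic
// (as passed through g++ with -Wl,). Default system directories are omitted.
std::vector<Resolution> resolveLibraries(const std::vector<std::string>& args,
                                         const std::set<std::string>& existingFiles) {
    // Pass 1: every -L directory applies to every -l option, wherever it appears.
    std::vector<std::string> searchDirs;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (args[i] == "-L" && i + 1 < args.size()) {
            searchDirs.push_back(args[++i]);          // "-L dir"
        } else if (args[i].size() > 2 && args[i].compare(0, 2, "-L") == 0) {
            searchDirs.push_back(args[i].substr(2));  // "-Ldir"
        }
    }
    // Pass 2: -Bstatic/-Bdynamic affect only the -l options that follow them.
    std::vector<Resolution> result;
    bool staticOnly = false;  // dynamic linking is the default
    for (const auto& arg : args) {
        if (arg == "-Wl,-Bstatic") {
            staticOnly = true;
        } else if (arg == "-Wl,-Bdynamic") {
            staticOnly = false;
        } else if (arg.size() > 2 && arg.compare(0, 2, "-l") == 0) {
            const std::string libBase = "lib" + arg.substr(2);  // the "lib" prefix is implied
            result.push_back({arg, findLibraryFile(libBase, staticOnly, searchDirs, existingFiles)});
        }
    }
    return result;
}
```

In this model, every -l option placed between -Wl,-Bstatic and -Wl,-Bdynamic resolves to a *.a file even when a *.so with the same name sits in the same directory, which is exactly what the static link described above relies on.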

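What about the order of the -l options? Here the answer is yes. Try rearranging them and the link fails with undefined references. The linker processes static libraries once, from left to right, and takes from each only the object files that define symbols still undefined at that point. A library must therefore come after every library that uses it, which is why liblog4cxx comes first here and the libraries it relies on come after it.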
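Why the order of static libraries matters can be shown with a small simulation. This minimal sketch models the rule GNU ld applies to static archives: archives are visited once, left to right, and a member object file is pulled in only if it defines a symbol that is undefined at that moment (members of the same archive may satisfy each other). The functions simulateStaticLink and exampleArchives are hypothetical, and so are the archive contents and symbol names, which merely mirror the dependency chain of our example.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One object file inside a static archive (*.a).
struct Member {
    std::string name;
    std::set<std::string> defines;  // symbols this object file provides
    std::set<std::string> needs;    // symbols it uses but does not define
};

struct Archive {
    std::string name;
    std::vector<Member> members;
};

// Each archive is visited once, in command-line order; a member is pulled in
// only if it defines a currently undefined symbol. Returns the symbols left
// undefined, i.e. those that would be reported as "undefined reference" errors.
std::set<std::string> simulateStaticLink(const std::set<std::string>& programNeeds,
                                         const std::vector<Archive>& archives) {
    std::set<std::string> defined;
    std::set<std::string> undefined = programNeeds;
    for (const auto& archive : archives) {
        std::vector<bool> pulled(archive.members.size(), false);
        bool progress = true;
        while (progress) {  // members of the same archive may need each other
            progress = false;
            for (std::size_t i = 0; i < archive.members.size(); ++i) {
                const Member& member = archive.members[i];
                if (pulled[i]) continue;
                bool useful = false;
                for (const auto& sym : member.defines) {
                    if (undefined.count(sym) != 0) useful = true;
                }
                if (!useful) continue;
                pulled[i] = true;
                progress = true;
                for (const auto& sym : member.defines) {
                    defined.insert(sym);
                    undefined.erase(sym);
                }
                for (const auto& sym : member.needs) {
                    if (defined.count(sym) == 0) undefined.insert(sym);
                }
            }
        }
    }
    return undefined;
}

// Hypothetical archives in the order used on our command line.
std::vector<Archive> exampleArchives() {
    return {
        {"liblog4cxx.a", {{"logger.o", {"log4cxx_symbol"}, {"aprutil_symbol", "apr_symbol"}}}},
        {"libaprutil-1.a", {{"xml.o", {"aprutil_symbol"}, {"expat_symbol", "apr_symbol"}}}},
        {"libexpat.a", {{"xmlparse.o", {"expat_symbol"}, {}}}},
        {"libapr-1.a", {{"pools.o", {"apr_symbol"}, {}}}},
    };
}
```

With a program that needs log4cxx_symbol, the archives in the order of exampleArchives() leave nothing undefined. Reversing them leaves aprutil_symbol and apr_symbol unresolved, because libaprutil-1.a and libapr-1.a were scanned before anything needed them.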

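Dynamic linking

Now let's link dynamically instead:

$ g++ proto-log4cxx.cpp -o proto-log4cxx -I ~/dev/apache-log4cxx-0.10.0/src/main/include -L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs -Wl,-Bdynamic -llog4cxx

The code from the shared objects we link against is not incorporated into the executable, but they have to be available at runtime, when we run our executable or when our end user runs it. Therefore these shared objects have to be distributed to the end user.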

This command line seems to be much shorter than the one we used for static linking. Let's go through it:

g++ — call g++.

proto-log4cxx.cpp — specify the source file. If our program were more complex, we could list several source files here separated by spaces.

-o proto-log4cxx — place the output in this file. If the compilation and linking succeed, an executable named proto-log4cxx will be created (no file extension).

-I ~/dev/apache-log4cxx-0.10.0/src/main/include — tell the preprocessor to also look for header files in this directory. Our header files are ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/logger.h and ~/dev/apache-log4cxx-0.10.0/src/main/include/log4cxx/basicconfigurator.h. We can refer to them simply as log4cxx/logger.h and log4cxx/basicconfigurator.h because this directory is now on the include search path.

-L ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs — tell the linker to look for *.so libraries (shared objects) in ~/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs. In fact liblog4cxx.so is located in this directory. We shall later tell the linker to link this library in using the -l option.

-Wl,-Bdynamic — -Wl, passes an option to the linker. -Bdynamic tells the linker to link in the libraries that follow this option dynamically rather than statically.

-llog4cxx — tell the linker to link to the liblog4cxx.so library. Note that lib is assumed, so we specify -llog4cxx, not -lliblog4cxx (which wouldn't work!). The file extension is also assumed.

The command line is so much shorter because we no longer need to bother about libaprutil-1, libexpat and libapr-1. Why? Our executable depends on liblog4cxx directly, and on libaprutil-1, libexpat and libapr-1 only indirectly (because liblog4cxx depends on them). When we were using static linking, the code of the three indirect dependencies had to be copied into our executable as well, so we had to tell the linker about them. With dynamic linking, the linker only needs liblog4cxx.so, to check that the functions we call exist and to record that our executable needs this library; its code is not incorporated into the executable. The dependencies of liblog4cxx are in turn recorded in liblog4cxx.so, not in our executable. This will become clearer shortly.
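The difference between direct and indirect dependencies can be expressed as a graph traversal. In the minimal sketch below, a static link needs the whole transitive closure of the program's libraries, ordered so that every library comes before the libraries it depends on, whereas a dynamic link only needs the direct dependencies. The names DependencyGraph, visit and staticLinkOrder are hypothetical, the exact dependency edges used with it are assumptions (the text only tells us that liblog4cxx depends on libaprutil-1, libexpat and libapr-1), and the graph is assumed to have no cycles.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// library -> libraries it depends on directly
using DependencyGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search: a library is appended only after all of its dependencies.
static void visit(const std::string& lib, const DependencyGraph& graph,
                  std::set<std::string>& seen, std::vector<std::string>& postOrder) {
    if (!seen.insert(lib).second) return;  // already visited
    const auto it = graph.find(lib);
    if (it != graph.end()) {
        for (const auto& dep : it->second) visit(dep, graph, seen, postOrder);
    }
    postOrder.push_back(lib);
}

// Static linking needs the direct libraries and everything they depend on,
// ordered so that each library precedes the libraries it depends on.
// (Dynamic linking would only need `direct` itself.)
std::vector<std::string> staticLinkOrder(const std::vector<std::string>& direct,
                                         const DependencyGraph& graph) {
    std::set<std::string> seen;
    std::vector<std::string> postOrder;
    for (const auto& lib : direct) visit(lib, graph, seen, postOrder);
    return std::vector<std::string>(postOrder.rbegin(), postOrder.rend());
}
```

With the assumed edges log4cxx → {aprutil-1, apr-1} and aprutil-1 → {expat, apr-1}, staticLinkOrder({"log4cxx"}, graph) yields log4cxx, aprutil-1, apr-1, expat. Under these edges that differs from our command line only in the relative position of libapr-1 and libexpat, which do not depend on each other, so both orders work. For dynamic linking the list is just log4cxx.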

This time the linking is almost instantaneous. The size of the resulting executable is only 11,527 bytes.

If we now try to run the executable, however, it fails to start. We have already mentioned that dynamically linked shared objects must be available at runtime, and it turns out that the system (or, more specifically, the so-called runtime linker, ld.so, which is responsible for loading the *.so shared objects and linking them in) does not know where to find liblog4cxx.so.10.

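Note that it is looking for liblog4cxx.so.10 rather than liblog4cxx.so. This is because liblog4cxx.so is really a symbolic link to liblog4cxx.so.10 (a technicality that we omitted to mention earlier), and liblog4cxx.so.10 is the library's so-called soname, which the linker recorded in our executable as the name to look for at runtime. This arrangement is common for shared object files.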
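The names involved follow a common convention: a real name carrying the full version, a soname carrying only the major version, and a linker name without any version, which is what -l finds. The sketch below derives the latter two from the first, assuming the usual lib<name>.so.<major>.<minor>... pattern; the functions linkerName and sonameFromRealName are hypothetical, and so is the real name liblog4cxx.so.10.0.0 used as an example. Keep in mind that this is only a convention: the authoritative soname is the one stored in the library itself (set when it is built, with the linker's -soname option).

```cpp
#include <string>

// Linker name: the unversioned name that -l<name> finds at link time,
// e.g. "liblog4cxx.so" for "liblog4cxx.so.10.0.0".
std::string linkerName(const std::string& realName) {
    const auto pos = realName.find(".so.");
    return pos == std::string::npos ? realName : realName.substr(0, pos + 3);
}

// Conventional soname: the linker name plus the major version only,
// e.g. "liblog4cxx.so.10". Assumes the usual naming pattern; the real
// value is whatever soname the library itself declares.
std::string sonameFromRealName(const std::string& realName) {
    const std::string linker = linkerName(realName);
    if (linker.size() == realName.size()) return realName;  // no version suffix
    const auto majorEnd = realName.find('.', linker.size() + 1);
    return realName.substr(0, majorEnd);  // npos keeps the whole string
}
```

For example, linkerName("liblog4cxx.so.10.0.0") gives "liblog4cxx.so" and sonameFromRealName("liblog4cxx.so.10.0.0") gives "liblog4cxx.so.10".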

We can examine the dynamically linked dependencies of proto-log4cxx using ldd, a utility that prints shared library dependencies:

$ ldd proto-log4cxx

It depends on a number of standard libraries, like libstdc++.so.5, which has been located as /usr/lib/libstdc++.so.5, and on liblog4cxx.so.10, which could not be found.
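When checking many executables, it can help to pick the missing libraries out of ldd's output programmatically. This is a minimal sketch, assuming ldd's usual line format (name => path (address), or name => not found); the function name missingLibraries is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Returns the shared objects that ldd reports as "not found", assuming lines
// of the form "<name> => <path> (<address>)" or "<name> => not found".
std::vector<std::string> missingLibraries(const std::string& lddOutput) {
    std::vector<std::string> missing;
    std::istringstream lines(lddOutput);
    std::string line;
    while (std::getline(lines, line)) {
        const auto arrow = line.find(" => ");
        if (arrow == std::string::npos) continue;  // lines without an arrow are skipped
        if (line.compare(arrow + 4, 9, "not found") != 0) continue;
        const auto start = line.find_first_not_of(" \t");
        if (start >= arrow) continue;  // no library name before the arrow
        missing.push_back(line.substr(start, arrow - start));
    }
    return missing;
}
```

Fed the ldd output for our proto-log4cxx, it would return a single entry, liblog4cxx.so.10.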

How can we fix this?

It turns out that there is more than one way. We could use an environment variable, LD_LIBRARY_PATH, which is very similar to PATH, only it specifies the paths to shared object files rather than executables. Let's add /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs to LD_LIBRARY_PATH:

The answer to our question is now clear. The path to the shared object files in question has been incorporated into the shared object file liblog4cxx.so.10 by the linker; it appears on the RPATH in the so-called dynamic segment of the file.

RPATH

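Let us perform an experiment. Let us restore LD_LIBRARY_PATH (remove /home/paul/dev/apache-log4cxx-0.10.0/src/main/cpp/.libs, which we have added), or even make it blank, so that the runtime linker can no longer find liblog4cxx.so.10 that way. Instead, we relink the executable with the following additional options: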

-Xlinker -z -Xlinker origin — -Xlinker is similar to -Wl in that it passes an option to the linker. Unlike -Wl, it takes exactly one argument, so it has to be repeated for each argument, once for the option and once for its value, as in this case. So we are really passing -z origin to the linker. This tells the linker to mark the executable as requiring $ORIGIN processing at runtime. Thus we have enabled the use of $ORIGIN, which we use in the next option.

-Xlinker -rpath -Xlinker '$ORIGIN:$ORIGIN/lib/third-party' — passes the option -rpath '$ORIGIN:$ORIGIN/lib/third-party' to the linker. This sets the RPATH value to $ORIGIN:$ORIGIN/lib/third-party. The single quotes keep the shell from treating $ORIGIN as a shell variable, so the literal string reaches the linker. At runtime, $ORIGIN is replaced with the path to the directory that contains the executable.

Thus we have set the RPATH to search for shared object files in the executable's directory as well as in lib/third-party under the executable's directory. (We are not going to use lib/third-party; we have added it for the purposes of illustration.)
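The substitution performed at runtime can be sketched as follows. This is a simplification, not the runtime linker's actual code: it replaces $ORIGIN and ${ORIGIN} with the executable's directory and splits the RPATH at colons, while ignoring other substitution tokens (such as $LIB and $PLATFORM) and the restrictions that apply to setuid programs. The function name expandRpath and the directory /home/paul/app are hypothetical.

```cpp
#include <string>
#include <vector>

// Simplified model of $ORIGIN handling: replace $ORIGIN / ${ORIGIN} with the
// directory containing the executable and split the RPATH at ':'.
// Other tokens and setuid restrictions are ignored; empty entries are skipped.
std::vector<std::string> expandRpath(const std::string& rpath, const std::string& exeDir) {
    const char* const tokens[] = {"${ORIGIN}", "$ORIGIN"};
    std::vector<std::string> dirs;
    std::string::size_type begin = 0;
    while (begin <= rpath.size()) {
        auto end = rpath.find(':', begin);
        if (end == std::string::npos) end = rpath.size();
        std::string entry = rpath.substr(begin, end - begin);
        for (const std::string token : tokens) {
            for (auto pos = entry.find(token); pos != std::string::npos;
                 pos = entry.find(token, pos + exeDir.size())) {
                entry.replace(pos, token.size(), exeDir);
            }
        }
        if (!entry.empty()) dirs.push_back(entry);
        begin = end + 1;
    }
    return dirs;
}
```

For an executable in /home/paul/app, expandRpath("$ORIGIN:$ORIGIN/lib/third-party", "/home/paul/app") yields /home/paul/app and /home/paul/app/lib/third-party, the two directories that would be searched for liblog4cxx.so.10.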

Let's check that the resulting executable does indeed have its RPATH properly set:

ldconfig

Finally, there is yet another way to make your shared objects found at runtime. The runtime linker, ld.so, which is invoked when you launch the executable, consults the cache (/etc/ld.so.cache) and the links created by ldconfig. ldconfig builds them from the most recent shared libraries found in the directories listed in the file /etc/ld.so.conf and in the trusted directories /lib and /usr/lib.

Thus you could move the shared object to /usr/lib and run ldconfig to make sure that it gets picked up, then run the executable.

Alternatively, you could add the directory that contains the shared object to /etc/ld.so.conf and then run ldconfig; it should still be picked up.
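The format of /etc/ld.so.conf is simple: one directory per line, # starting a comment, and (on many distributions) include lines that pull in further files such as /etc/ld.so.conf.d/*.conf. The sketch below parses that simplified format; the names LdSoConf and parseLdSoConf are hypothetical, include patterns are collected but not expanded, and older syntax variants accepted by ldconfig are ignored.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct LdSoConf {
    std::vector<std::string> directories;      // directories ldconfig would scan
    std::vector<std::string> includePatterns;  // e.g. /etc/ld.so.conf.d/*.conf (not expanded)
};

// Simplified parser: '#' starts a comment, "include <pattern>" names more
// configuration files, and any other non-empty line is a directory.
LdSoConf parseLdSoConf(const std::string& text) {
    LdSoConf conf;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = line.substr(0, line.find('#'));  // drop comments
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;  // blank line
        const auto last = line.find_last_not_of(" \t\r");
        line = line.substr(first, last - first + 1);
        if (line.compare(0, 7, "include") == 0 && line.size() > 7 &&
            (line[7] == ' ' || line[7] == '\t')) {
            conf.includePatterns.push_back(line.substr(line.find_first_not_of(" \t", 7)));
        } else {
            conf.directories.push_back(line);
        }
    }
    return conf;
}
```

Editing the file alone has no effect until ldconfig is run: the runtime linker reads the cache that ldconfig generates, not /etc/ld.so.conf itself.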

It is instructive to run

$ ldconfig -v

which prints the current version number, the name of each directory as it is scanned, and any links that are created.

Which method you use to specify the location of your shared objects at runtime depends on the configuration of your application, various administrative and infrastructure considerations, etc.

We (subjectively) find the RPATH approach the most intuitive.
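To summarise how the methods discussed above interact, here is a minimal model of the order in which the runtime linker searches for a dependency, as described in the ld.so(8) manual page: RPATH (only when the file has no RUNPATH), then LD_LIBRARY_PATH, then RUNPATH, then the ldconfig cache, then the default directories. It is a simplification under stated assumptions: it omits secure-execution mode, the RPATHs of the objects that loaded the library, and platform-specific default directories such as /lib64. The names SearchContext and findSharedObject are hypothetical.

```cpp
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

// What the runtime linker knows when it looks for one dependency (a soname).
struct SearchContext {
    std::vector<std::string> rpath;            // DT_RPATH, $ORIGIN already expanded
    std::vector<std::string> runpath;          // DT_RUNPATH, empty if there is none
    std::vector<std::string> ldLibraryPath;    // LD_LIBRARY_PATH split at ':'
    std::map<std::string, std::string> cache;  // soname -> path, as built by ldconfig
    std::vector<std::string> defaultDirs{"/lib", "/usr/lib"};
};

// Simplified search order after ld.so(8): RPATH (only if there is no RUNPATH),
// LD_LIBRARY_PATH, RUNPATH, the ldconfig cache, then the default directories.
std::optional<std::string> findSharedObject(
        const std::string& soname, const SearchContext& ctx,
        const std::function<bool(const std::string&)>& exists) {
    auto searchIn = [&](const std::vector<std::string>& dirs) -> std::optional<std::string> {
        for (const auto& dir : dirs) {
            if (exists(dir + "/" + soname)) return dir + "/" + soname;
        }
        return std::nullopt;
    };
    if (ctx.runpath.empty()) {
        if (auto hit = searchIn(ctx.rpath)) return hit;
    }
    if (auto hit = searchIn(ctx.ldLibraryPath)) return hit;
    if (auto hit = searchIn(ctx.runpath)) return hit;
    if (const auto it = ctx.cache.find(soname); it != ctx.cache.end()) return it->second;
    return searchIn(ctx.defaultDirs);
}
```

Applied to our experiments: with nothing configured, liblog4cxx.so.10 is not found; with LD_LIBRARY_PATH pointing at the .libs directory it is found there; and with an RPATH of $ORIGIN and a copy of the library in the executable's directory, the RPATH entry is tried first. One consequence of this order is that an RPATH takes precedence over LD_LIBRARY_PATH, whereas a RUNPATH does not.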

GCC files

So far we have encountered two important file extensions: *.a and *.so. Let us review these and other file extensions on Linux that one should be aware of:

*.o: an object file. According to Wikipedia, "an object file is an organised collection of named objects, and typically these objects are sequences of computer instructions in a machine code format, which may be directly executed by a computer's CPU. Object files are typically produced by a compiler as a result of processing a source code file. Object files contain compact code, and are often called binaries. A linker is typically used to generate an executable or library by amalgamating parts of object files together".

*.a: static library. In practice, these are merely archives (created using the ar command) of object files (*.o). Linking one in statically is similar to linking in the object files contained in it individually, except that the linker only takes the object files it needs to resolve undefined symbols.

*.la: GNU libtool library file. GNU libtool is a generic library support script that aims to hide the complexity of using shared libraries behind a consistent interface. The *.la files contain the information libtool needs to ease the linking process during compilation: the library names, their location and their dependent libraries. We do not find these files particularly useful and won't discuss them further.

*.so: the so-called shared objects, or shared object files. These are meant to be linked in dynamically at runtime by the dynamic linker/loader, ld.so. Unlike regular *.o files, they contain a dynamic segment with information used by the dynamic linker/loader.
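Since *.a files are plain ar archives, their member list is easy to read. This minimal sketch parses the common System V/GNU ar layout held in memory: the 8-byte !<arch> signature, followed for each member by a 60-byte text header (name, date, uid, gid, mode, size and a 2-byte terminator) and the member data padded to an even length. It skips the GNU symbol table and resolves GNU long names, but does not support the BSD variant; the function name listArchiveMembers is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Lists the members of a System V / GNU ar archive held in memory.
// Skips the symbol table ("/" or "/SYM64/"), resolves GNU long names ("/123"
// refers to an offset in the "//" member); BSD "#1/" names are not supported.
std::vector<std::string> listArchiveMembers(const std::string& data) {
    const std::string magic = "!<arch>\n";
    if (data.compare(0, magic.size(), magic) != 0) {
        throw std::runtime_error("not an ar archive");
    }
    std::vector<std::string> names;
    std::string longNames;  // contents of the "//" member, if any
    std::size_t pos = magic.size();
    while (pos + 60 <= data.size()) {
        const std::string header = data.substr(pos, 60);
        if (header.compare(58, 2, "`\n") != 0) throw std::runtime_error("bad member header");
        std::string name = header.substr(0, 16);
        name.erase(name.find_last_not_of(' ') + 1);  // names are space-padded
        const std::size_t size = std::stoul(header.substr(48, 10));
        const std::size_t bodyStart = pos + 60;
        if (bodyStart + size > data.size()) throw std::runtime_error("truncated member");
        if (name == "//") {
            longNames = data.substr(bodyStart, size);
        } else if (name != "/" && name != "/SYM64/") {
            if (name.size() > 1 && name[0] == '/') {  // offset into the long-name table
                const std::size_t offset = std::stoul(name.substr(1));
                name = longNames.substr(offset, longNames.find("/\n", offset) - offset);
            } else if (!name.empty() && name.back() == '/') {
                name.pop_back();  // GNU terminates short names with '/'
            }
            names.push_back(name);
        }
        pos = bodyStart + size + (size % 2);  // member data is padded to an even offset
    }
    return names;
}
```

Reading liblog4cxx.a into a std::string (opened in binary mode) and passing it to listArchiveMembers should give the same names as ar t liblog4cxx.a.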

The executable files (usually without extensions), as well as *.o and *.so (sometimes the extension *.elf is used, although it is rarely seen nowadays) share the same file format: the Executable and Linking Format (ELF), formerly called Extensible Linking Format. These files can be examined using the readelf utility: