Next, create the following func.c program. In main.c we declared the function func() with the ‘extern’ keyword; its definition lives in a separate file, func.c.

$ vi func.c
#include <stdio.h>

void func(void)
{
    printf("\n Inside func()\n");
}

Create the object file for func.c as shown below. This will create the file func.o in the current directory.

$ gcc -c func.c

Similarly create the object file for main.c as shown below. This will create the file main.o in the current directory.

$ gcc -c main.c

Now execute the following command to link these two object files to produce a final executable. This will create the file ‘main’ in the current directory.

$ gcc func.o main.o -o main

When you execute this ‘main’ program you’ll see the following output.

$ ./main
Inside main()
Inside func()

From the above output, it is clear that we were able to link the two object files successfully into a final executable.

What did we achieve when we separated function func() from main.c and wrote it in func.c?

The answer is that here it would not have mattered much if we had written func() in the same file. But think of very large programs with thousands of lines of code: if everything lived in one file, a change to a single line would force recompilation of the whole source, which is not acceptable in most cases. So very large programs are divided into small pieces, which are finally linked together to produce the executable.

The make utility, which works on makefiles, comes into play in most of these situations because it can tell which source files have changed and therefore which object files need to be recompiled. Object files whose corresponding source files have not been altered are linked as they are. This makes the compilation process much easier to manage.
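The core of that decision can be sketched in a few lines of C. This is only an illustration of the timestamp comparison that make is known to rely on, not make's actual implementation (which also tracks a full dependency graph); the helper name needs_rebuild() is made up for this example.

```c
#include <assert.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of make): decide whether `target` is out of
 * date with respect to `source` by comparing modification times.
 * Returns 1 if a rebuild is needed, 0 if the target is up to date,
 * and -1 if the source itself cannot be found.
 */
int needs_rebuild(const char *source, const char *target)
{
    struct stat src, tgt;

    assert(source && target);

    if (stat(source, &src) != 0)
        return -1;                  /* nothing to build from */
    if (stat(target, &tgt) != 0)
        return 1;                   /* target missing: must be built */

    /* Rebuild only when the source is newer than the target. */
    return src.st_mtime > tgt.st_mtime ? 1 : 0;
}
```

With the files from this article, a call such as needs_rebuild("func.c", "func.o") would report whether func.o has to be regenerated before linking.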

So, now we understand that when we link the two object files func.o and main.o, the gcc linker is able to resolve the function call to func() and when the final executable main is executed, we see the printf() inside the function func() being executed.

Where did the linker find the definition of the function printf()? Since the linker did not report any error, it must have found the definition of printf(). printf() is declared in stdio.h and defined as part of the standard ‘C’ shared library (libc.so).
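To see that the header only supplies the declaration, here is a small sketch that declares printf() by hand instead of including stdio.h. The helper say_hello() is a made-up name for this example. The code still compiles and links, because the definition comes from the C library at link time, not from the header.

```c
#include <assert.h>

/* Hand-written declaration standing in for #include <stdio.h>.
 * It tells the compiler the function's type; it does not define it. */
extern int printf(const char *format, ...);

/* Hypothetical helper for this sketch: prints one line and returns
 * the number of characters printf() reported as written. */
int say_hello(void)
{
    int written = printf("Hello from libc\n");

    assert(written >= 0);   /* printf returns a negative value on failure */
    return written;
}
```

In real programs you should still include stdio.h; the hand-written declaration is used here only to make the split between declaration and definition visible.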

We did not explicitly link this shared object file to our program, so how did this work? Use the ldd tool to find out; it prints the shared libraries required by each program or shared library specified on the command line.

Execute ldd on the ‘main’ executable, which will display the following output.

The above output indicates that the main executable depends on three libraries. The second line in the output is ‘libc.so.6’ (the standard ‘C’ library). This is how the gcc linker is able to resolve the function call to printf().

The first library is required for making system calls while the third shared library is the one which loads all the other shared libraries required by the executable. This library will be present for every executable which depends on any other shared libraries for its execution.
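A related view is available from inside a running process. The sketch below is not how ldd works; it is a different technique that reads /proc/self/maps, so it assumes a Linux system with /proc mounted. It prints the shared objects that are actually mapped into the current process. The helper name list_mapped_shared_objects() is made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints each distinct shared object mapped into this
 * process and returns how many were found, or -1 if /proc is unavailable. */
int list_mapped_shared_objects(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[4096], last[4096] = "";
    int count = 0;

    if (maps == NULL)
        return -1;

    while (fgets(line, sizeof line, maps) != NULL) {
        char *path = strchr(line, '/');     /* pathname column, if present */

        if (path == NULL || strstr(path, ".so") == NULL)
            continue;                       /* not a shared object mapping */
        path[strcspn(path, "\n")] = '\0';
        assert(strlen(path) < sizeof last);

        /* A library has several mappings; they usually appear together,
         * so comparing with the previous path avoids most duplicates. */
        if (strcmp(path, last) != 0) {
            puts(path);
            strcpy(last, path);
            count++;
        }
    }
    fclose(maps);
    return count;
}
```

Entries without a file path, such as the kernel-provided [vdso] mapping, are skipped by this sketch, so its list can be shorter than what ldd reports.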

During linking, the command that gcc uses internally is very long, but from the user’s perspective we just have to write:

$ gcc <object files> -o <output file name>

CODE RELOCATION

Relocations are entries within a binary that are left to be filled in at link time or run time. A typical relocation entry says: find the value of ‘z’ and put that value into the final executable at offset ‘x’.

Create the following reloc.c for this example.

$ vi reloc.c
extern void func(void);

void func1(void)
{
    func();
}

In the above reloc.c we declared a function func() whose definition is still not provided, but we are calling that function in func1().

Create an object file reloc.o from reloc.c as shown below.

$ gcc -c reloc.c -o reloc.o

Use readelf utility to see the relocations in this object file as shown below.

The address of func() is not known at the time we create reloc.o, so the compiler leaves a relocation of type R_X86_64_PC32. This relocation effectively says: “fill in the address of the function func() in the final executable at offset 000000000005”.
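What the linker later does with such an entry can be sketched as follows. For R_X86_64_PC32 the x86-64 ABI defines the stored value as S + A - P (symbol address, plus addend, minus the address of the field being patched). The struct layout and the function name apply_pc32() below are simplified assumptions for illustration, not the real ELF data structures or linker code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record; the real ELF structure
 * (Elf64_Rela) also encodes the symbol index and relocation type. */
struct reloc {
    size_t  offset;   /* where in the section to patch */
    int64_t addend;   /* constant added to the symbol address */
};

/*
 * Patch a 32-bit PC-relative field, following the R_X86_64_PC32 rule:
 *     value = S + A - P
 * S = symbol address, A = addend, P = address of the field being patched.
 * Returns the value that was written.
 */
int32_t apply_pc32(uint8_t *section, size_t section_len, uint64_t section_addr,
                   const struct reloc *r, uint64_t symbol_addr)
{
    uint64_t place;
    int32_t value;

    assert(r->offset + sizeof value <= section_len);   /* stay in bounds */

    place = section_addr + r->offset;                               /* P */
    value = (int32_t)(symbol_addr + (uint64_t)r->addend - place);
    memcpy(section + r->offset, &value, sizeof value);  /* host byte order */
    return value;
}
```

As a made-up example, take a section loaded at address 0x1000 whose call opcode sits at offset 4, so the field to patch is at offset 5, with an addend of -4 and func() placed at 0x2000. The value written is 0x2000 - 4 - 0x1005 = 0xff7. Added to the address of the next instruction (0x1009), that gives 0x2000, the address of func().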

The above relocation corresponds to the .text section in the object file reloc.o (again, one needs to understand the structure of ELF files to understand the various sections), so let’s disassemble the .text section using the objdump utility:

In the 4th line, we can clearly see that the empty address bytes that we saw earlier are now filled with the address of function func().

To conclude, linking with gcc is such a vast sea to dive into that it cannot be covered in one article. Still, this article has attempted to peel off the first layer of the linking process, to give you an idea of what happens beneath the gcc command that promises to link different object files into an executable.


About The Geek Stuff

My name is Ramesh Natarajan. I will be posting instruction guides, how-to, troubleshooting tips and tricks on Linux, database, hardware, security and web. My focus is to write articles that will either teach you or help you resolve a problem. Read more about Ramesh Natarajan and the blog.