I edited this question to be more about what linkers and loaders are than about other places to learn about them, since this site is the right place for that sort of thing. If there are good resources out there, let them emerge naturally through answers instead of just trying to build a list of external resources.
–
Anna Lear♦ Aug 25 '11 at 19:20

Not sure why I got a downvote -- I thought this question might help other newbies who want to know this, along with getting me some guidance. Is this a non-programming question?
–
Nishant Aug 25 '11 at 19:38

I could see the down votes given this question is rather broad and doesn't show a lot of work done to figure out the answer on one's own.
–
JB King Aug 25 '11 at 20:40

5 Answers

The exact relationship varies somewhat. To start with, I'll consider (nearly) the simplest possible model, used by something like MS-DOS, where an executable will always be statically linked. For the sake of example, let's consider the canonical "Hello, World!" program, which we'll assume is written in C.

The compiler will compile this into a couple of pieces. It'll take the string literal "Hello, World!", and put it into one section marked as constant data, and it'll synthesize a name for that particular string (e.g., "$L1"). It'll compile the call to printf into another section that's marked as code. In this case, it'll say the name is main (or, frequently, _main). It'll also have something to say this chunk of code is N bytes long, and (importantly) contains a call to printf at offset M in that code.

Once the compiler is done producing that, the linker will run. It's normally considered part of the development tool chain (though there are exceptions -- MS-DOS used to include a linker, though it was rarely if ever used). Although this isn't normally visible to you, the linker will be passed some command-line arguments: one specifying an object file containing some startup code, and another specifying whatever file contains the C standard library.

The linker will then look at the object file containing the startup code and find that it is, say, 1112 bytes long, and has a call to _main at offset 784 in that.

Based on that, it'll start to build a symbol table. It'll have one entry saying ".startup" (or whatever name) is 1112 bytes long, and (so far) nothing refers to that name. It'll have another entry saying "printf" is a current unknown length, but it's referred to from ".startup+784".

It'll then scan through the specified library (or libraries) to try to find definitions of the names in the symbol table that aren't currently defined -- in this case printf. It'll find the object file for printf saying that it's 4087 bytes long, and has references to other routines to do things like converting an int to a string, as well as things like putchar (or maybe fputc) to write the resulting string to the output file.

The linker will re-scan to try to find definitions of those symbols, recursively, until it reaches one of two conclusions: it's either found definitions of all the symbols, or else there's a symbol for which it can't find a definition.

If it's found a reference but no definition, it'll stop and give an error message typically saying something about an "undefined external XXX", and it'll be up to you to figure out what other library or object file you need to link.

If it finds definitions of all the symbols, it moves on to the next phase: it walks through the list of places that refer to each symbol, and it'll fill in the address where that symbol got put into memory, so (for example) where the startup code calls main, it'll fill in the address 1112 as the address of main. Once it's done all that, it'll write all the code and data out to an executable file.

There are a few other minor details that probably bear mentioning: it'll typically keep the code and data separate, and after each is complete, it'll put them all together at (more or less) consecutive addresses (e.g., all the pieces of code, then all the pieces of data). There will typically also be some rules about how to combine definitions for sections/segments -- for example, if different object files all have code segments, it'll just arrange the pieces of code one after another. If two or more identical string literals (or other constants) are defined, it'll typically merge those together so all of them refer to the same place. There are also a few rules for what to do when/if it finds duplicate definitions of the same symbol. In a typical case, this is simply an error. In a few cases, it'll have things like "weak external" symbols, which basically say: "I'm providing a definition of this symbol, but if somebody else also defines it, don't consider it an error -- just use that definition instead of this one."

Once it has entries for all the symbols, the linker has to arrange the "pieces" and assign addresses to them. The order in which it arranges the pieces will vary somewhat -- it'll typically have some flags about the types of different pieces, so (for example) all the constant data ends up next to each other, all the pieces of code next to each other and so on. In our simple MS-DOS-like system, most of this won't matter a whole lot though.

That brings us to the next phase: the loader. The loader is typically part of the operating system, and it loads the executable. It'll pick a base address for the executable, and based on the entries put there by the linker, it'll "fix up" any absolute references in the executable to refer to the correct address. For example, where our startup code referred to main at address 1112, if the executable is being loaded at a base address of (say) 4000, it'll fix that address up to refer to 5112. In a system this simple, however, the loader is a pretty simple piece of code -- it basically just walks through the list of relocations and adds the base address to each.

Now let's consider a bit more modern OS that supports something like shared object files or DLLs. This basically shifts some of the work from the linker to the loader. In particular, for a symbol that's defined in a .so/DLL, the linker will not attempt to assign an address itself.

Instead it'll create a symbol table entry that basically says "defined in .so/DLL file XXX". When the linker writes the executable, most of these symbol table entries will basically just get copied to the executable, saying "symbol XXX is defined in file YYY". It's then up to the loader to find file YYY, find the address of symbol XXX in that file, and fill in the correct address wherever it's used in the executable. Much like in the linker, this will be recursive, so DLL A may refer to symbols in DLL B, which may refer to DLL C, and so on. Although the chain from executable to all the definitions may be long, the basic idea of the process is fairly simple -- scan through the list of external references, and find a definition for each. Also note that in most cases the OS will be able to share a single loaded module across many processes, so it will normally keep a list of loaded modules, and when/if it gets to a module that's already loaded, it'll just fill in entries pointing at that copy rather than re-loading it from the beginning.

Again, there are some miscellaneous bits and pieces to consider. For example, the sharing will normally only happen on a section-by-section basis, not file-by-file. If a file has some code and some (non-constant) data, for example, all processes will share the same code sections, but each will get its own copy of the data.

Linkers are a part of compiler theory. When you compile a project made up of more than one module (source code file), it's common for the compiler to output a single intermediary file for each source module. This has several benefits, one of which is that if you only make changes to one file and then have to recompile, you don't have to rebuild the entire project when you've only made one local change.

But this means that if you have code in one module that calls a function in a different module, the compiler can't generate a CALL instruction to it, because it doesn't have the location of that other function. It's in a different intermediary file, and the exact location of the function can change if you make a local change to that intermediary's source file and recompile it. So instead, it inserts an "external reference token" (exactly what that is or what it looks like doesn't matter, just think of it as an abstract concept) that says "I need this function whose exact address I don't know at the moment."

Once everything has been compiled into intermediary files, the linker is what finishes the job. It goes through all the intermediary files and links them together into a final binary. Since it's putting things together, it does know the actual addresses of all the functions, and so it can replace the external reference tokens with actual CALL instructions to the correct locations in the binary.

The loader, on the other hand, belongs to the operating system, not the compiler. Its job is to load the binary into memory so it can execute, and to finish up the linking process, since the linker can only resolve code it knows about. If your program is using any DLLs, they are external even to the compiled binary, so the linker doesn't know their address. It leaves external reference tokens in the final binary in a format that the OS's loader knows about, and then the loader goes through and matches these tokens to the actual function addresses in the DLLs once everything has been loaded into memory.

To find out more about linkers, I think they'll generally be discussed in combination with compilers. They are for knitting your various modules together into a cohesive unit, finalizing addresses within that code. Some may even try to perform optimizations.

To find out more about loaders, I think they'll generally be discussed in combination with writing compilers for particular architectures unless you mean loader as a synonym for linker. I'm thinking of the loader as the part of the executable file header that tells the operating system how to open and execute your compiled software.

I agree that reading the Wikipedia articles will probably impart more information than you're looking for. As to where they come into development ... generally they are beyond the control of the project, and are part of the selection of the operating system and the development package you choose to use. It's very rare that you would use (for example) MSVC but want to run a GCC-based linker ... it might not even be possible. The ONLY place I've ever used a non-standard linker was at IBM when we were using development copies.

If you have more particular, specific questions about these topics, I think you'll find a much better response.

Computers basically work with binary numbers, while people speak their native languages, so programming languages exist for communication between people and computers. If you say "Add 2 and 3 and then subtract 1 from it," I doubt the computer would understand anything (though in some programming language it might). So you need to translate your source code into a format the computer understands: that's what a compiler is for, and it translates a programming language into so-called object code.
But object code is not yet the language a computer understands and executes directly. So you need a linker, which produces an executable file containing instructions in so-called machine language; machine language is the set of operations, encoded as binary numbers, that the processor understands. Every binary instruction has its own structure, which is published by the processor's manufacturer; you can look it up on, say, Intel's site to see what the instructions look like.
I can't give a satisfactory answer for loaders at the moment, so please search Google as a beginning step.

I haven't read this book yet, but it covers your area of interest. If I had to pick an area, I'd say compilers, but that doesn't mean an understanding of computer architecture or operating systems isn't necessary; it's just needed to a lesser degree. I recommend playing around with the GNU tools like gdb, as, and ld to get a better understanding of what's going on.