The discussion about "smart" linking versus "dumb" linking reminded me of a
project of ours where dumb linking created problems.

Our product software was implemented on DEC's VMS operating system. Because
of the way it handled process context switching, VMS performed better with a
small number of large programs than with the larger number of small programs
typical of Unix. The links that created our programs pulled in anywhere from
50 to 300 modules each.

We noticed that many of the programs contained common sub-trees of linked-in
modules even though the functions of those modules were irrelevant to the
overall function of many of the programs. Some investigation revealed what
was happening.

The linker would start at the main module of a program and pull in the tree
of modules it needed. Somewhere in this tree, it would pull in a module
because routine A of that module was used by the program. However, the
designer of the module had also packaged routine B in it. Routine B was not
used by most programs, but its references to other modules still had to be
resolved. The linker happily pulled in another sub-tree of 25 to 50 modules
even though most programs never used it. The end result was that about 25%
of the object code linked into a program was not even used by the program.
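The effect can be sketched with a toy model. Everything below is invented for
illustration (the module names, the routines, and the call graph); the real
linker worked on VMS object modules, but the transitive pull-in logic is the
same.

```python
# Toy model of module-granularity linking: the linker pulls in whole
# modules, so routine B's dependencies come along even though the
# program only ever calls routine A.  All names here are made up.

# Which routines live in which module (the packaging decision).
module_of = {
    "main": "prog", "A": "util", "B": "util",
    "fmt": "report1", "page": "report2",
}

# Direct calls made by each routine.
calls = {
    "main": ["A"],          # the program only ever calls A
    "A": [],
    "B": ["fmt", "page"],   # B drags in the reporting sub-tree
    "fmt": [], "page": [],
}

def link(entry):
    """Return the set of modules a module-granularity linker pulls in."""
    linked, todo = set(), [module_of[entry]]
    while todo:
        mod = todo.pop()
        if mod in linked:
            continue
        linked.add(mod)
        # Taking this module means taking *every* routine in it...
        for routine, m in module_of.items():
            if m != mod:
                continue
            # ...which forces in the modules those routines reference.
            for callee in calls[routine]:
                todo.append(module_of[callee])
    return linked

# report1 and report2 get linked although nothing reachable from
# main ever calls into them -- they came along for the ride with B.
print(sorted(link("main")))
```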

It could be argued that the designer should have anticipated this behaviour
and packaged the routines differently. This would have been difficult. The team
of 30 designers did not have tools that showed how function calls traversed
modules. A designer usually only understood the calling behaviour for a few
modules around the module she/he was responsible for. The function call
hierarchy of the original design was useful, but after several versions of
evolution the software had left the design behind.

One of the recent postings suggested that object/archive libraries can be used
to solve the problem. Each routine is compiled separately and dropped into
the library. No unnecessary routines get linked in. Unfortunately, this
solution would not have worked for us.

The size of our product had already forced us to stop using the VMS library
mechanism because of its poor performance. The product consisted of about 900
modules, of which the largest program used only 300; most programs used about
50. If the symbol tables for the individual modules in a library are hashed
or otherwise indexed, the linker can determine very quickly whether a given
symbol can be resolved by a module in the library. I don't know whether this
is a common technique or not. Regardless, the killer for us was the sheer
number of modules and symbols to be resolved. The performance of the library
mechanism suggested that there was no library-wide indexing scheme, so the
linker had to search linearly through the symbol tables of 900 modules for
each symbol. As the number of modules in the library and the size of a
program (the number of symbols to be resolved) both grew, link time grew
roughly as the product of the two. It eventually became faster to toss the
library and create dependency analysis tools to generate link commands that
explicitly mention each module to be linked in.
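The indexing idea can be sketched in a few lines. This is a toy model, not
VMS's actual mechanism, and the 900-module library below is fabricated: the
point is that a linear scan pays for every module on every lookup, while a
library-wide hash index answers each lookup in roughly constant time after a
single pass over the library.

```python
# Fabricated library: 900 modules of 5 symbols each, mirroring the
# scale described above (real symbol tables are not this uniform).
modules = {f"mod{i:03d}": {f"sym{i:03d}_{j}" for j in range(5)}
           for i in range(900)}

def resolve_linear(symbol):
    """Scan every module's symbol table until one defines the symbol."""
    for name, table in modules.items():
        if symbol in table:
            return name
    return None

# Library-wide index built once: symbol -> defining module.
index = {sym: name for name, table in modules.items() for sym in table}

def resolve_indexed(symbol):
    """One hash lookup, independent of the number of modules."""
    return index.get(symbol)

print(resolve_linear("sym450_2"), resolve_indexed("sym450_2"))
```

With S symbols to resolve against M modules, the scan costs on the order of
M x S table probes per link, which is what makes link time grow as the
product of the two sizes.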

Splitting up the object modules amongst several libraries was not feasible,
because any conceivable after-the-fact organization would still have required
most libraries to be included in each link command.

These dependency analysis tools could handle the 900 modules we had to deal
with, but a separate object module per routine would have expanded the number
of object modules to between 2500 and 3000. This would have been a bit much
to manage.
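What such a dependency-analysis tool does can be sketched as follows. The
module names, the dependency map, and the LINK command format shown are all
illustrative, not our actual tool: walk the module-level dependency map, take
the transitive closure from the main module, and emit a link command that
names every required module explicitly.

```python
# Hypothetical sketch of the dependency-analysis approach: compute the
# set of modules a program actually needs and emit an explicit link
# command, bypassing the library search entirely.  Names are invented.

deps = {                      # module -> modules it references
    "mainprog": ["screen", "db"],
    "screen": ["fmt"],
    "db": ["fmt", "io"],
    "fmt": [], "io": [],
}

def closure(root):
    """All modules reachable from root, root included."""
    seen, todo = set(), [root]
    while todo:
        m = todo.pop()
        if m not in seen:
            seen.add(m)
            todo.extend(deps[m])
    return seen

def link_command(root):
    """Emit a VMS-style LINK command listing each module explicitly."""
    objs = ",".join(sorted(closure(root)))
    return f"$ LINK/EXECUTABLE={root}.EXE {objs}"

print(link_command("mainprog"))
```

Regenerating these commands whenever the dependency map changed was the price
we paid for predictable, fast links.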

I thought you might be interested in our experience. I am sure that everybody
out there could think up several solutions to the problem. I would be
delighted to hear them, but we have already thought of many of them. For the
sake of brevity :-) I have not mentioned many of these solutions or other
problems that might or did arise when they were tried.