I have read the recent exchanges on the merits of inlining with
interest. For the past several years, I have been researching automatic
inlining. This semester, I successfully defended my dissertation entitled
"A Study of the Effects of Subprogram Inlining" (faculty advisor: Jack
Davidson). An overview of some of my results, pertinent to the ongoing net
discussion, follows.
Subprogram inlining can yield significant improvements in execution time.
With the caveat that particular speed-ups observed depend on a number of
factors, including the computer systems and test programs used, reductions
in program execution time attributed to subprogram inlining (18% [RiG89],
12% [DaH88]) are comparable to those that have been reported for code
improvement techniques recognized in the compiler community as effective,
such as register allocation (17% [Ben89]) and loop-invariant code motion
(13% [Pow84]).
The belief that there is inevitably a time cost associated with the
increased size of inlined code is challenged by the data presented in my
dissertation. Clearly it is true that if, as a result of inlining, a
program becomes too large to fit in the address space of its target machine
or to be handled by the tools on its host system, or if it consumes an
unacceptable amount of space on a secondary storage device or requires more
time to be preloaded than the time saved by inlining, then the increased
size is detrimental to the inlined version of the program. However, I
found, using equations demonstrated to describe the execution time
performance of noninlined and inlined versions of a program, that the
increased size of inlined code did not affect its execution time performance
on a set of demand-paged virtual memory machines. In addition, I discovered
a modest improvement in the caching and paging behavior of test programs'
inlined versions.
I characterized certain situations in which inlining was not beneficial,
which were associated with its interactions with register allocation. The
greater competition in inlined code of a function's variables for the
allocable registers may cause variables assigned to registers in the
noninlined version of a program to be assigned to stack memory locations in
an inlined version. Also, inlining can augment the number of times that
registers are saved and restored by, in effect, moving the save and restore
operations to more frequently executed sites and by duplicating the save and
restore operations. These problems are avoidable; for example, noninlined
and inlined versions of a program can employ the exact same allocation of
registers, with register spills and reloads being executed in inlined code
at points corresponding to where register saves and restores are performed
in noninlined code. Moreover, it is possible to do a better job of
allocating registers in inlined code than in noninlined. Register
assignments in inlined subprogram code can be tailored to the context of
each site at which the code is expanded, and the worst case assumptions
compilers often make at a call site concerning the callee's accesses to
parameters and global variables can be replaced with accurate information
about such accesses when the callee's body is substituted for the call.
However, achieving this hypothetical improvement in the allocation of
registers for inlined code is challenging. Ideally, a subprogram inliner
should be accompanied by a compiler that employs the "shrink-wrapping"
[Cho88] or "smarter caller" [DaW89] scheme for placing register save and
restore operations at desirable points in the code, and that allocates
registers using variable usage counts, register coloring, and live range
splitting.
In my work, inlining was performed in conjunction with C compilers of
modest optimization capability. Hall [CHT90] investigated inlining in
combination with highly optimizing FORTRAN compilers and made a number of
discoveries, including the observation that the size growth associated with
inlining was greatly ameliorated by such compilers. Richardson and
Ganapathi [RiG89] compared inlining and interprocedural data-flow analysis
in a highly optimizing Pascal compiler, and found inlining to be the more
effective of the two techniques.
Respectfully,
Anne Holler (amh@uvacs.cs.virginia.edu)