amh@uvacs.cs.virginia.edu (Anne Holler) (05/07/91)
I have read the recent exchanges on the merits of inlining with interest. For the past several years, I have been researching automatic inlining. This semester, I successfully defended my dissertation entitled "A Study of the Effects of Subprogram Inlining" (faculty advisor: Jack Davidson). An overview of some of my results, pertinent to the ongoing net discussion, follows. Subprogram inlining can yield significant improvements in execution time. With the caveat that particular speed-ups observed depend on a number of factors, including the computer systems and test programs used, reductions in program execution time attributed to subprogram inlining (18% [RiG89], 12% [DaH88]) are comparable to those that have been reported for code improvement techniques recognized in the compiler community as effective, such as register allocation (17% [Ben89]) and loop-invariant code motion (13% [Pow84]). The belief that there is inevitably a time cost associated with the increased size of inlined code is challenged by the data presented in my dissertation. Clearly it is true that if, as a result of inlining, a program becomes too large to fit in the address space of its target machine or to be handled by the tools on its host system, or if it consumes an unacceptable amount of space on a secondary storage device or requires more time to be preloaded than the time saved by inlining, then the increased size is detrimental to the inlined version of the program. However, I found, using equations demonstrated to describe the execution time performance of noninlined and inlined versions of a program, that the increased size of inlined code did not affect its execution time performance on a set of demand-paged virtual memory machines. In addition, I discovered a modest improvement in the caching and paging behavior of test programs' inlined versions. I characterized certain situations in which inlining was not beneficial, which were associated with its interactions with register allocation. The greater competition in inlined code of a function's variables for the allocable registers may cause variables assigned to registers in the noninlined version of a program to be assigned to stack memory locations in an inlined version. Also, inlining can augment the number of times that registers are saved and restored by, in effect, moving the save and restore operations to more frequently executed sites and by duplicating the save and restore operations. These problems are avoidable; for example, noninlined and inlined versions of a program can employ the exact same allocation of registers, with register spills and reloads being executed in inlined code at points corresponding to where register saves and restores are performed in noninlined code. Moreover, it is possible to do a better job of allocating registers in inlined code than in noninlined. Register assignments in inlined subprogram code can be tailored to the context of each site at which the code is expanded, and the worst case assumptions compilers often make at a call site concerning the callee's accesses to parameters and global variables can be replaced with accurate information about such accesses when the callee's body is substituted for the call. However, achieving this hypothetical improvement in the allocation of registers for inlined code is challenging. Ideally, a subprogram inliner should be accompanied by a compiler that employs the "shrink-wrapping" [Cho88] or "smarter caller" [DaW89] scheme for placing register save and restore operations at desirable points in the code, and that allocates registers using variable usage counts, register coloring, and live range splitting. In my work, inlining was performed in conjunction with C compilers of modest optimization capability. Hall [CHT90] investigated inlining in combination with highly optimizing FORTRAN compilers and made a number of discoveries, including the observation that the size growth associated with inlining was greatly ameliorated by such compilers. Richardson and Ganapathi [RiG89] compared inlining and interprocedural data-flow analysis in a highly optimizing Pascal compiler, and found inlining to be the more effective of the two techniques. Respectfully, Anne Holler (amh@uvacs.cs.virginia.edu) References [Ben89] M. E. Benitez, A Global Object Code Optimizer, Masters Thesis, University of Virginia, 1989. [Cho88] F. C. Chow, Minimizing Register Usage Penalty at Procedure Calls, SIGPLAN Notices 23,7 (1988), 85-94. [CHT90] K. D. Cooper, M. Hall and L. Torczon, An Experiment with Inline Substitution, Rice Computer Tech. Rep. 90-128, Rice University, 1990. [DaH88] J. W. Davidson and A. M. Holler, A Study of a C Function Inliner, Software-Practice & Experience 18(1988), 775-790. [DaW89] J. W. Davidson and D. B. Whalley, Methods for Saving and Restoring Register Values across Function Calls, Computer Science Tech. Rep. 89-11, University of Virginia, 1989. [Pow84] M. L. Powell, A Portable Optimizing Compiler for Modula-2, SIGPLAN Notices 19,6 (1984), 310-318. [RiG89] S. Richardson and M. Ganapathi, Interprocedural Analysis vs. Procedure Integration, Inf. Proc. Letters 32(1989), 137-142. -- Send compilers articles to compilers@iecc.cambridge.ma.us or {ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.