toon@news.sara.nl (03/04/91)
In article <658@spim.mips.COM>, mash@mips.com (John Mashey) writes:
> In article <7063:Mar202:29:0091@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> [ ... deleted observation that inlining might be useful in some cases ... ]
>
> HOWEVER, I would observe that:
> a) People must be careful NOT to assume that function calls
>    are inordinately expensive. This is simply NOT true these
>    days. In particular, on most of the current RISCs, especially
>    when aided and abetted by good register allocators, function
>    calls only cost a few cycles; in particular, calls to
>    leaf routines (i.e., those that do not call others) are
>    usually very cheap, because they almost never save/restore
>    any registers.

Well, I'm skating on thin ice here (to quote a certain most famous book
on a much used HLL), because I know very little about chips and hardware,
but it was my impression that there are in general two reasons why
inlining delivers faster code and one why it could slow down your program:

Pro:
1. Inlining removes the jump-to-subroutine-and-save-return-address
   and return-from-subroutine instructions. Now my impression was
   that the cost of these instructions is not so much their cycle
   count, but the fact that they destroy the locality of the code
   references (invalidating the instruction cache) two times.
2. Inlining the code enables certain optimizations and simplifications
   of the code that are impossible in the library routine that should
   be able to catch *all* uses of it.

Con:
3. Inlining makes your program larger, therefore in a VM system you
   need a larger working set, which, other things being equal, leads
   to more paging and finally, to more disk I/O.

[ ... more useful observations deleted ...]

> --
> -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc>
> UUCP: mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash
> DDD: 408-524-7015, 524-8253 or (main number) 408-720-1700
> USPS: MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086

--
Toon Moene, SARA - Amsterdam (NL)
Internet: TOON@SARA.NL
/usr/lib/sendmail.cf: Do.:%@!=/
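
[Editor's sketch, not part of the original post.] Point 2 in the post above
(inlining enables simplifications that a general library routine cannot make)
can be illustrated roughly as follows. All names here (sum_strided,
sum_strided_inline, sum_contiguous, sum_contiguous_by_hand) are hypothetical,
and whether a given compiler actually performs the simplification is an
assumption; the hand-written version only shows what an optimizer could
plausibly reduce the inlined call to.

```c
#include <stddef.h>

/* General "library" routine: it must handle every stride any caller
 * might pass, so the index multiplication stays in the loop. */
long sum_strided(const int *a, size_t n, size_t stride)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i * stride];
    return total;
}

/* Same body, marked static inline so the compiler is allowed (not
 * obliged) to expand it at each call site. */
static inline long sum_strided_inline(const int *a, size_t n, size_t stride)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i * stride];
    return total;
}

/* Call site with a constant stride of 1. Once the body is expanded
 * here, the compiler can see that i * 1 == i. */
long sum_contiguous(const int *a, size_t n)
{
    return sum_strided_inline(a, n, 1);
}

/* Hand-written version of what the inlined call could reduce to
 * (an illustration, not a claim about any particular compiler). */
long sum_contiguous_by_hand(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

All three routines return the same value for stride 1; the difference is only
in how much the compiler knows at the point where the loop is compiled.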
cliffc@rice.edu (Cliff Click) (03/05/91)
[ deleted pros/cons of inlining ]

Actually, some folks at Rice here observed that inlining causes large
modules and thus large numbers of large live ranges and the *register
allocator* drops the ball. In some cases the best thing you can do is
save a bunch of registers, and load them with new values - exactly what
the subroutine calling code does. Most graph coloring allocators spill
*1* value at a time, dribbling them throughout the module. A fast
save-multiple and load-multiple registers instruction is used in the
calling sequence, but not in the code which spills one-at-a-time.

A second effect is that things may spill inside loops inside the
subroutines after inlining, but not before (because they were effectively
spilled at the subroutine entrance).

These defects in register allocation are the targets of a spate of papers
on hierarchical register allocators.

Cliff
--
Cliff Click
cliffc@rice.edu
jesup@cbmvax.commodore.com (Randell Jesup) (03/05/91)
In article <1991Mar4.111721.2819@news.sara.nl> toon@news.sara.nl writes:
>Pro:
>1. Inlining removes the jump-to-subroutine-and-save-return-address
>   and return-from-subroutine instructions. Now my impression was
>   that the cost of these instructions is not so much their cycle
>   count, but the fact that they destroy the locality of the code
>   references (invalidating the instruction cache) two times.

Except that by inlining, separate calls to the same function now are
separate copies of the code, and therefore are more likely to bust your
cache (you may have 5 copies of strcpy() or whatever in the cache instead
of 1, and of course you had to take initial misses on numbers 2-5).

>2. Inlining the code enables certain optimizations and simplifications
>   of the code that are impossible in the library routine that should
>   be able to catch *all* uses of it.

Yes, and avoids having to assume that all scratch registers are
trashed, a big win in some architectures/compilers, especially for very
small routines like strcpy.

--
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com BIX: rjesup
The compiler runs
Like a swift-flowing river
I wait in silence. (From "The Zen of Programming") ;-)
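
[Editor's sketch, not part of the original post.] The code-duplication point
made above (several copies of a strcpy-like routine instead of one) can be
sketched as follows. The names copy_str, copy_str_shared, fill_greeting and
fill_label are hypothetical, and the sketch assumes the compiler chooses to
expand the inline routine at each call site, which it is free not to do.

```c
#include <stddef.h>

/* A tiny strcpy-like routine. Marked static inline, so each call site
 * below may receive its own expanded copy of this loop. */
static inline char *copy_str(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;                       /* copy up to and including the NUL */
    return dst;
}

/* Out-of-line wrapper: one shared copy of the loop that every caller
 * jumps to, at the cost of a call and return per use. */
char *copy_str_shared(char *dst, const char *src)
{
    return copy_str(dst, src);
}

/* Two separate call sites. If copy_str is inlined in both, the copy
 * loop exists twice in the program text, so both copies compete for
 * instruction-cache space and each takes its own initial misses. */
void fill_greeting(char *buf)
{
    copy_str(buf, "hello");
}

void fill_label(char *buf)
{
    copy_str(buf, "label");
}
```

The results are identical either way; the trade-off is between call overhead
at each use and the extra code size from the duplicated loop.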