[comp.arch] To inline or not to inline

toon@news.sara.nl (03/04/91)

In article <658@spim.mips.COM>, mash@mips.com (John Mashey) writes:
> In article <7063:Mar202:29:0091@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> 
[ ... deleted observation that inlining might be useful in some cases ... ]
>
> HOWEVER, I would observe that:
> 	a) People must be careful NOT to assume that function calls
> 	are inordinately expensive.  This is simply NOT true these
> 	days.  In particular, on most of the current RISCs, especially
> 	when aided and abetted by good register allocators, function
> 	calls only cost a few cycles; in particular, calls to
> 	leaf routines (i.e., those that do not call others) are
> 	usually very cheap, because they almost never save/restore
> 	any registers.
Well, I'm skating on thin ice here (to quote a certain famous
book on a much-used HLL), because I know very little about chips and
hardware, but it was my impression that there are in general two
reasons why inlining delivers faster code and one reason why it could
slow down your program:
Pro:
1. Inlining removes the jump-to-subroutine-and-save-return-address
   and return-from-subroutine instructions. Now my impression was
   that the cost of these instructions is not so much their cycle
   count, but the fact that they destroy the locality of the code
   references (invalidating the instruction cache) two times.
2. Inlining the code enables certain optimizations and simplifications
   of the code that are impossible in the library routine that should
   be able to catch *all* uses of it.
Con:
3. Inlining makes your program larger, therefore in a VM system you
   need a larger working set, which, other things being equal, leads
   to more paging and finally, to more disk I/O.
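
A minimal sketch of point 2, assuming a C99 compiler that honors
`static inline`. The names `dot_strided` and `dot3` are made up for
illustration and do not come from any library. The general routine has to
cope with every stride and length; once it is inlined into a caller that
passes constants, the compiler is free to drop the stride multiplications
and unroll the loop, which it could not do inside a separately compiled
library copy.

```c
#include <assert.h>
#include <stddef.h>

/* General-purpose routine: like a library version, it must handle
   every stride and every length a caller might pass. */
static inline double dot_strided(const double *x, size_t incx,
                                 const double *y, size_t incy, size_t n)
{
    double sum = 0.0;
    assert(incx > 0 && incy > 0);
    for (size_t i = 0; i < n; i++)
        sum += x[i * incx] * y[i * incy];
    return sum;
}

/* One particular use: unit stride, fixed length 3.  If the call is
   inlined, incx, incy and n are compile-time constants here, so the
   compiler may simplify the indexing and unroll the loop. */
double dot3(const double a[3], const double b[3])
{
    return dot_strided(a, 1, b, 1, 3);
}
```

Whether the simplification actually happens depends on the compiler and
its optimization level; the sketch only shows where the opportunity
comes from.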
[ ... more useful observations deleted ...]
> -- 
> -john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
> UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
> DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
> USPS: 	MIPS Computer Systems MS 1/05, 930 E. Arques, Sunnyvale, CA 94086
-- 

Toon Moene, SARA - Amsterdam (NL)
Internet: TOON@SARA.NL

/usr/lib/sendmail.cf: Do.:%@!=/

cliffc@rice.edu (Cliff Click) (03/05/91)

[ deleted pros/cons of inlining ]

Actually, some folks here at Rice observed that inlining produces large
modules, and thus large numbers of long live ranges, and the *register
allocator* drops the ball.  In some cases the best thing you can do
is save a bunch of registers and load them with new values, which is
exactly what the subroutine calling code does.  Most graph coloring
allocators spill *1* value at a time, dribbling the spills throughout the
module.  A fast save-multiple and load-multiple registers instruction is
used in the calling sequence, but not in the code which spills
one-at-a-time.  A second effect is that values may spill inside loops that
came from the subroutine after inlining, but not before (because they were
effectively spilled at the subroutine entrance).
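
A rough sketch of the shape of code this is about, in C99. The names
`mix`, `sum_with_call` and `sum_inlined` are hypothetical, and whether any
given compiler really spills here depends on the target's register count
and on the allocator, so treat it as an illustration of the live ranges
involved rather than a benchmark.

```c
#include <assert.h>

/* Hypothetical helper with its own temporaries and an inner loop. */
static long mix(long x)
{
    long t0 = x * 3, t1 = x ^ 5, t2 = x + 7, t3 = x - 11;
    long acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (t0 ^ t1) + (t2 & t3) + i;
    return acc;
}

/* Out of line: a, b, c, d stay live across the call, so any saving
   and restoring is done at the call boundary, outside mix()'s loop. */
long sum_with_call(const long *v, int n)
{
    assert(n >= 4);
    long a = v[0], b = v[1], c = v[2], d = v[3];
    long sum = 0;
    for (int i = 0; i < n; i++)
        sum += mix(v[i]) + a + b + c + d;
    return sum;
}

/* Hand-inlined: t0..t3, acc and j now share one function with
   a..d, sum, i, v and n.  If the allocator runs out of registers,
   a spill can end up inside the inner loop. */
long sum_inlined(const long *v, int n)
{
    assert(n >= 4);
    long a = v[0], b = v[1], c = v[2], d = v[3];
    long sum = 0;
    for (int i = 0; i < n; i++) {
        long x = v[i];
        long t0 = x * 3, t1 = x ^ 5, t2 = x + 7, t3 = x - 11;
        long acc = 0;
        for (int j = 0; j < 4; j++)
            acc += (t0 ^ t1) + (t2 & t3) + j;
        sum += acc + a + b + c + d;
    }
    return sum;
}
```

Both functions compute the same value; the only difference is how many
values are simultaneously live inside one function body.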

These defects in register allocation are the targets of a spate of papers
on hierarchical register allocators.

Cliff

--
Cliff Click                
cliffc@rice.edu       

jesup@cbmvax.commodore.com (Randell Jesup) (03/05/91)

In article <1991Mar4.111721.2819@news.sara.nl> toon@news.sara.nl writes:
>Pro:
>1. Inlining removes the jump-to-subroutine-and-save-return-address
>   and return-from-subroutine instructions. Now my impression was
>   that the cost of these instructions is not so much their cycle
>   count, but the fact that they destroy the locality of the code
>   references (invalidating the instruction cache) two times.

	Except that by inlining, separate calls to the same function now
are separate copies of the code, and therefore are more likely to bust
your cache (you may have 5 copies of strcpy() or whatever in the cache
instead of 1, and of course you had to take initial misses on numbers 2-5).

>2. Inlining the code enables certain optimizations and simplifications
>   of the code that are impossible in the library routine that should
>   be able to catch *all* uses of it.

	Yes, and avoids having to assume that all scratch registers are
trashed, a big win in some architectures/compilers, especially for very small
routines like strcpy.
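
	To make both points concrete, here is a small sketch in C99.
`copy_inline` and `make_path` are made-up names, not the library strcpy();
whether the compiler really expands both calls is up to the compiler.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical inline string copy, standing in for an inlined strcpy(). */
static inline char *copy_inline(char *dst, const char *src)
{
    char *d = dst;
    assert(dst != NULL && src != NULL);
    while ((*d++ = *src++) != '\0')
        ;                       /* copy up to and including the NUL */
    return dst;
}

/* Two call sites.  If both are expanded inline, the copy loop exists
   twice in the object code (more cache footprint), but the compiler
   can see exactly which registers each copy uses and need not assume
   that every scratch register was trashed by a call. */
void make_path(char *out, const char *dir, const char *file)
{
    copy_inline(out, dir);                  /* copy 1 of the loop */
    copy_inline(out + strlen(out), file);   /* copy 2 of the loop */
}
```

	With an out-of-line strcpy() there would be one copy of the loop
shared by both call sites, at the price of a call and the register
assumptions that go with it.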

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)