[comp.sys.acorn] Compiler inlining

osmith@acorn.co.uk (Owen Smith) (02/21/91)
In article <BJ~&89*@warwick.ac.uk> csuwr@warwick.ac.uk (Derek Hunter) writes:

>In article <5166@acorn.co.uk> john@acorn.co.uk (John Bowler) writes:

>>With an ARM3 in-lining is likely to be disadvantageous - code size will
>>increase significantly in some program inner loops and this will decrease
>>overall performance because the in-line code effectively flushes the cache.

>Really? Not that I know much about this, but I thought that caches just
> slurped everything in, including any subroutines called, so a subroutine
> would have filled the cache (just a little) more than the inline coding.

The cache is of very limited size. Say your loop contains five calls
to strcpy(). Five inline copies of strcpy() take up a lot more cache
space than five function calls plus the single function version of strcpy().
Taking up more cache space like this means less of the rest of your code/data
fits in the cache, so you will probably get a speed degradation from the
inlining. My example is a rather extreme case, but the principle does hold.
Procedure calls are dirt cheap on the ARM, particularly so if both the
caller and the callee are in the cache, which is more likely if you do
not have inlining. In the case of a shared C library, the win is even greater.
Several different programs can all be using the same cached strcpy() code.
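
To put rough numbers on the strcpy() example, here is a minimal sketch in C.
The sizes are assumptions chosen for illustration, not measurements from any
compiler: the 64-byte strcpy() body is a guess, and the names STRCPY_BYTES,
CALL_BYTES, inlined_footprint() and called_footprint() are hypothetical. The
4-byte call cost is a single BL instruction.

```c
/* Assumed sizes in bytes; illustrative only, not measured. */
enum {
    STRCPY_BYTES = 64, /* hypothetical size of one strcpy() body */
    CALL_BYTES   = 4   /* one BL instruction per call site */
};

/* Code footprint when every call site carries its own copy of the body. */
static int inlined_footprint(int call_sites)
{
    return call_sites * STRCPY_BYTES;
}

/* Code footprint with one shared body plus one BL per call site. */
static int called_footprint(int call_sites)
{
    return STRCPY_BYTES + call_sites * CALL_BYTES;
}
```

Under these assumptions, five call sites cost 320 bytes inlined against 84
bytes called. In a cache of only 4K bytes, shared between code and data, that
difference is space the rest of the loop could have used.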

Owen.

The views expressed are my own and are not necessarily those of Acorn.