osmith@acorn.co.uk (Owen Smith) (02/21/91)
In article <BJ~&89*@warwick.ac.uk> csuwr@warwick.ac.uk (Derek Hunter) writes:
>In article <5166@acorn.co.uk> john@acorn.co.uk (John Bowler) writes:
>>With an ARM3 in-lining is likely to be disadvantageous - code size will
>>increase significantly in some program inner loops and this will decrease
>>overall performance because the in-line code effectively flushes the cache.
>
>Really? Not that I know much about this, but I thought that caches just
> slurped everything in, including any subroutines called, so a subroutine
> would have filled the cache (just a little) more than the inline coding.

The cache is of very limited size. Say your loop contains five calls to
strcpy(). Five inline versions of strcpy() take up a lot more cache space
than five function calls plus a single function version of strcpy(). Taking
up more cache space like this means less of the rest of your code/data fits
in the cache, so you will probably get a speed degradation from the inlining.
My example is a rather extreme case, but the principle does hold.

Procedure calls are dirt cheap on the ARM, particularly so if both the caller
and the callee are in the cache, which is more likely if you do not have
inlining.

In the case of a shared C library, the win is even greater. Several different
programs can all be using the same cached strcpy() code.

Owen.

The views expressed are my own and are not necessarily those of Acorn.
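
To put rough numbers on the strcpy() example, here is a minimal sketch in C. It is only back-of-envelope arithmetic, not a measurement. The helper names inline_footprint() and call_footprint() are hypothetical, and the 64-byte size for a strcpy() body is an assumed figure chosen for illustration. The 4-byte call cost reflects a single BL instruction on the ARM, and the ARM3 cache is 4 KB in total.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of any library. */

/* Bytes of code the loop touches if the routine body is expanded inline
   at every call site. */
size_t inline_footprint(size_t n_sites, size_t body_bytes)
{
    assert(n_sites > 0);   /* a loop with no call sites is not the case discussed */
    return n_sites * body_bytes;
}

/* Bytes of code touched when each site holds only a call instruction and
   one shared copy of the routine body exists. */
size_t call_footprint(size_t n_sites, size_t body_bytes, size_t call_bytes)
{
    assert(n_sites > 0);
    return n_sites * call_bytes + body_bytes;
}
```

With five call sites, the assumed 64-byte body and a 4-byte BL at each site, inline_footprint(5, 64) comes to 320 bytes against 84 bytes for call_footprint(5, 64, 4). Argument set-up is left out because both versions need it, and cache line granularity is ignored, so treat the figures as a rough guide only.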