scs@athena.mit.edu (Steve Summit) (03/18/90)
It is unfortunate that perpetual concerns about microoptimization tend to suggest that function call overhead is unacceptable and should be avoided.  For most programs, readability and maintainability are far more important than efficiency, and good modularity (accompanied by lots of function calls) is of course an important property of a well-written, readable and maintainable program.  I hope that the abstruse and low-level function call details being discussed here won't discourage people from moving code to subroutines whenever it seems appropriate.  Worrying about register allocation and cycle shaving should be of concern only to compiler writers.

(Remember, too, that function calls on any machine are by no means "slow."  This discussion is splitting hairs between "extremely fast" and "ridiculously fast;" the concerns about having to save state in a stack frame merely indicate that it is nontrivial to make things ridiculously fast.)

In article <14273@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>It seems to me that if a procedure is so small that it can only find
>use for 24 (or fewer) registers, then it is small enough to be a real
>good candidate for inlining...  However, most procedures
>of any size can benefit from quite a few more registers than 24.

I'd dispute this, and I'm not even religious about keeping all of my subroutines smaller than a page or anything.  Few of my routines even have 24 variables, let alone 24 variables that have good reason to be put in registers.  (On many machines, it's better to leave a variable that's only used once or twice out of a register, because it would take as long to save and restore the previous register contents -- on the stack -- as it would to access the variable "conventionally" -- on the stack.)  I understand that a good compiler might find use for additional registers to store intermediate values not explicitly named by the coder, but still, 24 seems like an awful lot.
Good code, particularly object-oriented code, has lots and lots of small routines, but they don't all need inlining.  (A medium-sized -- i.e. small but not tiny -- routine that's called a lot shouldn't necessarily be inlined.  Code size still matters.)

                                            Steve Summit
                                            scs@adam.mit.edu
jlg@lambda.UUCP (Jim Giles) (03/20/90)
From article <1990Mar17.190858.13930@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
> It is unfortunate that perpetual concerns about microoptimization
> tend to suggest that function call overhead is unacceptable and
> should be avoided.  For most programs, readability and
> maintainability are far more important than efficiency, and good
> modularity (accompanied by lots of function calls) is of course
> an important property of a well-written, readable and
> maintainable program.

I quite agree.  No one in this thread of discussion has suggested otherwise, certainly not me.  In fact, I have argued this very point on this newsgroup in the recent past (against someone who thought the programmer should do his own common expression eliminations, etc.).  If your most critical code issue is readability, maintainability, or just_get_it_out_the_door_before_the_deadline, then concentration on optimization issues is not likely to pay off.  HOWEVER, there _ARE_ some contexts where speed is everything.  Procedure call overhead is one area where the compiler/environment cannot yet be expected to do the optimizations for you (in fact, the compiler _CAN'T_ do it if separate compilation is allowed).

> [...]
> (Remember, too, that function calls on any machine are by no
> means "slow."  This discussion is splitting hairs between
> "extremely fast" and "ridiculously fast;" the concerns about
> having to save state in a stack frame merely indicate that it is
> nontrivial to make things ridiculously fast.)

This is simply not true.  Procedure calls are the slowest 'single operations' that high-level languages provide.  On the Cray, for example, call/return overhead varies from a low of several dozen to a high of a few hundred clocks!  In fact, for 'baselevel' or 'leaf' procedures, the overhead often exceeds the time spent in the procedure by multiple factors.  If you claim that procedure calls are fast, let's see your benchmark.

Note: a method for benchmarking call/return overhead.
Take a production code.  Fetch the real-time clock before and after a 'leaf' procedure call.  'Inline' the procedure and time the 'inlined' code.  Compare the times.  No fair using 'dummy' drivers to make the call - you need an active 'caller' which has 'live' values to be saved/restored.

> In article <14273@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>> [... 24 registers is a small ceiling ... ]
>
> I'd dispute this, and I'm not even religious about keeping all of
> my subroutines smaller than a page or anything.  Few of my
> routines even have 24 variables, let alone 24 variables that have
> good reason to be put in registers.  [...]

The smallest routine I've ever written (this was for a class, years ago) was a matrix multiply on fixed-sized matrices (they were always guaranteed to be 3 by 3).  Say the array arguments are A, B, and C - I want to compute the value of A=B*C.  Each of the 18 elements of B and C is used 3 times during the calculation - each of the 9 elements of A is accumulated into 3 times.  For efficiency, you want to load each element of the arguments only once.  Similarly, you want to store each element of A only once.  The rest of the time, you want the data in registers.  If speed were paramount, I would unroll the loops of the matrix multiply and do the whole thing explicitly.  I might be able to find an order in which to do the calculation which only needed 24 registers, but the obvious method would be to put each of the array elements in a separate register - that's 27 'live' values!

Note that the above was the _smallest_ procedure I've written!  Normally I would not even consider doing matrix multiply as a procedure call (the idiom of three nested loops, etc., is simple to read and widely recognized and understood).  In a language like C, this problem would be a good candidate for a MACRO.  To be sure, there are certainly procedures which don't need as many as 24 registers.
But, if speed were paramount, I suspect that nearly all of these would be immediate candidates for 'inlining'.  The only exceptions would be assembly routines which used hardware/instructions not accessible from the high-level language.

> [...]  Good code,
> particularly object-oriented code, has lots and lots of small
> routines, but they don't all need inlining.

I disagree.  Good code is neither monolithic nor fragmented.  There is a compromise between these two tendencies which is better than either.  A lot of the OOP that I've seen leans way too far toward fragmentation.  I've seen a matrix multiply method implemented as a sequence of calls to a vector dot product method, which in turn does a vector elemental multiply method and a vector summation method.  The problem is that matrix multiply is a _single_ operation - it's actually _easier_ to read and maintain if it's all done in one place.

> [...]  (A medium-sized --
> i.e. small but not tiny -- routine that's called a lot shouldn't
> necessarily be inlined.  Code size still matters.)

I agree, with a proviso: code should _always_ be 'inlined' if the user explicitly asks for it.  Otherwise, the rules for automatic 'inlining' should be:

1) 'Inline' if the procedure/method is only called once in the entire call tree.

2) 'Inline' if the procedure/method body is smaller than some multiple of the call/return overhead code (including register save/restores).

Otherwise, don't 'inline'.  Of course, this requires a language environment which allows 'inlining' at all.  I am hopeful that this ability may soon (within this decade) be commonplace.  But, who knows?

J. Giles
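[A minimal C sketch of the fully unrolled 3x3 multiply described in the post above.  matmul3 is an illustrative name, not from the original article; the point it shows is that the straightforward unrolling keeps all 27 scalars (18 loads plus 9 results) live at once, which is exactly the register-pressure argument being made.]

```c
#include <assert.h>

/* A = B*C for fixed 3x3 matrices, fully unrolled.  Each element of
   B and C is loaded exactly once, each element of A stored exactly
   once; everything else stays in (what we hope become) registers. */
void matmul3(double A[3][3], const double B[3][3], const double C[3][3])
{
    double b00 = B[0][0], b01 = B[0][1], b02 = B[0][2];
    double b10 = B[1][0], b11 = B[1][1], b12 = B[1][2];
    double b20 = B[2][0], b21 = B[2][1], b22 = B[2][2];
    double c00 = C[0][0], c01 = C[0][1], c02 = C[0][2];
    double c10 = C[1][0], c11 = C[1][1], c12 = C[1][2];
    double c20 = C[2][0], c21 = C[2][1], c22 = C[2][2];

    A[0][0] = b00*c00 + b01*c10 + b02*c20;  /* 27 values live here */
    A[0][1] = b00*c01 + b01*c11 + b02*c21;
    A[0][2] = b00*c02 + b01*c12 + b02*c22;
    A[1][0] = b10*c00 + b11*c10 + b12*c20;
    A[1][1] = b10*c01 + b11*c11 + b12*c21;
    A[1][2] = b10*c02 + b11*c12 + b12*c22;
    A[2][0] = b20*c00 + b21*c10 + b22*c20;
    A[2][1] = b20*c01 + b21*c11 + b22*c21;
    A[2][2] = b20*c02 + b21*c12 + b22*c22;
}
```

[Wrapping the same body in a #define, as the post suggests for C, would get the inlining without waiting for a compiler that can do it.]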
amull@Morgan.COM (Andrew P. Mullhaupt) (03/20/90)
The discussion so far about C's excess function call overhead hasn't really differentiated between the two sides of the efficiency concern.  There are two distinct points to be made, and they come from different points of view.

Most programmers might say that they are not overly concerned with the efficiency of a single function call.  In fact, they would be happier if they could be completely unconcerned with the issue.  These are the people (myself totally included) who want to use functions to properly structure their code.  A small minority (perhaps they have to write high pressure versions of algorithms which are rich in complex recursive data structures, or they implement 2-D random choice methods, or Multigrid methods, or some other important example of a function-call-overhead-rich application) might have to admit that they have been forced to write their source code around reducing function call overhead.

The distance between these two points of view might be shorter than it seems at first.  Both groups are concerned with function call overhead, but the first group are adequately served by their compilers, and the second group have to strap on their big boots and do it on their own.  The first group are _vicariously_ concerned with function call overhead.  We want the compiler to apply every exigency and subterfuge to arrive at the last word in object code, no matter how much readability liberty we have taken in the source.  (So we have no need to worry about this issue when coding.)

C compilers have to treat this issue seriously, since C often uses a slower calling convention than Pascal or FORTRAN, in order that C can support functions with a nonconstant number (and/or type) of arguments.  Many C compilers are specially equipped to generate good code for function calls despite this possible handicap, and it is entirely to their credit.  But this is no reason to ignore the point.
The way I normally find myself thinking about function call overhead is when I wonder if I've somehow put my compiler off the scent unnecessarily, or when I have to consider this possibility while troubleshooting something else that isn't performing up to scratch.  It is at these times that members of the first group of programmers ("let the compiler take care of it") have to think like the second group ("leggo my microsecond").

Later,
Andrew Mullhaupt
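[A minimal sketch of the C feature blamed above for the slower calling convention: a function taking a nonconstant number of arguments.  Since the callee cannot know how many arguments were passed, the caller must pop them, which on many implementations costs extra code at every call site.  sum_ints is an illustrative name, not from the original post.]

```c
#include <stdarg.h>

/* Sum a variable number of int arguments; the leading count tells
   the callee how many to fetch, since the language itself cannot. */
int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch each int argument in turn */
    va_end(ap);
    return total;
}
```

[A Pascal or FORTRAN compiler, knowing the argument count at every call, can let the callee clean up in one place instead.]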
scs@athena.mit.edu (Steve Summit) (03/21/90)
In article <14278@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <1990Mar17.190858.13930@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
>> It is unfortunate that perpetual concerns about microoptimization
>> tend to suggest that function call overhead is unacceptable and
>> should be avoided.
>
>I quite agree.  No one in this thread of discussion has suggested otherwise,
>certainly not me.  In fact, I have argued this very point on this newsgroup
>in the recent past...

I hope it didn't sound like I had singled you out for criticism.  (I should have said "seem to suggest.")  I haven't been keeping up with this list of late, and so wouldn't have seen your earlier comments.

>> (Remember, too, that function calls on any machine are by no
>> means "slow."  This discussion is splitting hairs between
>> "extremely fast" and "ridiculously fast..."
>
>This is simply not true.  Procedure calls are the slowest 'single operations'
>that high-level languages provide...
>If you claim that procedure calls are fast, let's see your benchmark.

I knew I was gonna get reamed on this one, and you're all going to groan when you hear the justification I had in mind all along.  My benchmark is from a PDP 11/23, using Ritchie's compiler.  I measure approximately 60 microseconds per subroutine call, plus about 4 microseconds per argument pushed.  (This in spite of the fact that small run-time subroutines are called in the midst of procedure call and return to build up and tear down the stack frame.)  60 microseconds may not seem "fast" when compared with more modern processors, but I assure you that, in real terms, it is a tiny period of time indeed.  You and I can't do much of anything in 60 microseconds.  One of the facts inevitably overlooked in efficiency discussions (even more often than the fact that it shouldn't be worried about too early) is the fact that computers are intrinsically very, very fast.  It's their job.
And for most of the code in most programs, they are quite fast enough.

I know y'all know this, as in fact I know that there are times when efficiency does matter.  I just feel compelled to scramble up on this soapbox from time to time to make sure that the other side gets heard.  The industry is not plagued by software that is too slow.  It is plagued by software that is behind schedule, full of bugs, and impossible to maintain.  Too many programmers seize upon the flimsiest excuses to abandon clean, simple coding in favor of some purportedly more efficient implementation, which is why I get defensive whenever efficiency is discussed.

                                            Steve Summit
                                            scs@adam.mit.edu
sullivan@aqdata.uucp (Michael T. Sullivan) (03/22/90)
From article <1990Mar21.061420.9862@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
> I know y'all know this, as in fact I know that there are times
> when efficiency does matter.  I just feel compelled to scramble
> up on this soapbox from time to time to make sure that the other
> side gets heard.  The industry is not plagued by software that is
> too slow.  It is plagued by software that is behind schedule,
> full of bugs, and impossible to maintain.  Too many programmers
> seize upon the flimsiest excuses to abandon clean, simple coding
> in favor of some purportedly more efficient implementation, which
> is why I get defensive whenever efficiency is discussed.

Agreed.  I often see programmers who spend a lot of time trying to speed up code that isn't causing a slowdown - in fact, there isn't a slowdown in the first place.  Heck, I know people who don't want to use shell scripts because they are slower than C.  But for what shell scripts are used for, speed doesn't matter.  Same with a lot of programs.  Most programs don't need to be sped up at all.  If one does, THEN start micro-optimization, but don't trash a program's modularity and portability because of an imagined slowdown.

--
Michael Sullivan                     uunet!jarthur!aqdata!sullivan
aQdata, Inc.                         sullivan@aqdata.uucp
San Dimas, CA                        +1 714 599 9992
peter@ficc.uu.net (Peter da Silva) (03/22/90)
> Agreed.  I often see programmers who spend a lot of time trying to
> speed up code that isn't causing a slowdown, and in fact there isn't
> a slowdown in the first place.  Heck, I know people who don't want
> to use shell scripts because they are slower than C...

Look at C News.  It uses shell scripts extensively, but where it counts it's orders of magnitude faster than B News.  Code smarter, not just faster.
--
 _--_|\  `-_-' Peter da Silva.  +1 713 274 5180.  <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v
jensting@skinfaxe.diku.dk (Jens Tingleff) (03/22/90)
sullivan@aqdata.uucp (Michael T. Sullivan) writes:
>From article <1990Mar21.061420.9862@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
>>
>> I know y'all know this, as in fact I know that there are times
[....]
>Agreed.  I often see programmers who spend a lot of time trying to
>speed up code that isn't causing a slowdown, and in fact there isn't
[..]

Quite right.  I always repeat to myself, whenever I'm urged to do an unmotivated source code ``optimisation'',

	MAKE THE THING WORK, THEN MAKE IT FAST.

I also recall some advice from a CS textbook.  Its first rule on improvements in the form of source code optimisation: DON'T.

Very good rule, that.

	Jens

jensting@diku.dk is Jens Tingleff MSc EE, Research Assistant at DIKU
	(Institute of Computer Science, Copenhagen University)
Snail mail: DIKU Universitetsparken 1 DK2100 KBH O
ruud@targon.UUCP (Ruud Harmsen) (03/23/90)
In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>
>	MAKE THE THING WORK, THEN MAKE IT FAST.
>

True.  But only with one very important footnote: while making the thing just work, do consider performance, and know where and how you might want to optimise later.  If in the design of a programme or system you totally disregard all performance issues, you might have to rebuild large parts from scratch, and/or make the whole thing totally incomprehensible when optimising.

I've seen things like that happen, and I can tell you it's a tragic sight to watch.  A much better approach is to have the spots for optimization marked and planned, so that the optimization is easy to carry out, will be really effective, and does not violate the original design.
mph@lion.inmos.co.uk (Mike Harrison) (03/23/90)
In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>sullivan@aqdata.uucp (Michael T. Sullivan) writes:
> ...
>Quite right.  I always repeat to myself, whenever I'm urged to do an
>unmotivated source code ``optimisation'',
>
>	MAKE THE THING WORK, THEN MAKE IT FAST.
>

The formulation of this rule that I always use is:

	"It is easier to make a correct, slow program run fast
	 than to make a fast, wrong program run right."

On the subject of rules, I once saw this one in a book:

	"The first law of system design: Never check for an
	 error condition that you don't know how to handle!"

This sounds like the fruit of bitter experience.

Mike,

Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk    with STANDARD_DISCLAIMERS;
US : mph@inmos.com      use  STANDARD_DISCLAIMERS;
cik@l.cc.purdue.edu (Herman Rubin) (03/24/90)
In article <1084@targon.UUCP>, ruud@targon.UUCP (Ruud Harmsen) writes:
> In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
> >
> >	MAKE THE THING WORK, THEN MAKE IT FAST.
> >
> True.  But only with one very important footnote:
>
> While making the thing just work, do consider performance, and know where
> and how you might want to optimise later.
> If in the design of a programme or system you totally disregard all
> performance issues, you might have to rebuild large parts from scratch,
> and/or make the whole thing totally incomprehensible when optimising.
>
> I've seen things like that happen, and I can tell you it's a tragic sight
> to watch.  A much better approach is to have the spots for optimization
> marked and planned, so it is easy to fulfill, will be really effective
> and does not violate the original design.

There are far too many algorithms which work but cannot be speeded up appreciably, while other algorithms are very much faster.  The choice of algorithm depends on what can be done in the hardware.

A good example is the frexp() function in 4.2BSD.  In the implementations I have seen, this was (ugh) coded in C in a straightforward manner.  It could have been speeded up in the (unfortunately too common) bad cases, but only by using additional resources and running slower in the better cases.  It worked, but could not be made fast.  However, a semi-portable machine-dependent algorithm, deliberately ignoring typing, could be written which is in the ballpark of machine language algorithms.  Starting with that, one could make it fast.  But unless the operations of pack and unpack are in the language, this cannot be done otherwise.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
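[A minimal sketch of the "unpack" idea Rubin describes, written in modern C purely for illustration (stdint.h did not exist in 1990, and this is not the 4.2BSD code).  It assumes IEEE 754 doubles - not the Cray or PDP-11 formats discussed in this thread - and fast_frexp is an illustrative name.  Only zero and normalized finite values are handled; denormals, infinities and NaNs are outside this sketch.]

```c
#include <stdint.h>
#include <string.h>

/* Extract the fraction and exponent of an IEEE 754 double directly
   from its bit pattern, instead of the loop-based portable method.
   Result f satisfies x == f * 2^e with f in [0.5, 1), as frexp() does. */
double fast_frexp(double x, int *e)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* "unpack", ignoring typing */

    int biased = (int)((bits >> 52) & 0x7FF);
    if (biased == 0) {                     /* zero (denormals not handled) */
        *e = 0;
        return 0.0;
    }
    *e = biased - 1022;                    /* unbias so fraction is in [0.5, 1) */
    bits = (bits & ~(0x7FFULL << 52)) | (1022ULL << 52);
    memcpy(&x, &bits, sizeof x);           /* "pack" the new exponent back in */
    return x;
}
```

[One masked shift and one field replacement, with no loops: this is the ballpark-of-machine-language behavior Rubin is asking for, but only because the format's layout is exploited directly.]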
henry@utzoo.uucp (Henry Spencer) (03/25/90)
In article <1084@targon.UUCP> ruud@targon.UUCP (Ruud Harmsen) writes:
>If in the design of a programme or system you totally disregard all
>performance issues, you might have to rebuild large parts from scratch,
>and/or make the whole thing totally incomprehensible when optimising.

If you are seriously bent on high performance, however, you *will* have to rebuild significant parts from scratch, possibly several times.  Humans simply cannot predict bottlenecks well enough to provide for them ahead of time, and one gets much more mileage out of rethinking and reimplementing code than out of tweaking it.  There are parts of C News that have been reimplemented five or six times.
--
"Apparently IBM is not IBM      | Henry Spencer at U of Toronto Zoology
compatible." -Andy Tanenbaum    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
tneff@bfmny0.UU.NET (Tom Neff) (03/26/90)
I.   Code it portably and soundly.
II.  Identify the places where the CPU wastes most of its time.
III. Replace those sections, CONDITIONALLY if possible, with less
     portable, faster code.

Every good programmer KNOWS this works, and saves time in the long run.  That's precisely why some programmers HATE it, and waste words railing against it.  It's like eating your vegetables.  Programming, as a field, does not venerate vegetable eaters.  :-)

The truth is that the squeaky wheel gets the grease, -or- no news is good news.  The programs out there that are written RIGHT and optimized RIGHT are the ones you never hear about, because they don't make news, they just WORK.
--
Annex Canada now!  We need the room,    \)      Tom Neff
and who's going to stop us.             (\      tneff@bfmny0.UU.NET
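[A minimal sketch of step III's "CONDITIONALLY if possible": keep the portable version in the source, and select a faster variant only where a compile-time flag says it is known to be safe.  copy_words and FAST_MEMCPY are illustrative names, not from the original post.]

```c
#include <stddef.h>
#include <string.h>

/* Copy n longs from src to dst.  The portable loop is always there;
   the faster path is compiled in only when the build asks for it. */
void copy_words(long *dst, const long *src, size_t n)
{
#if defined(FAST_MEMCPY)                 /* hypothetical build-time flag */
    memcpy(dst, src, n * sizeof *dst);   /* tuned library block move */
#else
    size_t i;
    for (i = 0; i < n; i++)              /* portable fallback */
        dst[i] = src[i];
#endif
}
```

[Either way the program still works everywhere; the speedup is an option, not a dependency, which is the whole point of doing it conditionally.]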
ge@kunivv1.sci.kun.nl (Ge' Weijers) (03/28/90)
ruud@targon.UUCP (Ruud Harmsen) writes:
>In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>>
>>	MAKE THE THING WORK, THEN MAKE IT FAST.
>>
>True.  But only with one very important footnote:
>While making the thing just work, do consider performance, and know where and
>how you might want to optimise later.
>If in the design of a programme or system you totally disregard all
>performance issues, you might have to rebuild large parts from scratch,
>and/or make the whole thing totally incomprehensible when optimising.
>I've seen things like that happen, and I can tell you it's a tragic sight
>to watch.  A much better approach is to have the spots for optimization
>marked and planned, so it is easy to fulfill, will be really effective
>and does not violate the original design.

I have to second that.  You have to look at the feasibility of what you are doing.  If your solution to a problem is O(e ** N) for large N, look for a better solution.  You still need to do some rough calculations.

Ge' Weijers                                     Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,    (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1
6525 ED Nijmegen, the Netherlands
tel. +3180612483 (UTC-2)