[comp.lang.c] A note for those not consumed by efficiency worries

scs@athena.mit.edu (Steve Summit) (03/18/90)

It is unfortunate that perpetual concerns about microoptimization
tend to suggest that function call overhead is unacceptable and
should be avoided.  For most programs, readability and
maintainability are far more important than efficiency, and good
modularity (accompanied by lots of function calls) is of course
an important property of a well-written, readable and
maintainable program.

I hope that the abstruse and low-level function call details
being discussed here won't discourage people from moving code to
subroutines whenever it seems appropriate.  Worrying about
register allocation and cycle shaving should be of concern only
to compiler writers.

(Remember, too, that function calls on any machine are by no
means "slow."  This discussion is splitting hairs between
"extremely fast" and "ridiculously fast;" the concerns about
having to save state in a stack frame merely indicate that it is
nontrivial to make things ridiculously fast.)

In article <14273@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:

>It seems to me that if a procedure is so small that it can only find
>use for 24 (or fewer) registers, then it is small enough to be a real
>good candidate for inlining...  However, most procedures
>of any size can benefit from quite a few more registers than 24.

I'd dispute this, and I'm not even religious about keeping all of
my subroutines smaller than a page or anything.  Few of my
routines even have 24 variables, let alone 24 variables that have
good reason to be put in registers.  (On many machines, it's
better to leave a variable that's only used once or twice out of
a register, because it would take as long to save and restore the
previous register contents -- on the stack -- as it would to
access the variable "conventionally" -- on the stack.)  I
understand that a good compiler might find use for additional
registers to store intermediate values not explicitly named by
the coder, but still, 24 seems like an awful lot.  Good code,
particularly object-oriented code, has lots and lots of small
routines, but they don't all need inlining.  (A medium-sized --
i.e. small but not tiny -- routine that's called a lot shouldn't
necessarily be inlined.  Code size still matters.)

                                            Steve Summit
                                            scs@adam.mit.edu

jlg@lambda.UUCP (Jim Giles) (03/20/90)

From article <1990Mar17.190858.13930@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
> It is unfortunate that perpetual concerns about microoptimization
> tend to suggest that function call overhead is unacceptable and
> should be avoided.  For most programs, readability and
> maintainability are far more important than efficiency, and good
> modularity (accompanied by lots of function calls) is of course
> an important property of a well-written, readable and
> maintainable program.

I quite agree.  No one in this thread of discussion has suggested otherwise,
certainly not me.  In fact, I have argued this very point on this newsgroup
in the recent past (against someone who thought the programmer should do
his own common expression eliminations, etc.).  If your most critical code
issue is readability, maintainability, or just_get_it_out_the_door_before_
the_deadline, then concentration on optimization issues is not likely to
pay off.

HOWEVER, there _ARE_ some contexts where speed is everything.  Procedure
call overhead is one area where the compiler/environment cannot yet be
expected to do the optimizations for you (in fact, the compiler _CAN'T_
do it if separate compilation is allowed).

> [...]
> (Remember, too, that function calls on any machine are by no
> means "slow."  This discussion is splitting hairs between
> "extremely fast" and "ridiculously fast;" the concerns about
> having to save state in a stack frame merely indicate that it is
> nontrivial to make things ridiculously fast.)

This is simply not true.  Procedure calls are the slowest 'single operations'
that high-level languages provide.  On the Cray, for example, call/return
overhead varies from a low of several dozen to a high of a few hundred
clocks!  In fact, for 'baselevel' or 'leaf' procedures, the overhead
often exceeds the time spent in the procedure by a factor of several.
If you claim that procedure calls are fast, let's see your benchmark.

Note: method for benchmarking call/return overhead.  Take a production
      code.  Fetch the real-time clock before and after a 'leaf' procedure
      call.  'Inline' the procedure and time the 'inlined' code.  Compare
      the times.  No fair using 'dummy' drivers to make the call - you
      need an active 'caller' which has 'live' values to be saved/restored. 
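
A rough sketch of the shape of such a measurement, in C.  The names are
made up, and ANSI clock() stands in for a real-time clock - on real
hardware you would read a cycle counter, since a single call is far below
clock()'s resolution:

    #include <stdio.h>
    #include <time.h>

    static clock_t ticks = 0;

    static double leaf(double a, double b)     /* the 'leaf' procedure under test */
    {
        return 0.5 * (a + b);
    }

    static double live_caller(double x)        /* an active caller with 'live' values */
    {
        double s = 0.0, t = x;
        long i;

        for (i = 0; i < 1000000L; i++) {
            clock_t t0 = clock();
            s = leaf(s, t);                    /* timed: call overhead plus leaf body */
            ticks += clock() - t0;
            t += 1e-6;                         /* keep s and t live across the call */
        }
        return s + t;
    }

    int main(void)
    {
        printf("%g computed in %ld ticks\n", live_caller(3.14), (long)ticks);
        return 0;
    }

The same caller, with the body of leaf() expanded in place of the call,
gives the comparison time.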

> In article <14273@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>> [... 24 registers is a small ceiling ... ]
>
> I'd dispute this, and I'm not even religious about keeping all of
> my subroutines smaller than a page or anything.  Few of my
> routines even have 24 variables, let alone 24 variables that have
> good reason to be put in registers.  [...]

The smallest routine I've ever written (this was for a class, years ago)
was a matrix multiply on fixed-sized matrices (they were always guaranteed
to be 3 by 3).  Say the array arguments are A, B, and C - I want to compute
the value of A=B*C.  Each of the 18 elements of B and C is used 3 times
during the calculation - each of the 9 elements of A is the sum of 3 products.
For efficiency, you want to load each element of the arguments only once.
Similarly, you want to store each element of A only once.  The rest of the
time, you want the data in registers.  If speed were paramount, I would
unroll the loops of the matrix multiply and do the whole thing explicitly.
I might be able to find an order in which to do the calculation which only
needed 24 registers, but the obvious method would be to put each of
the array elements in a separate register - that's 27 'live' values!

Note that the above was the _smallest_ procedure I've written!  Normally
I would not even consider doing matrix multiply as a procedure call
(the idiom of three nested loops, etc., is simple to read and widely
recognized and understood).  In a language like C, this problem would
be a good candidate for a MACRO.
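
For the curious, here is roughly what the unrolled version looks like in C
(a sketch only - the variable names are mine, and a macro version would
have the same body):

    void matmul3(double a[3][3], double b[3][3], double c[3][3])
    {
        /* load each of the 18 input elements exactly once */
        double b00 = b[0][0], b01 = b[0][1], b02 = b[0][2];
        double b10 = b[1][0], b11 = b[1][1], b12 = b[1][2];
        double b20 = b[2][0], b21 = b[2][1], b22 = b[2][2];
        double c00 = c[0][0], c01 = c[0][1], c02 = c[0][2];
        double c10 = c[1][0], c11 = c[1][1], c12 = c[1][2];
        double c20 = c[2][0], c21 = c[2][1], c22 = c[2][2];

        /* the 9 sums of products - together with the inputs, that is
           27 values which would all like to be in registers at once */
        double a00 = b00*c00 + b01*c10 + b02*c20;
        double a01 = b00*c01 + b01*c11 + b02*c21;
        double a02 = b00*c02 + b01*c12 + b02*c22;
        double a10 = b10*c00 + b11*c10 + b12*c20;
        double a11 = b10*c01 + b11*c11 + b12*c21;
        double a12 = b10*c02 + b11*c12 + b12*c22;
        double a20 = b20*c00 + b21*c10 + b22*c20;
        double a21 = b20*c01 + b21*c11 + b22*c21;
        double a22 = b20*c02 + b21*c12 + b22*c22;

        /* store each result exactly once */
        a[0][0] = a00;  a[0][1] = a01;  a[0][2] = a02;
        a[1][0] = a10;  a[1][1] = a11;  a[1][2] = a12;
        a[2][0] = a20;  a[2][1] = a21;  a[2][2] = a22;
    }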

To be sure, there are certainly procedures which don't need as many as
24 registers.  But, if speed were paramount, I suspect that nearly all
of these would be immediate candidates for 'inlining'.  The only exceptions
to this would be assembly routines which used hardware/instructions which
were not accessible from the high-level language.

> [...]                                              Good code,
> particularly object-oriented code, has lots and lots of small
> routines, but they don't all need inlining.

I disagree.  Good code is neither monolithic nor fragmented.  There is
a compromise between these two tendencies which is better than either.
A lot of the OOP that I've seen leans way too far toward fragmentation.
I've seen a matrix multiply method implemented as a sequence of calls
to a vector dot product method, which in turn does a vector elemental
multiply method and a vector summation method.  The problem is that
matrix multiply is a _single_ operation - it's actually _easier_ to
read and maintain if it's all done in one place.

> [...]                                         (A medium-sized --
> i.e. small but not tiny -- routine that's called a lot shouldn't
> necessarily be inlined.  Code size still matters.)

I agree, with a proviso: code should _always_ be 'inlined' if the user
explicitly asks for it.  Otherwise, the rules for automatic 'inlining'
should be:

   1) 'Inline' if the procedure/method is only called once in the
      entire call tree.
   2) 'Inline' if the procedure/method body is smaller than some multiple
      of the call/return overhead code (including register save/restores).
   Otherwise, don't 'inline'.
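
In other words, something like this (a hypothetical sketch - the names and
the threshold are mine, not anybody's actual compiler):

    int should_inline(int user_asked, int calls_in_tree,
                      int body_size, int call_overhead_size)
    {
        const int MULTIPLE = 4;            /* "some multiple" - tunable */

        if (user_asked)                    /* always honor an explicit request */
            return 1;
        if (calls_in_tree == 1)            /* rule 1: called only once in the call tree */
            return 1;
        if (body_size < MULTIPLE * call_overhead_size)
            return 1;                      /* rule 2: body smaller than k * overhead */
        return 0;                          /* otherwise, don't */
    }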

Of course, this requires a language environment which allows 'inlining'
at all.  I am hopeful that this ability may soon (within this decade) be
commonplace.  But, who knows?

J. Giles

amull@Morgan.COM (Andrew P. Mullhaupt) (03/20/90)

The discussion so far about C's excess function call overhead hasn't
really differentiated between the two sides of the efficiency concern.
There are two distinct points to be made, and they come from different
points of view.

Most programmers might say that they are not overly concerned with the
efficiency of a single function call. In fact, they would be happier
if they could be completely unconcerned with the issue. These are the
people (myself totally included) who want to use functions to properly
structure their code. 

A small minority, (perhaps they have to write high pressure versions
of algorithms which are rich in complex recursive data structures,
or they implement 2-D random choice methods, or Multigrid methods,
or some other important example of a function call overhead rich
application), might have to admit that they have been forced to
write their source code around reducing function call overhead.

The distance between these two points of view might be shorter than
it seems at first. Both groups are concerned with function call
overhead, but the first group are adequately served by their compilers,
and the second group have to strap on their big boots and do it on
their own. The first group are _vicariously_ concerned with function
call overhead. We want the compiler to apply every exigency and
subterfuge to arrive at the last word in object code, no matter how
much readability liberty we have taken in the source. (So we have
no need to worry about this issue when coding.)

C compilers have to treat this issue seriously, since C often uses
a slower calling convention than Pascal or FORTRAN, in order that
C can support functions with a nonconstant number (and/or type) of
arguments. Many C compilers are specially equipped to generate good
code for function calls despite this possible handicap, and it is
entirely to their credit. But this is no reason to ignore the point.
The way that I find myself normally thinking about function call
overhead is when I wonder if I've somehow put my compiler off the
scent unnecessarily, or if I have to consider this possibility while
troubleshooting something else that isn't performing up to scratch.
It is at these times when members of the first group of programmers
("let the compiler take care of it") have to think like the second
group ("leggo my microsecond"). 

Later,
Andrew Mullhaupt

scs@athena.mit.edu (Steve Summit) (03/21/90)

In article <14278@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <1990Mar17.190858.13930@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
>> It is unfortunate that perpetual concerns about microoptimization
>> tend to suggest that function call overhead is unacceptable and
>> should be avoided.
>
>I quite agree.  No one in this thread of discussion has suggested otherwise,
>certainly not me.  In fact, I have argued this very point on this newsgroup
>in the recent past...

I hope it didn't sound like I had singled you out for criticism.
(I should have said "seem to suggest.")  I haven't been keeping
up with this list of late, and so wouldn't have seen your earlier
comments.

>> (Remember, too, that function calls on any machine are by no
>> means "slow."  This discussion is splitting hairs between
>> "extremely fast" and "ridiculously fast..."
>
>This is simply not true.  Procedure calls are the slowest 'single operations'
>that high-level languages provide...
>If you claim that procedure calls are fast, let's see your benchmark.

I knew I was gonna get reamed on this one, and you're all going
to groan when you hear the justification I had in mind all along.
My benchmark is from a PDP 11/23, using Ritchie's compiler.  I
measure approximately 60 microseconds per subroutine call, plus
about 4 microseconds per argument pushed.  (This in spite of the
fact that small run-time subroutines are called in the midst of
procedure call and return to build up and tear down the stack
frame.)

60 microseconds may not seem "fast" when compared with more
modern processors, but I assure you that, in real terms, it is a
tiny period of time indeed.  You and I can't do much of anything
in 60 microseconds.  One of the facts inevitably overlooked in
efficiency discussions (even more often than the fact that it
shouldn't be worried about too early) is the fact that computers
are intrinsically very, very fast.  It's their job.  And for most
of the code in most programs, they are quite fast enough.

I know y'all know this, as in fact I know that there are times
when efficiency does matter.  I just feel compelled to scramble
up on this soapbox from time to time to make sure that the other
side gets heard.  The industry is not plagued by software that is
too slow.  It is plagued by software that is behind schedule,
full of bugs, and impossible to maintain.  Too many programmers
seize upon the flimsiest excuses to abandon clean, simple coding
in favor of some purportedly more efficient implementation, which
is why I get defensive whenever efficiency is discussed.

                                            Steve Summit
                                            scs@adam.mit.edu

sullivan@aqdata.uucp (Michael T. Sullivan) (03/22/90)

:From article <1990Mar21.061420.9862@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
> 
> I know y'all know this, as in fact I know that there are times
> when efficiency does matter.  I just feel compelled to scramble
> up on this soapbox from time to time to make sure that the other
> side gets heard.  The industry is not plagued by software that is
> too slow.  It is plagued by software that is behind schedule,
> full of bugs, and impossible to maintain.  Too many programmers
> seize upon the flimsiest excuses to abandon clean, simple coding
> in favor of some purportedly more efficient implementation, which
> is why I get defensive whenever efficiency is discussed.

Agreed.  I often see programmers who spend a lot of time trying to
speed up code that isn't causing a slowdown, and in fact there isn't
a slowdown in the first place.  Heck, I know people who don't want
to use shell scripts because they are slower than C.  But for what
shell scripts are used for, speed doesn't matter.  Same with a lot
of programs.  Most programs don't need to be sped up at all.  If
one does, THEN start micro-optimization but don't trash a program's
modularity and portability because of an imagined slowdown.
-- 
Michael Sullivan          uunet!jarthur!aqdata!sullivan
aQdata, Inc.              sullivan@aqdata.uucp
San Dimas, CA             +1 714 599 9992

jamiller@hpcupt1.HP.COM (Jim Miller) (03/22/90)

/ hpcupt1:comp.lang.c / jlg@lambda.UUCP (Jim Giles) /  3:07 pm  Mar 20, 1990 /
From article <29551@amdcad.AMD.COM>, by tim@nucleus.amd.com (Tim Olson):
> [...]
> It might be true that scientific routines written in FORTRAN may have
> this many live, non-overlapping variables to keep in registers, but I
> don't believe this is true in general.  Statistics from a large
> collection of programs and library routines (a mix of general and
> scientific applications written in C) show that of 782 functions (620
> of which were non-leaf functions), an average of 6.5 registers per
> function were live across function calls.

This statistic can only be interpreted in one way: the C compiler in
question didn't allocate registers very well.  Especially in scientific
packages, there are _HUGE_ numbers of 'live' _VALUES_ to deal with during
execution of even simple routines.  Vectors, arrays, lists, strings, etc,
are all being either produced or consumed.  The fact that none of these
_VALUES_ were in registers at the time of the call indicates one of two
things: 1) the code in question was fragmented to the point that most
procedures had only a few data items (and scalar at that), or 2) the
compiler simply wasn't using the registers to anywhere near their potential.
Since you imply the code was well written, I reject the first explanation.
That leaves the compiler.

My experience (I don't have statistics) with both Fortran and C is that
good compilers generally PACK the registers with as much live data as
possible.  Even an apparently pure scalar loop that does only 'simple'
operations may be 'unrolled' a few times to make better use of the
registers.  Compilers are becoming available that apply that optimization
automatically (so this isn't just a case which applies only to 'coder
enhanced' code).  If such a loop (and this is still the simple kind
mind you) had a procedure call imbedded in it, all the registers that
the procedure might use would have to be spilled to memory - and then
reloaded on return.  On the Cray, spilling and reloading just _ONE_
vector register is over 150 clocks (64 elements at one clock each
to and from memory plus time for the memory pipeline plus a little
overhead to set up stride, address, etc.).  This is _not_ a tiny
problem.
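
For instance (an illustrative sketch, not from any production code), a dot
product unrolled four ways keeps eight loads and four running sums live in
registers at once:

    double dot(const double *x, const double *y, int n)
    {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        int i;

        for (i = 0; i + 3 < n; i += 4) {   /* unrolled body: 8 loads, 4 multiply-adds */
            s0 += x[i]     * y[i];
            s1 += x[i + 1] * y[i + 1];
            s2 += x[i + 2] * y[i + 2];
            s3 += x[i + 3] * y[i + 3];
        }
        for (; i < n; i++)                 /* clean up any leftover elements */
            s0 += x[i] * y[i];

        return (s0 + s1) + (s2 + s3);
    }

Drop a procedure call into the middle of that loop and every one of those
registers may have to be spilled and reloaded around it.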

Again I say, this scheme of having 'preserved' vs. 'temp' registers
for procedure calls only _appears_ to save time.  In truth, it deprives
you of registers which could be put to use (at least if your compiler
was clever enough).  The only solution to the problem is to 'inline'
(or, at least, schedule registers globally).  At present, in most
environments, the only way to do this is by manually 'inlining' the
procedures.  Let's hope that more automatic solutions become generally
available in the next few years.

By the way, aside from a problem with aliasing with C, Fortran and C are
identical with respect to optimization, register allocation, etc.
So, your implied put-down of Fortran is not relevant in this context.

J. Giles
----------

jamiller@hpcupt1.HP.COM (Jim Miller) (03/22/90)

OOPS, sorry about the last note.
I didn't mean to send a duplicate of the message I was remarking
on.  Here is what I thought I was saying.
    jim - is my face red - miller

>> .......................  Statistics from a large
>> collection of programs and library routines (a mix of general and
>> scientific applications written in C) show that of 782 functions (620
>> of which were non-leaf functions), an average of 6.5 registers per
>> function were live across function calls.
>
>This statistic can only be interpreted in one way: the C compiler in
>question didn't allocate registers very well.  Especially in scientific
>packages, there are _HUGE_ numbers of 'live' _VALUES_ to deal with during
>execution of even simple routines.  Vectors, arrays, lists, strings, etc,
>are all being either produced or consumed.  The fact that none of these
>_VALUES_ were in registers at the time of the call indicates one of two
>things: 1) the code in question was fragmented to the point that most
>procedures had only a few data items (and scalar at that), or 2) the
>compiler simply wasn't using the registers to anywhere near their potential.
>Since you imply the code was well written, I reject the first explanation.
>That leaves the compiler.

"This statistic can only be interpreted in one way: ", oh really?
"a mix of ..." and the reply is "Especially in scientific packages ..."
No conflict, just fruit salad.  

Dividing up the registers into mine and yours is not necessarily
as bad as you think; there are lots of registers that get set for
temporary values which do not need to be preserved across procedure
calls.  If the compiler is clever, guess which registers get used for
the temporary values?  

Another complicating part of the problem is that with a fast cache, saving
in memory is not as much of a penalty as one might think.  One
must be careful about cache lines, working sets, etc.

Part of this discussion may be impacted by what machine one is using.
Cray (bunch), PDP-8(1), HP-PA(32), etc.  If one must save 512 registers,
often calling a subroutine to do some simple function is a real time waster
(factor of >2 in one application on a Cray).  However, with fewer
registers to dump (no matter how you manage it), the coder/compiler/OS
can and will use more small routines.

So, are we really talking about fruit salad (Cray vs. one part of the rest
of the world vs. another part vs. ...)?

    jim - it's my opinion so it's the truth - miller

peter@ficc.uu.net (Peter da Silva) (03/22/90)

> Agreed.  I often see programmers who spend a lot of time trying to
> speed up code that isn't causing a slowdown, and in fact there isn't
> a slowdown in the first place.  Heck, I know people who don't want
> to use shell scripts because they are slower than C...

Look at C News. It uses shell scripts extensively, but where it counts
it's orders of magnitude faster than B News.

Code smarter, not just faster.
-- 
 _--_|\  `-_-' Peter da Silva. +1 713 274 5180. <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v

jensting@skinfaxe.diku.dk (Jens Tingleff) (03/22/90)

sullivan@aqdata.uucp (Michael T. Sullivan) writes:

>:From article <1990Mar21.061420.9862@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
>> 
>> I know y'all know this, as in fact I know that there are times
[....]

>Agreed.  I often see programmers who spend a lot of time trying to
>speed up code that isn't causing a slowdown, and in fact there isn't
[..]

Quite right. I always repeat to myself, whenever I'm urged to do an
unmotivated source code ``optimisation'', 

	MAKE THE THING WORK, THEN MAKE IT FAST.

I also recall some advice from a CS textbook

	1. rule on improvements in the form of source code optimisation :
		DON'T

very good rule that.

	Jens

jensting@diku.dk is
Jens Tingleff MSc EE, Research Assistent at DIKU
	Institute of Computer Science, Copenhagen University
Snail mail: DIKU Universitetsparken 1 DK2100 KBH O

jlg@lambda.UUCP (Jim Giles) (03/23/90)

From article <5940024@hpcupt1.HP.COM>, by jamiller@hpcupt1.HP.COM (Jim Miller):
> [...]
> Dividing up the registers into mine and yours is not necessarily
> as bad as you think; there are lots of registers that get set for
> temporary values which do not need to be preserved across procedure
> calls.  If the compiler is clever, guess which registers get used for
> the temporary values?  

Well, the first N temporary values anyway.  Meanwhile, the caller had
to save all the values that _it_ had stored in those registers before
it could call the procedure.  I have yet to come across a high-level
language which made good use of the registers AND had a fast procedure
call mechanism.

By the way, in general, it's better for a compiler to make good use
of the registers and then pay high procedure-call overhead: the user then
has some degree of control (he CAN 'inline' any critical procedures).
If the compiler makes bad use of the register set, there is nothing the
user can do about it.

But, one way or another, you pay for procedure calls in implementations
which don't have extensive interprocedural analysis built-in.

J. Giles

ruud@targon.UUCP (Ruud Harmsen) (03/23/90)

In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>
>	MAKE THE THING WORK, THEN MAKE IT FAST.
>
True.  But only with one very important footnote:

While making the thing just work, do consider performance, and know where and
how you might want to optimise later.
If in the design of a programme or system you totally disregard all
performance issues, you might have to rebuild large parts from scratch,
and/or make the whole thing totally incomprehensible when optimising.

I've seen things like that happen, and I can tell you it's a tragic sight
to watch.  A much better approach is to have the spots for optimization
marked and planned, so it is easy to fulfill, will be really effective
and does not violate the original design.

mph@lion.inmos.co.uk (Mike Harrison) (03/23/90)

In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>sullivan@aqdata.uucp (Michael T. Sullivan) writes:
> ...
>Quite right. I always repeat to myself, whenever I'm urged to do an
>unmotivated source code ``optimisation'', 
>
>	MAKE THE THING WORK, THEN MAKE IT FAST.
>
The formulation of this rule that I always use is:

"It is easier to make a correct, slow program run fast than to make a fast,
wrong program run right".


On the subject of rules, I once saw this one in a book:

  "The first law of system design : 

       Never check for an error condition that you don't know how to handle!"

This sounds like the fruit of bitter experience.

Mike,


Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk             with STANDARD_DISCLAIMERS;
US : mph@inmos.com               use  STANDARD_DISCLAIMERS;

cik@l.cc.purdue.edu (Herman Rubin) (03/24/90)

In article <1084@targon.UUCP>, ruud@targon.UUCP (Ruud Harmsen) writes:
> In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
> >
> >	MAKE THE THING WORK, THEN MAKE IT FAST.
> >
> True.  But only with one very important footnote:
> 
> While making the thing just work, do consider performance, and know where and
> how you might want to optimise later.
> If in the design of a programme or system you totally disregard all
> performance issues, you might have to rebuild large parts from scratch,
> and/or make the whole thing totally incomprehensible when optimising.
> 
> I've seen things like that happen, and I can tell you it's a tragic sight
> to watch.  A much better approach is to have the spots for optimization
> marked and planned, so it is easy to fulfill, will be really effective
> and does not violate the original design.

There are far too many algorithms which work, but cannot be speeded up
appreciably, while other algorithms are very much faster.  The choice of
the algorithm depends on what can be done in the hardware.

A good example is the frexp() function in 4.2BSD.  This was (ugh) coded in C
in a straightforward manner in the implementations I have seen.  It could
have been speeded up in the (unfortunately too common) bad cases, but only
by using additional resources and running slower in the better cases.  It
worked, but could not be made fast.

However, a semi-portable machine-dependent algorithm, deliberately ignoring
typing, could be written which is in the ballpark of machine language 
algorithms.  Starting with these, one could make it fast.  But unless
the operations of pack and unpack are in the language, this cannot be
done otherwise.
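
Something along these lines, say - a sketch only, assuming 64-bit IEEE-754
doubles and a 64-bit unsigned integer type (spelled unsigned long long
here), and ignoring zero, denormals, infinities and NaNs:

    #include <string.h>

    /* Unpack the exponent field directly from the stored representation
       instead of looping with repeated multiplies and divides. */
    double fast_frexp(double x, int *exp)
    {
        unsigned long long bits;

        memcpy(&bits, &x, sizeof bits);              /* "ignore typing": reinterpret the bits */
        *exp = (int)((bits >> 52) & 0x7FF) - 1022;   /* biased exponent -> frexp exponent */
        bits = (bits & ~(0x7FFULL << 52)) | (0x3FEULL << 52);  /* force value into [0.5, 1) */
        memcpy(&x, &bits, sizeof x);
        return x;
    }

On a machine without IEEE format (the Cray again), the shifts and masks
change, but the idea is the same.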
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

henry@utzoo.uucp (Henry Spencer) (03/25/90)

In article <1084@targon.UUCP> ruud@targon.UUCP (Ruud Harmsen) writes:
>If in the design of a programme or system you totally disregard all
>performance issues, you might have to rebuild large parts from scratch,
>and/or make the whole thing totally incomprehensible when optimising.

If you are seriously bent on high performance, however, you *will* have
to rebuild significant parts from scratch, possibly several times.  Humans
simply cannot predict bottlenecks well enough to provide for them ahead
of time, and one gets much more mileage out of rethinking and reimplementing
code than out of tweaking it.  There are parts of C News that have been
reimplemented five or six times.
-- 
"Apparently IBM is not IBM    |     Henry Spencer at U of Toronto Zoology
compatible." -Andy Tanenbaum  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

tneff@bfmny0.UU.NET (Tom Neff) (03/26/90)

I.	Code it portably and soundly.

II.	Identify the places where the CPU wastes most of its time.

III.	Replace those sections, CONDITIONALLY if possible, with less
	portable, faster code.
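
A concrete (if made-up) illustration of step III: keep the portable version
and select a machine-specific one at compile time, so the fast code is an
option rather than a replacement.

    #ifdef HAVE_FAST_WORD_COPY
    /* less portable, faster: assumes word-aligned buffers whose length
       is a multiple of the word size */
    void copy_block(char *dst, const char *src, unsigned n)
    {
        unsigned long *d = (unsigned long *)dst;
        const unsigned long *s = (const unsigned long *)src;
        unsigned words = n / sizeof(unsigned long);

        while (words--)
            *d++ = *s++;
    }
    #else
    /* the portable version from step I */
    void copy_block(char *dst, const char *src, unsigned n)
    {
        while (n--)
            *dst++ = *src++;
    }
    #endif

The portable version stays in the source as both documentation and fallback.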

Every good programmer KNOWS this works, and saves time in the long run.
That's precisely why some programmers HATE it, and waste words railing
against it.  It's like eating your vegetables.  Programming, as a field,
does not venerate vegetable eaters. :-)

The truth is that the squeaky wheel gets the grease, -or- no news is good
news.  The programs out there that are written RIGHT and optimized RIGHT
are the ones you never hear about, because they don't make news, they
just WORK.
-- 
Annex Canada now!  We need the room,    \)      Tom Neff
    and who's going to stop us.         (\      tneff@bfmny0.UU.NET

ge@kunivv1.sci.kun.nl (Ge' Weijers) (03/28/90)

ruud@targon.UUCP (Ruud Harmsen) writes:

>In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>>
>>	MAKE THE THING WORK, THEN MAKE IT FAST.
>>
>True.  But only with one very important footnote:

>While making the thing just work, do consider performance, and know where and
>how you might want to optimise later.
>If in the design of a programme or system you totally disregard all
>performance issues, you might have to rebuild large parts from scratch,
>and/or make the whole thing totally incomprehensible when optimising.

>I've seen things like that happen, and I can tell you it's a tragic sight
>to watch.  A much better approach is to have the spots for optimization
>marked and planned, so it is easy to fulfill, will be really effective
>and does not violate the original design.

I have to second that. You have to look at the feasibility of what you are
doing. If your solution to a problem is O(e ** N) for large N, look for a
better solution.

You still need to do some rough calculations.
Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1         
6525 ED Nijmegen, the Netherlands              tel. +3180612483 (UTC-2)

EGNILGES@pucc.Princeton.EDU (Ed Nilges) (03/29/90)

In article <15286@bfmny0.UU.NET>, tneff@bfmny0.UU.NET (Tom Neff) writes:

>I.      Code it portably and soundly.
>
>II.     Identify the places where the CPU wastes most of its time.
>
>III.    Replace those sections, CONDITIONALLY if possible, with less
>        portable, faster code.
>
It needed to be said: thank you for saying it.  Niklaus Wirth sounded
a note of sanity at the 1987 ASPLOS conference when he appealed to
an unsympathetic audience of microefficiency-obsessed RISC designers
for reliability and APPROPRIATE efficiency.  Obsession with "efficiency"
at the cost of reliability is not limited to hardware design (indeed,
hardware designers seem to have a greater loyalty to basic reliability
than software designers).

If efficiency is important and reliability is not, here is the greatest
program in the world.  It looks trivial, but I tested it and it seems
to work.  It's even got a comment to tell what it is.


     /***** The greatest program in the world *****/
     main()
     {
     return;
     }


Although the program does not appear to meet the specifications of the
project that a given programmer may be working on at present, if
reliability truly does not matter then the above program is the all-purpose
panacea (for somewhat the same reason that n*0 is 0 for any n).
It's also guaranteed not to upset the user by having features
he doesn't like.  It's the world's first Taoist program, for "it is
in non-action that we discover the mysterious beginnings of the
universe".

That was a joke ((-8), but long ago Gerald Weinberg told a story
(in his THE PSYCHOLOGY OF COMPUTER PROGRAMMING, Van Nostrand Reinhold
1971) of a young programmer who had a slightly "inefficient" table-
driven solution to a problem in automotive manufacture: determining
if a given combination of options was feasible.  When challenged on
the "inefficiency" of his solution, the programmer replied with a
variant of the above program.  The "efficient", non-table-driven
program (which was rife with bugs) didn't work.  Therefore,
correctness was not a necessity and the above "do-nothing", Taoist program
was even better...because it is more efficient!

>Every good programmer KNOWS this works, and saves time in the long run.
>That's precisely why some programmers HATE it, and waste words railing
>against it.  It's like eating your vegetables.  Programming, as a field,
>does not venerate vegetable eaters. :-)

It is gradually becoming worse in this regard as it is "rationalized":
as we get more and more MBAs and MSCSs and fewer good programmers.
(Note: yes, you CAN be a good programmer and have an MSCS, or even
an MBA...so turn those flamethrowers off.)

Soon we'll be as "efficient" as the American auto industry, where
a simple procedure for avoiding rust (E-coat) was rejected by the
Big Three for years until finally Japanese competition forced their
hand.  E-coat wasn't an "efficient" way of building a car since it
involved one extra step: it was more "efficient" to make the driver
(who was, twenty years ago, at the mercy of Detroit) spend hundreds
of dollars at Ziebart or Rusty Jones (in dark moments, this reminds
me of the RISC philosophy: make the compiler writer do all the work).
>
>The truth is that the squeaky wheel gets the grease, -or- no news is good
>news.  The programs out there that are written RIGHT and optimized RIGHT
>are the ones you never hear about, because they don't make news, they
>just WORK.

...and the poor programmers get "consulting" assignments, at 40 bucks
an hour, fixing bugs they coded at their former jobs.  It's a
wicked world.