scs@athena.mit.edu (Steve Summit) (03/18/90)
It is unfortunate that perpetual concerns about microoptimization tend to suggest that function call overhead is unacceptable and should be avoided.  For most programs, readability and maintainability are far more important than efficiency, and good modularity (accompanied by lots of function calls) is of course an important property of a well-written, readable and maintainable program.  I hope that the abstruse and low-level function call details being discussed here won't discourage people from moving code to subroutines whenever it seems appropriate.  Worrying about register allocation and cycle shaving should be of concern only to compiler writers.

(Remember, too, that function calls on any machine are by no means "slow."  This discussion is splitting hairs between "extremely fast" and "ridiculously fast;" the concerns about having to save state in a stack frame merely indicate that it is nontrivial to make things ridiculously fast.)

In article <14273@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>It seems to me that if a procedure is so small that it can only find
>use for 24 (or fewer) registers, then it is small enough to be a real
>good candidate for inlining...  However, most procedures
>of any size can benefit from quite a few more registers than 24.

I'd dispute this, and I'm not even religious about keeping all of my subroutines smaller than a page or anything.  Few of my routines even have 24 variables, let alone 24 variables that have good reason to be put in registers.  (On many machines, it's better to leave a variable that's only used once or twice out of a register, because it would take as long to save and restore the previous register contents -- on the stack -- as it would to access the variable "conventionally" -- on the stack.)  I understand that a good compiler might find use for additional registers to store intermediate values not explicitly named by the coder, but still, 24 seems like an awful lot.
Good code, particularly object-oriented code, has lots and lots of small routines, but they don't all need inlining.  (A medium-sized -- i.e. small but not tiny -- routine that's called a lot shouldn't necessarily be inlined.  Code size still matters.)

                                            Steve Summit
                                            scs@adam.mit.edu
jlg@lambda.UUCP (Jim Giles) (03/20/90)
From article <1990Mar17.190858.13930@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
> It is unfortunate that perpetual concerns about microoptimization
> tend to suggest that function call overhead is unacceptable and
> should be avoided.  For most programs, readability and
> maintainability are far more important than efficiency, and good
> modularity (accompanied by lots of function calls) is of course
> an important property of a well-written, readable and
> maintainable program.

I quite agree.  No one in this thread of discussion has suggested otherwise, certainly not me.  In fact, I have argued this very point on this newsgroup in the recent past (against someone who thought the programmer should do his own common expression eliminations, etc.).  If your most critical code issue is readability, maintainability, or just_get_it_out_the_door_before_the_deadline, then concentration on optimization issues is not likely to pay off.  HOWEVER, there _ARE_ some contexts where speed is everything.  Procedure call overhead is one area where the compiler/environment cannot yet be expected to do the optimizations for you (in fact, the compiler _CAN'T_ do it if separate compilation is allowed).

> [...]
> (Remember, too, that function calls on any machine are by no
> means "slow."  This discussion is splitting hairs between
> "extremely fast" and "ridiculously fast;" the concerns about
> having to save state in a stack frame merely indicate that it is
> nontrivial to make things ridiculously fast.)

This is simply not true.  Procedure calls are the slowest 'single operations' that high-level languages provide.  On the Cray, for example, call/return overhead varies from a low of several dozen to a high of a few hundred clocks!  In fact, for 'baselevel' or 'leaf' procedures, the overhead often exceeds the time spent in the procedure by multiple factors.  If you claim that procedure calls are fast, let's see your benchmark.

Note: a method for benchmarking call/return overhead.
Take a production code.  Fetch the real-time clock before and after a 'leaf' procedure call.  'Inline' the procedure and time the 'inlined' code.  Compare the times.  No fair using 'dummy' drivers to make the call - you need an active 'caller' which has 'live' values to be saved/restored.

> In article <14273@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>> [... 24 registers is a small ceiling ... ]
>
> I'd dispute this, and I'm not even religious about keeping all of
> my subroutines smaller than a page or anything.  Few of my
> routines even have 24 variables, let alone 24 variables that have
> good reason to be put in registers.  [...]

The smallest routine I've ever written (this was for a class, years ago) was a matrix multiply on fixed-sized matrices (they were always guaranteed to be 3 by 3).  Say the array arguments are A, B, and C - I want to compute the value of A=B*C.  Each of the 18 elements of B and C is used 3 times during the calculation - each of the 9 elements of A is accumulated into 3 times.  For efficiency, you want to load each element of the arguments only once.  Similarly, you want to store each element of A only once.  The rest of the time, you want the data in registers.  If speed were paramount, I would unroll the loops of the matrix multiply and do the whole thing explicitly.  I might be able to find an order in which to do the calculation which only needed 24 registers, but the obvious method would be to put each of the array elements in a separate register - that's 27 'live' values!

Note that the above was the _smallest_ procedure I've written!  Normally I would not even consider doing matrix multiply as a procedure call (the idiom of three nested loops, etc., is simple to read and widely recognized and understood).  In a language like C, this problem would be a good candidate for a MACRO.  To be sure, there are certainly procedures which don't need as many as 24 registers.
But, if speed were paramount, I suspect that nearly all of these would be immediate candidates for 'inlining'.  The only exceptions would be assembly routines which used hardware/instructions not accessible from the high-level language.

> [...]  Good code,
> particularly object-oriented code, has lots and lots of small
> routines, but they don't all need inlining.

I disagree.  Good code is neither monolithic nor fragmented.  There is a compromise between these two tendencies which is better than either.  A lot of the OOP that I've seen leans way too far toward fragmentation.  I've seen a matrix multiply method implemented as a sequence of calls to a vector dot product method, which in turn does a vector elemental multiply method and a vector summation method.  The problem is that matrix multiply is a _single_ operation - it's actually _easier_ to read and maintain if it's all done in one place.

> [...]  (A medium-sized --
> i.e. small but not tiny -- routine that's called a lot shouldn't
> necessarily be inlined.  Code size still matters.)

I agree, with a proviso: code should _always_ be 'inlined' if the user explicitly asks for it.  Otherwise, the rules for automatic 'inlining' should be:

1) 'Inline' if the procedure/method is only called once in the entire call tree.

2) 'Inline' if the procedure/method body is smaller than some multiple of the call/return overhead code (including register save/restores).

Otherwise, don't 'inline'.  Of course, this requires a language environment which allows 'inlining' at all.  I am hopeful that this ability may soon (within this decade) be commonplace.  But, who knows?

J. Giles
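[A minimal C sketch of the fully unrolled 3x3 multiply described in the post above.  matmul3 is an illustrative name, not from the original article; the point it shows is that the straightforward unrolling keeps all 27 scalars (18 loads plus 9 results) live at once, which is exactly the register-pressure argument being made.]

```c
#include <assert.h>

/* A = B*C for fixed 3x3 matrices, fully unrolled.  Each element of
   B and C is loaded exactly once, each element of A stored exactly
   once; everything else stays in (what we hope become) registers. */
void matmul3(double A[3][3], const double B[3][3], const double C[3][3])
{
    double b00 = B[0][0], b01 = B[0][1], b02 = B[0][2];
    double b10 = B[1][0], b11 = B[1][1], b12 = B[1][2];
    double b20 = B[2][0], b21 = B[2][1], b22 = B[2][2];
    double c00 = C[0][0], c01 = C[0][1], c02 = C[0][2];
    double c10 = C[1][0], c11 = C[1][1], c12 = C[1][2];
    double c20 = C[2][0], c21 = C[2][1], c22 = C[2][2];

    A[0][0] = b00*c00 + b01*c10 + b02*c20;  /* 27 values live here */
    A[0][1] = b00*c01 + b01*c11 + b02*c21;
    A[0][2] = b00*c02 + b01*c12 + b02*c22;
    A[1][0] = b10*c00 + b11*c10 + b12*c20;
    A[1][1] = b10*c01 + b11*c11 + b12*c21;
    A[1][2] = b10*c02 + b11*c12 + b12*c22;
    A[2][0] = b20*c00 + b21*c10 + b22*c20;
    A[2][1] = b20*c01 + b21*c11 + b22*c21;
    A[2][2] = b20*c02 + b21*c12 + b22*c22;
}
```

[Wrapping the same body in a #define, as the post suggests for C, would get the inlining without waiting for a compiler that can do it.]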
amull@Morgan.COM (Andrew P. Mullhaupt) (03/20/90)
The discussion so far about C's excess function call overhead hasn't really differentiated between the two sides of the efficiency concern.  There are two distinct points to be made, and they come from different points of view.

Most programmers might say that they are not overly concerned with the efficiency of a single function call.  In fact, they would be happier if they could be completely unconcerned with the issue.  These are the people (myself totally included) who want to use functions to properly structure their code.  A small minority (perhaps they have to write high pressure versions of algorithms which are rich in complex recursive data structures, or they implement 2-D random choice methods, or Multigrid methods, or some other important example of a function-call-overhead-rich application) might have to admit that they have been forced to write their source code around reducing function call overhead.

The distance between these two points of view might be shorter than it seems at first.  Both groups are concerned with function call overhead, but the first group are adequately served by their compilers, and the second group have to strap on their big boots and do it on their own.  The first group are _vicariously_ concerned with function call overhead.  We want the compiler to apply every exigency and subterfuge to arrive at the last word in object code, no matter how much readability liberty we have taken in the source.  (So we have no need to worry about this issue when coding.)

C compilers have to treat this issue seriously, since C often uses a slower calling convention than Pascal or FORTRAN, in order that C can support functions with a nonconstant number (and/or type) of arguments.  Many C compilers are specially equipped to generate good code for function calls despite this possible handicap, and it is entirely to their credit.  But this is no reason to ignore the point.
The way I normally find myself thinking about function call overhead is when I wonder if I've somehow put my compiler off the scent unnecessarily, or when I have to consider this possibility while troubleshooting something else that isn't performing up to scratch.  It is at these times that members of the first group of programmers ("let the compiler take care of it") have to think like the second group ("leggo my microsecond").

Later,
Andrew Mullhaupt
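[A minimal sketch of the C feature blamed above for the slower calling convention: a function taking a nonconstant number of arguments.  Since the callee cannot know how many arguments were passed, the caller must pop them, which on many implementations costs extra code at every call site.  sum_ints is an illustrative name, not from the original post.]

```c
#include <stdarg.h>

/* Sum a variable number of int arguments; the leading count tells
   the callee how many to fetch, since the language itself cannot. */
int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);   /* fetch each int argument in turn */
    va_end(ap);
    return total;
}
```

[A Pascal or FORTRAN compiler, knowing the argument count at every call, can let the callee clean up in one place instead.]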
scs@athena.mit.edu (Steve Summit) (03/21/90)
In article <14278@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes:
>From article <1990Mar17.190858.13930@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
>> It is unfortunate that perpetual concerns about microoptimization
>> tend to suggest that function call overhead is unacceptable and
>> should be avoided.
>
>I quite agree.  No one in this thread of discussion has suggested otherwise,
>certainly not me.  In fact, I have argued this very point on this newsgroup
>in the recent past...

I hope it didn't sound like I had singled you out for criticism.  (I should have said "seem to suggest.")  I haven't been keeping up with this list of late, and so wouldn't have seen your earlier comments.

>> (Remember, too, that function calls on any machine are by no
>> means "slow."  This discussion is splitting hairs between
>> "extremely fast" and "ridiculously fast..."
>
>This is simply not true.  Procedure calls are the slowest 'single operations'
>that high-level languages provide...
>If you claim that procedure calls are fast, let's see your benchmark.

I knew I was gonna get reamed on this one, and you're all going to groan when you hear the justification I had in mind all along.  My benchmark is from a PDP 11/23, using Ritchie's compiler.  I measure approximately 60 microseconds per subroutine call, plus about 4 microseconds per argument pushed.  (This in spite of the fact that small run-time subroutines are called in the midst of procedure call and return to build up and tear down the stack frame.)  60 microseconds may not seem "fast" when compared with more modern processors, but I assure you that, in real terms, it is a tiny period of time indeed.  You and I can't do much of anything in 60 microseconds.  One of the facts inevitably overlooked in efficiency discussions (even more often than the fact that it shouldn't be worried about too early) is the fact that computers are intrinsically very, very fast.  It's their job.
And for most of the code in most programs, they are quite fast enough.

I know y'all know this, as in fact I know that there are times when efficiency does matter.  I just feel compelled to scramble up on this soapbox from time to time to make sure that the other side gets heard.  The industry is not plagued by software that is too slow.  It is plagued by software that is behind schedule, full of bugs, and impossible to maintain.  Too many programmers seize upon the flimsiest excuses to abandon clean, simple coding in favor of some purportedly more efficient implementation, which is why I get defensive whenever efficiency is discussed.

                                            Steve Summit
                                            scs@adam.mit.edu
sullivan@aqdata.uucp (Michael T. Sullivan) (03/22/90)
From article <1990Mar21.061420.9862@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
> I know y'all know this, as in fact I know that there are times
> when efficiency does matter.  I just feel compelled to scramble
> up on this soapbox from time to time to make sure that the other
> side gets heard.  The industry is not plagued by software that is
> too slow.  It is plagued by software that is behind schedule,
> full of bugs, and impossible to maintain.  Too many programmers
> seize upon the flimsiest excuses to abandon clean, simple coding
> in favor of some purportedly more efficient implementation, which
> is why I get defensive whenever efficiency is discussed.

Agreed.  I often see programmers who spend a lot of time trying to speed up code that isn't causing a slowdown - in fact, there isn't a slowdown in the first place.  Heck, I know people who don't want to use shell scripts because they are slower than C.  But for what shell scripts are used for, speed doesn't matter.  Same with a lot of programs.  Most programs don't need to be sped up at all.  If one does, THEN start micro-optimization, but don't trash a program's modularity and portability because of an imagined slowdown.

--
Michael Sullivan                     uunet!jarthur!aqdata!sullivan
aQdata, Inc.                         sullivan@aqdata.uucp
San Dimas, CA                        +1 714 599 9992
peter@ficc.uu.net (Peter da Silva) (03/22/90)
> Agreed.  I often see programmers who spend a lot of time trying to
> speed up code that isn't causing a slowdown, and in fact there isn't
> a slowdown in the first place.  Heck, I know people who don't want
> to use shell scripts because they are slower than C...

Look at C News.  It uses shell scripts extensively, but where it counts it's orders of magnitude faster than B News.  Code smarter, not just faster.
--
 _--_|\  `-_-' Peter da Silva.  +1 713 274 5180.  <peter@ficc.uu.net>.
/      \  'U`
\_.--._/
      v
jensting@skinfaxe.diku.dk (Jens Tingleff) (03/22/90)
sullivan@aqdata.uucp (Michael T. Sullivan) writes:
>From article <1990Mar21.061420.9862@athena.mit.edu>, by scs@athena.mit.edu (Steve Summit):
>>
>> I know y'all know this, as in fact I know that there are times
[....]
>Agreed.  I often see programmers who spend a lot of time trying to
>speed up code that isn't causing a slowdown, and in fact there isn't
[..]

Quite right.  I always repeat to myself, whenever I'm urged to do an unmotivated source code ``optimisation'',

	MAKE THE THING WORK, THEN MAKE IT FAST.

I also recall some advice from a CS textbook.  Its first rule on improvements in the form of source code optimisation: DON'T.

Very good rule, that.

	Jens

jensting@diku.dk is Jens Tingleff MSc EE, Research Assistant at DIKU
	(Institute of Computer Science, Copenhagen University)
Snail mail: DIKU Universitetsparken 1 DK2100 KBH O
ruud@targon.UUCP (Ruud Harmsen) (03/23/90)
In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>
>	MAKE THE THING WORK, THEN MAKE IT FAST.
>

True.  But only with one very important footnote: while making the thing just work, do consider performance, and know where and how you might want to optimise later.  If in the design of a programme or system you totally disregard all performance issues, you might have to rebuild large parts from scratch, and/or make the whole thing totally incomprehensible when optimising.

I've seen things like that happen, and I can tell you it's a tragic sight to watch.  A much better approach is to have the spots for optimization marked and planned, so that the optimization is easy to carry out, will be really effective, and does not violate the original design.
mph@lion.inmos.co.uk (Mike Harrison) (03/23/90)
In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>sullivan@aqdata.uucp (Michael T. Sullivan) writes:
> ...
>Quite right.  I always repeat to myself, whenever I'm urged to do an
>unmotivated source code ``optimisation'',
>
>	MAKE THE THING WORK, THEN MAKE IT FAST.
>

The formulation of this rule that I always use is:

	"It is easier to make a correct, slow program run fast
	 than to make a fast, wrong program run right."

On the subject of rules, I once saw this one in a book:

	"The first law of system design: Never check for an
	 error condition that you don't know how to handle!"

This sounds like the fruit of bitter experience.

Mike,

Michael P. Harrison - Software Group - Inmos Ltd. UK.
-----------------------------------------------------------
UK : mph@inmos.co.uk    with STANDARD_DISCLAIMERS;
US : mph@inmos.com      use  STANDARD_DISCLAIMERS;
cik@l.cc.purdue.edu (Herman Rubin) (03/24/90)
In article <1084@targon.UUCP>, ruud@targon.UUCP (Ruud Harmsen) writes:
> In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
> >
> >	MAKE THE THING WORK, THEN MAKE IT FAST.
> >
> True.  But only with one very important footnote:
>
> While making the thing just work, do consider performance, and know where
> and how you might want to optimise later.
> If in the design of a programme or system you totally disregard all
> performance issues, you might have to rebuild large parts from scratch,
> and/or make the whole thing totally incomprehensible when optimising.
>
> I've seen things like that happen, and I can tell you it's a tragic sight
> to watch.  A much better approach is to have the spots for optimization
> marked and planned, so it is easy to fulfill, will be really effective
> and does not violate the original design.

There are far too many algorithms which work but cannot be speeded up appreciably, while other algorithms are very much faster.  The choice of algorithm depends on what can be done in the hardware.

A good example is the frexp() function in 4.2BSD.  In the implementations I have seen, this was (ugh) coded in C in a straightforward manner.  It could have been speeded up in the (unfortunately too common) bad cases, but only by using additional resources and running slower in the better cases.  It worked, but could not be made fast.  However, a semi-portable machine-dependent algorithm, deliberately ignoring typing, could be written which is in the ballpark of machine language algorithms.  Starting with that, one could make it fast.  But unless the operations of pack and unpack are in the language, this cannot be done otherwise.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
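[A minimal sketch of the "unpack" idea Rubin describes, written in modern C purely for illustration (stdint.h did not exist in 1990, and this is not the 4.2BSD code).  It assumes IEEE 754 doubles - not the Cray or PDP-11 formats discussed in this thread - and fast_frexp is an illustrative name.  Only zero and normalized finite values are handled; denormals, infinities and NaNs are outside this sketch.]

```c
#include <stdint.h>
#include <string.h>

/* Extract the fraction and exponent of an IEEE 754 double directly
   from its bit pattern, instead of the loop-based portable method.
   Result f satisfies x == f * 2^e with f in [0.5, 1), as frexp() does. */
double fast_frexp(double x, int *e)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* "unpack", ignoring typing */

    int biased = (int)((bits >> 52) & 0x7FF);
    if (biased == 0) {                     /* zero (denormals not handled) */
        *e = 0;
        return 0.0;
    }
    *e = biased - 1022;                    /* unbias so fraction is in [0.5, 1) */
    bits = (bits & ~(0x7FFULL << 52)) | (1022ULL << 52);
    memcpy(&x, &bits, sizeof x);           /* "pack" the new exponent back in */
    return x;
}
```

[One masked shift and one field replacement, with no loops: this is the ballpark-of-machine-language behavior Rubin is asking for, but only because the format's layout is exploited directly.]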
henry@utzoo.uucp (Henry Spencer) (03/25/90)
In article <1084@targon.UUCP> ruud@targon.UUCP (Ruud Harmsen) writes:
>If in the design of a programme or system you totally disregard all
>performance issues, you might have to rebuild large parts from scratch,
>and/or make the whole thing totally incomprehensible when optimising.

If you are seriously bent on high performance, however, you *will* have to rebuild significant parts from scratch, possibly several times.  Humans simply cannot predict bottlenecks well enough to provide for them ahead of time, and one gets much more mileage out of rethinking and reimplementing code than out of tweaking it.  There are parts of C News that have been reimplemented five or six times.
--
"Apparently IBM is not IBM      | Henry Spencer at U of Toronto Zoology
compatible." -Andy Tanenbaum    | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
tneff@bfmny0.UU.NET (Tom Neff) (03/26/90)
I.   Code it portably and soundly.
II.  Identify the places where the CPU wastes most of its time.
III. Replace those sections, CONDITIONALLY if possible, with less
     portable, faster code.

Every good programmer KNOWS this works, and saves time in the long run.  That's precisely why some programmers HATE it, and waste words railing against it.  It's like eating your vegetables.  Programming, as a field, does not venerate vegetable eaters.  :-)

The truth is that the squeaky wheel gets the grease, -or- no news is good news.  The programs out there that are written RIGHT and optimized RIGHT are the ones you never hear about, because they don't make news, they just WORK.
--
Annex Canada now!  We need the room,    \)      Tom Neff
and who's going to stop us.             (\      tneff@bfmny0.UU.NET
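[A minimal sketch of step III's "CONDITIONALLY if possible": keep the portable version in the source, and select a faster variant only where a compile-time flag says it is known to be safe.  copy_words and FAST_MEMCPY are illustrative names, not from the original post.]

```c
#include <stddef.h>
#include <string.h>

/* Copy n longs from src to dst.  The portable loop is always there;
   the faster path is compiled in only when the build asks for it. */
void copy_words(long *dst, const long *src, size_t n)
{
#if defined(FAST_MEMCPY)                 /* hypothetical build-time flag */
    memcpy(dst, src, n * sizeof *dst);   /* tuned library block move */
#else
    size_t i;
    for (i = 0; i < n; i++)              /* portable fallback */
        dst[i] = src[i];
#endif
}
```

[Either way the program still works everywhere; the speedup is an option, not a dependency, which is the whole point of doing it conditionally.]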
ge@kunivv1.sci.kun.nl (Ge' Weijers) (03/28/90)
ruud@targon.UUCP (Ruud Harmsen) writes:
>In article <1990Mar22.072712.10902@diku.dk> jensting@skinfaxe.diku.dk (Jens Tingleff) writes:
>>
>>	MAKE THE THING WORK, THEN MAKE IT FAST.
>>
>True.  But only with one very important footnote:
>While making the thing just work, do consider performance, and know where and
>how you might want to optimise later.
>If in the design of a programme or system you totally disregard all
>performance issues, you might have to rebuild large parts from scratch,
>and/or make the whole thing totally incomprehensible when optimising.
>I've seen things like that happen, and I can tell you it's a tragic sight
>to watch.  A much better approach is to have the spots for optimization
>marked and planned, so it is easy to fulfill, will be really effective
>and does not violate the original design.

I have to second that.  You have to look at the feasibility of what you are doing.  If your solution to a problem is O(e ** N) for large N, look for a better solution.  You still need to do some rough calculations.

Ge' Weijers                                     Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,    (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1
6525 ED Nijmegen, the Netherlands
tel. +3180612483 (UTC-2)