martin@mwtech.UUCP (Martin Weitzel) (11/20/90)
In general, my advice is to use no more than two register variables, and only in the *outermost* block level in the body of any function.

Why? Listen:

If you have a modern, optimizing compiler, it will ignore the register keyword and make independent decisions about the use of CPU registers. If you have modern hardware with many available registers, most probably you also have a modern compiler, so there's no reason to worry about having defined "not enough" register variables.

So we can assume that register variables are mostly for software which may eventually be ported to some unknown ancient compiler that produces code for some unknown ancient hardware with very few registers. In this situation, it may in fact hurt performance if you specify too many register variables, because the compiler may put the wrong ones into the available registers.

Even if you trust the compiler to correctly implement what K&R-I mentions, namely that the register storage class is obeyed in the same order as the variables are defined, are you sure how the unknown ancient compiler will interpret the following example?

    foo()
    {
        register int a;
        .... /* some code using a */
        {
            register int b;
            .... /* some code using b */
        }
        .... /* some more code using a */
        {
            register int c;
            .... /* some code using c */
        }
        .... /* some more code using a */
    }

Take some ancient CPU with two registers available for local ints. The order in which the variables are declared is a-b-c, so c will not profit from its storage class. Or rather, will the compiler generate some code to save one register on each entry to the block defining c? Will it possibly even do so for the block defining b?

(Furthermore, don't forget that it may require more instructions to call another function if either the called or the calling function has register variables, because the CPU registers in use must be saved%.)
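A minimal sketch of the two-hint rule recommended at the top of this post, written in ANSI C: both `register` declarations sit in the function's outermost block, with the most heavily used variable declared first. `count_char` is a hypothetical example, not code from the thread, and on a modern optimizing compiler the hints will likely make no difference at all.

```c
#include <stddef.h>

/* Hypothetical example of the "at most two, outermost block only" rule.
 * The declaration order encodes priority: on a compiler that honours
 * only the first N register declarations, p is the one most worth keeping. */
size_t count_char(const char *s, int ch)
{
    register const char *p;   /* touched on every iteration: declared first */
    register size_t n = 0;    /* second most used */

    for (p = s; *p != '\0'; p++)
        if (*p == ch)
            n++;
    return n;
}
```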
Of course, if you know your particular compiler/CPU combination well and if you accept that your performance gain may well be a performance loss in case the program is ported anywhere else, you may carefully investigate which variables to put into registers to achieve the best performance.
========================
%: I think there is more than one approach to delegating the responsibility for saving registers (caller-saves vs. callee-saves), so you cannot tell exactly where the overhead will occur.
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
gwyn@smoke.brl.mil (Doug Gwyn) (11/21/90)
In article <967@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>In general, my advice is to use no more than two register variables, and
>only in the *outermost* block level in the body of any function.

Mine is the opposite.

>The order in which the variables are declared is a-b-c, so c will not
>profit from its storage class.

Sure it will. Since b and c are declared in separate parallel blocks, older-technology compilers such as PCC will share the explicit register that is assigned for these two variables. This is in fact a good way to exploit "register" in such compilers.
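A compilable sketch of the parallel-block layout described above, assuming an ANSI C compiler. The function name and its arithmetic are invented for illustration; whether `b` and `c` actually end up sharing one register depends entirely on the compiler (a PCC-style compiler may do so, as Gwyn says, while an optimizing compiler will make its own choices).

```c
/* Hypothetical illustration: `a` is live for the whole function, while
 * `b` and `c` live in sibling blocks whose lifetimes never overlap, so a
 * compiler is free to give both of them the same register. */
int parallel_blocks(const int *v, int n)
{
    register int a = 0;            /* used throughout the function */

    {
        register int b;            /* first inner block */
        for (b = 0; b < n; b++)
            a += v[b];
    }

    a *= 2;                        /* more code using a */

    {
        register int c;            /* sibling block: may reuse b's register */
        for (c = 0; c < n; c++)
            a -= v[c];
    }

    return a;                      /* net result: the sum of v[0..n-1] */
}
```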
henry@zoo.toronto.edu (Henry Spencer) (11/22/90)
In article <967@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>So we can assume that register variables are mostly for software which
>may eventually be ported to some unknown ancient compiler that produces
>code for some unknown ancient hardware with very few registers...

Unfortunately, such compilers are by no means unknown, and the hardware in question often has a useful number of registers. For example, if you want a modern compiler from Sun, you have to pay extra, so most people have their old one, in which register variables most assuredly are important.
--
"I don't *want* to be normal!"  | Henry Spencer at U of Toronto Zoology
"Not to worry."                 | henry@zoo.toronto.edu   utzoo!henry
lfd@cbnewsm.att.com (leland.f.derbenwick) (11/22/90)
In article <967@mwtech.UUCP>, martin@mwtech.UUCP (Martin Weitzel) writes:
> In general, my advice is to use no more than two register variables, and
> only in the *outermost* block level in the body of any function.
>
> Why? Listen:
>
> If you have a modern, optimizing compiler, it will ignore the register
> keyword and make independent decisions about the use of CPU registers. If
> you have modern hardware with many available registers, most probably
> you also have a modern compiler, so there's no reason to worry about having
> defined "not enough" register variables.

Modern hardware. Such as the entire VAX line, Intel 80x86, Motorola 680x0, etc.? None of these have "many" available registers, yet all are certainly modern in the sense that they are currently used and sold in large quantities. And while commonly available compilers for these do vary in their optimization quality, none that I've used has come close to the levels of optimization being applied to RISC architectures.

So let's turn the proposed advice around. Unless you _know_ that your code is being written _only_ for a processor with lots of registers and a smart compiler, and that it will never be ported to any of today's common processors, use register declarations generously -- typically from 1 up to 4 or 5 in each function will _help_, not hurt.

(Alternatively, the performance of your code may be irrelevant to you. This _does_ occur in practice: in some code I worked on several years ago to run on IBM mainframes, the _only_ relevant optimization was reducing the number of database accesses -- everything else was so fast by comparison that it was lost in the noise.)

Martin Weitzel's advice not to declare inner-block variables as register is good -- some un-smart compilers ignore them; others limit their optimizations near them; some handle them well. It's a gamble.

Using register declarations will _never_ interfere with a smart optimizing compiler.
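A sketch of the "generous" style recommended above, with four hints in one function. `dot` is a hypothetical example. The commented-out line marks the one hard obligation a `register` declaration carries: in ANSI C, applying `&` to a register variable is a constraint violation, so the compiler must diagnose it.

```c
/* Hypothetical example: four register hints, all in the outermost block.
 * A compiler may honour any subset of them, or none. */
long dot(const int *x, const int *y, int n)
{
    register const int *px = x;
    register const int *py = y;
    register int i;
    register long sum = 0;

    for (i = 0; i < n; i++)
        sum += (long)*px++ * (long)*py++;

    /* int *ip = &i;   -- not allowed: the address of a register
       variable may not be taken. */
    return sum;
}
```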
A register declaration is a suggestion, not an absolute: the compiler is perfectly free to ignore it in order to do other optimizations. (It is also a promise: you will never take the address of a register variable.) Even as far back as K&R I, "A register declaration is best thought of as an auto declaration, together with a hint to the compiler that the variables declared will be heavily used." It's a hint that a good compiler can use, but that it will ignore if better optimizations are available. (Anyone who writes a compiler capable of doing optimal register allocation on its own had _better_ make it ignore register declarations!)
--
Speaking strictly for myself,
-- Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
-- lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd
jfc@Achates.MIT.edu (John F Carr) (11/24/90)
In article <1990Nov21.221908.19871@cbnewsm.att.com> lfd@cbnewsm.att.com (leland.f.derbenwick) writes:
>(Anyone who
>writes a compiler capable of doing optimal register allocation on its
>own had _better_ make it ignore register declarations!)

I disagree. Often the programmer knows better than the compiler which variables are most used. Optimizing compilers should eliminate the need for every function to have a few register declarations, but they do not make the "register" keyword obsolete.

I will agree that optimizing compilers should not take "register" as an order; it should be treated internally as an increment to the estimated number of uses of the variable.
--
John Carr (jfc@athena.mit.edu)
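Carr's suggestion, treating `register` as extra weight on a variable's estimated use count rather than as an order, can be modelled with a toy allocator. Everything here (`struct candidate`, `REGISTER_BONUS`, the greedy selection) is an invented illustration; real allocators reason about live ranges and interference, not a single score per variable.

```c
#include <stddef.h>

/* Invented tuning constant: how many "uses" a register hint is worth. */
#define REGISTER_BONUS 10

struct candidate {
    const char *name;
    int est_uses;   /* compiler's estimate of how often the variable is used */
    int has_hint;   /* nonzero if declared `register` */
    int assigned;   /* output: nonzero if the variable gets a register */
};

/* The hint only raises the score; it never forces an assignment. */
static int weight(const struct candidate *c)
{
    return c->est_uses + (c->has_hint ? REGISTER_BONUS : 0);
}

/* Gives the nregs highest-weighted candidates a register, by repeated
 * selection; ties go to the earlier declaration. */
void allocate(struct candidate *v, size_t n, size_t nregs)
{
    size_t i, k;

    for (i = 0; i < n; i++)
        v[i].assigned = 0;
    for (k = 0; k < nregs && k < n; k++) {
        size_t best = n;               /* n means "none chosen yet" */
        for (i = 0; i < n; i++)
            if (!v[i].assigned &&
                (best == n || weight(&v[i]) > weight(&v[best])))
                best = i;
        v[best].assigned = 1;
    }
}
```

With two registers, an unhinted variable estimated at 20 uses and a hinted one at 15 both get a register, while a hinted variable used only 3 times still loses to them.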
martin@mwtech.UUCP (Martin Weitzel) (11/24/90)
In article <14538@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <967@mwtech.UUCP> martin@mwtech.UUCP (I) wrote:
>>In general, my advice is to use no more than two register variables, and
>>only in the *outermost* block level in the body of any function.
>
>Mine is the opposite.
>
>>The order in which the variables are declared is a-b-c, so c will not
>>profit from its storage class.
>
>Sure it will. Since b and c are declared in separate parallel blocks,
>older-technology compilers such as PCC will share the explicit register
>that is assigned for these two variables. This is in fact a good way
>to exploit "register" in such compilers.

Though it might not be wise to argue with one of the network gods, I beg to differ here. (For those who tuned in late: the question is whether, if you write code for an unknown compiler, *too many* register declarations may hurt overall performance.)

My argument was that *without* some internal knowledge of the inner workings of some (non-optimizing) compiler, it is *not* possible to choose the appropriate places for more than two or maybe three register variables declared in the outermost (function) block.

K&R-1, pg. 193 states concerning register:

    ... only the first few of such declarations are effective ...

and on page 81

    ... on the PDP-11, only the first three register declarations
    in a function are effective ...

Now let's assume the implementor of an unknown% compiler has read his K&R-1, though the compiler might not be like K&R's PDP-11 compiler but rather James McCosh's for the 6809, which has only *two* registers free for register variables. Suppose I determine the five most heavily used variables in my function as `a', `b', `c', `d', `e' (in that order, i.e. `e' is the least frequently accessed). Further, we may have the following block structure:

    func()
    {
        ...d, e used here
        for (......) {
            ...b, d used here
            {
                ... c used here
            }
            for (......) {
                ... a, b used here
            }
            ...b, d used here
        }
        ... e used here again
    }

If I follow the simple rule of placing all register declarations outside (at function block level) and of relying on no more than two being effective, I can easily verify that the "right" ones are given. If I further trust the compiler to get K&R-1 right in obeying only the declarations that come first, I may even declare all five variables with storage class register (in decreasing order of their access frequency), without risking that the most important ones are missed. This would work for the 6809 (2 registers) and the PDP-11 (3 registers).

On the other hand, if I declare the variables in the inner blocks (as the required scope allows), it may be possible for PCC-like compilers to share registers between blocks, but it is not possible for me to find the set of variables which should receive the register attribute without knowing the number of available registers:

For McCosh's 6809 compiler (only two registers) I should declare `a', `b', and `c' as register (and hope that the compiler is PCC-ish enough to share the register for `a' and `c' between the blocks). On the PDP-11 I could (and probably should) try to use the third register by also declaring `d' as register, but on the 6809 that would force the most important variable (`a') out of the available registers.
----------------------------
%: Hand-optimizing register declarations for a compiler which I know very well is quite another topic ...
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
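Martin's two scenarios can be checked mechanically with a small model of a PCC-like compiler, under the assumption (Gwyn's reading of K&R in a later followup) that registers go to the first `register` declarations currently in scope and are released when their block closes. The `simulate` function and its one-character-per-token script format are hypothetical; the model does not describe any particular real compiler.

```c
/* Hypothetical model of scope-aware register assignment.
 * `prog` is a tiny script for a function body: '{' opens a block, '}'
 * closes it, and a lowercase letter is a `register` declaration of that
 * variable.  Registers are handed out in textual order while any are
 * free, and those taken inside a block are freed when the block closes.
 * Assumes balanced braces and a nesting depth below 64.
 * Returns a bitmask of variables that got a register ('a' = bit 0). */
unsigned long simulate(const char *prog, int nregs)
{
    int held[64];          /* registers taken in each open block */
    int depth = 0;         /* 0 is the function block itself */
    int in_use = 0;        /* registers currently taken */
    unsigned long got = 0;

    held[0] = 0;
    for (; *prog != '\0'; prog++) {
        char ch = *prog;
        if (ch == '{') {
            held[++depth] = 0;
        } else if (ch == '}') {
            in_use -= held[depth--];     /* block ends: free its registers */
        } else if (ch >= 'a' && ch <= 'z') {
            if (in_use < nregs) {        /* later declarations are ignored */
                in_use++;
                held[depth]++;
                got |= 1UL << (ch - 'a');
            }
        }
    }
    return got;
}
```

With two registers, `simulate("{b{c}{a}}", 2)` gives registers to `a`, `b` and `c`, but adding `d` at function level (`"d{b{c}{a}}"`) leaves only `b` and `d`: the 6809 problem described above.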
mikes@ingres.com (Mike Schilling) (11/25/90)
From article <1990Nov21.221908.19871@cbnewsm.att.com>, by lfd@cbnewsm.att.com (leland.f.derbenwick):
> Modern hardware. Such as the entire VAX line, Intel 80x86, Motorola
> 680x0, etc.? None of these have "many" available registers, yet all
> are certainly modern in the sense that they are currently used and
> sold in large quantities.

"Many" is relative, of course. Compared to the 3 free registers a PDP-11 had, the 10 a VAX is likely to have looks like a lot.
----------------------------------------------------------------------------
mikes@rtech.com = Mike Schilling, Ask Corporation, Alameda, CA
Just machines that make big decisions,
Programmed by fellows with compassion and vision.    -- Donald Fagen, "IGY"
gwyn@smoke.brl.mil (Doug Gwyn) (11/25/90)
In article <972@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
> ... only the first few of such declarations are effective ...

Clearly what was meant by this is that only the first few OF THOSE CURRENTLY IN SCOPE are effective. Out-of-scope declarations are simply irrelevant.

As you note, the programmer has a hard time indicating which variables are most important to registerize when a variety of compilation environments must be accommodated. What some developers have done is to add a batch of macros to their system-configuration header:

    /* Prioritized "register" declarations: */
    #define REG_0   register
    #define REG_1   register        /* last available for 6809 */
    #define REG_2   register        /* last available for PDP-11 */
    #define REG_3   /* nothing */
    #define REG_4   /* nothing */

and in the application use these in a mutually-exclusive manner:

    func() {
        REG_2 int i;
        REG_0 char *p;
        ...
        {
            REG_1 char *q;
            REG_3 int j;
            ...
        }
    }

so that no more "register" storage-class specifiers are seen by the compiler in any scope than are actually effective for the implementation.

I don't use this method myself, preferring to use "register" as a hint rather than a requirement, but if you are hyper-concerned about this level of optimization you might want to consider such an approach.
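One way to make the per-target choice explicit is to select the macro definitions with the preprocessor. `TARGET_6809` is a hypothetical configuration macro and `count_words` an invented example; the point is only that the same source keeps two effective hints when built for a two-register machine and three elsewhere.

```c
/* Hypothetical configuration header.  TARGET_6809 is an invented switch;
 * each target keeps only as many "register" hints as it is believed
 * to honour, and the rest expand to nothing. */
#define REG_0 register
#define REG_1 register
#if defined(TARGET_6809)        /* two registers for locals */
#  define REG_2 /* nothing */
#else                           /* PDP-11 and anything larger */
#  define REG_2 register
#endif
#define REG_3 /* nothing */
#define REG_4 /* nothing */

/* Application code: one priority per variable, so no scope ever sees
 * more effective hints than the target supports. */
int count_words(const char *s)
{
    REG_0 const char *p;        /* highest priority */
    REG_2 int n = 0;            /* dropped first on a small target */
    REG_1 int in_word = 0;

    for (p = s; *p != '\0'; p++) {
        if (*p == ' ' || *p == '\t' || *p == '\n') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}
```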
martin@mwtech.UUCP (Martin Weitzel) (11/27/90)
In article <967@mwtech.UUCP>, martin@mwtech.UUCP I gave recommendations for using the `register' attribute. Had I known how many followups this would get, I would possibly have chosen a more careful wording.

Most of my original posting was meant to treat the case where you try to make things "right" even if your software gets ported to some environment you don't know while you write the program. (To some degree this could be compared to the recommendation: "Never assume the bit pattern of the NULL pointer is all zero", even if you currently work only on machines where this is true.)

In short, I tried to warn the reader that everything which exceeds two register declarations in the outermost block of a function *could* result in code that performs worse than with fewer declarations (hence the subject).

Thinking a bit more about it, I should rather have recommended making sure that "the two variables for which the register attribute is most important are declared first in the outermost block". If we trust the compiler to get right what K&R says about the significance of the order of register declarations, using more register declarations in the outermost block should be OK. (But see also the last paragraph.)

Any assumption made about a maximum number of available registers is always somehow influenced by the hardware one has in mind. (Maybe I'm still influenced here by the machine which was the first one I used with C, the 6809 :-)). If today's hardware commonly supports five registers, there is no point in not using them. (Another followup by Doug Gwyn in this thread shows an elegant way in which register usage can be fine-tuned using the preprocessor.)

In article <1990Nov21.221908.19871@cbnewsm.att.com> lfd@cbnewsm.att.com (leland.f.derbenwick) answered:
>[...] use register declarations generously -- typically
>from 1 up to 4 or 5 in each function will _help_, not hurt.
What is so different about Lee Derbenwick's recommendations compared to mine, except that the `portability range' he has in mind includes machines with 4 or 5 registers? Of course, the number 4 or 5 may be more appropriate to "typical" hardware today, and if the access frequency of the five variables in question is about the same *and* their logical lifetimes overlap perfectly, there is nothing to gain from *not* declaring them with `register' storage class. I admit that strictly following my original advice to use no more than two register declarations will result in slower code then.

>Martin Weitzel's advice not to declare inner-block variables as register
>is good -- some un-smart compilers ignore them; others limit their
>optimizations near them; some handle them well. It's a gamble.

I'm quite glad to read this :-), as Doug Gwyn in his first followup to my original article argued the contrary and recommended using `register' within blocks, since for PCC-like compilers this even allows sharing registers. It really seems to be a gamble ... and as in every gamble, you may lose.

If we now leave aside the question of "how many exactly", there remains the more general problem: should a programmer limit the number of register declarations, or, to drive it to the extreme, should he or she simply declare every `auto' variable with attribute `register', provided there's no need to take its address?

Given that the access frequencies of all those variables differ considerably (and this is true in almost every case), a programmer who tunes a function for execution speed must take care to get the `right' variables into registers.

>Using register declarations will _never_ interfere with a smart
>optimizing compiler. [...]

IMHO I never said they would, but maybe that was not meant as an objection. Generally I still tend to see registers as a scarce resource.
Given that in most programs only very small parts have considerable influence on overall performance, and further assuming the situation where I try to put in optimizations for target environments I do not yet know, I think it's not a bad idea (if possible with small changes to the algorithm) to concentrate high access frequency on very few variables; often the set of such variables will not be the same in different parts of the function I'm about to optimize. Then IMHO it's better to use only a limited number of register variables, declared at function block level, and to reuse them manually, than to depend on the compiler to reuse register variables in nested blocks. (Note that I carefully avoided mentioning an exact number in this paragraph. It is up to the reader to replace `few' and `limited' with 2, 5, 10, or whatever :-))
--
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
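A sketch of the manual reuse recommended above: two `register` variables are declared once at function block level and serve successive phases of the function, instead of fresh register variables being declared in nested blocks. `length_and_digits` is a hypothetical example.

```c
#include <stddef.h>

/* Hypothetical example: returns the number of decimal digits in s and
 * stores the length of s in *len.  p and k are declared once, at
 * function block level, and deliberately reused by both phases. */
size_t length_and_digits(const char *s, size_t *len)
{
    register const char *p;    /* scan pointer for both phases */
    register size_t k;         /* phase 1: length; phase 2: digit count */

    for (p = s, k = 0; *p != '\0'; p++)
        k++;
    *len = k;

    for (p = s, k = 0; *p != '\0'; p++)
        if (*p >= '0' && *p <= '9')
            k++;
    return k;
}
```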
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/28/90)
In article <976@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
> In short, I tried to warn the reader that everything which exceeds two
> register declarations in the outermost block of a function *could* result in
> code that performs worse than with fewer declarations (hence the subject).

Sure. And it *could* also result in much better code. What most of us are saying is that in practice extra register declarations help much more than they hurt.

In typical programs, some variables are used quite a lot, and they should be declared register. Some variables are rarely used, and they shouldn't be declared register. It's better to err on the side of extra register declarations than to pessimize your code in the common case. Past that, who cares? The language doesn't provide better mechanisms for asserting variable use, so you won't be able to outguess the compiler in very many cases.

---Dan
rjc@uk.ac.ed.cstr (Richard Caley) (11/29/90)
In article <9733:Nov2722:02:3090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

   In typical programs, some variables are used quite a lot, and they
   should be declared register. Some variables are rarely used, and they
   shouldn't be declared register. It's better to err on the side of
   extra register declarations than to pessimize your code in the common
   case.

IMHO, it is better not to declare register variables unless you need to (i.e. the code won't perform as needed without them). Given

    void tweakit(register struct foobar *lastone)
    {
    }

vs.

    void tweakit(struct foobar *lastone)
    {
    }

I find the second _so_ much more readable that adding in the register for, say, a 10% speedup in a program which runs in 10 seconds is not worth it unless you are _very_ sure that that second is critical and the code is more or less totally stable.

If the time _is_ critical, then it is not enough to stick in a few register variables anyway; it is time to wheel out the profiler, stare at the assembler output and work out whether floating-point arithmetic or jumps are more time-critical on your machine.
--
rjc@uk.ac.ed.cstr
real men don't use typedefs!
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/30/90)
In article <RJC.90Nov29012724@brodie.uk.ac.ed.cstr> rjc@uk.ac.ed.cstr (Richard Caley) writes:
[ void tweakit(register struct foobar *lastone) ]
[ vs. void tweakit(struct foobar *lastone) ]
> I find the second _so_ much more readable

Oh, fgs. I find both of them equally unreadable. With pre-ANSI syntax, it's trivial to see the variable right near the end of a line, and to work backwards through its declaration to see how it's used. (How else can you read a declaration?) Adding ``register'' or another keyword to the beginning of a line hardly impairs this skill.

> that adding in the register
> for, say, a 10% speedup in a program which runs in 10 seconds is not
> worth it unless you are _very_ sure that that second is critical and
> the code is more or less totally stable.

Well, fine. In every program's life there comes a time when it must inspect itself, and say, ``Yea, verily, my registers are hardly used, and it is time for me to give up the slowness of immaturity, and take on some register declarations, so that I can run at a respectable speed, until death do us part.'' Or something like that.

I'm just tidying up a program whose main code has 19 register declarations. I knew as I added each of those variables that it would be used a few times in the inner loop, and the nature of the problem meant that none of the variables would be used much more often than the others. Should I take away any of those declarations? Don't be silly. Did I have to wait for the code to be stable to do this optimization? Not at all. What happens when I remove the declarations? With optimization, nothing. Without optimization, an empty ``#define register'' (defining it as nothing) makes it run 40-80% slower.
You have to be crazy to say that programmers should take such a penalty during development, for no more reason than ``register looks ugly'' or ``it *could* slow down your code.''

> If the time _is_ critical then it is not enough to stick in a few
> register variables anyway, it is time to wheel out the profiler, stare
> at the assembler output and work out whether floating point arithmetic
> or jumps are more time critical on your machine.

True. Every optimization helps.

---Dan
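To measure the kind of penalty Dan reports on one's own compiler, the same source can be built with and without the hints. This sketch uses an invented macro `REG` and an invented switch `NO_REG_HINTS` rather than `#define register`, because the C standard does not allow a macro named after a keyword to be defined when a standard header is included. `checksum` is a hypothetical workload; timing it is left to whatever profiler or clock is at hand.

```c
/* Hypothetical switch: build with -DNO_REG_HINTS to compile the hints out. */
#ifdef NO_REG_HINTS
#  define REG                  /* expands to nothing: plain auto variables */
#else
#  define REG register
#endif

/* Hypothetical workload: a simple multiplicative hash over n bytes. */
long checksum(const unsigned char *buf, long n)
{
    REG const unsigned char *p = buf;
    REG long i;
    REG unsigned long h = 0;

    for (i = 0; i < n; i++)
        h = h * 31 + *p++;
    return (long)(h & 0x7fffffffUL);   /* keep the result non-negative */
}
```

Compiling once normally and once with `-DNO_REG_HINTS`, with optimization turned off (an optimizing compiler will likely ignore the hints either way), gives the comparison.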