[comp.lang.c] May too many register variables hurt?

martin@mwtech.UUCP (Martin Weitzel) (11/20/90)

In general, my advice is to use no more than two register variables, and
only at the *outermost* block level in the body of any function.

Why? Listen:

If you have a modern, optimizing compiler, it will ignore the register
keyword and make independent decisions about use of CPU-registers. If
you have modern hardware with many available registers, most probably
you also have a modern compiler, so there's no reason to worry about having
defined "not enough" register variables.

So we can assume that register variables are mostly for software which
may eventually be ported to some unknown ancient compiler that produces
code for some unknown ancient hardware with very few registers. In this
situation, it may in fact hurt performance if you specify too many register
variables, because the compiler may put the wrong ones into the available
registers.

Even if you trust the compiler to implement correctly what K&R-I mentions,
that the register storage class is obeyed in the same order as variables
are defined, are you sure how the unknown ancient compiler will interpret
the following example?

	foo()
	{ register int a;
	.... /* some code using a */
	     { register int b;
	     .... /* some code using b */
	     }
	.... /* some more code using a */
	     { register int c;
	     .... /* some code using c */
	     }
	.... /* some more code using a */
	}

Take some ancient CPU with two registers available for local int-s.
The order in which the variables are declared is a-b-c, so c will not
profit from its storage class. Or will the compiler instead generate
some code to save one register on each entry to the block defining c?
Will it perhaps even do so for the block defining b? (Furthermore,
don't forget that it may require more instructions to call another
function if either the called or the calling function has register
variables, because the CPU-registers in use must be saved%.)

Of course, if you know your particular compiler/CPU-combination well and
if you accept that your performance-gain may well be a performance-loss
in case the program is ported to anywhere else, you may carefully
investigate which variables to put into registers to achieve the best
performance.

========================
%: I think there is more than one approach to delegating the responsibility
   for saving registers, so you cannot tell exactly where the overhead
   will occur.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

gwyn@smoke.brl.mil (Doug Gwyn) (11/21/90)

In article <967@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>In general, my advice is to use no more than two register variables, and
>only at the *outermost* block level in the body of any function.

Mine is the opposite.

>The order in which the variables are declared is a-b-c, so c will not
>profit from its storage class.

Sure it will.  Since b and c are declared in separate parallel blocks,
older-technology compilers such as PCC will share the explicit register
that is assigned for these two variables.  This is in fact a good way
to exploit "register" in such compilers.
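
A minimal sketch of the parallel-block case follows. The function name
and the work done in each block are invented for illustration; whether
b and c really end up in the same register is entirely up to the
compiler at hand.

```c
/* Sketch only: b and c live in sibling blocks, so their lifetimes
   never overlap and a PCC-style compiler is free to give both the
   same hardware register. Nothing in the language guarantees it. */
int count_spaces_and_digits(const char *s)
{
	register int a = 0;          /* used throughout the function */

	{
		register int b;      /* index, first inner block only */
		for (b = 0; s[b] != '\0'; b++)
			if (s[b] == ' ')
				a++;
	}
	{
		register int c;      /* index, second inner block only */
		for (c = 0; s[c] != '\0'; c++)
			if (s[c] >= '0' && s[c] <= '9')
				a++;
	}
	return a;
}
```

At no point are more than two register variables in scope, which is
what a two-register compiler would need in order to honour all of them.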

henry@zoo.toronto.edu (Henry Spencer) (11/22/90)

In article <967@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>So we can assume that register variables are mostly for software which
>may eventually be ported to some unknown ancient compiler that produces
>code for some unknown ancient hardware with very few registers...

Unfortunately, such compilers are by no means unknown, and the hardware in
question often has a useful number of registers.  For example, if you want
a modern compiler from Sun, you have to pay extra, so most people have
their old one, in which register variables most assuredly are important.
-- 
"I don't *want* to be normal!"         | Henry Spencer at U of Toronto Zoology
"Not to worry."                        |  henry@zoo.toronto.edu   utzoo!henry

lfd@cbnewsm.att.com (leland.f.derbenwick) (11/22/90)

In article <967@mwtech.UUCP>, martin@mwtech.UUCP (Martin Weitzel) writes:
> In general, my advice is to use no more than two register variables, and
> only at the *outermost* block level in the body of any function.
> 
> Why? Listen:
> 
> If you have a modern, optimizing compiler, it will ignore the register
> keyword and make independent decisions about use of CPU-registers. If
> you have modern hardware with many available registers, most probably
> you also have a modern compiler, so there's no reason to worry about having
> defined "not enough" register variables.

Modern hardware.  Such as the entire VAX line, Intel 80x86, Motorola
680x0, etc.?  None of these have "many" available registers, yet all
are certainly modern in the sense that they are currently used and
sold in large quantities.  And while commonly available compilers for
these do vary in their optimization quality, none that I've used has
come close to the levels of optimization being applied to RISC
architectures.

So let's turn the proposed advice around.  Unless you _know_ that your
code is being written _only_ for a processor with lots of registers and
a smart compiler, and that it will never be ported to any of today's
common processors, use register declarations generously -- typically
from 1 up to 4 or 5 in each function will _help_, not hurt.

(Alternatively, the performance of your code may be irrelevant to you.
This _does_ occur in practice: in some code I worked on several years
ago to run on IBM mainframes, the _only_ relevant optimization was
reducing the number of data base accesses -- everything else was so
fast by comparison that it was lost in the noise.)

Martin Weitzel's advice not to declare inner-block variables as register
is good -- some un-smart compilers ignore them; others limit their
optimizations near them; some handle them well.  It's a gamble.

Using register declarations will _never_ interfere with a smart
optimizing compiler.  A register declaration is a suggestion, not an
absolute: the compiler is perfectly free to ignore it in order to do
other optimizations.  (It is also a promise: you will never take the
address of a register variable.)  Even as far back as K&R I, "A
register declaration is best thought of as an auto declaration,
together with a hint to the compiler that the variables declared will
be heavily used."  It's a hint that a good compiler can use, but that
it will ignore if better optimizations are available.  (Anyone who
writes a compiler capable of doing optimal register allocation on its
own had _better_ make it ignore register declarations!)
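
The "hint plus promise" reading can be sketched as below. The function
name is made up for illustration, and the commented-out line is there
only to show what the promise rules out.

```c
/* Sketch only: "register" as a speed hint plus a promise about '&'. */
long sum_to(int n)
{
	register int i;           /* hint: heavily used loop counter */
	register long total = 0;  /* hint: updated on every iteration */

	for (i = 1; i <= n; i++)
		total += i;

	/* int *p = &i;
	   would be rejected: '&' may not be applied to a register
	   variable, whether or not it really lives in a register. */
	return total;
}
```

A compiler that ignores the hint still has to enforce the promise, so
the code stays portable either way.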

 -- Speaking strictly for myself,
 --   Lee Derbenwick, AT&T Bell Laboratories, Warren, NJ
 --   lfd@cbnewsm.ATT.COM  or  <wherever>!att!cbnewsm!lfd

jfc@Achates.MIT.edu (John F Carr) (11/24/90)

In article <1990Nov21.221908.19871@cbnewsm.att.com>
	lfd@cbnewsm.att.com (leland.f.derbenwick) writes:
>(Anyone who
>writes a compiler capable of doing optimal register allocation on its
>own had _better_ make it ignore register declarations!)

I disagree.  Often the programmer knows better than the compiler which
variables are most used.  Optimizing compilers should eliminate the need for
every function to have a few register declarations, but they do not obsolete
the "register" keyword.  I will agree that optimizing compilers should not
take "register" as an order; it should be treated internally as an increment
to the estimated number of uses of the variable.
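
As a toy model of that idea (not taken from any real compiler; the
function name and the bonus value are assumptions chosen only for
illustration), the hint could simply be added to the allocator's own
usage estimate:

```c
/* Toy model only: how an allocator might weigh the "register" hint. */
#define REGISTER_BONUS 10    /* hypothetical weight given to the hint */

/* estimated_uses: the compiler's own static estimate for a variable;
   has_register_kw: nonzero if the variable was declared "register". */
int allocation_score(int estimated_uses, int has_register_kw)
{
	return estimated_uses + (has_register_kw ? REGISTER_BONUS : 0);
}
```

With these numbers a hinted variable with 45 estimated uses outranks an
unhinted one with 50, but not one with 60, so the hint nudges the
decision without dictating it.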

--
    John Carr (jfc@athena.mit.edu)

martin@mwtech.UUCP (Martin Weitzel) (11/24/90)

In article <14538@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <967@mwtech.UUCP> martin@mwtech.UUCP (I) wrote:
>>In general, my advice is to use no more than two register variables, and
>>only at the *outermost* block level in the body of any function.
>
>Mine is the opposite.
>
>>The order in which the variables are declared is a-b-c, so c will not
>>profit from its storage class.
>
>Sure it will.  Since b and c are declared in separate parallel blocks,
>older-technology compilers such as PCC will share the explicit register
>that is assigned for these two variables.  This is in fact a good way
>to exploit "register" in such compilers.

Though it might not be wise to argue with one of the network gods,
I beg to differ here. (For those who tuned in late, the question is:
if you write code for an unknown compiler, may *too many* register
declarations hurt overall performance?)

My argument was that *without* some internal knowledge of the inner
workings of some (non-optimizing) compiler, it is *not* possible to
choose the appropriate places for more than two or maybe three register
variables declared in the outermost (function) block.

K&R-1, pg. 193 states concerning register:

	... only the first few of such declarations are effective ...

and on page 81

	... on the PDP-11, only the first three register declarations
	    in a function are effective ...

Now let's assume the implementor of an unknown% compiler has read his K&R-1,
though the compiler might not be like K&R's PDP-11 compiler but like James
McCosh's for the 6809, which has only *two* registers free for register
variables. Say I determine the five most heavily used variables in my
function to be `a', `b', `c', `d', `e' (in that order, i.e. `e' is the
least frequently accessed). Further, we may have the following block
structure:


	func()
	{
		...d, e, used here ....

		for (......)
		{
			...b, d, used here
			{
				... c used here
			}

			for (......)
			{
				... a, b used here

			}
			...b, d, used here
		}
		... e, used here again
	}

If I follow the simple rule to place all register declarations outside
(at function block level) and not to depend on more than two being
effective, I can easily verify that the "right" ones are given. If I
further trust the compiler to get K&R-1 right in obeying only the
declarations that come first, I may even declare all five variables
with storage class register (in decreasing order of their access
frequency), without risking that the most important ones are missed. This
would work for the 6809 (2 registers) and the PDP-11 (3 registers).

On the other hand, if I declare the variables at the inner blocks (as
the required scope allows), it may be possible for PCC-like compilers
to share registers between blocks, but it is not possible for me to find
the set of variables which should receive the register attribute without
knowing the number of available registers: For McCosh's 6809 compiler
(only two registers) I should declare `a', `b', and `c' as register (and
hope that the compiler is PCC-ish enough to share the register for
`a' and `c' between the blocks). On the PDP-11 I could (and probably
should) try to use the third register by also declaring `d' as register,
but on the 6809 that would force the most important variable (`a') out
of the available registers.
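
The function-block-level rule can be sketched as follows. The function
name and its body are invented for illustration and only loosely follow
the a..e access pattern above; the point is the declaration order.

```c
/* Sketch only: every register declaration sits at function block
   level, hottest variable first, so a compiler that honours just the
   first two (or three) declarations picks the ones that matter most. */
long grid_index_sum(int rows, int cols)
{
	register int a;       /* inner-loop column index: hottest */
	register long b;      /* running total: updated in the inner loop */
	register int c;       /* row base: set once per row, read inside */
	register int d;       /* outer-loop row index */
	register int e;       /* sanity flag: touched only outside the loops */

	e = (rows > 0 && cols > 0);
	b = 0;
	if (e) {
		for (d = 0; d < rows; d++) {
			c = d * cols;
			for (a = 0; a < cols; a++)
				b += c + a;   /* add the flat index of cell (d, a) */
		}
	}
	return b;
}
```

On a two-register compiler that obeys declaration order, `a' and `b'
would be the ones placed in registers; on a three-register one, `c'
would join them.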

----------------------------
%: Hand-optimizing register declarations for a compiler which I know
   very well is quite another topic ...
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

mikes@ingres.com (Mike Schilling) (11/25/90)

From article <1990Nov21.221908.19871@cbnewsm.att.com>, by lfd@cbnewsm.att.com (leland.f.derbenwick):
> Modern hardware.  Such as the entire VAX line, Intel 80x86, Motorola
> 680x0, etc.?  None of these have "many" available registers, yet all
> are certainly modern in the sense that they are currently used and
> sold in large quantities. 
"Many" is relative, of course.  Compared to the 3 free registers a PDP11 had,
the 10 a VAX is likely to have looks like a lot.
----------------------------------------------------------------------------
mikes@rtech.com = Mike Schilling, Ask Corporation, Alameda, CA
Just machines that make big decisions,
Programmed by fellows with compassion and vision.	-- Donald Fagen, "IGY"

gwyn@smoke.brl.mil (Doug Gwyn) (11/25/90)

In article <972@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>	... only the first few of such declarations are effective ...

Clearly what was meant by this is that only the first few OF THOSE CURRENTLY
IN SCOPE are effective.  Out-of-scope declarations are simply irrelevant.

As you note, the programmer has a hard time indicating which variables
are most important to registerize, when a variety of compilation
environments must be accommodated.  What some developers have done is
to add a batch of macros in their system-configuration header:
	/* Prioritized "register" declarations: */
	#define	REG_0	register
	#define	REG_1	register	/* last available for 6809 */
	#define	REG_2	register	/* last available for PDP-11 */
	#define	REG_3	/* nothing */
	#define	REG_4	/* nothing */
and in the application use these in a mutually-exclusive manner:
	func()	{
		REG_2 int	i;
		REG_0 char	*p;
		...
			{
			REG_1 char	*q;
			REG_3 int	j;
			...
			}
		}
so that no more "register" storage-class specifiers are seen by the
compiler in any scope than are actually effective for the implementation.

I don't use this method myself, preferring to use "register" as a hint
rather than a requirement, but if you are hyper-concerned about this
level of optimization you might want to consider such an approach.
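
A self-contained variant of the same scheme might look like this. The
NREG knob and the count_char function are assumptions added for
illustration; the macros above were set by hand in a configuration
header rather than derived from a single number.

```c
/* Hypothetical knob: how many "register" declarations the target
   compiler is assumed to honour (2 for the 6809, 3 for the PDP-11). */
#ifndef NREG
#define NREG 2
#endif

#if NREG > 0
#define REG_0 register
#else
#define REG_0 /* nothing */
#endif
#if NREG > 1
#define REG_1 register
#else
#define REG_1 /* nothing */
#endif
#if NREG > 2
#define REG_2 register
#else
#define REG_2 /* nothing */
#endif

/* Count occurrences of ch in s; priorities are assigned by hand. */
int count_char(const char *s, int ch)
{
	REG_0 const char *p;    /* priority 0: advanced on every step */
	REG_1 int n = 0;        /* priority 1: bumped on every match */
	REG_2 int target = ch;  /* priority 2: only read */

	for (p = s; *p != '\0'; p++)
		if (*p == target)
			n++;
	return n;
}
```

Compiling with a different NREG (for example -DNREG=3 on compilers that
accept -D) changes which declarations carry "register" without touching
the function itself.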

martin@mwtech.UUCP (Martin Weitzel) (11/27/90)

In article <967@mwtech.UUCP>, I (martin@mwtech.UUCP) gave recommendations
for using the `register' attribute. Had I known the number of followups
on this, I would possibly have chosen a more careful wording.

Most of my original posting was meant to treat the case where you try to make
things "right" even in case your software gets ported to some environment
you don't know while you write the program. (To some degree this could
be compared to the recommendation: "Never assume the bit pattern of the
NULL-pointer is all zero", even if you currently work only on machines
where this is true.)

In short, I tried to warn the reader that everything which exceeds two
register declarations in the outermost block of a function *could* result in
code that performed worse than with fewer declarations (hence the subject).
Thinking a bit more about it, I should rather have recommended making
sure that "the two variables for which the register attribute is most
important should be declared as the first ones in the outermost block". If
we trust the compiler to get right what K&R says wrt the significance
of the order of register declarations, using more register declarations
in the outermost block should be OK. (But see also the last paragraph.)

Any assumption made about a maximum number of available registers is
always somehow influenced by the hardware one has in mind. (Maybe I'm
still influenced here by the machine which was the first one I used with
C, the 6809 :-)). If today's hardware commonly supports five registers,
there is no point in not using them. (Another followup by Doug Gwyn in this
thread shows an elegant way to fine-tune register usage using
the preprocessor.)

In article <1990Nov21.221908.19871@cbnewsm.att.com> lfd@cbnewsm.att.com
(leland.f.derbenwick) answered:

>[...] use register declarations generously -- typically
>from 1 up to 4 or 5 in each function will _help_, not hurt.

What is so different in Lee Derbenwick's recommendations compared to
mine, except that the `portability range' he has in mind includes machines
with 4 or 5 registers? Of course, the number 4 or 5 may be more appropriate
to "typical" hardware today, and if the access-frequency of the five
variables in question is about the same *and* their logical lifetime
perfectly overlaps, there is nothing to gain from *not* declaring them
with `register' storage class. I admit that strictly following my
original advice to use no more than two register declarations will
result in slower code in that case.

>Martin Weitzel's advice not to declare inner-block variables as register
>is good -- some un-smart compilers ignore them; others limit their
>optimizations near them; some handle them well.  It's a gamble.

I'm quite glad to read this :-), as Doug Gwyn in his first followup to my
original article wrote the contrary and recommended using `register' within
blocks, as for PCC-like compilers this will even allow sharing registers.
It really seems to be a gamble ... and as in every gamble you may lose.

If we now leave the question of "how much exactly", there remains the
more general problem: Should a programmer limit the number of register
declarations, or, to drive it to the extreme, should he or she simply
declare every `auto'-variable with attribute `register' - provided there's
no need to take the address? Given that the access frequency for all those
variables differs considerably - and this is true in almost every case - a
programmer who tunes a function for execution speed must take care to get
the `right' variables into registers.

>Using register declarations will _never_ interfere with a smart
>optimizing compiler. [...]

IMHO I never wrote so, but maybe that was not meant as an objection.

Generally I still tend to see registers as a scarce resource. Given that in
most programs only very small parts have considerable influence on overall
performance, and further assuming the situation where I try to put in
optimizations for target environments I do not yet know, I think it's no
bad idea - if possible with small changes to the algorithm - to concentrate
high access frequency on very few variables; often the set of such variables
will not be the same in different parts of the function I'm about to optimize.
Then IMHO it's better to use only a limited number of register variables,
declared at function block level, and to manually reuse them, rather than to
depend on the compiler to reuse register variables in nested blocks. (Note that
I carefully avoided mentioning an exact number in this paragraph. It is up to
the reader to replace `few' and `limited' with 2, 5, 10, or whatever :-))
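
Manual reuse of a function-level register variable could look like the
sketch below. The function name and the work done are invented for
illustration.

```c
/* Sketch only: one register variable declared at function block
   level and reused by hand, instead of declaring a fresh register
   variable inside each nested block. */
int count_commas_and_dots(const char *s)
{
	register int i;        /* reused as the index of both scans */
	register int n = 0;    /* running count */

	for (i = 0; s[i] != '\0'; i++)    /* first use: commas */
		if (s[i] == ',')
			n++;
	for (i = 0; s[i] != '\0'; i++)    /* second use: dots */
		if (s[i] == '.')
			n++;
	return n;
}
```

Only two register declarations are ever in play here, and neither
depends on how the compiler treats declarations in inner blocks.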
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/28/90)

In article <976@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
> In short, I tried to warn the reader that everything which exceeds two
> register declarations in the outermost block of a function *could* result in
> code that performed worse than with fewer declarations (hence the subject).

Sure. And it *could* also result in much better code. What most of us
are saying is that in practice extra register declarations help much
more than they hurt. In typical programs, some variables are used quite
a lot, and they should be declared register. Some variables are rarely
used, and they shouldn't be declared register. It's better to err on the
side of extra register declarations than to pessimize your code in the
common case. Past that, who cares? The language doesn't provide better
mechanisms for asserting variable use, so you won't be able to outguess
the compiler in very many cases.

---Dan

rjc@uk.ac.ed.cstr (Richard Caley) (11/29/90)

In article <9733:Nov2722:02:3090@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

    In typical programs, some variables are used quite
    a lot, and they should be declared register. Some variables are rarely
    used, and they shouldn't be declared register. It's better to err on the
    side of extra register declarations than to pessimize your code in the
    common case.

IMHO, it is better not to declare register variables unless you need to
(i.e. the code won't perform as needed without). Given

	void
	tweakit(register struct foobar *lastone)

	{
	}

vs.

	void
	tweakit(struct foobar *lastone)

	{
	}

I find the second _so_ much more readable that adding in the register
for, say, a 10% speedup in a program which runs in 10 seconds is not
worth it unless you are _very_ sure that that second is critical and
the code is more or less totally stable.

If the time _is_ critical then it is not enough to stick in a few
register variables anyway, it is time to wheel out the profiler, stare
at the assembler output and work out whether floating point arithmetic
or jumps are more time critical on your machine.

--
rjc@uk.ac.ed.cstr	real men don't use typedefs!

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/30/90)

In article <RJC.90Nov29012724@brodie.uk.ac.ed.cstr> rjc@uk.ac.ed.cstr (Richard Caley) writes:
  [     void tweakit(register struct foobar *lastone) ]
  [ vs. void tweakit(struct foobar *lastone) ]
> I find the second _so_ much more readable

Oh, fgs. I find both of them equally unreadable.

With pre-ANSI syntax, it's trivial to see the variable right near the
end of a line, and to work backwards through its declaration to see how
it's used. (How else can you read a declaration?) Adding ``register'' or
another keyword to the beginning of a line hardly impairs this skill.

> that adding in the register
> for, say, a 10% speedup in a program which runs in 10 seconds is not
> worth it unless you are _very_ sure that that second is critical and
> the code is more or less totally stable.

Well, fine. In every program's life there comes a time when it must
inspect itself, and say, ``Yea, verily, my registers are hardly used,
and it is time for me to give up the slowness of immaturity, and take on
some register declarations, so that I can run at a respectable speed,
until death do us part.'' Or something like that.

I'm just tidying up a program whose main code has 19 register
declarations. I knew as I added each of those variables that it would be
used a few times in the inner loop, and the nature of the problem meant
that none of the variables would be used much more often than the
others.

Should I take away any of those declarations? Don't be silly.

Did I have to wait for the code to be stable to do this optimization?
Not at all.

What happens when I remove the declarations? With optimization, nothing.
Without optimization, defining register away (#define register, with an
empty body) makes it run 40-80% slower.

You have to be crazy to say that programmers should take such a penalty
during development, for no more reason than ``register looks ugly'' or
``it *could* slow down your code.''
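
One way to repeat that kind of measurement without redefining a keyword
is a wrapper macro, sketched below. REGISTER, NO_REGISTER and the
checksum function are all hypothetical names chosen for illustration,
not what was used in the program described above.

```c
/* Sketch only: a switchable stand-in for "register", so the same
   source can be timed with and without the declarations. */
#ifdef NO_REGISTER
#define REGISTER              /* expands to nothing */
#else
#define REGISTER register
#endif

/* Simple multiplicative string checksum, used here as a stand-in
   for an inner loop worth measuring. */
unsigned long checksum(const char *s)
{
	REGISTER unsigned long h = 0;
	REGISTER const unsigned char *p = (const unsigned char *)s;

	while (*p != '\0')
		h = h * 31 + *p++;
	return h;
}
```

Building once normally and once with NO_REGISTER defined, with the
optimizer both off and on, gives the four timings needed for the
comparison.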

> If the time _is_ critical then it is not enough to stick in a few
> register variables anyway, it is time to wheel out the profiler, stare
> at the assembler output and work out whether floating point arithmetic
> or jumps are more time critical on your machine.

True. Every optimization helps.

---Dan