[comp.lang.c] Yet Another Silly Question

decot@hpisod2.HP.COM (Dave Decot) (05/23/89)

> 	int a[MAX];			int a[MAX];
> 	int i;				int *p;
> 	for (i=0; i<MAX; ++i)		for (p=&a[0]; p<&a[MAX]; ++p)
> 		a[i] = 0;			*p=0;
> 
> 	...
> 
> ... On some compiler/machine combinations, this will run faster,
> because the scaling operation and base/offset addition have been
> eliminated; on others it may run slower, because a specific addressing
> mode cannot be used.

Note that the scaling operation has not necessarily been completely
eliminated; it may have become hidden in "++p" because increments by 1
may be much faster than increments by sizeof(*p).

Anyway, if one is interested in permitting high performance, one might as
well give the compiler complete latitude to do this efficiently by
rewriting this as:

	static struct { int i[MAX]; } a_, zero;
	int *a = &a_.i[0];

	...

	a_ = zero;

  :-) :-) :-) :-) :-) :-) :-) :-)

Dave "Equal rights for arrays, NOW!" Decot

jcbst3@unix.cis.pittsburgh.edu (James C. Benz) (06/01/89)

In article <2550091@hpisod2.HP.COM] decot@hpisod2.HP.COM (Dave Decot) writes:
]] 	int a[MAX];			int a[MAX];
]] 	int i;				int *p;
]] 	for (i=0; i<MAX; ++i)		for (p=&a[0]; p<&a[MAX]; ++p)
]] 		a[i] = 0;			*p=0;
]] 
]] 	...
]] 
]] ... On some compiler/machine combinations, this will run faster,
]] because the scaling operation and base/offset addition have been
]] eliminated; on others it may run slower, because a specific addressing
]] mode cannot be used.
]
]Note that the scaling operation has not necessarily been completely
]eliminated; it may have become hidden in "++p" because increments by 1
]may be much faster than increments by sizeof(*p).
]
Also, the scaling operation has simply been moved to the "for" line, as
far as I can tell.  The machine still has to dereference &a[MAX] on each
iteration of the loop, doesn't it?  It has to perform the test on each
iteration to see if p is < &a[MAX], and unless the optimizer is really
on top of things, it has no way of generating the comparison address
as a constant.  It really has no way of knowing that &a[MAX] won't
change during the course of the program, so it must be re-computed on
each iteration of the loop.  Seems to me that computing &a[MAX] will
take just as long and just as many arithmetic operations as computing
the address of a[i], with the minor difference between using a constant
as index as opposed to a variable.  In any situation I hope to encounter
in the kind of work I do, (database admin) I will trade off the readability
of the first example against the run time efficiency of the second, and 
choose readability every time.
-- 
Jim Benz 		     jcbst3@unix.cis.pittsburgh.edu     If a modem 
University of Pittsburgh					 answers,
UCIR			     (412) 648-5930			 hang up!

tim@crackle.amd.com (Tim Olson) (06/06/89)

In article <18229@unix.cis.pittsburgh.edu> jcbst3@unix.cis.pittsburgh.edu (James C. Benz) writes:
| In article <2550091@hpisod2.HP.COM] decot@hpisod2.HP.COM (Dave Decot) writes:
| ]] 	int a[MAX];			int a[MAX];
| ]] 	int i;				int *p;
| ]] 	for (i=0; i<MAX; ++i)		for (p=&a[0]; p<&a[MAX]; ++p)
| ]] 		a[i] = 0;			*p=0;
| ]] 
| ]] 	...
| ]] 
| ]] ... On some compiler/machine combinations, this will run faster,
| ]] because the scaling operation and base/offset addition have been
| ]] eliminated; on others it may run slower, because a specific addressing
| ]] mode cannot be used.
| ]
| ]Note that the scaling operation has not necessarily been completely
| ]eliminated; it may have become hidden in "++p" because increments by 1
| ]may be much faster than increments by sizeof(*p).
| ]
| Also, the scaling operation has simply been moved to the "for" line, as
| far as I can tell.

Actually, the "scaling operation" has been strength-reduced from a shift
or multiply in the inner-loop to a constant add (which was already
required to increment the variable "i").

| The machine still has to dereference &a[MAX] on each
| iteration of the loop, doesn't it?

No.  The address of a[MAX] is compared with the current pointer without
referencing the value at a[MAX].

| It has to perform the test on each
| iteration to see if p is < &a[MAX], and unless the optimizer is really
| on top of things, it has no way of generating the comparison address
| as a constant.  It really has no way of knowing that &a[MAX] won't
| change during the course of the program, so it must be re-computed on
| each iteration of the loop.

Not so in this example -- the array "a" is declared as "int a[MAX]", so
the compiler knows that "a" is a constant lvalue which can never change.
However, if "a" were actually a pointer, you are correct -- the value of
"&a[MAX]" could potentially change if the value of "a" is changed in the
loop.


	-- Tim Olson
	Advanced Micro Devices
	(tim@amd.com)

guy@auspex.auspex.com (Guy Harris) (06/07/89)

>Also, the scaling operation has simply been moved to the "for" line, as
>far as I can tell.  The machine still has to dereference &a[MAX] on each
>iteration of the loop, doesn't it?

I presume you mean "compute" rather than "dereference".  (I should *hope*
it doesn't dereference the pointer value "&a[MAX]"; since "a" has "MAX"
elements, the last element is "&a[MAX-1]", so dereferencing "&a[MAX]"
could cause all sorts of problems....)

>It really has no way of knowing that &a[MAX] won't change during the
>course of the program,

Say what?  It knows that "MAX" is a constant (assuming this is vanilla
C, and not, say, C as extended by compilers such as GCC that permit you
to declare automatic arrays of variable size), and it knows that "a"
isn't going to move, so it knows that "&a[MAX]" will always be the same
address.

karl@haddock.ima.isc.com (Karl Heuer) (06/07/89)

In article <18229@unix.cis.pittsburgh.edu> jcbst3@unix.cis.pittsburgh.edu (James C. Benz) writes:
>[someone writes about the loop: for (p=&a[0]; p<&a[MAX]; ++p) ...]
>The machine still has to dereference &a[MAX] on each iteration of the loop,
>doesn't it?

There's no dereference involved.  It does have to *evaluate* &a[MAX].

>unless the optimizer is really on top of things, it has no way of generating
>the comparison address as a constant.  It really has no way of knowing that
>&a[MAX] won't change during the course of the program, so it must be
>re-computed on each iteration of the loop.

The value "&a[MAX]" cannot change during the scope of the identifier "a".  On
a VAX-like architecture, the array a (which has block scope and auto storage
duration) is commonly implemented as a constant offset from a frame pointer;
the additional offset MAX*sizeof(int) will be absorbed by constant folding.
The only non-obvious optimization is to keep the value &a[MAX] in a register.

>In any situation I hope to encounter in the kind of work I do, (database
>admin) I will trade off the readability of the first example against the run
>time efficiency of the second, and choose readability every time.

I don't necessarily agree that pointers are less understandable than indexing,
but I agree that readability/maintainability should be the first concern.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

chris@mimsy.UUCP (Chris Torek) (06/07/89)

[context:	int a[MAX], *p; for (p = &a[0]; p < &a[MAX]; p++) ...]

In article <18229@unix.cis.pittsburgh.edu> jcbst3@unix.cis.pittsburgh.edu
(James C. Benz) writes:
>The machine still has to dereference &a[MAX] on each iteration of the
>loop, doesn't it?  It has to perform the test on each iteration to see
>if p is < &a[MAX], and unless the optimizer is really on top of things,
>it has no way of generating the comparison address as a constant.

No: on a typical machine, with a stack pointer (and without alloca(),
or with a frame pointer and optionally with alloca()), the `for' loop
generates code of the form:

		lea	sp@(-8),a2	| compute &a[MAX]
		lea	sp@(-100),a3	| p = &a[0];
	loop:	cmp	a3,a2		| p < [constant]
		jcc	out		| no, exit loop
		...
		jbr	loop
	out:

`a' is not a modifiable lvalue (although it is not `constant', since
the array is local to some function), and MAX is a constant, so &a[MAX]
cannot change during any iteration of the loop.

Actually, many machines have a `count down to zero' loop form which
may be still more efficient, and an optimising compiler (such as
`gcc -O -fstrength-reduce') may change the loop to

	for (p = &a[0], temp = sizeof a/sizeof *a; --temp >= 0; p++)
		... use *p ...

which contains both a strength reduction and a variable substitution
over the original

	for (i = 0; i < MAX; i++) ... use a[i] ...

but does require that `i' not be used for anything except the subscript
calculation.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris