[comp.lang.c] Loop optimization

rang@cs.wisc.edu (Anton Rang) (09/06/90)

In article <1589@redsox.bsw.com> campbell@redsox.bsw.com (Larry Campbell) writes:
>    /* Fragment 1 */
>    for (p = array; p < &array[ARRAY_SIZE]; p++)
>	*p++ = '\0';

        ^^^^                                 ^^^
   presumably only one of these increments is wanted...

>    /* Fragment 2 */
>    for (i = 0; i < ARRAY_SIZE; i++)
>	array[i] = '\0';

  The MIPS compiler generates identical code for these (and two
similar cases I checked).  GNU C on an MC68020, which usually
generates better code than any other 68000-series compiler I've seen,
uses array operations for the second loop (at least 1.37.1 does).

>If the answer to my question is "yes", then I think we should flame the
>culpable compiler vendors to a crisp.

  Yep...it's time for compiler authors to stop putting so much of the
burden on programs, IMHO.  (Then again, I like Pascal too... :-)

	Anton

+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+

rang@cs.wisc.edu (Anton Rang) (09/07/90)

In article <1990Sep6.203031.29254@laguna.ccsf.caltech.edu>,
  jimmy@tybalt.caltech.edu (Jimmy Hu) writes:
>I would expect the two fragments to generate different codes. [ ... ]
>I don't think that the compiler will "convert" or "optimize" the
>second to the first or vice versa.

  It should, if the first will run faster--and on most modern
machines, it will.  On machines with autoincrement, the pointer
version becomes something like:

loop:	clr.b	(r0)+		; Clear a byte, autoincrement R0
	cmp.l	r0,r1		; Are we at the end of the array yet?
	bleq	loop		; No, keep going

  The second loop, if compiled naively, generates code more like

loop:	clr.b	(r0)[d0.l]	; Clear a byte using indexed addressing
	inc.l	d0		; Increment i
	cmp.l	d0,d1		; Done yet?
	bleq	loop		; No, not yet

  This is likely to run slower on nearly all machines.  (The above
code is intended to reflect a typical CISC machine, say a 68020 or
VAX.)

>where is it going to get a pointer from? Create one?

  Yes, it should.

>How will i become ARRAYSIZE after exiting the loop?

  If i is used after the loop, there should be an instruction *after*
the loop which moves ARRAYSIZE into i.  If i is used inside the loop
for something, the compiler has to decide between using indexed
addressing for the array, or maintaining an address register
synchronized to i (if i isn't assigned to within the loop, and there
are extra registers available, this is fairly easy).

>I think that the first fragment will run faster on most machines,
>since the address (array+i) doesn't have to be computed at each step.

  Exactly; this is why we want the compiler to convert the array
version into the pointer version.  That way, programmers can write it
more clearly (or at least, perhaps more clearly to them), and it will
still run at the same speed.  (On some machines, though I don't know
of any, it could even be that the pointer version would be slower, and
the compiler should then convert case (1) into (2).)

  This particular optimization is well-known, and has been used in
FORTRAN compilers for some time.  It's called "induction variable
elimination"; check out pp. 466-471 of the original Dragon book for an
explanation and (simplistic) algorithms.  (It's easier in FORTRAN
since aliasing is rarer.)

	Anton

+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+