[comp.sys.amiga] Amiga compiler optimizing test

lindwall@sdsu.UUCP (John Lindwall) (03/02/89)

	This posting was prompted by a friendly competition between two IBM
PC  compilers  - Microsoft C 5.0 and Turbo C 2.0.  We compared the assembly
output  generated  by  these compilers from a simple piece of code (I had a
similar  statement  in a graphics program I was writing at the time).  Here
is the code:

main()
{
	int x, y, z;

	x = 8*y - 8*z;
}


	[Keep reading - this IS an amiga posting.]

	It turned out that MSC generated nicer code for this statement.  It
fetched  y  and  z , performed the subtraction, bit shifted the result, and
stored  the  result  in x.  Turbo fetched y, shifted it, fetched z, shifted
it, subtracted, and stored the result in x.

	So what does this have to do with Amigas?  Well, out of curiosity I
tried  the same test on the only Amiga C compiler I have, Manx 3.6a.  Here is
how I compiled it:
	cc -a +L test.c
And here is the code generated:

	move.l	-8(a5),d0	; get y
	asl.l	#3,d0		; y *= 8
	move.l	-12(a5),d1	; get z
	asl.l	#3,d1		; z *= 8
	sub.l	d1,d0		; x = 8*y - 8*z
	move.l	d0,-4(a5)

	Just out of curiosity I was wondering if someone could compile this
code  using  Lattice so we can compare results.  Use long integers.  Please
reveal the command line used to perform the compilation.

	Thanks!

johnl@tw-rnd.SanDiego.NCR.COM
john.lindwall@tw-rnd.SanDiego.NCR.COM

deven@pawl.rpi.edu (Deven Corzine) (03/03/89)

Lattice C V5.00 generated similar code; it did not factor the
expression and do the shift only once (with or without the global
optimizer).  With your example, the compiler complained about
uninitialized automatic variables and the optimizer simply killed the
whole thing as dead code.  Recoded as:

test(x,y)
int x,y;
{
   return(x*8-y*8);
}

compiled with:

lc -O -v test.c

Nothing complained; the code produced was:

	link a5,#0000		; (exactly what does this do?)
				; [something w/stack]
	move.l 000c(a5),d0	; y (second argument)
	asl.l #3,d0		; y*8
	move.l 0008(a5),d1	; x (first argument)
	asl.l #3,d1		; x*8
	sub.l d0,d1		; x*8-y*8
	move.l d1,d0		; return x*8-y*8
	unlk a5			; undo the link
	rts			; return

Deven
--
------- shadow@pawl.rpi.edu ------- Deven Thomas Corzine ---------------------
Cogito  shadow@acm.rpi.edu          2346 15th Street            Pi-Rho America
ergo    userfxb6@rpitsmts.bitnet    Troy, NY 12180-2306         (518) 272-5847
sum...     In the immortal words of Socrates:  "I drank what?"     ...I think.

bader+@andrew.cmu.edu (Miles Bader) (03/04/89)

deven@pawl.rpi.edu (Deven Corzine) writes:
> Lattice C V5.00 generated similar code; it did not factor the
> expression and do the shift once only.  (with or without the global
> optimizer.)
>...
> test(x,y)
> int x,y;
> {
>    return(x*8-y*8);
> }

Just for reference, this is what gcc -O (which does a pretty damn good
job of optimizing) outputs on a Sun:

_addt8:
	link a6,#0
	movel a6@(8),d0
	asll #3,d0
	movel a6@(12),d1
	asll #3,d1
	subl d1,d0
	unlk a6
	rts

It manages to save one instruction by subtracting into the return
register.  I don't think this is accidental, as it manages to do the
same thing with the order of the subtraction reversed.

-Miles

brianr@tekig5.PEN.TEK.COM (Brian Rhodefer) (03/05/89)

Why would an optimizing compiler put `link a5, 0000' and 'unlnk a5'
instructions into a subroutine that needed no local variables?

dillon@POSTGRES.BERKELEY.EDU (Matt Dillon) (03/05/89)

>Why would an optimizing compiler put `link a5, 0000' and 'unlnk a5'
>instructions into a subroutine that needed no local variables?

	(1) So A5 can be used to reference arguments

	(2) So a debugger can backtrace the stack frame.

	Apart from that, there is no reason to use link/unlk at all.  If
you want to talk about optimization, one can cut the call-return overhead by
half or more by:

	(1) caller passes the return address in a register and jmp's or bra's
	    to the routine instead of jsr'ing

	(2) callee pops caller's arguments (it can pop the args and free up
	    its own stack (local vars) in one instruction.. an add).

							-Matt

darin@nova.laic.uucp (Darin Johnson) (03/07/89)

In article <3839@tekig5.PEN.TEK.COM> brianr@tekig5.PEN.TEK.COM (Brian Rhodefer) writes:
>Why would an optimizing compiler put `link a5, 0000' and 'unlnk a5'
>instructions into a subroutine that needed no local variables?

1) to support alloca type stuff?

2) Because it's simple.  Otherwise compilers would have to backpatch
the generated code.  If it was determined later that the routine
didn't need a link/unlnk, then it would have to remove that instruction,
shuffle things around, etc.  This isn't that difficult, but a lot of
compilers don't do it.  I see this the most on UN*X systems, whose
compilers were derived from PCC.  Usually, the code generated is something
like this:

  link a5, $T997
  .
  .
  unlk a5
$T997 equ 42

Remember, the words "optimizing compiler" don't mean much.  If
a simple peephole optimizer was thrown in, they can call it an
optimizing compiler.  (such as a few bigname UN*X machines, whose
compilers got rid of the equ statements in the above example, but
very little else in my tests. [1985ish])

Darin Johnson (leadsv!laic!darin@pyramid.pyramid.com)
	Can you "Spot the Looney"?

jesup@cbmvax.UUCP (Randell Jesup) (03/09/89)

In article <462@laic.UUCP> darin@nova.UUCP (Darin Johnson) writes:
>In article <3839@tekig5.PEN.TEK.COM> brianr@tekig5.PEN.TEK.COM (Brian Rhodefer) writes:
>>Why would an optimizing compiler put `link a5, 0000' and 'unlnk a5'
>>instructions into a subroutine that needed no local variables?
...
>2) Because it's simple.  Otherwise compilers would have to backpatch
>the generated code.  If it was determined later that the routine
>didn't need a link/unlnk, then it would have to remove that instruction,
>shuffle things around, etc.  This isn't that difficult, but a lot of
>compilers don't do it.  I see this the most on UN*X systems, whose
>compilers were derived from PCC.  Usually, the code generated is something

	Lattice will not normally put in LINK An,#0 instructions if the optimizer is
turned on.  Otherwise it will, for debugger support (debuggers usually use
LINKs to find the stack frames.)  However, there are some cases where LINK #0
will still be generated.

-- 
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup