[comp.sys.amiga] Code Optimization

palarson@watdragon.waterloo.edu (Paul Larson) (01/24/88)

Currently, single pass compilers are very popular, due to their speed.  This
speed allows the development cycle to be shortened, sometimes by very
significant amounts.  However, unless I'm very wrong, the speed of the
single pass compiler is gained at the expense of code quality.  One 
immediately obvious solution is a code optimizer, which could be used on a
compiled program once the critical debugging is done.  However, such a 
code optimizer could also be put to another, and potentially more powerful
use.

The Mac and Amiga worlds are currently in the process of changing from the
68000 processor to the similar, but more powerful 68020.  The 68030 has also
been rumoured to be lurking somewhere on the horizon.  However, regardless of
whether a '000 or a '020 is installed, the same code is used.  Thus, if a 
system is using the more advanced '020, it is using code which was written for
a more primitive, though similar, processor and thus the code does not take
proper advantage of the more advanced features of the more modern processors.
The same argument applies for the math co-processors, the 68881 and its new
sibling the 68882, which are rapidly becoming more popular on home computers.

The code optimizer, if properly written, could remove the inefficiency of
using primitive code on an advanced processor, simply by modifying the
code to take advantage of the features of the more advanced processor.
Since not everyone has a 68020, it would be left up to the user to decide
whether or not (s)he wants the code of a program modified in such a way.
The optimizer could also be written to rewrite the program code so as
to take advantage of a coprocessor, if one is present.

Is there some fundamental problem with what is described above?  Is anyone
working on such an optimizer?
I welcome your comments and criticisms.

	Johan Larson

denbeste@bbn.COM (Steven Den Beste) (01/25/88)

Johan Larson writes:

> Currently, single pass compilers are very popular, due to their speed.  This
> speed allows the development cycle to be shortened, sometimes by very
> significant amounts.  However, unless I'm very wrong, the speed of the
> single pass compiler is gained at the expense of code quality.  One 
> immediately obvious solution is a code optimizer, which could be used on a
> compiled program once the critical debugging is done.  However, such a 
> code optimizer could also be put to another, and potentially more powerful
> use.
It depends on what you mean by a "single pass" compiler. If you mean a
single-pass compiler which yields assembly language, then you are sort of
cheating, because the two passes of the assembler represent the second and
third passes of your "one-pass" compiler.

If you mean a single-pass compiler which goes direct to object (or executable -
did I mention the two linking passes above, too?) then the big big problem with
single pass compilers is that they have to keep everything in memory. If your
program won't fit (and it has to be the WHOLE program, too - you can't break it
up unless you have a separate linker (which means two more passes)) then you
are SOL.

The biggest advantage of a multi-pass compiler is that the available memory at
compilation is almost unrelated to the size of the program being compiled (the
only thing which is related is the symbol table size).

If a single-pass compiler operates in memory, there is no fundamental reason
why it can't generate code just as efficient as a multi-pass compiler.

(By the way, I think you must be a Mac person. I don't know of any single pass
C-compilers for the Amiga, and very little development is done in any other
languages for it.)

> The Mac and Amiga worlds are currently in the process of changing from the
> 68000 processor to the similar, but more powerful 68020.  The 68030 has also
> been rumoured to be lurking somewhere on the horizon.  However, regardless of
> whether a '000 or a '020 is installed, the same code is used.  Thus, if a 
> system is using the more advanced '020, it is using code which was written for
> a more primitive, though similar, processor and thus the code does not take
> proper advantage of the more advanced features of the more modern processors.
> The same argument applies for the math co-processors, the 68881 and its new
> sibling the 68882, which are rapidly becoming more popular on home computers.
Now, I may be wrong about this, but the big advantage of the 68020 isn't so
much new instructions (there are a few, but they aren't commonly used) but the
fact that it executes the old instructions much faster - because it uses a
32-bit internal data path and the 68000 uses a 16-bit path, so it takes twice
as many internal cycles on the 68000 for a given amount of wide data.

The execution speed of a program using the 68000 set, but run on the 68020,
will not be much worse typically than the same program compiled to use the full
68020 set - but both will beat the pants off of a 68000 running the same clock
rate.

The one big exception to this is that the 68000 cannot work with a math
co-processor and the 68020 (and the 68010 before it) can - so for math stuff if
you use the coprocessor instructions you can get orders of magnitude increases
in speed. There are ways of working around this so that the same code works on
both, but takes advantage of the math chip if it is there:

Way #1: All math operations are handled as calls to a run-time-loaded library
of math routines. On a 68000 or a 68020 without math chip, you load a software
floating point library. On the 68020 with math chip, you load one which calls
the chip. (This is what the Amiga does.)

Way #2: Everyone's code uses the math chip instructions, but on a 68000 or a
68020 without the math chip they trap as illegal instructions - to a special
routine which emulates them in software and returns!

In the former case you pay a small performance hit on the 68020 with math
chip as compared to the latter.


> The code optimizer, if properly written, could remove the inefficiency of
> using primitive code on an advanced processor, simply by modifying the
> code to take advantage of the features of the more advanced processor.
> Since not everyone has a 68020, it would be left up to the user to decide
> whether or not (s)he wants the code of a program modified in such a way.
> The optimizer could also be written to rewrite the program code so as
> to take advantage of a coprocessor, if one is present.
An optimizer cannot run on compiled binary - the problem is indeterminate. The
reason is simple: The optimizer cannot differentiate between data and code.
(And if it optimizes the data, the program dies.) Worse than that, even if it
knew that it was in code, it cannot unambiguously identify the beginnings of
instructions.

In any case, in anything greater than a peephole optimizer, even the raw
assembly language isn't enough - the optimizer must be given information about
the original program.

An example is "common subtree" optimization in a complex
expression: To identify this is difficult and fraught with errors in the
assembly language. The right place to do it is in the parser when building the
expression tree.
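A toy C version of the point, with an instrumented multiply standing in for the code the compiler would emit (all names here are made up for illustration; a real compiler does this on the expression tree in the parser, not by hand):

```c
/* Count multiplies to show what sharing the common subtree a*b
 * saves in the expression (a*b) + (a*b)*c. */
static int mul_count = 0;

static int mul(int a, int b) { mul_count++; return a * b; }

int eval_naive(int a, int b, int c)
{
    return mul(a, b) + mul(mul(a, b), c);   /* a*b computed twice */
}

int eval_cse(int a, int b, int c)
{
    int t = mul(a, b);                      /* shared subtree, computed once */
    return t + mul(t, c);
}
```

Both versions return the same value, but the shared-subtree form issues one fewer multiply - exactly the saving that is easy to see in the tree and hard to recover from flat assembly.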

Another example (probably closer to your heart) would be "Oh, I can use a 68881
instruction here". Unless your math library emulates the 68881 on a
one-call-per-one-instruction basis, you may not be able to figure out that a
68881 instruction will help - without reference to the original source.
(And if you are operating on binary, you may not know that this particular
subroutine call was a call to the math library, anyway!)

Peephole optimizers have their place, but the kind of optimization you are
asking for can't be done that way.

...and even if it could, I gather that you are thinking of this as a COMMERCIAL
thing - and what manufacturer will distribute the assembly language of their
program, complete with labels? (I gather that your idea was that someone buys a
commercial program intended for a 68000, passes it through the magic optimizer,
and all-of-a-sudden it is optimized for the 68020 with math chip.)


> Is there some fundamental problem with what is described above?  Is anyone
> working on such an optimizer?
> I welcome your comments and criticisms.
> 
> 	Johan Larson

It seems to me you've missed an easy answer to your problem: Use your
inefficient fast single-pass compiler while you develop, and when you are done
use the efficient slow multi-pass compiler for the production version.

Why do you need an optimizer at all? Use the one that's built into the
multi-pass compiler.
-- 
Steven C. Den Beste,   Bolt Beranek & Newman, Cambridge MA
denbeste@bbn.com(ARPA/CSNET/UUCP)    harvard!bbn.com!denbeste(UUCP)

cmcmanis%pepper@Sun.COM (Chuck McManis) (01/26/88)

In article <4776@watdragon.waterloo.edu> (Paul Larson) writes:
> Currently, single pass compilers are very popular, due to their speed.  This
> speed allows the development cycle to be shortened, sometimes by very
> significant amounts.  However, unless I'm very wrong, the speed of the
> single pass compiler is gained at the expense of code quality.  One 
> immediately obvious solution is a code optimizer, which could be used on a
> compiled program once the critical debugging is done. ...
>                                                           ...  Thus, if a 
> system is using the more advanced '020, it is using code which was written for
> a more primitive, though similar, processor and thus the code does not take
> proper advantage of the more advanced features of the more modern processors.
> The same argument applies for the math co-processors, the 68881 and its new
> sibling the 68882, which are rapidly becoming more popular on home computers.

Interesting point, Paul; a couple of comments:

On the Amiga the Lattice C compiler optimizes to use code that would be faster
on the '010 - '020 by using instructions, like dbra, that take advantage of the
caching aspects of these chips. Since this does not impose a penalty on the
68000 code it seems to be a good optimization choice. 
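In C source terms, the dbra-friendly shape is a loop that counts down and tests against -1. A rough sketch (the function name is invented for the example):

```c
/* A count-down loop shaped so a 68000-family compiler can turn the
 * "--i >= 0" test into a single dbra instruction (decrement and
 * branch until the counter reaches -1). */
long sum_countdown(const int *a, int n)
{
    long s = 0;
    int i = n;
    while (--i >= 0)        /* maps onto dbra */
        s += a[i];
    return s;
}
```

Since dbra exists on the whole 68000 family, code generated this way runs everywhere and simply runs faster where the newer chips' loop handling helps - which is why it costs nothing as a default choice.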

And yes, incremental and single pass compilers cannot implement some 
optimization schemes but they are fast at compiling. So they are really good
for prototyping. Thus I think a more reasonable solution would be to have,
in effect, two compilers living in the same executable. One the incremental
compiler, the other a multipass optimizing compiler. Then you offer the user
a choice of which mode to compile the program in. I believe this is the
approach that Microsoft has taken with Quick-C, claiming that it is 100%
code compatible with their regular C compiler, so that once you have a
working program you can recompile with their regular compiler to get
optimized, space-efficient code.


--Chuck McManis
uucp: {anywhere}!sun!cmcmanis   BIX: cmcmanis  ARPAnet: cmcmanis@sun.com
These opinions are my own and no one elses, but you knew that didn't you.

gary@fairlight.oz (Gary Evesson) (01/27/88)

This is an idea that I thought about in the early days of the Mac. I think it's
a good idea, but convincing someone to do it is another question. Writing
optimisers is an art in itself - not an easy thing to do. It doesn't seem that
any of the manufacturers of compilers for these machines are interested
(I would welcome a contradiction from anyone) as far as I can tell. I guess the
trick is to apply enough market pressure for them to do something about the
quality of their code, which I agree is atrocious.

							gary@fairlight.oz
							Gary Evesson

harald@ccicpg.UUCP ( Harald Milne) (01/28/88)

In article <4776@watdragon.waterloo.edu>, palarson@watdragon.waterloo.edu (Paul Larson) writes:
> The Mac and Amiga worlds are currently in the process of changing from the
> 68000 processor to the similar, but more powerful 68020.  The 68030 has also
> been rumoured to be lurking somewhere on the horizon.  However, regardless of
> whether a '000 or a '020 is installed, the same code is used. 

	Yes and no. There should be no difference in code generation on the
Amiga.

> Thus, if a 
> system is using the more advanced '020, it is using code which was written for
> a more primitive, though similar, processor and thus the code does not take
> proper advantage of the more advanced features of the more modern processors.

	Well here is the no answer. The Amiga makes use of shared libraries,
not unlike the shared memory concept in System V UNIX. Just what advanced
features are you talking about? Cache management, floating point coprocessor
interface, or what?

	Cache management is one thing, and the Amiga OS uses it (yes the
cache is turned on), if a 68020 is detected. The OS does this on boot.

	In terms of floating point, things are very different here. Floating
point operations on the Amiga are not compiled "inline", they are called.
If you have a 68000, you can use the standard software emulation (IEEE). If
you have the 68020, and have the 68881 or 68882 present (also detected on
boot), you can simply replace the math library and use 68881/68882
instructions directly. No recompiles, no changes; existing software is
upward compatible.

> The same argument applies for the math co-processors, the 68881 and its new
> sibling the 68882, which are rapidly becoming more popular on home computers.

	Uh, they are? What do you have? A MacII? At a $4000.00 base price, this
revolution is not coming soon. We are talking about some expensive hardware!
But after what I read in the latest Unix/World, the 68000 family could fall
out of favor, meaning lower prices for us all. That could make it popular.

> The code optimizer, if properly written, could remove the inefficiency of
> using primitive code on an advanced processor, simply by modifying the
> code to take advantage of the features of the more advanced processor.

	Well you lost me completely here. What advantages are not being used?

> Since not everyone has a 68020, it would be left up to the user to decide
> whether or not (s)he wants the code of a program modified in such a way.
> The optimizer could also be written to rewrite the program code so as
> to take advantage of a coprocessor, if one is present.

	No rewriting needed for the Amiga.

	The Amiga handles all this.

> Is there some fundamental problem with what is described above?  Is anyone
> working on such an optimizer?
> I welcome your comments and criticisms.

	Just what I mentioned so far.

> 	Johan Larson
-- 
Work: Computer Consoles Inc. (CCI), Advanced Development Group (ADG)
      Irvine, CA (RISCy business! Home of the CCI POWER 6/32)
UUCP: uunet!ccicpg!harald

dillon@CORY.BERKELEY.EDU (Matt Dillon) (01/29/88)

:	In terms of floating point, things are very different here. Floating
:point operations on the Amiga are not compiled "inline", they are called.
:If you have a 68000, you can use the standard software emulation (IEEE). If
:you have the 68020, and have the 68881 or 68882 present (also detected on
:boot), you can simply replace the math library and use 68881/68882
:instructions directly. No recompiles, no changes; existing software is
:upward compatible.

	What do you mean floating point operations are not compiled inline?
The *real* answer is that you can have it both ways... inline if you don't 
care about downward compatibility to a 68000, and via shared library calls
if you want downward compatibility (i.e. 68020 machines would have a different
library implementing the same functions).

:> The same argument applies for the math co-processors, the 68881 and its new
:> sibling the 68882, which are rapidly becoming more popular on home computers.
:
:	Uh, they are? What do you have? A MacII? At a $4000.00 base price, this
:revolution is not coming soon. We are talking about some expensive hardware!

	Huh?  what kind of answer is that?  "Since I don't use it because it
costs too much nobody else uses it." ... give me a break!

:> The code optimizer, if properly written, could remove the inefficiency of
:> using primitive code on an advanced processor, simply by modifying the
:> code to take advantage of the features of the more advanced processor.
:
:	Well you lost me completely here. What advantages are not being used?

	I can think of two right off.  (1) The 68020 bit field instructions,
and (2) The 68020's extra addressing modes that allow for automatic
premultiplication (read: array indexing in one instruction).
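Both can be written as C that a 68020-aware compiler could map onto those features (the function names are invented for the example):

```c
#include <stdint.h>

/* What one 68020 BFEXTU instruction does, where a 68000 needs a
 * shift-and-mask sequence: extract `width` bits starting `offset`
 * bits from the most significant end (offset + width <= 32). */
uint32_t bf_extract(uint32_t word, int offset, int width)
{
    return (word << offset) >> (32 - width);
}

/* a[i] on 4-byte elements is one instruction with the 68020's scaled
 * index addressing mode, e.g. move.l (a0,d0.l*4),d1; the 68000 must
 * first form the byte offset with a separate shift or add. */
int32_t fetch(const int32_t *a, int i)
{
    return a[i];
}
```
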

					-Matt

haitex@pnet01.cts.com (Wade Bickel) (01/29/88)

gary@fairlight.oz (Gary Evesson) writes:
>This is an idea that I thought about in the early days of the Mac. I think it's
>a good idea, but convincing someone to do it is another question. Writing
>optimisers is a art in itself - not an easy thing to do. It doesn't seem that
>any of the manufacturers of compilers for these machines are interested
>(I would welcome a contradiction from anyone) as far as I can tell. I guess the
>trick is to apply enough market pressure for them to do something about the
>quality of their code, which I agree is atrocious.

        Leon Frenkel's Benchmark Modula-2 compiler for the Amiga is fairly
well optimised.  I've had a few long discussions with him as to how to write
optimised source, and it is surprising the degree to which the coder can be
sloppy in his structuring and still get the same results.  Still, it is
essential that certain coding guidelines be followed when possible (such as
creating optimized mathematical expressions).

        I've been trying to convince Leon to implement some kind of programmer
controlled register allocation, and his counter argument has been that it is
non-standard, and in most cases the compiler makes better overall register
allocations than programmers would.  A strong supporting argument on his side
is that if the coder should change previously optimized code, the compiler
will automatically re-optimize whereas the programmer often would not.
        Nonetheless, I think at some point he will include programmer
register allocation via compiler directives, which I suggested (as it avoids
most transportability problems), and which I think he will do, eventually...
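The thread is about Modula-2, but C already has a standard, portable version of this kind of hint: the `register` storage class. A small sketch, with an invented function name:

```c
/* C's nearest thing to programmer-controlled register allocation:
 * `register` is only a hint, and most compilers feel free to ignore
 * it - echoing the argument that the compiler's own allocation is
 * usually better than the programmer's. */
long dot(const int *x, const int *y, int n)
{
    register long acc = 0;  /* ask for the accumulator to live in a register */
    register int i;
    for (i = 0; i < n; i++)
        acc += (long)x[i] * (long)y[i];
    return acc;
}
```

Because it is only a hint, code written this way keeps working (and keeps being re-optimized) when the surrounding code changes - the same property claimed for directives over hand allocation.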


                                                        Thanks,


                                                            Wade.


UUCP: {cbosgd, hplabs!hp-sdd, sdcsvax, nosc}!crash!pnet01!haitex
ARPA: crash!pnet01!haitex@nosc.mil
INET: haitex@pnet01.CTS.COM

dwb@apple.UUCP (David W. Berry) (02/02/88)

In article <8801281857.AA09606@cory.Berkeley.EDU> dillon@CORY.BERKELEY.EDU (Matt Dillon) writes:
>:	In terms of floating point, things are very different here. Floating
>:point operations on the Amiga are not compiled "inline", they are called.
>:If you have a 68000, you can use the standard software emulation (IEEE). If
>:you have the 68020, and have the 68881 or 68882 present (also detected on
>:boot), you can simply replace the math library and use 68881/68882
>:instructions directly. No recompiles, no changes; existing software is
>:upward compatible.
>
>	What do you mean floating point operations are not compiled inline?
>The *real* answer is that you can have it both ways... inline if you don't 
>care about downward compatibility to a 68000, and via shared library calls
>if you want downward compatibility (i.e. 68020 machines would have a different
>library implementing the same functions).
	The same thing is true of the Macintosh if you regard the trap
mechanism as a "shared library" which it really is.  Most of the existing
68020 upgrades for the SE replace the standard traps (which run on a 68000
and even faster on an 020) with direct calls to the 881.  You lose a little
to the extra trap overhead, but you do with shared libraries too.
-- 
	David W. Berry
	dwb@well.uucp                   dwb@Delphi
	dwb@apple.com                   973-5168@408.MaBell
Disclaimer: Apple doesn't even know I have an opinion and certainly
	wouldn't want it if they did.

dillon@CORY.BERKELEY.EDU (Matt Dillon) (02/02/88)

>	The same thing is true of the Macintosh if you regard the trap
>mechanism as a "shared library" which it really is.  Most of the existing
>68020 upgrades for the SE replace the standard traps (which run on a 68000
>and even faster on an 020) with direct calls to the 881.  You lose a little
>to the extra trap overhead, but you do with shared libraries too.

	I suppose one could call the mac traps shared libraries.  The 
overhead of a trap, though, is too much for my tastes.  Of course *nothing*
beats inline code, not even an optimal Amiga library call (jsr offset(Ax)/rts).

				-Matt