[comp.lang.misc] Query re optimising compilers

db@lfcs.ed.ac.uk (Dave Berry) (07/23/88)

Background:

Several languages leave the order of evaluation of the arguments of
operators undefined.  This allows compilers to optimise expressions
containing such operators.

Question:

Does anyone know of any compilers which produce code that decides the order
of evaluation at run-time?   My guess is that most of these optimisations
are made at compile-time.

Would it be sensible to optimise the following code:

a := function ();
b := (complicated expression) * a;

to the equivalent of:

a := function ();
if (a == 0)
  b := 0;
else
  b := (complicated expression) * a;


Assuming:
  the result returned by "function" is a very small integer (including 0),
	and isn't known at compile-time
  "complicated_expression" has no side-effects and would take much longer
	to evaluate than a comparison with 0.
  the time taken to compile a program isn't considered.
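
In C terms the two forms look roughly like this (only a sketch; "function"
and "complicated_expression" are stand-ins for whatever the real code does):

	#include <stdlib.h>

	/* Stand-ins: a small integer not known at compile time, and a
	   side-effect-free but expensive expression. */
	static int function(void) { return rand() % 4; }
	static int complicated_expression(void)
	{
	    int i, s = 0;
	    for (i = 0; i < 1000; i++)	/* pretend this is expensive */
	        s += i * i;
	    return s;
	}

	int original_form(void)
	{
	    int a = function();
	    return complicated_expression() * a;	/* always evaluated */
	}

	int transformed_form(void)
	{
	    int a = function();
	    if (a == 0)
	        return 0;			/* expensive work skipped */
	    return complicated_expression() * a;
	}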


I'd like to be able to reference such a compiler.  However, I expect that
cases like this occur too infrequently to be worth bothering with.

 Dave Berry.		  "May you have interesting weather"
 db@lfcs.ed.ac.uk	   	   -- Ancient British curse.

barmar@think.COM (Barry Margolin) (07/24/88)

In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:
>Would it be sensible to optimise the following code:
>
>a := function ();
>b := (complicated expression) * a;
>
>to the equivalent of:
>
>a := function ();
>if (a == 0)
>  b := 0;
>else
>  b := (complicated expression) * a;

I think that would not be a very useful optimization for a compiler to
make in most cases.  This would not speed things up unless function()
is likely to return 0 pretty frequently.  For example, assume that "if
(a == 0)" takes 2 units of time, "b := 0" takes 1 unit of time, and "b
:= (complicated expression) * a" takes 100 units.  100 invocations of
the unoptimized version takes 10,000 units.  If function() returns 0
1% of the time, then 100 invocations of the "optimized" version takes
100*2 (for the test) + 1*1 (for the optimized case) + 99*100 (for the
regular cases) = 10,101, i.e. it is about 1% slower than the original.
For this set of timings, the break even point is when function()
returns 0 2.0202...% of the time.  In general, if the complicated
version takes N times longer than the simple version, and the test
takes M times longer than the simple version, then you win if the test
succeeds more than M/(N-1) of the time.
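
The arithmetic is easy to check mechanically; the numbers below are just
the ones from the example above:

	#include <stdio.h>

	int main(void)
	{
	    double simple = 1.0;	/* "b := 0"                      */
	    double test   = 2.0;	/* "if (a == 0)"                 */
	    double full   = 100.0;	/* "b := (complicated expr) * a" */
	    double p      = 0.01;	/* fraction of calls returning 0 */

	    double unopt = full;
	    double opt   = test + p * simple + (1.0 - p) * full;
	    double breakeven = test / (full - simple);	/* M/(N-1), with simple = 1 */

	    printf("per call: unoptimised %.2f, optimised %.2f\n", unopt, opt);
	    printf("break-even zero-fraction: %.4f%%\n", breakeven * 100.0);
	    return 0;
	}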

I would be very surprised if an optimizing compiler were to make this
transformation on its own.  If there were a pragma that permitted the
programmer to inform the compiler of the expected frequency of 0
return values then the compiler could attempt the above computation to
decide whether the transformation improves things.

This should not be confused with the common optimization where boolean
expressions are short-circuited even when non-short-circuit operators
are used.  This can be done when the expressions don't have
side-effects (just like above).  However, since boolean expressions
only have two possible values, true and false, it is quite common for
compiler implementors to assume that each value will occur a
significant fraction of the time.  Under this assumption, the
transformation of

     if <complex-expr> & <simple-expr>
     then <result>

into

     if <simple-expr>
     then if <complex-expr>
	  then <result>

is reasonable.
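
In C terms the reordering looks something like this (a sketch only; the
names are placeholders, and the rewrite is legal precisely because neither
operand has side effects):

	extern int complex_expr(void);		/* expensive to evaluate */
	extern int simple_expr(void);		/* cheap to evaluate     */
	extern void produce_result(void);

	void as_written(void)
	{
	    /* Non-short-circuit '&': both operands are always evaluated. */
	    if (complex_expr() & simple_expr())
	        produce_result();
	}

	void as_transformed(void)
	{
	    /* Cheap test first; the expensive operand is skipped
	       whenever the cheap one is false. */
	    if (simple_expr())
	        if (complex_expr())
	            produce_result();
	}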

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

nevin1@ihlpf.ATT.COM (00704a-Liber) (07/26/88)

In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:

|Would it be sensible to optimise the following code:

|a := function ();
|b := (complicated expression) * a;

|to the equivalent of:

|a := function ();
|if (a == 0)
|  b := 0;
|else
|  b := (complicated expression) * a;


|Assuming:
|  the result returned by "function" is a very small integer (including 0),
|	and isn't known at compile-time
|  "complicated_expression" has no side-effects and would take much longer
|	to evaluate than a comparison with 0.
|  the time taken to compile a program isn't considered.

This is not a good optimization, unless the compiler can determine at compile
time that 'a' is much more likely to be 0 than not.  Since this is very
difficult to determine at compile time, it is usually not done.

Note:  if you know that 'a' is going to be 0 most of the time it might pay to
recode your first example into your second example.  But you have to make the
choice, not the compiler, since this involves information that is not
(usually) known to the compiler.
-- 
 _ __			NEVIN J. LIBER	..!att!ihlpf!nevin1	(312) 979-????
' )  )				You are in a maze of little twisting
 /  / _ , __o  ____		 email paths, all different.
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

firth@sei.cmu.edu (Robert Firth) (07/26/88)

In article <5423@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:
>
>|Would it be sensible to optimise the following code:
>
>|a := function ();
>|b := (complicated expression) * a;
>
>|to the equivalent of:
>
>|a := function ();
>|if (a == 0)
>|  b := 0;
>|else
>|  b := (complicated expression) * a;
>

>This is not a good optimization, unless the compiler can determine at compile
>time that 'a' is much more likely to be 0 than not.  Since this is very
>difficult to determine at compile time, it is usually not done.

This question seems interesting enough to warrant looking at the probable
code.  If we take the customary Vax, and pick up just after the return
from the function, it looks like this:

	; result in R0
	MOVL  R0, A
	<evaluate complicated expression into Rx>
	MULL2 Rx, R0
	MOVL  R0, B

Now, with the optimisation (and here I think the compiler can do better
than Dave), we have:

	; result in R0
	MOVL  R0, A		; a := function result (MOVL also sets Z)
	BEQL  1$		; if a = 0, skip straight to the store of b
	<evaluate complicated expression into Rx>
	MULL2 Rx, R0		; R0 := R0 * Rx
    1$:
	MOVL  R0, B		; b := 0 or the product

So the cost is a single conditional branch not taken, and the benefit is
presumably the cost of the "complicated expression".  Making a wild guess,
the benefit will be at least 10x the cost (the multiply alone accounts
for half that), and substantially greater if the complicated expression
includes things like two-dimensional array elements.

So it is probably wrong to assert that A should be "much more likely to
be 0 than not"; it should be good enough if A is zero 10% of the time.
But that still seems an unlikely thing for a compiler to be able to
deduce, so the optimisation is still probably unachievable without some
clue.

In sum, this seems like an excellent example of an optimisation that a
good execution profiler would find, but that a compiler cannot.

(Note: the above code assumes integer data; the floating code looks almost
 identical but the benefit is greater)
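
A crude way to gather that clue by hand (just a sketch, not any particular
profiler's interface) is to wrap the call and count how often the result
really is zero:

	extern int function(void);	/* the routine from the example */

	static long zero_hits, total_calls;

	/* After a profiling run, zero_hits/total_calls says whether the
	   explicit test for zero is likely to pay off. */
	int instrumented_function(void)
	{
	    int r = function();
	    total_calls++;
	    if (r == 0)
	        zero_hits++;
	    return r;
	}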

nevin1@ihlpb.ATT.COM (Liber) (08/02/88)

In article <6376@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (Robert Firth) writes:
|In article <5423@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (Nevin's evil twin brother :-)) writes:
||In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:

|||Would it be sensible to optimise the following code:

|||a := function ();
|||b := (complicated expression) * a;

|||to the equivalent of:

|||a := function ();
|||if (a == 0)
|||  b := 0;
|||else
|||  b := (complicated expression) * a;


|So the cost is a single conditional branch not taken, and the benefit is
|presumably the cost of the "complicated expression".  Making a wild guess,
|the benefit will be at least 10x the cost, (the multiply alone accounts
|for half that), and substantially greater if the complicated expression
|includes things like two-dimensional array elements.

|So it is probably wrong to assert that A should be "much more likely to
|be 0 than not"; it should be good enough if A is zero 10% of the time.
|But that still seems an unlikely thing for a compiler to be able to
|deduce, so the optimisation is still probably unachievable without some
|clue.

I think that you are correct.  However, this brings up another question:
is this truly an optimization?  The way I (used to?) think about
optimization with respect to execution speed is that

	speed of optimized code >= speed of unoptimized code

for *all instantiations* of a given source statement.  Under this definition,
things like constant folding, common subexpression elimination, etc., can
all be considered optimizations.

The proposed optimization (the explicit check for 0) does not meet this
criterion, however.  Assume it can be determined that 'a' will be '0' at
least 10% of the time (and that 'complicated expression' has no side
effects, of course), and that 10% is the point at which the optimization
becomes worth making.  Then the *overall* execution speed of the program
will be faster (provided the program is run often enough for the statistics
to apply), but any given *instantiation* of the program might be slower.

Because of this, I (currently) consider this to be an algorithm design issue
and NOT a compiler optimization issue; i.e., I *don't* want my compiler doing
this for me.
-- 
 _ __			NEVIN J. LIBER	..!att!ihlpb!nevin1	(312) 979-????
' )  )		 	 I got a new job and account but no office, *phone*, or
 /  / _ , __o  ____	 paycheck; is AT&T trying to tell me something? :-)
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

firth@sei.cmu.edu (Robert Firth) (08/02/88)

In article <8445@ihlpb.ATT.COM> nevin1@ihlpb.UUCP (55528-Liber,N.J.) writes:
["optimisation" of multiply when one operand is zero]

>  However, this brings up another question:
>is this truly an optimization?  The way I (used to?) think about
>optimization with respect to execution speed is that
>
>	speed of optimized code >= speed of unoptimized code
>
>for *all instantiations* of a given source statement.  Under this definition,
>things like constant folding, common subexpression elimination, etc., can
>all be considered optimizations.

That's a very good point.  I'm inclined to agree, but I think current
practice is against us.  For example, it is considered an optimisation
to take the CSE out of

	if b1() then x := expression();
	if b2() then y := expression();

even though that costs more when both conditions are false.  Likewise,
compilers cheerfully hoist code out of loops that might be obeyed zero
times.  What do others think?  (Note: I'm aware there are also compiler
correctness issues here, but that's not the point)
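
In C terms, the hoisting amounts to this (a sketch; expression() is assumed
to be side-effect-free):

	extern int b1(void), b2(void);
	extern int expression(void);	/* assumed side-effect-free */
	int x, y;

	void before(void)
	{
	    /* expression() is evaluated 0, 1, or 2 times */
	    if (b1()) x = expression();
	    if (b2()) y = expression();
	}

	void after_cse(void)
	{
	    /* evaluated exactly once, even when both tests fail */
	    int t = expression();
	    if (b1()) x = t;
	    if (b2()) y = t;
	}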

Incidentally, I found a great example where constant folding is a
pessimisation (it was on the PE3200), and a colleague found another
on the Mips R2000.  It can be faster on some machines to add two
"short" constants than to load a "long" one, eg on PE3200 this

	x := 12
	y := 12+4

is faster than

	x := 12
	y := 16

Computers are strange creatures.

smryan@garth.UUCP (Steven Ryan) (08/03/88)

>                               However, this brings up another question:
>is this truly an optimization?  The way I (used to?) think about
>optimization with respect to execution speed is that
>
>	speed of optimized code >= speed of unoptimized code
>
>for *all instantiations* of a given source statement.

The fine print has always read that it is an attempt to speed up a program
that helps in most cases, but once in a while hurts.

The various optimisations interact in a nonlinear way so that any attempt
to get better code will sometimes produce worse code.

hjm@cernvax.UUCP (hjm) (08/08/88)

If my memory serves me, some of the new RISCy chips do an explicit test for
zero as they are tweaked for multiplication by small integers, so the compiler
needn't do the optimisation anyway for these chips :-).

	Hubert Matthews