db@lfcs.ed.ac.uk (Dave Berry) (07/23/88)
Background: Several languages leave the order of evaluation of the arguments
of operators undefined.  This allows compilers to optimise expressions
containing such operators.

Question: Does anyone know of any compilers which produce code that decides
the order of evaluation at run-time?  My guess is that most of these
optimisations are made at compile-time.

Would it be sensible to optimise the following code:

	a := function ();
	b := (complicated expression) * a;

to the equivalent of:

	a := function ();
	if (a == 0)
		b := 0;
	else
		b := (complicated expression) * a;

Assuming:
	the result returned by "function" is a very small integer (including 0),
	and isn't known at compile-time;
	"complicated expression" has no side-effects and would take much longer
	to evaluate than a comparison with 0;
	the time taken to compile a program isn't considered.

I'd like to be able to reference such a compiler.  However, I expect that
cases like this occur too infrequently to be worth bothering with.

Dave Berry.		"May you have interesting weather"
db@lfcs.ed.ac.uk	 -- Ancient British curse.
barmar@think.COM (Barry Margolin) (07/24/88)
In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:
>Would it be sensible to optimise the following code:
>
>a := function ();
>b := (complicated expression) * a;
>
>to the equivalent of:
>
>a := function ();
>if (a == 0)
>	b := 0;
>else
>	b := (complicated expression) * a;

I think that would not be a very useful optimization for a compiler to make
in most cases.  It would not speed things up unless function() is likely to
return 0 fairly frequently.

For example, assume that "if (a == 0)" takes 2 units of time, "b := 0" takes
1 unit, and "b := (complicated expression) * a" takes 100 units.  100
invocations of the unoptimized version take 10,000 units.  If function()
returns 0 1% of the time, then 100 invocations of the "optimized" version
take 100*2 (for the test) + 1*1 (for the optimized case) + 99*100 (for the
regular cases) = 10,101 units, i.e. about 1% slower than the original.  For
this set of timings, the break-even point is when function() returns 0
2.0202...% of the time.

In general, if the complicated version takes N times longer than the simple
version, and the test takes M times longer than the simple version, then you
win if the test succeeds more than M/(N-1) of the time.

I would be very surprised if an optimizing compiler were to make this
transformation on its own.  If there were a pragma that permitted the
programmer to inform the compiler of the expected frequency of 0 return
values, then the compiler could attempt the above computation to decide
whether the transformation improves things.

This should not be confused with the common optimization where boolean
expressions are short-circuited even when non-short-circuit operators are
used.  That can be done when the expressions don't have side-effects (just
like above).  However, since boolean expressions have only two possible
values, true and false, it is quite common for compiler implementors to
assume that each value will occur a significant fraction of the time.
Under this assumption, the transformation of

	if <complex-expr> & <simple-expr> then <result>

into

	if <simple-expr> then
		if <complex-expr> then <result>

is reasonable.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
nevin1@ihlpf.ATT.COM (00704a-Liber) (07/26/88)
In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:
|Would it be sensible to optimise the following code:
|a := function ();
|b := (complicated expression) * a;
|to the equivalent of:
|a := function ();
|if (a == 0)
|	b := 0;
|else
|	b := (complicated expression) * a;
|Assuming:
|	the result returned by "function" is a very small integer (including 0),
|	and isn't known at compile-time
|	"complicated expression" has no side-effects and would take much longer
|	to evaluate than a comparison with 0.
|	the time taken to compile a program isn't considered.

This is not a good optimization unless the compiler can determine at compile
time that 'a' is much more likely to be 0 than not.  Since that is very
difficult to determine at compile time, it is usually not done.

Note: if you know that 'a' is going to be 0 most of the time, it might pay
to recode your first example into your second example by hand.  But you have
to make the choice, not the compiler, since this involves information that
is not (usually) known to the compiler.
-- 
 _  __			NEVIN J. LIBER	..!att!ihlpf!nevin1	(312) 979-????
' )  )			You are in a maze of little twisting
 /  / _ , __o  ____	email paths, all different.
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah
firth@sei.cmu.edu (Robert Firth) (07/26/88)
In article <5423@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704a-Liber,N.J.) writes:
>In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:
>
>|Would it be sensible to optimise the following code:
>
>|a := function ();
>|b := (complicated expression) * a;
>
>|to the equivalent of:
>
>|a := function ();
>|if (a == 0)
>|	b := 0;
>|else
>|	b := (complicated expression) * a;
>
>This is not a good optimization, unless the compiler can determine at compile
>time that 'a' is much more likely to be 0 than not.  Since this is very
>difficult to determine at compile time, it is usually not done.

This question seems interesting enough to warrant looking at the probable
code.  If we take the customary Vax, and pick up just after the return from
the function, it looks like this:

		; result in R0
		MOVL	R0, A
		<evaluate complicated expression into Rx>
		MULL2	Rx, R0
		MOVL	R0, B

Now, with the optimisation (and here I think the compiler can do better than
Dave), we have:

		; result in R0
		MOVL	R0, A
		BEQL	1$
		<evaluate complicated expression into Rx>
		MULL2	Rx, R0
	1$:	MOVL	R0, B

So the cost is a single conditional branch not taken, and the benefit is
presumably the cost of the "complicated expression".  Making a wild guess,
the benefit will be at least 10x the cost (the multiply alone accounts for
half that), and substantially greater if the complicated expression includes
things like two-dimensional array elements.

So it is probably wrong to assert that A should be "much more likely to be 0
than not"; it should be good enough if A is zero 10% of the time.  But that
still seems an unlikely thing for a compiler to be able to deduce, so the
optimisation is still probably unachievable without some clue.

In sum, this seems like an excellent example of an optimisation that a good
execution profiler would find, but that a compiler cannot.

(Note: the above code assumes integer data; the floating code looks almost
identical, but the benefit is greater.)
nevin1@ihlpb.ATT.COM (Liber) (08/02/88)
In article <6376@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (Robert Firth) writes:
|In article <5423@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (Nevin's evil twin brother :-)) writes:
||In article <561@etive.ed.ac.uk> db@lfcs.ed.ac.uk (Dave Berry) writes:
|||Would it be sensible to optimise the following code:
|||a := function ();
|||b := (complicated expression) * a;
|||to the equivalent of:
|||a := function ();
|||if (a == 0)
|||	b := 0;
|||else
|||	b := (complicated expression) * a;
|So the cost is a single conditional branch not taken, and the benefit is
|presumably the cost of the "complicated expression".  Making a wild guess,
|the benefit will be at least 10x the cost, (the multiply alone accounts
|for half that), and substantially greater if the complicated expression
|includes things like two-dimensional array elements.
|So it is probably wrong to assert that A should be "much more likely to
|be 0 than not"; it should be good enough if A is zero 10% of the time.
|But that still seems an unlikely thing for a compiler to be able to
|deduce, so the optimisation is still probably unachievable without some
|clue.

I think that you are correct.  However, this brings up another question:
is this truly an optimization?  The way I (used to?) think about
optimization with respect to execution speed is that

	speed of optimized code >= speed of unoptimized code

for *all instantiations* of a given source statement.  Under this
definition, things like constant folding, common subexpression elimination,
etc. can all be considered optimizations.  The proposed optimization (the
explicit check for 0) does not meet this criterion, however.
Assuming it can be determined that 'a' will be 0 at least 10% of the time
(and that 'complicated expression' has no side effects, of course), and that
10% is the point at which this optimization is worth making, the *overall*
execution speed of the program will be faster (also assuming the program is
run often enough for the statistics to apply).  But any given
*instantiation* of the program might be slower.

Because of this, I (currently) consider this to be an algorithm design issue
and NOT a compiler optimization issue; i.e., I *don't* want my compiler
doing this for me.
-- 
 _  __			NEVIN J. LIBER	..!att!ihlpb!nevin1	(312) 979-????
' )  )			I got a new job and account but no office, *phone*, or
 /  / _ , __o  ____	paycheck; is AT&T trying to tell me something? :-)
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah
firth@sei.cmu.edu (Robert Firth) (08/02/88)
In article <8445@ihlpb.ATT.COM> nevin1@ihlpb.UUCP (55528-Liber,N.J.) writes:

["optimisation" of multiply when one operand is zero]

>			However, this brings up another question:
>is this truly an optimization?  The way I (used to?) think about
>optimization with respect to execution speed is that
>
>	speed of optimized code >= speed of unoptimized code
>
>for *all instantiations* of a given source statement.  Under this definition,
>things like constant folding, common subexpression elimination, etc., can
>all be considered optimizations.

That's a very good point.  I'm inclined to agree, but I think current
practice is against us.  For example, it is considered an optimisation to
take the common subexpression out of

	if b1() then x := expression();
	if b2() then y := expression();

even though that costs more when both conditions are false.  Likewise,
compilers cheerfully hoist code out of loops that might be obeyed zero
times.  What do others think?  (Note: I'm aware there are also compiler
correctness issues here, but that's not the point.)

Incidentally, I found a great example where constant folding is a
pessimisation (it was on the PE3200), and a colleague found another on the
MIPS R2000.  On some machines it can be faster to add two "short" constants
than to load a "long" one; e.g. on the PE3200 this

	x := 12
	y := 12+4

is faster than

	x := 12
	y := 16

Computers are strange creatures.
smryan@garth.UUCP (Steven Ryan) (08/03/88)
>			However, this brings up another question:
>is this truly an optimization?  The way I (used to?) think about
>optimization with respect to execution speed is that
>
>	speed of optimized code >= speed of unoptimized code
>
>for *all instantiations* of a given source statement.

The fine print has always read that an optimisation is an attempt to speed
up a program that helps in most cases, but once in a while hurts.  The
various optimisations interact in a nonlinear way, so that any attempt to
get better code will sometimes produce worse code.
hjm@cernvax.UUCP (hjm) (08/08/88)
If my memory serves me, some of the new RISCy chips do an explicit test for
zero, as they are tweaked for multiplication by small integers, so the
compiler needn't do the optimisation anyway for these chips :-).

	Hubert Matthews