[comp.compilers] Compiler optimizations

mccalpin@vax1.udel.edu (MCCALPIN), 7 Jun 90 08:46:40 -0400 (06/08/90)

In article <1990Jun7.010349.2097@esegue.segue.boston.ma.us> James Larus writes:
>Let's make this argument more concrete.  Consider the example of summing the
>elements of a vector:
>	for i <- 1 to n do
>	  sum <- sum + A[i]
>....on a parallel computer it can [use] a parallel prefix
>algorithm that builds up a tree of additions.  The problem is that this
>optimization requires addition to be associative.  Integer and floating point
>addition isn't associative.  However, how many programmers have carefully
>analyzed their algorithm and convinced themselves that summing the elements
>from 1 to N will produce the least error? 

This code will already be modified by a number of current compilers.
I have seen the following "interpretations" of the loop above:
(1) Calculate the sum in 80-bit precision on the 8087 stack and truncate
	on completion of the loop.
(2) Add the two numbers in 80-bit precision, then truncate, store, and
	re-load the 'sum' variable at each iteration of the loop.
(3) Add the numbers in order in unnormalized 128-bit floating-point format.
	(Cyber 205)
(4) Unroll the loop to some depth (typically 4) and calculate: (MIPS?)
	for i <- 1 to n by 4 do
	  sum <- sum + A[i] + A[i+1] + A[i+2] + A[i+3]
(5) Stripmine the loop by groups of 64 and add them separately: (Cray 1)
	set temp[i]=0.0, i=1..64
	for isect <- 1 to n/64 do
	  for i <- 1 to 64 do
	    temp[i] <- temp[i] + A[i+(isect-1)*64]
	  end_for
	end_for
	for i <- 1 to 64 do
	  sum <- sum + temp[i]
	end_for

So we might as well forget about the problems with non-associativity of FP
addition, since the compilers are going to do the calculations in a different
order than the one we chose anyway!  Unless optimization/vectorization is
turned off, you cannot count on any two machines adding that simple loop in
the same order or with the same rounding/truncation -- even if they nominally
use the same floating point rules and precisions.

>Of course, some languages such as Fortran 90 permit a programmer to express
>his intent in a manner that gives the compiler information about when order
>matters. Unfortunately, such languages are rare and only include a few common
>idioms.

Fortran-90 actually lets you express *less* about the actual order of the
calculations by allowing you to specify that order does not change the
*mathematical* result.  The array syntax includes the implicit assumption
that data dependencies do not exist, so the calculations can proceed in any
order.  This really does nothing for the problem of associativity.  It does
allow a much nicer interface to the data parallel programming paradigm.

>[I suppose there is merit to this suggestion, but the thought of an optimizer
>that second guesses the programmer and attempts to generate code for what the
>program would have said if the source language was different fills me with
>considerable unease. -John]

But that is exactly what vectorizers do, and the better ones do a pretty
good job.  The bad ones, though.... (I guess I shouldn't mention any brand
names for fear of reprisals!)

On a more philosophical note, this direction is precisely the one being taken
by the proponents of higher-level programming languages/environments.  In
order for computers to be really helpful extensions to the human intellect,
we will have to give them increasing autonomy in choosing exactly how to go
about solving a problem.  I understand that there is an active research
program at Purdue on expert systems for partial differential equations that
is attempting to get the computer to choose (or advise the user to choose)
the discretization strategy and algorithm to solve a specified partial
differential equation.  That is a system that would probably fill John with
even more "considerable unease"!

-- 
Send compilers articles to compilers@esegue.segue.boston.ma.us
{spdcc | ima | lotus}!esegue.  Meta-mail to compilers-request@esegue.
Please send responses to the author of the message, not the poster.

pardo@cs.washington.edu (David Keppel) (06/08/90)

MCCALPIN <mccalpin@vax1.udel.edu> writes:
>[``ambiguous'' approach advocated by higher-level prog lang designers]

Many languages have operations whose behavior is explicitly undefined on
part of their domain.  For example, in C the code 'a[i] = i++' has two
interpretations: `j=i++, a[j] = j' and `j=i++, a[i] = j'.

If we want to, we can define an iteration construct `forp i=1..n' that
executes all iterations in parallel.  If the programmer needs restricted
semantics, then the programmer restricts them explicitly.  The compiler does
not need to second-guess the programmer, the only question is whether the
compiler exploits the information given by the programmer.  So, for instance,
we can define `ford i=1..n' to be a `forp' in which all iterations are
executed in an order that satisfies the data dependencies but no other
constraints; again, no second-guessing is needed.

	;-D on  ( Second-guessing the compiler-compiler )  Pardo
-- 
		    pardo@cs.washington.edu
    {rutgers,cornell,ucsd,ubc-cs,tektronix}!uw-beaver!june!pardo