tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel) (12/10/90)
Many optimizations that are possible because the compiler can assume that two different names don't refer to the same location can be expressed portably by introducing explicit temporaries. Vectorization and parallelization require a declaration (or a separate looping construct) that tells the compiler that there are no (or only "harmless") sequential dependencies in a loop body.

What I would like to ask: ignoring programming style, can you think of any optimizations that a FORTRAN compiler can carry out that cannot be expressed portably using temporaries and one or two compiler directives declaring the absence of sequential dependencies in a loop body?
david@cs.washington.edu (David Callahan) (12/11/90)
In article <TMB.90Dec10035836@bambleweenie57.ai.mit.edu> tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel) writes:
>Many optimizations that are possible because the compiler can assume
>that two different names don't refer to the same location can be
>expressed portably by introducing explicit temporaries. ...
>What I would like to ask: ignoring programming style, can you think of
>any optimizations that a FORTRAN compiler can carry out that cannot be
>expressed portably using temporaries and one or two compiler
>directives declaring absence of sequential dependencies in a loop
>body?

I'm not entirely sure what you mean by ``sequential dependencies'', but I would note that restructuring compilers are interested not only in the absence of loop-carried dependences (which inhibit simple conversion of a loop to parallel form) but in how the dependences which do exist can be used. Consider the loop:

	do i = 1,n
	do j = 1,m
	f(i,j) = f(i,j) + a(i,j)*f(i-1,j) + b(i,j)*f(i-2,j)

a simple second-order recurrence which has been vectorized in one dimension. The output I would hope for from the current generation of research restructurers depends on the target machine. For a non-vector machine, something like:

	do all j = 1,m
	f2 = f(i-2,j)
	f1 = f(i-1,j)
	do i = 1,n
	f0 = f(i,j)
	f(i,j) = f0 + a(i,j)*f1 + b(i,j)*f2
	f2 = f1 ; f1 = f0

where the assignments will later be eliminated by unrolling and subsumption. On a vector machine with registers of length L, the j loop is blocked and f0, f1, f2 become vector temps. In addition to the parallelism, the loop now has 1/3 fewer memory operations.

In this example, the critical information is that there is no alias between f and any other variable, not that the j loop carries no dependences.
--
David Callahan  (david@tera.com, david@june.cs.washington.edu, david@rice.edu)
Tera Computer Co.	400 North 34th Street	Seattle WA, 98103
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/11/90)
In article <14065@june.cs.washington.edu> david@june.cs.washington.edu (David Callahan) writes:
> I'm not entirely sure what you mean by ``sequential dependencies''

That's exactly the problem I've been wrestling with. Wtf is a sequential dependency? Does it mean that the result of the loop is changed if the index goes backwards or in a random order? (This isn't good enough.) Does it mean that no memory location accessed during the loop is written during the loop? (This may be too restrictive, and it may not be good enough.) What does it mean?

If the programmer writes

	forall(i = 0;i < 100;i++) x[i] = y[i];

then the compiler had better be able to conclude that x and y don't overlap. That's what we want out of the ``sequential dependency'' definition. But how can we achieve it?

Your ball.

---Dan
tmb@ai.mit.edu (Thomas M. Breuel) (12/11/90)
In article <12873:Dec1021:09:2890@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
|> In article <14065@june.cs.washington.edu> david@june.cs.washington.edu (David Callahan) writes:
|> > I'm not entirely sure what you mean by ``sequential dependencies''
|>
|> That's exactly the problem I've been wrestling with. Wtf is a sequential
|> dependency? Does it mean that the result of the loop is changed if the
|> index goes backwards or in a random order? (This isn't good enough.)
|> Does it mean that no memory location accessed during the loop is written
|> during the loop? (This may be too restrictive, and it may not be good
|> enough.) What does it mean?

There are several useful abstractions:

* All expressions inside the loop body are evaluated in parallel (in particular, assignments). If multiple assignments to the same location occur, the effect of the loop is undefined. This is useful for SIMD machines ("parallel model").

* If the effect of a loop depends in any way on the order in which the loop index assumes its values, the effect is undefined (i.e., it need not even correspond to any order in which the index could assume its values). This seems to be a more restrictive version of the PARALLEL_DO loop ("random model"), and it is useful as a least common denominator between vectorizing and parallel machines.

* The loop index assumes its values sequentially, but the compiler is permitted to delay, arbitrarily, any assignment to a location that is not explicitly declared "register". This is a useful abstraction for vectorizing machines ("vector model").

(In C, register declarations do not force the compiler to put variables into a register, but they do ensure that the location can never be aliased. The messiness of the "vector model" is simply a result of the fact that vectorized computation itself is conceptually much messier than SIMD parallelism, since it allows for some sequential dependencies.)
|> If the programmer writes
|>
|>	forall(i = 0;i < 100;i++) x[i] = y[i];
|>
|> then the compiler had better be able to conclude that x and y don't
|> overlap.

No, they may overlap. In fact, if x and y point to the same location, this loop is correct under any of the three looping constructs. If x < y, then the loop is correct under the vector model. And the loop is correct for any aliasing under the parallel model.

|> That's what we want out of the ``sequential dependency''
|> definition. But how can we achieve it?
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/11/90)
In article <12337@life.ai.mit.edu> tmb@ai.mit.edu writes:
> In article <12873:Dec1021:09:2890@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
  [ how to define forall? ]
> There are several useful abstractions:
  [ some possibilities ]

But you aren't addressing the real problem. See below.

> |> If the programmer writes
> |>	forall(i = 0;i < 100;i++) x[i] = y[i];
> |> then the compiler had better be able to conclude that x and y don't
> |> overlap.
> No, they may overlap.

No! There are vector and parallel machines where the compiler must generate vastly different code when x and y overlap and when they don't. Parallel code when x = y is simply *wrong*. It will generate garbage results. Even if x and y are both filled with the same values, the above loop may produce new values if the arrays overlap and it is run in parallel!

So any definition along the lines of ``the results must not depend on the order of execution'' is *not enough*. Two parallel writes may produce different results from any serial execution of the writes.

*This* is the optimization problem we have to solve. An assertion of non-aliasing solves the problem, but forall would be a lot more convenient. How do we define it?

---Dan