tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel) (12/10/90)
Many optimizations that are possible because the compiler can assume that two different names don't refer to the same location can be expressed portably by introducing explicit temporaries. Vectorization and parallelization require a declaration (or a separate looping construct) that tells the compiler that there are no (or only "harmless") sequential dependencies in a loop body.

What I would like to ask: ignoring programming style, can you think of any optimizations that a FORTRAN compiler can carry out that cannot be expressed portably using temporaries and one or two compiler directives declaring the absence of sequential dependencies in a loop body?
david@cs.washington.edu (David Callahan) (12/11/90)
In article <TMB.90Dec10035836@bambleweenie57.ai.mit.edu> tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel) writes:
>Many optimizations that are possible because the compiler can assume
>that two different names don't refer to the same location can be
>expressed portably by introducing explicit temporaries. ...
>What I would like to ask: ignoring programming style, can you think of
>any optimizations that a FORTRAN compiler can carry out that cannot be
>expressed portably using temporaries and one or two compiler
>directives declaring absence of sequential dependencies in a loop
>body?

I'm not entirely sure what you mean by ``sequential dependencies'', but I would note that restructuring compilers are interested not only in the absence of loop-carried dependences (which inhibit simple conversion of a loop to parallel form) but in how the dependences which do exist can be used. Consider the loop:

	do i = 1,n
	do j = 1,m
	f(i,j) = f(i,j) + a(i,j)*f(i-1,j) + b(i,j)*f(i-2,j)

a simple second-order recurrence which has been vectorized in one dimension. The output I would hope for from the current generation of research restructurers depends on the target machine. For a non-vector machine, something like:

	do all j = 1,m
	f2 = f(i-2,j)
	f1 = f(i-1,j)
	do i = 1,n
	f0 = f(i,j)
	f(i,j) = f0 + a(i,j)*f1 + b(i,j)*f2
	f2 = f1 ; f1 = f0

where the assignments will later be eliminated by unrolling and subsumption. On a vector machine with registers of length L, the j loop is blocked and f0, f1, f2 become vector temps. In addition to the parallelism, the loop now has 1/3 fewer memory operations.

In this example, the critical information is that there is no alias between f and any other variable, not that the j loop carries no dependences.
--
David Callahan  (david@tera.com, david@june.cs.washington.edu, david@rice.edu)
Tera Computer Co.	400 North 34th Street	Seattle WA, 98103
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/11/90)
In article <14065@june.cs.washington.edu> david@june.cs.washington.edu (David Callahan) writes:
> I'm not entirely sure what you mean by ``sequential dependencies''

That's exactly the problem I've been wrestling with. Wtf is a sequential dependency? Does it mean that the result of the loop is changed if the index goes backwards or in a random order? (This isn't good enough.) Does it mean that no memory location accessed during the loop is written during the loop? (This may be too restrictive, and it may not be good enough.) What does it mean?

If the programmer writes

	forall(i = 0;i < 100;i++) x[i] = y[i];

then the compiler had better be able to conclude that x and y don't overlap. That's what we want out of the ``sequential dependency'' definition. But how can we achieve it?

Your ball.

---Dan
tmb@ai.mit.edu (Thomas M. Breuel) (12/11/90)
In article <12873:Dec1021:09:2890@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
|> In article <14065@june.cs.washington.edu> david@june.cs.washington.edu (David Callahan) writes:
|> > I'm not entirely sure what you mean by ``sequential dependencies''
|>
|> That's exactly the problem I've been wrestling with. Wtf is a sequential
|> dependency? Does it mean that the result of the loop is changed if the
|> index goes backwards or in a random order? (This isn't good enough.)
|> Does it mean that no memory location accessed during the loop is written
|> during the loop? (This may be too restrictive, and it may not be good
|> enough.) What does it mean?

There are several useful abstractions:

* All expressions inside the loop body are evaluated in parallel (in particular, assignments). If multiple assignments to the same location occur, the effect of the loop is undefined. This is useful for SIMD machines ("parallel model").

* If the effect of a loop depends in any way on the order in which the loop index assumes its values, the effect is undefined (i.e., it need not even correspond to any order in which the index could assume its values). This seems to be a more restrictive version of the PARALLEL_DO loop ("random model"), and it is useful as a least common denominator between vectorizing and parallel machines.

* The loop index assumes its values sequentially, but the compiler is permitted to delay, arbitrarily, any assignment to a location that is not explicitly declared "register". This is a useful abstraction for vectorizing machines ("vector model").

(In C, register declarations do not force the compiler to put variables into a register, but they do ensure that the location can never be aliased. The messiness of the "vector model" is simply a result of the fact that vectorized computation itself is conceptually much messier than SIMD parallelism, since it allows for some sequential dependencies.)
|> If the programmer writes
|>
|>	forall(i = 0;i < 100;i++) x[i] = y[i];
|>
|> then the compiler had better be able to conclude that x and y don't
|> overlap.

No, they may overlap. In fact, if x and y point to the same location, this loop is correct under any of the three looping constructs. If x < y, then the loop is correct under the vector model. And the loop is correct for any aliasing under the parallel model.

|> That's what we want out of the ``sequential dependency''
|> definition. But how can we achieve it?
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/11/90)
In article <12337@life.ai.mit.edu> tmb@ai.mit.edu writes:
> In article <12873:Dec1021:09:2890@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
  [ how to define forall? ]
> There are several useful abstractions:
  [ some possibilities ]

But you aren't addressing the real problem. See below.

> |> If the programmer writes
> |>	forall(i = 0;i < 100;i++) x[i] = y[i];
> |> then the compiler had better be able to conclude that x and y don't
> |> overlap.
> No, they may overlap.

No! There are vector and parallel machines where the compiler must generate vastly different code when x and y overlap and when they don't. Parallel code when x = y is simply *wrong*. It will generate garbage results. Even if x and y are both filled with the same values, the above loop may produce new values if the arrays overlap and it is run in parallel!

So any definition along the lines of ``the results must not depend on the order of execution'' is *not enough*. Two parallel writes may produce different results from any serial execution of the writes.

*This* is the optimization problem we have to solve. An assertion of non-aliasing solves the problem, but forall would be a lot more convenient. How do we define it?

---Dan