[comp.lang.fortran] no noalias not negligible

chris@mimsy.UUCP (Chris Torek) (05/21/88)

In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>The results for double precision linpack on Sun-4 using SunOS 4.0 and
>Fortran 1.1 were [edited to just `rolled' case]:
>Fortran	1080 KFLOPS
>C		 850 KFLOPS

>      subroutine daxpy(n,da,dx,dy)
>      doubleprecision dx(1),dy(1),da
>      integer i,n

Incidentally, as we have just seen in comp.arch, this Fortran version
is illegal: it should declare dx and dy as

	integer n
	double precision dx(n), dy(n)

>      do 30 i = 1,n
>        dy(i) = dy(i) + da*dx(i)
> 30   continue
>      return
>      end

>The corresponding rolled C code could be written with a for loop
>daxpy(n, da, dx, dy)
>        double          dx[], dy[], da;
>        int             n;
>{
>        int             i;
>
>        for (i = 0; i < n; i++) {
>                dy[i] = dy[i] + da * dx[i];

I suggest

		dy[i] += da * dx[i];

as it is easier to understand.  (In a reasonably good optimising C
compiler it should produce the same code.)

>        }
>}
> 
>but [the] Sun compilers ... won't unroll [these] loops.... [Hand unrolling
>helped but not as much as expected.]

>Investigation revealed that the reason had to do with noalias:  [the]
>Fortran [version is] defined by the Fortran standard to be "noalias",
>meaning a compiler may optimize code based on the assumption that [dy
>and dx are distinct].

[X3J11's `noalias' proposal was deleted for various reasons including]
>3) optimizing compilers should be able to figure out if aliasing
>exists, which is definitely false in a separate compilation environment
>(unless you want the linker to recompile everything, in which case the
>linker is the compiler, and you're back to no separate compilation).

This is not quite right: The linker is to be the *code generator*, not
the *compiler*.  Code generation is a (relatively) small subset of the
task of compilation.  Naturally, a code-generating linker will take
longer to run than a simple linking linker, which discourages this
somewhat.  The usual solution is to generate code in the compiler
proper only when it is told not to optimise.

>Anyway there is no portable way in draft ANSI C to say "these pointers
>are guaranteed to have no aliases".
>... you don't dare load dx[i+1] before you store dy[i] if there is
>any danger that they point to the same place.

True.
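
For instance (a contrived test of my own; the names are made up): if the
caller hands the quoted daxpy() overlapping arrays, dy[i] and dx[i+1]
name the same storage, and a loop that preloads dx[i+1] before storing
dy[i] reads a stale value:

	static double v[5] = { 1, 1, 1, 1, 1 };

	void
	overlap_test(void)
	{
		daxpy(4, 2.0, v, v + 1);	/* dx = v, dy = v + 1: they overlap */
		/*
		 * In-order evaluation leaves   v[] = { 1, 3, 7, 15, 31 };
		 * preloading dx[i+1] before storing dy[i] would leave
		 *                              v[] = { 1, 3, 3, 3, 3 }.
		 */
	}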

>What is to be done?

Ignore it.  (Unsatisfactory.)

Provide code-generating linkers.  (Good idea but hard to do.)

Provide `unsafe' optimisation levels.  (Generally a bad idea, but
easier than code generation at link time, and typically produces faster
compile times.)

Provide `#pragma's.  Some people claim that a pragma is not allowed to
declare such semantics as volatility or lack of aliasing; I disagree.
Short of the code-generating linker, with aliasing and register
allocation computed at `link' time, this seems to me the best solution.

	/*
	 * Double precision `ax + y' (d a*x plus y => daxpy).
	 */
	void
	daxpy(n, a, x, y)
		register int n;
		register double a, *x, *y;
	#pragma notaliased(x, y)
	/* or #pragma separate, or #pragma distinct, or... */
	{

		while (--n >= 0)
			*y++ += a * *x++;
	}

to write it in C-idiom-ese.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (05/21/88)

One I forgot: inline expansion.  The latest GCC compilers have inline
functions, and `inline' is easy to add to a C compiler (although the
resulting langauge is not C).  daxpy is certainly short enough to write
in line:

	inline void
	daxpy(int n, double a, double *x, double *y) {
		int i;
		for (i = 0; i < n; i++) y[i] += a * x[i];
	}

or (as short as I can make it :-) )

	inline void daxpy(int n, double a, double *x, double *y)
		{ while (--n >= 0) *y++ += a * *x++; }

To stick to C proper, although it introduces possible name collisions
and makes daxpy() a statement rather than an expression,

	#define daxpy(n, a, x, y) do { \
		register int daxpy_n = n; \
		register double daxpy_a = a, *daxpy_x = x, *daxpy_y = y; \
		while (--daxpy_n >= 0) *daxpy_y++ += daxpy_a * *daxpy_x++; \
	} while (0)

The resulting expansion (either from the cleaner but nonstandard
`inline void' versions, or the macro version) will likely not only run
faster (no subroutine call overhead) but also be in a place where the
optimiser can see whether there are aliasing problems.
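
For instance, with a hypothetical caller like

	double xv[100], yv[100];

	void
	scale_and_add(double a)
	{
		daxpy(100, a, xv, yv);	/* expands to the loop above */
	}

the expanded loop works on two arrays the optimiser can see are
distinct, so it is free to unroll or software-pipeline.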
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

Paul_L_Schauble@cup.portal.com (05/22/88)

Chris Torek writes...

>The linker is to be the code generator...

Not quite enough, Chris. If you view the conventional compiler as having
three phases, parse, optimize, and generate (where generate includes local
optimization), then you need the "linker" to do the last two. Actually, I
like the idea.

    Paul

tps@chem.ucsd.edu (Tom Stockfisch) (05/25/88)

In article <11609@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>One I forgot: inline expansion.
>daxpy is certainly short enough to write in line:
>
>The resulting expansion (either from the cleaner but nonstandard
>`inline void' versions, or the macro version) will likely not only run
>faster (no subroutine call overhead) but also be in a place where the
>optimiser can see whether there are aliasing problems.
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)

I don't think this will help much.  A routine that calls something as
low-level as daxpy() will probably itself be given pointer arguments
rather than operate directly on external arrays.  Even if it isn't, most
numerical C programming uses malloc() to get arrays and matrices.  So the
compiler still will not be able to determine when pointers are aliases.
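
For example (a made-up caller, but typical of the kind of code I mean):

	void
	solve_step(int n, double w, double *a, double *b)
	{
		/*
		 * Even with daxpy() expanded in line here, a and b are
		 * just pointers that arrived as arguments (or came from
		 * malloc()), so the compiler cannot tell whether the
		 * arrays they point at overlap.
		 */
		daxpy(n, w, a, b);
	}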

Given the awkwardness and limitations of #pragma directives, I think the
best course for vectorizing compiler writers is to experiment with their
own ideas of what noalias (and perhaps some other mechanism as well) should
mean.  People can always port code written using noalias by compiling with
"cc -Dnoalias="
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

ok@quintus.UUCP (Richard A. O'Keefe) (05/25/88)

In article <221@chem.ucsd.EDU>, tps@chem.ucsd.edu (Tom Stockfisch) writes:
> Given the awkwardness and limitations of #pragma directives, I think the
> best course for vectorizing compiler writers is to experiment with their
> own ideas of what noalias (and perhaps some other mechanism as well) should
> mean.  People can always port code written using noalias by compiling with
> "cc -Dnoalias="

#pragma is pretty open-ended, so I don't see why it should be awkward.
Apart from the "no change to the meaning of the program" bit which might
as well be thrown away, #pragma seems to me a much better means than
something like noalias.

Why?  Well, let's take the daxpy() example.  The property the compiler
needs is NOT that dx has no aliases or that dy has no aliases, but that
dx and dy do not overlap.  More generally a declaration
	#pragma disjoint(x, ..., z)
might tell a compiler "assume that anything accessed via 'x' is
distinct from anything accessed via ..., 'z'" and so on.
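
Applied to daxpy(), a sketch (using the directive spelled as above)
might read

	void
	daxpy(int n, double da, double *dx, double *dy)
	{
	#pragma disjoint(dx, dy)	/* dx[] and dy[] never overlap */
		int i;

		for (i = 0; i < n; i++)
			dy[i] += da * dx[i];
	}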

I might well have a program with four pointers
	p, q, r, s
where	#pragma disjoint(p, q)	/* or _disjoint (p,q), (r,s); */
	#pragma disjoint(r, s)	/* or whatever */
but p and r might be potentially aliased and so might q and s.
That shouldn't inhibit optimisation in regions where only p and q
appear or where only r and s appear.

This is not a proposal for a replacement for 'noalias'.  I merely point
out that since aliasing is a property of _sets_ of variables, trying to
hack it with declarations about _single_ variables doesn't seem like a
good approach.