[comp.lang.c] no noalias not negligible

chris@mimsy.UUCP (Chris Torek) (05/21/88)

In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>The results for double precision linpack on Sun-4 using SunOS 4.0 and
>Fortran 1.1 were [edited to just `rolled' case]:
>Fortran	1080 KFLOPS
>C		 850 KFLOPS

>      subroutine daxpy(n,da,dx,dy)
>      doubleprecision dx(1),dy(1),da
>      integer i,n

Incidentally, as we have just seen in comp.arch, this Fortran version
is illegal: it should declare dx and dy as

	integer n
	double precision dx(n), dy(n)

>      do 30 i = 1,n
>        dy(i) = dy(i) + da*dx(i)
> 30   continue
>      return
>      end

>The corresponding rolled C code could be written with a for loop
>daxpy(n, da, dx, dy)
>        double          dx[], dy[], da;
>        int             n;
>{
>        int             i;
>
>        for (i = 0; i < n; i++) {
>                dy[i] = dy[i] + da * dx[i];

I suggest

		dy[i] += da * dx[i];

as it is easier to understand.  (In a reasonably optimal C compiler
it should produce the same code.)

>        }
>}
> 
>but [the] Sun compilers ... won't unroll [these] loops.... [Hand unrolling
>helped but not as much as expected.]

>Investigation revealed that the reason had to do with noalias:  [the
>Fortran [version is] defined by the Fortran standard to be "noalias",
>meaning a compiler may optimize code based on the assumption that [dy
>and dx are distinct].

[X3J11's `noalias' proposal was deleted for various reasons including]
>3) optimizing compilers should be able to figure out if aliasing
>exists, which is definitely false in a separate compilation environment
>(unless you want the linker to recompile everything, in which case the
>linker is the compiler, and you're back to no separate compilation).

This is not quite right: The linker is to be the *code generator*, not
the *compiler*.  Code generation is a (relatively) small subset of the
task of compilation.  Naturally, a code-generating linker will take
longer to run than a simple linking linker, which discourages this
somewhat.  The usual solution is to generate code in the compiler
proper only when it is told not to optimise.

>Anyway there is no portable way in draft ANSI C to say "this pointers
>are guaranteed to have no aliases".
>... you don't dare load dx[i+1] before you store dy[i] if there is
>any danger that they point to the same place.

True.

>What is to be done?

Ignore it.  (Unsatisfactory.)

Provide code-generating linkers.  (Good idea but hard to do.)

Provide `unsafe' optimisation levels.  (Generally a bad idea, but
easier than code generation at link time, and typically produces faster
compile times.)

Provide `#pragma's.  Some people claim that a pragma is not allowed to
declare such semantics as volatility or lack of aliasing; I disagree.
Short of the code-generating linker, with aliasing and register
allocation computed at `link' time, this seems to me the best solution.

	/*
	 * Double precision `ax + y' (d a*x plus y => daxpy).
	 */
	void
	daxpy(n, a, x, y)
		register int n;
		register double a, *x, *y;
	#pragma notaliased(x, y)
	/* or #pragma separate, or #pragma distinct, or... */
	{

		while (--n >= 0)
			*y++ += a * *x++;
	}

to write it in C-idiom-ese.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (05/21/88)

One I forgot: inline expansion.  The latest GCC compilers have inline
functions, and `inline' is easy to add to a C compiler (although the
resulting langauge is not C).  daxpy is certainly short enough to write
in line:

	inline void
	daxpy(int n, double a, double *x, double *y) {
		int i;
		for (i = 0; i < n; i++) y[i] += a * x[i];
	}

or (as short as I can make it :-) )

	inline void daxpy(int n, double a, double *x, double *y)
		{ while (--n >= 0) *y++ += a * *x++; }

To stick to C proper, although it introduces possible name collisions,
and makes daxpy() not an expression,

	#define daxpy(n, a, x, y) do { \
		register int daxpy_n = n; \
		register double daxpy_a = a, *daxpy_x = x, *daxpy_y = y; \
		while (--daxpy_n >= 0) *daxpy_y++ += a * *daxpy_x++; \
	} while (0)

The resulting expansion (either from the cleaner but nonstandard
`inline void' versions, or the macro version) will likely not only run
faster (no subroutine call overhead) but also be in a place where the
optimiser can see whether there are aliasing problems.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/21/88)

In article <11608@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>Provide `#pragma's.  Some people claim that a pragma is not allowed to
>declare such semantics as volatility or lack of aliasing; I disagree.

Although you can find X3J11 committee members on both sides of this
debate, apparently all the editors of official documents are in
agreement that the proposed Standard does not permit (and certainly
is not intended to permit) #pragma to allow violation of any portion
of the rest of the Standard specifications.  Since the rest of the
proposed Standard requires honoring of aliasing, #pragma cannot be
used to change this.

I used to think otherwise but have been convinced of this.  I agree
that it is not obvious.

dmr@alice.UUCP (05/22/88)

Since the noalias issue has surfaced again, I offer a few thoughts on the
issue.

The comments of Hough, and the views of all the people on X3J11 who
wanted some way to express the "noalias" concept, are worth paying
attention to.  When a function with two pointer arguments is called,
the pointers are allowed to point to overlapping things, and this
inhibits otherwise plausible optimizations (vectorization especially)
in the function.  Other possibilities for pointers to overlapping
things occur, like externals vs. parameters, but in practice
Hough's example represents the problem that purveyors of vector
machines worry most about.

The problem with the noalias proposal, as embodied in the January
draft, was not at all that it attacked a nonexistent problem, but
that it did far too much.  If it had merely provided a way of
saying, "I promise that this pointer can be trusted not to point to
data accessed by another path," and if the scope of the promise
were limited reasonably, it would have been accepted without
any real quarrel; there might have been mutterings from mossbacks
like me about balancing the burden of language complexity against
the benefit, but no outcry.

Instead, the concept was illegitimately bound to the notion
of data type, and was made very dangerous; the rules as constituted
would have forced programmers into promises they didn't understand
and couldn't keep.  Some of the examples are in the screed I
posted some months ago.

I don't think there are easy answers to the problem.  As it
stands, C is hard to vectorize because of aliasing.  Fortran
is easier because of some global rules: for example, two parameters,
or a parameter and a COMMON, are not allowed to be aliased.
X3J11 was understandably unwilling to introduce Fortran-style
rules because it would represent a subtle and dangerous change
in the interpretation of existing programs.  Indeed, the Fortran
rules are subtle and to some extent dangerous even for Fortran.
Many years ago I was involved in a large system (Altran)
written in Fortran, and its most stubborn bugs owed to unsuspecting
violation of Fortran's aliasing (rather, noaliasing) rules.

I think there is little doubt that the best solution for C is to
use a #pragma, and that it would have been best for X3J11 to suggest one.
Because I thought it was absolutely essential to get rid of the
January version of noalias, and no variations on it worked any better,
I made a calculated decision not to propose an alternative; no idea
seemed attractive enough to avoid further controversy and consequent
distraction from the problems with the draft's version.  Gwyn,
by the way, seems to be correct in observing that the rules for #pragma,
as written, prohibit using it to make promises about aliasing.
Thus making a formal #pragma proposal would have opened up a wrangle about
#pragmas in general.  There was not enough time to do the job properly.
If all this had happened two years ago, something could have been worked
out, but it was too late for this standard.

	Dennis Ritchie
	research!dmr
	dmr@research.att.com

Paul_L_Schauble@cup.portal.com (05/22/88)

Chris Torek writes...

>The linker is to be the code generator...

Not quite enough, Chris. If you view the conventional compiler as having
three phases: parse, optimize, generate (generate includes local optimization),
you need the "linker" to do the last two. Actually, I like the idea.

    Paul

henry@utzoo.uucp (Henry Spencer) (05/23/88)

> ... all the editors of official documents are in
> agreement that the proposed Standard does not permit (and certainly
> is not intended to permit) #pragma to allow violation of any portion
> of the rest of the Standard specifications...

This is, of course, yet another example of why most ANSI C compilers
will have an option setting with two values:  "strict conformance" and
"do it right".  The fact is that #pragma is the obvious way to declare
such additional information about programs, and that's what most compiler
writers will use.  The alternative of having "__magic_cookie" identifiers
is much less satisfactory; with #pragma there is a reasonable chance that
the code remains portable (although its efficiency may not).  Compilers
are required to ignore unrecognized #pragmas (although I for one think
a warning message is in order...), but "__magic_cookie" is a different
and messier situation.

Also, after some study of the 2nd-public-comment draft, I fear I can't
support the editors' view.  Depending on how one interprets the meaning
of various words, one *can* come up with that conclusion -- but it is
*NOT* the "interpretation of least astonishment".  The description of
#pragma just says "causes the implementation to behave in an implemen-
tation-defined manner".  While this arguably doesn't allow alteration of
the semantics of other parts of the program, it seems to me that it does
allow the #pragma itself to turn into "abort();" or "printf("foo\n");" or
"#include <somethingfunny>".  This being the case, no strictly-conforming
program can contain a #pragma, since this would make its output depend on
implementation-defined characteristics.  THAT being the case, strange
semantics for #pragma are an extension which does not alter the behavior
of strictly-conforming programs -- and this sort of extension is explicitly
legal for conforming compilers.

I can find no hole in the above argument without violating the Law of Least
Astonishment... which is the principle that persons other than X3J11 members
will necessarily rely on to interpret the standard.

If X3J11 wishes #pragma to be constrained not to alter the semantics of
the rest of the language, it is going to have to make this explicit.  Any
such effort should recognize that this restriction will definitely be in
the running for "clause most frequently ignored for good reason".
-- 
NASA is to spaceflight as            |  Henry Spencer @ U of Toronto Zoology
the Post Office is to mail.          | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

karl@haddock.ISC.COM (Karl Heuer) (05/24/88)

In article <1988May21.030207.25063@light.uucp> bvs@light.UUCP (Bakul Shah) writes:
>In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>>Anyway there is no portable way in draft ANSI C to say "this pointers are
>>guaranteed to have no aliases".
>
>How about adding a test before the for loop?  Something like:
>#define overlap(x,y,n)    (!(x + n <= y || y + n <= x))
>	if (overlap(dx, dy, n))
>		return complain("overlapping arrays\n");
>
>Now a smart compiler can figure out that dx, dy don't overlap ...

The information is there, and a human reader can prove it, but I don't think
they make compilers that smart yet.

>note that in a sense this is an explicit translation of fortran's [rule]

Actually, a better translation might be
	assert(!overlap(dx, dy, n));
(though we'd need a slight change to the definition of assert to make this
useful for optimization).

>extern int daxpy(int n, double da, double dx[], double dy[])
>		/* call */if (!overlap(dx, dy, n));
>The syntax is unambiguous...

But it differs from currently-legal syntax only by the absence of a semicolon
separating the lines.  For that reason, i'd prefer something else.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

dik@cwi.nl (Dik T. Winter) (05/24/88)

In article <1988May21.030207.25063@light.uucp> bvs@light.UUCP (Bakul Shah) writes:
 >In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
 >>Anyway there is no portable way in draft ANSI C to say "this pointers are
 >>guaranteed to have no aliases".
 >
 >How about adding a test before the for loop?  Something like:
 >#define overlap(x,y,n)    (!(x + n <= y || y + n <= x))
 >	if (overlap(dx, dy, n))
 >		return complain("overlapping arrays\n");
 >
Note however that in many cases it is much more difficult to ascertain
overlap (i.e. 2 columns of matrices).  And there are cases where it
is just as difficult as the operations themselves.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

tps@chem.ucsd.edu (Tom Stockfisch) (05/25/88)

In article <11609@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>One I forgot: inline expansion.
>daxpy is certainly short enough to write in line:
>
>The resulting expansion (either from the cleaner but nonstandard
>`inline void' versions, or the macro version) will likely not only run
>faster (no subroutine call overhead) but also be in a place where the
>optimiser can see whether there are aliasing problems.
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)

I don't think this will help much.  A routine which calls something as low-level
as daxpy() probably will be given pointer arguments itself rather than be
operating directly on external arrays.  Even if it isn't, most numerical
C programming uses malloc() to get arrays and matrices.  So the compiler still
will not be able to determine when pointers are aliases.

Given the awkwardness and limitations of #pragma directives, I think the
best course for vectorizing compiler writers is to experiment with their
own ideas of what noalias (and perhaps some other mechanism as well) should
mean.  People can always port code written using noalias by compiling with
"cc -Dnoalias="
-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

faustus@ic.Berkeley.EDU (Wayne A. Christopher) (05/25/88)

In article <4152@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes:
> >How about adding a test before the for loop?  Something like:
> >#define overlap(x,y,n)    (!(x + n <= y || y + n <= x))
> >	if (overlap(dx, dy, n))
> >		return complain("overlapping arrays\n");
> >
> >Now a smart compiler can figure out that dx, dy don't overlap ...
> 
> The information is there, and a human reader can prove it, but I don't think
> they make compilers that smart yet.

The compler doesn't have to see it.  Probably a better way of saying it is
that the compiler will translate

	<conservative, un-optimized code>

into

	if ( <conditions like noalias hold> ) {
		<highly-optimized, vectorized code>
	} else {
		<conservative, un-optimized code>
	}

Are there any constructs that couldn't be figured out at run-time without a
large penalty?  Maybe if you were re-directing through an array of pointers...

	Wayne

ok@quintus.UUCP (Richard A. O'Keefe) (05/25/88)

In article <221@chem.ucsd.EDU>, tps@chem.ucsd.edu (Tom Stockfisch) writes:
> Given the awkwardness and limitations of #pragma directives, I think the
> best course for vectorizing compiler writers is to experiment with their
> own ideas of what noalias (and perhaps some other mechanism as well) should
> mean.  People can always port code written using noalias by compiling with
> "cc -Dnoalias="

#pragma is pretty open-ended, so I don't see why it should be awkward.
Apart from the "no change to the meaning of the program" bit which might
as well be thrown away, #pragma seems to me a much better means than
something like noalias.

Why?  Well, let's take the daxpy() example.  The property the compiler
needs is NOT that dx has no aliases or that dy has no aliases, but that
dx and dy do not overlap.  More generally a declaration
	#pragma disjoint(x, ..., z)
might tell a compiler "assume that anything accessed via 'x' is
distinct from anything accessed via ..., 'z'" and so on.

I might well have a program with four pointers
	p, q, r, s
where	#pragma disjoint(p, q)	/* or _disjoint (p,q), (r,s); */
	#pragma disjoint(r, s)	/* or whatever */
but p and r might be potentially aliased and so might q and s.
That shouldn't inhibit optimisation in regions where only p and q
appear or where only r and s appear.

This is not a proposal for a replacement for 'noalias'.  I merely point
out that since aliasing is a property of _sets_ of variables, trying to
hack it with declarations about _single_ variables doesn't seem like a
good approach.

turner@sdti.UUCP (Prescott K. Turner) (05/27/88)

In article <1988May21.030207.25063@light.uucp> bvs@light.UUCP (Bakul Shah) writes:
>In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>>Anyway there is no portable way in draft ANSI C to say "this pointers are
>>guaranteed to have no aliases".
>
>How about adding a test before the for loop?  Something like:
>#define overlap(x,y,n)    (!(x + n <= y || y + n <= x))
>       if (overlap(dx, dy, n))
>               return complain("overlapping arrays\n");
>
>Now a smart compiler can figure out that dx, dy don't overlap ...

This test make non-standard assumptions.  In draft ANSI C,
those pointer comparisons are undefined (meaning your program might
crash and burn) unless "x" and "y" point within the same object.
Anyone know a machine where this could be experienced?
--
Prescott K. Turner, Jr.
Software Development Technologies, Inc.
375 Dutton Rd., Sudbury, MA 01776 USA        (617) 443-5779
UUCP:genrad!mrst!sdti!turner