chris@mimsy.UUCP (Chris Torek) (05/21/88)
In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes:
>The results for double precision linpack on Sun-4 using SunOS 4.0 and
>Fortran 1.1 were [edited to just `rolled' case]:
>
>	Fortran		1080 KFLOPS
>	C		 850 KFLOPS
>
>	subroutine daxpy(n,da,dx,dy)
>	doubleprecision dx(1),dy(1),da
>	integer i,n

Incidentally, as we have just seen in comp.arch, this Fortran version
is illegal: it should declare dx and dy as

	integer n
	double precision dx(n), dy(n)

>	do 30 i = 1,n
>	  dy(i) = dy(i) + da*dx(i)
>  30	continue
>	return
>	end

>The corresponding rolled C code could be written with a for loop

>daxpy(n, da, dx, dy)
>	double dx[], dy[], da;
>	int n;
>{
>	int i;
>
>	for (i = 0; i < n; i++) {
>		dy[i] = dy[i] + da * dx[i];

I suggest

		dy[i] += da * dx[i];

as it is easier to understand.  (In a reasonably optimal C compiler it
should produce the same code.)

>	}
>}
>
>but [the] Sun compilers ... won't unroll [these] loops....  [Hand
>unrolling helped but not as much as expected.]

>Investigation revealed that the reason had to do with noalias: [the
>Fortran version is] defined by the Fortran standard to be "noalias",
>meaning a compiler may optimize code based on the assumption that [dy
>and dx are distinct].

[X3J11's `noalias' proposal was deleted for various reasons, including]

>3) optimizing compilers should be able to figure out if aliasing
>exists, which is definitely false in a separate compilation
>environment (unless you want the linker to recompile everything, in
>which case the linker is the compiler, and you're back to no separate
>compilation).

This is not quite right: the linker is to be the *code generator*, not
the *compiler*.  Code generation is a (relatively) small subset of the
task of compilation.  Naturally, a code-generating linker will take
longer to run than a simple linking linker, which discourages this
somewhat.  The usual solution is to generate code in the compiler
proper only when it is told not to optimise.

>Anyway there is no portable way in draft ANSI C to say "these pointers
>are guaranteed to have no aliases".

>... you don't dare load dx[i+1] before you store dy[i] if there is
>any danger that they point to the same place.

True.

>What is to be done?

Ignore it.  (Unsatisfactory.)

Provide code-generating linkers.  (A good idea, but hard to do.)

Provide `unsafe' optimisation levels.  (Generally a bad idea, but
easier than code generation at link time, and it typically gives
faster compile times.)

Provide `#pragma's.  Some people claim that a pragma is not allowed to
declare such semantics as volatility or lack of aliasing; I disagree.
Short of the code-generating linker, with aliasing and register
allocation computed at `link' time, this seems to me the best
solution:

	/*
	 * Double precision `ax + y' (d a*x plus y => daxpy).
	 */
	void
	daxpy(n, a, x, y)
		register int n;
		register double a, *x, *y;
	#pragma notaliased(x, y)	/* or #pragma separate, or
					   #pragma distinct, or ... */
	{
		while (--n >= 0)
			*y++ += a * *x++;
	}

to write it in C-idiom-ese.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
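[For concreteness, a minimal sketch of the hand unrolling mentioned
above; this is illustrative, not code from either posting.  Note that
without alias information the compiler must still store y[i] before
it dares load x[i+1], so unrolling saves loop overhead but little
else, which matches the "helped but not as much as expected" result.]

	/* daxpy, unrolled by hand four ways */
	void
	daxpy(n, a, x, y)
		register int n;
		register double a, *x, *y;
	{
		while (n >= 4) {
			y[0] += a * x[0];
			y[1] += a * x[1];
			y[2] += a * x[2];
			y[3] += a * x[3];
			x += 4;
			y += 4;
			n -= 4;
		}
		while (--n >= 0)	/* 0 to 3 leftover elements */
			*y++ += a * *x++;
	}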
chris@mimsy.UUCP (Chris Torek) (05/21/88)
One I forgot: inline expansion.  The latest GCC compilers have inline
functions, and `inline' is easy to add to a C compiler (although the
resulting language is not C).  daxpy is certainly short enough to
write in line:

	inline void
	daxpy(int n, double a, double *x, double *y)
	{
		int i;

		for (i = 0; i < n; i++)
			y[i] += a * x[i];
	}

or (as short as I can make it :-) )

	inline void
	daxpy(int n, double a, double *x, double *y)
	{
		while (--n >= 0)
			*y++ += a * *x++;
	}

To stick to C proper, although it introduces possible name collisions,
and makes daxpy() not an expression:

	#define daxpy(n, a, x, y) do { \
		register int daxpy_n = (n); \
		register double daxpy_a = (a), *daxpy_x = (x), *daxpy_y = (y); \
		while (--daxpy_n >= 0) \
			*daxpy_y++ += daxpy_a * *daxpy_x++; \
	} while (0)

The resulting expansion (either from the cleaner but nonstandard
`inline void' versions, or the macro version) will likely not only run
faster (no subroutine call overhead) but also be in a place where the
optimiser can see whether there are aliasing problems.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
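[A usage sketch for the macro version; the caller and array names are
invented for illustration.  Because the expansion happens in place,
the optimiser can see that u and v are distinct declared arrays, so
no aliasing is possible and the loop may be unrolled or pipelined
freely.]

	double u[100], v[100];

	void
	scale_and_add(a)
		double a;
	{
		daxpy(100, a, u, v);	/* expands in place to
					   v[i] += a * u[i] */
	}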
Paul_L_Schauble@cup.portal.com (05/22/88)
Chris Torek writes...
>The linker is to be the code generator...
Not quite enough, Chris.  If you view the conventional compiler as
having three phases, parse, optimize, and generate (where `generate'
includes local optimization), then you need the "linker" to do the
last two, not just the last one.  Actually, I like the idea.
Paul
tps@chem.ucsd.edu (Tom Stockfisch) (05/25/88)
In article <11609@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>One I forgot: inline expansion.
>daxpy is certainly short enough to write in line:
>
>The resulting expansion (either from the cleaner but nonstandard
>`inline void' versions, or the macro version) will likely not only run
>faster (no subroutine call overhead) but also be in a place where the
>optimiser can see whether there are aliasing problems.

I don't think this will help much.  A routine that calls something as
low-level as daxpy() will probably be given pointer arguments itself,
rather than operate directly on external arrays.  Even if it isn't,
most numerical C programming uses malloc() to get arrays and matrices,
so the compiler still will not be able to determine when pointers are
aliases.

Given the awkwardness and limitations of #pragma directives, I think
the best course for vectorizing compiler writers is to experiment with
their own ideas of what noalias (and perhaps some other mechanism as
well) should mean.  People can always port code written using noalias
by compiling with "cc -Dnoalias=".
-- 
|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu
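[To make the porting trick concrete, a sketch assuming a vendor cc
that accepts `noalias' as a type qualifier; the keyword's spelling and
placement here are hypothetical.  On any other compiler the keyword is
simply defined away on the command line.]

	/* With a vectorizing cc:  cc -O daxpy.c
	 * Anywhere else:          cc -O -Dnoalias= daxpy.c
	 */
	void
	daxpy(n, a, x, y)
		int n;
		double a;
		double *noalias x, *noalias y;
	{
		while (--n >= 0)
			*y++ += a * *x++;
	}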
ok@quintus.UUCP (Richard A. O'Keefe) (05/25/88)
In article <221@chem.ucsd.EDU>, tps@chem.ucsd.edu (Tom Stockfisch) writes:
> Given the awkwardness and limitations of #pragma directives, I think the
> best course for vectorizing compiler writers is to experiment with their
> own ideas of what noalias (and perhaps some other mechanism as well) should
> mean.  People can always port code written using noalias by compiling with
> "cc -Dnoalias="

#pragma is pretty open-ended, so I don't see why it should be awkward.
Apart from the "no change to the meaning of the program" bit, which
might as well be thrown away, #pragma seems to me a much better means
than something like noalias.  Why?  Well, let's take the daxpy()
example.  The property the compiler needs is NOT that dx has no
aliases or that dy has no aliases, but that dx and dy do not overlap.
More generally, a declaration

	#pragma disjoint(x, ..., z)

might tell a compiler "assume that anything accessed via `x' is
distinct from anything accessed via ..., `z'", and so on.  I might
well have a program with four pointers p, q, r, s where

	#pragma disjoint(p, q)	/* or _disjoint (p,q), (r,s); */
	#pragma disjoint(r, s)	/* or whatever */

but p and r might be potentially aliased, and so might q and s.  That
shouldn't inhibit optimisation in regions where only p and q appear,
or where only r and s appear.

This is not a proposal for a replacement for `noalias'.  I merely
point out that since aliasing is a property of _sets_ of variables,
trying to hack it with declarations about _single_ variables does not
seem like a good approach.
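[A sketch of the set-wise property described above, using the same
four pointers; the pragma spelling is invented, and no particular
compiler is claimed to accept it.]

	double *p, *q;		/* p and q never overlap */
	double *r, *s;		/* r and s never overlap */
	#pragma disjoint(p, q)
	#pragma disjoint(r, s)

	void
	f(n, a)
		int n;
		double a;
	{
		int i;

		for (i = 0; i < n; i++)
			q[i] += a * p[i];	/* free to vectorize */
		for (i = 0; i < n; i++)
			s[i] += a * r[i];	/* free to vectorize */
		/*
		 * But q may alias s (and p may alias r), so a loop
		 * that writes s while reading q must stay scalar.
		 */
	}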