chris@mimsy.UUCP (Chris Torek) (05/21/88)
In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes: >The results for double precision linpack on Sun-4 using SunOS 4.0 and >Fortran 1.1 were [edited to just `rolled' case]: >Fortran 1080 KFLOPS >C 850 KFLOPS > subroutine daxpy(n,da,dx,dy) > doubleprecision dx(1),dy(1),da > integer i,n Incidentally, as we have just seen in comp.arch, this Fortran version is illegal: it should declare dx and dy as integer n double precision dx(n), dy(n) > do 30 i = 1,n > dy(i) = dy(i) + da*dx(i) > 30 continue > return > end >The corresponding rolled C code could be written with a for loop >daxpy(n, da, dx, dy) > double dx[], dy[], da; > int n; >{ > int i; > > for (i = 0; i < n; i++) { > dy[i] = dy[i] + da * dx[i]; I suggest dy[i] += da * dx[i]; as it is easier to understand. (In a reasonably optimal C compiler it should produce the same code.) > } >} > >but [the] Sun compilers ... won't unroll [these] loops.... [Hand unrolling >helped but not as much as expected.] >Investigation revealed that the reason had to do with noalias: [the >Fortran [version is] defined by the Fortran standard to be "noalias", >meaning a compiler may optimize code based on the assumption that [dy >and dx are distinct]. [X3J11's `noalias' proposal was deleted for various reasons including] >3) optimizing compilers should be able to figure out if aliasing >exists, which is definitely false in a separate compilation environment >(unless you want the linker to recompile everything, in which case the >linker is the compiler, and you're back to no separate compilation). This is not quite right: The linker is to be the *code generator*, not the *compiler*. Code generation is a (relatively) small subset of the task of compilation. Naturally, a code-generating linker will take longer to run than a simple linking linker, which discourages this somewhat. The usual solution is to generate code in the compiler proper only when it is told not to optimise. >Anyway there is no portable way in draft ANSI C to say "this pointers >are guaranteed to have no aliases". >... you don't dare load dx[i+1] before you store dy[i] if there is >any danger that they point to the same place. True. >What is to be done? Ignore it. (Unsatisfactory.) Provide code-generating linkers. (Good idea but hard to do.) Provide `unsafe' optimisation levels. (Generally a bad idea, but easier than code generation at link time, and typically produces faster compile times.) Provide `#pragma's. Some people claim that a pragma is not allowed to declare such semantics as volatility or lack of aliasing; I disagree. Short of the code-generating linker, with aliasing and register allocation computed at `link' time, this seems to me the best solution. /* * Double precision `ax + y' (d a*x plus y => daxpy). */ void daxpy(n, a, x, y) register int n; register double a, *x, *y; #pragma notaliased(x, y) /* or #pragma separate, or #pragma distinct, or... */ { while (--n >= 0) *y++ += a * *x++; } to write it in C-idiom-ese. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
chris@mimsy.UUCP (Chris Torek) (05/21/88)
One I forgot: inline expansion. The latest GCC compilers have inline functions, and `inline' is easy to add to a C compiler (although the resulting langauge is not C). daxpy is certainly short enough to write in line: inline void daxpy(int n, double a, double *x, double *y) { int i; for (i = 0; i < n; i++) y[i] += a * x[i]; } or (as short as I can make it :-) ) inline void daxpy(int n, double a, double *x, double *y) { while (--n >= 0) *y++ += a * *x++; } To stick to C proper, although it introduces possible name collisions, and makes daxpy() not an expression, #define daxpy(n, a, x, y) do { \ register int daxpy_n = n; \ register double daxpy_a = a, *daxpy_x = x, *daxpy_y = y; \ while (--daxpy_n >= 0) *daxpy_y++ += a * *daxpy_x++; \ } while (0) The resulting expansion (either from the cleaner but nonstandard `inline void' versions, or the macro version) will likely not only run faster (no subroutine call overhead) but also be in a place where the optimiser can see whether there are aliasing problems. -- In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/21/88)
In article <11608@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >Provide `#pragma's. Some people claim that a pragma is not allowed to >declare such semantics as volatility or lack of aliasing; I disagree. Although you can find X3J11 committee members on both sides of this debate, apparently all the editors of official documents are in agreement that the proposed Standard does not permit (and certainly is not intended to permit) #pragma to allow violation of any portion of the rest of the Standard specifications. Since the rest of the proposed Standard requires honoring of aliasing, #pragma cannot be used to change this. I used to think otherwise but have been convinced of this. I agree that it is not obvious.
dmr@alice.UUCP (05/22/88)
Since the noalias issue has surfaced again, I offer a few thoughts on the issue. The comments of Hough, and the views of all the people on X3J11 who wanted some way to express the "noalias" concept, are worth paying attention to. When a function with two pointer arguments is called, the pointers are allowed to point to overlapping things, and this inhibits otherwise plausible optimizations (vectorization especially) in the function. Other possibilities for pointers to overlapping things occur, like externals vs. parameters, but in practice Hough's example represents the problem that purveyors of vector machines worry most about. The problem with the noalias proposal, as embodied in the January draft, was not at all that it attacked a nonexistent problem, but that it did far too much. If it had merely provided a way of saying, "I promise that this pointer can be trusted not to point to data accessed by another path," and if the scope of the promise were limited reasonably, it would have been accepted without any real quarrel; there might have been mutterings from mossbacks like me about balancing the burden of language complexity against the benefit, but no outcry. Instead, the concept was illegitimately bound to the notion of data type, and was made very dangerous; the rules as constituted would have forced programmers into promises they didn't understand and couldn't keep. Some of the examples are in the screed I posted some months ago. I don't think there are easy answers to the problem. As it stands, C is hard to vectorize because of aliasing. Fortran is easier because of some global rules: for example, two parameters, or a parameter and a COMMON, are not allowed to be aliased. X3J11 was understandably unwilling to introduce Fortran-style rules because it would represent a subtle and dangerous change in the interpretation of existing programs. Indeed, the Fortran rules are subtle and to some extent dangerous even for Fortran. Many years ago I was involved in a large system (Altran) written in Fortran, and its most stubborn bugs owed to unsuspecting violation of Fortran's aliasing (rather, noaliasing) rules. I think there is little doubt that the best solution for C is to use a #pragma, and that it would have been best for X3J11 to suggest one. Because I thought it was absolutely essential to get rid of the January version of noalias, and no variations on it worked any better, I made a calculated decision not to propose an alternative; no idea seemed attractive enough to avoid further controversy and consequent distraction from the problems with the draft's version. Gwyn, by the way, seems to be correct in observing that the rules for #pragma, as written, prohibit using it to make promises about aliasing. Thus making a formal #pragma proposal would have opened up a wrangle about #pragmas in general. There was not enough time to do the job properly. If all this had happened two years ago, something could have been worked out, but it was too late for this standard. Dennis Ritchie research!dmr dmr@research.att.com
Paul_L_Schauble@cup.portal.com (05/22/88)
Chris Torek writes...
>The linker is to be the code generator...
Not quite enough, Chris. If you view the conventional compiler as having
three phases: parse, optimize, generate (generate includes local optimization),
you need the "linker" to do the last two. Actually, I like the idea.
Paul
henry@utzoo.uucp (Henry Spencer) (05/23/88)
> ... all the editors of official documents are in > agreement that the proposed Standard does not permit (and certainly > is not intended to permit) #pragma to allow violation of any portion > of the rest of the Standard specifications... This is, of course, yet another example of why most ANSI C compilers will have an option setting with two values: "strict conformance" and "do it right". The fact is that #pragma is the obvious way to declare such additional information about programs, and that's what most compiler writers will use. The alternative of having "__magic_cookie" identifiers is much less satisfactory; with #pragma there is a reasonable chance that the code remains portable (although its efficiency may not). Compilers are required to ignore unrecognized #pragmas (although I for one think a warning message is in order...), but "__magic_cookie" is a different and messier situation. Also, after some study of the 2nd-public-comment draft, I fear I can't support the editors' view. Depending on how one interprets the meaning of various words, one *can* come up with that conclusion -- but it is *NOT* the "interpretation of least astonishment". The description of #pragma just says "causes the implementation to behave in an implemen- tation-defined manner". While this arguably doesn't allow alteration of the semantics of other parts of the program, it seems to me that it does allow the #pragma itself to turn into "abort();" or "printf("foo\n");" or "#include <somethingfunny>". This being the case, no strictly-conforming program can contain a #pragma, since this would make its output depend on implementation-defined characteristics. THAT being the case, strange semantics for #pragma are an extension which does not alter the behavior of strictly-conforming programs -- and this sort of extension is explicitly legal for conforming compilers. I can find no hole in the above argument without violating the Law of Least Astonishment... which is the principle that persons other than X3J11 members will necessarily rely on to interpret the standard. If X3J11 wishes #pragma to be constrained not to alter the semantics of the rest of the language, it is going to have to make this explicit. Any such effort should recognize that this restriction will definitely be in the running for "clause most frequently ignored for good reason". -- NASA is to spaceflight as | Henry Spencer @ U of Toronto Zoology the Post Office is to mail. | {ihnp4,decvax,uunet!mnetor}!utzoo!henry
karl@haddock.ISC.COM (Karl Heuer) (05/24/88)
In article <1988May21.030207.25063@light.uucp> bvs@light.UUCP (Bakul Shah) writes: >In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes: >>Anyway there is no portable way in draft ANSI C to say "this pointers are >>guaranteed to have no aliases". > >How about adding a test before the for loop? Something like: >#define overlap(x,y,n) (!(x + n <= y || y + n <= x)) > if (overlap(dx, dy, n)) > return complain("overlapping arrays\n"); > >Now a smart compiler can figure out that dx, dy don't overlap ... The information is there, and a human reader can prove it, but I don't think they make compilers that smart yet. >note that in a sense this is an explicit translation of fortran's [rule] Actually, a better translation might be assert(!overlap(dx, dy, n)); (though we'd need a slight change to the definition of assert to make this useful for optimization). >extern int daxpy(int n, double da, double dx[], double dy[]) > /* call */if (!overlap(dx, dy, n)); >The syntax is unambiguous... But it differs from currently-legal syntax only by the absence of a semicolon separating the lines. For that reason, i'd prefer something else. Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
dik@cwi.nl (Dik T. Winter) (05/24/88)
In article <1988May21.030207.25063@light.uucp> bvs@light.UUCP (Bakul Shah) writes: >In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes: >>Anyway there is no portable way in draft ANSI C to say "this pointers are >>guaranteed to have no aliases". > >How about adding a test before the for loop? Something like: >#define overlap(x,y,n) (!(x + n <= y || y + n <= x)) > if (overlap(dx, dy, n)) > return complain("overlapping arrays\n"); > Note however that in many cases it is much more difficult to ascertain overlap (i.e. 2 columns of matrices). And there are cases where it is just as difficult as the operations themselves. -- dik t. winter, cwi, amsterdam, nederland INTERNET : dik@cwi.nl BITNET/EARN: dik@mcvax
tps@chem.ucsd.edu (Tom Stockfisch) (05/25/88)
In article <11609@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes: >One I forgot: inline expansion. >daxpy is certainly short enough to write in line: > >The resulting expansion (either from the cleaner but nonstandard >`inline void' versions, or the macro version) will likely not only run >faster (no subroutine call overhead) but also be in a place where the >optimiser can see whether there are aliasing problems. >In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163) I don't think this will help much. A routine which calls something as low-level as daxpy() probably will be given pointer arguments itself rather than be operating directly on external arrays. Even if it isn't, most numerical C programming uses malloc() to get arrays and matrices. So the compiler still will not be able to determine when pointers are aliases. Given the awkwardness and limitations of #pragma directives, I think the best course for vectorizing compiler writers is to experiment with their own ideas of what noalias (and perhaps some other mechanism as well) should mean. People can always port code written using noalias by compiling with "cc -Dnoalias=" -- || Tom Stockfisch, UCSD Chemistry tps@chem.ucsd.edu
faustus@ic.Berkeley.EDU (Wayne A. Christopher) (05/25/88)
In article <4152@haddock.ISC.COM>, karl@haddock.ISC.COM (Karl Heuer) writes: > >How about adding a test before the for loop? Something like: > >#define overlap(x,y,n) (!(x + n <= y || y + n <= x)) > > if (overlap(dx, dy, n)) > > return complain("overlapping arrays\n"); > > > >Now a smart compiler can figure out that dx, dy don't overlap ... > > The information is there, and a human reader can prove it, but I don't think > they make compilers that smart yet. The compler doesn't have to see it. Probably a better way of saying it is that the compiler will translate <conservative, un-optimized code> into if ( <conditions like noalias hold> ) { <highly-optimized, vectorized code> } else { <conservative, un-optimized code> } Are there any constructs that couldn't be figured out at run-time without a large penalty? Maybe if you were re-directing through an array of pointers... Wayne
ok@quintus.UUCP (Richard A. O'Keefe) (05/25/88)
In article <221@chem.ucsd.EDU>, tps@chem.ucsd.edu (Tom Stockfisch) writes: > Given the awkwardness and limitations of #pragma directives, I think the > best course for vectorizing compiler writers is to experiment with their > own ideas of what noalias (and perhaps some other mechanism as well) should > mean. People can always port code written using noalias by compiling with > "cc -Dnoalias=" #pragma is pretty open-ended, so I don't see why it should be awkward. Apart from the "no change to the meaning of the program" bit which might as well be thrown away, #pragma seems to me a much better means than something like noalias. Why? Well, let's take the daxpy() example. The property the compiler needs is NOT that dx has no aliases or that dy has no aliases, but that dx and dy do not overlap. More generally a declaration #pragma disjoint(x, ..., z) might tell a compiler "assume that anything accessed via 'x' is distinct from anything accessed via ..., 'z'" and so on. I might well have a program with four pointers p, q, r, s where #pragma disjoint(p, q) /* or _disjoint (p,q), (r,s); */ #pragma disjoint(r, s) /* or whatever */ but p and r might be potentially aliased and so might q and s. That shouldn't inhibit optimisation in regions where only p and q appear or where only r and s appear. This is not a proposal for a replacement for 'noalias'. I merely point out that since aliasing is a property of _sets_ of variables, trying to hack it with declarations about _single_ variables doesn't seem like a good approach.
turner@sdti.UUCP (Prescott K. Turner) (05/27/88)
In article <1988May21.030207.25063@light.uucp> bvs@light.UUCP (Bakul Shah) writes: >In article <54080@sun.uucp> dgh%dgh@Sun.COM (David Hough) writes: >>Anyway there is no portable way in draft ANSI C to say "this pointers are >>guaranteed to have no aliases". > >How about adding a test before the for loop? Something like: >#define overlap(x,y,n) (!(x + n <= y || y + n <= x)) > if (overlap(dx, dy, n)) > return complain("overlapping arrays\n"); > >Now a smart compiler can figure out that dx, dy don't overlap ... This test make non-standard assumptions. In draft ANSI C, those pointer comparisons are undefined (meaning your program might crash and burn) unless "x" and "y" point within the same object. Anyone know a machine where this could be experienced? -- Prescott K. Turner, Jr. Software Development Technologies, Inc. 375 Dutton Rd., Sudbury, MA 01776 USA (617) 443-5779 UUCP:genrad!mrst!sdti!turner