rig@hall.cray.com (Richard Gisselquist) (01/13/88)
After following the discussion about the "noalias" for several weeks,
some of the implications of this keyword became clear to me. Of
course, after I figured it out, I discovered that Fred Gwyn had also stated
it to the network. Let me try to summarize the situation with vector
instruction sets (read Cray Research) and the C language.
First statement. An automatic vectorizing compiler is a marketing requirement
at Cray.
First example. (roughly translated from the Fortran vectorization handbook)
void
sub1( a, b, c, i )
int *a, *b, *c;
int i;
{
int j;
for( j=0; j<i; j++) *a++ = *b++ + *c++;
}
A Fortran compiler is permitted to generate vector code for a program like
this. Why? Because the Fortran standard states that hidden equivalences are
not permitted. Thus, if a,b and c appear to refer to different locations in
memory, in Fortran they do. However, a C compiler is not permitted to assume
that pointers reference disjoint areas; indeed, sub1( a+1, a+1, a, N) will
sum a vector into its last element.
There are several different ways to get around this difficulty. They include:
A) Compiler directives, for example "pragma(SAFE_VECTORIZING)".
The difficulty here is that the code that comprises the program
is in two different places: in the program itself and in the
side-band information that instructs the compiler how to generate
instructions. Of course, a compiler that depends on compiler
directives is not quite automatic in its vectorization.
B) Vectorize only the safe constructions. We have a compiler like
this. It generates exactly three vectorized loops every time we
rebuild all the system software. An average C program just does
not contain a lot of arrays that are completely seen by the
compiler in the module where they are processed. This is an
automatic vectorizing compiler, but it has limited utility.
C) Change the language to include the "vector" concept. This has
the advantage of directness. It has been tried [SIGPLAN Notices,
Kuo-Cheng Li in the January, 86 issue and myself in one of the
late autumn issues]. However, this is not automatic vectorization.
D) Generate both code paths and determine at run-time if the
array references are disjoint enough to permit using the path
with the vector instructions. Don't worry about the size, we've
got real memory bigger than most computers have virtual. This is
certainly automatic vectorization, but complicated.
E) Generate vector code, but decide at run-time on a maximum
value of the vector length such that vector references are
sufficiently disjoint. This is automatic vectorization and it
has been attempted, but the inner scalar loops that have been
executed using vector registers with a vector length of one
have left a bad taste.
"But there was a third possibility" (Arlo Guthrie, Alice's Restaurant). That
being to use the new "noalias" keyword to mean that loops involving this
variable may be safely vectorized if it looks like they may be vectorized.
The change to the standard still will not permit "dusty deck" C programs
to be vectorized without changes, but it will permit vectorized code after
only "standard" changes.
The "noalias" keyword has meaning for scalar optimization and for other
non-vector parallel constructs. I write because the "aha" experience
made clear to me the relationship between "noalias" and automatic
vectorization.
Solely the personal opinions of Richard Gisselquist.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/13/88)
In article <2942@hall.cray.com> rig@hall.cray.com (Richard Gisselquist) writes: >I discovered that Fred Gwyn had also stated it to the network. I thought my name was Douglas. Good article otherwise.
ado@elsie.UUCP (Arthur David Olson) (01/14/88)
In article <7073@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > In article <2942@hall.cray.com> rig@hall.cray.com (Richard Gisselquist) writes: > >I discovered that Fred Gwyn had also stated it to the network. > > I thought my name was Douglas. . . Of course such mistakes won't happen once we have "noalias". -- ado@vax2.nlm.nih.gov ADO, VAX, and NIH are Ampex and DEC trademarks
ok@quintus.UUCP (Richard A. O'Keefe) (01/14/88)
In article <2942@hall.cray.com>, rig@hall.cray.com (Richard Gisselquist) gives the following example: > void sub1( a, b, c, i ) > int *a, *b, *c; > int i; > { > int j; > for (j = 0; j < i; j++) *a++ = *b++ + *c++; > } and points out that for the CRAY you would really like to vectorise this. His statement that a "safe" vectorising compiler currently finds exactly three safe places in "all the system software" is the kind of evidence I was asking for that noalias would be useful. Except that (a) it would be rather surprising if system code DID contain a lot of vectorisable code. Compilers, for example, often spend a lot of time walking around trees, which doesn't vectorise. So before we can evaluate this evidence, we need to know how much better the compiler could do if noalias were used, and how many noalias declarations were needed. (b) if anything, this is evidence that aliasing should be the exception, with noalias the default. Why? Because most programmers would find it hard to say what the example actually does. Gisselquist says > sub1(a+1, a+1, a, N) will sum a vector into its last element. Yes, but what *else* does it do? You have 59 seconds left... (c) Each of the combinations a==b, b==c, c==a, and a==b==c makes sense and should be vectorisable. In fact, what we need for this to work is assert((a+i >= b || b >= a)); assert((a+i >= c || c >= a)); That is, if b or c overlaps a, it is above a. If I want to add a vector c to a vector a, I don't see why I shouldn't call sub1(a, a, c, n); But that violates 'noalias'. assert() seems to me to be a far better mechanism, because it makes clear to the human reader WHY the transformation is safe. Maintaining code that relies on 'noalias' is going to be a nightmare. (Remember maintenance?) (d) If the assertions above are true, *this* loop can be vectorised, whatever *other* aliasing may be done to these arrays. I've finally realised why I dislike 'noalias'. Maintenance. The original programmer may well understand his algorithm well enough to know where a 'noalias' is safe and where it isn't. But the maintainer may not. It is very easy to make barely noticeable changes to a program which violate the *unstated* assumptions which justify 'noalias'. So, professional C programmers who use 'noalias' declarations will always include in the code clear and explicit statements saying why the 'noalias' is used and when it is actually valid. The best of them will use 'assert' for as much of this as they can. But in the presence of 'assert', 'noalias' is redundant. Indeed, 'assert' can handle index aliasing, which 'noalias' can't. Is there any reason why C compilers should not be allowed to assume the truth of assertions even in the presence of NDEBUG?
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/14/88)
In article <531@cresswell.quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes: >His statement that a "safe" vectorising compiler currently finds exactly >three safe places in "all the system software" is the kind of evidence I >was asking for that noalias would be useful. No, it's no such thing! That's three places where "noalias" was NOT necessary in order to vectorize. Because he didn't HAVE "noalias", he couldn't vectorize anywhere else. He didn't tally the missed opportunities! >The original programmer may well understand his algorithm well enough >to know where a 'noalias' is safe and where it isn't. But the >maintainer may not. The appearance of "noalias" in interface specifications is a powerful aid. It lets you know that a function is not designed to be safely used in alias situations. For example, it warns that strcpy() must not be used to copy a string into an overlapping portion of the same char array.
nevin1@ihlpf.ATT.COM (00704A-Liber) (01/19/88)
In article <7088@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >The appearance of "noalias" in interface specifications is a powerful >aid. It lets you know that a function is not designed to be safely >used in alias situations. For example, it warns that strcpy() must >not be used to copy a string into an overlapping portion of the same >char array. Actually, from what I understand, all that noalias would mean here is that s1 and s2 don't point to EXACTLY the same place (although in most situations the compiler cannot enforce this restriction till run-time). Overlap would still be allowed. (BTW, strcpy() WILL work in most implementations if s1 == s2.) -- _ __ NEVIN J. LIBER ..!ihnp4!ihlpf!nevin1 (312) 510-6194 ' ) ) "The secret compartment of my ring I fill / / _ , __o ____ with an Underdog super-energy pill." / (_</_\/ <__/ / <_ These are solely MY opinions, not AT&T's, blah blah blah
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/20/88)
In article <3419@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704A-Liber,N.) writes:
-Actually, from what I understand, all that noalias would mean here is that s1
-and s2 don't point to EXACTLY the same place ... Overlap would still
-be allowed.
Not the way I understand it.
nevin1@ihlpf.ATT.COM (00704A-Liber) (01/21/88)
In article <7142@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: >In article <3419@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704A-Liber,N.) writes: >-Actually, from what I understand, all that noalias would mean here is that s1 >-and s2 don't point to EXACTLY the same place ... Overlap would still >-be allowed. > >Not the way I understand it. Are you telling me that noaliasing a pointer to char means that I cannot have overlap anywhere past the character that I am pointing to? Do you think that noalias believes that all uses of pointers to char are for null-terminated strings (which is not the only way to define strings)? Since string operations are made by function calls, they are technically NOT part of the semantics of the C language, and noalias cannot determine what is meant by overlap (except for EXACTLY pointing to the same place). I simply cannot believe that noalias would not allow pointing to two different elements of the same array. -- _ __ NEVIN J. LIBER ..!ihnp4!ihlpf!nevin1 (312) 510-6194 ' ) ) "The secret compartment of my ring I fill / / _ , __o ____ with an Underdog super-energy pill." / (_</_\/ <__/ / <_ These are solely MY opinions, not AT&T's, blah blah blah
dik@cwi.nl (Dik T. Winter) (01/22/88)
In article <3458@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704A-Liber,N.) writes: > In article <7142@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes: > >In article <3419@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704A-Liber,N.) writes: > >-Actually, from what I understand, all that noalias would mean here is that s1 > >-and s2 don't point to EXACTLY the same place ... Overlap would still > >-be allowed. > > > >Not the way I understand it. > > Are you telling me that noaliasing a pointer to char means that I cannot have > overlap anywhere past the character that I am pointing to? Do you think > that noalias believes that all uses of pointers to char are for null-terminated > strings (which is not the only way to define strings)? Since string operations What I think is meant is: if in a routine pointer p1 is declared noalias, there will be no instance during the execution of the routine that p1 will point to anything else accessible by the routine. This is very useful to vectorize statements like p1++ = p2++ + p3++; It ensures that during a calculation of the array element that p1 points to, no previous calculated element of the array is used. Without this assurance the statement cannot be vectorized. We can have even more complicated situations like: void vector_scatter(float *v1, *v2, int *index, int n) { noalias float *p1; while(n--) { /* instead of v1[*index++] = *v2++ */ p1 = &v1[*index++]; *p1 = *v2++; } } Note the kludge with p1 to ensure proper information to the compiler; noaliasing v1 does not work I think. Without noalias this will not vectorize, with noalias it will. Note also there is no easy way to check the validity of the noalias, and it is also not possible to do the same using assertions (as has been suggested). This is one of the most common operations in vectorized programs, available as a single instruction (for the loop) on most vector machines. (If I erred about the semantics of noalias it is all my fault.) -- dik t. winter, cwi, amsterdam, nederland INTERNET : dik@cwi.nl BITNET/EARN: dik@mcvax
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/23/88)
In article <3458@ihlpf.ATT.COM> nevin1@ihlpf.UUCP (00704A-Liber,N.) writes: >Are you telling me that noaliasing a pointer to char means that I cannot have >overlap anywhere past the character that I am pointing to? Let's be careful here. "noalias", like the other type qualifiers, can appear at different levels within a complex declaration, and it applies at the corresponding object level, not to the declaration as a whole. A pointer to a noalias array of chars is not the same as a noalias pointer; the latter is comparitively uninteresting. Check the "handles" explanation I recently posted to see if it makes things clearer.