quiroz@cs.rochester.edu (Cesar Quiroz) (01/11/88)
0. [Beginning with parenthetical material? Perhaps this should be at the end, but I have marked the text of this long note with a place to `n' it, so I have to move this to the front. Please excuse the mess. Prudence suggests not jumping into the raging debate on `noalias' until a time when one has seen the exact text of the draft. It is my opinion that Doug Gwyn's leakage of the general idea should be a welcomed occasion to think about it, but not cause to jump to the neck of the committee. Shouldn't we give them the benefit of doubt? (I don't like new keywords in principle, for instance, but I will wait until I have *something* to shoot at, `noalias' might be a reasonable idea after all.)] 1. Aliasing is not restricted to variables. The real reason for this article is the sub-discussion as to whether aliasing is/is-not/is-as-in-Fortran/is-not/is-too a problem in C. This conversation can be joined even without the `noalias' proposal. I would like to remind those who might have forgotten or overlooked this, that aliasing "of variables" is not the only (or worst) form of aliasing from the perspective of giving hints to the compiler. For instance, Richard O'Keefe (BTW, let's put some support behind his suggestion that the proponents of anti-aliasing hints present some hard evidence that their proposals really help.) recently said: :Just how much of a problem do we really have here, anyway? :The problem of aliasing is a problem of having multiple legal access :paths to the same variable. How often is the controlled variable of :a loop global? How often does it have its address taken? I claim some precision is gained by saying that the problem of aliasing is that of having multiple access to the same *object*, whether or not it is directly named by a variable. If you knew this all along, please bail out here. The rest of this long note is some argument in support of considering aliasing (and anti-aliasing hints) for objects not directly named by variables. My hope is that it will persuade those who think aliasing is not a problem in {C, FORTRAN} to take a second look at their position. 2. Examples of aliases of objects other than variables. Consider access to array elements (those familiar with the Fortran literature already must have thought of this, but I don't see references :-) in the recent debate). In something like: for (...) { a[<hard to grok expr>] = <something>; ... foo(a[<another hard expr>]); } there might be a win if we can hint the compiler that both `hard' expressions 1) Never refer to the same element of a. 2) Or, they might refer to the same element only in different iterations. E.g.: 2.1) The second expression always refers to an element that will be altered in a future iteration, so the original value can still be used in this for ... etc. In this case, although each element of a is not endowed with an exclusive identifier (variable), it is still alias-able through its `name' in C, the index expression that refers to it. Notice that this hint, to be useful, has to refer to the uses and definitions that are explicitly said to be safe. There is no point in giving hints about `a'! It is the behavior of _some_ elements of `a' in the context of a very specific _scope_ that might be exploited. For all we care, both `a' and its components might have many other aliases in other contexts without damaging the optimizations possible here. So, we end up willing to hint the compiler things like "`a[3*i-j*j]' is never an alias for `a[i*j]'" or some such thing that is data dependent. For relatively simple integer indices (like those of Fortran) one can do automatic disambiguation to a great extent. For other cases (and here we must consider general pointer arithmetic in C) having the possibility of hinting the compiler might be useful, but the statistics O'Keefe demands are needed before one believes that. A similar scenario appears when singly-linked lists are manipulated by means of 2 pointer variables that chase each other at a stride of 1. Say we have: struct cell {struct cell *link, char *datum}; ... insert (struct cell *list, char *thing) { struct cell *this, *next; ... } After checking the sanity of the input and dealing with degenerate cases, I want to assert that `this' and `next' are never aliases *of each other*, not * That they don't have aliases * That they are not aliases of `list'. The useful hint is on an _interaction_ of handles to unnamed objects, and this hint is valid only inside a _context_, that might be smaller than the intersection of the lifespans of the handles involved in the hint. To make this clear, the second ellipsis above could contain something like this: if (!list) { /* Deal with empty list */ ... } else if (!list->link) { /* Deal with trivial list */ ... } else { /* Deal with full-formed list */ this=list; next=list->link; UNEQUIVALENCE (this, next) { /* Go ahead, optimize me */ ... } /* `this' and `next' may point to the same object here! */ } [Disclaimer: The gross flashback to Fortran is on intention, so no one believes I am proposing *this syntax*. This is just an example.] The idea is to permit the compiler to do something interesting in the scope where this is safe. We are saying in fact "Dear Compiler: if it helps you, be aware of such and such interaction or lack thereof for the following expressions." It doesn't say more, so if an optimization depends on, for example, `this' not having aliases in that scope, then our hint won't be powerful enough to permit the optimization. Indeed, the example above might not be too fruitful in optimizations anyway, but you get the idea. It is easy to find other examples of aliasing on objects that are not directly named by variables. In a distributed tree structure, you might want to say that things that happen in disjoint subtrees do not affect each other, etc. I am looking forward to see the actual text for the `noalias' feature and plan to figure out how well it fares for those simple examples before developing an opinion on its technical merit (although I must admit that it already seems ugly from a merely esthetic viewpoint). -- Cesar Augusto Quiroz Gonzalez Department of Computer Science ...allegra!rochester!quiroz University of Rochester or Rochester, NY 14627 quiroz@cs.rochester.edu
wesommer@athena.mit.edu (William E. Sommerfeld) (01/12/88)
In article <5772@sol.ARPA> quiroz@ROCHESTER.UUCP (Cesar Quiroz) writes: >[Disclaimer: The gross flashback to Fortran is on intention, so no >one believes I am proposing *this syntax*. This is just an >example.] The idea is to permit the compiler to do something >interesting in the scope where this is safe. We are saying in fact >"Dear Compiler: if it helps you, be aware of such and such >interaction or lack thereof for the following expressions." It >doesn't say more, so if an optimization depends on, for example, >`this' not having aliases in that scope, then our hint won't be >powerful enough to permit the optimization. Indeed, the example >above might not be too fruitful in optimizations anyway, but you get >the idea. This is more of an idea of `how to give hints to optimizers' than `what should be done to c'... It might be reasonable to use something like `assert' for this; it would be different from the `normal' assert (which aborts if the condition is false); To reuse his example, the last section could be written as: { /* Deal with full-formed list */ for (this = link, next = link -> next; next; this = next; next = next -> link) { assert(this != next); assert(this != NULL); /* body which does operations on *this and *next */ } /* `this' and `next' may point to the same object here! */ } This has the advantage that it can be handled two ways: when debugging and testing, it can be used to check for program bugs, and when running in `production' mode, the assert can be used by the optimizer to make assumptions about things (that storing through `this' will not affect things fetched through `next'). It seems more important to include assertions about arguments to functions and their return values along the lines of: extern char *strcpy (char * to, const char * from) assert (to != NULL && from != NULL && strcpy() == to); extern void *malloc (long length) assert (length > 0); extern void *xmalloc (long length) assert (length > 0 && xmalloc() != NULL); extern blackhole abort (); it might make for both better-optimized _and_ more correct programs, since the declarations can also be checked against the source to the routines themselves; Given the above declarations (and assuming that `s' was not null), the compiler (or an extended "lint") could justifiably complain about: strcpy((char *)malloc(strlen(s)+1), s); (since the first argument could be NULL), whereas it would not complain about: if ((temp = malloc (strlen (s) + 1)) == NULL) abort(); strcpy(temp, s); or strcpy (xmalloc (strlen(s)+1), s);