[comp.lang.c] alias/noalias etc.

quiroz@cs.rochester.edu (Cesar Quiroz) (01/11/88)

0. [Beginning with parenthetical material?  Perhaps this should be
at the end, but I have marked the text of this long note with a
place to `n' it, so I have to move this to the front.  Please excuse
the mess.

    Prudence suggests not jumping into the raging debate on
    `noalias' until a time when one has seen the exact text of the
    draft.  It is my opinion that Doug Gwyn's leakage of the general
    idea should be a welcomed occasion to think about it, but not
    cause to jump to the neck of the committee.  Shouldn't we give
    them the benefit of doubt?  (I don't like new keywords in
    principle, for instance, but I will wait until I have
    *something* to shoot at, `noalias' might be a reasonable idea
    after all.)]

1. Aliasing is not restricted to variables.  

The real reason for this article is the sub-discussion as to whether
aliasing is/is-not/is-as-in-Fortran/is-not/is-too a problem in C.
This conversation can be joined even without the `noalias' proposal.
I would like to remind those who might have forgotten or overlooked
this, that aliasing "of variables" is not the only (or worst) form
of aliasing from the perspective of giving hints to the compiler.

For instance, Richard O'Keefe (BTW, let's put some support behind
his suggestion that the proponents of anti-aliasing hints present
some hard evidence that their proposals really help.) recently
said:
:Just how much of a problem do we really have here, anyway?
:The problem of aliasing is a problem of having multiple legal access
:paths to the same variable.  How often is the controlled variable of
:a loop global?  How often does it have its address taken?

I claim some precision is gained by saying that the problem of
aliasing is that of having multiple access to the same *object*,
whether or not it is directly named by a variable.  If you knew
this all along, please bail out here.  The rest of this long note is
some argument in support of considering aliasing (and anti-aliasing
hints) for objects not directly named by variables.  My hope is that
it will persuade those who think aliasing is not a problem in {C,
FORTRAN} to take a second look at their position.

2. Examples of aliases of objects other than variables.

Consider access to array elements (those familiar with the Fortran
literature already must have thought of this, but I don't see
references :-) in the recent debate).  In something like:

    for (...) {
        a[<hard to grok expr>] = <something>;
        ...
        foo(a[<another hard expr>]);
    }

there might be a win if we can hint the compiler that both `hard'
expressions
    1)  Never refer to the same element of a.
    2)  Or, they might refer to the same element only in different
        iterations.  E.g.:
        2.1)    The second expression always refers to an element that
                will be altered in a future iteration, so the
                original value can still be used in this for ...
etc.  In this case, although each element of a is not endowed with an
exclusive identifier (variable), it is still alias-able through its
`name' in C, the index expression that refers to it.  Notice that
this hint, to be useful, has to refer to the uses and definitions that
are explicitly said to be safe.  There is no point in giving hints
about `a'!  It is the behavior of _some_ elements of `a' in the
context of a very specific _scope_ that might be exploited.  For all
we care, both `a' and its components might have many other aliases
in other contexts without damaging the optimizations possible here.

So, we end up willing to hint the compiler things like "`a[3*i-j*j]'
is never an alias for `a[i*j]'" or some such thing that is data
dependent.  For relatively simple integer indices (like those of
Fortran) one can do automatic disambiguation to a great extent.  For
other cases (and here we must consider general pointer arithmetic in
C) having the possibility of hinting the compiler might be useful,
but the statistics O'Keefe demands are needed before one believes
that. 

A similar scenario appears when singly-linked lists are manipulated
by means of 2 pointer variables that chase each other at a stride of
1.  Say we have:

    struct cell {struct cell *link, char *datum};
    ...
    insert (struct cell *list, char *thing)

    {
        struct cell *this, *next;
        ...
    }

After checking the sanity of the input and dealing with degenerate
cases, I want to assert that `this' and `next' are never aliases *of
each other*, not
        * That they don't have aliases
        * That they are not aliases of `list'.
The useful hint is on an _interaction_ of handles to unnamed
objects, and this hint is valid only inside a _context_, that might
be smaller than the intersection of the lifespans of the handles
involved in the hint.  To make this clear, the second ellipsis above
could contain something like this:

        if (!list) {                /* Deal with empty list */
            ...
        }
        else if (!list->link) {     /* Deal with trivial list */
            ...
        }
        else {                      /* Deal with full-formed list */
            this=list; next=list->link;
            UNEQUIVALENCE (this, next) {
                /* Go ahead, optimize me */
                ...
            }
            /* `this' and `next' may point to the same object here! */
        }

[Disclaimer: The gross flashback to Fortran is on intention, so no
one believes I am proposing *this syntax*.  This is just an
example.]  The idea is to permit the compiler to do something
interesting in the scope where this is safe.  We are saying in fact
"Dear Compiler: if it helps you, be aware of such and such
interaction or lack thereof for the following expressions."  It
doesn't say more, so if an optimization depends on, for example,
`this' not having aliases in that scope, then our hint won't be
powerful enough to permit the optimization.  Indeed, the example
above might not be too fruitful in optimizations anyway, but you get
the idea.
        
It is easy to find other examples of aliasing on objects that are
not directly named by variables.  In a distributed tree structure,
you might want to say that things that happen in disjoint subtrees
do not affect each other, etc.  I am looking forward to see the
actual text for the `noalias' feature and plan to figure out how
well it fares for those simple examples before developing an opinion
on its technical merit (although I must admit that it already seems
ugly from a merely esthetic viewpoint).

-- 
Cesar Augusto  Quiroz Gonzalez
Department of Computer Science     ...allegra!rochester!quiroz
University of Rochester            or
Rochester,  NY 14627               quiroz@cs.rochester.edu

wesommer@athena.mit.edu (William E. Sommerfeld) (01/12/88)

In article <5772@sol.ARPA> quiroz@ROCHESTER.UUCP (Cesar Quiroz) writes:
>[Disclaimer: The gross flashback to Fortran is on intention, so no
>one believes I am proposing *this syntax*.  This is just an
>example.]  The idea is to permit the compiler to do something
>interesting in the scope where this is safe.  We are saying in fact
>"Dear Compiler: if it helps you, be aware of such and such
>interaction or lack thereof for the following expressions."  It
>doesn't say more, so if an optimization depends on, for example,
>`this' not having aliases in that scope, then our hint won't be
>powerful enough to permit the optimization.  Indeed, the example
>above might not be too fruitful in optimizations anyway, but you get
>the idea.

This is more of an idea of `how to give hints to optimizers' than
`what should be done to c'...

It might be reasonable to use something like `assert' for this; it
would be different from the `normal' assert (which aborts if the
condition is false);

To reuse his example, the last section could be written as:

        {                      /* Deal with full-formed list */
            for (this = link, next = link -> next; 
		 next; 
		 this = next; next = next -> link) {
                assert(this != next);
		assert(this != NULL); 
		/* body which does operations on *this and *next */
            }
            /* `this' and `next' may point to the same object here! */
        }

This has the advantage that it can be handled two ways: when debugging
and testing, it can be used to check for program bugs, and when
running in `production' mode, the assert can be used by the optimizer
to make assumptions about things (that storing through `this' will not
affect things fetched through `next').

It seems more important to include assertions about arguments to
functions and their return values along the lines of:

extern char *strcpy (char * to, const char * from) 
	assert (to != NULL && from != NULL && strcpy() == to);

extern void *malloc (long length) assert (length > 0);
extern void *xmalloc (long length) assert (length > 0 && xmalloc() != NULL);
extern blackhole abort ();

it might make for both better-optimized _and_ more correct programs,
since the declarations can also be checked against the source to the
routines themselves; 

Given the above declarations (and assuming that `s' was not null), the
compiler (or an extended "lint") could justifiably complain about:

	strcpy((char *)malloc(strlen(s)+1), s);

(since the first argument could be NULL), whereas it would not
complain about:

	if ((temp = malloc (strlen (s) + 1)) == NULL)
		abort();
	strcpy(temp, s);
or
	strcpy (xmalloc (strlen(s)+1), s);