[comp.lang.c] alias and noalias

mrt7455@evtprp0b (Michael R Tucker) (12/13/90)

I have been following most of the thread about C versus FORTRAN and
there is something I don't understand.  I have no idea what people 
mean when they talk of 'alias' and 'noalias'.  I don't have any idea 
of the merits of either.  Could someone explain to me what alias mean, 
good and bad points, et. al.

Thank you in advance


Michael Tucker                                   Boeing Computer Services
(206) 342-6168                                   M/S 03-87  P.O.Box 24346
...uunet!bcstec!evtprp0b!mrt7455                 Seattle, Washington
                                                 98124-0346

burley@pogo.ai.mit.edu (Craig Burley) (12/15/90)

[couldn't follow up via email; it bounced.]

In article <224@evtprp0b.UUCP> mrt7455@evtprp0b (Michael R Tucker) writes:

   I have been following most of the thread about C versus FORTRAN and
   there is something I don't understand.  I have no idea what people 
   mean when they talk of 'alias' and 'noalias'.  I don't have any idea 
   of the merits of either.  Could someone explain to me what alias mean, 
   good and bad points, et. al.

This is a short answer, i.e. there's lots of issues, but basically
here is one issue that is fairly important to the topic.  Given the
following subroutine written in Fortran:

      SUBROUTINE X(A,B,C,N)
      INTEGER N
      REAL A(N),B(N),C(N)
      INTEGER I
      DO 10 I=1,N
10       C(I) = A(I) + B(I)
      END

and the corresponding function written in C (corresponding in that,
compiler-dependence issues aside, one could normally be a direct
replacement for the other):

void X(float *A,float *B,float *C,int N)
{
int I;

for (I = 0; I < N; ++I)
  C[I] = A[I] + B[I];
}

The Fortran version often can be compiled to faster code than the
C function on pipelined and/or vector machines.  In particular, the
Fortran version can be constructed so that while C(1) is being written,
A(2) + B(2) is being computed, and A(3) and B(3) are being read from
memory.  On a machine with a suitable architecture, such as most
supercomputers, this can mean that an iteration of the loop is completed
in one or two machine cycles instead of, say, 12 to 20, which is a huge
speedup; and on some machines, it can happen almost instantaneously
if they have as many vector adder units as there are elements (N) to
add.

The C version has no such luxury.

Why is this?  Because the following Fortran call to the Fortran version
is not standard-conforming (i.e. it isn't real "Fortran"):

      REAL A(100),B(100)
      ...
      CALL X(A,B,A)
      ...

This attempts to have the sums put back in one of the operand arrays.
On many machines, this might work anyway, but this won't:

      CALL X(A,B,A(2))

Either a given system is likely to add A(1) to B(1), store the result in
A(2), then read A(2) to add it to B(2) and store the sum in A(3), thus
accumulating the sum, or it will not accumulate the sum because by the
time A(2) is written with the result of A(1) + B(1), the previous value
in A(2) has already been read to compute A(2) + B(2), and so on.

So this is a restriction the programmer must observe and which offers
the Fortran compiler implementor the opportunity to "safely" implement
supercomputer-related optimizations.

On the other hand, the C equivalents of the above calls are perfectly
valid in standard C.  So a vectorizing C compiler would either have to
avoid fully utilizing the power of the target processor in compiling
the function X, or it would have to permit (and the programmer would have
to provide) some way of indicating that A, B, and C, as pointers, never
"overlap" (i.e. they could be pointing to physically separate memories and
all calls to the procedure adhere to the restrictions, just like the Fortran
case).  This is, I believe, the purpose behind the proposed "noalias"
feature rejected by ANSI C.

Since a vast quantity of scientific number-crunching code consists of
array calculations like that shown above, and most of it is written as
subroutines or functions in FORTRAN, converting it over to C without
making use of a non-portable construct like noalias or #pragma would
result in a significant loss of performance for many applications,
especially on "performance-sensitive" machines (machines where people
paid $$$ to get high performance, so they'll be very upset if a change
in programming language takes away performance they are used to enjoying).

Hope this answers your question!
--

James Craig Burley, Software Craftsperson    burley@ai.mit.edu