[comp.os.vms] C puzzle

hamm@BIOVAX.RUTGERS.EDU (02/16/88)

Here's one for all you C virtuosos.

We recently had to port a C program from Unix to VMS, and one of the many 
things which exploded was a line looking like this:

    for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++];         /*1*/

Now, I'm a complete novice at C, but this certainly looks like awful code
to me.  It's not entirely clear what the line is intended to do.  From the 
context, I assumed that what was intended was:

    for (i=0;i<LIMIT;i++) c3[i] = c2[i] = c1[i];        /*2*/

whereas what I thought would happen based on my limited knowledge of C was:

    for (i=0;i<LIMIT;i++) c3[i+1] = c2[i+1] = c1[i];    /*3*/

which would certainly explain the resulting walk in address space when the
three arrays are of size [LIMIT].

A test program shows that /*3*/ is indeed what happens under VMS, but that
/*2*/ is what happens under (Sun) Unix.

I've looked in vain (though perhaps not thoroughly enough) in K&R for a 
precise definition of when increments such as i++ or ++i should be done;  
the best I can find are statements which simply say "after evaluation" vs. 
"before evaluation" - but what's the granularity of "evaluation"?  Should
it be interpreted to mean "of a complete statement" or to mean "of a
complete expression", in this case, an array index expression?

Is the apparent (Sun) Unix interpretation consistently held in all cases
on that system?  On other Unix systems?  Is VMS actually *wrong*, or is
this one of those "left to implementation" decisions?  How does the
impending C standard define this?

Any and all informed comments on this would be welcome.  I attach the
test program below in case you want to try it.

Greg
------------------------------------------------------------------------------
Gregory H. Hamm                           || Phone:  (201)932-4864
Director, Molecular Biology Computing Lab ||  
Waksman Institute/CABM                    || BITNET: hamm@biovax
P.O. Box 759, Rutgers University          || ARPA:   hamm@biovax.rutgers.edu
Piscataway, NJ 08854 * USA                ||
------------------------------------------------------------------------------

#include <stdio.h>
#define LIMIT  4        /* to prevent walking off the end of arrays */
main ()
{
    int i; 
    int c1[LIMIT+1], c2[LIMIT+1], c3[LIMIT+1];

    for (i=0;i<LIMIT;i++) c1[i] = i;
    for (i=0;i<LIMIT;i++) c2[i] = 0;
    for (i=0;i<LIMIT;i++) c3[i] = 0;

    showc(c1,c2,c3);
    for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++];     /* what does this do? */
    showc(c1,c2,c3);
}
showc(c1,c2,c3)
    int c1[], c2[], c3[];
{
    int i; 
    
    printf ("c1: ");
    for (i=0;i<LIMIT;i++) printf("%d ",c1[i]);    printf ("\n");
    printf ("c2: ");
    for (i=0;i<LIMIT;i++) printf("%d ",c2[i]);    printf ("\n");
    printf ("c3: ");
    for (i=0;i<LIMIT;i++) printf("%d ",c3[i]);    printf ("\n");
}

     
Results on VAX/VMS:
     
c1: 0 1 2 3
c2: 0 0 0 0
c3: 0 0 0 0
c1: 0 1 2 3
c2: 0 0 1 2
c3: 0 0 1 2
     
Results on Sun:
     
c1: 0 1 2 3
c2: 0 0 0 0
c3: 0 0 0 0
c1: 0 1 2 3
c2: 0 1 2 3
c3: 0 1 2 3
     
------

darin@laic.UUCP (Darin Johnson) (02/24/88)

In article <8802221044.AA21745@ucbvax.Berkeley.EDU>, hamm@BIOVAX.RUTGERS.EDU writes:
> We recently had to port a C program from Unix to VMS, and one of the many 
> things which exploded was a line looking like this:
> 
>     for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++];         /*1*/
> 
> From the context, I assumed that what was intended was:
> 
>     for (i=0;i<LIMIT;i++) c3[i] = c2[i] = c1[i];        /*2*/
> 
> whereas what I thought would happen based on my limited knowledge of C was:
> 
>     for (i=0;i<LIMIT;i++) c3[i+1] = c2[i+1] = c1[i];    /*3*/
> 
> which would certainly explain the resulting walk in address space when the
> three arrays are of size [LIMIT].
> 
> A test program shows that /*3*/ is indeed what happens under VMS, but that
> /*2*/ is what happens under (Sun) Unix.
> 
> I've looked in vain (though perhaps not thoroughly enough) in K&R for a 
> precise definition of when increments such as i++ or ++i should be done;  

This is a very common portability problem.  The problem is not with
the compilers, but with the original code, which made assumptions
about the compiler.  C does not specify when the increment in "i++"
takes effect within an expression (this may not be emphasized enough
in K&R)!!  Different compilers will do it at different times.
Some people assume that "i++" means i gets incremented after the
whole expression is finished, but this is false for a lot of compilers.

Since VAX C optimizes code much better than the Sun compiler, it
re-ordered the code so that the increment was done before looking at
c2 or c3.  Note that this problem is NOT unique to VMS.  Similar (but
different) problems will also be found between the Sun compiler and a
VAX BSD compiler.

Probably the main reason code like this abounds is that a lot of
UNIX compilers don't optimize well, so programmers tend to
optimize the code themselves (even worse is the problem of
UNIX implementors who know about the compiler's internals and program
accordingly).  Perhaps if people porting UNIX were more concerned with
good compilers than with getting UNIX running ASAP....

Anyway, back to the problem.  The only way you can know what the code
was intended to do is to examine the results of the same program
compiled on the machine it was originally written on (assuming it ever
worked 'correctly').  If this is a pain, then look at the context and
comments (What? No comments!? :-)

PS: I tried this out on a Sun with "lint" and I indeed get the message -
      test.c(14): warning: i evaluation order undefined
-- 
Darin Johnson (...ucbvax!sun!sunncal!leadsv!laic!darin)
              (...lll-lcc.arpa!leadsv!laic!darin)
	All aboard the DOOMED express!

ram@actnyc.UUCP (Ray Milkey) (02/24/88)

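The pending C standard states that the side effect of a post increment
operator can be deferred until the next sequence point, and the end of a
full expression (here, the end of the statement) is a sequence point.
Since this statement both modifies i and reads it elsewhere with no
sequence point in between, the result is not defined.  Both implementations
are "correct", and several others would be also.
Kids, don't try this at home.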

scjones@sdrc.UUCP (Larry Jones) (02/24/88)

In article <8802221044.AA21745@ucbvax.Berkeley.EDU>, hamm@BIOVAX.RUTGERS.EDU writes:
< We recently had to port a C program from Unix to VMS, and one of the many 
< things which exploded was a line looking like this:
< 
<     for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++];         /*1*/
< 
< Now, I'm a complete novice at C, but this certainly looks like awful code
< to me.  It's not entirely clear what the line is intended to do.  

Yep, that's awful code alright.

< From the 
< context, I assumed that what was intended was:
< 
<     for (i=0;i<LIMIT;i++) c3[i] = c2[i] = c1[i];        /*2*/
< 
< whereas what I thought would happen based on my limited knowledge of C was:
< 
<     for (i=0;i<LIMIT;i++) c3[i+1] = c2[i+1] = c1[i];    /*3*/
< 
< which would certainly explain the resulting walk in address space when the
< three arrays are of size [LIMIT].
< 
< A test program shows that /*3*/ is indeed what happens under VMS, but that
< /*2*/ is what happens under (Sun) Unix.
< 
< I've looked in vain (though perhaps not thoroughly enough) in K&R for a 
< precise definition of when increments such as i++ or ++i should be done;  
< the best I can find are statements which simply say "after evaluation" vs. 
< "before evaluation" - but what's the granularity of "evaluation"?  Should
< it be interpreted to mean "of a complete statement" or to mean "of a
< complete expression", in this case, an array index expression?

There is no precise definition.  For efficiency, the compiler may defer
these operations until the end of the statement.

----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                MAIL: 2000 Eastman Dr., Milford, OH  45150
                                    AT&T: (513) 576-2070
"When all else fails, read the directions."