hamm@BIOVAX.RUTGERS.EDU (02/16/88)
Here's one for all you C virtuosos.

We recently had to port a C program from Unix to VMS, and one of the many
things which exploded was a line looking like this:

    for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++]; /*1*/

Now, I'm a complete novice at C, but this certainly looks like awful code
to me. It's not entirely clear what the line is intended to do. From the
context, I assumed that what was intended was:

    for (i=0;i<LIMIT;i++) c3[i] = c2[i] = c1[i]; /*2*/

whereas what I thought would happen based on my limited knowledge of C was:

    for (i=0;i<LIMIT;i++) c3[i+1] = c2[i+1] = c1[i]; /*3*/

which would certainly explain the resulting walk in address space when the
three arrays are of size [LIMIT].

A test program shows that /*3*/ is indeed what happens under VMS, but that
/*2*/ is what happens under (Sun) Unix.

I've looked in vain (though perhaps not thoroughly enough) in K&R for a
precise definition of when increments such as i++ or ++i should be done;
the best I can find are statements which simply say "after evaluation" vs.
"before evaluation" - but what's the granularity of "evaluation"? Should
it be interpreted to mean "of a complete statement" or to mean "of a
complete expression", in this case, an array index expression?

Is the apparent (Sun) Unix interpretation consistently held in all cases on
that system? On other Unix systems? Is VMS actually *wrong*, or is this one
of those "left to implementation" decisions? How does the impending C
standard define this?

Any and all informed comments on this would be welcome. I attach the test
program below in case you want to try it.

Greg

------------------------------------------------------------------------------
Gregory H. Hamm                           || Phone: (201)932-4864
Director, Molecular Biology Computing Lab ||
Waksman Institute/CABM                    || BITNET: hamm@biovax
P.O. Box 759, Rutgers University          || ARPA: hamm@biovax.rutgers.edu
Piscataway, NJ 08854 * USA                ||
------------------------------------------------------------------------------

#include stdio

#define LIMIT 4 /* to prevent walking off the end of arrays */

main ()
{
    int i;
    int c1[LIMIT+1], c2[LIMIT+1], c3[LIMIT+1];

    for (i=0;i<LIMIT;i++) c1[i] = i;
    for (i=0;i<LIMIT;i++) c2[i] = 0;
    for (i=0;i<LIMIT;i++) c3[i] = 0;
    showc(c1,c2,c3);
    for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++]; /* what does this do? */
    showc(c1,c2,c3);
}

showc(c1,c2,c3)
int c1[], c2[], c3[];
{
    int i;

    printf ("c1: "); for (i=0;i<LIMIT;i++) printf("%d ",c1[i]); printf ("\n");
    printf ("c2: "); for (i=0;i<LIMIT;i++) printf("%d ",c2[i]); printf ("\n");
    printf ("c3: "); for (i=0;i<LIMIT;i++) printf("%d ",c3[i]); printf ("\n");
}

Results on Vax/VMS :

c1: 0 1 2 3
c2: 0 0 0 0
c3: 0 0 0 0
c1: 0 1 2 3
c2: 0 0 1 2
c3: 0 0 1 2

Results on Sun :

c1: 0 1 2 3
c2: 0 0 0 0
c3: 0 0 0 0
c1: 0 1 2 3
c2: 0 1 2 3
c3: 0 1 2 3

------
darin@laic.UUCP (Darin Johnson) (02/24/88)
In article <8802221044.AA21745@ucbvax.Berkeley.EDU>, hamm@BIOVAX.RUTGERS.EDU writes:
> We recently had to port a C program from Unix to VMS, and one of the many
> things which exploded was a line looking like this:
>
> for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++]; /*1*/
>
> From the context, I assumed that what was intended was:
>
> for (i=0;i<LIMIT;i++) c3[i] = c2[i] = c1[i]; /*2*/
>
> whereas what I thought would happen based on my limited knowledge of C was:
>
> for (i=0;i<LIMIT;i++) c3[i+1] = c2[i+1] = c1[i]; /*3*/
>
> which would certainly explain the resulting walk in address space when the
> three arrays are of size [LIMIT].
>
> A test program shows that /*3*/ is indeed what happens under VMS, but that
> /*2*/ is what happens under (Sun) Unix.
>
> I've looked in vain (though perhaps not thoroughly enough) in K&R for a
> precise definition of when increments such as i++ or ++i should be done;

This is a very common portability problem. The problem is not with the
compilers, but with the fact that the original code made assumptions about
the compiler. The time at which "i++" is evaluated cannot be determined
(this may not be emphasized enough in K&R)!! Different compilers will do
this at different times. Some people may assume that "i++" means i gets
incremented after the expression is finished, but this is false for a lot
of compilers. Since VAX C optimizes code much better than Sun, it re-ordered
the code so that the increment was done before looking at c2 or c3.

Note that this problem is NOT unique to VMS. Similar (but different)
problems will also be found between the Sun compiler and a VAX BSD
compiler. Probably the main reason code like this abounds is that a lot
of UNIX compilers don't optimize well, and the programmers tend to optimize
the code themselves (even worse is the problem of UNIX implementors who
know about the compiler's internals and program accordingly). Perhaps if
people porting UNIX were more concerned with good compilers than with
getting UNIX running ASAP....

Anyway, back to the problem. The only way you can know what the code
intended is to examine the results of the same program compiled on the
original machine it was written on (assuming it ever worked 'correctly').
If this is a pain, then look at the context and comments (What? No
comments!? :-)

PS: I tried this out on a Sun with "lint" and I indeed get the message -

    test.c(14): warning: i evaluation order undefined

--
Darin Johnson (...ucbvax!sun!sunncal!leadsv!laic!darin)
              (...lll-lcc.arpa!leadsv!laic!darin)
All aboard the DOOMED express!
ram@actnyc.UUCP (Ray Milkey) (02/24/88)
The pending C standard states that the side effect of a post-increment operator can be deferred until the next sequence point; the end of a statement's full expression is one such sequence point. Both implementations are "correct", and several others would be also. Kids, don't try this at home.
scjones@sdrc.UUCP (Larry Jones) (02/24/88)
In article <8802221044.AA21745@ucbvax.Berkeley.EDU>, hamm@BIOVAX.RUTGERS.EDU writes:
< We recently had to port a C program from Unix to VMS, and one of the many
< things which exploded was a line looking like this:
<
< for (i=0;i<LIMIT;) c3[i] = c2[i] = c1[i++]; /*1*/
<
< Now, I'm a complete novice at C, but this certainly looks like awful code
< to me. It's not entirely clear what the line is intended to do.
Yep, that's awful code alright.
< From the
< context, I assumed that what was intended was:
<
< for (i=0;i<LIMIT;i++) c3[i] = c2[i] = c1[i]; /*2*/
<
< whereas what I thought would happen based on my limited knowledge of C was:
<
< for (i=0;i<LIMIT;i++) c3[i+1] = c2[i+1] = c1[i]; /*3*/
<
< which would certainly explain the resulting walk in address space when the
< three arrays are of size [LIMIT].
<
< A test program shows that /*3*/ is indeed what happens under VMS, but that
< /*2*/ is what happens under (Sun) Unix.
<
< I've looked in vain (though perhaps not thoroughly enough) in K&R for a
< precise definition of when increments such as i++ or ++i should be done;
< the best I can find are statements which simply say "after evaluation" vs.
< "before evaluation" - but what's the granularity of "evaluation"? Should
< it be interpreted to mean "of a complete statement" or to mean "of a
< complete expression", in this case, an array index expression?
There is no precise definition. For efficiency, these operations can be
deferred until the end of the statement.
----
Larry Jones UUCP: uunet!sdrc!scjones
SDRC MAIL: 2000 Eastman Dr., Milford, OH 45150
AT&T: (513) 576-2070
"When all else fails, read the directions."