NEVILLE%umass-cs.csnet@csnet-relay.arpa (Neville D. Newman) (02/02/86)
This is posted to unix-wizards instead of to net.lang because i believe
that it shows faulty semantics in the Unix C compiler. i don't know
the proper way to get to arbitrary newsgroups, being an Internet person, so
if the moderator would kindly forward it there i would appreciate it.
While digging through the guts of the portable C compiler, i noticed
that it produced exactly the same code for two statements that i think
have different semantics. According to my C references, the unary
operators have precedence over binary operators (and are evaluated right
to left). The rule for pre- and post-increment (and -decrement) operators
is, of course, that a++ changes the value of a after the value of the
term is used, so that the change is a side-effect. ++a, on the other
hand, changes the value of a before the term is used and is therefore
entirely equivalent to (a += 1) or (a = a+1).
The C compiler on our 4.2bsd system, however, seems to consider the "before"
and "after" to be relative to the higher level *expression* being evaluated
rather than just the individual term. In the assembly output for the simple
program that follows, the increment or decrement is always placed before or
after the block of statements that implement the C assignment, never in the
midst of that block where it should sometimes appear.
According to my books, if a is 5, then (a++ + a) ought to evaluate to 11.
On 4.2bsd (or any system using the pcc, i imagine) it evaluates to 10.
On VMS with VAX-C v2.1, it evaluates to 11.
On CP/M-68K with the Alcyon compiler, it is 10. The Alcyon compiler strives
for compatibility with version 7 Unix.
So the questions for the day are: Is pcc "right" because it is sort of the
de facto standard? (i have a friend who claims that BNF and such are useless,
the compiler is the only definition of a language that counts) Is this
discrepancy between Unix C's behaviour and description already widely known
and carefully worked around? Should i attempt to fix it and possibly break
some code or leave it alone for old time's sake?
This code should check several facets of the pre-/post- increment/decrement
problem. The pre-increments should give results of 12 on all systems,
or else there's a *bad* problem. The post-increments give 10 on Unix, 11
on VMS. i think 11 is correct, based on the language description.
#include <stdio.h>

main() {
    int a;
    int b;

    /* check post-increments */
    a = 5;
    b = a + a++;
    printf("b = a + a++ yields %d\n",b);
    a = 5;
    b = a++ + a;
    printf("b = a++ + a yields %d\n",b);
    a = 5;
    b = a + (a++);
    printf("b = a + (a++) yields %d\n",b);
    a = 5;
    b = (a++) + a;
    printf("b = (a++) + a yields %d\n",b);

    /* check pre-increments */
    a = 5;
    b = a + ++a;
    printf("b = a + ++a yields %d\n",b);
    a = 5;
    b = ++a + a;
    printf("b = ++a + a yields %d\n",b);
    a = 5;
    b = a + (++a);
    printf("b = a + (++a) yields %d\n",b);
    a = 5;
    b = (++a) + a;
    printf("b = (++a) + a yields %d\n",b);
}
chris@umcp-cs.UUCP (Chris Torek) (02/02/86)
PCC is neither `right' nor `wrong'; the behaviour of that kind of code
(`a++ + a') is specifically left undefined. (The ANSI draft has the notion
of `sequence points' after which all side effects should have taken place.
An addition within a single expression is not a sequence point.)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP: seismo!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris@mimsy.umd.edu
MRC%PANDA@sumex-aim.arpa (02/03/86)
On a DEC-20 running Stanford's KCC compiler, all the post-increments yield
11 and all the pre-increments yield 12, as follows:

    b = a + a++ yields 11
    b = a++ + a yields 11
    b = a + (a++) yields 11
    b = (a++) + a yields 11
    b = a + ++a yields 12
    b = ++a + a yields 12
    b = a + (++a) yields 12
    b = (++a) + a yields 12

This would seem to correspond to the VMS C compiler and the formal
definition. I think the discrepancy is that VMS C and KCC were written
with a formal definition in mind, while Unix C was written as a kind of
RatFor for PDP-11 assembly code.

The basic form of the generated code was

    MOVEI 5,6       ; load constant 6 into register 5
    MOVEM 5,-3(17)  ; store constant in a
    ADD 5,-3(17)    ; add a to a
    SUBI 5,1        ; subtract 1 (post-increment only)
    MOVEM 5,-2(17)  ; store resulting value into b

in all cases. The -n(17) stuff simply refers to variables allocated on the
stack (PDP-10 stacks grow upwards). The only difference between the
pre-increment and post-increment cases was that the pre-increment case
didn't have the SUBI.

This leads me to another question. This generated code does the job, but
certainly isn't up to what an optimizing compiler can do, much less
hand-coded assembly code. On the PDP-10, hand-coded assembly code could do
the computations in 2 instructions if the value of a is unimportant
afterwards (and if printf can take its argument in a register). We're
talking a 50% slowdown in generated code, or more if we're in an inner
loop and the compiler can recognize the pattern as a load-once constant.
Has much been done in the technology of optimizing C compilations?
-------
john@basser.oz (John Mackin) (02/03/86)
I originally wrote the following as a piece of mail to the poster of the
article, but then I thought someone might be led to believe some of the
hideous misstatements he made, so I am following-up instead...

In article <2147@brl-tgr.ARPA> NEVILLE%umass-cs.csnet@csnet-relay.arpa (Neville D. Newman) writes:
> While digging through the guts of the portable C compiler, i noticed
> that it produced exactly the same code for two statements that i think
> have different semantics.

> According to my C references, the unary
> operators have precedence over binary operators (and are evaluated right
> to left).

First problem. ``my C references'', ``my books'' (below): what are these
meaningless terms? WHAT are you using as a reference? There is only ONE
book that should be referred to in a case like this: ``The C Programming
Language,'' by Brian W. Kernighan and Dennis M. Ritchie, Prentice-Hall,
1981, commonly referred to as ``K&R''. That is the document that defines
the language, at least until the ANSI C standard is produced; and even
then there will be ANSI C and K&R C, unless I miss my guess. If you are
digging around in the internals of a C compiler and you haven't read K&R
until you more or less know it by heart, go do so, and you don't need to
read any more of this news item. If I seem to be belaboring the obvious,
the reason will become clear very soon.

> The rule for pre- and post-increment (and -decrement) operators
> is, of course, that a++ changes the value of a after the value of the
> term is used, so that the change is a side-effect.

Correct. Read what you wrote ... ``IS A SIDE-EFFECT.'' Remember those
words, we'll have cause to refer to them shortly.

> ++a, on the other
> hand changes the value of a before the term is used and is therefore
> entirely equivalent to (a += 1) or (a = a+1).

If by this you are trying to claim that the change to a in this case is
NOT a side-effect, you're wrong. A side-effect is a change to a variable
``as a by-product of the evaluation of an expression'': K&R, page 50.

> According to my books, if a is 5, then (a++ + a) ought to evaluate to 11.

WHAT BOOKS? Any book which implies or states any such thing is just plain
WRONG! Read K&R, page 50 (Section 2.12):

    In any expression involving SIDE EFFECTS, there can be subtle
    dependencies on the order in which variables taking part in the
    expression are stored.

[Emphasis mine.] I won't quote it at greater length, it'd be too much to
type, but they make it perfectly clear. What that expression ``ought to
evaluate to'' is NOT DEFINED. The implementer of a given C compiler is
free to evaluate it as they wish. Even lint knows that; applying it to
your test program gives:

    xx.c(9): warning: a evaluation order undefined
    xx.c(12): warning: a evaluation order undefined
    xx.c(15): warning: a evaluation order undefined
    xx.c(18): warning: a evaluation order undefined
    xx.c(23): warning: a evaluation order undefined
    xx.c(26): warning: a evaluation order undefined
    xx.c(29): warning: a evaluation order undefined
    xx.c(32): warning: a evaluation order undefined

> So the questions for the day are: Is pcc "right" because it is sort of the
> defacto standard? (i have a friend who claims that BNF and such are useless,
> the compiler is the only definition of a language that counts)

There are cases, particularly with reference to the cpp, where this
argument is valid. However, in this case it doesn't enter into the
discussion, because the result is NOT DEFINED.

> Is this
> discrepancy between Unix C's behaviour and description already widely known
> and carefully worked around?

There IS no discrepancy.

> Should i attempt to fix it and possibly break
> some code or leave it alone for old time's sake?

Like the old adage says: ``If it's not broken, DON'T fix it.''

> The pre-increments should be give results of 12 on all systems,
> or else there's a *bad* problem.

INCORRECT! The order of evaluation of such things IS NOT DEFINED!

I'm sorry if I've been a bit over-vehement about this, but it does upset
me when people don't read the documents in the case...

John Mackin, Basser Department of Computer Science,
University of Sydney, Sydney, Australia
{seismo,ukc,mcvax,ubc-vision,prlb2}!munnari!basser.oz!john
john%basser.oz@SEISMO.CSS.GOV
CSNET: john@basser.oz
hans@erisun.UUCP (02/04/86)
This has no doubt been multiply reiterated over the years, but here goes
again:

The result of the statement

    b = ( a++ + a );

is not defined by the semantics of C. Evaluation of subexpressions may be
performed in any order and, specifically, code which depends on
subexpression evaluation order is erroneous. This is one trait C shares
with most sequential assignment based procedural languages.

As an aside, there is an operator, ',' , which defines evaluation order
and not much else, and there are the && and || operators, of course, but
these destroy arithmetic values in their course of duty. The above
statement could be written as

    b = ( ( b = a++ ), b += a )

to produce one particular evaluation order, or

    b = ( ( b = a ), b += a++ )

for the other order, but neither appears to have any advantages compared
to their sequential statement forms,

    b = a;          b = a;
    b += ++a;       b += a++;

which are semantically indisputable.
--
Two's complement, but three's an int.
Hans Albertsson
EIS, USENET/uucp: {decvax,philabs}!mcvax!enea!erix!erisun!hans
rjk@mrstve.UUCP (Richard Kuhns) (02/04/86)
In article <2147@brl-tgr.ARPA> NEVILLE%umass-cs.csnet@csnet-relay.arpa writes:
(I shortened the note...)
>This is posted to unix-wizards instead of to net.lang because i believe
>that it shows faulty semantics in the Unix C compiler. i don't know
>the proper to get to arbitrary newsgroups, being an Internet person, so
>if the moderator would kindly forward it there i would appreciate it.
>
>According to my books, if a is 5, then (a++ + a) ought to evaluate to 11.
>On 4.2bsd (or any system using the pcc, i imagine) it evaluates to 10.
>On VMS with VAX-C v2.1, it evaluates to 11.
>
>So the questions for the day are: Is pcc "right" because it is sort of the
>defacto standard? (i have a friend who claims that BNF and such are useless,
>the compiler is the only definition of a language that counts) Is this
>discrepancy between Unix C's behaviour and description already widely known
>and carefully worked around? Should i attempt to fix it and possibly break
>some code or leave it alone for old time's sake?
>

According to everything quote official unquote I've read on the subject,
there is no discrepancy between Unix C's behaviour and description.
Quoting from "A C PROGRAM CHECKER - lint" (dist. with AT&T SYSV.2),

"In order that the efficiency of C language on a particular machine not be
unduly compromised, the C language leaves the order of evaluation of
complicated expressions up to the local compiler. ... *In particular, if
any variable is changed by a side effect and also used elsewhere in the
same expression, the result is explicitly undefined.* "
--
Rich Kuhns
{ihnp4, decvax, etc...}!pur-ee!pur-phy!mrstve!rjk