scs@adam.mit.edu (Steve Summit) (06/20/91)
This protracted debate has illustrated two subtly but significantly
different ways of thinking about expressions such as

	(i = 1) == (i = 2)

One school of thought says "the expression contains two assignments
to the same object, therefore it's undefined.  Period; end of
report."  The second school says "Yes, we understand that you can't
tell whether (i = 1) or (i = 2) happens first, but it's still the
case that it boils down to (1 == 2), which is always false, right?"
The first school says "No, it's not an order of evaluation problem;
the fact that there are two assignments renders the whole expression
undefined, and anything can happen."  The second school says "Yes, we
understand that you can't tell what value i will end up with, but the
value of each assignment is unambiguously its right-hand side,
right?"  And so it goes.  The first school keeps saying "it's
undefined!", assuming that that fully answers the question, and it
can't understand why the second school keeps asking more questions.

(Before I go any further, let me point out that I am not trying to
cast any stones here.  The first school, though correct, has been
somewhat knee-jerk in its responses, myself included.  The second
school is displaying what ought to be a healthy curiosity about
"what's really going on.")

I have been leaning toward the first school ever since I was first
learning C, when I read, in K&R, this line I keep quoting:

	The moral of this discussion is that writing code which
	depends on order of evaluation is bad programming practice in
	any language.  Naturally, it is important to know what things
	to avoid, but if you don't know how they are done on various
	machines, that innocence may help to protect you.

Now, I'll admit that I read into this statement a bit more than it
explicitly says.  Whenever I see *any* "fishy" expression, whether
it's

	a[i] = i++

or

	printf("%c %c\n", getchar(), getchar())

or

	printf("%d\n", i++ * i++)

or

	(i = 1) == (i = 2)

or anything else with potential multiple-side-effect or
evaluation-order ambiguities, a little alarm goes off that says "stay
away!"  That's all it takes.  I don't start thinking about what the
compiler might reasonably (or unreasonably) do, or looking at the
assembly output, or reading through documentation trying to discover
if some subpart of the expression might have a defined value.  (I
don't try to discover "how they are done on various machines.")

I call this good, safe programming.  I used the word "circumspect" in
the Subject: line, but it could also be labeled "conservative."
Someone will likely label it (pejoratively) as "paranoid," as if one
shouldn't have to worry about such things, or as if one ought to be
able to take advantage of unspecified or undefined nuances if the
code in question doesn't have to be portable, or as if casting
anything that even hints at undefinedness out of one's programming
vocabulary would be unacceptably restrictive to one's creativity.  I
have found none of these restrictions stifling; in fact they are
quite liberating, in that I almost never have to track down stupid,
subtle bugs, or move mountains to port code.
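As a rough sketch of what "staying away" looks like in practice --
the example is merely illustrative -- one can always force the order
explicitly with temporaries, so that nothing is left to the
compiler's discretion:

	#include <stdio.h>

	int main(void)
	{
	    /* Instead of the fishy printf("%c %c\n", getchar(), getchar()),
	     * read the two characters in separate statements, so their
	     * order is fixed by the program rather than by the compiler. */
	    int c1 = getchar();
	    int c2 = getchar();

	    printf("%c %c\n", c1, c2);
	    return 0;
	}

The same trick works for the other examples above: pull each side
effect into its own statement, and the question of what the compiler
might do never arises.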
In an earlier article on this topic, I mentioned that "The comp.lang.c
frequently-asked questions list has a bit to say about undefined order
of evaluation."  A number of people have taken me to task for this,
saying that the FAQ list answer doesn't cover (i = 1) == (i = 2) at
all.  Now, I didn't claim that it answered the current question (in
fact, it mentions "order of evaluation," which we've agreed this
problem isn't), but I will admit that, to me, the FAQ list answer does
cover both cases, in that the same alarm bell -- evoked by the same
"innocence may help to protect you" quote -- goes off either way.

I hope this article doesn't sound too pompous, or holier-than-thou,
or us vs. them.  There are obviously quite a few people in what I
have called the "second school," and it would be quite insensitive of
me to just say that they should all think the way I do.  (However, I
do have to admit that wondering if there can be meaning in

	(i = 1) == (i = 2)

even though it's explicitly undefined, seems rather like wondering if
one can be a little bit pregnant.)

Now, it may be that some of the people who are keeping this thread
alive aren't really worried about the (undefined) expression

	(i = 1) == (i = 2)

at all, but are rather simply wondering whether the value of the
expression

	i = 1

is "one" or "the value of i."  (There have even been suggestions made
that the answer is somehow different for ANSI C than "Classic" C, and
that the ANSI Standard answer therefore isn't relevant for pre-ANSI
compilers.)

This starts looking like a hard question to answer, because you can't
find words in the Standard (or in any number of C reference books)
which explicitly answer it.  The answer isn't written down explicitly
because it's so simple: *it doesn't matter*.  It is defined that the
value of an assignment statement is the value of the right-hand side,
cast to the type of the left-hand side.  In a correct program (one
which doesn't have multiple assignments, within the same expression,
to the same object, in particular to the one on the left-hand side)
there is absolutely no detectable difference between "the value of
the right-hand side, cast to the type of the left-hand side" and "the
value (after the assignment) of the left-hand side," because "the
value of the right-hand side, cast to the type of the left-hand side"
is precisely what gets assigned to the left-hand side.  A compiler
writer therefore has complete freedom to emit code which either
re-fetches the left-hand side or uses the coerced value of the
right-hand side.  As long as there can't be other intervening
assignments to the left-hand side, it can't matter which choice is
made.

This is an excellent example of how an explicitly undefined area of
the language (namely, that it's undefined what happens if you modify
the same object twice within one expression) allows the compiler
writer a useful freedom.  Compiler writers are then likely to make
use of that freedom, and to write different compilers that implement
the undefined areas in different ways.  Programmers are therefore
strongly advised to leave the undefined areas well alone, lest they
break their side of the contract (i.e. the standard) and yank the rug
out from under the compiler writer (and, more significantly,
themselves) by instigating a case the compiler writer was allowed to
assume "couldn't happen."

This explains why the "first school" keeps harping on the "no
multiple side effects to the same object" rule, which is really the
relevant issue.  If there aren't multiple side effects to the same
object, assignment semantics aren't confusing (or worth talking
about); and if there are, the expression is undefined, so it's really
not worth talking about.
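To make the rule concrete, here is a minimal sketch (the variable
names are invented for illustration, and there is only one assignment
to each object, so nothing here is undefined):

	#include <stdio.h>

	int main(void)
	{
	    int i;
	    double d;

	    /* The value of the assignment i = 3.7 is the right-hand
	     * side converted to the type of i, i.e. (int)3.7, which
	     * is 3.  Whether the compiler re-fetches i or reuses the
	     * converted value, d ends up as 3.0 either way. */
	    d = (i = 3.7);

	    printf("%d %g\n", i, d);	/* prints "3 3", not "3 3.7" */
	    return 0;
	}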
(Note, too, that the situation is not any more undefined under the
ANSI rules than it was before: compilers have always been free to --
and I am aware of pre-ANSI compilers which do -- implement

	(i = 1) == (i = 2)

in the "surprising" or "wrong" way.)

The final case, which has been raised by a few alert correspondents,
concerns the value of

	i = 1

when i is volatile.  The volatile qualifier is new with ANSI C (and
C++), so there is not as much experience with it.  As Chris (and
perhaps others) have already pointed out, the semantics of volatile
objects are themselves not very fully defined by the Standard, but
are left to the implementation, so we can't answer this last question
definitively.  The value of i = 1 when i is volatile might be
guaranteed to be one, or it might be guaranteed to be the fetched
value of i (which is not necessarily one, even in the absence of
intervening asynchronous stores to i, if i is a register with special
read/write semantics).  Presumably, a conscientious vendor will think
about this case, define a reasonable behavior, and document it well.
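In rough outline (a sketch only; assume i names a memory-mapped
location with its own read/write semantics), the question is what v
ends up holding here:

	extern volatile int i;	/* e.g. a device register */

	void probe(void)
	{
	    int v;

	    /* Is v guaranteed to be 1, or is it whatever a fresh read
	     * of i yields after the store?  The Standard leaves the
	     * details of accessing a volatile object to the
	     * implementation, so either behavior could legitimately
	     * be documented. */
	    v = (i = 1);
	    (void)v;
	}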
					Steve Summit
					scs@adam.mit.edu

jon@maui.cs.ucla.edu (Jonathan Gingerich) (06/21/91)
First, let's get a clear understanding of the issue, without
presuming anyone's position.  What happens with a = v?  Evaluation of
a yields an address and evaluation of v yields a value.  These
evaluations can interleave and interfere, as in

	a[i] = i++;

The value is stored into the address, and the value stored at the
address is the value of the expression.  The fundamental question is
whether this latter sequence is one or two independent actions.  If
it is one, then (i=1) == (i=2) must be false.  If it is two, then
obviously "order of evaluation" allows it to be true.  Let's call
this question "independence of side-effects".

This question is subtle and not definitively answered under
"assignment operators" in either K&R or ANSI.  Tradition does suggest
the latter interpretation, but reasonable people can disagree about
whether compiler writers received implicit permission to do this.
Many people assumed the answer and missed the question, which is why
some of the discussion is so vehement.

Now ANSI has cut the Gordian knot on this issue by declaring that any
expression which writes twice, or independently reads and writes, a
location is "undefined".  This is something new, suggested by a
decade of experience with C.  The K&R I concept is really unspecified
order of evaluation; there were areas which were addressed
ambiguously or not at all, this being one.  To see a difference,
consider the statement

	if ((i=1) == (i=2)) i = 3; else i = 3;

under ANSI and under K&R I.

The ANSI rule is a great help, and advice to stay away from such
situations is solid.  But it is inappropriate to include references
to "sequence points" in the FAQ for comp.lang.c, especially when one
cannot even find them in K&R II; and comp.lang.c is not reserved for
advice on how to code -- it's for explanations of C.  The original
example was not hand-coded, but was the product of a C++ compiler.

I want to thank Steve for his work developing and maintaining the
FAQ.  It is an excellent idea.  But the FAQ answer to "order of
evaluation" would be greatly improved if it clearly delineated the
"independence of side-effects", "order of evaluation", and
"completion of side-effects" questions; admitted to ambiguity in K&R;
and introduced the ANSI "undefined" and "sequence point" rules as a
new, clear, and better approach to the question.

Jon.

Question for ANSI folks.  Is f() + f() undefined if f() modifies a
global?
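To make the question concrete, here is a hedged sketch of the kind of
f meant (the function and the counter are invented for illustration):

	static int count = 0;

	static int f(void)
	{
	    /* f both modifies a file-scope object and returns a value
	     * that depends on it. */
	    return ++count;
	}

	int g(void)
	{
	    /* Both calls write to count -- but each write happens
	     * inside its own function call, not "bare" in the
	     * expression.  Does the multiple-write rule apply? */
	    return f() + f();
	}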
berry@arcturus.uucp (Berry;Craig D.) (06/25/91)
scs@adam.mit.edu (Steve Summit) writes:

>Now, it may be that some of the people who are keeping this
>thread alive aren't really worried about the (undefined)
>expression
>	(i = 1) == (i = 2)
>at all, but are rather simply wondering whether the value of the
>expression
>	i = 1
>is "one" or "the value of i."  (There have even been suggestions
>made that the answer is somehow different for ANSI C than
>"Classic" C, and that the ANSI Standard answer therefore isn't
>relevant for pre-ANSI compilers.)
>
>This starts looking like a hard question to answer, because you
>can't find words in the Standard (or in any number of C reference
>books) which explicitly answer it.  The answer isn't written down
>explicitly because it's so simple: *it doesn't matter*.  It is
>defined that the value of an assignment statement is the value of
>the right-hand side, cast to the type of the left-hand side.  In
>a correct program (one which doesn't have multiple assignments,
>within the same expression, to the same object, in particular to
>the one on the left-hand side) there is absolutely no detectable
>difference between "the value of the right-hand side, cast to the
>type of the left-hand side" and "the value (after the assignment)
>of the left-hand side," because "the value of the right-hand
>side, cast to the type of the left-hand side" is precisely what
>gets assigned to the left-hand side.

I am straining my memory somewhat here, but I recall reading an
article somewhere (Dr. Dobb's?  Computer Language?) on the semantics
of C under ANSI standard floating point operations.  One point raised
was that ANSI C specifically removes the requirement for "knothole"
casts to floats; e.g., if you have an 80-bit value in a floating
point register, and you cast it to double (say, (double) (5.0 * 3.0)),
the extra 16 bits (assuming 64-bit doubles) are *not necessarily*
scraped off by the cast.  This could effect the value of something
like a * (double) (b + c).

Now, assignment to a double *does* scrape off the excess bits, by
definition.  So, the question of whether you are looking at the LHS
of an assignment or the (typecast) RHS as its value is semantically
important in this case.

Any comments?  Have I overlooked something here?
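In code, the two readings would look something like this (a hedged
sketch; the variable and function names are invented, and whether the
extra register precision survives is precisely the point in question):

	double a, b, c;

	double one_expression(void)
	{
	    /* Is (double)(b + c) rounded to 64 bits before the
	     * multiply, or may a wider register value be used? */
	    return a * (double)(b + c);
	}

	double via_assignment(void)
	{
	    double t = b + c;	/* does storing into t discard the
				 * extra bits? */
	    return a * t;
	}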
torek@elf.ee.lbl.gov (Chris Torek) (06/26/91)
In article <1991Jun24.202840.26091@arcturus.uucp> berry@arcturus.uucp
(Berry;Craig D.) writes:
>... I recall reading an article somewhere ... on the semantics of C
>under ANSI standard floating point operations.

ANSI C says little about floating point operations (it leaves a lot
of details up to the implementor, and no doubt leaves others
undefined; it *does* constrain implementors to use binary
representations).

>One point raised was that ANSI C specifically removes the requirement
>for "knothole" casts to floats; e.g., if you have an 80-bit value in a
>floating point register, and you cast it to double ... the extra 16
>bits (assuming 64-bit doubles) are *not necessarily* scraped off by
>the cast.  This could [a]ffect the value of something like
>a * (double) (b + c).

This is correct (except that `specifically removes' is overstating
the case).

>Now, assignment to a double *does* scrape off the excess bits, by definition.

This is not correct.  The C standard does not say if or when `extra'
precision vanishes.  This is sometimes problematical.
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain: torek@ee.lbl.gov
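A hedged illustration of why lingering extra precision is sometimes
problematical (the exact behavior depends on the implementation; the
names are invented):

	#include <stdio.h>

	int main(void)
	{
	    double b = 1.0, c = 3.0;
	    double t = b / c;

	    /* If b / c was computed in a wider register and the extra
	     * bits were not discarded when t was stored -- or when the
	     * right-hand operand below is evaluated -- the comparison
	     * can fail even though both sides look identical in the
	     * source. */
	    if (t == b / c)
	        printf("equal\n");
	    else
	        printf("not equal\n");
	    return 0;
	}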