[comp.lang.c] Circumspect programming

scs@adam.mit.edu (Steve Summit) (06/20/91)

This protracted debate has illustrated two subtly but
significantly different ways of thinking about expressions such
as

	(i = 1) == (i = 2)

One school of thought says "the expression contains two
assignments to the same object, therefore it's undefined.
Period; end of report."

The second school says "Yes, we understand that you can't tell
whether (i = 1) or (i = 2) happens first, but it's still the case
that it boils down to (1 == 2) which is always false, right?"

The first school says "No, it's not an order of evaluation
problem; the fact that there are two assignments renders the
whole expression undefined, and anything can happen."

The second school says "Yes, we understand that you can't tell
what value i will end up with, but the value of each assignment
is unambiguously its right-hand side, right?"

And so it goes.  The first school keeps saying "it's undefined!",
assuming that that fully answers the question, and it can't
understand why the second school keeps asking more questions.
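The first school's point can be put in code. This is a minimal sketch that assumes any hosted C compiler; the second variable `j` and the function name are illustrative and do not come from the thread. When two different objects are each assigned once, nothing is undefined and the comparison is simply false. What makes the thread's expression undefined is the second assignment to the *same* object.

```c
/* Two different objects, each assigned exactly once: the expression
   is well defined, and it compares 1 with 2. */
int distinct_objects(void)
{
    int i, j;
    return (i = 1) == (j = 2);   /* always 0 (false) */
}

/* The thread's version assigns to the same i twice with no
   intervening sequence point.  Under ANSI C that is undefined, so
   it is deliberately not compiled here:
       (i = 1) == (i = 2)                                          */
```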

(Before I go any further, let me point out that I am not trying
to cast any stones here.  The first school, though correct, has
been somewhat knee-jerk in its responses, myself included.  The
second school is displaying what ought to be a healthy curiosity
about "what's really going on.")

I have been leaning toward the first school ever since I was
first learning C, when I read, in K&R, this line I keep quoting:

	The moral of this discussion is that writing code which
	depends on order of evaluation is bad programming
	practice in any language.  Naturally, it is important to
	know what things to avoid, but if you don't know how they
	are done on various machines, that innocence may help to
	protect you.

Now, I'll admit that I read into this statement a bit more than
it explicitly says.  Whenever I see *any* "fishy" expression,
whether it's

	a[i] = i++

or

	printf("%c %c\n", getchar(), getchar())

or

	printf("%d\n", i++ * i++)

or

	(i = 1) == (i = 2)

, or anything else with potential multiple side effect or
evaluation order ambiguities, a little alarm goes off that says
"stay away!"  That's all it takes.  I don't start thinking about
what the compiler might reasonably (or unreasonably) do, or
looking at the assembly output, or reading through documentation
trying to discover if some subpart of the expression might have a
defined value.  (I don't try to discover "how they are done on
various machines.")

I call this good, safe programming.  I used the word "circumspect"
in the Subject: line, but it could also be labeled "conservative."
Someone will likely label it (pejoratively) as "paranoid," as if
one shouldn't have to worry about such things, or as if one ought
to be able to take advantage of unspecified or undefined nuances
if the code in question doesn't have to be portable, or as if
casting anything that even hints at undefinedness out of one's
programming vocabulary would be unacceptably restrictive to one's
creativity.  I have found none of these restrictions stifling; in
fact they are quite liberating, in that I almost never have to
track down stupid, subtle bugs, or move mountains to port code.
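In practice, heeding the alarm rarely costs more than splitting an expression into separate statements, so that each side effect is complete at the semicolon before the next one begins. Below is a minimal sketch of rewrites for the three "fishy" expressions above. It assumes the intended order was left to right, and the function names are hypothetical. Each rewrite has to pick one meaning, and that is exactly the decision the original expressions never made.

```c
#include <stdio.h>

/* a[i] = i++  ->  decide which element you meant, then increment. */
void store_then_increment(int a[], int *i)
{
    a[*i] = *i;   /* store into the element indexed by the OLD i */
    (*i)++;       /* separate statement: no ambiguity            */
}

/* printf("%c %c\n", getchar(), getchar())
   ->  read the two characters in a fixed order first.
   (EOF handling is omitted in this sketch.) */
void print_two_chars(void)
{
    int c1 = getchar();   /* first character  */
    int c2 = getchar();   /* second character */
    printf("%c %c\n", c1, c2);
}

/* printf("%d\n", i++ * i++)  ->  say which product you meant. */
int product_of_next_two(int *i)
{
    int first = (*i)++;
    int second = (*i)++;
    return first * second;   /* i advances by exactly 2 */
}
```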

In an earlier article on this topic, I mentioned that "The
comp.lang.c frequently-asked questions list has a bit to say
about undefined order of evaluation."  A number of people have
taken me to task for this, saying that the FAQ list answer
doesn't cover

	(i = 1) == (i = 2)

at all.  Now, I didn't claim that it answered the current
question (in fact, it mentions "order of evaluation" which we've
agreed this problem isn't), but I will admit that, to me, the FAQ
list answer does cover both cases, in that the same alarm bell --
evoked by the same "innocence may help to protect you" quote --
goes off either way.

I hope this article doesn't sound too pompous, or holier-than-thou,
or us vs. them.  There are obviously quite a few people in
what I have called the "second school," and it would be quite
insensitive of me to just say that they should all think the way
I do.  (However, I do have to admit that wondering if there can
be meaning in

	(i = 1) == (i = 2)

, even though it's explicitly undefined, seems rather like
wondering if one can be a little bit pregnant.)

Now, it may be that some of the people who are keeping this
thread alive aren't really worried about the (undefined)
expression

	(i = 1) == (i = 2)

at all, but are rather simply wondering whether the value of the
expression

	i = 1

is "one" or "the value of i."  (There have even been suggestions
made that the answer is somehow different for ANSI C than
"Classic" C, and that the ANSI Standard answer therefore isn't
relevant for pre-ANSI compilers.)

This starts looking like a hard question to answer, because you
can't find words in the Standard (or in any number of C reference
books) which explicitly answer it.  The answer isn't written down
explicitly because it's so simple: *it doesn't matter*.  It is
defined that the value of an assignment statement is the value of
the right-hand side, cast to the type of the left-hand side.  In
a correct program (one which doesn't have multiple assignments,
within the same expression, to the same object, in particular to
the one on the left-hand side) there is absolutely no detectable
difference between "the value of the right-hand side, cast to the
type of the left-hand side" and "the value (after the assignment)
of the left-hand side," because "the value of the right-hand
side, cast to the type of the left-hand side" is precisely what
gets assigned to the left-hand side.

A compiler writer therefore has complete freedom to arrange to
emit code which either re-fetches the left-hand side, or uses the
coerced value of the right-hand side.  As long as there can't be
other intervening assignments to the left-hand side, it can't
matter which choice is made.  This is an excellent example of how
an explicitly undefined area of the language (here, that it's
undefined what happens if you modify the same object twice within
one expression) gives the compiler writer a useful freedom.
Compiler writers are then likely to use that freedom, and
different compilers will implement the undefined areas in
different ways.  Programmers are therefore strongly advised to
leave the undefined areas well alone, lest they break their side
of the contract (i.e. the Standard) and yank the rug out from
under the compiler writer (and, more significantly, themselves)
by instigating a case the compiler writer was allowed to assume
"couldn't happen."

This explains why the "first school" keeps harping on the "no
multiple side effects to the same object" rule, which is really
the relevant issue.  If there aren't multiple side effects to the
same object, assignment semantics aren't confusing (or worth
talking about); and if there are, the expression is undefined, so
it's really not worth talking about.
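The claim that the two readings of an assignment's value cannot be told apart in a correct program can be checked directly. This is a minimal sketch, and the function name is hypothetical. Assigning 300 to an `unsigned char` converts the value modulo UCHAR_MAX + 1, which gives 44 for the common 8-bit char. With only one assignment to `c`, the value of the assignment expression and the value re-fetched from `c` agree, whichever strategy a compiler uses.

```c
#include <limits.h>

/* Performs one assignment to an unsigned char and reports both
   candidate "values of the assignment":
     - the return value is the value of the assignment expression
       (the right-hand side converted to unsigned char);
     - *refetched is c read back after the assignment.
   With only one assignment to c, the two cannot differ. */
int assignment_value(int rhs, int *refetched)
{
    unsigned char c;
    int v = (c = rhs);   /* conversion is modulo UCHAR_MAX + 1      */
    *refetched = c;      /* the "re-fetch the left-hand side" reading */
    return v;
}
```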

(Note, too, that the situation is not any more undefined under
the ANSI rules than it was before: compilers have always been
free to -- and I am aware of pre-ANSI compilers which do --
implement

	(i = 1) == (i = 2)

in the "surprising" or "wrong" way.)

The final case, which has been raised by a few alert
correspondents, concerns the value of

	i = 1

when i is volatile.  The volatile qualifier is new with ANSI C
(and C++), so there is not as much experience with it.  As Chris
(and perhaps others) has already pointed out, the semantics of
volatile objects are themselves not very fully defined by the
Standard, but are left to the implementation, so we can't answer
this last question definitively.  The value of

	i = 1

when i is volatile might be guaranteed to be one, or it might be
guaranteed to be the fetched value of i (which is not
necessarily one, even in the absence of intervening asynchronous
stores to i, if i is a register with special read/write
semantics).  Presumably, a conscientious vendor will think about
this case, define a reasonable behavior, and document it well.

                                            Steve Summit
                                            scs@adam.mit.edu

jon@maui.cs.ucla.edu (Jonathan Gingerich) (06/21/91)

First, let's get a clear understanding of the issue, without presuming
anyone's position.

What happens with a = v?

Evaluation of a yields an address and evaluation of v yields a value.
These evaluations can interleave and interfere as in a[i] = i++;

The value is stored into the address and the value stored at the address is
the value of the expression.

The fundamental question is whether this latter sequence is one or two
independent actions.  If it is one, then

(i=1) == (i=2)

must be false.  If it is two then obviously "order of evaluation" allows
it to be true.  Let's call this question "independence of side-effects".
This question is subtle and not definitively answered under "assignment
operators" in either K&R or ANSI.  Tradition does suggest the latter
interpretation, but reasonable people can disagree about whether compiler
writers received implicit permission to do this.  Many people assumed the
answer and missed the question, which is why some of the discussion is so vehement.

Now ANSI has cut the Gordian knot on this issue by declaring that any
expression which writes twice or independently reads and writes a location
is "undefined".  This is something new, suggested by a decade of experience
with C.  The K&RI concept is really unspecified order of evaluation; there
were areas which were addressed ambiguously or not at all, this being one.  To see a
difference, consider the statement:

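if ((i=1) == (i=2)) i=3; else i=3;

under ANSI and K&RI.  With merely unspecified order of evaluation, i ends
up 3 whichever branch is taken; under the ANSI rule the whole statement is
undefined.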

The ANSI rule is a great help, and advice to stay away from such
situations is solid.  But it is inappropriate to include references
to "sequence points" in the FAQ for comp.lang.c, especially when one
cannot even find them in K&RII.  And comp.lang.c is not reserved for
advice on how to code; it's for explanations of C.  The original example
was not written by hand but generated by a C++ compiler.

I want to thank Steve for his work developing and maintaining the FAQ.  It
is an excellent idea.  But the FAQ answer to "order of evaluation" would
be greatly improved if it clearly delineated the "independence of side-effects",
"order of evaluation", and "completion of side-effects" questions; admitted
to ambiguity in K&R; and introduced the ANSI "undefined" and "sequence point"
rules as a new, clear, and better approach to the question.

Jon.

Question for ANSI folks.  Is f() + f() undefined if f() modifies a global?
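Whatever the answer to Jon's question, the order in which the two calls happen is not specified, so any result that depends on that order is suspect. This is a minimal sketch of the circumspect rewrite. `counter`, `next_count`, and `first_minus_second` are hypothetical names. It uses subtraction instead of addition because the result of a subtraction shows which call ran first.

```c
static int counter = 0;   /* hypothetical global modified by next_count() */

/* Returns the current count and advances it. */
int next_count(void)
{
    return counter++;
}

/* next_count() - next_count() could yield -1 or 1, depending on which
   call the compiler evaluates first.  Separate statements fix the order. */
int first_minus_second(void)
{
    int first = next_count();
    int second = next_count();
    return first - second;   /* always -1 */
}
```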

berry@arcturus.uucp (Berry;Craig D.) (06/25/91)

scs@adam.mit.edu (Steve Summit) writes:

>Now, it may be that some of the people who are keeping this
>thread alive aren't really worried about the (undefined)
>expression

>	(i = 1) == (i = 2)

>at all, but are rather simply wondering whether the value of the
>expression

>	i = 1

>is "one" or "the value of i."  (There have even been suggestions
>made that the answer is somehow different for ANSI C than
>"Classic" C, and that the ANSI Standard answer therefore isn't
>relevant for pre-ANSI compilers.)

>This starts looking like a hard question to answer, because you
>can't find words in the Standard (or in any number of C reference
>books) which explicitly answer it.  The answer isn't written down
>explicitly because it's so simple: *it doesn't matter*.  It is
>defined that the value of an assignment statement is the value of
>the right-hand side, cast to the type of the left-hand side.  In
>a correct program (one which doesn't have multiple assignments,
>within the same expression, to the same object, in particular to
>the one on the left-hand side) there is absolutely no detectable
>difference between "the value of the right-hand side, cast to the
>type of the left-hand side" and "the value (after the assignment)
>of the left-hand side," because "the value of the right-hand
>side, cast to the type of the left-hand side" is precisely what
>gets assigned to the left-hand side.

I am straining my memory somewhat here, but I recall reading an 
article somewhere (Dr. Dobb's?  Computer Language?) on the semantics
of C under ANSI standard floating point operations.  One point raised
was that ANSI C specifically removes the requirement for "knothole"
casts to floats; e.g., if you have an 80-bit value in a floating point
register, and you cast it to double (say, (double) (5.0 * 3.0)), the
extra 16 bits (assuming 64-bit doubles) are *not necessarily* scraped
off by the cast.  This could affect the value of something like
a * (double) (b + c).  Now, assignment to a double *does* scrape off
the excess bits, by definition.  So, the question of whether you are
looking at the LHS of an assignment or the (typecast) RHS as its value
is semantically important in this case.

Any comments?  Have I overlooked something here?

torek@elf.ee.lbl.gov (Chris Torek) (06/26/91)

In article <1991Jun24.202840.26091@arcturus.uucp> berry@arcturus.uucp
(Berry;Craig D.) writes:
>... I recall reading an article somewhere ... on the semantics of C
>under ANSI standard floating point operations.

ANSI C says little about floating point operations (it leaves a lot of
details up to the implementor, and no doubt leaves others undefined; it
*does* constrain implementors to use binary representations).

>One point raised was that ANSI C specifically removes the requirement
>for "knothole" casts to floats; e.g., if you have an 80-bit value in a
>floating point register, and you cast it to double ... the extra 16
>bits (assuming 64-bit doubles) are *not necessarily* scraped off by
>the cast.  This could [a]ffect the value of something like
>a * (double) (b + c).

This is correct (except that `specifically removes' is overstating the
case).

>Now, assignment to a double *does* scrape off the excess bits, by definition.

This is not correct.  The C standard does not say if or when `extra'
precision vanishes.  This is sometimes problematical.
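Neither the cast nor the assignment is promised to discard excess precision. A careful program can therefore do little more than find out how its implementation evaluates floating expressions and avoid depending on the answer. This is a hedged sketch. `FLT_EVAL_METHOD` arrived in C99's `<float.h>`, after the Standard discussed in this thread, so the sketch tests whether it is defined. `eval_method`, `scale_sum`, and the -2 sentinel are illustrative choices and do not come from the posts.

```c
#include <float.h>

/* Report the implementation's floating evaluation method, if known.
   C99 <float.h>: 0 = evaluate in the operands' own type,
   1 = float and double evaluated as double, 2 = all as long double,
   -1 = indeterminable; other negative values are implementation-defined.
   Returns -2 (a hypothetical sentinel) if the macro is absent. */
int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -2;
#endif
}

/* Computes a * (b + c) with the sum held in a named double.  Whether
   that store drops excess precision is exactly what is in question
   here; treat it as a convention, not a guarantee. */
double scale_sum(double a, double b, double c)
{
    double sum = b + c;
    return a * sum;
}
```

On an implementation that reports 0, `scale_sum` computes exactly what it appears to. On one that reports 2, intermediate results may carry extra precision, and whether storing into `sum` removes it is the point at issue above.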
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov