[net.lang.c] Theory of Pure C, chapter 937

johnl (12/20/82)

Consider the following code fragment:

int x;  /* global definition */
...
foo()
{
	bar(x++);
}
...
bar(oldx)
{
	if(x == oldx) ...
}

Does bar see the old value of x or the new value?  The C Reference Manual
sheds little light.  It says that "After the result is noted, the object
is incremented ...."  One might claim that it's indeterminate, but a case
could be made that the value has to be noted before the call happens, so
the incrementation should happen, too.  Lint doesn't complain about this,
by the way.

In practice, anybody who does this sort of thing gets what he (it?)
deserves, but arguing is always fun.

Reply to me, if you must.

John Levine, decvax!yale-co!jrl, ucbvax!cbosgd!ima!johnl, Levine@YALE (arpa).

PS:  This reminds me of the problems of recursive I/O that snuck into
Fortran when general expressions were allowed as subscripts, since you
could put a function call into a subscript expression in an I/O list and
have I/O in the function.  Urrgh.

--------

tim (12/21/82)

It seems obvious that in the fragment

{ ... bar(x++) ... }
bar(arg) { ... oldx = arg; ... }

arg is bound to the old value of x. Noting the value of x in this
case involves binding it to the formal parameter (pushing it on
the stack and executing the call). How else could this possibly
work? What else could "noting" mean in this context?

                                        	Tim Maroney
                                        	unc!tim

johnl (12/23/82)

ima!johnl    Dec 22 12:37:00 1982

Several people have written to me saying that they miss the point of this
exercise, so here it is again, clearer:

int x;  /* global definition */

foo() {         bar(x++); }

bar(arg) {      if(x == arg) ... ; }

The question is what value bar() sees for the global variable "x".  The
argument to bar() is definitely the old value of x, but the code
generated could be either of these:

	(1)     tmp = x, ++x, bar(tmp);
	(2)     bar(x), ++x;

Note that in (1), bar() sees the new value of x, while in (2) it sees the
old value.  The reason I asked was that I was poking through PCC and
there seems to be hackery to force it to do (1), even though on many
machines the code for (2) would be shorter.  The official rule appears to
be that the ++ will be done at least as soon as the next comma operator
(not comma-argument-separator) or semicolon, so it's ambiguous just like:

	foo[i++] = i;

Are there lots of sleazy programs out there that depend on this kind of
thing?  Perhaps it's too close to Christmas for this much subtlety.

If you still don't get it, reply to me, not the net.

John Levine, decvax!yale-co!jrl, ucbvax!cbosgd!ima!johnl, Levine@YALE (arpa).

leichter (12/23/82)

It usually says somewhere in most language manuals that the arguments get
evaluated before the function is called.  I don't know off-hand whether
this particular wording appears in the C manuals, but it's a fairly obvious
way to describe how f(<complex-expression>) works.  In fact, something like
that is necessary, redundant as it sounds, if you allow side-effects in
function arguments - f(getc()), for example, had better call getc() before
f() gets started up.  (One COULD have a call-by-need semantics in which getc()
doesn't get evaluated until f() actually needs its value.  This sort of thing
works well in side-effect-free languages, but obviously fails here.)

SO, what John is really asking is:  What is part of "evaluating" x++ - just
the extraction of the old value, or both the extraction and the increment?
This is never clearly defined, as far as I can tell, in any C document,
although given the origin of the post-fix ++ - the PDP-11 auto-increment
instruction - it's clear that what was intended was for "evaluate" to
include both steps.  Hence, the function should see the updated value
when it looks in the global.

(If you don't make this the interpretation, things can get awfully muddy.
Consider:

	f(foo)
	int	foo;
\\\ cancel that, no editor;

	int *f(foo)
	int	foo;
\\\ugh, this isn't going to fly "on the fly" as it were.  The idea is:
Take John's function but also have it return the address of x; then make
the calling expression:

	y = *f(x++);

Does y get the new or the old x?  If we decide that x gets incremented
AFTER f gets called (so that f sees the old global x) - how much after?
After or before the assignment to y?
						-- Jerry
					decvax!yale-comix!leichter
						leichter@yale

ajh (12/23/82)

Since I'm neither a C expert nor a language wizard,
I'm not sure how correct this is, but it would seem
logical to have:
	tmp = x; x++; bar(tmp);
because if you had bar(a+b), you would evaluate a+b
first, and then call bar.  By the same reasoning,
you should "evaluate" x++ first, returning a value
of x and incrementing x, and then call bar.

			Alan J. Hu
			...ucbvax!sdcsvax!ajh (or something like that)

peachey (12/23/82)

	Regarding the side-effects of 'x++' used as function argument:

	The best solution is probably to make the order of the x++
	and the call to bar() "undefined".  I guess that this is
	what the existing language documents do.  To say it is
	"undefined" is to imply that only a fool would write a
	program which depended on the order, assuming he cared
	about portability.

	Of course, in practice everybody will go and write such
	programs anyway, because they are "only planning to use
	the programs once".  So best of all is a language/compiler
	which actively PREVENTS you from writing programs which
	depend on such things.

	Philosophically, I guess I believe this, but as a professional
	C programmer, I can't really see how to enforce these
	rules in my favorite language without reducing the joyous
	freedom of C programming.  Therefore, I suggest that the
	best compromise is to choose one or the other evaluation
	order and specify it as the only correct behavior.  This
	will certainly make some compiler writer's job harder,
	but is probably worth it.

	My final point is that maybe it makes sense for the full
	effects of 'x++' to take place before the call to bar(),
	since 'x++' as an expression must be evaluated before
	the call is made.  A very simple compiler could concern
	itself with generating code for 'x++', as a subtask
	of compiling 'bar(x++)'.  None of the effects of the
	inner subexpression would result in code generated
	after the code for the function call.  Under these rules
	'bar()' would see the new value of global variable 'x'.

				Darwyn Peachey
				Hospital Systems Study Group

				harpo!utah-cs!sask!hssg40!peachey

glm (12/28/82)

I'm not a C expert but in reference to the question of the result of

		foo(x++)

giving x or x incremented to the function foo(), consider the following:

		foo(x++ * x)

I would hope that x times x incremented would be sent to the function.

The increment should be after x is considered in the complex expression,
but everything should be done before the function is called.  For this to
work, the simple expression must also work the same way.  Therefore an
incremented x must be sent to the function.

					Gary Mann (we53!glm)

deg (01/06/83)

Problem:
 foo( x++ * x )

Data: (according to "The C Programming Language" K&R)
 p. 185: "expressions involving a commutative and associative
	operator ( *,+,&,|,^) may be rearranged arbitrarily by the
	compiler."
	so the x++ may be evaluated before or after the other x

 p. 187: "[for postfix ++] After the value is noted, the [...]

 III) evaluate x (second x)
     evaluate x (first x)
     pass value of X*X to function
     after function returns store X+1 in x
     (if foo changes x, the change is lost)
     (This would require a bizarre compiler, but, I think, is legal)

I think all other possible linearizations produce one of the
above results.

David Good
(iwsl1!deg)

jhf (01/07/83)

Gary Mann (we53!glm) says he would expect that in the function call

	foo ( x++ * x)

the value "x times x incremented" be passed to foo.  In fact, he has
no good reason for expecting this, aside from demanding that his
compiler execute expressions strictly left-to-right.  As has been
pointed out previously in this discussion, the order of evaluation of
expressions, and the timing of side effects of an expression are \not/
defined.  (See section 7 of the C reference manual.)  Thus, in the above
example, if x is initially 2, the value passed to foo can be either
4 or 6.  All one can depend on is that when the next statement is executed,
x will have been incremented.

Another way of expressing the philosophy of side-effects in C is this --
It's all right for expressions to have side effects.  (In fact, since
C is an expression language, that's the only way we have any effects at all!)
But the effects of executing an expression must be independent of each
other and of the evaluation of the expression.

I would not want to have compiler writers forced to give some consistent
semantics to expressions such as the above; rather, they should be given
free rein in generating code for expressions, to produce compact and
efficient code.

zrm (01/09/83)

The pre/post increment/decrement operators of C were intended to reflect
the ability of the PDP11 to perform these operations in hardware. The
contexts in which this construct was intended to be used are array
subscripting, and, since C has pointer arithmetic, pointer
increment/decrement. Outside of obviously readable and semantically
clear cases and the above contexts, one should steer clear of these
operators. 

In fact, C is equipped with the += -= *= /= operators to make such
things as increment/decrement both clear and easy to type. (Are you a
touch typist or a void typist?)

Cheers,
Zig