[net.lang.c] *p++ = *p and more

ckk@g.cs.cmu.edu (Chris Koenigsberg) (03/26/86)

The question was "what does (*p++ = *p) do?"
The answer was that it could give one of two results, depending on whether
the compiler decides to increment before or after fetching the rvalue, and
both are legal because the order is undefined.
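To make the two results concrete, here is a minimal sketch that writes each possible ordering of *p++ = *p out as separate statements. The function names are hypothetical and a char array is assumed. Under the later ANSI C rules on sequence points the one-line form is undefined behavior outright, so neither result is guaranteed; the sketch only shows what each ordering would do.

```c
/* Hypothetical sketch: the two orderings a compiler might pick for
   "*p++ = *p", each spelled out with separate statements so that its
   behavior is well defined.  Each function returns the incremented p. */

/* Reading 1: fetch the right-hand *p before incrementing p.
   The element is copied onto itself, so the array is unchanged. */
char *store_then_increment(char *p)
{
    char value = *p;   /* rvalue fetched through the old p */
    *p = value;        /* stored through the old p */
    return p + 1;      /* p is incremented afterwards */
}

/* Reading 2: increment p before fetching the right-hand *p.
   The next element is copied down into the current one. */
char *increment_then_store(char *p)
{
    char *old = p;     /* the left-hand side still uses the old p */
    p = p + 1;         /* the increment happens first */
    *old = *p;         /* rvalue fetched through the new p */
    return p;
}
```

Applied to the string "abc" at its first character, reading 1 leaves "abc" alone, while reading 2 turns it into "bbc".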

We ran into a similar problem with a section of code containing
	"i=0; while(i<N) a[i] = b[i++];"
and we found that the Sun 120 4.2 C compiler decided that a[0] gets b[1],
while the IBM RT PC ACIS 4.2 C compiler decided that a[1] gets b[1] and a[0]
gets nothing (remains null, if it was initialized this way). The error
showed up somewhere else, wherever the char array a[] was used, and the
program behaved differently on the two machines.

So, to expand the warning: you're not safe reusing either a pointer or an
array index on both sides of an assignment statement when one side
increments it, no matter which side of the assignment the increment
operator is on.
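If the intent was a plain element-by-element copy, a minimal sketch of an order-independent version keeps the same while loop but moves the increment into its own statement. The helper name copy_chars and the parameter n (standing in for N) are hypothetical.

```c
/* Hypothetical helper: copies n elements from b to a.
   The index is read on both sides and incremented in a separate
   statement, so every compiler gives a[k] == b[k]. */
void copy_chars(char a[], const char b[], int n)
{
    int i = 0;
    while (i < n) {
        a[i] = b[i];   /* both subscripts use the same, unmodified i */
        i++;           /* the increment is its own statement */
    }
}
```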

Chris Koenigsberg
ckk@g.cs.cmu.edu , or ckk%andrew@pt.cs.cmu.edu
{harvard,seismo,topaz,ucbvax}!g.cs.cmu.edu!ckk
Center for Design of Educational Computing
Carnegie-Mellon U.
Pgh, Pa. 15213

gardner@rochester.ARPA (Paul Gardner ) (03/27/86)

In article <357@g.cs.cmu.edu>, ckk@g.cs.cmu.edu (Chris Koenigsberg) writes:
> The question was "what does (*p++ = *p) do?"
> The answer was that it could give one of two results, and both were legal
> since the value is undefined, depending on whether the compiler decides to
> increment before or after the fetch of the rvalue.
> 
> We ran into a similar problem with a section of code containing
> 	"i=0; while(i<N) a[i] = b[i++];"
> and we found that the Sun 120 4.2 C compiler decided that a[0] gets b[1],
> while the IBM RT PC ACIS 4.2 C compiler decided that a[1] gets b[1] and a[0]
> gets nothing (remains null, if it was initialized this way).

Are you sure that's the way it happens? Assuming that the "a[i] = b[i++]" is
just ambiguous and not wrong (this seems to be the consensus anyway) then
either we index into a[] first or we index into b[] and increment i first.
In the first case a[0] gets b[0] and i == 1 after the assignment. In the
second case a[1] gets b[0] and i == 1 after the assignment.
In any case the results you mention make no sense when the postfix ++ is used.
Perhaps you meant:
  	"i=0; while(i<N) a[i] = b[++i];"
It seems to fit the symptoms.
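Gardner's reading can be checked by spelling out one pass of the suggested a[i] = b[++i], starting from i == 0, under each ordering. This is a minimal sketch; the function names are hypothetical, and each returns the new value of i.

```c
/* Hypothetical sketch of one step of "a[i] = b[++i]" under the two
   orderings, each written with explicit statements. */

/* Left subscript evaluated first: with i == 0, a[0] gets b[1]
   (the Sun result). */
int left_first(char a[], const char b[], int i)
{
    char *dest = &a[i];   /* a indexed with the old i */
    ++i;                  /* pre-increment */
    *dest = b[i];         /* b indexed with the new i */
    return i;
}

/* Increment done first: with i == 0, a[1] gets b[1] and a[0] is
   untouched (the RT result). */
int increment_first(char a[], const char b[], int i)
{
    ++i;                  /* pre-increment before either subscript */
    a[i] = b[i];          /* both subscripts use the new i */
    return i;
}
```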

---------------
Paul C. Gardner 
UUCP:  ..!{allegra,seismo,decvax,cmcl2}!rochester!gardner

wsmith@uiucdcsb.CS.UIUC.EDU (04/02/86)

I think compilers can do the post-increment any time they feel like it
within the statement. The semantics of a[i] = b[i++]; aren't
defined, so the compiler can do whatever it wants, even if that means
that both i's are evaluated before the increment is done.

f(i++,i++,i++,i++);    is a similar statement. Most compilers do something
reasonable, although you don't know a priori whether the arguments are
evaluated left to right or right to left. The Tartan C compiler optimizes it so
that there aren't even 4 increments done. Only one increment per variable per
statement is unambiguous, so that is all the compiler seems to do.
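If four successive values really are wanted, a minimal sketch that pins the order down uses one statement per increment. The function f here is a hypothetical stand-in that encodes the order of its arguments in a single number, and call_f_in_order is likewise an illustrative name.

```c
/* Hypothetical four-argument function used only for illustration:
   the result shows which value arrived in which position. */
int f(int w, int x, int y, int z)
{
    return w * 1000 + x * 100 + y * 10 + z;
}

/* Calls f with four successive values of *ip and leaves *ip advanced
   by four.  Each increment is a separate full statement, so the
   left-to-right order is fixed regardless of the compiler. */
int call_f_in_order(int *ip)
{
    int w = (*ip)++;
    int x = (*ip)++;
    int y = (*ip)++;
    int z = (*ip)++;
    return f(w, x, y, z);
}
```

Starting with i == 1, call_f_in_order(&i) calls f(1, 2, 3, 4) and leaves i == 5, so exactly four increments happen.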

The moral of this story: if something is undefined in a language, don't do it.
	You're adding compiler dependencies and generally asking for
	trouble.
(lint would complain about such a construct: "order of evaluation undefined")


Bill Smith
ihnp4!uiucdcs!wsmith

tainter@ihlpg.UUCP (Tainter) (04/04/86)

> I think compilers can do the post-increment anytime they feel like it
> within the statement.     The semantics of a[i] = b[i++];   isn't
> defined so the compiler can do whatever it wants.  Even if that means
> that both i's are evaluated before the increment is done.
> Bill Smith
> ihnp4!uiucdcs!wsmith

According to K&R page 42 section 2.8
     ....But the  expression ++n increments n 'before' using its value, while
     n++ increments n after its value has been used.

So "any time they feel like it" is not valid. Indexing a[i] first or
indexing b[i] first IS valid, but b is indexed with the value of i BEFORE
i is incremented. a can be indexed either before or after i is incremented.
--j.a.tainter

greg@utcsri.UUCP (Gregory Smith) (04/06/86)

In article <1771@ihlpg.UUCP> tainter@ihlpg.UUCP (Tainter) writes:
>> I think compilers can do the post-increment anytime they feel like it
>> within the statement.     The semantics of a[i] = b[i++];   isn't
>> defined so the compiler can do whatever it wants.  Even if that means
>> that both i's are evaluated before the increment is done.
>> Bill Smith
>> ihnp4!uiucdcs!wsmith
>
>According to K&R page 42 section 2.8
>     ....But the  expression ++n increments n 'before' using its value, while
>     n++ increments n after its value has been used.
>
>So any time they feel like it is not valid.  Indexing a[i] first or
>indexing b[i] first IS valid, but b is indexed with the value of i BEFORE
>i is incremented.  a can be indexed either before or after i is incremented.
>--j.a.tainter
b is indexed with ( the value of i BEFORE i is incremented ):	True
(b is indexed with the value of i ) BEFORE i is incremented:	False

The compiler *can* increment i before doing anything else, as
long as the *original* value of i is used.  This is why there are
quotes around 'before' in your quote from K&R.  I have an ( admittedly
brain-damaged ) C compiler for 8080 that *always* does i++ as (++i-1).
Here is a horribly contrived example where it makes a difference:

register struct { int wombat; } *p,*q;

		*q = p[(p++)->wombat ];

If the increment is done after the indexing, as you say, this will be
the same address as p[p->wombat]; if the increment is done as (++p-1),
and p[..] uses the incremented value of p, then (p+1)[p->wombat] will
result. Actually, p++ need not be done as (++p-1) for this to happen.
In fact, it will only work as you say if the first 'p' mentioned is
used unincremented, which is unlikely if p is a register. I realize that
I have an operation (->, between p++ and []) that b[i++] doesn't have; the
point still stands.

The point is that there is no guarantee as to when the increment is
done in an expression like the one above. All you know is that p++ will
return the original value of p ( in my example ).
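The two orderings in the wombat example can be written out explicitly. This is a minimal sketch; the struct tag critter and both function names are hypothetical, and each function returns the element that would be copied into *q.

```c
/* Hypothetical sketch of "*q = p[(p++)->wombat]" under two orderings.
   Each function advances *pp by one, as p++ does. */
struct critter { int wombat; };

/* Increment after indexing: the result is p[p->wombat]. */
struct critter index_then_increment(struct critter **pp)
{
    struct critter *base = *pp;      /* unincremented p used as the base */
    int idx = (*pp)->wombat;         /* p++ yields the original p */
    *pp = *pp + 1;                   /* the increment happens last */
    return base[idx];
}

/* Increment done as (++p - 1) before the base is read: the result is
   (p+1)[p->wombat], with the original p used for ->wombat. */
struct critter increment_then_index(struct critter **pp)
{
    *pp = *pp + 1;                   /* ++p first */
    int idx = (*pp - 1)->wombat;     /* the value of p++ is ++p - 1 */
    return (*pp)[idx];               /* the base is the incremented p */
}
```

With an array whose first element has wombat == 1, the first ordering fetches element 1 and the second fetches element 2, even though both use the original p for ->wombat.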
-- 
"If you aren't making any mistakes, you aren't doing anything".
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg

john@wvlpdp (04/17/86)

	K&R says, on the last page (p. 50) of Chapter 2:

	Function calls, nested assignment statements and increments and
	decrement operators cause "side effects" - some variable is changed
	as a byproduct of the evaluation.   In any expression involving side
	effects, there can be subtle dependencies on the order in which
	variables taking part in the expression are stored.  One unhappy
	situation is typified by the statement:

		a[i] = i++;

	The question is whether the subscript is the old value of i or the new.
	The compiler can do this in different ways, and generate different
	answers depending on its interpretation.  When side effect (assignment
	to actual variables) takes place is left to the discretion of the
	compiler, since the best order strongly depends on machine architecture.

	The moral of this discussion is that writing code which depends on
	order of evaluation is a bad programming practice in any language.
	Naturally, it is necessary to know what things to avoid, but if you
	don't know how they are done on various machines, that innocence may
	help to protect you.  (The C verifier lint will detect most dependencies
	on order of evaluation.)
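The two answers K&R mention for a[i] = i++ can be spelled out so the difference is visible. This is a minimal sketch with hypothetical function names: both store the old value of i, and they differ only in which element receives it. Whichever behavior is wanted, writing it this way removes the order-of-evaluation dependency.

```c
/* Hypothetical sketch: the two readings of "a[i] = i++", each written
   with explicit sequencing.  *ip plays the role of i. */

/* Subscript uses the old i: a[old i] = old i. */
void old_subscript(int a[], int *ip)
{
    a[*ip] = *ip;      /* store first, using the old i twice */
    (*ip)++;           /* then increment */
}

/* Subscript uses the new i: a[old i + 1] = old i. */
void new_subscript(int a[], int *ip)
{
    int old = (*ip)++; /* i++ yields the old value */
    a[*ip] = old;      /* subscript is the incremented i */
}
```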