[net.lang.c] & operator

david@iwu1a.UUCP (12/02/83)

Can anyone define what the following expression means?

char *in;
char *out;

out = &(*in++);


Should "out" be set to "in" or "in + 1?"


Different compilers seem to have different opinions.  I believe it should
be set to "in."  Of course I discourage such expressions in real programs.


David Scheibelhut

smh@mit-eddie.UUCP (12/04/83)

		char *in;
		char *out;
		out = &(*in++);
	Should "out" be set to "in" or "in + 1?"

Upon consideration of the types during the evaluation of the
right-hand-side, it seems clear that:
	- the & operator is applied to the value of the expression
	  inside the parentheses;
	- that value is type char, and that char is the one pointed
	  to by the original value of in;
	- the address of that character has type (char *) and indeed
	  is the original value of in.
Therefore, the code example ought to be equivalent to:
	out = in;
	*in++;
The dereferencing of the pointer (the '*') seems nugatory, but may have
side effects such as causing a segmentation violation or accessing
a device register.

	Different compilers seem to have different opinions.  I believe
	it should be set to "in."

If this analysis is correct, which are the compilers with a different
opinion?  It would be nice to know!

Steve Haflich, MIT Experimental Music Studio

chris@umcp-cs.UUCP (12/06/83)

Regarding "char *in, *out; ... out = &*in++;":

"&" has meaning only when used before an expression which is an
lvalue.  In this case the expression *is* an lvalue (the character
that "in" used to point to before the increment), so the correct
behaviour would be to assign the old value of "in" to "out".

I would say that unless you feel like proving that the implementation
is (or is not) correct, you should avoid such constructs.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris.umcp-cs@CSNet-Relay

olson@fortune.UUCP (12/15/83)

fortune!olson    Dec 14 15:36:00 1983

I would have to disagree with chris as to what should be assigned by:
	char *in, *out, buf[];
	in = buf;
	out = &(*in++);
since he forgot the parentheses.  Given the parentheses, out SHOULD
be set buf+1; that is, 'in' is incremented BEFORE the & operator is
applied.

(The original article did not include the second line of my example,
and asked whether out should be set to 'in', or 'in+1'.  The answer
is of course 'in'.  The question is not meaningful as stated.  The
question should be: does it point to the ORIGINAL value of 'in', or
the incremented value of 'in'.)
	Dave Olson

leiby@yeti.UUCP (Mike Leibensperger) (12/15/83)

Dave Olson says:

	I would have to disagree with chris as to what should be assigned by:
	    char *in, *out, buf[];
	    in = buf;
	    out = &(*in++);
	since he forgot the parentheses.  Given the parentheses, out SHOULD
	be set buf+1; that is, 'in' is incremented BEFORE the & operator is
	applied.

I beg to differ.  In evaluating the subexpression "in++", the value returned
is the original value of "in".  Thus, "out" will get the original value of
"in".

(Now to go try it for real!  Flame first, ask questions later... :-) ).

-- 
Mike Leibensperger,  Massachusetts Computer Corporation
...!{ucbcad,tektronix,harpo,decvax}!masscomp!leiby

olson@fortune.UUCP (12/16/83)

fortune!olson    Dec 15 12:19:00 1983

(This was in reply to Paul Fox (eagle!hou5h!pgf), who sent me mail)

I agree with the part about in being incremented after used. What I
disagree_d_ with was the idea that &(*in++) is the same as &*in++.

Page 186 of K+R says (2nd para.) that the type and VALUE of a
parenthesized expression is identical to the unadorned expression.  In
this case the expression evaluates to the original CHARACTER pointed to
by in.

The more research and thought I give to this, the more I suspect that
my original position was wrong; the KEY POINT being the last sentence
of the above paragraph.  This is yet another case of not pondering the
question long enough before replying.

As a matter of interest, the compilers on my 4.1bsd vax, and on my
Fortune 32:16 both assign 'out' the UNincremented value of 'in',
indicating that those compiler writers agree with your interpretation.

	Dave Olson, Fortune Systems
	{ihnp4,harpo,ucbvax!amd70}!fortune!olson

keesan@bbncca.ARPA (Morris Keesan) (12/16/83)

-------------------------------
Dave Olson (fortune!olson) says:

>I would have to disagree with chris as to what should be assigned by:
>	 char *in, *out, buf[];
>	 in = buf;
>	 out = &(*in++);
>since he forgot the parentheses.  Given the parentheses, out SHOULD be
>set buf+1; that is, 'in' is incremented BEFORE the & operator is applied.

I would have to disagree.  Extending the parenthesization one step further, we
get the following three equivalent statements:
	out = &*in++;
	out = &(*in++);
	out = &(*(in++));   /* ++ and unary * and & group right to left */

    (in++) is the value of 'in' BEFORE it is incremented.  Once this value is
evaluated, 'in' can be incremented, but its new value has no effect on the
further evaluation of the expression.  Then (*(in++)) is the object pointed to
by 'in' before incrementation, and &(*(in++)), or (&*in++), is its address,
which is of course the original value of 'in', which is 'buf', not 'buf+1'.
-- 
					Morris M. Keesan
					{decvax,linus,wjh12}!bbncca!keesan
					keesan @ BBN-UNIX.ARPA

ntt@dciem.UUCP (Mark Brader) (12/17/83)

	I would have to disagree with chris as to what should be assigned by:
		char *in, *out, buf[];
		in = buf;
		out = &(*in++);
	since he forgot the parentheses.  Given the parentheses, out SHOULD
	be set buf+1; that is, 'in' is incremented BEFORE the & operator is
	applied.
	
No, Chris was right.  The parentheses here have no effect at all.
Parentheses only affect binding, and here the default binding is the same
as what they ask for.  It is analogous to this:
	x = (y++);
Here, x will get the OLD value of y.  And out will get buf, the old value of in.
The value of a variable that has ++ after it does NOT change until after it has
been used.  In this case that means that the old value of in is "passed to" *
and then to () and then to &.

Mark Brader, NTT Systems Inc.

west@sdcsla.UUCP (12/17/83)

<<<no bugs here>>>

Dave Olson (fortune!olson) said:

	    char *in, *out, buf[];
	    in = buf;
	    out = &(*in++);
    Given the parentheses, out SHOULD be set buf+1; that is, 'in' is
    incremented BEFORE the & operator is applied.

    (The original article did not include the second line of my example,
    and asked whether out should be set to 'in', or 'in+1'.  The answer
    is of course 'in'.  The question is not meaningful as stated.  The
    question should be: does it point to the ORIGINAL value of 'in', or
    the incremented value of 'in'.)
-------------< end quote > --------------

Referring to pages 48 & 91 of Kernighan & Ritchie (or 185-187), we
note that "++" and "*" are of equal precedence, but are evaluated
right-to-left.   So `in' should be incremented, and then what it
now points to should be ``looked at'', and then that address should
be taken and put into `out' ==> thus `out' gets loaded with the
incremented value of `in'.   The parentheses are superfluous.

So I tried it out and discovered that both 4.1c and 4.2 compilers
(Vax and Sun [68000]) disagreed with me.   Given the program:

    #include <stdio.h>
    char *in, *out, buf [ ] = "This string";
    main ()
    {
	    in = buf;          printf ( "in = %X\n", in );
	    out = &(*in++);    printf ( "in = %X, out = %X\n", in, out );
	    out = &*in++;      printf ( "in = %X, out = %X\n", in, out );
	    out = &*(in++);    printf ( "in = %X, out = %X\n", in, out );
    }

the results on both Vax and Sun went like this:
    in = 10004
    in = 10005, out = 10004
    in = 10006, out = 10005
    in = 10007, out = 10006

and so my original analysis is WRONG, too.   The reason can be found
on page 187 of K&R:
	``When postfix ++ is applied to an lvalue the result is the
	  value of the object referred to by the lvalue.  After the
	  result is noted, the object is incremented...''

Apparently, "After the result is noted" means after the statement is
evaluated, so parentheses don't matter.   `in' keeps its old value
until after the statement is finished.

			-- Larry West   UC San Diego
possible net addresses:
			-- ARPA:	west@NPRDC
			-- UUCP:	ucbvax!sdcsvax!sdcsla!west
			--	or	ucbvax:sdcsvax:sdcsla:west

smh@mit-eddie.UUCP (Steven M. Haflich) (12/18/83)

Regarding this fragment:

	    char *in, *out, buf[];
	    in = buf;
	    out = &(*in++);


Larry West cites K&R and notes:

		``When postfix ++ is applied to an lvalue the result is the
		  value of the object referred to by the lvalue.  After the
		  result is noted, the object is incremented...''

	Apparently, "After the result is noted" means after the statement is
	evaluated, so parentheses don't matter.   `in' keeps its old value
	until after the statement is finished.

Not quite... The term `statement' has a rather precise meaning to a
compiler.  The variable `in' is indeed modified at the same time that
the postfix ++ is evaluated, not after the entire statement.
Consider:
	char *in = "123";
	if (*in++ == *in++) ... ;	/* ought never be true */

Although except for the `,' `&&' and `||' operators C explicitly does
not specify order of evaluation within expressions (and therefore
expressions which use multiple `++' and `--' operators on the same
variable are usually crocky) I see no reason why the above conditional
could not be used to check whether the first two characters of a string
(known to be at least length two) are the same.

By the way, the comma separating function arguments is *not* the same
as the comma operator, and does *not* specify order of evaluation.
Thus, the following is implementation dependent:
	foo(*in++,*in++);

Steve Haflich
MIT Experimental Music Studio

andree@uokvax.UUCP (12/19/83)

uokvax!andree    Dec 16 09:38:00 1983

/***** uokvax:net.lang.c / fortune!olson /  9:57 pm  Dec 14, 1983 */
I would have to disagree with chris as to what should be assigned by:
	char *in, *out, buf[];
	in = buf;
	out = &(*in++);
since he forgot the parentheses.  Given the parentheses, out SHOULD
be set buf+1; that is, 'in' is incremented BEFORE the & operator is
applied.

(The original article did not include the second line of my example,
and asked whether out should be set to 'in', or 'in+1'.  The answer
is of course 'in'.  The question is not meaningful as stated.  The
question should be: does it point to the ORIGINAL value of 'in', or
the incremented value of 'in'.)
	Dave Olson
/* ---------- */

Yes, but the & operator is not being applied to `in', but to `*in', with
the increment happening AFTER the dereference. Therefore, you should get
buf, not buf+1.

	<mike

emjej@uokvax.UUCP (12/19/83)

uokvax!emjej    Dec 16 13:06:00 1983

Hold on--
		*in++
has mode char, and hence can't validly have & applied to it.
(Sorry about the Algol68 vocabulary, but it's the only way to
untangle the mess C makes of types.) If a particular C compiler
figures it can collapse &*, well, that's its business, but
it's certainly not to be counted on.

				James Jones

andree@uokvax.UUCP (12/19/83)

uokvax!andree    Dec 16 21:04:00 1983

James -

	In C, you can take the address of anything but a real, live
expression - a + b, c + d, etc. Since in++ is identical to in (with a
side effect), *in++ is identical to *in (with the same side effect).
Since &(*in) is valid (you can take the address of a char), &(*in++) is
valid, and SHOULD be the same as &(*in).

	<mike
	

ka@hou3c.UUCP (Kenneth Almquist) (12/19/83)

From Steven M. Haflich:
	The variable `in' is indeed modified at the same time that
	the postfix ++ is evaluated, not after the entire statement.
	Consider:
		char *in = "123";
		if (*in++ == *in++) ... ;	/* ought never be true */

	Although except for the `,' `&&' and `||' operators C explicitly does
	not specify order of evaluation within expressions (and therefore
	expressions which use multiple `++' and `--' operators on the same
	variable are usually crocky) I see no reason why the above conditional
	could not be used to check whether the first two characters of a string
	(known to be at least length two) are the same.

Although there may be some support for this view in the C manual, this is not
the interpretation of the people in Murray Hill.  Be warned that C compilers
may not always perform the increment immediately after fetching the value of
the variable.  Thus the above statement could be compiled as
	temp = in;	/* save value of "in" before incrementing it */
	in += 2;	/* perform both post-increments with one addition */
	if (*temp == *temp) ... ;

Kenneth Almquist

dave@taurus.UUCP (Dave Lukes) (12/20/83)

Uh-uh:
	if(*in++ == *in++)

is NOT a sensible way to see if the first two characters of a string are
equal: ask lint, it will say something like: `in: evaluation order undefined'.

Also, for those of little faith, there are actually C compilers that
will take the non-obvious meaning of this (e.g. the PDP-11 cc), and do
the tests then do both increments in one go after the statement,
so BE WARNED,

		Yours in an arbitrary evaluation order
			Dave Lukes
			<United Kingdom>!ukc!hirst1!minotaur!dave