[comp.lang.c] cast changes meaning of auto-increment?

mikpe@mina.liu.se (Mikael Pettersson) (09/12/88)

Consider the following piece of code:
--begin-example--
{
	int i = 27;
	register char *cp, *oldcp;

	oldcp = cp = (char *)&i;
	printf("i == %d, ", *(int *)cp++);		/* should print `27' */
	printf("and cp increased by %d\n", cp-oldcp);	/* 4 or 1 ? */
}
--end-of-example--
My question is: what should be printed by the second printf, 4 or 1?
Many PCC-based m68k compilers (including SUN's) print `4' but I know of at
least two (GCC on our SUNs and GOULD's UTX/32 compiler) that insist on
printing `1'.

Since a cast doesn't produce an lvalue, I assume the ++ should be applied
to `cp' after the int was fetched (i.e., cp should increase by 1), but the
other (buggy?) behaviour could come in handy when writing fast C
versions of the bcopy/memcpy family.


/Mike
-- 
Mikael Pettersson           ! Internet:mpe@ida.liu.se
Dept of Comp & Info Science ! UUCP:    mpe@liuida.uucp  -or-
University of Linkoping     !          {mcvax,munnari,uunet}!enea!liuida!mpe
Sweden                      ! ARPA:    mpe%ida.liu.se@uunet.uu.net

gwyn@smoke.ARPA (Doug Gwyn) (09/13/88)

In article <901@mina.liu.se> mikpe@mina.liu.se (Mikael Pettersson) writes:
>	register char *cp, *oldcp;
>	oldcp = cp = (char *)&i;
>	printf("i == %d, ", *(int *)cp++);		/* should print `27' */
>	printf("and cp increased by %d\n", cp-oldcp);	/* 4 or 1 ? */

>My question is: what should be printed by the second printf, 4 or 1?

1, of course.  It's the difference between the old (char *) and an
incremented-by-one (char *).

>Since a cast doesn't produce an lvalue, ...

This has nothing to do with lvalues.  The indirection operator, the
cast operator, and ++ are right-associative at the same level of
precedence in the C grammar.  The ++ operates on cp before the other
operators in this example.  (However, the incremented value is not
stored into cp until after the original value of cp is used, by the
very definition of the post-increment operator.)
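
A minimal sketch of that grouping, for anyone who wants to check it on
their own compiler.  The function name is hypothetical, and the sketch
assumes the compiler parses the expression as the language requires:

```c
#include <assert.h>
#include <stddef.h>

/* Returns how far cp moved after evaluating *(int *)cp++. */
ptrdiff_t delta_after_cast_postinc(void)
{
    int i = 27;
    char *cp, *oldcp;
    int fetched;

    oldcp = cp = (char *)&i;
    fetched = *(int *)cp++;   /* groups as *((int *)(cp++)) */
    assert(fetched == 27);    /* the int is read through the old value of cp */
    return cp - oldcp;        /* ++ was applied to a char *, so this is 1 */
}
```

On a conforming compiler the function returns 1, since the operand of
++ is the (char *) cp and not the result of the cast.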

chris@mimsy.UUCP (Chris Torek) (09/14/88)

In article <901@mina.liu.se> mikpe@mina.liu.se (Mikael Pettersson) writes:
>Consider the following piece of code:
>--begin-example--
>{
>	int i = 27;
>	register char *cp, *oldcp;
>
>	oldcp = cp = (char *)&i;
>	printf("i == %d, ", *(int *)cp++);		/* should print `27' */
>	printf("and cp increased by %d\n", cp-oldcp);	/* 4 or 1 ? */
>}
>--end-of-example--
>My question is: what should be printed by the second printf, 4 or 1?

The answer is 1; but:

>Many PCC-based m68k compilers (including SUN's) print `4' but I know of at
>least two (GCC on our SUNs and GOULD's UTX/32 compiler) that insist on
>printing `1'.

Any compiler that prints `4' is buggy.  It is not even a question of
whether one can increment the result of a cast (many PCC compilers
allow this).

>Since a cast doesn't produce an lvalue, I assume the ++ should be applied
>to `cp' after the int was fetched (i.e., cp should increase by 1) ....

All you need do to answer this particular question is to look at the
operator precedence.  `++' operates before `cast'.  The expression
`i = *(int *)cp++' must thus be evaluated in the (virtual) order:

	cp		(lvalue, type `char *', value &i)
	postfix-++	(rvalue, type `char *', value &i)
			[side effect: last lvalue (cp) += sizeof(char)]
	(int *)		(rvalue, type `int *', value &i)
	*		(lvalue, type `int', value i)

The only way the bug would make any sense at all (other than as a
clear bug) would be if the expression were written

	i = *((int *)cp)++;

to force the order to

	cp		(lvalue, type `char *', value &i)
	(int *)		(rvalue, type `int *', value &i)
	postfix-++	(error, rvalue, type `int *', value &i)
			[side effect: last lvalue (none) += sizeof(int)]
	*		(rvalue, type `int', value i)

which is (as Mike notes above) illegal.  A compiler that forgets to
change the result from an lvalue to an rvalue after the cast would
accept it, though, and would produce what Sun's PCC does.
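
If stepping by sizeof(int) is what is actually wanted (as in the
bcopy/memcpy case Mike mentions), it can be had without applying ++ to
a cast.  A hedged sketch follows; both function names are hypothetical,
and it assumes the char pointer is suitably aligned for int and really
points at int objects:

```c
#include <assert.h>
#include <stddef.h>

/* Fetch an int through *cpp and advance *cpp by sizeof(int).
 * Assumes *cpp is suitably aligned and points at an int object. */
int fetch_int_and_advance(char **cpp)
{
    int *ip = (int *)*cpp;  /* convert once, into a real int * lvalue */
    int value = *ip++;      /* ++ on an int * steps by sizeof(int) */

    *cpp = (char *)ip;      /* store the advanced pointer back */
    return value;
}

/* Reads two ints through a char * and returns the bytes consumed. */
size_t advance_demo(void)
{
    int words[2] = { 27, 42 };
    char *cp = (char *)words;
    int first = fetch_int_and_advance(&cp);
    int second = fetch_int_and_advance(&cp);

    assert(first == 27 && second == 42);
    return (size_t)(cp - (char *)words);
}
```

Whether this is any faster than a byte loop depends on the machine and
the compiler; the point is only that the increment is applied to an
lvalue of the intended type.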

Note that the SunOS compilers that get

	*(int *)cp++

wrong get

	*(int *)++cp

right.  Also, if we run the compiler with debugging on, we can
even see that the problem is in the code generator, which is simply
overly liberal with its reg@+ addressing modes:

	f(){
		register char *cp;
		register int *ip, i;
		i = *(int *)cp++;
		i = *(int *)ip++;
	}
	(/lib/ccom -Xe)

	0x509d0) =, int, 17, 4
	    0x50850) REG, 0x0, 7, int, 17, 4
	    0x509b0) U*, int, 17, 4
		0x50910) ++, PTR int, 17, 4
		    0x508d0) REG, 0x0, 13, PTR char, 17, 2
		    0x508f0) ICON, 0x1, 16384, int, 0, 4
		movl	a5@+,d7

Note that a5 was supposed to be incremented by 1 (the value
of the ICON at 0x508f0), not 4.

	0x50b50) =, int, 17, 4
	    0x509f0) REG, 0x0, 7, int, 17, 4
	    0x50b30) U*, int, 17, 4
		0x50ab0) ++, PTR int, 17, 4
		    0x50a70) REG, 0x0, 12, PTR int, 17, 4
		    0x50a90) ICON, 0x4, 16384, int, 0, 4
		movl	a4@+,d7

This time, a4 was supposed to be incremented by 4 (and was).
(If I had compiler sources, I could probably even provide a fix,
but Sun will not sell those.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris