[comp.bugs.4bsd] Ultrix CC bug?

bl@infovax.UUCP (Björn Larsson) (09/15/89)

Hello netters,

 Yesterday I came across some strange behaviour in Ultrix 'C'. The following
test program causes cc to complain about the left-hand side of the marked
expression.

	main()
	  {
	  short *p;
	  tst(p);
	  }

	tst (p)
	  short *p;
	  {
	  long	l;
	  l = *((long *) p)++;	/* <<< Here! */
	  }

  In my view,

	p			is a pointer to short,
	(long *) p		is a pointer to long,
	((long *) p)		is also a pointer to long,
	((long *) p)++		increments the above *long* pointer,
	*((long *) p)		is the long pointed to,

	thus

	*((long *) p)++		has the value of the *long* pointed
				to by p, and the *long* pointer is
				set to point to the next long (i.e.
				if sizeof (long)== 2*sizeof(short)
				then p will be incremented by two,
				counted in shorts).

  Any objections? I could add that Turbo C, Microsoft C, and the
MicroTek C 68000 cross-compiler all compile this, as I believe, 'correctly'.
Anyway, this is no real problem, since I worked around it elsewhere, but it
would be interesting to hear what your C compilers do...


							Bjorn

-- 
 ====================== InfoVox = Speech Technology =======================
 Bjorn Larsson, INFOVOX AB      :      ...seismo!mcvax!kth!sunic!infovax!bl
 Box 2503                       :         bl@infovox.se
 S-171 02 Solna, Sweden         :         Phone (+46) 8 735 80 90

maart@cs.vu.nl (Maarten Litmaath) (09/16/89)

bl@infovax.UUCP (Björn Larsson) writes:
\...	*((long *) p)++		has the value of the *long* pointed
\				to by p, and the *long* pointer is
\				set to point to the next long (i.e.
\				if sizeof (long)== 2*sizeof(short)
\				then p will be incremented by two,
\				counted in shorts).
\
\  Any objections? I could add that Turbo C, Microsoft C, and the
\MicroTek C 68000 cross-compiler all compile this, as I believe, 'correctly'.

It seems all those compilers are wrong...!
Now where's that article I posted not too long ago to comp.lang.c?
Aha!  Here it is.
All right.  A compiler that accepts the above-mentioned construct is wrong for
two reasons:

1)	It allows the `++' operator to be applied to an Rvalue expression;
	only Lvalue expressions may be the operand of an increment operator, e.g.

		x++
		a[i]++

	This isn't so strange:

		x++
	
	has the same side effect as

		x = x + 1

	(the value of `x++' itself is the old x), and an assignment
	doesn't make sense for arbitrary (Rvalue) expressions.
2)	It increments the wrong variable; a cast is equivalent to an
	assignment to an invisible temporary variable (with the usual
	restrictions and conversions):

		foo	x;

		... (bar) x ...

	becomes

		foo	x;
		bar	cast_tmp;	/* `invisible' temp variable */

		... (cast_tmp = x) ...
	
	If we write the latter expression as

		(cast_tmp = x, cast_tmp)

	it's clearly `cast_tmp' which should be incremented (if a `++'
	operator is appended), were this possible at all; definitely NOT `x'.
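	Maarten's model can be made concrete with a named temporary.  The
	sketch below is illustrative only: `increment_cast_tmp' is a
	hypothetical function, and the caller is assumed to pass a short
	pointer that really came from a long object, so converting it back
	to (long *) is well defined.  Because cast_tmp is an ordinary
	variable, and so an Lvalue, `cast_tmp++' is legal; it advances the
	temporary by one long and leaves p where it was.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of `((long *) p)++': the cast yields a new value,
   shown here as the explicit temporary cast_tmp.  Incrementing it is
   legal because a named variable is an Lvalue; p is never touched. */
long *increment_cast_tmp(short *p)
{
    long *cast_tmp;

    assert(p != NULL);
    cast_tmp = (long *) p;  /* (cast_tmp = p): the `invisible' temporary */
    cast_tmp++;             /* steps one long, not one short */
    return cast_tmp;        /* p itself has not moved */
}
```

	Given a short pointer into a long array, it returns a pointer
	sizeof(long) bytes further on while the caller's p stays put;
	moving p itself needs an explicit assignment back to p.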
-- 
   creat(2) shouldn't have been create(2): |Maarten Litmaath @ VU Amsterdam:
      it shouldn't have existed at all.    |maart@cs.vu.nl, mcvax!botter!maart

dupuy@cs.columbia.edu (Alexander Dupuy) (09/16/89)

In article <199@infovax.UUCP> bl@infovax.UUCP (Björn Larsson) writes:
> 	*((long *) p)++		has the value of the *long* pointed
> 				to by p, and the *long* pointer is
> 				set to point to the next long (i.e.
> 				if sizeof (long)== 2*sizeof(short)
> 				then p will be incremented by two,
> 				counted in shorts).
> 
>   Any objections? I could add that Turbo C, Microsoft C, and the
> MicroTek C 68000 cross-compiler all compile this, as I believe, 'correctly'.

Not to contradict Maarten Litmaath's analysis, which is entirely correct, but
there's a reason that those compilers accept this construct, and the Ultrix
and most other Vax compilers reject it: endianness.

On a big-endian machine like the 68000, the high order byte comes first, and
pointers to ints, shorts and chars all point to the high order byte, which is
the first byte of the int, short, or char.  All very convenient, and allows easy
and graceful punning from pointer type to pointer type, even if you bend the
rules of C.

On a little-endian machine like the VAX, the high order byte comes last, and
pointers to ints, shorts and chars all point to the high order byte, which,
since it is the last byte, varies depending on the size of the object.  Quoting
from the VAX architecture handbook, p. 33, "A word, two contiguous bytes,
starts on an arbitrary byte boundary... The bits are numbered from the right 0
through 15.  Words, longwords, quadwords and octawords are specified by their
address A, the address of the byte containing bit 0" (i.e. the last one).

So when you cast a (short *) to a (long *), you are in fact getting a pointer
to bytes which come before the (short *), rather than after, as you might
expect.  This causes the sort of type punning which works so nicely on
big-endian machines to fail miserably.  In order to save programmers from
having to track down such strange bugs, VAX compilers tend to be much stricter
about these sorts of things.

@alex
-- 
inet: dupuy@cs.columbia.edu
uucp: ...!rutgers!cs.columbia.edu!dupuy

ok@cs.mu.oz.au (Richard O'Keefe) (09/16/89)

In article <DUPUY.89Sep16000855@cs.cs.columbia.edu>, dupuy@cs.columbia.edu (Alexander Dupuy) writes:
> On a little-endian machine like the VAX, the high order byte comes last, and
> pointers to ints, shorts and chars all point to the high order byte, which,
					^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> since it is the last byte, varies depending on the size of the object.

False.  Pointers to char, short, and int on the VAX point to the LEAST
SIGNIFICANT BYTE (as the manual says, the byte containing bit 0, i.e.
the bit which represents 2**0).  If I have
	union { char c; short s; long l;} pun;
in a VAX C program, the memory layout is
	[pun+3] [pun+2] [pun+1] [pun+0] this is not C pointer addition!
	filler	filler	filler  c(all)
	filler  filler  s(msb)  s(lsb)
        l(msb)  l(b 2)  l(b 1)  l(lsb)
	<----------------------------- addresses increase right to left
so &pun, &pun.c, &pun.s, and &pun.l would all have the same address.
The beautiful thing about this is that the least significant bytes are
lined up, so that if I do pun.l = 0; pun.c = 27 then pun.l will be 27,
not some strange scrambling (C in general says _nothing_ about what will
happen in this case).
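O'Keefe's experiment can be run directly.  A minimal sketch: the union tag
`pun_type' and the function `char_overlays_lsb' are hypothetical names, and
since C leaves the result of reading pun.l after storing pun.c to the
implementation, it reports what the host does rather than what C
guarantees.  It should return 1 on a little-endian machine such as the
VAX, and 0 on a big-endian one such as the 680x0.

```c
#include <assert.h>

/* The union from the text, given a tag so it can be reused. */
union pun_type { char c; short s; long l; };

/* Store 0 in the long, then 27 in the char, and report whether the
   long now reads as 27, i.e. whether the char overlays its least
   significant byte. */
int char_overlays_lsb(void)
{
    union pun_type pun;

    /* All members of a union start at the same address. */
    assert((void *) &pun.c == (void *) &pun.l);
    assert((void *) &pun.s == (void *) &pun.l);

    pun.l = 0;
    pun.c = 27;
    return pun.l == 27;
}
```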

chris@mimsy.UUCP (Chris Torek) (09/17/89)

In article <DUPUY.89Sep16000855@cs.cs.columbia.edu> dupuy@cs.columbia.edu
(Alexander Dupuy) writes:
>... [a] big-endian machine ... allows easy and graceful punning from
>pointer type to pointer type, even if you bend the rules of C.
>... a little-endian machine [does not]

This is false, and indeed exactly backwards.  Both machines allow
easy puns from one pointer type to another, because all pointer types
have the same size and format.  But only little-endian machines can
get away with puns on data types.  (A type pun is a `conversion'
that consists of simply pretending a variable has a new type.  A C
compiler can use a pun whenever doing so yields correct machine code
with fewer instructions than a true conversion would.)

>[on a VAX] pointers to ints, shorts and chars all point to the high
>order byte

No, pointers point to the numerically lowest byte, which, on a VAX, is
the least significant byte.  On a 680x0, it is the most significant byte.
We have to use the `Chinese writing method' to avoid confusion:

	location:	byte:
	   000c		 12
	   000d		 34
	   000e		 56
	   000f		 78

Value at location 000c:
	type:		on vax:		on 680x0:
	char		      12	      12
	short		    3412	    1234
	long		78563412	12345678
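The table can be reproduced on any host.  A sketch under stated
assumptions: it uses the C99 fixed-width types uint16_t and uint32_t
(modern longs are often 64 bits, so `long' would no longer match the
table), it copies with memcpy instead of casting so the fetch is safe even
if the bytes are not suitably aligned, and `fetch16'/`fetch32' are
hypothetical names.  A little-endian host should show the VAX column
(0x3412, 0x78563412), a big-endian host the 680x0 column.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bytes at locations 000c..000f, in increasing address order. */
static const unsigned char mem[4] = { 0x12, 0x34, 0x56, 0x78 };

/* Fetch a 16-bit word from the lowest address, like `*(short *)p'. */
uint16_t fetch16(void)
{
    uint16_t v;
    assert(sizeof v <= sizeof mem);
    memcpy(&v, mem, sizeof v);
    return v;
}

/* Fetch a 32-bit longword from the same address, like `*(long *)p'
   on a machine with 32-bit longs. */
uint32_t fetch32(void)
{
    uint32_t v;
    assert(sizeof v <= sizeof mem);
    memcpy(&v, mem, sizeof v);
    return v;
}
```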

In expressions like `*(long *)p', both machines simply `fetch a longword
from the address', without really caring what was originally stored at
that address.  The trick comes in dealing with extended and narrowed
objects.  In C, parameters to (non-prototyped) functions are widened,
so that `char' and `short' both become `int'.  On a VAX, when one writes

	f(x) char x; { char *p = &x; ... }	

one gets a pun from the actual parameter (which has been `sent in' on
the stack as an int) to the desired parameter (a char), while on the
680x0, one gets a conversion.  In code:

	_f:	.word	0		# (vax)
	# ap+4 points to one longword holding the widened value of x
	# in memory we have (if x=='!'):
	#
	#	4(ap)	21
	#	5(ap)	00
	#	6(ap)	00
	#	7(ap)	00
	#
		movab	4(ap),r0	# p = &x

but

	_f:	link	a6,#0
	| a6+8 points to one longword holding the widened value of x
	| in memory we have (if x=='!'):
	|
	|	a6@(8)	00
	|	a6@(9)	00
	|	a6@(0a)	00
	|	a6@(0b)	00
	|
		lea	a6@(0b),a0	| p = &x

which (inside the compiler) required a conversion to go from `a6@(8)'
to `a6@(b)'.
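The same distinction can be checked in portable C.  A minimal sketch, with
`widened_char_offset' a hypothetical helper: it stores a small positive
char value in an int, as the caller of the old-style f(x) does, and
reports the byte offset at which that value sits.  An offset of 0 (the VAX
case) means &x can simply be the address of the int, a pun; an offset of
sizeof(int) - 1 (the 680x0 `a6@(0b)' case) means the compiler has to
adjust the address, a conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the byte offset, within an int, of a widened char value. */
size_t widened_char_offset(int widened)
{
    unsigned char bytes[sizeof(int)];
    size_t i;

    assert(widened > 0 && widened <= 127);  /* a small positive char, e.g. '!' */
    memcpy(bytes, &widened, sizeof widened);
    for (i = 0; i < sizeof widened; i++)
        if (bytes[i] == (unsigned char) widened)
            return i;                        /* the only nonzero byte */
    return (size_t) -1;                      /* not reached for allowed values */
}
```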
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (09/17/89)

In article <19626@mimsy.UUCP> I typed:
>	| in memory we have (if x=='!'):
>	|	a6@(8)	00
>	|	a6@(9)	00
>	|	a6@(0a)	00
>	|	a6@(0b)	00

Oops, too many `00's: a6@(0b) should be `21' (ASCII `!').

(Thanks to Tim Shepard at MIT for noticing this.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/17/89)

In article <199@infovax.UUCP> bl@infovax.UUCP (Björn Larsson) writes:
>test program causes cc to complain about the left-hand side of the marked
>expression.
>	  l = *((long *) p)++;	/* <<< Here! */

I would expect the compiler to complain about the RIGHT-hand side.
You're not permitted to apply ++ to an rvalue (which is what you
have after the cast is applied)!
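What Bjorn wanted (fetch a long at p, then step p past it) can be written
legally by keeping the fetch and the pointer update apart.  A minimal
sketch, not taken from the thread: `fetch_long' is a hypothetical helper,
and it uses memcpy rather than `*(long *) p' because a short pointer need
not be suitably aligned for a long, and reading shorts through a long
lvalue is not portable either.

```c
#include <assert.h>
#include <string.h>

/* Read one long starting at *pp and advance *pp past it, counted in
   shorts (two shorts when sizeof(long) == 2 * sizeof(short)). */
long fetch_long(short **pp)
{
    long l;

    assert(sizeof(long) % sizeof(short) == 0);
    memcpy(&l, *pp, sizeof l);            /* the fetch: no ++ on an rvalue */
    *pp += sizeof(long) / sizeof(short);  /* the increment: *pp is an lvalue */
    return l;
}
```

In Bjorn's tst(), the marked line could then become `l = fetch_long(&p);'.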

news@bbn.COM (News system owner ID) (09/18/89)

< ....  In C, parameters to (non-prototyped) functions are widened,
< so that `char' and `short' both become `int'.  On a VAX, when one writes
< 
< 	f(x) char x; { char *p = &x; ... }	
< 
< one gets a pun from the actual parameter (which has been `sent in' on
< the stack as an int) to the desired parameter (a char), while on the
< 680x0, one gets a conversion.  ...
< -- 
< In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
< Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

I would add one further thought to this: avoid code like the
above f(x) like the plague.  This will, regrettably, work on Vax,
68xxx, etc. "normal processors", but will fail miserably on machines
like a Pyramid where the first N arguments to a function are placed in
a register window.  Just go and try to take the address of a register
:-(.

I found out about this horridness the hard way, by dealing with the
memory allocator of csh, which assumes that it can correctly tell what
was really allocated and what wasn't, so the user can blithely xfree()
anything (even static strings), and xfree() can catch the "wrong
ones".  Ugh, feh, ick!

Every once in a while, I get the urge to hack gcc to produce correct,
but uncommon, code (such as making a pointer to a structure point to the
_end_ of it, with all other members at negative offsets), just to make
such code fail.

Don't take the address of a parameter, please.

		-- Paul Placeway
		   <pplaceway@bbn.com>, <paul@cis.ohio-state.edu>

rsalz@bbn.com (Rich Salz) (09/19/89)

In <45717@bbn.COM> pplaceway@izar.bbn.com (Paul W. Placeway) writes:
>Don't take the address of a parameter, please.
Any compiler which doesn't let me take the address of a parameter is
severely broken.  Even Pyramids and SPARCs let you do it:  it's the
compiler's responsibility to copy the param into stack space if
necessary.

Follow-ups to comp.lang.c, I guess.
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.

gwyn@smoke.BRL.MIL (Doug Gwyn) (09/20/89)

In article <45717@bbn.COM> pplaceway@izar.bbn.com (Paul W. Placeway) writes:
>Don't take the address of a parameter, please.

There have been bugs in this in several implementations, but it's legal C.
Avoiding it is more a matter of maximizing portability across flaky
implementations.
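Taking the address of a parameter is indeed legal.  A minimal sketch, with
`bump_via_address' a hypothetical function: it uses a prototype, so unlike
the old-style f(x) above no widening is visible to the programmer, and
whether x arrived on the stack or in a register (as on a Pyramid), the
compiler must give it an address.  Writing through the pointer changes
only the function's own copy of the argument.

```c
#include <assert.h>

/* Take the address of a parameter and modify it through the pointer. */
int bump_via_address(char x)
{
    char *p = &x;   /* legal: x is an lvalue, even if it arrived in a register */

    assert(*p == x);  /* p really refers to x */
    (*p)++;           /* changes this function's copy of the argument only */
    return x;
}
```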