bl@infovax.UUCP (Björn Larsson) (09/15/89)
Hello netters,

yesterday I came upon a strange behaviour in Ultrix 'C'. The following
test program causes cc to complain about the left-hand side of the marked
expression.

    main()
    {
        short *p;
        tst(p);
    }

    tst (p)
    short *p;
    {
        long l;
        l = *((long *) p)++;    /* <<< Here! */
    }

In my view,

    p                   is a pointer to short,
    (long *) p          is a pointer to long,
    ((long *) p)        is also a pointer to long,
    ((long *) p)++      increments the above *long* pointer,
    *((long *) p)       is the long pointed to, thus
    *((long *) p)++     has the value of the *long* pointed
                        to by p, and the *long* pointer is
                        set to point to the next long (i.e.
                        if sizeof (long)== 2*sizeof(short)
                        then p will be incremented by two,
                        counted in short's).

Any objections? I could add that both Turbo C, MicroSoft C, and the
MicroTek C 68000 cross-compiler compile this as I believe 'correctly'.
And this is no problem, I fixed it at another place - but it's
interesting to hear what your C compilers do...

    Bjorn
-- 
====================== InfoVox = Speech Technology =======================
Bjorn Larsson, INFOVOX AB      :  ...seismo!mcvax!kth!sunic!infovax!bl
Box 2503                       :  bl@infovox.se
S-171 02 Solna, Sweden         :  Phone (+46) 8 735 80 90
maart@cs.vu.nl (Maarten Litmaath) (09/16/89)
bl@infovax.UUCP (Björn Larsson) writes:
\... *((long *) p)++ has the value of the *long* pointed
\ to by p, and the *long* pointer is
\ set to point to the next long (i.e.
\ if sizeof (long)== 2*sizeof(short)
\ then p will be incremented by two,
\ counted in short's).
\
\ Any objections? I could add that both Turbo C, MicroSoft C, and the
\MicroTek C 68000 cross-compiler compile this as I believe 'correctly'.
It seems all those compilers are wrong...!
Now where's that article I posted not too long ago to comp.lang.c?
Aha! Here it is.
All right. A compiler that allows the above-mentioned construct is wrong for
two reasons:
1) It allows the `++' operator to be applied to an Rvalue expression;
   only Lvalue expressions may be the operand of an increment operator, e.g.
        x++
        a[i]++
   This isn't so strange:
        x++
   is equivalent (apart from the value it yields) to
        x = x + 1
   which doesn't make sense for arbitrary (Rvalue) expressions.
2) It increments the wrong variable; a cast is equivalent to an
   assignment to an invisible temporary variable (with the usual
   restrictions and conversions):
        foo x;
        ... (bar) x ...
   becomes
        foo x;
        bar cast_tmp;   /* `invisible' temp variable */
        ... (cast_tmp = x) ...
   If we write the latter expression as
        (cast_tmp = x, cast_tmp)
   it's clearly `cast_tmp' which should be incremented (if a `++'
   operator is appended), were this possible at all; definitely NOT `x'.
--
creat(2) shouldn't have been create(2): |Maarten Litmaath @ VU Amsterdam:
it shouldn't have existed at all. |maart@cs.vu.nl, mcvax!botter!maart
dupuy@cs.columbia.edu (Alexander Dupuy) (09/16/89)
In article <199@infovax.UUCP> bl@infovax.UUCP (Björn Larsson) writes:
>  *((long *) p)++     has the value of the *long* pointed
>                      to by p, and the *long* pointer is
>                      set to point to the next long (i.e.
>                      if sizeof (long)== 2*sizeof(short)
>                      then p will be incremented by two,
>                      counted in short's).
>
>  Any objections? I could add that both Turbo C, MicroSoft C, and the
> MicroTek C 68000 cross-compiler compile this as I believe 'correctly'.

Not to contradict Maarten Litmaath's analysis, which is entirely correct, but
there's a reason that those compilers accept this construct, and the Ultrix
and most other Vax compilers reject it - endianism.

On a big-endian machine like the 68000, the high order byte comes first, and
pointers to ints, shorts and chars all point to the high order byte, which is
the first byte of the int, short or char. All very convenient, and allows easy
and graceful punning from pointer type to pointer type, even if you bend the
rules of C.

On a little-endian machine like the VAX, the high order byte comes last, and
pointers to ints, shorts and chars all point to the high order byte, which,
since it is the last byte, varies depending on the size of the object.

Quoting from the VAX architecture handbook, p. 33, "A word, two contiguous
bytes, starts on an arbitrary byte boundary... The bits are numbered from the
right 0 through 15. Words, longwords, quadwords and octawords are specified
by their address A, the address of the byte containing bit 0" (i.e. the last
one).

So when you cast a (short *) to a (long *), you are in fact getting a pointer
to bytes which come before the (short *), rather than after, as you might
expect. This causes the sort of type punning which works so nicely on
big-endian machines to fail miserably. In order to save programmers from
having to track down such strange bugs, VAX compilers tend to be much stricter
about these sorts of things.

@alex
-- 
-- 
inet: dupuy@cs.columbia.edu
uucp: ...!rutgers!cs.columbia.edu!dupuy
ok@cs.mu.oz.au (Richard O'Keefe) (09/16/89)
In article <DUPUY.89Sep16000855@cs.cs.columbia.edu>, dupuy@cs.columbia.edu (Alexander Dupuy) writes:
> On a little-endian machine like the VAX, the high order byte comes last, and
> pointers to ints, shorts and chars all point to the high order byte, which,
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> since it is the last byte, varies depending on the size of the object.

False. Pointers to char, short, and int on the VAX point to the LEAST
SIGNIFICANT BYTE (as the manual says, the byte containing bit 0, i.e. the
bit which represents 2**0). If I have

    union { char c; short s; long l;} pun;

in a VAX C program, the memory layout is

    [pun+3] [pun+2] [pun+1] [pun+0]    this is not C pointer addition!
    filler  filler  filler  c(all)
    filler  filler  s(msb)  s(lsb)
    l(msb)  l(b 2)  l(b 1)  l(lsb)
    <----------------------------- addresses increase right to left

so &pun, &pun.c, &pun.s, and &pun.l would all have the same address.
The beautiful thing about this is that the least significant bytes
are lined up, so that if I do

    pun.l = 0; pun.c = 27

then pun.l will be 27, not some strange scrambling (C in general says
_nothing_ about what will happen in this case).
chris@mimsy.UUCP (Chris Torek) (09/17/89)
In article <DUPUY.89Sep16000855@cs.cs.columbia.edu> dupuy@cs.columbia.edu (Alexander Dupuy) writes:
>... [a] big-endian machine ... allows easy and graceful punning from
>pointer type to pointer type, even if you bend the rules of C.
>... a little-endian machine [does not]

This is false, and indeed exactly backwards. Both machines allow easy
puns from one pointer type to another, because all pointer types have
the same size and format. But only little-endian machines can get
away with puns on data types. (A type pun is a `conversion' that
consists of simply pretending a variable has a new type. A C compiler
can use a pun if doing so generates correct machine code using fewer
instructions than a true conversion would.)

>[on a VAX] pointers to ints, shorts and chars all point to the high
>order byte

No, pointers point to the numerically lowest byte, which, on a VAX, is
the least significant byte. On a 680x0, it is the most significant byte.
We have to use the `Chinese writing method' to avoid confusion:

    location:       byte:
    000c            12
    000d            34
    000e            56
    000f            78

    Value at location 000c:
    type:           on vax:         on 680x0:
    char            12              12
    short           3412            1234
    long            78563412        12345678

In expressions like `*(long *)p', both machines simply `fetch a
longword from the address', without really caring what was originally
stored at that address.

The trick comes in dealing with extended and narrowed objects. In C,
parameters to (non-prototyped) functions are widened, so that `char'
and `short' both become `int'. On a VAX, when one writes

    f(x) char x; { char *p = &x; ... }

one gets a pun from the actual parameter (which has been `sent in' on
the stack as an int) to the desired parameter (a char), while on the
680x0, one gets a conversion. In code:

    _f: .word   0               # (vax)
        # ap+4 points to one longword holding the widened value of x
        # in memory we have (if x=='!'):
        #
        #       4(ap)   21
        #       5(ap)   00
        #       6(ap)   00
        #       7(ap)   00
        #
        movab   4(ap),r0        # p = &x

but

    _f: link    a6,#0
        | a6+8 points to one longword holding the widened value of x
        | in memory we have (if x=='!'):
        |
        |       a6@(8)  00
        |       a6@(9)  00
        |       a6@(0a) 00
        |       a6@(0b) 00
        |
        lea     a6@(0b),a0      | p = &x

which (inside the compiler) required a conversion to go from `a6@(8)'
to `a6@(b)'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris
chris@mimsy.UUCP (Chris Torek) (09/17/89)
In article <19626@mimsy.UUCP> I typed:
>       | in memory we have (if x=='!'):
>       |       a6@(8)  00
>       |       a6@(9)  00
>       |       a6@(0a) 00
>       |       a6@(0b) 00

Oops, too many `00's: a6@(0b) should be `21' (ASCII `!'). (Thanks to
Tim Shepard at MIT for noticing this.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/17/89)
In article <199@infovax.UUCP> bl@infovax.UUCP (Björn Larsson) writes:
>test program causes cc to complain about the left-hand side of the marked
>expression.
>       l = *((long *) p)++;    /* <<< Here! */

I would expect the compiler to complain about the RIGHT-hand side.
You're not permitted to apply ++ to an rvalue (which is what you have
after the cast is applied)!
news@bbn.COM (News system owner ID) (09/18/89)
< .... In C, parameters to (non-prototyped) functions are widened,
< so that `char' and `short' both become `int'. On a VAX, when one writes
<
<       f(x) char x; { char *p = &x; ... }
<
< one gets a pun from the actual parameter (which has been `sent in' on
< the stack as an int) to the desired parameter (a char), while on the
< 680x0, one gets a conversion. ...
< -- 
< In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
< Domain: chris@mimsy.umd.edu     Path:   uunet!mimsy!chris

I would add one further thought to this: avoid expressions like the above
f(x) like the plague. This will, regrettably, work on Vax, 68xxx, etc.
"normal processors", but will fail miserably on machines like a
Pyramid where the first N arguments to a function are placed in a
register window. Just go and try to take the address of a register :-(.

I found out about this horridness the hard way, by dealing with the
memory allocator of csh, which assumes that it can correctly tell what
was really allocated and what wasn't, so the user can blithely xfree()
anything (even static strings), and xfree() can catch the "wrong
ones". Ugh, feh, ick!

Every once in a while, I get the urge to hack gcc to produce correct,
but uncommon, code (like a pointer to a structure points to the _end_
of it, and all other elements are negative offsets), just to make code
fail.

Don't take the address of a parameter, please.

        -- Paul Placeway <pplaceway@bbn.com>, <paul@cis.ohio-state.edu>
rsalz@bbn.com (Rich Salz) (09/19/89)
In <45717@bbn.COM> pplaceway@izar.bbn.com (Paul W. Placeway) writes:
>Don't take the address of a parameter, please.

Any compiler which doesn't let me take the address of a parameter is
severely broken. Even Pyramids and SPARCs let you do it: it's the
compiler's responsibility to copy the param into stack space if necessary.

Follow-ups to comp.lang.c, I guess.
        /r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.
gwyn@smoke.BRL.MIL (Doug Gwyn) (09/20/89)
In article <45717@bbn.COM> pplaceway@izar.bbn.com (Paul W. Placeway) writes:
>Don't take the address of a parameter, please.

There have been bugs in this in several implementations, but it's legal C.
Avoiding it is more a matter of maximizing portability across flaky
implementations.