msb@sq.UUCP (01/16/87)
Richard Stallman says [in effect]:
> I am not sure whether the standard implies that, given "short in, out;",
>
>     { char *inptr, *outptr; int i;
>       inptr = (char *) &in; outptr = (char *) &out;
>       for (i = 0; i < sizeof (short); i++) outptr[i] = inptr[i]; }
>
> is defined and equivalent to "out = in;".

and Doug Gwyn replies:
$ No, this can't be guaranteed.  For example, there may be bits
$ in the short that are not covered by its chars.

I'm pretty sure this is wrong.  The draft proposed standard says:

* Section 1.5, page 2, lines 33-34 and 38-40:
# Byte - the unit of data storage in the execution environment large
# enough to hold a single character in the character set of the execution
# environment.  ...
# Except for bit-fields, objects are composed of contiguous sequences
# of one or more bytes, the number, order, and encoding of which are
# implementation-defined (except where explicitly specified).

That rules out the interpretation Doug gives, and the following seems to me
to lock in the fact that the above code segments ARE equivalent (barring
interrupts):

* Section 3.1.2.5, page 20, lines 47-48:
# An object declared as a character (char) is large enough to store
# any member of the execution character set.  ...

* Section 3.3.3.4, page 38, lines 13-16:
# The sizeof operator yields the size (in bytes) of its operand, which
# may be an expression or the parenthesized name of a type.
#
# When applied to an operand that has type char, unsigned char, or
# signed char, the result is 1.  ...

Of course, this means that the draft proposed standard disallows any
implementation using, say, 7-bit chars and 36-bit ints, which might be
desirable on DECsystem-10's; but I think it is reasonable to do so, just
as it is reasonable to disallow non-binary machines.  Too much C assumes
the underlying model stated in section 1.5 to proceed otherwise.

Mark Brader     "I'm not a lawyer, but I'm pedantic and that's just as good."
utzoo!sq!msb                                            -- D Gary Grady
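
For reference, the loop under discussion can be written out as a complete function. This is only a minimal sketch: the function name copy_short_bytewise is hypothetical, and unsigned char is used in place of the plain char of the original (an assumption of this sketch). Whether the draft guarantees the result is the question the thread is debating; the code only shows what is being asked.

```c
#include <assert.h>
#include <stddef.h>

/* Copy a short one byte at a time, as in the quoted loop.
 * Hypothetical helper; unsigned char replaces the original's plain
 * char (an assumption of this sketch, not the posted code). */
short copy_short_bytewise(short in)
{
    short out = 0;
    const unsigned char *inptr = (const unsigned char *) &in;
    unsigned char *outptr = (unsigned char *) &out;
    size_t i;

    /* Section 3.3.3.4 as quoted above: sizeof applied to char is 1. */
    assert(sizeof (char) == 1);

    for (i = 0; i < sizeof (short); i++)
        outptr[i] = inptr[i];
    return out;
}
```

If the reading of sections 1.5 and 3.3.3.4 given above is right, the returned value should compare equal to the argument for every short.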
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/17/87)
In article <1987Jan15.215225.9688@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
>Richard Stallman says [in effect]:
>> I am not sure whether the standard implies that, given "short in, out;",
>>     { char *inptr, *outptr; int i;
>>       inptr = (char *) &in; outptr = (char *) &out;
>>       for (i = 0; i < sizeof (short); i++) outptr[i] = inptr[i]; }
>> is defined and equivalent to "out = in;".
>and Doug Gwyn replies:
>$ No, this can't be guaranteed.  For example, there may be bits
>$ in the short that are not covered by its chars.
>I'm pretty sure this is wrong.  The draft proposed standard says:

Mark is, I think, correct in his assessment of the nature of bytes in
the X3J11 model of C objects.  However, I had something else in mind,
but due to interruptions while preparing my response I didn't get it
worded correctly.  (The extra bits I had in mind were tag bits; see
below for a corrected version.)  I'll try again.

The things that prevent RMS's approach from working portably are:

1.  The semantics of "(char *) &object" aren't guaranteed to produce
    anything that can be safely dereferenced to access a char.  The only
    guarantee is that the opposite conversion can be made subsequently
    without losing information.  This can be an issue for machines that
    don't support byte addressing; to keep pointer arithmetic simple, the
    high-order bits of a pointer may indicate the size of its
    dereferenced type; in such a case, if the cast is merely a word
    transfer without the bits being shifted and otherwise rearranged,
    the cast (char *) does not produce a useful address.

2.  Even if the resulting char pointer designates a char, it might not
    be the char that one would guess.  On "little endian" machines it
    probably would be, but there may be "big endian" byte-addressed
    architectures where the numeric address of a word is not the
    lowest-valued address of the bytes within the word; in this case the
    loop in the example would copy the wrong collection of bytes
    (assuming again that the cast is implemented as a simple word
    transfer without being rearranged specifically to make such examples
    work, which would involve additional overhead).

3.  In a tagged architecture, the pointed-at object may not be
    referenced as the wrong type without causing a machine trap.

In general, I believe X3J11 intended to strongly discourage ANY reliance
on "type punning".

P.S.  Upon re-reading 3.3.4 Semantics, I see that RMS and I interpreted
the use of the word "may" differently.  Comparison with other sections
of the document now leads me to believe that RMS was probably correct in
thinking that pointer<->integer conversion via casts MUST be supported
by a conforming implementation, although enough is left
"implementation-defined" that an implementation could choose to make
this a useless operation.  This means that some restriction on use of
externs in initializers really is necessary (to prevent having to
support complete C-arithmetic in linkers) if the typical implementation
is to give useful meaning to such conversions.  This deficiency in the
draft standard needs to be fixed.
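
The byte-order concern can be made concrete with a small probe. This is a hedged sketch, not anything from the thread: both function names are hypothetical, and the code assumes the very property in question, namely that the cast to a character pointer yields something that can be dereferenced and that designates the lowest-addressed byte of the object. On an implementation where the cast were a bare word transfer, as described above, the probe itself would be meaningless.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe: returns 1 if the byte reached through the cast
 * holds the low-order bits of the word, 0 otherwise.  Assumes the
 * cast produces a dereferenceable pointer, which is the property
 * under debate. */
int low_byte_first(void)
{
    unsigned short word = 1;
    const unsigned char *p = (const unsigned char *) &word;

    return p[0] == 1;
}

/* Hypothetical probe: returns 1 if the char pointers obtained by
 * casting the addresses of two adjacent shorts are exactly
 * sizeof (short) chars apart. */
int char_step_matches_sizeof(void)
{
    short pair[2];
    char *first = (char *) &pair[0];
    char *second = (char *) &pair[1];

    return second - first == (ptrdiff_t) sizeof (short);
}
```

The first probe may legitimately return either value; it only reports which byte the cast reaches. The second is expected to return 1 wherever the cast behaves the way the byte-copy loop needs it to.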
gwyn@brl-smoke.ARPA (Doug Gwyn ) (01/18/87)
> The semantics of "(char *) &object" aren't guaranteed to produce anything
> that can be safely dereferenced to access a char.
> ... there may be "big endian" byte-addressed architectures where the
> numeric address of a word is not the lowest-valued address of the bytes
> within the word ...

It occurs to me that X3J11 needs to add a guarantee that at least a cast
to (void *) results in something representing the lowest-valued address
of any byte in the object pointed at by whatever pointer is being
converted; otherwise what good are the mem*() functions?

Some of the issues raised by RMS are deeper than I at first realized.
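
To illustrate why the mem*() functions depend on that guarantee, here is a minimal sketch of the same copy done through memcpy. The wrapper name copy_short_memcpy is hypothetical, and the explicit (void *) casts are written out only to make the conversion visible. The call is useful only if the converted pointers designate the first byte of each object.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper: copy a short through memcpy.  This gives the
 * intended result only if converting &in and &out to void * yields
 * the address of the lowest-addressed byte of each object. */
short copy_short_memcpy(short in)
{
    short out = 0;

    memcpy((void *) &out, (const void *) &in, sizeof (short));
    return out;
}
```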
brett@wjvax.UUCP (Brett Galloway) (01/21/87)
In article <5536@brl-smoke.ARPA> gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>> The semantics of "(char *) &object" aren't guaranteed to produce anything
>> that can be safely dereferenced to access a char.
>> ... there may be "big endian" byte-addressed architectures where the
>> numeric address of a word is not the lowest-valued address of the bytes
>> within the word ...
>
>It occurs to me that X3J11 needs to add a guarantee that at least a cast
>to (void *) results in something representing the lowest-valued address
>of any byte in the object pointed at by whatever pointer is being
>converted; otherwise what good are the mem*() functions?
>
>Some of the issues raised by RMS are deeper than I at first realized.

I agree.  It seems odd, though, that (void *) and (char *) would behave
differently.  I know that there is a lot of existing code that assumes
that (char *) behaves this way.  This assumption is necessary because
(void *) doesn't exist, and bcopy (or memc?py) on (char *)&foo is too
useful.  Another example is writing data to a file -- how could one ever
write anything but a character string to a file?  For example,
to write an object
	short foo;
to a file, one must do something like
	fwrite((char *)&foo,sizeof(short),1,file)
at least in 4.2BSD.  In order to maintain this ability, it must be
possible to obtain the "numeric address of an object which is the
lowest-valued address of the bytes within the object."  One could make
(void *) this object, but that is still not correct -- fwrite needs to
dereference the pointer to the object to get characters, not "voids".
I suppose one could do (char *)(void *)(&foo), but that is ugly.

-- 
-------------
Brett Galloway
{pesnta,twg,ios,qubix,turtlevax,tymix,vecpyr,certes,isi}!wjvax!brett
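
The fwrite usage described above can be sketched as a round trip through a temporary file. This is an illustration under stated assumptions: the function name roundtrip_short is hypothetical, tmpfile() stands in for whatever file the program would really use, and the (char *) casts follow the 4.2BSD-style call quoted in the post.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a short to a temporary file as raw
 * bytes, then read it back into *result.
 * Returns 0 on success, -1 on any I/O failure. */
int roundtrip_short(short value, short *result)
{
    FILE *file = tmpfile();
    int ok;

    if (file == NULL)
        return -1;

    /* Same shape as the call in the post: address cast to char *. */
    ok = fwrite((char *) &value, sizeof (short), 1, file) == 1;
    rewind(file);
    ok = ok && fread((char *) result, sizeof (short), 1, file) == 1;
    fclose(file);

    return ok ? 0 : -1;
}
```

The value read back matches the value written only if both casts designate the first byte of their objects, which is the guarantee being asked for.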
gwyn@brl-smoke.UUCP (01/24/87)
In article <812@wjvax.wjvax.UUCP> brett@wjvax.UUCP (Brett Galloway) writes:
>I agree.  It seems odd, though, that (void *) and (char *) would behave
>differently.  I know that there is a lot of existing code that assumes
>that (char *) behaves this way.  This assumption is necessary because
>(void *) doesn't exist, and bcopy (or memc?py) on (char *)&foo is too
>useful.  Another example is writing data to a file -- how could one ever
>write anything but a character string to a file?  For example,
>to write an object
>	short foo;
>to a file, one must do something like
>	fwrite((char *)&foo,sizeof(short),1,file)
>at least in 4.2BSD.  In order to maintain this ability, it must be
>possible to obtain the "numeric address of an object which is the
>lowest-valued address of the bytes within the object."  One could make
>(void *) this object, but that is still not correct -- fwrite needs to
>dereference the pointer to the object to get characters, not "voids".
>I suppose one could do (char *)(void *)(&foo), but that is ugly.

Two points:

1.  Actually I agree with the gist of your comments.  I briefly checked
    this with Larry Rosler (the X3J11 Redactor) at USENIX, and my
    impression of the outcome of our discussion is that X3J11 certainly
    intends that the conversion (char *) produces the address of the
    lowest-addressed char of the original referenced object.  However, I
    wasn't able to find an explicit guarantee of this in the draft
    proposed standard.  This seems like an oversight that needs to be
    remedied.

2.  You should also realize that I am a proponent of a modification to
    the standard to support chars that are more than one "byte" (basic
    storage object accessible unit).  When I was drafting my proposal on
    this issue, I observed the interesting phenomenon that all the
    void * parameters in the draft referred to basic storage units and
    all the char *s were used for actual character text, except for
    fread/fwrite which I think need to be restricted to binary objects
    (basic storage units).  (I would appreciate hearing of any
    significant use of these with textual data.)  Since void * plays the
    role of a "lowest-common-denominator" (i.e., generic) pointer, it is
    appropriate for it to have magical properties, but if we establish
    the common idea of how chars (or other byte-sized objects, in my
    scheme) are packed inside objects, there is then no need for void *
    to be singled out specially in this regard.