[comp.lang.c] Do chars always fill words

rbutterworth@watmath.UUCP (01/27/87)
 > From: msb@sq.UUCP
 > > 1) BITS_PER_WORD%BITS_PER_BYTE need not necessarily be 0.
 > On the contrary, and contrary (it seems to me) to Doug Gwyn's earlier
 > posting (regarding byte-by-byte copying of objects), the Draft states
 > at section 1.5, page 2, lines 38-39 that:
 > # Except for bit-fields, objects are composed of contiguous sequences
 > # of one or more bytes ...
 > 
 > I claim that this is indeed desirable behavior, because it allows
 > functions such as memcpy() to work predictably.  Certainly it clashes
 > with architectures like the DECsystem-10's 7-bit chars in a 36-bit word,
 > but them's the breaks; I suspect that there are no C's now on such machines.

In his "unofficial" X3J11 response to Richard Stallman's original
article, Doug Gwyn indicated that copying an item simply by copying
its chars cannot be guaranteed to work.  This is because of the
existence of such machines.

My response to this was a request that X3J11 explicitly indicate
either that such machines cannot support ANSI C, or that they can,
in which case it should explicitly state two of the implications.


 > From: ballou@brahms.Berkeley.EDU (Kenneth R. Ballou)
 > > 1) BITS_PER_WORD%BITS_PER_BYTE need not necessarily be 0.
 > I don't see how this can be, in view of section 1.5.  The last
 > sentence of the definition of "byte" reads:
 >     Except for bit-fields, objects are composed of contiguous
 >     sequences of one or more bytes, the number, order, and
 >     encoding of which are implementation-defined.

Perhaps some of the confusion arises from treating bytes and chars as
the same thing.  Does the standard state that all bytes are equal?  For
instance, if chars are 10 bits and words are 32 bits, there are two bits
left over somewhere.  These are not in any of the chars making up
the word, but they must be part (in some implementation-defined
manner) of one or two of the bytes within the word.

Whether this is a correct way of looking at things or not, I don't
know.  I do know (I received several mail messages) that there are
machines where the number of bits in a character is not an exact
divisor of the number of bits in a word.


 > From: drw@cullvax.UUCP (Dale Worley)
 > > 2) Given TypeA and TypeB, such that sizeof(TypeA) > sizeof(TypeB)
 > > and sizeof(TypeA) % sizeof(TypeB) is 0, then for
 > > union {TypeA a; TypeB b[sizeof(TypeA)/sizeof(TypeB)];} u;
 > > assigning values to all the individual members of array u.b will
 > > in some well-defined implementation-defined manner assign values
 > > to all the bits of u.a (but not necessarily of u itself).
 > Not necessarily!  The bit-pattern in 'a' may not be a valid value for
 > its type.
I didn't say that the resulting value in 'a' would be meaningful.
My concern was that all of the bits be assigned a well-defined value.
Consider the following example:
  char *strptr = malloc(100);   /* contents are indeterminate */
  strptr[0] = strptr[3] = strptr[6] = 'a';
  strptr[1] = strptr[4] = strptr[7] = 'b';
  strptr[2] = strptr[5] = strptr[8] = '\0';
  if (memcmp(&strptr[0], &strptr[3], 6)) oops();
The words pointed to by the value returned by malloc will contain
random garbage.  If assigning values to the individual characters
of the data doesn't set all the bits, memcmp() could easily find
that the two areas of memory are different.  The definition of
memcmp() has a footnote warning about holes in structures, but it
says nothing about holes in words.  Either it should say something
about such holes, or the standard should prevent the existence of
such holes (by making the values of these holes well defined, as
I suggested above), and give the warning in my point 3 that functions
such as fread(), memcmp(), and memcpy() only work reliably on similarly
aligned data.

 > Also, suppose that although TypeB is considered to have, say, 8 bits,
 > only 7 of them are actually used in any operation.  Suppose the
 > implementation exploits this fact--it doesn't bother setting the 8th
 > bit of the location when it is inefficient to do so.  Then, there is
 > no guarantee that all of the bits of 'u.a' will be set.
Exactly.  That's why the standard must mandate this behaviour.

 > The moral of this is that point 2 isn't implied by point 1--it is a
 > complex and subtle constraint on the implementation that should
 > probably not be made.
It is implied by point 1 only if one wants functions such as memcmp()
to behave predictably.