[comp.lang.c] bytes don't fill words

rbutterworth@watmath.UUCP (01/23/87)

If it doesn't do so already, the ANSI C standard should explicitly
state the following:

1) BITS_PER_WORD%BITS_PER_BYTE need not necessarily be 0.

2) Given TypeA and TypeB, such that sizeof(TypeA) > sizeof(TypeB)
and sizeof(TypeA) % sizeof(TypeB) is 0, then for
union {TypeA a; TypeB b[sizeof(TypeA)/sizeof(TypeB)];} u;
assigning values to all the individual members of array u.b will
in some well-defined implementation-defined manner assign values
to all the bits of u.a (but not necessarily of u itself).

3) The behaviour of functions memcmp(), memcpy(), etc. is
undefined if the two arguments are not pointing at similarly
aligned data.

I realize that the last two are obvious given the first, but it
took me a long time to realize that they were obvious and it would
be nice if the standard explicitly pointed them out.

ballou@brahms.Berkeley.EDU.UUCP (01/25/87)

In article <4603@watmath.UUCP> rbutterworth@watmath.UUCP (Ray Butterworth) writes:
>If it doesn't do so already, the ANSI C standard should explicitly
>state the following:
>
>1) BITS_PER_WORD%BITS_PER_BYTE need not necessarily be 0.

	I don't see how this can be, in view of section 1.5.  The last
sentence of the definition of "byte" reads:

	Except for bit-fields, objects are composed of contiguous
	sequences of one or more bytes, the number, order, and
	encoding of which are implementation-defined.

>3) The behaviour of functions memcmp(), memcpy(), etc. is
>undefined if the two arguments are not pointing at similarly
>aligned data.

	Again, I am missing something here.  Doesn't the
requirement that "It shall be possible to express the address
of each individual byte of an object uniquely" take care of
this?
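To make the byte-addressing argument concrete, here is a minimal sketch of a byte-at-a-time copy. It is not the library memcpy(). `bytewise_copy` and `roundtrip_misaligned` are hypothetical names, and the only assumption is that any object may be accessed through an `unsigned char` pointer. Because the copy moves one byte at a time, neither pointer needs any particular alignment:

```c
#include <stddef.h>

/* Hypothetical helper, not the library memcpy(): copy n bytes one
   unsigned char at a time.  Each byte of an object has its own
   address, so neither pointer needs any particular alignment. */
static void *bytewise_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

/* Store a long at offset 1 of a byte buffer (usually misaligned for a
   long), then copy it back into a properly aligned long. */
static long roundtrip_misaligned(long value)
{
    unsigned char buffer[sizeof(long) + 1];
    long restored = 0;

    bytewise_copy(buffer + 1, &value, sizeof value);
    bytewise_copy(&restored, buffer + 1, sizeof restored);
    return restored;
}
```

For example, `roundtrip_misaligned(123456789L)` should return 123456789L. The long is handled only as bytes while it sits at the (probably) misaligned offset, and is read as a long again only after it is back in an aligned object.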



--------
Kenneth R. Ballou			ARPA:  ballou@brahms.berkeley.edu
Department of Mathematics		UUCP:  ...!ucbvax!brahms!ballou
University of California
Berkeley, California  94720

drw@cullvax.UUCP (01/26/87)

rbutterworth@watmath.UUCP (Ray Butterworth) writes:
> If it doesn't do so already, the ANSI C standard should explicitly
> state the following:
> 
> 1) BITS_PER_WORD%BITS_PER_BYTE need not necessarily be 0.
> 
> 2) Given TypeA and TypeB, such that sizeof(TypeA) > sizeof(TypeB)
> and sizeof(TypeA) % sizeof(TypeB) is 0, then for
> union {TypeA a; TypeB b[sizeof(TypeA)/sizeof(TypeB)];} u;
> assigning values to all the individual members of array u.b will
> in some well-defined implementation-defined manner assign values
> to all the bits of u.a (but not necessarily of u itself).

Not necessarily!  The bit-pattern in 'a' may not be a valid value for
its type.  The crux is "will ... assign values to all the bits of
u.a".  It may not wind up assigning a processable value to it.
(Consider that TypeB is 'char' and TypeA is 'float', and that the
implementation assigns floats by loading and storing a float register,
and that invalid float formats cause the processor to trap...)
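A small sketch of point 2 with TypeB = unsigned char shows both sides of this. The union types and helper names are hypothetical. The sketch assumes `uint32_t` exists (it is optional) and, for the float case, an IEEE 754 binary32 `float`; it also relies on reading a union member other than the one last written, which C99 and later define as reinterpreting the stored bytes. Filling every byte determines every bit of an integer type with no padding bits, but for `float` the same all-ones pattern is a NaN rather than an ordinary number. On a machine that traps on invalid float formats, even the read in `fill_float_is_nan` might trap.

```c
#include <limits.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Point 2 with TypeB = unsigned char: the array covers every byte of a. */
union word_bytes {
    uint32_t a;
    unsigned char b[sizeof(uint32_t)];
};

union float_bytes {
    float a;
    unsigned char b[sizeof(float)];
};

/* Set every byte to all-ones and read the integer member back.
   uint32_t has no padding bits, so the result is UINT32_MAX. */
static uint32_t fill_word(void)
{
    union word_bytes u;
    size_t i;

    for (i = 0; i < sizeof u.b; i++)
        u.b[i] = UCHAR_MAX;
    return u.a;
}

/* Same for float.  Assuming IEEE 754 binary32, the all-ones pattern
   is a NaN, not a value ordinary arithmetic can use. */
static int fill_float_is_nan(void)
{
    union float_bytes u;
    size_t i;

    for (i = 0; i < sizeof u.b; i++)
        u.b[i] = UCHAR_MAX;
    return isnan(u.a) != 0;
}
```

On a typical IEEE machine, `fill_word()` returns `UINT32_MAX` and `fill_float_is_nan()` returns 1.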

Also, suppose that although TypeB is considered to have, say, 8 bits,
only 7 of them are actually used in any operation.  Suppose the
implementation exploits this fact--it doesn't bother setting the 8th
bit of the location when it is inefficient to do so.  Then, there is
no guarantee that all of the bits of 'u.a' will be set.

The moral of this is that point 2 isn't implied by point 1--it is a
complex and subtle constraint on the implementation that should
probably not be made.

Dale

-- 
Dale Worley		Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu

msb@sq.UUCP (01/26/87)

Ray Butterworth says:
> If it doesn't do so already, the ANSI C standard should explicitly
> state the following:
> 
> 1) BITS_PER_WORD%BITS_PER_BYTE need not necessarily be 0.

On the contrary, and contrary (it seems to me) to Doug Gwyn's earlier
posting (regarding byte-by-byte copying of objects), the Draft states
at section 1.5, page 2, lines 38-39 that:

# Except for bit-fields, objects are composed of contiguous sequences
# of one or more bytes ...

I claim that this is indeed desirable behavior, because it allows
functions such as memcpy() to work predictably.  Certainly it clashes
with architectures like the DECsystem-10's 7-bit chars in a 36-bit word,
but them's the breaks; I suspect that there are no C's now on such machines.

This is rather similar to an article I posted just before USENIX;
I am posting again in case the other one expired before Doug and other
interested parties got back from there.

Mark Brader

gwyn@brl-smoke.UUCP (01/28/87)

>>1) BITS_PER_WORD%BITS_PER_BYTE need not necessarily be 0.

I am afraid this confusion is due to my mistake in an earlier posting.
Early in the draft proposed standard we guarantee that all objects are
collections of bytes.  This may mean that a byte has to be bigger than
one might want; for example on the (hypothetical) MINSK-3B the word
size might be 17 bits, a prime number, so the smallest C
object (currently, a char) would have to be some multiple (1) of 17 bits.
It isn't allowed to make chars 8 bits on the MINSK-3B.  Sorry.
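Point 1's remainder can be worked through directly. The helper names below are hypothetical; `CHAR_BIT` from `<limits.h>` is the standard name for the number of bits in a byte. Packing 7-bit chars into a 36-bit DECsystem-10 word leaves 36 % 7 = 1 bit over, and 8-bit chars in a 17-bit MINSK-3B word would also leave 1 bit over, while 17-bit chars leave nothing. Under the rule that objects are whole sequences of bytes, an object of n bytes has exactly n * CHAR_BIT bits:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: BITS_PER_WORD % BITS_PER_BYTE, i.e. how many
   bits of a word would be stranded by bytes of the given size. */
static unsigned bits_left_over(unsigned word_bits, unsigned byte_bits)
{
    return word_bits % byte_bits;
}

/* Hypothetical helper: the bit count of an object of object_size
   bytes.  With no bits left over, this is a whole multiple of CHAR_BIT. */
static size_t object_bits(size_t object_size)
{
    return object_size * CHAR_BIT;
}
```

Here `bits_left_over(36, 7)` and `bits_left_over(17, 8)` are both 1, `bits_left_over(17, 17)` is 0, and `object_bits(sizeof(int))` is always a whole multiple of `CHAR_BIT`.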

>>3) The behaviour of functions memcmp(), memcpy(), etc. is
>>undefined if the two arguments are not pointing at similarly
>>aligned data.

I believe the draft proposed standard needs to be beefed up a bit
to guarantee that a pointer conversion produces a pointer to the
"lowest" portion of the object pointed to by the original pointer
(the one that is being operated on to produce a new pointer).
With this guarantee, the mem*() functions are safe.  However,
implementors should note that there can't be any "holes" in structs
that are unsafe to touch (i.e., no access violation for touching
uninitialized locations in the holes).
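The holes in question are padding bytes, which `offsetof` from `<stddef.h>` can expose. The following is a minimal sketch with a hypothetical struct. Its layout is implementation-defined (there is often several bytes of padding after `c`, but that is not guaranteed). memcpy() copies the holes along with the members, which is why touching them must be safe:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct; most implementations leave a hole after c. */
struct holey {
    char c;
    double d;
};

/* Bytes of padding between c and d (0 if there is no hole). */
static size_t hole_after_c(void)
{
    return offsetof(struct holey, d) - sizeof(char);
}

/* memcpy() copies the whole object, holes included, so the copy is
   byte-for-byte identical to the source. */
static int copy_matches(void)
{
    struct holey src, dst;

    memset(&src, 0, sizeof src);  /* zero every byte, holes included */
    src.c = 'x';
    src.d = 2.5;
    memcpy(&dst, &src, sizeof src);
    return memcmp(&dst, &src, sizeof src) == 0
        && dst.c == 'x' && dst.d == 2.5;
}
```

`copy_matches()` and the check `offsetof(struct holey, d) + sizeof(double) <= sizeof(struct holey)` should both hold on any implementation; only the value of `hole_after_c()` varies. The struct is zeroed first so that memcmp() never compares uninitialized holes.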

gwyn@brl-smoke.UUCP (01/28/87)

In article <1987Jan26.132911.9733@sq.uucp> msb@sq.UUCP (Mark Brader) writes:
># Except for bit-fields, objects are composed of contiguous sequences
># of one or more bytes ...

Sorry if people are getting tired of this, but since I posted the
original incorrect info (now you see why the "not an X3J11 position"
disclaimers!), I would like to limit the damage done:

Yes, all C objects (other than bit-fields) are composed of a
contiguous sequence of bytes, with no bits left over.

I should know this better than anyone, since I'm the proponent of
a clean separation between "byte" and "character".  I can only
explain my earlier lapse as being due to inadequate proof-reading
of my response to RMS (the preparation of which had been interrupted
while I was working on that section) before I sent it out.

I apologize for misleading people.  OBJECTS ARE MADE OF BYTES.
(Except for those &)^^&!$%)&*@ bit-fields.  Can we get rid of bit-fields?)
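Bit-fields are the exception because a bit-field member can be narrower than a byte and has no address of its own. The sketch below uses a hypothetical `struct flags`. Writing `&f.mode` or `sizeof f.mode` is a constraint violation, so the field can only be reached through the member itself, and an unsigned bit-field reduces stored values modulo 2 to the power of its width:

```c
/* Hypothetical struct: two members narrower than a byte. */
struct flags {
    unsigned int mode : 3;   /* holds 0..7 */
    unsigned int level : 5;  /* holds 0..31 */
};

/* &f.mode and sizeof f.mode would not compile: a bit-field has no
   byte address.  The enclosing struct is still made of whole bytes. */
static unsigned int store_mode(unsigned int value)
{
    struct flags f = {0, 0};

    f.mode = value;  /* reduced modulo 2^3, as for any unsigned type */
    return f.mode;
}
```

`store_mode(9)` returns 1 (9 mod 8) and `store_mode(5)` returns 5. The enclosing `struct flags` is still an ordinary object of `sizeof(struct flags)` whole bytes.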