[comp.std.c] Alignment

msb@sq.sq.com (Mark Brader) (08/08/90)

A few weeks ago I wrote:
! Note that alignment is a function of the type only, and it isn't
! permitted for a type to have an alignment requirement larger than its
! own size -- e.g., chars couldn't be required to be on even addresses --
! because elements of an array are guaranteed to be adjacent (in 3.1.2.5).

Doug Gwyn replied:
> Chars could, however, in theory be constrained to even MACHINE addresses,
> although consecutive char* values in C programs would still be required
> to index consecutive char objects (each of which would, in such a case,
> occupy two machine storage units).

1.6 says that any object is a contiguous sequence of bytes, each of which
is individually addressable.  3.3.3.4 forces the size of type char to be
exactly 1 byte.  

If Doug is speaking of a machine with addressing in units smaller than
bytes, then yes, I agree it would be possible for chars to be constrained
to even machine addresses in that sense.  This doesn't contradict what
I said, because I was speaking of byte addresses.

But if Doug's "machine storage units" are bytes, then I think he's wrong.
The relevant excerpt from 3.1.2.5 is:

#  An "array type" describes a contiguously allocated nonempty set of
#  objects with a particular ... type, called the "element type".

I can't take "contiguously allocated" to mean anything else but that
the object declared by "char y[4];" occupies exactly 4 bytes, which have
consecutive addresses; sizeof y must be 4.  This interpretation is
confirmed by the footnote to 3.3.6, and by the last example in 3.3.3.4.
But, then, y[0] and y[1] are char objects not both on even addresses.
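
A minimal sketch of what this reading guarantees, written as a check that
should pass on any conforming implementation.  The helper name
span_of_char_array is made up for illustration.  Note that sizeof and
pointer subtraction both count in C bytes (chars), so these checks say
nothing about how many machine storage units lie behind each byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the distance, in chars, from y[0] to y[3]. */
ptrdiff_t span_of_char_array(void)
{
    char y[4];

    /* sizeof(char) is 1 by definition, so four chars occupy four bytes. */
    assert(sizeof y == 4 * sizeof(char));

    /* Adjacent elements have adjacent char* values. */
    assert(&y[1] == &y[0] + 1);

    return &y[3] - &y[0];
}
```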

Doug, did I miss something, or am I not interpreting you correctly?


In subsequent discussion, Bob Larson noted that:
| Prime's C compiler stores char variables in the right half of a 2-byte
| "halfword", and character arrays are packed and start at the left. ...
| As far as I know, they haven't changed this behavior in their beta-test
| ANSI compiler. ...
| union {
|     struct { char a, b, c, d;} x;
|     char y[4];
| }
| Code that assumes that x.a is the same as y[0] and x.d is y[3] will not
| work on Prime's C compiler.  Note that &x.a+1 != &x.b here.
[The last line was corrected by me.]

This is a different situation from what Doug describes; here y works as
I say above.  According to what I said before, such a compiler is
non-conforming with respect to the layout of x.  On rereading the relevant
section, however, I will soften that, but only to the point of saying that
I *think* it's non-conforming.
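
One way to find out how a given implementation lays out x relative to y
is to ask it, using offsetof from <stddef.h>.  This is only a sketch; the
type names four_chars and overlay and the function name are invented
here, not taken from Bob Larson's example.

```c
#include <assert.h>
#include <stddef.h>

struct four_chars { char a, b, c, d; };

union overlay {
    struct four_chars x;
    char y[4];
};

/* Returns 1 if x.a..x.d land on the same bytes as y[0]..y[3], else 0. */
int members_overlay_array(void)
{
    /* The first member always sits at offset 0. */
    assert(offsetof(struct four_chars, a) == 0);

    return offsetof(struct four_chars, b) == 1
        && offsetof(struct four_chars, c) == 2
        && offsetof(struct four_chars, d) == 3;
}
```

On a layout like the one described for Prime's compiler, where
&x.a+1 != &x.b, the function would return 0.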

I'll repeat the wording that I quoted before, since it's probably
expired on most machines.

1.6 defines "alignment" as "a requirement that objects of a particular
type be located on storage boundaries with addresses that are particular
multiples of a byte address".

3.5.2.1 says that there may be padding within, or at the end of, a structure
"AS NECESSARY to achieve the appropriate alignment" -- my emphasis --
and mentions no other reason why there could be padding.
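
As a rough illustration of the two places where 3.5.2.1 allows padding,
the sketch below measures both for one structure.  The structure
char_then_long and the function names are assumptions chosen for the
example; the amounts reported depend entirely on the implementation.

```c
#include <assert.h>
#include <stddef.h>

struct char_then_long { char c; long v; };   /* hypothetical example type */

/* Number of padding bytes the implementation put between c and v. */
size_t padding_before_long(void)
{
    size_t off = offsetof(struct char_then_long, v);

    assert(off >= sizeof(char));   /* v cannot overlap c */
    return off - sizeof(char);
}

/* Number of padding bytes after v, at the end of the structure. */
size_t padding_at_end(void)
{
    return sizeof(struct char_then_long)
         - offsetof(struct char_then_long, v) - sizeof(long);
}
```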

I argue that, if alignment is a requirement that can only be imposed on a
type, then this means that objects of the same type declared as consecutive
members of a struct can't have padding between them.

However, there is another sentence in 3.5.2.1 that gives me pause.  It
appears a little before the part dealing with padding, and says:

# Each non-bit-field member of a structure or union object is aligned
# in an implementation-defined manner appropriate to its type.

If 1.6 did not define "alignment" as it does, the last-quoted sentence
might be taken to be saying that struct members can have additional
alignment requirements beyond those imposed by the type, and I can see
that an interpretation ruling might say that it DOES have that meaning.
I do think we need a ruling on this.

-- 
Mark Brader			"Relax -- I know the procedures backwards."
SoftQuad Inc., Toronto		"Yeah, well, that's a quick way to get killed."
utzoo!sq!msb, msb@sq.com			-- Chris Boucher, Star Cops

This article is in the public domain.

richard@aiai.ed.ac.uk (Richard Tobin) (08/09/90)

>> Chars could, however, in theory be constrained to even MACHINE addresses,
>> although consecutive char* values in C programs would still be required
>> to index consecutive char objects (each of which would, in such a case,
>> occupy two machine storage units).

Indeed, it should be possible to deal with arbitrary machine
constraints by making C's pointers sufficiently unrelated to machine
addresses.  As an extreme, you could have C pointers just be integers,
and have tables (or arbitrarily complicated procedures) to translate
them to and from machine addresses.  This might of course be rather
slow.
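
A toy sketch of the idea, simulating a machine on which chars may live
only at even machine addresses.  Everything here is hypothetical: the
simulated memory, the c_pointer type and the function names.  The
translation is a simple doubling rather than a table, but a table lookup
would slot into to_machine_address in the same way.

```c
#include <assert.h>
#include <stddef.h>

#define C_CHARS 8                      /* hypothetical number of C-level chars */
#define MACHINE_UNITS (2 * C_CHARS)    /* two machine storage units per char */

static unsigned char machine_memory[MACHINE_UNITS];

typedef size_t c_pointer;              /* a C-level pointer is just an integer */

/* Translate a C-level pointer to a machine address; the result is even. */
size_t to_machine_address(c_pointer p)
{
    assert(p < C_CHARS);
    return 2 * p;
}

/* Store a char through a C-level pointer. */
void store_char(c_pointer p, unsigned char value)
{
    machine_memory[to_machine_address(p)] = value;
}

/* Load a char through a C-level pointer. */
unsigned char load_char(c_pointer p)
{
    return machine_memory[to_machine_address(p)];
}
```

Consecutive c_pointer values index consecutive chars, while the machine
addresses behind them are two units apart.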

There is work being done on a C compiler for the KCM, a machine which
can only address 8-byte objects, only 4 bytes of which contain normal
data (the other four contain among other things a data type tag).
I'll be interested to see what it looks like.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/09/90)

In article <1990Aug8.012908.28364@sq.sq.com> msb@sq.sq.com (Mark Brader) writes:
>A few weeks ago I wrote:
>! Note that alignment is a function of the type only, and it isn't
>! permitted for a type to have an alignment requirement larger than its
>! own size -- e.g., chars couldn't be required to be on even addresses --
>! because elements of an array are guaranteed to be adjacent (in 3.1.2.5).
>Doug Gwyn replied:
>> Chars could, however, in theory be constrained to even MACHINE addresses,
>> although consecutive char* values in C programs would still be required
>> to index consecutive char objects (each of which would, in such a case,
>> occupy two machine storage units).
>1.6 says that any object is a contiguous sequence of bytes, each of which
>is individually addressable.  3.3.3.4 forces the size of type char to be
>exactly 1 byte.  

But the "byte" in the C language need not correspond with a location
addressed by a unit of MACHINE address space.  For example, on a machine
that permits individual addressing of 8-bit bytes, a C implementation
could choose to allocate 16 bits per "char".  It is possible, although
not too likely, that quirks of the machine architecture would make this
the best choice.  (It is more likely that such a choice would be due to
a desire to avoid having to use the wchar_t kludgery to handle large
character sets, although PORTABLE programs would not be able to rely on
this feature for all implementations, alas.)

>If Doug is speaking of a machine with addressing in units smaller than
>bytes, then yes, I agree it would be possible for chars to be constrained
>to even machine addresses in that sense.  This doesn't contradict what
>I said, because I was speaking of byte addresses.

To be more specific, you were talking about the official C standard
meaning for "byte address", not the common usage of the term.  While
the two coincide on many C implementations, they need not.

You're right about the logical consequences of the standard constraints
involving arrays of char, sizeof(char)==1, etc.  Chars can have padding,
but every other C object must have a size that is some integral multiple
of the size of a (padded) char.
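
One practical consequence, sketched below: since every object's size is
a whole number of chars, any object can be walked as an array of
sizeof(object) unsigned chars.  The function name copy_as_chars is
invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Copy an n-byte object one char at a time; n is sizeof the object. */
void copy_as_chars(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        d[i] = s[i];
}
```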

>If 1.6 did not define "alignment" as it does, the last-quoted sentence
>might be taken to be saying that struct members can have additional
>alignment requirements beyond those imposed by the type, and I can see
>that an interpretation ruling might say that it DOES have that meaning.

Alignment is something that should not be defined too specifically.
For example, the call stack often imposes different requirements on
function arguments than apply to ordinary statically allocated storage.
(This was true of the original implementation of C, on the PDP-11.)
What is "necessary" can thus depend on various implementation choices.
Since these can change between releases of a compiler on the same system,
programs ought not to rely very much on the details of padding and
alignment; they should merely be written with the understanding that
padding and alignment constraints MAY affect storage layout, except in
those aspects that the C standard insists be done a certain way.
One such requirement is that
	struct { type_t m; } s;
	assert((char *)&s == (char *)&s.m);
which can in fact be usefully exploited in legitimate applications.
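
A compilable sketch of that guarantee and of one way to exploit it.  The
post leaves type_t unspecified, so long is used here purely as a
stand-in; the names wrapper and first_member are likewise hypothetical.

```c
#include <assert.h>

typedef long type_t;            /* assumption: stand-in for the post's type_t */

struct wrapper { type_t m; };   /* hypothetical name for the post's struct */

/* Recover a pointer to the first member from a pointer to the structure. */
type_t *first_member(struct wrapper *w)
{
    type_t *p = (type_t *)w;

    /* No padding may precede the first member. */
    assert((char *)p == (char *)&w->m);
    return p;
}
```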

jsdy@hadron.COM (Joseph S. D. Yao) (08/10/90)

In article <1990Aug8.012908.28364@sq.sq.com> msb@sq.sq.com (Mark Brader) writes:
>1.6 says that any object is a contiguous sequence of bytes, each of which
>is individually addressable.  3.3.3.4 forces the size of type char to be
>exactly 1 byte.  

...

>I can't take "contiguously allocated" to mean anything else but that
>the object declared by "char y[4];" occupies exactly 4 bytes, which have
>consecutive addresses; sizeof y must be 4.

In recent years, for some reason, people have been assuming that "byte"
means "eight bits" (bit = binary unit of information).  The more
general definition that I learned at the beginning of my introduction
to computers was that it was a group of [contiguous] bits.  This was
reinforced by the existence of byte-handling instructions on machines
like the now-venerable PDP-10 and its successors.  For these
instructions, one had to specify not only a byte address, but also a
starting bit and a size (0-36).

I am sure that the ANSI standard has a reasonable definition of "byte",
probably the one that Doug gave ... from where I'm sitting, I just
can't see a copy of it ...

	Joe Yao				jsdy@hadron.COM
	( jsdy%hadron.COM@{uunet.UU.NET,decuac.DEC.COM} )
	arc,arinc,att,avatar,blkcat,cos,decuac,\
	dtix,ecogong,grebyn,inco,insight,kcwc,  \
	lepton,lsw,netex,netxcom,phw5,research,  >!hadron!jsdy
	rlgvax,seismo,sms,smsdpg,sundc,telenet, /
	uunet				       /
(Last I counted ...)

gwyn@smoke.BRL.MIL (Doug Gwyn) (08/11/90)

In article <904@hadron.COM> jsdy@hadron.UUCP (Joseph S. D. Yao) writes:
>I am sure that the ANSI standard has a reasonable definition of "byte",

The C standard defines "byte" and "char" as effectively synonymous;
although they have slightly different connotations, they occupy the
same space and "really" mean exactly the same thing.  There have to
be at least 8 bits in a char|byte, but it may be larger than that;
the implementor gets to decide on the exact size.  I suspect that
almost every implementation on machines that support direct
addressing of 8-bit bytes will make char|byte an 8-bit Yao-byte
(meaning: 8 contiguous bits).  The Yao definition for "byte" is what
I would consider correct for general usage, and it is consistent with
the usage in the C standard.  The notion that a byte has to be exactly
8 bits was probably due to the prevalence of 8-bit chunk-addressable
systems (System/360, PDP-11, Nova, VAX, 8080, 6800, ...) and to
ignorance of the existence of non-8-bit systems.  After all, 8 bits
is enough to represent any character with one bit left over for
parity, isn't it?  (Rhetorical question, the answer is "not really".)

msb@sq.sq.com (Mark Brader) (08/12/90)

> > 1.6 says that any object is a contiguous sequence of bytes, each of which
> > is individually addressable.  3.3.3.4 forces the size of type char to be
> > exactly 1 byte.  

> In recent years, for some reason, people have been assuming that "byte"
> means "eight bits" (bit = binary unit of information).

I wasn't.  I didn't quote the Standard's definition of "byte" because
it didn't seem relevant to the alignment issues that I was talking about.
This definition is also in 1.6 and what it amounts to is that a "byte" is
an addressable unit of storage big enough to hold a character (in the
basic execution character set).  Section 2.2.4.2 then requires this to
be at least 8 bits, but it could be more.
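
The implementor's choice is visible to programs as CHAR_BIT in
<limits.h>.  A small sketch, with a made-up function name, that converts
a sizeof result into bits without assuming 8-bit bytes:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Width in bits of an object whose sizeof is size_in_bytes. */
size_t width_in_bits(size_t size_in_bytes)
{
    assert(CHAR_BIT >= 8);   /* the required minimum */
    return size_in_bytes * CHAR_BIT;
}
```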

-- 
Mark Brader, SoftQuad Inc.,		"For want of a bit the loop was lost..."
Toronto, utzoo!sq!msb, msb@sq.com				 -- Steve Summit

This article is in the public domain.