[comp.std.c] Structure Member Padding

healey@XN.LL.MIT.EDU (Joseph T. Healey) (07/06/90)

Does ANSI C enforce any inter-member "padding" standard?? 

			thanks, Joe Healey

henry@zoo.toronto.edu (Henry Spencer) (07/06/90)

In article <1929@xn.LL.MIT.EDU> healey@XN.LL.MIT.EDU (Joseph T. Healey) writes:
>Does ANSI C enforce any inter-member "padding" standard?? 

No.  ANSI C explicitly leaves this up to the implementation.

That's pretty well the only viable approach, since padding requirements
vary greatly.
-- 
"Either NFS must be scrapped or NFS    | Henry Spencer at U of Toronto Zoology
must be changed."  -John K. Ousterhout |  henry@zoo.toronto.edu   utzoo!henry

msb@sq.sq.com (Mark Brader) (07/08/90)

> > Does ANSI C enforce any inter-member "padding" standard?? 

> No.  ANSI C explicitly leaves this up to the implementation.
> That's pretty well the only viable approach ...

Well, here is what 3.5.2.1 actually says.  (Um, actually, this is from
the October 1988 draft.)

#  Within a structure object, the non-bit-field members and the units
#  in which bit-fields reside have addresses that increase in the order
#  in which they are declared.  A pointer to a structure object, suitably
#  converted, points to its initial member (or if that member is a bit-
#  field, then to the object in which it resides), and vice versa.
#  There may therefore be unnamed holes within a structure object,
#  but not at its beginning, AS NECESSARY TO ACHIEVE THE APPROPRIATE
#  ALIGNMENT.
#
#  ... There may also be unnamed padding at the end of a structure,
#  AS NECESSARY TO ACHIEVE THE APPROPRIATE ALIGNMENT were the structure
#  or union to be an element of an array.

Notice the emphasis that I have added: alignment is the only reason that
permits the use of holes or other padding.  Now, "alignment" is defined
in 1.6 as

#  a requirement that objects of a particular type be located on storage
#  boundaries with addresses that are particular multiples of a byte address.
(presumably meaning "with byte addresses that are multiples of a particular
value", but never mind, the intent is clear).

Note that alignment is a function of the type only, and it isn't
permitted for a type to have an alignment requirement larger than its
own size -- e.g., chars couldn't be required to be on even addresses --
because elements of an array are guaranteed to be adjacent (in 3.1.2.5).

It therefore appears to me that we can deduce that, in a struct such as

	struct {
		int	yksi;
		int	kaksi;
		char	kolme;
		char	nelja;
		char	viisi;
		int	kuusi;
		int	seitseman;
	};

the *only* places there could be padding are [1] after viisi, but
fewer than sizeof(int) bytes, and [2] at the end.  The consecutive
members of the same type have the same alignment requirement, and char
is guaranteed to have the least strict alignment requirement (in 3.3.4).

The overall struct type could have a stricter alignment requirement
than any member, so there could be padding at the end.

(Responses to this posting will probably not be read by me for 4 weeks.)

-- 
Mark Brader		"Europe contains a great many cathedrals, which were
Toronto			 caused by the Middle Ages, which means they are very
utzoo!sq!msb		 old, so you have to take color slide photographs
msb@sq.com		 of them."			-- Dave Barry

This article is in the public domain.

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/08/90)

In article <1929@xn.LL.MIT.EDU> healey@XN.LL.MIT.EDU (Joseph T. Healey) writes:
>Does ANSI C enforce any inter-member "padding" standard?? 

Certain constraints are implicit in what the standard does specify,
but in general the answer is "no, padding decisions are left up to
the implementor".

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/08/90)

In article <1990Jul7.225141.12002@sq.sq.com> msb@sq.sq.com (Mark Brader) writes:
>#  a requirement that objects of a particular type be located on storage
>#  boundaries with addresses that are particular multiples of a byte address.
>(presumably meaning "with byte addresses that are multiples of a particular
>value", but never mind, the intent is clear).

No, non-char objects do not have "byte addresses", only chars have those.
On a word-addressable architecture, a char must evenly divide a word, so
word-aligned objects will be aligned on specific multiples of "byte
addresses" even though "byte addressing" will in that case be an artifact
invented by the compiler implementor specifically for the purpose of being
able to denote individual members of packed char arrays.

>Note that alignment is a function of the type only, and it isn't
>permitted for a type to have an alignment requirement larger than its
>own size -- e.g., chars couldn't be required to be on even addresses --
>because elements of an array are guaranteed to be adjacent (in 3.1.2.5).

Chars could, however, in theory be constrained to even MACHINE addresses,
although consecutive char* values in C programs would still be required
to index consecutive char objects (each of which would, in such a case,
occupy two machine storage units).

Your argument about struct member alignment seems correct, but it raises
a problem with
	struct {
		short s;
		char a, b, c;
	};
on a word-addressed architecture, if short is made a half-word.  Note that
your line of argument would lead to the conclusion that there can be no
padding before the first char, but in such a situation a half-word of
padding would be necessary.  So, if your interpretation is correct, an
implementor in such an environment would have to make his shorts full-
word sized.  This may not be a big problem, but I'm somewhat surprised
by this.  I think we could use an official interpretation ruling here too.

daniels@ogicse.ogc.edu (Scott David Daniels) (07/10/90)

In article <13321@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:

>...
>Your argument about struct member alignment seems correct, but it raises
>a problem with
>	struct {
>		short s;
>		char a, b, c;
>	};
>on a word-addressed architecture, if short is made a half-word.  Note that
>your line of argument would lead to the conclusion that there can be no
>padding before the first char, but in such a situation a half-word of
>padding would be necessary.  So, if your interpretation is correct, an
>implementor in such an environment would have to make his shorts full-
>word sized.  This may not be a big problem, but I'm somewhat surprised
>by this.  I think we could use an official interpretation ruling here too.

You may be more convinced when you realize that

	struct { short s; char a; };
	struct { short s; char a, b; };
	struct { short s; char a, b, c; };

should all have the same head (that is, the addition of b and c should not
move where a belongs).  The reason for wanting this is obvious: people often
build variant records in C by sharing heads, then including enough information
in the head to distinguish which tail is used.  Unfortunately, I believe the
standard goes a bit too far in specifying layout by insisting on an order for
the layout that precludes packing the record.

	struct first { char a; short s; };

	struct second{ char a; short s; char b; };

	struct third { char a; char b; short s; };

The first will be laid out (on a machine with alignment requirements):
	<a> <waste> <s>
I believe the second must therefore be laid out:
	<a> <waste> <s> <b> <possible waste> 
The third may be laid out:
	<a>   <b>   <s>

I would like to be able to use the same layout for third and second, thus
making packed data structures (while still keeping the alignment efficiency).
This can easily be done by the compiler: it simply keeps track of holes and
their alignments, and fills them whenever an appropriate (non-array) element
is added.  The reason for non-array is to allow the existing code that extends
a struct by adding to a final array at the end to continue to work.  However, 
I believe the standard requires me to use the layout that I have shown.  Am 
I wrong?  Can someone explain why this specification of the layout is 
actually useful or desirable? ...

-Scott David Daniels
daniels@cse.ogi.edu			Just another puzzled programmer.

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/10/90)

In article <10420@ogicse.ogc.edu> daniels@ogicse.ogc.edu (Scott David Daniels) writes:
-	struct { short s; char a; };
-	struct { short s; char a, b; };
-	struct { short s; char a, b, c; };
-should all have the same head (that is, the addition of b and c should not
-move where a belongs).

But that has nothing to do with my point; in my example these would
all have compatible layout, including the padding between s and a;
I assumed that chars would not be packed into the same word as the short.

-	struct first { char a; short s; };
-	struct second{ char a; short s; char b; };
-	struct third { char a; char b; short s; };
-I would like to be able to use the same layout for third and second, ...

But if shorts have specific alignment requirements, this is impossible.
(At least, it would impose an unacceptable performance penalty.)

peter@ficc.ferranti.com (Peter da Silva) (07/10/90)

In article <10420@ogicse.ogc.edu> daniels@ogicse.ogc.edu (Scott David Daniels) writes:
> 	struct first { char a; short s; };
> 	struct second{ char a; short s; char b; };
> 	struct third { char a; char b; short s; };
> I would like to be able to use the same layout for third and second, ...

struct head { char a; short b; };
struct first { char a; short b; long c; };
struct second { char a; short b; char c; };

You want to shove second.c between second.a and second.b, right?

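copyhead(dest, src)
struct head *dest, *src;
{
	*dest = *src;	/* copies sizeof(struct head) bytes */
}

	struct first a;
	struct second b;

	/* pass addresses, viewed as the shared head */
	copyhead((struct head *)&b, (struct head *)&a);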

You just clobbered b.c if you do that.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

karl@haddock.ima.isc.com (Karl Heuer) (07/12/90)

In article <13321@smoke.BRL.MIL> gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
>[Mark's] argument about struct member alignment seems correct, but it raises
>a problem with
>	struct { short s; char a, b, c; };
>on a word-addressed architecture, if short is made a half-word.  Note that
>your line of argument would lead to the conclusion that there can be no
>padding before the first char, but in such a situation a half-word of
>padding would be necessary.

I don't get it.  Why should a half-word of internal padding be necessary or
desirable?  Looks to me like the obvious implementation is to make it a
two-word struct, with the first word containing s and a and b, and the second
word containing c and three pad bytes.  What's the problem?

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/13/90)

In article <17069@haddock.ima.isc.com> karl@kelp.ima.isc.com (Karl Heuer) writes:
>I don't get it.  Why should a half-word of internal padding be necessary or
>desirable?  Looks to me like the obvious implementation is to make it a
>two-word struct, with the first word containing s and a and b, and the second
>word containing c and three pad bytes.  What's the problem?

On many, perhaps most, word-addressed architectures there are speed advantages
to not trying to pack different small data types within the same word, unless
bit fields are explicitly requested.

flaps@dgp.toronto.edu (Alan J Rosenthal) (07/13/90)

daniels@ogicse.ogc.edu (Scott David Daniels) writes:
>This can easily be done by the compiler (it simply keeps track of holes and
>their alignments, and fills them whenever an appropriate (non-array) element
>is added.  The reason for non-array is to allow the existing code that extends
>a struct by adding to a final array at the end to continue to work.

For:
	struct { int i; char c; } var;
and
	struct { int i; char c[1]; } var;
to have different layouts, indeed different semantics with respect to layout,
would be quite bizarre.  Actually, it would be ok by me for the char to be
placed in different portions of the otherwise-wasted word in the two cases, but
for the behaviour of extending the struct by adding to the count passed to
malloc() to differ in the two cases would be strange.

ajr

p.s. the memcpy-clobber problem is of course more serious.

blarson@dianne.usc.edu (bob larson) (07/14/90)

In article <1990Jul13.104407.29078@jarvis.csri.toronto.edu> flaps@dgp.toronto.edu (Alan J Rosenthal) writes:
>For:
>	struct { int i; char c; } var;
>and
>	struct { int i; char c[1]; } var;
>to have different layouts, indeed different semantics with respect to layout,
>would be quite bizarre.

Prime's C compiler stores char variables in the right half of a 2-byte
"halfword", and character arrays are packed and start at the left.
This does break code that makes assumptions about unions or that no
padding is between individually declared chars in a structure.
As far as I know, they haven't changed this behavior in their beta-test
ANSI compiler.

union {
    struct { char a, b, c, d;} x;
    char y[4];
};

Code that assumes that x.a is the same as y[0] and x.d is y[3] will not
work on Prime's C compiler.  Note that &x.a+1 != &x.b there.
Bob Larson (blars)	blarson@usc.edu			usc!blarson
	Hiding differences does not make them go away.  Accepting
	differences makes them unimportant.
To join Prime computer mailing list:	info-prime-request@ais1.usc.edu