[comp.std.c] Size of structure containing char fields

bds@lzaz.ATT.COM (Bruce Szablak) (07/17/90)

Given a structure that contains only char fields (possibly unsigned):

	struct example1 { char a, b, c; };

is ANSI restrictive enough [;-)] to force sizeof(example1) to be 3?

Is anyone aware of existing compilers for which this wouldn't be true?

Is there a portability problem with the following structure where the
array is intended to support a variable length array?

	struct example2 { char a, b, c[1]; };

Thanks in advance.

henry@zoo.toronto.edu (Henry Spencer) (07/18/90)

In article <1030@lzaz.ATT.COM> bds@lzaz.ATT.COM (Bruce Szablak) writes:
>Given a structure that contains only char fields (possibly unsigned):
>
>	struct example1 { char a, b, c; };
>
>is ANSI restrictive enough [;-)] to force sizeof(example1) to be 3?
>Is anyone aware of existing compilers for which this wouldn't be true?

ANSI more or less leaves the question open.  Machines on which char
pointers are strange will almost certainly pad the structure to a word
boundary so they can use "normal" pointer format for `struct example1 *'.
Some compilers on more ordinary machines may do likewise.  Some won't.
-- 
NFS:  all the nice semantics of MSDOS, | Henry Spencer at U of Toronto Zoology
and its performance and security too.  |  henry@zoo.toronto.edu   utzoo!henry

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/24/90)

In article <1030@lzaz.ATT.COM> bds@lzaz.ATT.COM (Bruce Szablak) writes:
>Given a structure that contains only char fields (possibly unsigned):
>	struct example1 { char a, b, c; };
>is ANSI restrictive enough [;-)] to force sizeof(example1) to be 3?

No; there may be padding at the end of the structure.

>Is anyone aware of existing compilers for which this wouldn't be true?

Yes.  Practically any word-oriented architecture would have padding.

>Is there a portability problem with the following structure where the
>array is intended to support a variable length array?
>	struct example2 { char a, b, c[1]; };

There is no problem with the declaration; however, care must be exercised
in using it for variable-sized allocations.  Karl Heuer (I think it was)
a few months ago gave a logical analysis in support of the view that the
C standard does permit this sort of trick if coded properly.  That is,
an attempt to enforce a bounds check on the array would be nonconforming
in such contexts.  I agree with the analysis, and have in fact used this
trick in code that was intended to be highly portable, but I don't
recall X3J11 specifically blessing this interpretation.

jimp@cognos.UUCP (Jim Patterson) (07/24/90)

In article <1030@lzaz.ATT.COM> bds@lzaz.ATT.COM (Bruce Szablak) writes:
>Given a structure that contains only char fields (possibly unsigned):
>
>	struct example1 { char a, b, c; };
>
>is ANSI restrictive enough [;-)] to force sizeof(example1) to be 3?

No. The compiler is permitted to put in padding between members and
at the end "as necessary to achieve the appropriate alignment were the
structure or union to be an element of an array" (section 3.5.2.1). 
To do what you want, the standard would have to constrain implementors
to do this and no more, but it seems that the constraint is only that
it be sufficient, at least as I read it.
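
A minimal sketch of how one might measure this on a particular compiler,
rather than assume it.  The helper name example1_trailing_padding is made
up for illustration; only offsetof and sizeof come from the language.  The
result is whatever the implementation chooses, which is the point: nothing
here guarantees 0.

```c
#include <assert.h>
#include <stddef.h>

struct example1 { char a, b, c; };

/* Bytes the implementation added after the last member, c.
   (Hypothetical helper; the value is implementation-specific.) */
size_t example1_trailing_padding(void)
{
    size_t end_of_c = offsetof(struct example1, c) + sizeof(char);

    /* The struct must be at least large enough to hold its last member. */
    assert(sizeof(struct example1) >= end_of_c);
    return sizeof(struct example1) - end_of_c;
}
```

A compiler that pads the struct to a two-byte boundary would be expected
to report 1 here, and one that does not pad would report 0.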

>Is anyone aware of existing compilers for which this wouldn't be true?

Sun's C compiler on a Sun 3 (with Sun OS 4.0) aligns structs on two
byte boundaries. This is NOT an ANSI-compatible compiler, but I don't
think that the ANSI standard will force them to change their ways. My
guess is that the implementation was done this way simply because it
was easier. Two-byte alignment is sufficient for any data type on the
Sun 3, and rarely wastes any storage since only char datatypes can get
by with only 1-byte alignment (your example is an exception).
Interestingly, with the Sun 4, your struct in fact has a size of 3,
even though the Sun 4 generally has more strict alignment rules than
the Sun 3.

Another compiler that uses a constant 2-byte alignment rule is the
Data General MV-series C compiler. Since the MV is a word-addressed
machine, this is the most reasonable implementation and permits struct
pointers to be word pointers. (Word and Character pointers have
different formats on the MV). Permitting an odd-sized struct would
require using character pointers instead of word pointers to structs,
which is less efficient in most instances.

>Is there a portability problem with the following structure where the
>array is intended to support a variable length array?
>
>	struct example2 { char a, b, c[1]; };

I think this approach is reasonably portable if you use offsetof
instead of relying on the sizeof() operator. E.g. to allocate the
structure allowing the array "c" to be "n" bytes long:

   #include <stddef.h>
   #include <stdlib.h>
   ...
   struct example2 { char a, b, c[1]; };
   struct example2 *ptr = malloc(offsetof(struct example2, c) + n);
-- 
Jim Patterson                              Cognos Incorporated
UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707    
PHONE:(613)738-1440                        3755 Riverside Drive
                                           Ottawa, Ont  K1G 3Z4

daniels@ogicse.ogc.edu (Scott David Daniels) (07/28/90)

In article <8631@cognos.UUCP> jimp@cognos.UUCP (Jim Patterson) writes:
>In article <1030@lzaz.ATT.COM> bds@lzaz.ATT.COM (Bruce Szablak) writes:
>>Is there a portability problem with the following structure where the
>>array is intended to support a variable length array?
>>
>>	struct example2 { char a, b, c[1]; };
>
>I think this approach is reasonably portable if you use offsetof
>instead of relying on the sizeof() operator. E.g. to allocate the
>structure allowing the array "c" to be "n" bytes long:
>
>   #include <stddef.h>
>   #include <stdlib.h>
>   ...
>   struct example2 { char a, b, c[1]; };
>   struct example2 *ptr = malloc(offsetof(struct example2, c) + n);

I think you had better not follow the offsetof advice: offsetof provides the
offset of the field in the record, and should not be thought of as providing
the length of the record up to that point.  It is a tiny point, but note
that structure assignment (if used) may copy the entire sizeof(struct) block,
while 

    struct example2 *ptr = malloc(offsetof(struct example2, c) + n);

may allocate a small block (less than sizeof(struct example2)) if, for 
example, n is 0. Code like:

    struct example2 *ptr = malloc(sizeof(struct example2) + strlen(name) );
    (void) strcpy( ptr->c, name );

may over-allocate (if the system is padding the record), but it will always
provide enough room, and never under-allocate.
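
One way to reconcile the two suggestions, offered only as a sketch: compute
the offsetof-based size but never ask for less than sizeof(struct example2),
so that a whole-struct copy stays inside the allocation.  The helper names
example2_size and example2_new are hypothetical, and the copy goes through
a char pointer computed from the start of the block rather than indexing
c[] past its declared bound, which is an assumption about what "coded
properly" means here, not something the standard is known to bless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct example2 { char a, b, c[1]; };

/* Bytes to allocate so that c can hold n bytes, but never fewer
   than sizeof(struct example2).  (Hypothetical helper.) */
size_t example2_size(size_t n)
{
    size_t need = offsetof(struct example2, c) + n;

    return need < sizeof(struct example2) ? sizeof(struct example2) : need;
}

/* Allocate a struct example2 holding a copy of name, including its
   terminating NUL.  Returns NULL if malloc fails. */
struct example2 *example2_new(const char *name)
{
    size_t n;
    struct example2 *p;

    assert(name != NULL);
    n = strlen(name) + 1;
    p = malloc(example2_size(n));
    if (p != NULL) {
        p->a = 0;
        p->b = 0;
        /* Address the tail relative to the start of the allocation. */
        memcpy((char *)p + offsetof(struct example2, c), name, n);
    }
    return p;
}
```

The caller releases the block with free() as usual.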

-Scott Daniels
daniels@cse.ogi.edu

guy@auspex.auspex.com (Guy Harris) (07/29/90)

>Sun's C compiler on a Sun 3 (with Sun OS 4.0) aligns structs on two
>byte boundaries. This is NOT an ANSI-compatible compiler, but I don't
>think that the ANSI standard will force them to change their ways. My
>guess is that the implementation was done this way simply because it
>was easier.

It was done that way because the Sun compiler was ultimately based on
the PCC port to the 68K done at MIT, and I think those porters decided
to impose only two-byte alignment on 4-byte data types - after all, it's
not as if Motorola was ever going to come out with a full-blown 32-bit
chip in the 68K family that would run faster with 4-byte alignment,
right? :-(

I think other 68K C compilers, at least for UNIX systems, do the same.

>Interestingly, with the Sun 4, your struct in fact has a size of 3,
>even though the Sun 4 generally has more strict alignment rules than
>the Sun 3.

Actually, there is one case where the Sun-4 compiler has *less* strict
alignment rules than the Sun-3 compiler.  The MIT compiler also made the
minimum alignment requirement for a structure 2-byte alignment, while
most other PCC-based compilers, including Sun's SPARC compiler, impose
no minimum alignment requirement on structures beyond the most
restrictive alignment requirement of the structure's members.
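
The difference between the two compilers can be observed with a probe
structure, a common pre-standard idiom.  This is a sketch under the
assumption that the compiler places a member at the lowest offset that
satisfies its alignment; struct example1_probe and example1_alignment are
names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct example1 { char a, b, c; };

/* Hypothetical probe: a single char followed by the struct of interest.
   The offset of s is then the alignment the compiler gives example1,
   assuming it inserts no more padding than alignment requires. */
struct example1_probe { char pad; struct example1 s; };

size_t example1_alignment(void)
{
    size_t align = offsetof(struct example1_probe, s);

    /* s cannot overlap pad, so its offset is at least 1. */
    assert(align >= 1);
    return align;
}
```

By the description above, a compiler with a two-byte minimum for
structures would be expected to give 2, and one with no such minimum
would give 1.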

steve@taumet.com (Stephen Clamage) (07/29/90)

jimp@cognos.UUCP (Jim Patterson) writes:

>Sun's C compiler on a Sun 3 (with Sun OS 4.0) aligns structs on two
>byte boundaries. This is NOT an ANSI-compatible compiler, but I don't
>think that the ANSI standard will force them to change their ways.

As you noted, the ANSI standard allows padding between members and at the
end of a struct.  The standard does not (and cannot) say anything about
where a data object must be located in memory.  Recall it is the struct
object, not the struct data type, which is being aligned.  That object
has an address, and whether one or more bytes preceding it in memory
are used or not is no business of the Standard.  In any event, no
portable program can access such bytes, even if they exist.
-- 

Steve Clamage, TauMetric Corp, steve@taumet.com

scott@bbxsda.UUCP (Scott Amspoker) (08/04/90)

In article <8631@cognos.UUCP> jimp@cognos.UUCP (Jim Patterson) writes:
>In article <1030@lzaz.ATT.COM> bds@lzaz.ATT.COM (Bruce Szablak) writes:
>>Given a structure that contains only char fields (possibly unsigned):
>>
>>	struct example1 { char a, b, c; };
>>
>>is ANSI restrictive enough [;-)] to force sizeof(example1) to be 3?
>>Is anyone aware of existing compilers for which this wouldn't be true?
>
>Sun's C compiler on a Sun 3 (with Sun OS 4.0) aligns structs on two
>byte boundaries....
>Another compiler that uses a constant 2-byte alignment rule is the
>Data General MV-series C compiler....

Interesting.  We dynamically allocate different structures back-to-back
in a common memory arena.  Some structures require word alignment and
some don't.  For a long time we simply used sizeof() to allocate the
next structure (we ran on many different machines without any problem).

We finally ran into an implementation that did not pad the character
structures and got zapped when they got sandwiched in between other
structures that required alignment.  (We solved the problem by rounding
up the sizeof() in our code to an alignment boundary; the result is still
a constant expression folded at compile time.)
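
A sketch of that kind of fix, not the actual code described above.  The
names max_align, ARENA_ALIGN, ALIGN_UP, EXAMPLE1_SLOT and arena_alloc are
all invented, and the alignment unit assumes that long, double and void *
are the most strictly aligned types the arena has to hold.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed worst-case types; sizeof the union is a multiple of its
   alignment, so rounding sizes up to it keeps every slot aligned. */
union max_align { long l; double d; void *p; };

#define ARENA_ALIGN (sizeof(union max_align))
#define ALIGN_UP(n) ((((n) + ARENA_ALIGN - 1) / ARENA_ALIGN) * ARENA_ALIGN)

struct example1 { char a, b, c; };

/* Still an integer constant expression, so it folds at compile time. */
enum { EXAMPLE1_SLOT = ALIGN_UP(sizeof(struct example1)) };

/* A tiny fixed arena; the union member forces suitable base alignment. */
static union { union max_align align; char bytes[256]; } arena;
static size_t arena_used;

/* Hand out the next slot, rounded up so the following one stays aligned.
   Returns NULL when the request does not fit. */
void *arena_alloc(size_t size)
{
    size_t slot = ALIGN_UP(size);
    void *p;

    assert(arena_used % ARENA_ALIGN == 0);
    if (slot < size || slot > sizeof arena.bytes - arena_used)
        return NULL;
    p = arena.bytes + arena_used;
    arena_used += slot;
    return p;
}
```

With this arrangement it no longer matters whether the compiler pads
struct example1 itself, since the arena rounds every request anyway.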

The point is, for many years, systems that did *not* pad character
structures seemed to be the mavericks.  Today, I don't know.
We have it set up so we don't care anymore.

-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott