[comp.lang.c] C structs & A question about octet

mcvoy@rsch.WISC.EDU (Lawrence W. McVoy) (11/05/86)

I have a question about alignment and padding.  I have noticed (context:
VAX 780, 4.3BSD) that the C compiler pads out struct sizes to be
long word aligned.  And it does the pointer arithmetic based on the
padded sizes. (no sh*t, sherlock, one would hope that they are the same)
For instance,

    typedef struct {
	    char	byte;
	    short	word;
    } three_bytes;

    sizeof(three_bytes) == 4, not 3.

    three_bytes* p = (three_bytes*)100;

    p == 100, p+1 == 104, not 103.
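
A minimal sketch of the same observation in portable C.  The helper
names padding_bytes() and element_stride() are made up for
illustration; the exact numbers depend on the compiler and machine, so
the code measures them rather than assuming 4 and 3.

```c
#include <stddef.h>

typedef struct {
    char  byte;
    short word;
} three_bytes;

/* Bytes the compiler added beyond the members' own sizes. */
size_t padding_bytes(void)
{
    return sizeof(three_bytes) - (sizeof(char) + sizeof(short));
}

/* Distance in bytes between consecutive array elements.
 * Pointer arithmetic steps by exactly this amount, which is
 * always sizeof(three_bytes), padding included. */
size_t element_stride(void)
{
    three_bytes pair[2];
    return (size_t)((char *)&pair[1] - (char *)&pair[0]);
}
```

On a machine like the one described, padding_bytes() would be 1 and
element_stride() would be 4; p+1 can never land on 103 because the
short in the next element would then be misaligned.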

For all of you that knew this, you're all saying big deal, so what?  Well,
I do (did) stuff like this all the time:

    head = (three_bytes*)calloc(N, sizeof(three_bytes));

This wastes N bytes.  Sometimes N is around 10 to the 7th or 8th.  Bad news.
The fact that pointer arith is "wrong" makes this very icky to work around
even if you are aware of the problem.  Anyone have any comments or
suggestions?  Does everyone except me know about this?

Also, what's this about alignment that I hear all the time?  If compilers are
already aligning things for you, why bother to do it explicitly?  You might
say "so it works on stupid compilers" but who are you to say what alignment
should be?  I mean, if you port code to a machine with 24 bit alignment and
you've carefully aligned all your stuff to 32 bit boundaries, you've screwed
yourself.  No fun.  Also no gain.

OK, next question:  I want to define some types to hold bytes, words, and
long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
long words == 32 unsigned bits.  I want to give them nice names, names
that imply the number of bits.  I could use u8, u16, and u32, but I
don't *like* those names.  I thought I had a better plan:
	use octet for the byte
	use hexdectet for sixteen
	use <Latin for two>, <Latin for 30> for 32
but 32 turned out to be "duotrentet" or something and that's ugly.  So
does anyone have any better names?  Something nice and intuitive and
not ugly?  How about Greek?  How do they spell them?
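
For readers on a later compiler: C99 added <stdint.h>, whose uint8_t,
uint16_t, and uint32_t are exact-width unsigned types where the
platform has them.  A sketch of naming them as proposed above; the
typedef names and the BITS_OF macro are illustrative only, not
standard.

```c
#include <limits.h>
#include <stdint.h>

/* Illustrative names layered on the C99 exact-width types. */
typedef uint8_t  octet;      /* exactly 8 unsigned bits  */
typedef uint16_t hexdectet;  /* exactly 16 unsigned bits */
typedef uint32_t u32;        /* exactly 32 unsigned bits */

/* Width of a type in bits, counting any padding bits it may have. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)
```

If the platform has no type of exactly that width, <stdint.h> does not
define the name and the typedef fails at compile time rather than
silently picking a wider type.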
-- 
Larry McVoy 	        mcvoy@rsch.wisc.edu, 
      		        {seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy

"They're coming soon!  Quad-stated guru-gates!"

eric@sdcrdcf.UUCP (Eric Lund) (11/06/86)

"The C Programming Language", Kernighan and Ritchie, p. 196: "Each
non-field member of a structure begins on an addressing boundary
appropriate to its type; therefore, there may be unnamed holes in a
structure."

Kernighan and Ritchie say nothing of how big or small the unnamed holes
may be.  I've used one C compiler that byte aligned chars in structs
until it encountered a short, then used even byte alignment until it
encountered a long, and then aligned everything on four byte
boundaries.  There is nothing in the quoted sentence that prevents
anyone from writing a code generator that always aligns every non-field
member on a 4K boundary.
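
Since the holes are up to the implementation, one way to see what a
given compiler did is to ask it with offsetof.  A sketch, assuming a
hypothetical struct mixed with one char, one short, and one long; the
helper names are made up for illustration.

```c
#include <stddef.h>

struct mixed {
    char  c;
    short s;
    long  l;
};

/* Unnamed hole between the end of c and the start of s. */
size_t hole_after_c(void)
{
    return offsetof(struct mixed, s)
         - (offsetof(struct mixed, c) + sizeof(char));
}

/* Unnamed hole between the end of s and the start of l. */
size_t hole_after_s(void)
{
    return offsetof(struct mixed, l)
         - (offsetof(struct mixed, s) + sizeof(short));
}

/* Padding after the last member, added so arrays stay aligned. */
size_t tail_padding(void)
{
    return sizeof(struct mixed)
         - (offsetof(struct mixed, l) + sizeof(long));
}
```

The three results plus the members' own sizes always add up to
sizeof(struct mixed), whatever alignment policy the compiler uses.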

Mayhaps you can use fields for some of your needs, but "Field members
are packed into machine integers; they do not straddle words."
(p. 196), and "... implementations are not required to support any but
integer fields.  Moreover, even int fields may be considered to be
unsigned." (p. 197)  To increase your confidence in predicting the
behavior of the C compiler, p. 212 starts out "Purely hardware issues
like word size...".

(Ever tried sending a raw structure over a heterogeneous network
without benefit of RPC?  Back when you were young and foolish and
didn't know that VAXen and SUNs have different byte orders?  And you
thought that structs were all the same?  You have my sympathies!)
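
A sketch of the usual way around that problem: write each member out
byte by byte in a fixed order instead of sending the raw struct.  The
3-byte layout, the big-endian choice for the short, and the function
names are assumptions made for illustration, and the code assumes the
value fits in 16 bits.

```c
typedef struct {
    char  byte;
    short word;
} three_bytes;

enum { WIRE_SIZE = 3 };  /* bytes on the wire: no padding */

/* Encode as: byte, then the low 16 bits of word, high byte first. */
void pack_record(const three_bytes *r, unsigned char out[WIRE_SIZE])
{
    unsigned int w = (unsigned int)r->word & 0xFFFFu;

    out[0] = (unsigned char)r->byte;
    out[1] = (unsigned char)((w >> 8) & 0xFFu);
    out[2] = (unsigned char)(w & 0xFFu);
}

/* Decode the same layout, independent of the host's byte order. */
void unpack_record(const unsigned char in[WIRE_SIZE], three_bytes *r)
{
    unsigned int w = ((unsigned int)in[1] << 8) | (unsigned int)in[2];

    r->byte = (char)in[0];  /* values above 127 depend on char's signedness */
    /* Map 0x8000..0xFFFF back to negative values without overflow. */
    r->word = (short)(w >= 0x8000u ? (int)w - 0x10000 : (int)w);
}
```

Both ends agree on the byte sequence, so neither the host's byte order
nor its struct padding leaks onto the wire.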


You could name bit types of differing lengths VIII, XVI, and XXXII,
but I don't think that that's what you intend.

					Eric the DBA

Disclaimer: any opinions found herein are mine.  Please return them;
no questions asked.

perl@rdin.UUCP (Robert Perlberg) (11/13/86)

>     head = (three_bytes*)calloc(N, sizeof(three_bytes));
> 
> This wastes N bytes.

It still would even if sizeof(three_bytes)==3.  Malloc allocates
storage that is properly aligned for any data type.  Therefore, even if
you passed 3 instead of 4, malloc would still skip to the next big
boundary.  I suppose that someone could write a calloc that would align
optimally based on the size of the data item being calloc'd, but how
would it know just from the sizeof argument what the proper alignment
for the data type should be?  As far as I know, most, if not all,
implementations of calloc just multiply their arguments and invoke
malloc to allocate the storage.
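
A sketch of the multiply-and-malloc scheme described above.  The name
sketch_calloc is made up, and this is not any particular library's
source; the overflow check is an addition that such an implementation
would reasonably carry.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate count * size zeroed bytes.  The function sees only two
 * numbers, never the element type, so it cannot infer an alignment
 * and relies on malloc's worst-case alignment instead. */
void *sketch_calloc(size_t count, size_t size)
{
    size_t total;
    void *p;

    if (size != 0 && count > SIZE_MAX / size)
        return NULL;              /* count * size would overflow */

    total = count * size;
    p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);      /* calloc's contract: all bytes zero */
    return p;
}
```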

Robert Perlberg
Resource Dynamics Inc.
New York
{philabs|delftcc}!rdin!perl

jsdy@hadron.UUCP (Joseph S. D. Yao) (11/24/86)

In article <589@rdin.UUCP> perl@rdin.UUCP (Robert Perlberg) writes:
>>     head = (three_bytes*)calloc(N, sizeof(three_bytes));
>> This wastes N bytes.
>It still would even if sizeof(three_bytes)==3.  Malloc allocates
>storage that is properly aligned for any data type.  Therefore, even if
>you passed 3 instead of 4, malloc would still skip to the next big
>boundary.  I suppose that someone could write a calloc that would align
>optimally based on the size of the data item being calloc'd, but how
>would it know just from the sizeof argument what the proper alignment
>for the data type should be?  As far as I know, most, if not all,
>implementations of calloc just multiply their arguments and invoke
>malloc to allocate the storage.

You all probably did the multiplication and said, hunh?? after
this.  The problem is that, even though malloc() returns something
longword aligned, if sizeof(...) == 3, then malloc() will
get an argument of 3*N, which is 3/4 of 4*N!  E.g., for N == 9,
malloc() would return 27 (well, 28) bytes instead of 36.  As the
argument above says, calloc() doesn't know to round the numbers,
it just multiplies them.  To get each data object on a "proper"
longword boundary, the sizeof() returns 4.
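
One possible workaround for the original space problem, sketched under
the assumption that the two members need not sit next to each other in
memory: keep the chars and the shorts in two parallel arrays, so no
per-element padding is needed.  The type split_table and all function
names are hypothetical.

```c
#include <stdlib.h>

typedef struct {
    char  byte;
    short word;
} three_bytes;

/* Parallel arrays: element i is (bytes[i], words[i]). */
typedef struct {
    char  *bytes;
    short *words;
    size_t count;
} split_table;

/* Storage for n padded structs, as in the calloc call above. */
size_t padded_bytes(size_t n) { return n * sizeof(three_bytes); }

/* Storage for the same n records held as two arrays. */
size_t split_bytes(size_t n) { return n * (sizeof(char) + sizeof(short)); }

/* Returns 0 on success, -1 if either allocation fails.  Note that
 * calloc(0, ...) may return NULL, so n == 0 can report failure. */
int split_table_init(split_table *t, size_t n)
{
    t->bytes = calloc(n, sizeof *t->bytes);
    t->words = calloc(n, sizeof *t->words);
    t->count = n;
    if (t->bytes == NULL || t->words == NULL) {
        free(t->bytes);
        free(t->words);
        t->bytes = NULL;
        t->words = NULL;
        t->count = 0;
        return -1;
    }
    return 0;
}

void split_table_free(split_table *t)
{
    free(t->bytes);
    free(t->words);
    t->bytes = NULL;
    t->words = NULL;
    t->count = 0;
}
```

With 2-byte shorts, split_bytes(9) is the 27 bytes mentioned above
against 36 for padded_bytes(9) on a compiler that pads the struct to
4, at the cost of indexing two arrays instead of one.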
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
			jsdy@hadron.COM (not yet domainised)