[net.lang.c] C structs & A question about octet

mcvoy@rsch.WISC.EDU (Lawrence W. McVoy) (11/05/86)

I have a question about alignment and padding.  I have noticed (context:
Vax 780, 4.3BSD) that the C compiler pads out struct sizes to be
long word aligned.  And it does the pointer arithmetic based on the
padded sizes. (no sh*t, sherlock, one would hope that they are the same)
For instance,

    typedef struct {
	    char	byte;
	    short	word;
    } three_bytes;

    sizeof(three_bytes) == 4, not 3.

    three_bytes *p = (three_bytes *)100;

    p == 100, p+1 == 104, not 103.
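A minimal sketch of the stride behaviour described above. It uses a real array rather than the address 100, because converting an arbitrary integer to a pointer is implementation-defined. Stepping a `three_bytes *` by one moves it `sizeof(three_bytes)` bytes, padding included, whatever that size is on a given compiler. The helper name `stride_in_bytes` is hypothetical.

```c
#include <stddef.h>

typedef struct {
    char  byte;
    short word;
} three_bytes;

/* Hypothetical helper: how many bytes apart p and p+1 are.
 * Pointer arithmetic always scales by sizeof(*p), padding included. */
ptrdiff_t stride_in_bytes(void)
{
    three_bytes arr[2];
    three_bytes *p = arr;
    return (char *)(p + 1) - (char *)p;
}
```

On the VAX compiler described in the post this comes out as 4; the exact value is up to the implementation, but it always equals `sizeof(three_bytes)`.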

For all of you that knew this, you're all saying big deal, so what?  Well,
I do (did) stuff like this all the time:

    head = (three_bytes*)calloc(N, sizeof(three_bytes));

This wastes N bytes.  Sometimes N is around 10 to the 7th or 8th.  Bad news.
The fact that pointer arith is "wrong" makes this very icky to work around
even if you are aware of the problem.  Anyone have any comments or
suggestions?  Does everyone except me know about this?

Also, what's this about alignment that I hear all the time?  If compilers are
already aligning things for you, why bother to do it explicitly?  You might
say "so it works on stupid compilers" but who are you to say what alignment
should be?  I mean, if you port code to a machine with 24 bit alignment and
you've carefully aligned all your stuff to 32 bit boundaries, you've screwed
yourself.  No fun.  Also no gain.

OK, next question:  I want to define some types to hold bytes, words, and
long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
long words == 32 unsigned bits.  I want to give them nice names, names
that imply the number of bits.  I could use u8, u16, and u32, but I
don't *like* those names.  I thought I had a better plan:
	use octet for the byte
	use hexdectet for sixteen
	use <latin for two><latin for thirty> for 32
but 32 turned out to be "duotrentet" or something and that's ugly.  So
does anyone have any better names?  Something nice and intuitive and
not ugly?  How about Greek?  How do they spell them?
-- 
Larry McVoy 	        mcvoy@rsch.wisc.edu, 
      		        {seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy

"They're coming soon!  Quad-stated guru-gates!"

pozar@well.UUCP (Tim Pozar) (11/05/86)

   Funny you were mentioning structure alignments.  I was just writing a 
programme that plays with the PSP on an MS-DOS machine.  I couldn't figure out
why the name of the file was always cut off by two bytes.  Oh! the structure
is aligned on the int boundary.  Geezsh.  But there is a switch for the
Microsoft 4.0 C compiler (/Zp) that packs structure members.
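Besides a command-line switch like /Zp, many compilers (later Microsoft compilers, GCC and Clang among them) accept a `#pragma pack` directive that packs only the structures it surrounds. It is not standard C, so the sketch below assumes your compiler supports it; the type and function names are hypothetical. Members of a packed struct may be misaligned, which costs speed or extra code on some machines.

```c
#include <stddef.h>

/* Non-standard but widely supported: lay out members with no padding. */
#pragma pack(push, 1)
typedef struct {
    char  byte;
    short word;
} packed_three;
#pragma pack(pop)

/* Same members with the compiler's normal (padded) layout, for comparison. */
typedef struct {
    char  byte;
    short word;
} padded_three;

size_t packed_size(void)        { return sizeof(packed_three); }
size_t packed_word_offset(void) { return offsetof(packed_three, word); }
```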

    This is for all...
    Is there any spec saying that puts() should add a \n at the end of everything?  My
Microsoft 4.0 compiler does it, and I can't find any reference that describes
puts() doing something like that in K&R.  Is this a new standard?
                         Tim Pozar
 ______________________________
|                              |
| UUCP: ihp4!hplabs!well!pozar |
| Fido: 125/406 Sysop          |
|______________________________|

twb@hoqax.UUCP (BEATTIE) (11/05/86)

> OK, next question:  I want to define some types to hold bytes, words, and
> long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
> long words == 32 unsigned bits.  I want to give them nice names, names
> that imply the number of bits.  I could use u8, u16, and u32, but I
> don't *like* those names.  I thought I had a better plan:
> 	use octet for the byte
> 	use hexdectet for sixteen
> 	use <latin for two><latin for thirty> for 32
> but 32 turned out to be "duotrentet" or something and that's ugly.  So
> does anyone have any better names?  Something nice and intuitive and
> not ugly?  How about Greek?  How do they spell them?
> -- 
> Larry McVoy 	        mcvoy@rsch.wisc.edu, 
>      		        {seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy
	
I use my own typedefs for portability.
I simply redefine the typedefs to get the required length and
characteristics.
For example, SINT32 is a Signed 32-bit Integer anywhere I go;
likewise P means Positive and U means Unsigned.
They are nice and intuitive and not very ugly :-)
My typedefs for the VAX are:
typedef	long		PINT32;
typedef	long		SINT32;
typedef	unsigned long	UINT32;
typedef	long		BIT32;

typedef	short		PINT16;
typedef	short		SINT16;
typedef	unsigned short	UINT16;
typedef	short		BIT16;

typedef	char		PINT8;
typedef	char		SINT8;
typedef	unsigned char	UINT8;
typedef	char		BIT8;
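Redefining the typedefs per machine only helps if each definition really has the width its name claims, and that is cheap to check at compile time. A minimal sketch, assuming a typical modern compiler where `int` is 32 bits and `short` is 16 (on the VAX, `long` was the right choice for the 32-bit types, as above). `CHECK_BITS` is a hypothetical macro: a negative array size makes compilation fail when a width is wrong, a trick that works without any newer language features. Later C standards (C99 onward) also supply fixed-width names such as `int8_t`, `uint16_t` and `uint32_t` in `<stdint.h>`.

```c
#include <limits.h>

/* Assumption: int is 32 bits and short is 16 bits on this compiler. */
typedef int            SINT32;
typedef unsigned int   UINT32;
typedef short          SINT16;
typedef unsigned short UINT16;
typedef signed char    SINT8;   /* plain char may be signed or unsigned */
typedef unsigned char  UINT8;

/* Hypothetical macro: compilation fails (negative array size) unless
 * the type has exactly the number of bits its name claims. */
#define CHECK_BITS(type, bits) \
    typedef char check_##type[(CHAR_BIT * sizeof(type) == (bits)) ? 1 : -1]

CHECK_BITS(SINT32, 32);
CHECK_BITS(UINT32, 32);
CHECK_BITS(SINT16, 16);
CHECK_BITS(UINT16, 16);
CHECK_BITS(SINT8, 8);
CHECK_BITS(UINT8, 8);
```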

---
Tom.
T.W. Beattie
...!{ihnp4 | houxm | whuxl | ulysses}!hoqax!twb
...!{decvax | ucbvax}!ihnp4!hoqax!twb

guy@sun.uucp (Guy Harris) (11/05/86)

> I have a question about alignment and padding.  I have noticed (context:
> Vax 780, 4.3BSD) that the c compiler pads out struct sizes to be
> long word aligned.

Well, no, actually, it doesn't.  It *aligns* members of structures on the
same boundaries that it would align non-structure-member items of the same
type.  Thus, in your example, the "short" would be aligned on a 2-byte
boundary; since the only thing that precedes it is a single "char", it would
require one padding byte between them.
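You can see exactly where a compiler put the padding with `offsetof` from `<stddef.h>`. A minimal sketch; the helper names are hypothetical, and the numbers depend on the compiler. With a 2-byte-aligned `short`, `word` lands at offset 2, leaving the one padding byte described above.

```c
#include <stddef.h>

typedef struct {
    char  byte;
    short word;
} three_bytes;

/* Bytes of padding the compiler inserted between 'byte' and 'word'. */
size_t padding_after_byte(void)
{
    return offsetof(three_bytes, word) - sizeof(char);
}

/* Bytes of padding at the end of the struct (keeps array elements aligned). */
size_t trailing_padding(void)
{
    return sizeof(three_bytes) - (offsetof(three_bytes, word) + sizeof(short));
}
```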

On some machines, it simply doesn't have much of a choice.  It *could*,
presumably, just pack the structure as tightly as possible; however, it
would then have to generate a *lot* of extra code to access members not on
their "proper" boundary.  (No, Virginia, not all machines allow
arbitrary-boundary references to "short"s, "int"s, "long"s, etc.)

On other machines, it could pack the structure as tightly as possible,
because those machines do allow arbitrary-boundary references to "short"s,
"int"s, etc.  It would just mean the code would run more slowly, since few
(if any) machines that allow arbitrary-boundary references do them as
quickly as proper-boundary references.

> For all of you that knew this, you're all saying big deal, so what?  Well,
> I do (did) stuff like this all the time:
> 
>     head = (three_bytes*)calloc(N, sizeof(three_bytes));
> 
> This wastes N bytes.  Sometimes N is around 10 to the 7th or 8th.  Bad news.
> The fact that pointer arith is "wrong" makes this very icky to work around
> even if you are aware of the problem.  Anyone have any comments or
> suggestions?  Does everyone except me know about this?

Allocating arrays that big is relatively uncommon; C's padding rules make
the more common cases work well, and as such are doing the right thing.  I'd
suggest you allocate 3*N bytes as a single array, and then extract the
"short" yourself.  NOTE: if you absolutely insist on doing this extraction
by casting a pointer to the byte following the "char" into a "short *", and
just dereferencing that pointer, surround that code with some "#ifdef" and
put a more portable version in the "#else" clause!
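A minimal sketch of the 3*N-byte approach, assuming each record is stored as the `char` followed immediately by the bytes of the `short`. The names `REC_SIZE`, `rec_alloc`, `rec_set`, `rec_get_byte` and `rec_get_word` are hypothetical. The portable route copies bytes with `memcpy` instead of dereferencing a possibly misaligned `short *`. The short's bytes stay in the machine's native order, so such an array should not be shared with a machine of different byte order.

```c
#include <stdlib.h>
#include <string.h>

#define REC_SIZE (1 + sizeof(short))   /* 3 bytes where short is 2 bytes */

/* Allocate n packed records: no padding, exactly n * REC_SIZE bytes. */
unsigned char *rec_alloc(size_t n)
{
    return calloc(n, REC_SIZE);
}

void rec_set(unsigned char *base, size_t i, char byte, short word)
{
    unsigned char *r = base + i * REC_SIZE;
    r[0] = (unsigned char)byte;
    memcpy(r + 1, &word, sizeof word);   /* safe at any alignment */
}

char rec_get_byte(const unsigned char *base, size_t i)
{
    return (char)base[i * REC_SIZE];
}

short rec_get_word(const unsigned char *base, size_t i)
{
    short word;
    memcpy(&word, base + i * REC_SIZE + 1, sizeof word);
    return word;
}
```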
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

peters@cubsvax.UUCP (Peter S. Shenkin) (11/06/86)

In article <sun.8943> guy@sun.uucp (Guy Harris) writes:

> [words to the effect that structures are internally padded so that members
> wind up on word-boundaries most efficient for that machine]

>> For all of you that knew this, you're all saying big deal, so what?  Well,
>> I do (did) stuff like this all the time:
>> 
>>     head = (three_bytes*)calloc(N, sizeof(three_bytes));
>> 
>> This wastes N bytes.  Sometimes N is around 10 to the 7th or 8th.  Bad news.
>> ...Anyone have any comments or suggestions?
>
>Allocating arrays that big is relatively uncommon; C's padding rules make
>the more common cases work well, and as such are doing the right thing.  I'd
>suggest you allocate 3*N bytes as a single array, and then extract the
>"short" yourself....

Another way to do it:  put your short and your char in different structures,
and allocate storage for them separately.  The program won't be as easy to
read, and you won't feel virtuous (what, don't YOU feel virtuous when you
write a well-structured program?), but as Guy points out this is a time/storage 
trade-off, made by most (if not all) C compilers in favor of time, and at the 
expense of storage.  Even if you had a compiler which allowed you to resolve
the issue in favor of storage, doing it that way would significantly, perhaps
prohibitively, increase the execution time, assuming you really do have to
do something to all those structure elements beyond allocating space for them.
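The separate-allocation idea, sketched with hypothetical names (`struct records`, `records_alloc`, `records_free`): all the `char`s live in one array and all the `short`s in another. Neither array needs padding, so the total is about 3N bytes where `short` is 2 bytes. Record i is then `bytes[i]` together with `words[i]`.

```c
#include <stdlib.h>

/* Two parallel arrays instead of one array of padded structs. */
struct records {
    size_t n;
    char  *bytes;   /* bytes[i] is the 'byte' member of record i */
    short *words;   /* words[i] is the 'word' member of record i */
};

/* Returns 0 on success, -1 if either allocation fails. */
int records_alloc(struct records *r, size_t n)
{
    r->n = n;
    r->bytes = calloc(n, sizeof *r->bytes);
    r->words = calloc(n, sizeof *r->words);
    if (r->bytes == NULL || r->words == NULL) {
        free(r->bytes);
        free(r->words);
        r->bytes = NULL;
        r->words = NULL;
        return -1;
    }
    return 0;
}

void records_free(struct records *r)
{
    free(r->bytes);
    free(r->words);
    r->bytes = NULL;
    r->words = NULL;
}
```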

Time/storage is the best-known trade-off in the programming world, but there 
are others; for instance, programming_ease/program_performance and 
program_readability/program_performance.  Yours is obviously a case of the 
latter, and to some extent of the former as well.

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA

eric@sdcrdcf.UUCP (Eric Lund) (11/06/86)

"The C Programming Language", Kernighan and Ritchie, p. 196: "Each
non-field member of a structure begins on an addressing boundary
appropriate to its type; therefore, there may be unnamed holes in a
structure."

Kernighan and Ritchie say nothing of how big or small the unnamed holes
may be.  I've used one C compiler that byte aligned chars in structs
until it encountered a short, then used even byte alignment until it
encountered a long, and then aligned everything on four byte
boundaries.  There is nothing in the quoted sentence that prevents
anyone from writing a code generator that always aligns every non-field
member on a 4K boundary.

Mayhaps you can use fields for some of your needs, but "Field members
are packed into machine integers; they do not straddle words."
(p. 196), and "... implementations are not required to support any but
integer fields.  Moreover, even int fields may be considered to be
unsigned." (p. 197)  To increase your confidence in predicting the
behavior of the C compiler, p. 212 starts out "Purely hardware issues
like word size...".
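A small sketch of what those rules mean in practice, using only `unsigned int` fields (the portable kind, per the quotes). The fields are packed into a machine integer, so this struct is usually no smaller than the padded `three_bytes`. Also, an unsigned field silently keeps only its low bits. The names `packed_fields` and `store_byte` are hypothetical.

```c
/* Only unsigned int fields: whether plain int fields are signed
 * is up to the implementation. */
struct packed_fields {
    unsigned int byte : 8;
    unsigned int word : 16;
};

/* Store v in the 8-bit field and read it back: an unsigned bit-field
 * keeps v modulo 2^8, so 0x1FF comes back as 0xFF. */
unsigned int store_byte(unsigned int v)
{
    struct packed_fields f;
    f.byte = v;
    f.word = 0;
    return f.byte;
}
```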

(Ever tried sending a raw structure over a heterogeneous network
without benefit of RPC?  Back when you were young and foolish and
didn't know that VAXen and SUNs have different byte orders?  And you
thought that structs were all the same?  You have my sympathies!)
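The usual cure, short of RPC, is to write each member out byte by byte in an agreed order instead of sending the raw struct. A minimal sketch, assuming a wire format (chosen for illustration) of the `char` followed by the `short` as a 16-bit big-endian value, three bytes in all. `encode_three` and `decode_three` are hypothetical names, and a 16-bit `short` is assumed. Because it uses shifts rather than memory layout, the same code produces the same bytes on a little-endian VAX and a big-endian Sun.

```c
typedef struct {
    char  byte;
    short word;
} three_bytes;

/* Write t into out[0..2]: byte first, then word, most significant byte first. */
void encode_three(const three_bytes *t, unsigned char out[3])
{
    unsigned int w = (unsigned short)t->word;   /* value modulo 65536 */
    out[0] = (unsigned char)t->byte;
    out[1] = (unsigned char)(w >> 8);
    out[2] = (unsigned char)(w & 0xFF);
}

/* Rebuild a three_bytes from the 3-byte wire format. */
void decode_three(const unsigned char in[3], three_bytes *t)
{
    unsigned int w = ((unsigned int)in[1] << 8) | in[2];
    t->byte = (char)in[0];
    /* Map 0x8000..0xFFFF back to negative values without relying on
     * implementation-defined conversions. */
    t->word = (short)(w > 0x7FFF ? (long)w - 0x10000L : (long)w);
}
```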


You could name bit types of differing lengths VIII, XVI, and XXXII,
but I don't think that that's what you intend.

					Eric the DBA

Disclaimer: any opinions found herein are mine.  Please return them;
no questions asked.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)

In article <2005@well.UUCP> pozar@well.UUCP (Tim Pozar) writes:
>    Is there any spec saying that puts() should add a \n at the end of everything?

Yes; this is what the original puts() did and what every puts() I
have ever seen does.  Note that fputs() does NOT append a newline.
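To make the difference concrete: `puts(s)` writes `s` to `stdout` followed by a newline, while `fputs(s, fp)` writes exactly `s` to the given stream. A minimal sketch of a puts-like function for any stream, built from `fputs`. The name `fputs_line` is hypothetical; like `puts`, it returns a nonnegative value on success and `EOF` on error.

```c
#include <stdio.h>

/* Behaves like puts(s), but on an arbitrary stream:
 * fputs() adds nothing, so the newline is written explicitly. */
int fputs_line(const char *s, FILE *fp)
{
    if (fputs(s, fp) == EOF)
        return EOF;
    return putc('\n', fp) == EOF ? EOF : 1;
}
```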

henry@utzoo.UUCP (Henry Spencer) (11/07/86)

>     Is there any spec saying that puts() should add a \n at the end of everything?  My
> Microsoft 4.0 compiler does it, and I can't find any reference that describes
> puts() doing something like that in K&R.  Is this a new standard?

No, it's an extremely old one.  You won't find puts() (or its friend gets())
in K&R at all -- they are too old and too thoroughly obsolete.  Try using
fputs(), which is the modern equivalent (and does not add a newline).
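The same advice applies to `gets()`: it cannot be told how big the buffer is, and it was eventually removed from the language in the 2011 C standard. `fgets()` takes the buffer size but keeps the newline, so a line reader built on it typically strips that. A minimal sketch; the name `read_line` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp into buf (at most size-1 chars), dropping the
 * trailing newline if present.  Returns buf, or NULL at end of file. */
char *read_line(char *buf, int size, FILE *fp)
{
    if (fgets(buf, size, fp) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

A line longer than the buffer comes back truncated; the rest of it stays in the stream for the next call.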
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry