mcvoy@rsch.WISC.EDU (Lawrence W. McVoy) (11/05/86)
I have a question about alignment and padding. I have noticed (context:
Vax 780, 4.3BSD) that the c compiler pads out struct sizes to be
long word aligned. And it does the pointer arithmetic based on the
padded sizes. (no sh*t, sherlock, one would hope that they are the same)
For instance,
    typedef struct {
        char byte;
        short word;
    } three_bytes;
sizeof(three_bytes) == 4, not 3.
    three_bytes *p = (three_bytes *)100;
p == 100, p+1 == 104, not 103.
For all of you that knew this, you're all saying big deal, so what? Well,
I do (did) stuff like this all the time:
    head = (three_bytes*)calloc(N, sizeof(three_bytes));
This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.
The fact that pointer arith is "wrong" makes this very icky to work around
even if you are aware of the problem. Anyone have any comments or
suggestions? Does everyone except me know about this?
Also, what's this about alignment that I hear all the time? If compilers are
already aligning things for you, why bother to do it explicitly? You might
say "so it works on stupid compilers" but who are you to say what alignment
should be? I mean, if you port code to a machine with 24 bit alignment and
you've carefully aligned all your stuff to 32 bit boundaries, you've screwed
yourself. No fun. Also no gain.
OK, next question: I want to define some types to hold bytes, words, and
long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
long words == 32 unsigned bits. I want to give them nice names, names
that imply the number of bits. I could use u8, u16, and u32, but I
don't *like* those names. I thought I had a better plan:
use octet for the byte
use hexdectet for sixteen
use <latin for two>, <latin for 30> for 32
but 32 turned out to be "duotrentet" or something and that's ugly. So
does anyone have any better names? Something nice and intuitive and
not ugly? How about Greek? How do they spell them?
--
Larry McVoy mcvoy@rsch.wisc.edu,
{seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy
"They're coming soon! Quad-stated guru-gates!"
pozar@well.UUCP (Tim Pozar) (11/05/86)
Funny you were mentioning structure alignments. I was just writing a
programme that plays with the PSP on an MS-DOS machine. I couldn't figure
out why the name of the file was always cut off by two bytes. Oh! The
structure is aligned on the int boundary. Geezsh. But there is a switch
for the Microsoft 4.0 C compiler (/Zp) that packs structure members.

This is for all... Is there any spec that a puts() should add a \n at the
end of everything? My Microsoft 4.0 compiler does it, and I can't find any
reference that describes puts() doing something like that in K&R. Is this
a new standard?

Tim Pozar
UUCP: ihp4!hplabs!well!pozar
Fido: 125/406 Sysop
twb@hoqax.UUCP (BEATTIE) (11/05/86)
> OK, next question: I want to define some types to hold bytes, words, and
> long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
> long words == 32 unsigned bits. I want to give them nice names, names
> that imply the number of bits. I could use u8, u16, and u32, but I
> don't *like* those names. I thought I had a better plan:
> use octet for the byte
> use hexdectet for sixteen
> use <latin for two>, <latin for 30> for 32
> but 32 turned out to be "duotrentet" or something and that's ugly. So
> does anyone have any better names? Something nice and intuitive and
> not ugly? How about Greek? How do they spell them?
> --
> Larry McVoy mcvoy@rsch.wisc.edu,
> {seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy

I use my own typedefs for portability. I simply redefine the typedefs
to get the required length and characteristics. For example, SINT32 is a
Signed 32 bit Integer anywhere I go, and
    P - Positive, U - Unsigned
They are nice and intuitive and not very ugly :-)

My typedefs for the VAX are:
    typedef long PINT32;
    typedef long SINT32;
    typedef unsigned long UINT32;
    typedef long BIT32;
    typedef short PINT16;
    typedef short SINT16;
    typedef unsigned short UINT16;
    typedef short BIT16;
    typedef char PINT8;
    typedef char SINT8;
    typedef unsigned char UINT8;
    typedef char BIT8;
---
Tom.
T.W. Beattie
...!{ihnp4 | houxm | whuxl | ulysses}!hoqax!twb
...!{decvax | ucbvax}!ihnp4!hoqax!twb
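
A scheme like Tom's only helps if each typedef really has the width its
name promises on the machine at hand. The following is a minimal sketch of
a startup check, not part of the original post. The names UINT8, UINT16
and UINT32 follow his convention, but the underlying types chosen here
(unsigned int rather than unsigned long for 32 bits), the BITS macro and
check_widths() are assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative names only. The underlying types are an assumption and
   must be revisited for each machine the code is ported to. */
typedef unsigned char  UINT8;
typedef unsigned short UINT16;
typedef unsigned int   UINT32;

/* Number of bits in an object of type T. */
#define BITS(T) ((int)(sizeof(T) * CHAR_BIT))

/* Call once at startup: aborts if any typedef has the wrong width. */
void check_widths(void)
{
    assert(BITS(UINT8)  == 8);
    assert(BITS(UINT16) == 16);
    assert(BITS(UINT32) == 32);
}
```

Note also that whether plain char is signed is up to the implementation,
so a typedef such as SINT8 built on plain char needs the same kind of
per-machine review.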
guy@sun.uucp (Guy Harris) (11/05/86)
> I have a question about alignment and padding. I have noticed (context:
> Vax 780, 4.3BSD) that the c compiler pads out struct sizes to be
> long word aligned.

Well, no, actually, it doesn't. It *aligns* members of structures on the
same boundaries that it would align non-structure-member items of the same
type. Thus, in your example, the "short" would be aligned on a 2-byte
boundary; since the only thing that precedes it is a single "char", it
would require one padding byte between them.

On some machines, it simply doesn't have much of a choice. It *could*,
presumably, just pack the structure as tightly as possible; however, it
would then have to generate a *lot* of extra code to access members not on
their "proper" boundary. (No, Virginia, not all machines allow
arbitrary-boundary references to "short"s, "int"s, "long"s, etc.)

On other machines, it could pack the structure as tightly as possible,
because those machines do allow arbitrary-boundary references to "short"s,
"int"s, etc. It would just mean the code would run more slowly, since few
(if any) machines that allow arbitrary-boundary references do them as
quickly as proper-boundary references.

> For all of you that knew this, you're all saying big deal, so what? Well,
> I do (did) stuff like this all the time:
>
> head = (three_bytes*)calloc(N, sizeof(three_bytes));
>
> This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.
> The fact that pointer arith is "wrong" makes this very icky to work around
> even if you are aware of the problem. Anyone have any comments or
> suggestions? Does everyone except me know about this?

Allocating arrays that big is relatively uncommon; C's padding rules make
the more common cases work well, and as such are doing the right thing. I'd
suggest you allocate 3*N bytes as a single array, and then extract the
"short" yourself.

NOTE: if you absolutely insist on doing this extraction by casting a
pointer to the byte following the "char" into a "short *", and just
dereferencing that pointer, surround that code with some "#ifdef" and put a
more portable version in the "#else" clause!
--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)
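
Guy's suggestion of a single 3*N byte array can be sketched as follows.
This is an illustration under stated assumptions rather than his code: the
rec_* helper names and REC_SIZE are hypothetical, the 16-bit member is
treated as unsigned and stored low byte first (the VAX order), and only
byte accesses are used, so no alignment requirement ever applies and no
"short *" cast is needed.

```c
#include <assert.h>
#include <stdlib.h>

#define REC_SIZE 3   /* one char plus one 16-bit value, no padding */

/* Allocate n packed records, zero-filled. Returns NULL on failure. */
unsigned char *rec_alloc(size_t n)
{
    return (unsigned char *)calloc(n, REC_SIZE);
}

/* Store the char member of record i. */
void rec_set_byte(unsigned char *base, size_t i, unsigned char b)
{
    base[i * REC_SIZE] = b;
}

/* Fetch the char member of record i. */
unsigned char rec_get_byte(const unsigned char *base, size_t i)
{
    return base[i * REC_SIZE];
}

/* Store the 16-bit member of record i, low byte first. */
void rec_set_word(unsigned char *base, size_t i, unsigned int w)
{
    unsigned char *p = base + i * REC_SIZE + 1;

    assert(w <= 0xFFFFu);               /* must fit in 16 bits */
    p[0] = (unsigned char)(w & 0xFFu);
    p[1] = (unsigned char)(w >> 8);
}

/* Fetch the 16-bit member of record i by reassembling its two bytes. */
unsigned int rec_get_word(const unsigned char *base, size_t i)
{
    const unsigned char *p = base + i * REC_SIZE + 1;

    return (unsigned int)p[0] | ((unsigned int)p[1] << 8);
}
```

The price is the one Guy describes: every access to the 16-bit member costs
two byte loads or stores plus a shift, in exchange for saving one byte per
record.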
peters@cubsvax.UUCP (Peter S. Shenkin) (11/06/86)
In article <sun.8943> guy@sun.uucp (Guy Harris) writes:
> [words to the effect that structures are internally padded so that members
> wind up on word-boundaries most efficient for that machine]
>> For all of you that knew this, you're all saying big deal, so what? Well,
>> I do (did) stuff like this all the time:
>>
>> head = (three_bytes*)calloc(N, sizeof(three_bytes));
>>
>> This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.
>> ...Anyone have any comments or suggestions?
>
>Allocating arrays that big is relatively uncommon; C's padding rules make
>the more common cases work well, and as such are doing the right thing. I'd
>suggest you allocate 3*N bytes as a single array, and then extract the
>"short" yourself....

Another way to do it: put your short and your char in different
structures, and allocate storage for them separately. The program won't be
as easy to read, and you won't feel virtuous (what, don't YOU feel virtuous
when you write a well-structured program?), but as Guy points out this is a
time/storage trade-off, made by most (if not all) C compilers in favor of
time, and at the expense of storage. Even if you had a compiler which
allowed you to resolve the issue in favor of storage, doing it that way
would significantly, perhaps prohibitively, increase the execution time,
assuming you really do have to do something to all those structure elements
beyond allocating space for them.

Time/storage is the best-known trade-off in the programming world, but
there are others; for instance, programming_ease/program_performance and
program_readability/program_performance. Yours is obviously a case of the
latter, and to some extent of the former as well.

Peter S. Shenkin
Columbia Univ. Biology Dept., NY, NY 10027
{philabs,rna}!cubsvax!peters
cubsvax!peters@columbia.ARPA
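
Peter's alternative, keeping the chars and the shorts in separately
allocated storage, might look something like the sketch below. The
split_table type and the split_* functions are hypothetical names
introduced for illustration. Element i of the original array of structs
becomes the pair bytes[i], words[i], and each array is naturally aligned
for its own element type, so no padding is needed between elements.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container: what used to be head[i].byte and head[i].word
   now live at bytes[i] and words[i]. */
typedef struct {
    size_t n;
    char  *bytes;
    short *words;
} split_table;

/* Allocate both arrays, zero-filled. Returns 1 on success, 0 on failure
   (in which case nothing is left allocated). */
int split_alloc(split_table *t, size_t n)
{
    assert(t != NULL);
    t->n = n;
    t->bytes = (char *)calloc(n, sizeof(char));
    t->words = (short *)calloc(n, sizeof(short));
    if (t->bytes == NULL || t->words == NULL) {
        free(t->bytes);
        free(t->words);
        t->bytes = NULL;
        t->words = NULL;
        return 0;
    }
    return 1;
}

/* Release both arrays and reset the table. */
void split_free(split_table *t)
{
    assert(t != NULL);
    free(t->bytes);
    free(t->words);
    t->bytes = NULL;
    t->words = NULL;
    t->n = 0;
}
```

Unlike a tightly packed struct, this layout keeps every short on its proper
boundary, so the storage saving does not cost slower member accesses. What
it does cost is the readability Peter mentions, since one logical record is
now spread over two arrays.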
eric@sdcrdcf.UUCP (Eric Lund) (11/06/86)
"The C Programming Language", Kernighan and Ritchie, p. 196: "Each
non-field member of a structure begins on an addressing boundary
appropriate to its type; therefore, there may be unnamed holes in a
structure."

Kernighan and Ritchie say nothing of how big or small the unnamed holes
may be. I've used one C compiler that byte aligned chars in structs until
it encountered a short, then used even byte alignment until it encountered
a long, and then aligned everything on four byte boundaries. There is
nothing in the quoted sentence that prevents anyone from writing a code
generator that always aligns every non-field member on a 4K boundary.

Mayhaps you can use fields for some of your needs, but "Field members are
packed into machine integers; they do not straddle words." (p. 196), and
"... implementations are not required to support any but integer fields.
Moreover, even int fields may be considered to be unsigned." (p. 197)

To increase your confidence in predicting the behavior of the C compiler,
p. 212 starts out "Purely hardware issues like word size...".

(Ever tried sending a raw structure over a heterogeneous network without
benefit of RPC? Back when you were young and foolish and didn't know that
VAXen and SUNs have different byte orders? And you thought that structs
were all the same? You have my sympathies!)

You could name bit types of differing lengths VIII, XVI, and XXXII, but I
don't think that that's what you intend.

Eric the DBA
Disclaimer: any opinions found herein are mine. Please return them; no
questions asked.
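
Since the size of the holes is left to the compiler, the only reliable way
to learn it is to ask the compiler. The following is a minimal sketch, not
something from the thread. It assumes an ANSI C compiler that provides
offsetof in <stddef.h>, which is newer than the compilers discussed here,
and the three function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    char  byte;
    short word;
} three_bytes;

/* Padding bytes the compiler placed between `byte` and `word`. */
size_t hole_before_word(void)
{
    return offsetof(three_bytes, word) - sizeof(char);
}

/* Padding bytes the compiler placed after the last member. */
size_t hole_at_end(void)
{
    return sizeof(three_bytes)
         - (offsetof(three_bytes, word) + sizeof(short));
}

/* All padding in the struct, wherever it sits. */
size_t total_padding(void)
{
    /* The first member always begins at offset 0. */
    assert(offsetof(three_bytes, byte) == 0);
    return sizeof(three_bytes) - (sizeof(char) + sizeof(short));
}
```

On a compiler that aligns short on a 2-byte boundary, hole_before_word()
reports the single padding byte described earlier in the thread. A compiler
with different rules will report different numbers, which is exactly why
the layout should be measured rather than assumed.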
gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)
In article <2005@well.UUCP> pozar@well.UUCP (Tim Pozar) writes:
> Is there any spec that a puts() should add a \n at the end of everything?

Yes; this is what the original puts() did and what every puts() I have ever
seen does. Note that fputs() does NOT append a newline.
henry@utzoo.UUCP (Henry Spencer) (11/07/86)
> Is there any spec that a puts() should add a \n at the end of everything? My
> Microsoft 4.0 compiler does it, and I can't find any reference that describes
> puts() doing something like that in K&R. Is this a new standard?

No, it's an extremely old one. You won't find puts() (or its friend gets())
in K&R at all -- they are too old and too thoroughly obsolete. Try using
fputs(), which is the modern equivalent (and does not add a newline).
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,decvax,pyramid}!utzoo!henry
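
The difference between the two calls can be made concrete with a small
sketch. The helper put_line() is a hypothetical name, written here only to
spell out what puts() adds on top of fputs(); it is not a library function.

```c
#include <assert.h>
#include <stdio.h>

/* Write s to fp followed by a newline, the way puts() does for stdout.
   fputs() alone writes the string and nothing else.
   Returns 0 on success, EOF on a write error. */
int put_line(const char *s, FILE *fp)
{
    assert(s != NULL && fp != NULL);
    if (fputs(s, fp) == EOF)
        return EOF;
    if (fputc('\n', fp) == EOF)
        return EOF;
    return 0;
}
```

With this helper, put_line("hello", stdout) writes the same bytes as
puts("hello"), while fputs("hello", stdout) stops after the final 'o'.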