mcvoy@rsch.WISC.EDU (Lawrence W. McVoy) (11/05/86)
I have a question about alignment and padding. I have noticed (context:
Vax 780, 4.3BSD) that the C compiler pads out struct sizes to be
long word aligned. And it does the pointer arithmetic based on the
padded sizes. (no sh*t, sherlock, one would hope that they are the same)
For instance,
typedef struct {
    char byte;
    short word;
} three_bytes;
sizeof(three_bytes) == 4, not 3.
three_bytes* p = (three_bytes*)100;
p == 100, p+1 == 104, not 103.
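The numbers above can be checked on any given compiler with a sketch along these lines. The helper names (padding_bytes, pointer_stride, word_offset) are made up for illustration, and the actual values depend on the compiler and machine; 4, 4 and 2 are only what one would expect where a short needs 2-byte alignment.

```c
#include <stddef.h>  /* offsetof, size_t */

typedef struct {
    char  byte;
    short word;
} three_bytes;

/* Padding the compiler added: total size minus the sum of the member sizes. */
size_t padding_bytes(void)
{
    return sizeof(three_bytes) - (sizeof(char) + sizeof(short));
}

/* Distance in bytes between p and p + 1 for a three_bytes pointer.
 * Pointer arithmetic always steps by sizeof(three_bytes), padding included. */
size_t pointer_stride(void)
{
    three_bytes pair[2];
    return (size_t)((char *)(pair + 1) - (char *)pair);
}

/* Offset of the short member. Anything larger than sizeof(char)
 * means there is a hole between `byte` and `word`. */
size_t word_offset(void)
{
    return offsetof(three_bytes, word);
}
```

Printing these three values shows where the extra byte lives; on most compilers it is the hole in front of `word`, not trailing padding.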
For all of you that knew this, you're all saying big deal, so what? Well,
I do (did) stuff like this all the time:
head = (three_bytes*)calloc(N, sizeof(three_bytes));
This wastes N bytes. Sometimes N is around 10 to the 7th or 8th. Bad news.
The fact that pointer arith is "wrong" makes this very icky to work around
even if you are aware of the problem. Anyone have any comments or
suggestions? Does everyone except me know about this?
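One possible workaround, sketched here under assumptions, is to give up the struct and keep the records in a plain byte array, 3 bytes per record, with small accessor functions doing the indexing. All names below (RECORD_BYTES, records_alloc, record_set, record_byte, record_word) are hypothetical, and the sketch assumes the word only needs its low 16 bits stored.

```c
#include <stdlib.h>  /* calloc, free */
#include <stddef.h>  /* size_t */

#define RECORD_BYTES 3  /* one byte plus one 16-bit value, no padding */

/* Allocate n zeroed records at exactly 3 bytes each. */
unsigned char *records_alloc(size_t n)
{
    return (unsigned char *)calloc(n, RECORD_BYTES);
}

/* Store record i; the 16-bit value is split into two bytes, low byte first. */
void record_set(unsigned char *base, size_t i,
                unsigned char byte, unsigned short word)
{
    unsigned char *r = base + i * RECORD_BYTES;
    r[0] = byte;
    r[1] = (unsigned char)(word & 0xFF);
    r[2] = (unsigned char)((word >> 8) & 0xFF);
}

/* Read back the byte field of record i. */
unsigned char record_byte(const unsigned char *base, size_t i)
{
    return base[i * RECORD_BYTES];
}

/* Read back the 16-bit field of record i, reassembled from two bytes. */
unsigned short record_word(const unsigned char *base, size_t i)
{
    const unsigned char *r = base + i * RECORD_BYTES;
    return (unsigned short)(r[1] | (r[2] << 8));
}
```

This trades one byte per record for slower, byte-at-a-time access to the 16-bit field, so it is only likely to pay off when N really is in the millions.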
Also, what's this about alignment that I hear all the time? If compilers are
already aligning things for you, why bother to do it explicitly? You might
say "so it works on stupid compilers" but who are you to say what alignment
should be? I mean, if you port code to a machine with 24 bit alignment and
you've carefully aligned all your stuff to 32 bit boundaries, you've screwed
yourself. No fun. Also no gain.
OK, next question: I want to define some types to hold bytes, words, and
long words, where byte == 8 unsigned bits, word == 16 unsigned bits, and
long words == 32 unsigned bits. I want to give them nice names, names
that imply the number of bits. I could use u8, u16, and u32, but I
don't *like* those names. I thought I had a better plan:
use octet for the byte
use hexdectet for sixteen
use <latin for two, latin for 30> for 32
but 32 turned out to be "duotrentet" or something and that's ugly. So
does anyone have any better names? Something nice and intuitive and
not ugly? How about Greek? How do they spell them?
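Whatever names win, the typedefs themselves might look like the sketch below. It assumes a compiler that provides the C99 header <stdint.h>, which is much newer than this thread; on an older compiler the right-hand sides would have to be unsigned char, unsigned short and unsigned long, checked by hand. The names word16 and long32 are placeholders, not a suggestion.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stdint.h>  /* uint8_t, uint16_t, uint32_t (C99) */

typedef uint8_t  octet;   /* exactly  8 unsigned bits */
typedef uint16_t word16;  /* exactly 16 unsigned bits (placeholder name) */
typedef uint32_t long32;  /* exactly 32 unsigned bits (placeholder name) */

/* Width of a type in bits. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Compile-time checks: the array size becomes -1, which is an error,
 * if a type does not have the width its name promises. */
typedef char octet_is_8_bits[BITS_OF(octet)   ==  8 ? 1 : -1];
typedef char word16_is_16_bits[BITS_OF(word16) == 16 ? 1 : -1];
typedef char long32_is_32_bits[BITS_OF(long32) == 32 ? 1 : -1];
```

Keeping the width check next to the typedef means a port to an odd machine fails at compile time instead of silently changing the sizes.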
--
Larry McVoy mcvoy@rsch.wisc.edu,
{seismo, topaz, harvard, ihnp4, etc}!uwvax!mcvoy
"They're coming soon! Quad-stated guru-gates!"
eric@sdcrdcf.UUCP (Eric Lund) (11/06/86)
"The C Programming Language", Kernighan and Ritchie, p. 196: "Each non-field member of a structure begins on an addressing boundary appropriate to its type; therefore, there may be unnamed holes in a structure." Kernighan and Ritchie say nothing of how big or small the unnamed holes may be. I've used one C compiler that byte aligned chars in structs until it encountered a short, then used even byte alignment until it encountered a long, and then aligned everything on four byte boundaries. There is nothing in the quoted sentence that prevents anyone from writing a code generator that always aligns every non-field member on a 4K boundary. Mayhaps you can use fields for some of your needs, but "Field members are packed into machine integers; they do not straddle words." (p. 196), and "... implementations are not required to support any but integer fields. Moreover, even int fields may be considered to be unsigned." (p. 197) To increase your confidence in predicting the behavior of the C compiler, p. 212 starts out "Purely hardware issues like word size...". (Ever tried sending a raw structure over a heterogeneous network without benefit of RPC? Back when you were young and foolish and didn't know that VAXen and SUNs have different byte orders? And you thought that structs were all the same? You have my sympathies!) You could name bit types of differing lengths VIII, XVI, and XXXII, but I don't think that that's what you intend. Eric the DBA Disclaimer: any opinions found herein are mine. Please return them; no questions asked.
perl@rdin.UUCP (Robert Perlberg) (11/13/86)
> head = (three_bytes*)calloc(N, sizeof(three_bytes));
>
> This wastes N bytes.

It still would even if sizeof(three_bytes)==3. Malloc allocates
storage that is properly aligned for any data type. Therefore, even if
you passed 3 instead of 4, malloc would still skip to the next big
boundary. I suppose that someone could write a calloc that would align
optimally based on the size of the data item being calloc'd, but how
would it know just from the sizeof argument what the proper alignment
for the data type should be? As far as I know, most, if not all,
implementations of calloc just multiply their arguments and invoke
malloc to allocate the storage.

Robert Perlberg
Resource Dynamics Inc.
New York
{philabs|delftcc}!rdin!perl
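The "multiply and invoke malloc" description can be sketched roughly as follows. This is not any particular library's source; sketch_calloc is a made-up name, and the overflow check is an addition that the post does not mention.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* memset */

/* A calloc-like function: one malloc of count * size bytes, then zero it.
 * Alignment is whatever malloc gives the block as a whole; nothing is
 * rounded per element. */
void *sketch_calloc(size_t count, size_t size)
{
    size_t total;
    void *p;

    /* Refuse requests whose byte count would overflow size_t. */
    if (size != 0 && count > (size_t)-1 / size)
        return NULL;

    total = count * size;
    p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);
    return p;
}
```

Written this way, the only alignment involved is that of the start of the block, and the size of the request is simply the product of the two arguments.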
jsdy@hadron.UUCP (Joseph S. D. Yao) (11/24/86)
In article <589@rdin.UUCP> perl@rdin.UUCP (Robert Perlberg) writes:
>> head = (three_bytes*)calloc(N, sizeof(three_bytes));
>> This wastes N bytes.
>It still would even if sizeof(three_bytes)==3. Malloc allocates
>storage that is properly aligned for any data type. Therefore, even if
>you passed 3 instead of 4, malloc would still skip to the next big
>boundary. I suppose that someone could write a calloc that would align
>optimally based on the size of the data item being calloc'd, but how
>would it know just from the sizeof argument what the proper alignment
>for the data type should be? As far as I know, most, if not all,
>implementations of calloc just multiply their arguments and invoke
>malloc to allocate the storage.

You all probably did the multiplication and said, hunh?? after this.
The problem is that, even though malloc() returns something longword
aligned, if sizeof(...) == 3, then malloc() will get an argument of
3*N, which is 3/4 of 4*N! E.g., for N == 9, malloc() would return 27
(well, 28) bytes instead of 36. As the argument above says, calloc()
doesn't know to round the numbers, it just multiplies them. To get
each data object on a "proper" longword boundary, the sizeof() returns 4.
--
Joe Yao hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}
jsdy@hadron.COM (not yet domainised)
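The arithmetic in the N == 9 example can be written out as a small sketch. Both function names are hypothetical, and rounding to a multiple of 4 is just the longword assumption from the post, not something every malloc does.

```c
#include <stddef.h>  /* size_t */

/* Bytes that calloc(count, elem_size) asks malloc for: a plain product. */
size_t request_bytes(size_t count, size_t elem_size)
{
    return count * elem_size;
}

/* Round n up to the next multiple of align (align must be non-zero). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}
```

With these, request_bytes(9, 3) is 27 and round_up(27, 4) is 28, against request_bytes(9, 4) at 36: the rounding happens once for the whole block, not once per element.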