jdubb@bucsf.bu.edu (jay dubb) (03/20/91)
I am posting this for a friend of mine who doesn't have access to USENET, so please respond directly to mlevin@jade.tufts.edu.

Can anyone explain to me why the following short program gives the size of the structure as 38 on a Sun 3, and 40 on an Encore Multimax:

    main()
    {
        struct tt
        {
            enum {P, PP} a;
            char b[30];
            int c;
        };
        printf("%d\n",sizeof(struct tt));
    }

I notice that making the size of b[] 32 makes the structure 40 bytes large on both machines. I imagine this is due to the way the machines align fields in structures.

I am worried since I am trying to send structures across sockets between an Encore and a Sun machine. So, can someone enlighten me about the following things:

1) why exactly are the sizes different?
2) what repercussions does this have on sending the structures across sockets between machines which pack them differently and trying to interpret the fields on each end?
3) and most importantly, how can I avoid this problem? are there some rules (for example, I know that all structures have to be a multiple of 4 bytes large, and I know about compensating for byte order differences, etc.) to be followed in making up structures that will ensure that they are the same on any machine?

Thanks in advance.

Mike Levin (mlevin@jade.tufts.edu)
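To reproduce the measurement on a current compiler, here is a sketch of the same test in standard C (C99 or later). It also prints each member's offset using offsetof from <stddef.h>, so any padding shows up directly as a gap between one member's end and the next member's offset. The enum tag pp_tag and the function name print_layout are hypothetical additions. Results depend on the compiler and ABI; on typical current 32- and 64-bit systems with 4-byte enums and 4-byte-aligned ints you would expect offsets 0, 4 and 36 and a size of 40, i.e. the Encore result.

```c
#include <stddef.h>  /* offsetof */
#include <stdio.h>   /* printf */

/* Same members as the posted struct; the enum tag is added only so the
   enum type can be named elsewhere. */
struct tt {
    enum pp_tag { P, PP } a;
    char b[30];
    int c;
};

/* Print sizeof(struct tt) and the offset of each member. A gap between
   the end of one member and the offset of the next is compiler padding;
   any bytes after the last member up to sizeof are trailing padding. */
void print_layout(void)
{
    printf("sizeof(struct tt) = %zu\n", sizeof(struct tt));
    printf("offset of a       = %zu\n", offsetof(struct tt, a));
    printf("offset of b       = %zu\n", offsetof(struct tt, b));
    printf("offset of c       = %zu\n", offsetof(struct tt, c));
}
```

Calling print_layout() from main on each machine and comparing the output pinpoints exactly where the two layouts diverge.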
boykin@encore.com (Joseph Boykin) (03/21/91)
In article <77336@bu.edu.bu.edu>, jdubb@bucsf.bu.edu (jay dubb) writes:
|>
|> I am posting this for a friend of mine who doesn't have access
|> to USENET, so please respond directly to mlevin@jade.tufts.edu.
|>
|> Can anyone explain to me why the following short program gives the
|> size of the structure as 38 on a Sun 3, and 40 on an Encore Multimax:

The reason is that sizeof(structure) is not the same in different machine architectures. If you had a 64-bit int on one machine and 16 bits on another, it would seem obvious that sizeof(struct) would return different values.

Another reason is that sizeof() is supposed to return the amount of space used, *plus* whatever is necessary for alignment to a machine boundary. So swapping the 'b' and 'c' members of your structure on the Multimax would still yield the same results.

Last, word alignment within the structure may be different for different machines. A lot of CISC architectures can access 32-bit values which are only aligned on a 16-bit boundary, but at a performance penalty.

|> struct tt
|> {
|>     enum {P, PP} a;
|>     char b[30];
|>     int c;
|> };

In your particular case, the enum and int are (coincidentally) the same size on both machines. On the Multimax, ints are aligned on 32-bit boundaries, hence the character array gets padded. On the Sun 3 (68K) you don't have to align on 32-bit boundaries (although memory access is faster if you do), and apparently the Sun compiler isn't doing the alignment. Hence the problem of transferring structures across machine boundaries. It's for reasons like this that Mach IPC (and other systems) types the data on transfer.

|> 3) and most importantly, how can I avoid this problem? are there some
|> rules (for example, I know that all structures have to be a
|> multiple of 4 bytes large, and I know about compensating for byte
|> order differences, etc.) to be followed in making up structures
|> that will ensure that they are the same on any machine?
Since sockets are a simple byte stream, there isn't a way of sending structures between two unknown system architectures and assuring that you will see the same thing on both ends. The simpler case of sending between two known machines can be dealt with by making sure that all structures and structure members are aligned on 32-bit boundaries. Often, there is a compiler switch to do this. Of course, this won't help if you're dealing with a 64-bit machine, but that's probably not a major concern right now! You'll also need to consider byte ordering.

----
Joseph Boykin
Manager, Mach OS Development
Encore Computer Corp
Treasurer, IEEE Computer Society
Internet: boykin@encore.com
Phone: 508-460-0500 x2720
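One way to apply the 32-bit-alignment advice without relying on a compiler switch is to write the padding out by hand and have the compiler check the resulting layout. This sketch assumes a C11 compiler (for _Static_assert) and C99 <stdint.h>; the struct name tt_padded and the pad member are hypothetical. It pins down offsets and size only; byte order still has to be handled separately, as noted above.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>  /* uint32_t */

/* Variant of struct tt with the padding written explicitly, so every
   4-byte member already sits on a 4-byte boundary and a compiler that
   aligns ints to 2 or to 4 bytes has no reason to add padding of its own. */
struct tt_padded {
    uint32_t a;      /* enum value (P or PP) kept in a fixed-width type */
    char b[30];
    char pad[2];     /* explicit filler that moves c to offset 36 */
    uint32_t c;
};

/* Build-time checks: compilation fails if some compiler lays it out differently. */
_Static_assert(offsetof(struct tt_padded, b) == 4, "b must be at offset 4");
_Static_assert(offsetof(struct tt_padded, c) == 36, "c must be at offset 36");
_Static_assert(sizeof(struct tt_padded) == 40, "struct tt_padded must be 40 bytes");
```

Making the filler an explicit member, rather than leaving it implicit, also means the sender can set it to zero instead of transmitting whatever garbage happened to be in the padding.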
hoswell@tramp.Colorado.EDU (WARlock) (03/21/91)
In article <77336@bu.edu.bu.edu> jdubb@bucsf.bu.edu (jay dubb) writes:
>
> Can anyone explain to me why the following short program gives the
> size of the structure as 38 on a Sun 3, and 40 on an Encore Multimax:
>
> main()
> {
>     struct tt
>     {
>         enum {P, PP} a;
>         char b[30];
>         int c;
>     };
>     printf("%d\n",sizeof(struct tt));
> }
>

The difference is that the Sun is more efficient with its data storage. The Encore machine has a 4-byte word size, while the Sun has a smaller granularity. Thus, the Encore adds the two (unused) bytes...

--
|| -=> The WARlock <=-           | <<<<=- "Dial a cliche" -=>>>>          ||
|| hoswell@tramp.Colorado.EDU    |                                        ||
|| or hoswell@yoda.hao.ucar.EDU  | A cynic knows the price of everything  ||
|| Think Clearly!                | and the value of nothing.              ||
jim@segue.segue.com (Jim Balter) (03/21/91)
In article <77336@bu.edu.bu.edu> jdubb@bucsf.bu.edu (jay dubb) writes:
> struct tt
> {
>     enum {P, PP} a;
>     char b[30];
>     int c;
> };
>1) why exactly are the sizes different?

The size of the struct is the padded size; i.e., the size it will take up as one element of an array. A struct is always aligned on the boundary of its most aligned member. With a compiler that aligns ints on multiples of 4, this struct will be aligned on a multiple of 4, and thus will have a size that's a multiple of 4. With a compiler that aligns ints on multiples of 2, the struct will have size 4+30+4 = 38. Also, int c will be aligned on a boundary of 2 in one compiler (i.e., no padding) and a boundary of 4 in another (i.e., 2 bytes of padding). And, a compiler is allowed to allocate only one byte for that enum, leading to yet other possible sizes (34, 36) for the struct. And if you have a 16-bit machine, the int would only be two bytes.

>2) what repercussions does this have on sending the structures across
>   sockets between machines which pack them differently and trying to
>   interpret the fields on each end?

Er, don't do it.

>3) and most importantly, how can I avoid this problem? are there some
>   rules (for example, I know that all structures have to be a
>   multiple of 4 bytes large, and I know about compensating for byte
>   order differences, etc.) to be followed in making up structures
>   that will ensure that they are the same on any machine?

Transmit numeric values as a sequence of bytes, low-order byte first, and turn them back into numeric values on the receiving end. Do this with right shifts and ANDs in the sender and left shifts and ORs in the receiver, not unions or casts, so that the code is machine-independent.
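The recipe just described (low-order byte first; right shifts and ANDs to send, left shifts and ORs to receive) might look like this for 32-bit values. It is a sketch assuming C99's uint32_t; put_u32_le and get_u32_le are hypothetical names. Because the code only does arithmetic on values and never reinterprets memory through a union or pointer cast, the same source produces the same bytes on big- and little-endian hosts alike.

```c
#include <stdint.h>  /* uint32_t */

/* Store v into buf[0..3], least-significant byte first, using only
   right shifts and masks, so the result is independent of host byte order. */
void put_u32_le(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v & 0xFFu);
    buf[1] = (unsigned char)((v >> 8) & 0xFFu);
    buf[2] = (unsigned char)((v >> 16) & 0xFFu);
    buf[3] = (unsigned char)((v >> 24) & 0xFFu);
}

/* Rebuild the value from buf[0..3] with left shifts and ORs. Each byte is
   widened to uint32_t before shifting so no bits are lost. */
uint32_t get_u32_le(const unsigned char *buf)
{
    return (uint32_t)buf[0]
         | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16)
         | ((uint32_t)buf[3] << 24);
}
```

A whole struct is then sent by writing each member in turn into a byte buffer (the char array copied as-is), so the compiler's padding never reaches the wire.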
guy@auspex.auspex.com (Guy Harris) (03/25/91)
> The difference is that the Sun is more efficient with its data
>storage. The Encore machine has a 4 byte word size, while the Sun has
>a smaller granularity.

Not quite. Suns - with the possible exception of the Sun-1s and Sun-2s, depending on how you think of the 68000 and 68010 - *also* have 4-byte word sizes.

However, the 68K Suns use a compiler derived from the MIT port of PCC to the 68000; that compiler didn't bother aligning 4-byte quantities on 4-byte boundaries. On the 68000, and on the 68010 used in the Sun-2s, there was no performance benefit to doing so; the 68000 and 68010 fetched 4-byte quantities 2 bytes at a time.

On most if not all fully-32-bit machines, there *is* a performance benefit to aligning 4-byte quantities on 4-byte boundaries. On some machines, unaligned 4-byte quantities can be handled directly by the hardware, but at a cost in speed; on others, the hardware can't handle it, and either the software doesn't compensate (so your program blows up) or the software does compensate, again at a cost in speed.

The Encore has, if I remember correctly, a National Semiconductor 32K-family chip in it; that family started out (at least in the chips used in UNIX systems) as a fully 32-bit chip, and either there was a performance win for 4-byte alignment of 4-byte quantities, or the people who did compilers for it were planning ahead for machines where there was a performance win.

I don't know if the MIT folk chose not to align 4-byte quantities on 4-byte boundaries because it would, in some cases, reduce the storage requirements of data structures, or just because they weren't thinking ahead. Had they aligned them, there *would* have been a performance win on the 68020 and later 68K-family chips.

Sun's 68K compiler continues to align 4-byte structure members only on 2-byte boundaries, for data structure binary compatibility between Sun-2s and Sun-3s and, at this point, for data structure compatibility between different software versions on Sun-3s.
(I think many other 68K C compilers do so for similar historical reasons.) The Sun compiler does, however, align 4-byte automatics on 4-byte boundaries, and the compiler and linker may also do so with 4-byte static quantities. (The compilers on SPARC-based and 386-based Suns do align 4-byte quantities, including structure members, on 4-byte boundaries.)
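The arithmetic behind 38 versus 40 can be modelled in a few lines. This sketch assumes a 4-byte enum and a 4-byte int, and takes the int's required alignment as a parameter: 2 for the 68K Sun compiler described above, 4 for the Encore. The helper names align_up and tt_size are hypothetical. It also reproduces the original poster's observation that b[32] gives 40 bytes under either rule.

```c
#include <stddef.h>  /* size_t */

/* Round off up to the next multiple of align (align must be > 0). */
static size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) / align * align;
}

/* Model the layout of struct tt: a 4-byte enum at offset 0, a char array
   of blen bytes (chars need no alignment), then a 4-byte int that must
   start on an int_align boundary. Returns the padded sizeof, including
   trailing padding so that consecutive array elements stay aligned. */
size_t tt_size(size_t blen, size_t int_align)
{
    size_t off = 4;                   /* enum a occupies bytes 0..3 */
    off += blen;                      /* char b[blen] follows directly */
    off = align_up(off, int_align);   /* padding inserted before int c */
    off += 4;                         /* int c */
    return align_up(off, int_align);  /* trailing padding */
}
```

With b[30], tt_size(30, 2) gives 4 + 30 + 4 = 38 with no padding, while tt_size(30, 4) rounds offset 34 up to 36 before c and ends at 40. With b[32], c already starts at offset 36, so both rules give 40.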
mouse@thunder.mcrcim.mcgill.edu (der Mouse) (03/26/91)
In article <77336@bu.edu.bu.edu>, jdubb@bucsf.bu.edu (jay dubb) writes:
> Can anyone explain to me why the following short program gives the
> size of the structure as 38 on a Sun 3, and 40 on an Encore Multimax:

This is a C question, not a UNIX question. I'm crossposting to comp.lang.c and pointing followups there.

> main()
> {
>     struct tt
>     {
>         enum {P, PP} a;
>         char b[30];
>         int c;
>     };
>     printf("%d\n",sizeof(struct tt));
> }

> I notice that making the size of b[] 32 makes the structure 40
> bytes large on both machines. I imagine this is due to the way the
> machines align fields in structures.

Bingo.

> I am worried since I am trying to send structures across sockets
> between an Encore and a Sun machine.

You have only two choices.

One is to assume that the machines on the ends of the connection are, say, a Sun-3 or an Encore, and carefully write your code so it works in all those cases. This is not a good idea unless you absolutely must get all the speed possible out of it, because the other alternative is much more portable and maintainable.

The other choice is to put the structure in some portable form for transmission over the connection. This could mean anything from judicious use of ntohl() and related routines to converting everything to text. This slows you down, of course, but in the vast majority of cases the gain in portability and maintainability is worth it.

> So, can someone enlighten me about the following things:
> 1) why exactly are the sizes different?

In this case, because the Sun-3 is willing to align the int on a 16-bit boundary while the Encore inserts two bytes of padding to move it to a 32-bit boundary. (I feel sure this is true, though I have no access to an Encore to check. If I have it wrong and you *have* checked, feel free to correct me.)

> 2) what repercussions does this have on sending the structures across
> sockets between machines which pack them differently and trying to
> interpret the fields on each end?
It is only the tip of the iceberg. Basically, you don't want to send structures in raw binary unless you really, really don't care about portability (as a rough rule of thumb, the application should be one you're willing to write in assembly for).

> 3) and most importantly, how can I avoid this problem?

You can't, portably. You have to convert to some sort of interchange format.

> are there some rules (for example, I know that all structures have
> to be a multiple of 4 bytes large,

They don't really; a VAX would be perfectly happy with a 7-byte structure. (I don't recall offhand whether there exists a VAX C compiler that doesn't pad structures.) A Sun-3 is quite happy provided only that everything is a multiple of 16 bits.

> and I know about compensating for byte order differences, etc.) to
> be followed in making up structures that will ensure that they are
> the same on any machine?

No. There are just too many weird machines out there.

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu
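The second choice above, converting to an interchange format with ntohl() and related routines, could be sketched as follows. It assumes a POSIX system (htonl and ntohl come from <arpa/inet.h>), that a and c fit in 32 bits, and a hypothetical 38-byte wire layout: a as 4 bytes in network byte order, then the 30 bytes of b unchanged, then c as 4 bytes in network byte order, with no padding anywhere. The names tt_msg, tt_encode, tt_decode and TT_WIRE_SIZE are illustrative only.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>     /* uint32_t */
#include <string.h>     /* memcpy */

/* Hypothetical in-memory form; its padding may differ between machines,
   which is exactly why it is never written to the socket directly. */
struct tt_msg {
    int a;       /* holds P or PP */
    char b[30];
    int c;
};

#define TT_WIRE_SIZE 38  /* 4 + 30 + 4 bytes, no padding */

/* Encode into the fixed wire layout. memcpy is used so the 4-byte values
   can be placed at any offset without alignment concerns. */
void tt_encode(const struct tt_msg *s, unsigned char out[TT_WIRE_SIZE])
{
    uint32_t a = htonl((uint32_t)s->a);
    uint32_t c = htonl((uint32_t)s->c);
    memcpy(out, &a, 4);
    memcpy(out + 4, s->b, 30);
    memcpy(out + 34, &c, 4);
}

/* Decode the wire layout back into the local struct. Converting a value
   above INT_MAX back to int is implementation-defined, so this sketch
   assumes a and c are non-negative or the host is two's complement. */
void tt_decode(const unsigned char in[TT_WIRE_SIZE], struct tt_msg *s)
{
    uint32_t a, c;
    memcpy(&a, in, 4);
    memcpy(s->b, in + 4, 30);
    memcpy(&c, in + 34, 4);
    s->a = (int)ntohl(a);
    s->c = (int)ntohl(c);
}
```

The sender writes exactly TT_WIRE_SIZE bytes and the receiver reads exactly that many before decoding, so neither side ever depends on the other's sizeof(struct tt_msg).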
jfw@ksr.com (John F. Woods) (04/02/91)
I think the real question ought to be: why, oh WHY, do they let people who don't understand the differences between computers get anywhere NEAR a sharpened compiler?