[comp.lang.c] TC bug in sizeof

dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (02/16/90)

A friend of mine has found something surprising in TC.  Neither of us knows
C well enough to know for sure that this is a bug, but it looks like one.
As illustrated in the program below, if a structure is an odd size, and
is compiled with Word alignment, the sizeof function rounds the size up
one byte.  

Is this a bug?

Sample program:

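#include <stdio.h>

struct test
  { char a;
    char b;
    char c;
  } structure;

int main(void)
{
  /* sizeof yields an unsigned size_t; cast it so it matches %d */
  printf("Size of structure= %d\n", (int) sizeof(structure));
  return 0;
}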

This prints 3 if compiled with byte alignment and 4 if compiled with word
alignment.

Thanks for any help.  Please post to comp.sys.ibm.pc, or email to me, as
you prefer.

Duncan Murdoch

dmurdoch@watstat.waterloo.edu (Duncan Murdoch) (02/16/90)

In article <1519@maytag.waterloo.edu> I asked:
>
>As illustrated in the program below, if a structure is an odd size, and
>is compiled with Word alignment, the sizeof function rounds the size up
>one byte.  
>
>Is this a bug?

Thanks to several people who told me that it isn't a bug, but a necessary
property so that relations like sizeof(array) = sizeof(element)*(number of
elements) hold.  
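A minimal sketch of that relation in C, reusing the three-char struct test from the sample program. The 10-element array and the helpers array_count and array_bytes are illustrative names, not anything from TC. Whatever padding the compiler adds, the division recovers the element count, because any pad byte is already counted in sizeof(struct test).

```c
#include <stddef.h>

struct test {
    char a;
    char b;
    char c;
};

/* Illustrative 10-element array of the structure from the sample program. */
struct test array[10];

/* Element count recovered from sizes alone.  This holds whatever padding
   the compiler chooses, because a trailing pad byte is already part of
   sizeof(struct test). */
size_t array_count(void)
{
    return sizeof array / sizeof array[0];
}

/* Total bytes occupied by the array: element size times element count. */
size_t array_bytes(void)
{
    return sizeof array;
}
```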

One more question then:  Originally this came up because my friend wanted
to read fixed (odd) sized records from a file, using 

  while (fread(&structure,sizeof(structure),1,data)!=0)
  { ... }

Obviously sizeof() isn't the right thing to use here.  What's the recommended
way to declare that a structure is "packed", and find out its packed size?
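One portable approach, sketched under the assumption that each record on disk is exactly three bytes (a, b, c) with no padding: read the raw bytes into an unsigned char buffer and copy them into the structure field by field, so the compiler's in-memory layout no longer matters. RECORD_SIZE and read_record are hypothetical names; the loop in the question would become while (read_record(data, &structure)) { ... }.

```c
#include <stdio.h>

/* Assumption: every record in the file is exactly 3 bytes, a then b then c.
   RECORD_SIZE is a hypothetical name for that on-disk size. */
#define RECORD_SIZE 3

struct test {
    char a;
    char b;
    char c;
};

/* Read one record into *rec without relying on sizeof(struct test),
   which may include a pad byte.  Returns 1 on success, 0 at end of
   file or on a short read. */
int read_record(FILE *fp, struct test *rec)
{
    unsigned char buf[RECORD_SIZE];

    if (fread(buf, 1, RECORD_SIZE, fp) != RECORD_SIZE)
        return 0;
    rec->a = (char) buf[0];
    rec->b = (char) buf[1];
    rec->c = (char) buf[2];
    return 1;
}
```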

Duncan Murdoch

john@stat.tamu.edu (John S. Price) (02/16/90)

In article <1519@maytag.waterloo.edu> dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
>
>A friend of mine has found something surprising in TC.  Neither of us knows
>C well enough to know for sure that this is a bug, but it looks like one.
>As illustrated in the program below, if a structure is an odd size, and
>is compiled with Word alignment, the sizeof function rounds the size up
>one byte.  
>
>Is this a bug?

No, it's not a bug.  You've basically answered your own question, also.
The fact is, the word alignment makes things WORD ALIGNED. That is, if you
define your structure

>struct test
>  { char a;
>    char b;
>    char c;
>  } structure;

with word alignment, sizeof this IS 4.  A word on a PC is 2 bytes,
so all structures must fall on 2-byte boundaries.  If this structure
were 3 bytes long, then the next object in memory would fall on
an odd byte boundary, which is wrong.  Think of this:

struct test array[10];

10 consecutive test structures in memory.  If each one weren't 4 bytes long,
this array would break the word alignment you asked for.

<char a><char b><char c><char a><char b><char c>
 1 byte  1 byte  1 byte  1 byte  1 byte  1 byte
                        ^ this point right here is on a byte alignment,
                          which is against the word alignment.

In memory this structure would be, if you wanted word alignment:
<char a><char b><char c><one wasted byte><char a><char b><char c>

It's not a bug.
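The padding rule above amounts to rounding the size up to a multiple of the alignment. A small sketch of that arithmetic (round_up is a hypothetical helper, and align is assumed to be nonzero): with 2-byte word alignment the 3-byte structure becomes round_up(3, 2) = 4, and with byte alignment it stays round_up(3, 1) = 3.

```c
#include <stddef.h>

/* Round size up to the next multiple of align (align must be nonzero).
   This models the padding a compiler adds so that the next array
   element starts on an aligned boundary; it is not a TC library call. */
size_t round_up(size_t size, size_t align)
{
    return (size + align - 1) / align * align;
}
```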

--------------------------------------------------------------------------
John Price                   |   It infuriates me to be wrong
john@stat.tamu.edu           |   when I know I'm right....
--------------------------------------------------------------------------

psrc@pegasus.ATT.COM (Paul S. R. Chisholm) (02/16/90)

< Krasny Oktyabr:  the hunt is on, March 2, 1990 >

In article <1519@maytag.waterloo.edu>, dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
> As illustrated in the program below, if a structure is an odd size,
> and is compiled with Word alignment, the sizeof function rounds the
> size up one byte.

struct mystruct /* changed from "struct test" in Duncan's code */
  { char a;
    char b;
    char c;
  } structure;

> This prints a 3 if compiled with byte alignment, a 4 if compiled with
> word alignment.

Okay, what does "sizeof" mean?  It doesn't just include the data
elements; it also includes any padding, in the middle or at the end.
If you have an array of mystructs, the space between two elements
(e.g., ( (char *) & array[ 1 ] - (char *) & array[ 0 ] ), that is,
the distance in bytes between the two addresses) has to include any
padding.
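A sketch that measures the stride between two consecutive elements directly. Casting to char * makes the subtraction count bytes without assuming that a pointer fits in a long, which C does not guarantee. The two-element array and element_stride are illustrative names.

```c
#include <stddef.h>

struct mystruct {
    char a;
    char b;
    char c;
};

/* Illustrative two-element array. */
struct mystruct array[2];

/* Distance in bytes from array[0] to array[1].  Subtracting char
   pointers counts bytes, and the result always equals
   sizeof(struct mystruct), padding included. */
ptrdiff_t element_stride(void)
{
    return (char *) &array[1] - (char *) &array[0];
}
```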

If word alignment is specified, TC will make every mystruct start on a
word boundary.  It can only do that in an array by adding a byte of
padding.  There's no way TC can tell if a mystruct is a member of an
array or not, so it calls the size four.  (Frankly, I'm surprised it
didn't put a pad byte after *each* member, so each member is word
aligned; I'd expected the answer to be six!)
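A char has no alignment requirement of its own, so word alignment only forces a pad where the next array element would start, at the end. The sketch below reports each member's offset with offsetof from <stddef.h>. Only the first offset (0) and the increasing order are guaranteed by C; 0, 1 and 2 is what compilers typically produce for this structure. The offset_* helpers are illustrative names.

```c
#include <stddef.h>

struct mystruct {
    char a;
    char b;
    char c;
};

/* Byte offset of each member.  The first member is always at offset 0;
   later members must follow in declaration order, but any padding
   between them is up to the implementation. */
size_t offset_a(void) { return offsetof(struct mystruct, a); }
size_t offset_b(void) { return offsetof(struct mystruct, b); }
size_t offset_c(void) { return offsetof(struct mystruct, c); }
```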

> Duncan Murdoch

Paul S. R. Chisholm, AT&T Bell Laboratories
att!pegasus!psrc, psrc@pegasus.att.com, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.

btrue@emdeng.Dayton.NCR.COM (Barry.True) (02/16/90)

In article <1519@maytag.waterloo.edu> dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
>
>[Stuff deleted about asking whether it is an error that sizeof rounds the]
>[following structure up one byte when in word but not byte alignment.]
>
>Sample program:
>
>struct test
>  { char a;
>    char b;
>    char c;
>  } structure;
>

IMHO (since I don't know that much about TC) the fact that you used word
alignment means that the compiler is going to try to align all multi-byte
data on a word boundary (two bytes on a PC). Since the above structure is
only three bytes long, it is padded with an extra byte when stored so the
next data element will be aligned on an even word boundary. We used to have
the same problem with the C compiler on a SYSVR0 system running on an AT&T
3B5 system. I don't think you will run into any problems using the length
returned from sizeof(struct test) in something like strncpy() or memcpy() to
initialize the structure, since you won't be overwriting another
variable's data space, but I'm not sure. Anybody else know?
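On Barry's question: passing sizeof to memcpy() or memset() is safe, because the pad byte belongs to the structure's own storage, so writing sizeof bytes never reaches a neighbouring variable. strncpy() is a poor fit, since it stops copying source bytes at the first zero byte and is meant for strings. A minimal sketch; clear_test and copy_test are illustrative names, and in ANSI C plain structure assignment (*dst = *src) does the same job as the memcpy() call.

```c
#include <string.h>

struct test {
    char a;
    char b;
    char c;
};

/* Zero the whole structure, pad byte included.  sizeof *t covers
   exactly the object's storage, so nothing past it is touched. */
void clear_test(struct test *t)
{
    memset(t, 0, sizeof *t);
}

/* Copy one structure into another of the same type, byte for byte. */
void copy_test(struct test *dst, const struct test *src)
{
    memcpy(dst, src, sizeof *dst);
}
```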

doerschu@rex.cs.tulane.edu (David Doerschuk) (02/17/90)

In article <1519@maytag.waterloo.edu> dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
>
>A friend of mine has found something surprising in TC.  Neither of us knows
>C well enough to know for sure that this is a bug, but it looks like one.
>As illustrated in the program below, if a structure is an odd size, and
>is compiled with Word alignment, the sizeof function rounds the size up
>one byte.
>
>Is this a bug?
>
>Sample program:
>
>struct test
>  { char a;
>    char b;
>    char c;
>  } structure;
>
>main()
>{
>  printf("Size of structure= %d\n",sizeof(structure));
>}
>
>This prints a 3 if compiled with byte alignment, a 4 if compiled with word
>alignment.

Duncan, that's not a bug, it's a feature! (sorry!)  No, seriously, your C
compiler is working correctly.  The story is this:  You get to choose
whether you want byte or word alignment.  Byte alignment stores
those three chars just as you'd expect, one right after the other, and
then the "next available" space for a second structure of three chars is
immediately after the first.  Word alignment also stores the 3 chars one
right after another, but then leaves a byte of "dead space" in order to
let the *next* structure start on a word boundary (a word, here, being
2 bytes).  The sizeof operator correctly reports (when using word
alignment) that a total of 4 bytes are being "used" by the structure, even
though one of the bytes is dead space.  Use byte alignment if you've
got a lot of structures to store in memory and *need* that extra byte.
(Note that the dead-space byte will appear every time you create one
of these structures, so if you've got 10,000 of them in memory, you've
got 10,000 dead-space bytes!)  Use word alignment if you've got
lots of memory but need execution speed.  The whole idea of word
alignment is that the memory unit can fetch constructs beginning on
a word boundary faster than if they start between word boundaries.

Moral of the story:  Use sizeof() frequently, because things aren't
always what they seem! By the way, sizeof() costs you nothing at
execution time.  It is evaluated at compile time.
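Because sizeof yields a compile-time constant (always in C89; C99 variable-length arrays are the one exception), it can appear anywhere a constant is required, such as an array dimension or an enumeration value. A small sketch; raw, TEST_SIZE and raw_size are illustrative names.

```c
#include <stddef.h>

struct test {
    char a;
    char b;
    char c;
};

/* sizeof is a constant expression here, so it can size a static array;
   no code runs at execution time to compute it. */
static unsigned char raw[sizeof(struct test)];

/* It is also valid as an enumeration constant. */
enum { TEST_SIZE = sizeof(struct test) };

size_t raw_size(void)
{
    return sizeof raw;
}
```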

Good Luck!
Dave
doerschu@rex.cs.tulane.edu

cs4g6ag@maccs.dcss.mcmaster.ca (Stephen M. Dunn) (02/27/90)

In article <1519@maytag.waterloo.edu> dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
$A friend of mine has found something surprising in TC.  Neither of us knows
$C well enough to know for sure that this is a bug, but it looks like one.
$As illustrated in the program below, if a structure is an odd size, and
$is compiled with Word alignment, the sizeof function rounds the size up
$one byte.  
$Is this a bug?

   Well, it depends on how you define "bug" ... I don't have a copy of
K&R to check exactly what they say about it, but according to their
definition it may be.  However, it does make sense, since when compiled
with word alignment that extra byte is not available for anything
else and is allocated along with the structure.  Since it's reflecting
the way things really work, you could say it's the most accurate way
of doing it.
-- 
Stephen M. Dunn                               cs4g6ag@maccs.dcss.mcmaster.ca
          <std_disclaimer.h> = "\nI'm only an undergraduate!!!\n";
****************************************************************************
               I Think I'm Going Bald - Caress of Steel, Rush

martin@mwtech.UUCP (Martin Weitzel) (02/28/90)

In article <25E9856C.8135@maccs.dcss.mcmaster.ca> cs4g6ag@maccs.dcss.mcmaster.ca (Stephen M. Dunn) writes:
|In article <1519@maytag.waterloo.edu> dmurdoch@watstat.waterloo.edu (Duncan Murdoch) writes:
|$A friend of mine has found something surprising in TC.  Neither of us knows
|$C well enough to know for sure that this is a bug, but it looks like one.
|$As illustrated in the program below, if a structure is an odd size, and
|$is compiled with Word alignment, the sizeof function rounds the size up
|$one byte.  
|$Is this a bug?
|
|   Well, it depends on how you define "bug" ... I don't have a copy of
|K&R to check exactly what they say about it, but according to their
|definition it may be.  However, it does make sense, since when compiled
|with word alignment that extra byte is not available for anything
|else and is allocated along with the structure.  Since it's reflecting
|the way things really work, you could say it's the most accurate way
|of doing it.

It's common practice to write the following piece of code:

	struct { ....... } x[100];
	int i;

	for (i = 0; i < sizeof x / sizeof x[0]; i++)
		/* do something with x[i] */

To make this work, it must always be guaranteed for any array
that you can compute the number of elements by dividing the total
size by the size of one element.

BTW: In many of my programs I find the following very handy:

	#define elementsof(z) (sizeof z / sizeof z[0])

which makes the above a little more readable, as you could write:

	for (i = 0; i < elementsof(x); i++)
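A self-contained version of Martin's loop. The macro argument is parenthesized here so that an expression argument groups correctly. Note that elementsof only works on a true array: applied to a pointer, such as an array passed as a function parameter, it compiles but gives the wrong count. The element type struct item and the helpers fill_values and sum_values are hypothetical.

```c
#include <stddef.h>

/* Martin's macro, with the argument parenthesized.  Only valid for
   real arrays, not pointers. */
#define elementsof(z) (sizeof (z) / sizeof ((z)[0]))

struct item {            /* hypothetical element type */
    char tag;
    int value;
};

static struct item x[100];

/* Store each element's index in its value field. */
void fill_values(void)
{
    size_t i;

    for (i = 0; i < elementsof(x); i++)
        x[i].value = (int) i;
}

/* Walk every element, using elementsof for the loop bound. */
long sum_values(void)
{
    size_t i;
    long total = 0;

    for (i = 0; i < elementsof(x); i++)
        total += x[i].value;
    return total;
}
```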

GENERAL RULE:
	The 'sizeof' any object is the number of bytes
	that object would occupy as an array element.

A not-so-obvious further aspect is that a struct must begin on a
boundary that has the same alignment as the struct's component
with the most restrictive alignment. If it did not, you would end
up with elements of different sizes if you built an array of such
structs. (Think a little about this, if you don't believe it.)
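This point can be checked directly on a modern compiler. The sketch assumes C11, whose alignof macro (from <stdalign.h>) postdates this thread and did not exist in TC; struct mixed and its accessor functions are illustrative. The structure must be at least as strictly aligned as its double member, and its size must be a multiple of its own alignment so that every element of an array of them stays aligned.

```c
#include <stdalign.h>
#include <stddef.h>

/* A structure whose most restrictive member is a double. */
struct mixed {
    char c;
    double d;
};

/* alignof (C11) reports the alignment the compiler actually uses. */
size_t mixed_align(void)  { return alignof(struct mixed); }
size_t double_align(void) { return alignof(double); }

/* sizeof includes trailing padding, so it is a multiple of mixed_align(). */
size_t mixed_size(void)   { return sizeof(struct mixed); }
```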
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83