[comp.lang.c] Little problem with sizeof on PC

allender@ux1.cso.uiuc.edu (Mark Allender) (04/23/91)

I'm having a little problem that I have a suspicion about, but want
to clarify.  Here's the situation....

I have a structure that is defined like:

struct header {
	int version[2];
	char unused[40];
	int stuff[8];
	char bogus;
	char mode;
	int time;
	char unused2[90];
	char filler[38];
	char filler2[15];
	float number;
};

The total size of the structure is 201 bytes (count it if you wish....).
Now, I want to read the beginning of a binary file into this structure,
so I do something like this:

	struct header Header;

	if ((readnum = read(fd, (char *)(&Header), sizeof(Header)).....

Things don't seem to get done correctly at this point.  A little investigation
shows that sizeof(Header) returns 202, and not 201.  This is clearly not
what I want to do.

Now, I kind of figure that the problem has to do with the way structure
members are lined up in memory.  Am I correct in thinking that since there
are an odd number of bytes in the structure (201), that sizeof(Header)
returns an even number since things have to be word aligned (with a word
being 2 bytes)?  This would seem to make some sense.

In any case, what is the best way around this problem?  Could I do something
like
	if ((readnum = read(fd, (char *)(&Header), sizeof(Header) - 1))....

Seems like kind of a bad way to fix things....

Any help would be appreciated...

-Thanks in advance...

-Mark Allender
-University of Illinois at Urbana/Champaign
-Conversation Builder Project
-allender@cs.uiuc.edu

c60b-1eq@e260-1d.berkeley.edu (Noam Mendelson) (04/23/91)

In article <1991Apr23.022057.29511@ux1.cso.uiuc.edu> allender@ux1.cso.uiuc.edu (Mark Allender) writes:
>I'm having a little problem that I have a suspicion about, but want
>to clarify.  Here's the situation....
>struct header {
> .........
>};
>Now, I want to read the beginning of a binary file into this structure,
>so I do something like this:
>	struct header Header;
>	if ((readnum = read(fd, (char *)(&Header), sizeof(Header)).....

Hmmm.  That should be:
   if ((readnum = read(fd, (struct header *)(&Header), sizeof(Header)) ...

>Things don't seem to get done correctly at this point.  A little investigation
>shows that sizeof(Header) returns 202, and not 201.  This is clearly not
>what I want to do.

Your compiler probably word-aligned the structure (202 is on a word boundary).
Either that or you miscounted (I didn't verify your figure :-)
In any case, I suggest you check your compiler for a possible byte alignment
option (I know Turbo C 2.0 has this), and recompile.

>In any case, what is the best way around this problem?  Could I do something
>like
>	if ((readnum = read(fd, (char *)(&Header), sizeof(Header) - 1))....
>
>Seems like kind of a bad way to fix things....

That would probably work on a PC, assuming the structure was word-aligned.
But it's also dependent on your compiler.
The best thing would be to restructure the data file, using 202 byte blocks
instead of 201 to be on the safe side.  At worst, you're increasing the
size of the data file by about .5%.

-- 
+==========================================================================+
| Noam Mendelson   ..!ucbvax!web!c60b-1eq       | "I haven't lost my mind, |
| c60b-1eq@web.Berkeley.EDU                     |  it's backed up on tape  |
| University of California at Berkeley          |  somewhere."             |

rearl@gnu.ai.mit.edu (Robert Earl) (04/23/91)

In article <1991Apr23.050747.19705@agate.berkeley.edu> c60b-1eq@e260-1d.berkeley.edu (Noam Mendelson) writes:

|   In article <1991Apr23.022057.29511@ux1.cso.uiuc.edu> allender@ux1.cso.uiuc.edu (Mark Allender) writes:
|
|   >Now, I want to read the beginning of a binary file into this structure,
|   >so I do something like this:
|   >	struct header Header;
|   >	if ((readnum = read(fd, (char *)(&Header), sizeof(Header)).....
|
|   Hmmm.  That should be:
|      if ((readnum = read(fd, (struct header *)(&Header), sizeof(Header)) ...

Nope, (char *) &Header is right, because "&Header" is already a
"struct header *" and read() expects a "char *" argument.

|   The best thing would be to restructure the data file, using 202 byte blocks
|   instead of 201 to be on the safe side.  At worst, you're increasing the
|   size of the data file by about .5%.

If it were just a data file, the best thing would be to make it a
plain text file and use fscanf() or fgets(), but he appears to want
only the header bit of it, so he should just be very careful to note
this binary representation could break under another operating
system/CPU/phase of the moon.

--robert

jap@convex.cl.msu.edu (Joe Porkka) (04/23/91)

allender@ux1.cso.uiuc.edu (Mark Allender) writes:

>I'm having a little problem that I have a suspicion about, but want
>to clarify.  Here's the situation....

>I have a structure that is defined like:

>struct header {
>	int version[2];
>	char unused[40];
>	int stuff[8];
>	char bogus;
>	char mode;
>	int time;
>	char unused2[90];
>	char filler[38];
>	char filler2[15];
^^^^^^^^^^^^^^^^^^^^^^^^^
>	float number;
>};

>The total size of the structure is 201 bytes (count it if you wish....).
>Now, I want to read the beginning of a binary file into this structure,
>so I do something like this:

The compiler is probably inserting a pad byte after filler2, to
bring "number" to an even address.
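A minimal sketch of how one might confirm where the pad byte went, using offsetof() from <stddef.h>. The struct here is a hypothetical cut-down version of the posted one (only filler2 and number), and the helper names pad_before_number() and pad_at_end() are made up for illustration. The results depend on the compiler and its alignment settings.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

#define FILLER2_LEN 15

/* Hypothetical cut-down record: an odd-length char array, then a float. */
struct sample {
    char  filler2[FILLER2_LEN];
    float number;
};

/* Padding bytes the compiler placed between filler2 and number. */
size_t pad_before_number(void)
{
    size_t end_of_filler2 = offsetof(struct sample, filler2) + FILLER2_LEN;

    assert(offsetof(struct sample, number) >= end_of_filler2);
    return offsetof(struct sample, number) - end_of_filler2;
}

/* Padding bytes after the last member. */
size_t pad_at_end(void)
{
    return sizeof(struct sample)
         - (offsetof(struct sample, number) + sizeof(float));
}
```

Printing both values (cast to unsigned long for printf's %lu) shows whether the extra byte sits inside the structure or at its end. A compiler that aligns float on 2-byte boundaries would be expected to report 1 and 0, but nothing guarantees that.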

If you really want to have a file with this info in binary
form, then you will need to write the whole 202 bytes, and read the
whole 202 bytes, including the pad byte embedded within the
structure (or do it member by member).

If you want your program's data file to be at all portable, then
you will have to use something like the XDR (external data representation)
library, or write the data in ASCII form.
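As a rough illustration of the ASCII option, here is a sketch that writes a record as one line of text and parses it back. The struct text_rec and both function names are hypothetical, and the three fields are only a stand-in for a real header. It shows the idea, not a format anyone in this thread proposed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record with a few representative fields. */
struct text_rec {
    int   version;
    int   mode;
    float number;
};

/* Write one record as a single line of text. Returns 0 on success, -1 on error. */
int write_rec_text(FILE *fp, const struct text_rec *r)
{
    assert(fp != NULL && r != NULL);
    /* %.9g prints enough digits to recover a typical 32-bit float. */
    if (fprintf(fp, "%d %d %.9g\n", r->version, r->mode, r->number) < 0)
        return -1;
    return 0;
}

/* Read one record back. Returns 0 on success, -1 on EOF or a malformed line. */
int read_rec_text(FILE *fp, struct text_rec *r)
{
    char line[128];

    assert(fp != NULL && r != NULL);
    if (fgets(line, sizeof line, fp) == NULL)
        return -1;
    if (sscanf(line, "%d %d %f", &r->version, &r->mode, &r->number) != 3)
        return -1;
    return 0;
}
```

Reading a whole line with fgets() and then parsing it with sscanf() keeps a malformed record from desynchronizing the rest of the file, which is one of the usual complaints about calling fscanf() directly.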

allender@cso.uiuc.edu (Mark Allender) (04/23/91)

I have a structure that is defined like:

struct header {
	int version[2];
	char unused[40];
	int stuff[8];
	char bogus;
	char mode;
	int time;
	char unused2[90];
	char filler[38];
	char filler2[15];
	float number;
};

Just a followup to my problem with the word alignment in this structure.
The data that I am reading is in someone else's format, and has to stay this way.
I can't change it.  And indeed, I would rather have the data in raw binary
form.  There is a filler byte inserted after the filler2[15] declaration.

The best suggestion around this was to read the entire 201 bytes into
a char buffer[201] array, and then memcpy the elements into their corresponding
locations.
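A sketch of that buffer-then-memcpy approach on a deliberately small, hypothetical layout (a 2-byte integer, a 3-byte char array, and a 4-byte float packed back to back in the file). The names packed_rec, REC_SIZE, and unpack_rec() are invented for the example, int16_t comes from C99's <stdint.h>, and the code assumes the file uses the host's byte order and float format.

```c
#include <assert.h>
#include <stdint.h>   /* int16_t (C99) */
#include <string.h>   /* memcpy */

enum { REC_SIZE = 2 + 3 + 4 };   /* hypothetical on-disk size, no padding */

struct packed_rec {
    int16_t version;
    char    tag[3];
    float   number;   /* the compiler may pad before this member */
};

/* Copy each field from its file offset into the matching struct member.
 * Assumes the file uses the host's byte order and float format. */
void unpack_rec(const unsigned char buf[REC_SIZE], struct packed_rec *out)
{
    assert(buf != NULL && out != NULL);
    assert(sizeof out->number == 4);   /* the file stores a 4-byte float */

    memcpy(&out->version, buf + 0, sizeof out->version);
    memcpy(out->tag,      buf + 2, sizeof out->tag);
    memcpy(&out->number,  buf + 5, sizeof out->number);
}
```

The caller would first fill buf with something like read(fd, buf, REC_SIZE) and check the returned count. Because every copy names its own file offset, padding inside the struct no longer matters.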

Thanks to all who responded.....

c60b-1eq@e260-1g.berkeley.edu (Noam Mendelson) (04/24/91)

In article <REARL.91Apr23040230@nutrimat.gnu.ai.mit.edu> rearl@gnu.ai.mit.edu (Robert Earl) writes:
>In article <1991Apr23.050747.19705@agate.berkeley.edu> c60b-1eq@e260-1d.berkeley.edu (Noam Mendelson) writes:
>|   In article <1991Apr23.022057.29511@ux1.cso.uiuc.edu> allender@ux1.cso.uiuc.edu (Mark Allender) writes:
>|   The best thing would be to restructure the data file, using 202 byte blocks
>|   instead of 201 to be on the safe side.  At worst, you're increasing the
>|   size of the data file by about .5%.
>If it were just a data file, the best thing would be to make it a
>plain text file and use fscanf() or fgets(), but he appears to want
>only the header bit of it, so he should just be very careful to note
>this binary representation could break under another operating
>system/CPU/phase of the moon.

If you want to take up disk space unnecessarily and decrease program
performance, sure, you can create ASCII data files.  Portability will
be limited to the Intel 80x86 line, however, if you opt to use the
binary method.

c60b-1eq@e260-1g.berkeley.edu (Noam Mendelson) (04/24/91)

In article <1991Apr23.155042.5532@ux1.cso.uiuc.edu> allender@cso.uiuc.edu (Mark Allender) writes:
>Just a followup to my problem with the word alignment in this structure.
>The data that I am reading is in someone else's format, and has to stay this way.
>I can't change it.  And indeed, I would rather have the data in raw binary
>form.  There is a filler byte inserted after the filler2[15] declaration.
>The best suggestion around this was to read the entire 201 bytes into
>a char buffer[201] array, and then memcpy the elements into their corresponding
>locations.

How about reading the struct in two steps, first reading everything up to
the end of filler2[] (not including the byte padding), then reading the 
rest into the float number?
That would be much more efficient (memory- and speed-wise) than that
e-mail suggestion.
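A sketch of the two-step read, under some loudly stated assumptions: the struct is a hypothetical cut-down record (time, filler2, number), there is no padding before filler2, the file matches the host's byte order and float format, and fread() on a FILE * stands in for the read(fd, ...) call used earlier in the thread. The name read_tail() is made up.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof */
#include <stdio.h>

/* Hypothetical cut-down record; no padding is expected before filler2. */
struct tail {
    short time;
    char  filler2[15];
    float number;      /* the compiler may insert padding before this member */
};

/* Read one record in two steps. Returns 1 on success, 0 on a short read. */
int read_tail(FILE *fp, struct tail *t)
{
    size_t head_len = offsetof(struct tail, filler2) + sizeof t->filler2;

    assert(fp != NULL && t != NULL);

    /* Step 1: everything up to and including filler2, straight into the struct. */
    if (fread(t, 1, head_len, fp) != head_len)
        return 0;

    /* Step 2: the float goes directly into its member, wherever padding put it. */
    if (fread(&t->number, 1, sizeof t->number, fp) != sizeof t->number)
        return 0;

    return 1;
}
```

This only works while the part read in step 1 has no internal padding of its own. If the compiler pads anywhere before filler2, that part has to be split further or read member by member.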

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (04/24/91)

In article <1991Apr23.050747.19705@agate.berkeley.edu>, c60b-1eq@e260-1d.berkeley.edu (Noam Mendelson) writes:
> >Now, I want to read the beginning of a binary file into this structure,

> >Things don't seem to get done correctly at this point.  A little investigation
> >shows that sizeof(Header) returns 202, and not 201.  This is clearly not
> >what I want to do.

> Your compiler probably word-aligned the structure (202 is on a word boundary).

It's not just that.  The compiler may well have placed *elements* of the
structure at "natural alignment boundaries" as well.  Assuming 1 byte
chars (8-bit aligned), 2 byte integers (16-bit aligned), and 4-byte
floats (16-bit aligned), I would not be surprised to find a "filler" byte
inserted *before* the last member.

> >In any case, what is the best way around this problem?  Could I do something
> >like
> >	if ((readnum = read(fd, (char *)(&Header), sizeof(Header) - 1))....

> That would probably work on a PC, assuming the structure was word-aligned.

If the compiler has inserted padding *within* the structure (e.g. in order
to align the float member on a 16-bit boundary) then it certainly _won't_
work.

If you want to save C structures to a file and read them back,
you should be using fwrite() and fread() on a FILE* opened in binary mode.
If you want to read records written by some other language, you CAN'T
rely on C structs.  You just do not have enough control over the storage
layout of a struct to do it.  You will have to write a function that
reads values from the stream a chunk at a time and stores them into the
struct.
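For the first case (a C program saving its own structs and reading them back), a minimal sketch with fwrite() and fread() might look like this. The struct saved_rec and the function names are hypothetical. The file is only meaningful to a program built with the same compiler settings on the same kind of machine, since padding is written out along with the members.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record. */
struct saved_rec {
    char  name[15];
    float number;
};

/* Write the struct exactly as it sits in memory, padding included.
 * Returns 0 on success, -1 on error. */
int save_rec(FILE *fp, const struct saved_rec *r)
{
    assert(fp != NULL && r != NULL);
    return fwrite(r, sizeof *r, 1, fp) == 1 ? 0 : -1;
}

/* Read one struct back. Returns 0 on success, -1 on EOF or error. */
int load_rec(FILE *fp, struct saved_rec *r)
{
    assert(fp != NULL && r != NULL);
    return fread(r, sizeof *r, 1, fp) == 1 ? 0 : -1;
}
```

The stream should come from fopen() with a binary mode such as "wb" or "rb". On DOS compilers a text-mode stream translates line endings and can treat a Ctrl-Z byte as end of file, which corrupts binary records.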

-- 
Bad things happen periodically, and they're going to happen to somebody.
Why not you?					-- John Allen Paulos.

darcy@druid.uucp (D'Arcy J.M. Cain) (04/24/91)

In article <1991Apr23.022057.29511@ux1.cso.uiuc.edu> Mark Allender writes:
>struct header {
>	int version[2];
>	char unused[40];
>	int stuff[8];
>	char bogus;
>	char mode;
>	int time;
>	char unused2[90];
>	char filler[38];
>	char filler2[15];
>	float number;
>};
>
>The total size of the structure is 201 bytes (count it if you wish....).
>Now, I want to read the beginning of a binary file into this structure,

I have already seen some responses (One completely bogus - nettor beware)
but I get the impression that what you have here is the header from an
existing file format that you have no control over.  If this is the case
then you will have to read this piece by piece.  The main culprit is the
filler2 element.  This will probably cause a misalignment on most systems.
If you expect portability you will also have problems with the declarations
of version and time as they can be different sizes on different systems.
If you use short and long in these situations your code will go further
before it blows up but you still have no guarantees.
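One way to pin the sizes down on a newer compiler is sketched below. It relies on int16_t from C99's <stdint.h> and on C11's _Static_assert, both of which postdate this thread, and the struct fixed_fields and the function check_layout() are hypothetical names covering only the version and time members.

```c
#include <assert.h>   /* assert, for the runtime fallback */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>   /* int16_t (C99) */

/* Hypothetical subset of the header with field widths pinned down. */
struct fixed_fields {
    int16_t version[2];
    int16_t time;
};

/* Fail the build, rather than misread the file, if the layout is unexpected. */
_Static_assert(CHAR_BIT == 8, "expects 8-bit bytes");
_Static_assert(sizeof(int16_t) == 2, "int16_t must occupy 2 bytes");
_Static_assert(sizeof(struct fixed_fields) == 6, "unexpected padding");

/* Runtime fallback for compilers without _Static_assert: aborts on mismatch. */
void check_layout(void)
{
    assert(CHAR_BIT == 8);
    assert(sizeof(struct fixed_fields) == 6);
}
```

This fixes the width of each field but says nothing about byte order, so it is still no guarantee that the file reads correctly on another machine.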

Depending on the project I have two ways of dealing with this problem.
One way is to write a separate module with routines to read and write
the various types of items from the file or simply to read the entire
header into the structure.  The advantage to this method is speed in
that you can write separate modules for different systems tuning it to
the natural byte order, etc.  Since in most cases speed isn't all that
important (you suggest you only do this operation once) the best way
might be to write the code in a system independent way.  This simplifies
moving to other platforms.  An example:
    version[0] = getc(fp) << 8;    /* high byte of a 2-byte value */
    version[0] += getc(fp);        /* low byte */

Slow and bulky but will work on any platform.  Note the assumption about
the byte order in the file but none about the byte order on the machine.
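A slightly fuller sketch of the same byte-at-a-time idea, with end-of-file checking added. The function name read_u16_be() is hypothetical, and it assumes the file stores each 2-byte value high byte first. For a file written low byte first, the two reads would be swapped.

```c
#include <assert.h>
#include <stdio.h>

/* Read a 16-bit unsigned value stored high byte first.
 * Returns 0 on success, -1 on EOF or read error. */
int read_u16_be(FILE *fp, unsigned int *out)
{
    int hi, lo;

    assert(fp != NULL && out != NULL);

    if ((hi = getc(fp)) == EOF || (lo = getc(fp)) == EOF)
        return -1;

    /* Combine in unsigned arithmetic so the shift cannot overflow a 16-bit int. */
    *out = ((unsigned int)hi << 8) | (unsigned int)lo;
    return 0;
}
```

Each header member would get one such call (or a 4-byte variant), so the struct layout chosen by the compiler never touches the file format.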

>In any case, what is the best way around this problem?  Could I do something
>like
>	if ((readnum = read(fd, (char *)(&Header), sizeof(Header) - 1))....

I doubt this will work since the padding is probably not at the end of
the structure.  You will probably get the first byte of number in
filler2 and a completely bogus value in number.  On some systems it
would get worse.

-- 
D'Arcy J.M. Cain (darcy@druid)     |
D'Arcy Cain Consulting             |   There's no government
Toronto, Ontario, Canada           |   like no government!
+1 416 424 2871                    |

enag@ifi.uio.no (Erik Naggum) (04/24/91)

In article <1991Apr24.031700.17233@agate.berkeley.edu>, Noam Mendelson writes:

   If you want to take up disk space unnecessarily and decrease
   program performance, sure, you can create ASCII data files.
   Portability will be limited to the Intel 80x86 line, however, if
   you opt to use the binary method.

If you want to take up debugging time unnecessarily and decrease
programmer performance, sure, you can create binary data files.
Portability and debug-ability will be unlimited, however, if you
opt to use the readable data file method.

--
[Erik Naggum]					     <enag@ifi.uio.no>
Naggum Software, Oslo, Norway			   <erik@naggum.uu.no>

c60b-1eq@web-4e.berkeley.edu (Noam Mendelson) (04/25/91)

In article <ENAG.91Apr24175955@maud.ifi.uio.no> enag@ifi.uio.no (Erik Naggum) writes:
>In article <1991Apr24.031700.17233@agate.berkeley.edu>, Noam Mendelson writes:
>
>   If you want to take up disk space unnecessarily and decrease
>   program performance, sure, you can create ASCII data files.
>   Portability will be limited to the Intel 80x86 line, however, if
>   you opt to use the binary method.
>
>If you want to take up debugging time unnecessarily and decrease
>programmer performance, sure, you can create binary data files.

Decrease programmer performance?  Read()'ing in the data structure
causes a lot less headache than trying to fscanf() in the data
from an ASCII data file.

>Portability and debug-ability will be unlimited, however, if you
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>opt to use the readable data file method.

So I assume you've never had any problems doing I/O on ASCII files.
ASCII data files produce a whole new set of problems that don't exist
with the binary format, ranging from the fscanf() syntax to the
record size.

Portability will be dramatically increased, although not unlimited.

grimlok@hubcap.clemson.edu (Mike Percy) (04/25/91)

c60b-1eq@web-4e.berkeley.edu (Noam Mendelson) writes:

>In article <ENAG.91Apr24175955@maud.ifi.uio.no> enag@ifi.uio.no (Erik Naggum) writes:
>>In article <1991Apr24.031700.17233@agate.berkeley.edu>, Noam Mendelson writes:
>>
>>   If you want to take up disk space unnecessarily and decrease
>>   program performance, sure, you can create ASCII data files.
>>   Portability will be limited to the Intel 80x86 line, however, if
>>   you opt to use the binary method.
>>
>>If you want to take up debugging time unnecessarily and decrease
>>programmer performance, sure, you can create binary data files.
 
If I dump structs out to disk and read them back in later, it is a lot
less error prone than coding a bunch of fscanf() calls (which have their
own problems).


>Decrease programmer performance?  Read()'ing in the data structure
>causes a lot less headache than trying to fscanf() in the data
>from an ASCII data file.
 
Unless, of course, you need to try to figure out what's wrong with the
program by examining the data files. But I assume that when I write a
struct to a binary file, I can read it back into the struct.  If not,
then my compiler has done me serious wrong... 

>>Portability and debug-ability will be unlimited, however, if you
>                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>opt to use the readable data file method.

>So I assume you've never had any problems doing I/O on ASCII files.
>ASCII data files produce a whole new set of problems that don't exist
>with the binary format, ranging from the fscanf() syntax to the
>record size.
 
I sure have.

>Portability will be dramatically increased, although not unlimited.
 
I suppose portability of the _program_ would be unaffected either way.
The _data_files_, on the other hand, would tend to have problems.

"I don't know about your brain, but mine is really...bossy."
Mike Percy                    grimlok@hubcap.clemson.edu
ISD, Clemson University       mspercy@clemson.BITNET
(803)656-3780                 mspercy@clemson.clemson.edu