[comp.lang.c] foo.text[0] Was: Auto variable with sizeof == 0

greg@utcsri.UUCP (Gregory Smith) (02/17/87)

In article <626@vu-vlsi.UUCP> colin@vu-vlsi.UUCP (Colin Kelley) writes:
>In article <159@batcomputer.tn.cornell.edu> braner@batcomputer.UUCP (braner) writes:
>>
>>In the famous "microEMACS" by David Conroy, which has been widely
>>utilized and modified, the basic text-line structure looks like this:
>>
>>typedef struct LINE {
>>	struct LINE *nextline;
>>	struct LINE *prevline;
>>	short       size;		/* s.b. int! */
>>	short       used;
>>	char        text[];		/* !!!!!!!!! */
>>}	LINE;

>Some other people suggested declaring the text field to be char *text, but
>I'm surprised no one suggested this:
>
>Declare the text field to be char text[1], then use
>
>	lineptr = malloc(sizeof(LINE)-1+length);
>
>Almost all compilers will optimize sizeof(LINE)-1 into a single constant, so
>the code generated is likely to be exactly the same as that generated for
>the uEmacs example above...[Of course you can cast the argument to (unsigned)
>to keep lint happy.]

Well, not quite.... the offset of 'text' within the structure is *not*
equal to sizeof(LINE)-1, so the above call to malloc is asking for N
bytes too many, where N is 3 on a vax and 1 on a 68K or PDP-11.
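
A minimal sketch of the discrepancy, assuming an ANSI C compiler that
supplies offsetof in <stddef.h> (not every compiler of this vintage does).
The function names are made up for illustration, and the excess is whatever
padding the compiler happens to place after text[1], so no particular value
is assumed.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* The LINE layout from the thread, with text declared as char text[1]. */
typedef struct LINE {
    struct LINE *nextline;
    struct LINE *prevline;
    short        size;
    short        used;
    char         text[1];
} LINE;

/* Bytes actually needed for a line holding `length` text bytes. */
size_t line_bytes_needed(size_t length)
{
    return offsetof(LINE, text) + length;
}

/* Bytes requested by the sizeof(LINE)-1+length formula. */
size_t line_bytes_requested(size_t length)
{
    return sizeof(LINE) - 1 + length;
}

/* The excess N: the trailing padding after text[1]; machine dependent. */
size_t line_excess(void)
{
    /* text[1] lies inside the struct, so the subtraction cannot wrap. */
    assert(offsetof(LINE, text) + 1 <= sizeof(LINE));
    return sizeof(LINE) - 1 - offsetof(LINE, text);
}
```

For any length, line_bytes_requested(length) exceeds
line_bytes_needed(length) by exactly line_excess().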

The problem is that the struct will be padded out after the one-byte
'text' array to meet alignment requirements for the pointer fields.
'sizeof(LINE)' includes this padding. This can be fixed by placing a
substruct around everything but the 'text[1]' declaration, and taking
the size of that struct instead of sizeof(LINE)-1. Or, declare a dummy
struct with the same declarations just to get its size (and *comment
heavily* or someone will change one and not the other). The use of a
dummy struct instead of a substruct would allow you to leave the struct
references unchanged. Both of these methods may fail if 'text' is
an array of things other than chars - i.e. if padding is required
to align 'text' on a more strict boundary than that required by any
previous field in the struct. Nobody said it was easy :-).
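
A rough sketch of the substruct variant and of the caveat about non-char
arrays. The names LINE_HDR, SAMPLES and the helper functions are
hypothetical, offsetof is assumed to be available from <stddef.h>, and the
size of any gap depends on the machine's alignment rules.

```c
#include <assert.h>
#include <stddef.h>   /* offsetof, size_t */

/* Everything except the text array goes into a header substruct. */
struct LINE_HDR {
    struct LINE *nextline;
    struct LINE *prevline;
    short        size;
    short        used;
};

typedef struct LINE {
    struct LINE_HDR h;
    char            text[1];
} LINE;

/* With a char array, the header size normally equals the offset of text. */
int line_header_matches_offset(void)
{
    return sizeof(struct LINE_HDR) == offsetof(LINE, text);
}

/* The failure case: a payload that is aligned more strictly than the header. */
struct SAMPLES_HDR {
    short count;
};

struct SAMPLES {
    struct SAMPLES_HDR h;
    double             value[1];
};

/* Padding inserted between the header and value[]; sizeof(header) misses it. */
size_t samples_offset_gap(void)
{
    assert(offsetof(struct SAMPLES, value) >= sizeof(struct SAMPLES_HDR));
    return offsetof(struct SAMPLES, value) - sizeof(struct SAMPLES_HDR);
}
```

Wherever samples_offset_gap() is nonzero, allocating
sizeof(struct SAMPLES_HDR) plus the payload size would come up short by
that many bytes.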

Alternately, change the declaration to text[4] and the malloc call to
sizeof(LINE)-4. This is still a little dodgy - the method in the
preceding paragraph is better.
>
>Gnuplot (which we posted a couple weeks ago) uses this technique because it
>seemed to be the most portable...

Feel like fixing it?
>
>	-Colin Kelley  ..{cbmvax,pyrnj,bpa}!vu-vlsi!colin

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

mwm@eris.UUCP (02/20/87)

On the subject of dealing with structures with variable-sized text
arrays, <4157@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:

>The problem is that the struct will be padded out after the one-byte
>'text' array to meet alignment requirements for the pointer fields.
>'sizeof(LINE)' includes this padding. This can be fixed by placing a
>substruct around everything but the 'text[1]' declaration, and taking
>the size of that struct instead of sizeof(LINE)-1. Or, declare a dummy
>struct with the same declarations just to get its size (and *comment
>heavily* or someone will change one and not the other). The use of a
>dummy struct instead of a substruct would allow you to leave the struct
>references unchanged. Both of these methods may fail if 'text' is
>an array of things other than chars - i.e. if padding is required
>to align 'text' on a more strict boundary than that required by any
>previous field in the struct. Nobody said it was easy :-).

I like this. Especially if you get the C preprocessor to help you like so:

#define	FUNKY_HEADER	struct LINE	*nextline;\
			struct LINE	*prevline;\
			short		size;\
			short		used;	/* Note semicolon! */

struct funky_dummy_for_size {
	FUNKY_HEADER
	} ;

struct real_thing {
	FUNKY_HEADER
	char text[1];
	} ;

#define	get_new_real_thing(len) \
	((struct real_thing *) malloc(sizeof(struct funky_dummy_for_size) + (len)))

This takes care of the problem of keeping the two versions in synch.
I'm going to look at it for the next version of mg. Of course, someone
will probably point out a good reason why this is broken before then,
anyway. :-)
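
A compilable sketch of the shared-header macro idea, written as a function
rather than a macro so the result can be checked and initialized. The name
new_real_thing and its arguments are made up for illustration. It leans on
the usual assumption that indexing text[] past its declared length is
tolerated by the compiler; C99 later standardized this pattern as the
flexible array member, written char text[];.

```c
#include <assert.h>
#include <limits.h>   /* SHRT_MAX */
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

/* The shared field list; both structs below expand it. */
#define FUNKY_HEADER \
    struct LINE *nextline; \
    struct LINE *prevline; \
    short        size;     \
    short        used;

struct funky_dummy_for_size {
    FUNKY_HEADER
};

struct real_thing {
    FUNKY_HEADER
    char text[1];
};

/* Allocate a line holding a copy of the `len` bytes at `src`.
   Returns NULL if malloc fails; the caller frees the result. */
struct real_thing *new_real_thing(const char *src, size_t len)
{
    struct real_thing *p;

    assert(len <= (size_t)SHRT_MAX);   /* size and used are shorts */

    /* Header bytes plus exactly len text bytes, no trailing padding. */
    p = malloc(sizeof(struct funky_dummy_for_size) + len);
    if (p == NULL)
        return NULL;

    p->nextline = NULL;
    p->prevline = NULL;
    p->size = (short)len;
    p->used = (short)len;
    memcpy(p->text, src, len);
    return p;
}
```

A call such as new_real_thing("hello", 5) yields a line whose used field is
5 and whose text holds the five bytes without a terminating NUL.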

	<mike





But I'll survive, no you won't catch me,		Mike Meyer
I'll resist the urge that is tempting me,		ucbvax!mwm
I'll avert my eyes, keep you off my knee,		mwm@berkeley.edu
But it feels so good when you talk to me.		mwm@ucbjade.BITNET

colin@vu-vlsi.UUCP (02/21/87)

In article <4157@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>In article <626@vu-vlsi.UUCP> colin@vu-vlsi.UUCP (Colin Kelley) writes:
>>
>>Declare the text field to be char text[1], then use
>>
>>	lineptr = malloc(sizeof(LINE)-1+length);
>>
>Well, not quite.... the offset of 'text' within the structure is *not*
>equal to sizeof(LINE)-1, so the above call to malloc is asking for N
>bytes too many, where N is 3 on a vax and 1 on a 68K or PDP-11.
>
> [suggestion to wrap the whole field in a dummy struct]
>
>Alternately, change the declaration to text[4] and the malloc call to
>sizeof(LINE)-4.

For that matter, just use 8, or 16, or 1024!  The actual constant should be
#defined of course...

But I contend that it isn't necessary to go to all this work to try to rescue
those 1-3 wasted bytes!  Even if the memory is that important to you, malloc()
will probably round the number of bytes up anyway, right?  I'm pretty sure that
any implementation of malloc() on a machine which requires such padding bytes
will certainly round up to a multiple of this padding size (e.g. malloc() on
a VAX will round the request up to the nearest multiple of N, where N may be
as small as 4, but is likely to be a lot larger).

>>Gnuplot (which we posted a couple weeks ago) uses this technique because it
>>seemed to be the most portable...
>Feel like fixing it?

Now hold on here!  You've demonstrated that my program MAY waste several bytes
on its malloc() calls (and I contend that it probably won't).  How does this
make it broken?  [It may well be broken in some other respect, but that's
another story!]

	-Colin Kelley  ..{cbmvax,pyrnj,bpa}!vu-vlsi!colin

greg@utcsri.UUCP (Gregory Smith) (02/24/87)

In article <632@vu-vlsi.UUCP> colin@vu-vlsi.UUCP (Colin Kelley) writes:
>In article <4157@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>>Well, not quite.... the offset of 'text' within the structure is *not*
>>equal to sizeof(LINE)-1, so the above call to malloc is asking for N
>>bytes too many, where N is 3 on a vax and 1 on a 68K or PDP-11.

[...]

>But I contend that it isn't necessary to go to all this work to try to rescue
>those 1-3 wasted bytes!  Even if the memory is that important to you, malloc()
>will probably round the number of bytes up anyway, right?  I'm pretty sure that
>any implementation of malloc() on a machine which requires such padding bytes
>will certainly round up to a multiple of this padding size (e.g. malloc() on
>a VAX will round the request up to the nearest multiple of M, where M may be
>as small as 4, but is likely to be a lot larger).

Firstly, the average number of wasted bytes will be unaffected by the
rounding process. (Assuming uniform distribution of size requests over
an interval rather larger than the rounding period etc etc). If you
ask for 81 instead of 80 and it is rounded up to the next 16, you get 96
instead of 80.
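
The arithmetic can be checked with a small sketch. It assumes an allocator
that does nothing but round each request up to a fixed multiple, which is a
simplification and not a description of any particular malloc. The function
names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

/* Round `request` up to the next multiple of `multiple`. */
size_t round_up(size_t request, size_t multiple)
{
    assert(multiple > 0);
    return (request + multiple - 1) / multiple * multiple;
}

/* Average number of additional bytes granted when every request size in
   lo..hi is inflated by `extra` before being rounded up. */
double average_extra_granted(size_t lo, size_t hi,
                             size_t extra, size_t multiple)
{
    size_t n;
    size_t total = 0;

    assert(lo <= hi);
    for (n = lo; n <= hi; n++)
        total += round_up(n + extra, multiple) - round_up(n, multiple);

    return (double)total / (double)(hi - lo + 1);
}
```

With a multiple of 16, round_up(80, 16) is 80 and round_up(81, 16) is 96.
Over the request sizes 1 through 160, asking for 3 bytes too many costs an
average of 3 bytes after rounding: most requests cost nothing extra, and 3
in every 16 cost a whole extra 16.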

I suggested this more for the sake of correctness than for the 3 bytes.
You need n bytes, and you write a simple expression that apparently asks
for n bytes, but actually *asks for* n+N bytes. The fact that N is never
large or negative is not really sufficient justification. At least, this
discrepancy should be documented in the code so that those who maintain
after you will not be misled.

Did you realize that this was happening when you wrote the code? If not,
I contend that you made a design error and got away with it. If you did
realize it, fine, but the weirdness of the effect demands a comment in
the code.

>>>Gnuplot (which we posted a couple weeks ago) uses this technique because it
>>>seemed to be the most portable...
>>Feel like fixing it?
>
>Now hold on here!  You've demonstrated that my program MAY waste several bytes
>on its malloc() calls (and I contend that it probably won't).  How does this
>make it broken?  [It may well be broken in some other respect, but that's
>another story!]

Well, I meant to imply through the phrasing of that remark that I didn't
consider it a big deal. But I think we have each spent more time talking
about it than it would take to 'fix' it.
>
>	-Colin Kelley  ..{cbmvax,pyrnj,bpa}!vu-vlsi!colin

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

meissner@dg_rtp.UUCP (02/27/87)

In article <4157@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>In article <159@batcomputer.tn.cornell.edu> braner@batcomputer.UUCP (braner) writes:
> >
> >typedef struct LINE {
> >	struct LINE *nextline;
> >	struct LINE *prevline;
> >	short       size;		/* s.b. int! */
> >	short       used;
> >	char        text[];		/* !!!!!!!!! */
> >}	LINE;
> 
> .... the offset of 'text' within the structure is *not*
> equal to sizeof(LINE)-1, so the above call to malloc is asking for N
> bytes too many, where N is 3 on a vax and 1 on a 68K or PDP-11.
> 
> The problem is that the struct will be padded out after the one-byte
> 'text' array to meet alignment requirements for the pointer fields.
> 'sizeof(LINE)' includes this padding. 

	Yeah, and malloc will also add anywhere from 0 to 64 extra bytes to
the memory allocation.  Extreme micro optimizations like this can backfire.

(By micro optimizations, I don't mean optimizations for memory-limited
micros, but extreme efforts here and there to save a byte or a
millisecond while ignoring the more substantial gains to be made by looking
at the program as a whole for improvements.)
-- 
	Michael Meissner, Data General	Uucp: ...mcnc!rti-sel!dg_rtp!meissner

It is 11pm, do you know what your sendmail and uucico are doing?