[comp.lang.c] Auto variable with sizeof == 0

escott%deis.uci.edu@icsg.uci.edu (01/30/87)

Somebody recently came to me with a program that worked on a VAX 11/750
running 4.2BSD but failed on our Sequent Balance 21000 running Dynix 2.1.
The apparent culprit was of course the C compiler on the latter machine.
However, after examining the code in question, I found a construct that
seems a little strange to me: an automatic variable was declared as a
"struct foo **bar[]".  "How could this be right?"  I said to myself.  "How
can you declare an automatic variable that has no size?"

So I wrote a program that contained a similar declaration, and then tried to
take sizeof( bar ).  Sure enough:

	warning: sizeof returns 0

[This from the VAX 11/750 4.2BSD compiler]

Okay, that makes sense.  My question is: is there any reason why you should
be able to declare an array with zero elements as an automatic variable?
What's strange is that, on the VAX, the program apparently successfully
dereferenced bar, both setting a value for "*bar" and then using that value
later.  How can this be right?  How can "bar" have any value at all, much
less "*bar"?  If there is no use for a zero-sized automatic variable, how
come the compiler lets you do it?  (Even a C compiler should occasionally
clamp down 8^).  And, just for the heck of asking, does ANSI C let you make
such a declaration?

+-------------------------------------------------------------------------+
 Scott Menter  UCI ICS Computing Support Group   Univ. of Calif. at Irvine
                     (714) 856 7552              Irvine, California  92717

 Internet:  escott@ics.uci.edu             UUCP:  ...!ucbvax!ucivax!escott
 Bitnet:    escott@uci               CSNet: escott%ics.uci.edu@csnet-relay
 Internet (with Name Server):  TBA
+-------------------------------------------------------------------------+

chris@mimsy.UUCP (Chris Torek) (02/02/87)

In article <4114@brl-adm.ARPA> escott%deis.uci.edu@icsg.uci.edu
(Scott Menter) writes:
>... is there any reason why you should be able to declare an array
>with zero elements as an automatic variable?

Why not?  It makes sense.  Perhaps it should elicit a warning, since
no members of that array are accessible:  Valid subscripts are in the
range [0..0).

>What's strange is that, on the VAX, the program apparently successfully
>dereferenced bar, both setting a value for "*bar" and then using that value
>later.  How can this be right?

Just luck.

>And, just for the heck of asking, does ANSI C let you make such a
>declaration?

There seems to be a great debate over malloc(0), with some support
as well for empty arrays.  It is trivial to allow either, or to
disallow either; some argue in favour of `catching the programmer's
mistakes for him', while others argue that the construct may not
be a mistake, or may have been written by a machine, and that having
special cases for zero is both unnecessary and ugly.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

pinkas@mipos3.UUCP (02/02/87)

In article <4114@brl-adm.ARPA> escott%deis.uci.edu@icsg.uci.edu (Scott Menter) writes:
>				... I found a construct that
>seems a little strange to me: an automatic variable was declared as a
>"struct foo **bar[]".  "How could this be right?"  I said to myself.  "How
>can you declare an automatic variable that has no size?"
>
>So I wrote a program that contained a similar declaration, and then tried to
>take sizeof( bar ).  Sure enough:
>
>	warning: sizeof returns 0
>
>[This from the VAX 11/750 4.2BSD compiler]
>
>Okay, that makes sense.  My question is: is there any reason why you should
>be able to declare an array with zero elements as an automatic variable?
>What's strange is that, on the VAX, the program apparently successfully
>dereferenced bar, both setting a value for "*bar" and then using that value
>later.  How can this be right?  How can "bar" have any value at all, much
>less "*bar"?  If there is no use for a zero-sized automatic variable, how
>come the compiler lets you do it?

I don't see the problem with this declaration.  bar is declared to be an
array of pointers to pointers to struct foo.  That is, **(bar[0]) is of
type foo.  bar initially has no memory allocated to it.  This type of
construct appears to be a dynamic array, where malloc will be called to get
some memory.  Since the array is declared to have zero elements, sizeof
will return zero.  (Remember that sizeof(array) =~ sizeof(element of array)
times number of elements.  This is approximate because C allows a compiler
to pack arrays.)  So in your case, the compiler was correct in warning you
that bar was of size zero (taking sizeof a zero sized element is not very
useful as the most common uses for sizeof are malloc and pointer arithmetic
when something casts the pointer to a different type).  You should inspect
the code, but if it worked on one machine, it should work on another.  It
could be that they really wanted to say sizeof(foo), in something like:

	bar = malloc(sizeof(struct foo) * 100)

which would allocate 100 elements to the array bar, making it the
equivalent of the auto declaration struct foo **bar[100].

-Israel
-- 
----------------------------------------------------------------------
UUCP:	{amdcad,decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!pinkas
ARPA:	pinkas%mipos3.intel.com@relay.cs.net
CSNET:	pinkas%mipos3.intel.com

tim@amdcad.UUCP (02/03/87)

In article <397@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
>In article <4114@brl-adm.ARPA> escott%deis.uci.edu@icsg.uci.edu (Scott Menter) writes:
>>				... I found a construct that
>>seems a little strange to me: an automatic variable was declared as a
>>"struct foo **bar[]".  "How could this be right?"  I said to myself.  "How
>>can you declare an automatic variable that has no size?"

It isn't right!

>
>I don't see the problem with this declaration.  bar is declared to be an
>array of pointers to pointers to struct foo.  That is, **(bar[0]) is of
>type foo.  bar initially has no memory allocated to it.  This type of
>construct appears to be a dynamic array, where malloc will be called to get
>some memory.  Since the array is declared to have zero elements, sizeof
>will return zero.  (Remember that sizeof(array) =~ sizeof(element of array)
>times number of elements.  This is approximate because C allows a compiler
>to pack arrays.)  So in your case, the compiler was correct in warning you
>that bar was of size zero (taking sizeof a zero sized element is not very
>useful as the most common uses for sizeof are malloc and pointer arithmetic
>when something casts the pointer to a different type).  You should inspect
>the code, but if it worked on one machine, it should work on another.  It
>could be that they really wanted to say sizeof(foo), in something like:
>
>	bar = malloc(sizeof(struct foo) * 100)
>

	^^ Won't work; bar is a *constant* (see pp 94, 95 of K&R)

>which would allocate 100 elements to the array bar, making it the
>equivalent of the auto declaration struct foo **bar[100].

There are only 3 places where an array declaration is not required to
declare a size between the brackets []:

	1:	an extern array		-->	extern int foo[];
	2:	an initialized array	-->	int foo[] = {1,2,3};
	3:	an array parameter	-->	foo(bar)
						int bar[];
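
A minimal sketch of the three cases listed above, written with an
ANSI-style prototype rather than the K&R parameter style used in the
list.  The names table, primes and sum are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* 1: extern array; the size comes from the definition elsewhere
      (here, further down in this same file). */
extern int table[];

/* 2: initialized array; the size is taken from the initializer. */
int primes[] = {2, 3, 5};

/* 3: array parameter; it is really a pointer, so no size is needed. */
int sum(int v[], size_t n)
{
    int total = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* The definition that completes the extern declaration above. */
int table[4] = {10, 20, 30, 40};
```

In none of the three cases does the compiler have to reserve storage of
unknown size, which is what an unsized automatic array would require.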


	Tim Olson
	Advanced Micro Devices

greg@utcsri.UUCP (Gregory Smith) (02/03/87)

In article <4114@brl-adm.ARPA> escott%deis.uci.edu@icsg.uci.edu writes:
>Somebody recently came to me with a program that worked on a VAX 11/750
>running 4.2BSD but failed on our Sequent Balance 21000 running Dynix 2.1.
> [...] after examining the code in question, I found a construct that
>seems a little strange to me: an automatic variable was declared as a
>"struct foo **bar[]".  "How could this be right?"  I said to myself.  "How
>can you declare an automatic variable that has no size?"
>
>So I wrote a program that contained a similar declaration, and then tried to
>take sizeof( bar ).  Sure enough:
>	warning: sizeof returns 0
>[This from the VAX 11/750 4.2BSD compiler]
>
>Okay, that makes sense.  My question is: is there any reason why you should
>be able to declare an array with zero elements as an automatic variable?
>What's strange is that, on the VAX, the program apparently successfully
>dereferenced bar, both setting a value for "*bar" and then using that value
>later.  How can this be right?  How can "bar" have any value at all, much
>less "*bar"?  If there is no use for a zero-sized automatic variable, how
>come the compiler lets you do it?

Well, *bar is just bar[0], and is of type (struct foo **). There is
indeed no storage reserved for this array element. It is like declaring
'char blat[6]' and then setting "blat[6]='?';". blat[0] through blat[5]
exist, and blat[6] is simply the next char after blat[5]. The C language
does not guarantee what that is. On the vax compiler the 'bar'
declaration reserves zero words for the array 'bar', and then bar[0] is
the word *after* that zero-length array. 

Despite having no length, the array still has an address, and bar[0] is
effectively stored at this same address. Since the array occupies zero
memory, another variable may start in the same place, and bar[0] will
reference the memory occupied by this other variable.

The goof who wrote it probably knew that, with the VAX compiler, setting
*bar would actually set the next declared auto (a little knowledge is a
dangerous thing).  I.e. if it looks like this:
{
	struct foo **bar[];
	char *ptr;
	...
Then ptr and *bar are stored in the same place. Of course this is
non-portable as you have found. Presumably the code using *bar depends
on this shared storage. A quick and dirty fix (fight nasty with nasty?) is

#define bar ((struct foo ***)&ptr)

instead of the declaration for bar. This will achieve the same effect and
is considerably more portable. It still isn't really correct; the shared
storage should be done either by use of a union, or by casting between the
stored type and the 'struct foo **' type. The choice depends on what is
actually being done with this pointer.
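
A minimal sketch of the union approach, assuming the code really does
want one piece of storage viewed either as a char * or as a
struct foo **.  The union tag slot, the member bar0 and the function
slot_roundtrip are hypothetical, and since the original struct foo is
not shown in the thread, a one-member stand-in is used.

```c
#include <assert.h>

struct foo { int value; };   /* stand-in; the real struct is not shown */

/* One slot that is viewed either as a char * or as a struct foo **.
   Only the member most recently stored should be read back. */
union slot {
    char *ptr;
    struct foo **bar0;       /* plays the role of *bar in the original */
};

/* Store a struct foo ** in the slot and read it back through the
   same member. */
int slot_roundtrip(void)
{
    struct foo f;
    struct foo *pf = &f;
    union slot s;

    f.value = 42;
    s.bar0 = &pf;
    assert(s.bar0 == &pf);
    return (*s.bar0)->value;
}
```

The union makes the sharing explicit in the declaration, instead of
relying on where the compiler happens to place adjacent autos.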

"No one ever said it was going to be easy...."
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

chris@mimsy.UUCP (Chris Torek) (02/04/87)

>>>"struct foo **bar[]".

>In article <397@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
>>I don't see the problem with this declaration. ...
>>
>>	bar = malloc(sizeof(struct foo) * 100)

In article <14575@amdcad.UUCP> tim@amdcad.UUCP (Tim Olson) writes:
>	^^ Won't work; bar is a *constant* (see pp 94, 95 of K&R)

To be picky, bar is neither a constant nor a variable.  Its value
is set at entry to the function, and cannot be changed within that
function invocation---not without cheating: all automatic variables
are really just names for stack frame offsets, so altering the stack
or frame pointer shuffles all the variables.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

pinkas@mipos3.UUCP (02/04/87)

In article <5258@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <4114@brl-adm.ARPA> escott%deis.uci.edu@icsg.uci.edu
>(Scott Menter) writes:
>>... is there any reason why you should be able to declare an array
>>with zero elements as an automatic variable?
>
>Why not?  It makes sense.  Perhaps it should elicit a warning, since
>no members of that array are accessible:  Valid subscripts are in the
>range [0..0).

Wrong.  There are no valid subscripts to the array.  To allow a subscript
of 0, the array must be declared as bar[1].  Remember, the valid subscripts
of an array declared foo[n] are [0..n-1].

Regarding this problem, in a former posting I mentioned that the
declaration of struct foo **bar[] as an auto variable would be useful as a
dynamic array.  I have since been corrected.  Someone (I deleted the mail
message, so I don't have your name, sorry) pointed out that K&R state that
an array is a constant and is thus unusable as an lvalue.  To make a
dynamic array, the declaration should read struct foo ***bar.  When
malloc'ed, it will yield an array of pointers to pointers to struct foo.
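
A minimal sketch of the corrected dynamic array, assuming the caller
wants n elements (100 in the earlier posting).  Note that the element
size is that of a struct foo **, not of struct foo.  The function name
make_bar is hypothetical and struct foo is a stand-in.

```c
#include <assert.h>
#include <stdlib.h>

struct foo { int value; };   /* stand-in; the real struct is not shown */

/* Allocate n elements, each a struct foo **, all set to NULL.
   Returns NULL if the allocation fails. */
struct foo ***make_bar(size_t n)
{
    struct foo ***bar;
    size_t i;

    assert(n > 0);           /* sidestep the malloc(0) question */
    bar = malloc(n * sizeof *bar);
    if (bar == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        bar[i] = NULL;
    return bar;
}
```

The caller then uses bar[0] through bar[n-1] and calls free(bar) when
finished.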

-Israel
-- 
----------------------------------------------------------------------
UUCP:	{amdcad,decwrl,hplabs,oliveb,pur-ee,qantel}!intelca!mipos3!pinkas
ARPA:	pinkas%mipos3.intel.com@relay.cs.net
CSNET:	pinkas%mipos3.intel.com

chris@mimsy.UUCP (02/05/87)

>In article <5258@mimsy.UUCP> I wrote:
>>[for automatic array declarations of the form `int a[];' valid subscripts
>>are in the range [0..0).

In article <409@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
>Wrong.

Not so.

>There are no valid subscripts to the array.

That is what I said.  The valid subscripts are in the range [0..0).
Perhaps you would prefer the [0..0[ form?  (I always thought that
form particularly vile.)  The notation [0..0) means the half-open
interval between zero and zero, i.e., the null set.
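
The half-open range corresponds to the usual C loop over subscripts.  A
minimal sketch, with the hypothetical function count_subscripts:

```c
#include <assert.h>
#include <stddef.h>

/* Count the subscripts i with lo <= i < hi, that is, the half-open
   range [lo..hi). */
size_t count_subscripts(size_t lo, size_t hi)
{
    size_t i, count = 0;

    assert(lo <= hi);
    for (i = lo; i < hi; i++)
        count++;
    return count;
}
```

For [0..0) the loop body never runs, so the count is zero; for [0..6)
it runs six times.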
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

throopw@dg_rtp.UUCP (02/07/87)

> chris@mimsy.UUCP (Chris Torek)
>> escott%deis.uci.edu@icsg.uci.edu (Scott Menter)

>>... is there any reason why you should be able to declare an array
>>with zero elements as an automatic variable?
> Why not?  It makes sense.  [...]
> some argue in favour of `catching the programmer's
> mistakes for him', while others argue that the construct may not
> be a mistake, or may have been written by a machine, and that having
> special cases for zero is both unnecessary and ugly.

This is sensible, I agree.  But it is worth noting that

        int foo[0];
and
        int foo[];

are NOT the same thing.  Even if X3J11 were to take the reasonable
approach and allow the first as an automatic declaration, the second
should still be an error as an automatic declaration.

--
IBM manuals are written by little old ladies in Poughkeepsie who are
instructed to say nothing specific.
                                --- R. T. Lillington
-- 
Wayne Throop      <the-known-world>!mcnc!rti-sel!dg_rtp!throopw

greg@utcsri.UUCP (Gregory Smith) (02/10/87)

In article <397@mipos3.UUCP> pinkas@mipos3.UUCP (Israel Pinkas) writes:
> [...]  (Remember that sizeof(array) =~ sizeof(element of array)
>times number of elements.  This is approximate because C allows a compiler
>to pack arrays.)

The relationship is exact: sizeof(array)==sizeof( array[0] )*(# of elements).
An array may not be packed in a way which makes this relationship inexact.
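
A minimal sketch that checks the relationship, assuming a struct whose
members force some internal padding.  The macro NELEMS, the function
element_count and the layout of struct foo are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in an array (not valid for pointers). */
#define NELEMS(a) (sizeof(a) / sizeof((a)[0]))

struct foo { char tag; double weight; };   /* stand-in with padding */

/* Any padding lives inside each element, so the array size is exactly
   the element size times the element count. */
size_t element_count(void)
{
    struct foo items[7];

    assert(sizeof items == 7 * sizeof items[0]);
    return NELEMS(items);
}
```

This exactness is what makes the sizeof(a)/sizeof(a[0]) idiom reliable.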

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

braner@batcomputer.UUCP (02/11/87)

[]

In the famous "microEMACS" by David Conroy, which has been widely
utilized and modified, the basic text-line structure looks like this:

typedef struct LINE {
	struct LINE *nextline;
	struct LINE *prevline;
	short       size;		/* s.b. int! */
	short       used;
	char        text[];		/* !!!!!!!!! */
}	LINE;

The idea is to allocate memory for lines as follows:

	lineptr = malloc(sizeof(LINE)+length);

where length is as needed at the time for that line.  The actual text
of the line is stored OUTSIDE the struct, starting at lineptr->text[0].
This is, of course, "illegal".  Some compilers give a warning about
"zero-size structure element".

Question:  Do some compilers refuse to accept this?  Is there a GOOD
way to do it legally?  (NOTE: I KNOW that you can use:

	...
	char	*text;
	...
	lineptr = malloc(sizeof(LINE));
	lineptr->text = malloc(length);

- but the illegal version saves the overhead of the extra pointer and
the overhead of the extra malloc() control block.  In this application
this saving is important, since there will be hundreds or even thousands
of LINEs.)
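
For reference, the C standard issued in 1999 (C99), well after this
posting, made an unsized last member legal under the name "flexible
array member".  A minimal sketch under that assumption; the function
line_new is hypothetical, and the two counters are widened to int here
as the "s.b. int" comment suggests.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct LINE {
    struct LINE *nextline;
    struct LINE *prevline;
    int          size;     /* bytes allocated for text */
    int          used;     /* bytes of text in use */
    char         text[];   /* flexible array member: must be last */
} LINE;

/* Allocate a LINE with room for a copy of s.  Returns NULL on failure. */
LINE *line_new(const char *s)
{
    size_t length;
    LINE *lp;

    assert(s != NULL);
    length = strlen(s);
    lp = malloc(sizeof(LINE) + length);
    if (lp == NULL)
        return NULL;
    lp->nextline = lp->prevline = NULL;
    lp->size = (int)length;
    lp->used = (int)length;
    memcpy(lp->text, s, length);   /* copied without a terminating NUL */
    return lp;
}
```

One malloc and one free per line, with no extra pointer in the struct.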

- Moshe Braner

mouse@mcgill-vision.UUCP (02/11/87)

In article <409@mipos3.UUCP>, pinkas@mipos3.UUCP (Israel Pinkas) writes:
> In article <5258@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>> In article <4114@brl-adm.ARPA> escott%deis.uci.edu@icsg.uci.edu (Scott Menter) writes:
>>> ... is there any reason why you should be able to declare an array
>>> with zero elements as an automatic variable?

Uniformity.  Note that this, ie [0], is not the same as [].

>> Why not?  [...] no members of that array are accessible:  Valid
>> subscripts are in the range [0..0).

It doesn't even occupy any storage (at least it does, zero bytes of
it), so sure, why not?

> Wrong.  There are no valid subscripts to the array.

That is what Chris meant (I'm sure).  Mathematicians use square
brackets to denote a closed interval end and parentheses to denote an
open end, so that [1..10) would indicate those x for which 1<=x<10.
This is arguably inconsistent when both ends are the same value, but I,
at least, found his meaning perfectly clear anyway.

(Generally, if you disagree with Chris about a point of fact (as
opposed to opinion), check your beliefs, assumptions, and understanding
of his posting very carefully; he's usually right.)

					der Mouse

USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
Europe: mcvax!decvax!utcsri!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu

drw@cullvax.UUCP (02/12/87)

braner@batcomputer.tn.cornell.edu (braner) writes:
> In the famous "microEMACS" by David Conroy, which has been widely
> utilized and modified, the basic text-line structure looks like this:
> 
> typedef struct LINE {
> 	struct LINE *nextline;
> 	struct LINE *prevline;
> 	short       size;		/* s.b. int! */
> 	short       used;
> 	char        text[];		/* !!!!!!!!! */
> }	LINE;
> 
> The idea is to allocate memory for lines as follows:
> 
> 	lineptr = malloc(sizeof(LINE)+length);
> 
> where length is as needed at the time for that line.  The actual text
> of the line is stored OUTSIDE the struct, starting at lineptr->text[0].
> This is, of course, "illegal".  Some compilers give a warning about
> "zero-size structure element".
> 
> Question:  Do some compilers refuse to accept this?  Is there a GOOD
> way to do it legally?  (NOTE: I KNOW that you can use:

Replace "char text[]" with "char text[0]".  This leaves the
declaration perfectly legitimate.  Probably it isn't kosher according
to ANSI to reference foo.text[27], but the various requirements that
ANSI puts on make it extremely likely that it will work in any
conforming implementation.

Dale
-- 
Dale Worley		Cullinet Software
UUCP: ...!seismo!harvard!mit-eddie!cullvax!drw
ARPA: cullvax!drw@eddie.mit.edu

tanner@ki4pv.UUCP (02/12/87)

) braner@batcomputer.tn.cornell.edu (braner) writes
) ... char text[] ...
) ... Question: Do some compilers refuse to accept this?

Yes, the microsoft compiler (at least the one distributed by SCO as
2.2 beta) refuses to accept this.  Says "unknown size", of all things.
I can understand and sympathise with this, of course -- the size is
not specified, so I'd consider it unknown too.

However, note that the obvious replacement (char text[0]) elicits the
same error message, even though the size IS known (to be zero).

In all fairness, this is a beta release of the compiler, and it may have
been fixed already.
-- 
<std dsclm, copies upon request>	   Tanner Andrews

dhb@rayssd.UUCP (02/12/87)

In article <159@batcomputer.tn.cornell.edu> braner@batcomputer.UUCP (braner) writes:
> [much discussion of a structure with a trailing character string
> and the fact that the way it is being used is illegal.  also
> mention of the fact that the overhead of an extra pointer and
> malloc control block might be critical factors.]

I have run into this problem on several occasions and have come up with
what I think is a reasonable solution (actually two solutions).  The
approach that I prefer is the following:

	1. change the definition of 'text' to 'char *text;'
	2. do the malloc() for 'sizeof(LINE)+length'
	3. set the text pointer to the base address of the structure
	   plus sizeof(LINE).

This approach has the added overhead of an extra pointer but it
eliminates the extra malloc control block.  Since the malloc control
block is generally larger than a pointer this is a reasonable tradeoff.
It also eliminates the extra call to malloc which can be important if
you want your application to run fast.
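
The three steps above can be sketched as follows.  This is only an
illustration: the function name line_alloc is hypothetical, and the
header fields are borrowed from the microEMACS struct quoted earlier in
the thread.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct LINE {
    struct LINE *nextline;
    struct LINE *prevline;
    int          size;
    int          used;
    char        *text;     /* step 1: a pointer instead of an array */
} LINE;

/* One malloc for header plus text; text is aimed at the bytes that
   follow the header. */
LINE *line_alloc(size_t length)
{
    LINE *lp = malloc(sizeof(LINE) + length);       /* step 2 */

    if (lp == NULL)
        return NULL;
    lp->nextline = lp->prevline = NULL;
    lp->size = (int)length;
    lp->used = 0;
    lp->text = (char *)lp + sizeof(LINE);           /* step 3 */
    assert(lp->text == (char *)(lp + 1));
    return lp;
}
```

A single free(lp) releases both the header and the text.  Because text
holds chars, no alignment adjustment is needed after the header.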

Another approach that I have used is to define multiple structures.  The
first structure has everything except the 'text' variable, the second
structure consists of an instance of the first structure followed by a
huge text buffer.  For example:

	struct header_junk {
		int	length;
		int	other_stuff;
		/* whatever else you need */
	};

	struct LINE {
		struct header_junk hj;
		char	text[32768]; /* or other large number */
	};

When you malloc() the structure, specify the size as
'sizeof(struct header_junk)+length' but assign the pointer to something
of type 'struct LINE'.  This has the minor drawback of adding another
level of indirection to get at the variables in the header and on some
machines (actually: some compilers) this might add to the execution
time.  This can be taken care of by using a pointer to the header area.
Note that this only adds one pointer instead of one pointer for each
line element.  You could probably even cheat a little and just use a
cast to convert the pointer to the proper type.
-- 
David H. Brierley
Raytheon Submarine Signal Division; Portsmouth RI; (401)-847-8000 x4073
smart mailer or arpanet: dhb@rayssd.ray.com
old dumb mailer or uucp: {cbosgd,gatech,ihnp4,linus!raybed2} !rayssd!dhb

colin@vu-vlsi.UUCP (02/13/87)

In article <159@batcomputer.tn.cornell.edu> braner@batcomputer.UUCP (braner) writes:
>
>In the famous "microEMACS" by David Conroy, which has been widely
>utilized and modified, the basic text-line structure looks like this:
>
>typedef struct LINE {
>	struct LINE *nextline;
>	struct LINE *prevline;
>	short       size;		/* s.b. int! */
>	short       used;
>	char        text[];		/* !!!!!!!!! */
>}	LINE;
>
>The idea is to allocate memory for lines as follows:
>
>	lineptr = malloc(sizeof(LINE)+length);

Some other people suggested declaring the text field to be char *text, but
I'm surprised no one suggested this:

Declare the text field to be char text[1], then use

	lineptr = malloc(sizeof(LINE)-1+length);

Almost all compilers will optimize sizeof(LINE)-1 into a single constant, so
the code generated is likely to be exactly the same as that generated for
the uEmacs example above...[Of course you can cast the argument to (unsigned)
to keep lint happy.]

Gnuplot (which we posted a couple weeks ago) uses this technique because it
seemed to be the most portable...
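
A minimal sketch of the text[1] technique, assuming length is at least
1.  The function name line_alloc is hypothetical.  Whether indexing
past the declared bound is strictly conforming was debated, so treat
this as an illustration of the idiom rather than a guarantee.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct LINE {
    struct LINE *nextline;
    struct LINE *prevline;
    short        size;
    short        used;
    char         text[1];  /* really `size' bytes long */
} LINE;

/* Allocate a LINE whose text area holds `length' bytes. */
LINE *line_alloc(size_t length)
{
    LINE *lp;

    assert(length >= 1);
    lp = malloc(sizeof(LINE) - 1 + length);
    if (lp == NULL)
        return NULL;
    lp->nextline = lp->prevline = NULL;
    lp->size = (short)length;
    lp->used = 0;
    return lp;
}
```

If the compiler pads the end of the struct, sizeof(LINE)-1+length
allocates a few bytes more than strictly needed, which is harmless.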

	-Colin Kelley  ..{cbmvax,pyrnj,bpa}!vu-vlsi!colin

gwyn@brl-smoke.UUCP (02/13/87)

In article <795@cullvax.UUCP> drw@cullvax.UUCP (Dale Worley) writes:
-braner@batcomputer.tn.cornell.edu (braner) writes:
-> typedef struct LINE {
->...
-> 	char        text[];		/* !!!!!!!!! */
-> }	LINE;
-Replace "char text[]" with "char text[0]".

Since X3J11 hasn't agreed to permit 0-sized objects anywhere,
you might be better off using "char text[1]".  Turn off any
range-checking your C system might have.

mikes@apple.UUCP (02/18/87)

We got bit by this one, where the user does
struct Line {
	/* some overhead fields, like links, etc. */
	char data[1]; /* actually, can be a lot more than one */
};
	The idea was that the data part is at the end of struct Line,
and can be VERY long.
	We were using a Green Hills C compiler, which had this nice
feature of using 'short math' for certain array index calculations.
	Of course, when the 'data' got to be VERY long, short math
won't do, and this caused some hard-to-find problems.
	Personally, I would incur the overhead of having an extra
pointer in the structure, but if you really want to allocate the
data *as part of struct Line*, then I am left with the feeling
that the proper way to do this is:
struct Line {
	/* overhead fields */
	char data[MAX_IT_CAN_EVER_BE];
};
	and allocate it via
	malloc(sizeof(struct Line) - MAX_IT_CAN_EVER_BE + sizeofyourdata)

	This isn't super clean, but I expect that for a language without
dynamic arrays.
-- 
			Michael Shannon {apple!mikes}