[comp.std.c] "expandable" structs with last element declared using [1]

std-unix@longway.TIC.COM (Moderator, John S. Quarterman) (12/16/89)

From: Mark Brader <uunet!sq.sq.com!msb>

Well, I've just seen the same topic being discussed independently in three
different newsgroups, with three different Subject lines (four, now...).
I've cross-posted this article to all three groups, and directed followups
to comp.std.c; I suggest that further followups on the topic be made from
this article (to keep the same Subject line), and in that group unless
they refer specifically to existing C implementations or to POSIX.

The issue is the legality of:

    struct foo_struct {
	int bar;
	char baz[1];
    } *foo;

    foo = (struct foo_struct *) malloc(sizeof(struct foo_struct)+1);
    foo->baz[1] = 1;  /* error? */

[Note that it is not disputed that, if this IS done, an assignment of
*foo to another struct foo_struct won't copy the entire contents of the
"extended" baz member; for this reason if no other, the construct may
be undesirable.]
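To make the copying caveat concrete, here is a minimal sketch, assuming the caller remembers how many extra bytes were allocated. The helpers foo_alloc and foo_dup and their extra parameter are hypothetical names, not anything from the thread. Because a plain struct assignment copies only sizeof(struct foo_struct) bytes, the duplicate is made with memcpy over the whole allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct foo_struct {
    int bar;
    char baz[1];
};

/* Hypothetical helper: room for the declared struct plus `extra`
   trailing bytes for the "extended" baz. */
static struct foo_struct *foo_alloc(size_t extra)
{
    return malloc(sizeof(struct foo_struct) + extra);
}

/* Hypothetical helper: duplicate the whole allocation.  A plain
   `*dst = *src` would copy only sizeof(struct foo_struct) bytes and
   silently drop the tail, so memcpy the full size instead.  The
   caller must pass the same `extra` used when src was allocated. */
static struct foo_struct *foo_dup(const struct foo_struct *src, size_t extra)
{
    size_t total = sizeof(struct foo_struct) + extra;
    struct foo_struct *dst;

    assert(src != NULL);
    dst = malloc(total);
    if (dst != NULL)
        memcpy(dst, src, total);
    return dst;
}
```

Nothing in the struct records how long baz really is, so carrying `extra` alongside the pointer is part of the price of the construct.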

Both Doug Gwyn and Dennis Ritchie have recently stated without proof,
unless I misunderstood them, that this is not safe.  I believe Doug has
stated that there are implementations where it doesn't work, but hasn't
named any.  Can someone do so (in comp.lang.c)?

A second issue is whether the usage is in conformance with the proposed
ANSI Standard (pANS) for C.  I claim that it is.

The article from which the above code was taken continues:

> Note that it is provable that the char pointer (foo->baz + 1) points
> within the object returned by malloc.

(The + here is of course the one derived from replacing x[y] with *(x+(y)).)
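The provability claim reduces to arithmetic on offsetof and sizeof: baz lies wholly inside the struct, so offsetof(baz) + 1 <= sizeof(struct foo_struct), which is strictly less than the sizeof(struct foo_struct)+1 bytes requested. A minimal sketch of that check, with illustrative function names. Note that it speaks only to the allocation, not to the sub-object baz, which is the point the reply quoted next disputes.

```c
#include <assert.h>
#include <stddef.h>

struct foo_struct {
    int bar;
    char baz[1];
};

/* Byte offset, from the start of the malloc'd block, of foo->baz + 1. */
static size_t baz_plus_one_offset(void)
{
    return offsetof(struct foo_struct, baz) + 1;
}

/* Nonzero if that byte lies inside a block of sizeof(struct foo_struct)+1
   bytes.  baz occupies one byte wholly inside the struct, so
   offsetof(baz) + 1 <= sizeof(struct foo_struct) < block size. */
static int baz_plus_one_in_block(void)
{
    size_t block_size = sizeof(struct foo_struct) + 1;
    return baz_plus_one_offset() < block_size;
}
```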

To this another poster replied (in an article that was for some reason
posted with Distribution usa, but which made it here anyway):

| Unfortunately, it is not provable that the char pointer(foo->baz + 1)
| points within the sub-object baz.  Hence, the behavior is undefined
| (X3J11/88-158, 3.3.6, page 48, lines 24-27). 

But this, I say, is irrelevant.  I'll quote the actual words:

# Unless both the pointer operand and the result point to elements of
# the same array, or the pointer operand points one past the last element
# of an array object and the result points to an element of the same array
# object, the behavior is undefined if the result is used as an operand
# of the unary * operator.

There is NO REQUIREMENT here that the "array" spoken of, and the array
whose name was mentioned in the pointer operand, be the same.  In this
case the pointer operand (char pointer value foo->baz), and the result
(foo->baz + 1), both point into the space returned by malloc() which, it
is guaranteed, may be treated as an array of sizeof(struct foo_struct)+1
chars.  So they do point into the same array.

Section 4.10.3, page 155, lines 13-15 (gee, this sounds familiar):

# The pointer returned ... may be assigned to a pointer to any type of
# object and then used to access such an object or an array of such
# objects in the space allocated ...

Well, to be fair, we didn't literally do that.  To do it literally,
we would have had to do:

    char *fooc = (char *) malloc(sizeof(struct foo_struct)+1);
    fooc += offsetof (struct foo_struct, baz); /* sets fooc to foo->baz */
    fooc[1] = 1;			     /* error? */

Is anyone claiming that fooc in the last line of this code could have
a different value from foo->baz in the original?  If not, can anyone
cite another reason why this code is not conforming?  Offsetof is a macro
defined in section 4.1.5, page 99, lines 24-30, of which the key part is:

# offsetof(type, memberdesignator) ... expands to an integral constant
# expression ... the value of which is the offset in bytes, to the structure
# member ..., from the beginning of the structure ...
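Mark's question about fooc can be checked mechanically. The sketch below computes both pointers for one allocation and compares them; routes_agree is a hypothetical name. A result of 1 on a given compiler only shows the two addresses coincide there, not that the pANS blesses the second access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct foo_struct {
    int bar;
    char baz[1];
};

/* Returns 1 if foo->baz and (char *)block + offsetof(baz) are the same
   address, 0 if not, and -1 if malloc fails. */
static int routes_agree(void)
{
    void *block = malloc(sizeof(struct foo_struct) + 1);
    struct foo_struct *foo = block;
    char *fooc;
    int same;

    if (block == NULL)
        return -1;                      /* allocation failed: no answer */
    fooc = (char *)block + offsetof(struct foo_struct, baz);
    same = (fooc == foo->baz);
    free(block);
    return same;
}
```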


-- 
Mark Brader	"Either the universe works in a predictable, analyzable way
Toronto		 or it works spasmodically, with miracles, action at a distance
utzoo!sq!msb	 and wishful thinking as the three fundamental forces.  People
msb@sq.com	 tend to take one view or the other."	-- Frank D. Kirschner

This article is in the public domain.

Volume-Number: Volume 17, Number 104

scott@bbxsda.UUCP (Scott Amspoker) (12/19/89)

In article <477@longway.TIC.COM> uunet!sq!msb (Mark Brader) writes:
>The issue is the legality of:
>
>    struct foo_struct {
>	int bar;
>	char baz[1];
>    } *foo;
>
>    foo = (struct foo_struct *) malloc(sizeof(struct foo_struct)+1);
>    foo->baz[1] = 1;  /* error? */
>
>[Note that it is not disputed that, if this IS done, an assignment of
>*foo to another struct foo_struct won't copy the entire contents of the
>"extended" baz member; for this reason if no other, the construct may
>be undesirable.]
>
>Both Doug Gwyn and Dennis Ritchie have recently stated without proof,
>unless I misunderstood them, that this is not safe.  I believe Doug has
>stated that there are implementations where it doesn't work, but hasn't
>named any.  Can someone do so (in comp.lang.c)?

Don't expect anyone on comp.lang.c to provide proofs (which is why
I unsubscribed from that group some time ago).  If they had their way
"a=b" would not be portable.  If worst comes to worst, your example may
end up allocating a little more memory than necessary but I see no way you
would get screwed unless sizeof(char) is more than 1.  Perhaps you should say:

       malloc(sizeof(struct foo_struct)+sizeof(char))

Maybe someone thinks that structure fields need not be allocated in
any particular order.  I wouldn't be surprised if it didn't work on
some system somewhere, but such a system is probably at fault.

-- 
Scott Amspoker
Basis International, Albuquerque, NM
(505) 345-5232
unmvax.cs.unm.edu!bbx!bbxsda!scott

karl@haddock.ima.isc.com (Karl Heuer) (12/19/89)

In article <477@longway.TIC.COM> uunet!sq!msb (Mark Brader) writes:
>A second issue is whether the usage is in conformance with the proposed
>ANSI Standard (pANS) for C.  I claim that it is.

My earlier posting on this topic attempted to prove it rigorously; Doug
acknowledged that I'd proved the legality of strcpy(foo->baz, "x") but
questioned whether explicitly referencing foo->baz[1] is legal.

I claim that there is no difference: if it's illegal to reference foo->baz[1]
directly, for whatever reason, then it cannot become legal simply by using an
auxiliary variable to hide the reference.  A tight-sphinctered implementation
could try to, and should be able to, enforce the bounds-checking at all levels
with run-time checks.  Thus if foo->baz[1]='\0' is illegal, then so is
	char *temp = foo->baz;
	temp[1] = '\0';
and so is
	void hideaway(char *p) { p[1] = '\0'; }
	... hideaway(foo->baz);
and so is
	strcpy(foo->baz, "x");
(all of which are just variations on a theme).

And thus contrapositively, the legality of the strcpy() implies the legality
of the direct reference.
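Karl's point is that each spelling stores into the same byte. A hedged sketch, with an illustrative enum variant and store_baz1 helper (neither is from the post): it presets that byte to a sentinel, performs one of the four stores, and reads the byte back through an unsigned char view. Agreement on one implementation shows equivalence in practice only; it cannot settle which of them the pANS permits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct foo_struct {
    int bar;
    char baz[1];
};

/* Karl's hideaway(): the store happens behind a function call. */
static void hideaway(char *p) { p[1] = '\0'; }

/* Illustrative labels for the four spellings of the same store. */
enum variant { DIRECT, TEMP, HIDEAWAY, STRCPY };

/* Allocates sizeof(struct foo_struct)+1 bytes, presets the byte at
   offsetof(baz) + 1 to 'y', performs the chosen store, and returns the
   byte read back through an unsigned char view (-1 if malloc fails). */
static int store_baz1(enum variant v)
{
    struct foo_struct *foo = malloc(sizeof(struct foo_struct) + 1);
    size_t at = offsetof(struct foo_struct, baz) + 1;
    unsigned char *raw;
    char *temp;
    int result;

    if (foo == NULL)
        return -1;
    raw = (unsigned char *)foo;
    raw[at] = 'y';
    switch (v) {
    case DIRECT:   foo->baz[1] = '\0';              break;
    case TEMP:     temp = foo->baz; temp[1] = '\0'; break;
    case HIDEAWAY: hideaway(foo->baz);              break;
    case STRCPY:   strcpy(foo->baz, "x");           break; /* 'x', then '\0' at baz[1] */
    }
    result = raw[at];
    free(foo);
    return result;
}
```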

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
(I don't expect this issue to be settled by anything less than an official
Request for Interpretation, but this is my expert opinion.)

seanf@sco.COM (Sean Fagan) (12/19/89)

In article <468@bbxsda.UUCP> scott@bbxsda.UUCP (Scott Amspoker) writes:
>In article <477@longway.TIC.COM> uunet!sq!msb (Mark Brader) writes:
>>I believe Doug has
>>stated that there are implementations where it doesn't work, but hasn't
>>named any.  Can someone do so (in comp.lang.c)?
>
>Maybe someone thinks that structure fields need not be allocated in
>any particular order.  I wouldn't be surprised if it didn't work on
>some system somewhere, but such a system is probably at fault.

*sigh*  Think:  you have a structure that looks like

	struct foo {
		int size;
		char name[1];
	};

Now, if you're doing symbolic debugging, the information gets encoded as
something like:

	structure <name: foo> <elements: 2>
		integer <name: size>
		array <name: name> <size: 1>
			char

or somesuch.  Now, imagine an architecture that does that in hardware (well,
microcode, probably).  Guess what, there *are* machines that do!  Most LISP
machines do something similar, and there are other machines that can do it
(I'm sure a Burroughs machine can, since they can do everything else 8-)).
Turning on bounds-checking in your compiler, if it supported it, would cause
it to fail.

So, nope, the implementation is *not* at fault.  There are some good reasons
for not making it valid (although I believe they can all be gotten around
somehow), and not only having to do with interpreters.
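To illustrate the kind of check Sean describes, here is a software stand-in, assuming a checker that knows only the declared bound of each array member. DECLARED_LEN, baz_declared_len and declared_bounds_ok are hypothetical; the point is that such a checker sees baz as having exactly one element, so index 1 fails no matter how much memory malloc returned.

```c
#include <assert.h>
#include <stddef.h>

struct foo_struct {
    int bar;
    char baz[1];
};

/* Element count the declaration gives an array (not a pointer). */
#define DECLARED_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Declared length of baz; sizeof does not evaluate its operand, so
   the null pointer is never dereferenced. */
static size_t baz_declared_len(void)
{
    return DECLARED_LEN(((struct foo_struct *)0)->baz);
}

/* Hypothetical checker that, like the machines Sean describes, trusts
   only the declared bound and knows nothing of malloc sizes. */
static int declared_bounds_ok(size_t index, size_t declared_len)
{
    return index < declared_len;
}
```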

-- 
Sean Eric Fagan  | "Time has little to do with infinity and jelly donuts."
seanf@sco.COM    |    -- Thomas Magnum (Tom Selleck), _Magnum, P.I._
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

law@qtc.UUCP (Larry Westerman) (12/20/89)

In article <477@longway.TIC.COM> uunet!sq!msb (Mark Brader) writes:
>The issue is the legality of:
>
>    struct foo_struct {
>	int bar;
>	char baz[1];
>    } *foo;
>
>    foo = (struct foo_struct *) malloc(sizeof(struct foo_struct)+1);
>    foo->baz[1] = 1;  /* error? */
>

Although it does not bear on the legality of structure declarations
and usage such as the above example, it is worth
noting that the Stepstone implementation of Objective C uses exactly
this technique for the creation and manipulation of objects with
indexed instance variables.

"A man hears what he wants to hear and disregards the rest"

Larry Westerman  Quantitative Technology Corporation  Beaverton OR 503-626-3081
   ...verdix! or ...sequent! qtc!law

mmengel@cuuxb.ATT.COM (Marc W. Mengel) (12/21/89)

In article <468@bbxsda.UUCP> scott@bbxsda.UUCP (Scott Amspoker) writes:
>In article <477@longway.TIC.COM> uunet!sq!msb (Mark Brader) writes:
>>The issue is the legality of:
>>
>>    struct foo_struct {
>>	int bar;
>>	char baz[1];
>>    } *foo;
>>
>>    foo = (struct foo_struct *) malloc(sizeof(struct foo_struct)+1);
>>    foo->baz[1] = 1;  /* error? */

Gee guys, you *told* the compiler that foo->baz only has one
character in it; of *course* it's wrong to reference 
foo->baz[1].  Yes, you can probably make strcpy scribble
where foo->baz[1] would have been if baz had been declared
baz[2] on most machines, but why live dangerously? Sooner
or later you will find a machine with really weird base/offset
limitations, and a compiler will try to generate

	mov.b $1,8(%a2)

when the offset "n" on a "n(%am)" can only be 0..7, and you'll
be sorry.  The compiler will have been correct in its code
generation because it "knew" that the subscript had to be
zero.
-- 
 Marc Mengel					mmengel@cuuxb.att.com
 						attmail!mmengel
 						...!{lll-crg|att}!cuuxb!mmengel

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (12/21/89)

In article <4379@cuuxb.ATT.COM> mmengel@cuuxb.UUCP (Marc W. Mengel) writes:

| Gee guys, you *told* the compiler that foo->baz only has one
| character in it; of *course* it's wrong to reference 
| foo->baz[1].  

  Good point. Since this is a fairly common practice in C, I think there
would be room in a future version of the standard for a solution. I
*suggest* that it might be to allow the zero-length declaration
(int x[0]) as an explicit way of specifying just this type of struct
growth. It would, of course, disable subscript checking on that
particular array.

  I can't think of any use for it other than as the last element of a
struct, but there might be such. Any comments on the implications of
allowing this? I don't see a conflict with the existing standard, and
the practice of expanding a struct is certainly common, if not perfectly
portable under the ANSI rules.
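For reference, C99 later standardized a related mechanism, the flexible array member, written with empty brackets (char baz[];) rather than the [0] suggested here; GCC accepts [0] as an extension. A minimal sketch assuming a C99 compiler; struct foo_flex and foo_flex_new are illustrative names. As with the [1] version, plain struct assignment still copies only the fixed part.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* C99 flexible array member: baz has incomplete type and adds nothing
   to sizeof(struct foo_flex) beyond any padding. */
struct foo_flex {
    int bar;
    char baz[];
};

/* Illustrative constructor: room for the string plus its '\0'. */
static struct foo_flex *foo_flex_new(int bar, const char *s)
{
    size_t n = strlen(s) + 1;
    struct foo_flex *p = malloc(sizeof *p + n);

    if (p == NULL)
        return NULL;
    p->bar = bar;
    memcpy(p->baz, s, n);
    return p;
}
```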
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
"The world is filled with fools. They blindly follow their so-called
'reason' in the face of the church and common sense. Any fool can see
that the world is flat!" - anon

karl@haddock.ima.isc.com (Karl Heuer) (12/22/89)

In article <468@bbxsda.UUCP> scott@bbxsda.UUCP (Scott Amspoker) writes:
>...I see no way you would get screwed unless sizeof(char) is more than 1...
>Maybe someone thinks that structure fields need not be allocated in any
>particular order.

In ANSI C, sizeof(char) must be 1 and structure members are allocated in the
order that they're declared.  But even among people who realize this, there is
disagreement as to the legality of the construct in question.
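Both guarantees can be written as checks that any conforming implementation should pass; a sketch with illustrative function names. As Karl says, passing them does not decide the question.

```c
#include <assert.h>
#include <stddef.h>

struct foo_struct {
    int bar;
    char baz[1];
};

/* sizeof(char) is 1 by definition, so "+1" and "+sizeof(char)" in the
   malloc call request the same number of bytes. */
static int char_is_one_byte(void)
{
    return sizeof(char) == 1;
}

/* Members are allocated in declaration order, with no padding before
   the first member. */
static int members_in_declared_order(void)
{
    return offsetof(struct foo_struct, bar) == 0
        && offsetof(struct foo_struct, bar) < offsetof(struct foo_struct, baz);
}
```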

In article <4171@scolex.sco.COM> seanf@sco.COM (Sean Fagan) writes:
>[Symbolic debugging information...]  Now, imagine an architecture that does
>that in hardware (well, microcode, probably).  Guess what, there *are*
>machines that do!  ...  Turning on bounds-checking in your compiler, if it
>supported it, would cause it to fail.

This doesn't prove that the construct is illegal.  An equally valid
interpretation is that implementations that like to do bounds-checking must
settle for a weaker check in these cases, if they claim to be ANSI-conforming.

In article <4379@cuuxb.ATT.COM> mmengel@cuuxb.UUCP (Marc W. Mengel) writes:
>Gee guys, you *told* the compiler that foo->baz only has one character in it;
>of *course* it's wrong to reference foo->baz[1].

Intuition is not necessarily a good guide to understanding the pANS.

>Sooner or later you will find a machine with really weird base/offset
>limitations, and a compiler will try to generate
>	mov.b $1,8(%a2)
>when the offset "n" on a "n(%am)" can only be 0..7, and you'll be sorry.  The
>compiler will have been correct in its code generation because it "knew"
>that the subscript had to be zero.

That's a valid argument for why it *should* be illegal, but again, it doesn't
make it so.  If the construct *is* legal, then a compiler such as you describe
is not ANSI-conforming, and hence is of little interest in this discussion.

(Note: in this article I have responded to the arguments of the three previous
posters, but have not supplied any evidence that would actually help answer
the question.  I'll do that in a separate posting.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

karl@haddock.ima.isc.com (Karl Heuer) (12/22/89)

Here's a challenge to those who believe that expandable structs are not legal
in a strictly conforming program.  I have enclosed six code fragments.  If you
agree that [0] is clearly legal (it doesn't even use the struct-pointer
|foo|), but maintain that [5] is not, then there must be some value n for
which you believe that fragment [n-1] is legal but fragment [n] is not.  I
would be interested in hearing where you would draw the line, and your reasons
for believing that the legality changes at that point.

Assume global declarations
	typedef struct foo_struct { int bar; char baz[1]; } T;
	T *foo;  void *vp;  char *cp;

[0]	vp = malloc(sizeof(T)+1);
	foo = (T *)vp;
	cp = (char *)vp;
	cp[offsetof(T, baz[0]) + 1] = '\0';

[1]	foo = (T *)malloc(sizeof(T)+1);
	cp = (char *)foo;
	cp[offsetof(T, baz[0]) + 1] = '\0';

[2]	foo = (T *)malloc(sizeof(T)+1);
	cp = (char *)foo + offsetof(T, baz[0]) + 1;
	*cp = '\0';

[3]	foo = (T *)malloc(sizeof(T)+1);
	cp = (char *)(&foo->baz[0]) + 1;
	*cp = '\0';

[4]	foo = (T *)malloc(sizeof(T)+1);
	cp = &foo->baz[1];
	*cp = '\0';

[5]	foo = (T *)malloc(sizeof(T)+1);
	foo->baz[1] = '\0';

(Please don't bother to reply if you haven't read at least one draft of the
Standard.  The question is not whether it's useful, nor whether current
compilers do or do not accept it, but whether the pANS permits it.)
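For readers who want to run the six fragments, a hypothetical harness follows: run_fragment(n) performs fragment [n]'s store against a fresh allocation, with the target byte preset to 'y', and returns what ends up there. Per Karl's own caveat, every fragment behaving the same on a given compiler says nothing about whether the pANS permits [4] and [5].

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct foo_struct { int bar; char baz[1]; } T;

/* Runs the store of fragment [n] (0..5) against a fresh sizeof(T)+1
   allocation and returns the byte at offsetof(T, baz[0]) + 1, or -1
   for a bad n or a malloc failure.  The byte is preset to 'y' so a
   missed store is visible. */
static int run_fragment(int n)
{
    void *vp = malloc(sizeof(T) + 1);
    T *foo = vp;
    char *cp;
    size_t at = offsetof(T, baz[0]) + 1;
    int result;

    if (vp == NULL)
        return -1;
    ((unsigned char *)vp)[at] = 'y';
    switch (n) {
    case 0: cp = (char *)vp;  cp[offsetof(T, baz[0]) + 1] = '\0';  break;
    case 1: cp = (char *)foo; cp[offsetof(T, baz[0]) + 1] = '\0';  break;
    case 2: cp = (char *)foo + offsetof(T, baz[0]) + 1; *cp = '\0'; break;
    case 3: cp = (char *)(&foo->baz[0]) + 1; *cp = '\0';            break;
    case 4: cp = &foo->baz[1]; *cp = '\0';                          break;
    case 5: foo->baz[1] = '\0';                                      break;
    default: free(vp); return -1;
    }
    result = ((unsigned char *)vp)[at];
    free(vp);
    return result;
}
```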

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

dkeisen@Gang-of-Four.Stanford.EDU (Dave Eisen) (12/29/89)

In article <15509@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
Six pieces of code, ranging from the obviously legal to the questionable.

I can't say whether or not it is legal, but if it isn't, the problem is the
jump from [3] to [4].

All [0] to [1] relies on is the alignment of storage returned by malloc.
All [1] to [2] does is add to cp an integer that keeps it within the space
allocated by the malloc.
[2] to [3] is OK because the address of foo->baz[0] is (by definition)
offsetof(T, baz[0]) bytes away from foo.
But [3] to [4] takes the address of a "nonexistent" element, foo->baz[1].
If this is legal, then that immediately answers the question as to whether
or not structs are "extendible".
And [4] to [5] is just the definition of the address-of (&) and
indirection (*) operators.





--
Dave Eisen                      	    Home:(415) 324-9366 / (415) 323-9757
814 University Avenue                       Office: (415) 967-5644
Palo Alto, CA 94301 		            dkeisen@Gang-of-Four.Stanford.EDU