[comp.std.c] variable-length struct hack

karl@haddock.ima.isc.com (Karl Heuer) (12/07/89)

[Snarfed from comp.std.unix, thread "Query about <dirent.h>"]
In article <450@longway.TIC.COM> Doug Gwyn <uunet!brl.mil!gwyn> writes:
>In article <448@longway.TIC.COM> dmr@research.att.com (Dennis Ritchie) writes:
>>I wish Gwyn et al. had sounded a bit more embarrassed about using
>>`char d_name[1]' in struct dirent.
>
>Here is the line in question taken directly from my PD dirent implementation:
>	char		d_name[1];	/* name of file */	/* non-ANSI */
>You will note that I'm well aware that a trick is being used here.  ...
>Certainly it is unportable usage, i.e. not guaranteed to work by the C
>language specification.

I question this.  It seems to me that
	typedef struct { junk_t xx; char name[1]; } T;
	T *p = (T *)malloc(sizeof(T) + strlen(s));
	strcpy(p->name, s);
is legal and portable, and I believe I can rigorously prove it from the rule
"objects are composed of bytes":

0.  The result of malloc() is an aligned chunk of sizeof(T)+strlen(s) bytes

1.  Casting such into a (T *), and then back into a (void *), yields the same
    result (else free() wouldn't work; see recent commentary in this group!)

2.  (void *) has the same format as (char *)

3.  Hence, (char *)p points into an array [sizeof(T)+strlen(s)] of char

4.  (char *)(p->name) = (char *)p + offsetof(T, name)

5.  Hence, (char *)(p->name) points into an array
    [sizeof(T)+strlen(s)-offsetof(T, name)] of char

6.  The cast is a no-op, since p->name in an rvalue context will already be of
    type (char *)

7.  sizeof(T) >= offsetof(T, name) + 1 /* 1 = sizeof(p->name) */

8.  Hence, p->name points into an array of at least strlen(s)+1 chars

9.  Hence, strcpy(p->name, s) is legal.
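A minimal C89-compatible sketch of the allocation in steps 0 to 9, added for illustration. The stand-in `junk_t` (here a `long`) and the helper name `make_T` are assumptions, not part of Karl's posting; the `assert` spells out step 7, which is why `strlen(s)` extra bytes suffice for the terminating '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef long junk_t;   /* assumed stand-in for the unspecified leading member */

typedef struct {
    junk_t xx;
    char name[1];      /* the "struct hack": real length chosen at malloc time */
} T;

/* Allocate a T whose name holds a copy of s (hypothetical helper). */
T *make_T(const char *s)
{
    T *p;

    /* Step 7: name[1] already supplies one byte inside sizeof(T),
       which covers the '\0', so only strlen(s) extra bytes are needed. */
    assert(sizeof(T) >= offsetof(T, name) + 1);
    p = (T *)malloc(sizeof(T) + strlen(s));   /* step 0 */
    if (p == NULL)
        return NULL;
    strcpy(p->name, s);   /* step 9: writes strlen(s)+1 bytes into the block */
    return p;
}
```

The caller releases the whole entry with a single `free(p)`, since name and header share one allocation.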

So it would appear that Doug's implementation of <dirent.h> will work on any
ANSI C implementation.  Have I overlooked anything?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
________
Btw, I agree with Dennis that one should be embarrassed about using this
hack.  I can also prove that one valid way to initialize to a null pointer is
	char *p = '\0';
, but that doesn't make it a good idea.

gwyn@smoke.BRL.MIL (Doug Gwyn) (12/08/89)

In article <15364@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
>9.  Hence, strcpy(p->name, s) is legal.

I basically agree with the argument to this point, which is as far as you
took it.  However, the next step might be to access p->name[n], where n > 0,
and that is technically illegal since the `name' member is an array of
length 1, so [0] is the last legally accessible element in it.  A
conforming implementation could, so far as I can tell, enforce bounds
checking on such an array access.  I think Dennis believes that too.
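One way a careful user might read characters past name[0] is sketched below: derive the pointer from the start of the whole malloc'd object, following Karl's steps 3 to 5, instead of subscripting the declared one-element array. This is an illustrative assumption, not something either poster proposed, and whether it fully escapes the bounds-checking objection is itself arguable. The helpers `new_T` and `name_char` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    long xx;           /* assumed stand-in for the leading junk_t member */
    char name[1];
} T;

/* Allocate a T holding a copy of s, as in Karl's example (hypothetical helper). */
T *new_T(const char *s)
{
    T *p = (T *)malloc(sizeof(T) + strlen(s));
    if (p != NULL)
        strcpy(p->name, s);
    return p;
}

/* Return character n of p's name without indexing p->name itself:
   start from the whole allocation viewed as char (step 3), then add
   offsetof(T, name) (step 4). Caller must keep n <= strlen(name). */
char name_char(const T *p, size_t n)
{
    const char *base;

    assert(p != NULL);
    base = (const char *)p + offsetof(T, name);
    return base[n];
}
```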

>So it would appear that Doug's implementation of <dirent.h> will work on any
>ANSI C implementation.  Have I overlooked anything?

I think my implementation functions properly internally, for the reasons
you spelled out in deriving conclusion 9 above, but the real question is
whether the user of the dirent facilities can then deal with the struct
dirent to which readdir() returns a pointer in a useful way.  I believe
Dennis is correct in having misgivings about this trickery, not because
the implementation will malfunction, but because in some environments the
struct dirent user might get screwed by this kludge.  (I believe that so
long as he merely uses strcpy() etc. to handle the d_name field, he'll be
okay; it's direct access of the d_name[] elements that might go wrong.)

I don't think IEEE Std 1003.1 guarantees that all the useful information
for a directory entry is contained entirely within the declared struct
dirent type, but it also doesn't warn the user about this issue, so it
is quite likely that somebody will attempt to copy the information using
normal C operations such as
	struct dirent mycopy = *p;
then use the copy, with unpredictable results.  Absent a clear statement
of intent, it is impossible to determine whether this was supposed to
work or not.  From my involvement in thrashing out the dirent specs, my
clear recollection is that the d_name[1] trick was specifically catered
for, meaning that mycopy=*p is not meant to be guaranteed to work, but I
admit that this may well come as a surprise to the reader of Std 1003.1
as it now stands.
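The hazard with `mycopy = *p` is that structure assignment copies only sizeof(struct dirent) bytes, so any name bytes living past the declared d_name[1] are left behind. The sketch below uses a hypothetical `struct my_dirent` (not the real `<dirent.h>` type) and a hypothetical `dup_dirent` helper that copies the full entry; it assumes the source entry was allocated with at least sizeof(struct my_dirent) bytes, as in Karl's allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a struct dirent using the d_name[1] trick. */
struct my_dirent {
    long d_ino;
    char d_name[1];
};

/* Duplicate an entry including every byte of its name. A plain
   `struct my_dirent mycopy = *p;` would copy only the declared size. */
struct my_dirent *dup_dirent(const struct my_dirent *p)
{
    size_t size;
    struct my_dirent *q;

    assert(p != NULL);
    size = offsetof(struct my_dirent, d_name) + strlen(p->d_name) + 1;
    if (size < sizeof(struct my_dirent))
        size = sizeof(struct my_dirent);   /* never less than the declared type */
    q = (struct my_dirent *)malloc(size);
    if (q != NULL)
        memcpy(q, p, size);   /* header plus the whole name and its '\0' */
    return q;
}
```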

I think that this whole topic is awfully messy.  I am not "embarrassed"
to have used this trick, for reasons I explained in a previous posting.
However, I do plan on making reliance on the trick an option, rather
than forced, in a future version of my PD dirent implementation.  That
way the installer can, in an environment where array bounds checking is
enforced, decide whether or not to trade off space for user convenience.
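A build-time switch of the kind described might look like the sketch below. The macro names `SKETCH_USE_STRUCT_HACK` and `SKETCH_NAME_MAX` and the helper `entry_size` are hypothetical, not taken from Doug's implementation; the point is only the trade-off between a one-byte d_name that relies on the trick and a full-size d_name that makes `mycopy = *p` and d_name[n] safe at the cost of space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NAME_MAX 255   /* hypothetical maximum name length */

struct my_dirent {
    long d_ino;
#ifdef SKETCH_USE_STRUCT_HACK
    char d_name[1];                     /* name bytes run past the declared end */
#else
    char d_name[SKETCH_NAME_MAX + 1];   /* whole name inside the type: larger,
                                           but assignment and indexing work */
#endif
};

/* Bytes an entry for `name` needs under whichever layout was compiled in. */
size_t entry_size(const char *name)
{
    size_t need = offsetof(struct my_dirent, d_name) + strlen(name) + 1;

    assert(strlen(name) <= SKETCH_NAME_MAX);
    return need < sizeof(struct my_dirent) ? sizeof(struct my_dirent) : need;
}
```

With the hack enabled, entry sizes track the name length; without it, every entry costs the full declared size.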

meulenbr@cstw68.prl.philips.nl (Frans Meulenbroeks) (12/08/89)

In article <15364@haddock.ima.isc.com> karl@haddock.ima.isc.com (Karl Heuer) writes:
[about dirent.h & the form of a dirent struct]
>I question this.  It seems to me that
>	typedef struct { junk_t xx; char name[1]; } T;
>	T *p = (T *)malloc(sizeof(T) + strlen(s));
>	strcpy(p->name, s);
>is legal and portable, and I believe I can rigorously prove it from the rule
>"objects are composed of bytes":

[proof deleted; see original article]

Karl, I like your proof. However, a struct definition like that
does not allow indexing into the name array under implementations which
check for out-of-bounds array accesses. (By the way, does ANSI allow
index-out-of-bounds checks? Are they forbidden? Is it left to the
implementor? I could not find anything in the draft.)

E.g.: if s = "Hello World", I'd like p->name[3] to result in an 'l', and
not in a bounds-check error.

As for me, I would prefer char *name instead of char name[1].
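A minimal sketch of the char *name layout, assuming the name is placed in the same allocation just past the struct so that one free() releases both. The type `U`, the stand-in `junk_t`, and the helper `make_U` are hypothetical. Here p->name[3] indexes a genuine char region, so it yields 'l' for "Hello World" without any bounds question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long junk_t;   /* assumed stand-in for the unspecified leading member */

/* Pointer member instead of char name[1]. */
typedef struct {
    junk_t xx;
    char *name;
} U;

/* Allocate the struct and its name in one block (hypothetical helper);
   name points at the first byte after the struct. */
U *make_U(const char *s)
{
    U *p;

    assert(s != NULL);
    p = (U *)malloc(sizeof(U) + strlen(s) + 1);
    if (p == NULL)
        return NULL;
    p->name = (char *)(p + 1);
    strcpy(p->name, s);
    return p;
}
```

The cost is one pointer per entry, and `U copy = *p` still yields a shallow copy whose name points into the original block, so the copy dangles once the original is freed.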
Frans Meulenbroeks        (meulenbr@cst.prl.philips.nl)
	Centre for Software Technology
	( or try: ...!mcvax!phigate!prle!cst!meulenbr)