kendall@wjh12.UUCP (Sam Kendall) (07/04/84)
The response to my news item "Variable-length string at end of structure" has been awe-inspiring. Thanks to those who replied: wjh12!bb, hscvax!sasaki, oddjob!sean, wateng!padpowell, rlgvax!guy, hsi!stevens, eagle!msf, browngr!jnp, nbires!rcd, whuxle!mp, fortune!crane, rlgvax!jack, sdccs6!ix269, lzmi!psc, gatech!jeff, petsd!joe, scc!ted, arizona!whm, sdchema!jwp, bbncca!keesan, mordor!jdb.

For those who missed it, here is my original request:

> I am wondering how many programs use the following construct, or
> something similar:
>
>	struct a {
>		...
>		char varlen_string[1];
>	} a_struct;
>	...
>	p = (struct a *) malloc(sizeof (struct a) + strlen(a_string));
>	...	/* fill in structure members */
>	(void) strcpy(p->varlen_string, a_string);
>
> That is, malloc'ing space for a fixed-length structure plus a
> variable-length string, and referencing the string using the last member
> of the structure.
>
> The Rand Editor and its derivatives do this, and Martin Minow's cpp does
> it; has anyone seen other programs that do? I'd be interested to know
> how many programs do this, and exactly what type the last member of the
> structure is (i.e. is it char [1]? char? Something else?) I need to
> know in order to put some kludge in our runtime checker to handle it;
> currently it is flagged as an error.

From Jeff Lee (gatech!jeff) comes, finally, a short name for this construction: "open-ended structure". A good subject line for further news items on this subject.

The most important point in the responses was that open-ended structures are used a good deal. A few felt uneasy about it ("Guilty as charged, your honor." --Brent Byer (wjh12!bb)). But the following programs and libraries use it, according to letters: malloc(3), awk(1), UNSW Prolog Interpreter, Multiplan, BBN's cc(1), APL\11, msg*(2) (message-passing system calls in System V); and plenty of people said they use it in proprietary software, or just use it a lot.
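For anyone who has not met the construct, the fragment in the original request can be filled out into something that compiles. This is only a sketch: the member `count` and the function name `new_a` are invented here for illustration and do not come from any of the programs listed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure; "count" merely stands in for the fixed members. */
struct a {
    int  count;
    char varlen_string[1];   /* open-ended last member */
};

/*
 * Allocate a struct a with room for a copy of a_string after the fixed part.
 * The one element declared in varlen_string[1] pays for the terminating
 * null byte, so adding strlen(a_string) is enough.
 * Returns NULL if malloc fails; the caller frees the result with free().
 */
struct a *new_a(const char *a_string)
{
    struct a *p = (struct a *) malloc(sizeof (struct a) + strlen(a_string));

    if (p == NULL)
        return NULL;
    p->count = 0;                               /* fill in structure members */
    (void) strcpy(p->varlen_string, a_string);  /* string lives in the same block */
    return p;
}
```

One call to malloc and one call to free cover both the structure and its string, which is much of the appeal.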
The most common use is for symbol tables where names can get large. Interestingly, Berkeley's cc with flexnames does not use open-ended structures, since it has a hash table with fixed-size entries.

Another point is that strings of things other than chars can use this mechanism. Several people use it for arrays of substructures.

There were several types mentioned for the open-ended last member. The most common is array of 1 element, for whatever element type is appropriate, usually char. Array of 0 elements, or empty brackets, which is the same thing on most compilers, was also mentioned. The problem with arrays of 0 in structures is that the Portable C Compiler dislikes them to the point of giving a fatal error; thus any use of them is highly nonportable. (An array of 0 elements makes no sense if you consult the reference manual, since when you use an array you get "a pointer to the first object in the array". Empty brackets used with storage definitions are not discussed, and so are also questionable.)

Morris Keesan (bbncca!keesan) objects to the use of "char" type for the last element, and I agree that array[1] is clearer; the only piece of code I have heard of that uses "char" is a version of the Rand Editor.

Incidentally, my runtime checker now understands that array[1] at the end of a structure in dynamic storage means an open-ended structure.

One person claimed,

> This [empty brackets] seems more straightforward to me than "char name[1]",
> since
>	a) it makes it clear that the array is really variable length, and
>	b) you can "malloc(sizeof(SYMBOL) + strlen(s))" instead of having to
>	   remember "malloc(sizeof(SYMBOL) + strlen(s) - 1)" [where SYMBOL
>	   is the structure type].

I agree that empty brackets are clearer, but unfortunately they are not portable. Point (b) isn't right: strlen doesn't count the null byte. With empty brackets the first expression leaves no room for it, and with name[1] the second gives back the one byte that the declared element supplied, so either of those expressions can allocate too little storage.
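The arithmetic behind that objection can be checked directly. The sketch below assumes a made-up SYMBOL type built only from chars, so that the compiler has no reason to pad after the last member; the three function names are hypothetical and exist only to put the expressions side by side.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol-table entry; all-char members, so no trailing padding
 * is expected on ordinary machines. */
typedef struct symbol {
    char kind;
    char name[1];            /* open-ended last member */
} SYMBOL;

/* Bytes actually occupied: everything before name, the characters of s,
 * and the terminating null byte that strlen() does not count. */
size_t bytes_needed(const char *s)
{
    return offsetof(SYMBOL, name) + strlen(s) + 1;
}

/* Right for name[1]: the one declared element pays for the null byte. */
size_t size_with_one(const char *s)
{
    return sizeof (SYMBOL) + strlen(s);
}

/* The "- 1" expression from the quoted claim: it gives back that byte. */
size_t size_minus_one(const char *s)
{
    return sizeof (SYMBOL) + strlen(s) - 1;
}
```

For any string, size_with_one() is at least bytes_needed(), while size_minus_one() is one byte less than size_with_one(). In a structure that does have padding after the last member, the slack can hide the shortfall, so the mistake may go unnoticed until the structure changes.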
Joe Pato (browngr!jnp) defines a constant VARYING (of value 1) to use as the array bound in such cases. This is nice, although a person looking at the code would not know why sizeof (struct_type) + strlen(string) would be the correct amount of storage to allocate, unless he keeps in mind the value of VARYING, which he should not have to. To make the storage allocation as high-level as the declaration, you need another macro:

	#define VARYSIZE(struct_type, nelem, elsize) \
		(sizeof (struct_type) + ((nelem) - 1) * (elsize))

which, for strings, would be called as

	VARYSIZE(struct_type, strlen(string)+1, sizeof (char))

Well, you might want to have another macro for the case of strings, since the general macro is, ah, bulky to use. Perhaps it is easier to forget the whole thing and stay low-level, the way God intended. In any case, the use of this construct is high-level--you can think of the storage as part of the structure, even though it isn't.

Some people objected to open-ended structures, making two points: (1) they are nonportable; and (2) it is just as easy to do it the other way, putting a pointer into the structure and allocating the variable-length data separately. However, they are portable (in the array[1] form, at least), and it really can be harder (and less clear, as Marty Sasaki (hscvax!sasaki) pointed out) to have to allocate a second area of memory and do an additional indirection to access the variable-length element. It is debatable, but I think when someone is reading a large program, it won't take him/her that long to figure out what this construct does, even if it is not commented and he/she hasn't seen it before.

Here, finally, are two interesting comments:

> We've done that frequently here. Unfortunately, you can't declare something
> like
>
>	struct header_plus_stuff {
>		...declarations...
>		int length;
>		char stuff[length];
>	}
>
> and have "sizeof()", etc. work. It would be a major change, and twice as
> major if the variable-length stuff weren't at the end of the structure.
> PL/I does it, but PL/I is a bigger language. So we're stuck with that trick.
>
>	Guy Harris
>	{seismo,ihnp4,allegra}!rlgvax!guy

> I did something like this for a DBMS I was working on. What I really
> wanted was this:
>	struct foo {
>		short num_items;
>		short total_length;
>		short offsets[ num_items ];
>		char data[ total_length ];
>	}
> that is, a bunch of (null terminated) strings stored in a data space,
> and a table of offsets into it. What I wrote was:
>	struct foo {
>		short num_items;
>		short total_length;
>		short offsets[ 1 ];
>	}
> and some macros to figure out where the beginning of the data was.
> -Paul S R Chisholm, AT&T-IS, {lznv,lzmi,lzwi}!psc

	Sam Kendall		{allegra,ihnp4,ima,amd}!wjh12!kendall
	Delft Consulting Corp.	decvax!genrad!wjh12!kendall