bothner@sevenlayer.cs.wisc.edu (Per Bothner) (10/26/90)
[If I get some support for this proposal, I hope to present it to the ANSI committee. But it has been suggested that "they will never go for it." It seems that Algol60-style variable-length arrays is too radical an idea...] The problem: C++ provides only limited support for arrays whose size is calculated at run-time, and provides no support for objects that contain variable-size arrays. Here is a reasonable-looking String implementation: struct String { int len; char c[len+1]; ...; }; String *s = new String(len); This is not allowed, so the standard work-around is: struct String { int len; char c[1]; } String *s = (String*)new char[sizeof String + len]; This is not typesafe, and is disparaged in ARM (p.172) (ARM==Ellis&Stroustrup: The Annotated C++ Reference Manual), which suggests this style: struct String { int len; char *c; } char *c = new char[len]; String *s = new String(len, c); But this has a number of disadvantages: * One extra 'new' is needed (and later an extra 'free'). * Extra space needed for an unnecessary pointer, plus the extra overhead imposed by the storage manager for an extra object. * Extra indirection to get pointer. * Cannot describe data structures used in existing applications. * Usual type-safety problems because pointer-to-object and pointer-to-array cannot be distinguished. So I propose to explictly allow variable-length arrays and objects. This allows the efficiency advantages of embedded arrays, but in a type-safe manner. SUMMARY: Allow: declarator [ expression_opt ] as a valid 'declarator' (ARM, p 130, 136.), with certain restrictions to be discussed below. (Currently, the 'expression' must be a 'constant-expression.') If the declarator defines a non-static data member, the expression may involve other data member of the same class that are const and integral. PRIOR ART: The GNU compilers gcc and g++ allow variable-sized auto (stack-based) arrays. Otherwise, I know of no prior C++ art. ARRAYS: The following is a valid 'declarator': declarator [ expression_opt ] The 'expression' must be a 'constant-expression', unless it defines a local variable or a data member in a class. Specifically, it must be constant in a typedef-declaration or if a global variable or a parameter is being declared. (This may be overly restrictive, but I don't see much need for variable-size typedefs, globals, or parameters.) It is legal for the expression to evaluate to zero. For consistency, this should be allowed even if the expression is constant and even if the declarator declares a typedef, global, or a parameter. (The gcc/g++ compilers already allow zero-length arrays.) IMPLEMENTATION OF LOCAL VARIABLE-SIZED ARRAY: The declaration: TYPE VAR[LENGTH]; can be implemented as if it were: const size_t __len = LENGTH; TYPE *VAR = new TYPE[__len]; struct __dummy { ~__dummy() { delete [__len] VAR; }} __dummy_var; (The last declaration is a trick to make sure VAR gets deleted when it exits its scope.) However: sizeof(VAR) is __len*sizeof(TYPE). A more efficient implementation would allocate VAR on the stack instead of the heap, perhaps using alloca() or something similar. (Note: The latter is done in the g++ implementation.) VARIABLE-SIZED CLASS MEMBERS: The size of an array declared as a class member can be an expression containing arbitrary in-scope variables. The scoping is the same as default arguments (ARM 8.2.6), except that the expression may contain local variables and non-static members. (Digression: I don't understand why the latter aren't allowed for default arguments; I think they should be. The bullet on page 142 makes the interpretation clear, and also defines the meaning of formal arguments defined in default expressions.) The basic idea is the same as before: The expression for the size of the variable-sized member must be evaluated, before a variable-sized object is allocated. The problem is that the size expression in general may depend on the object having already been constructed. To break this cicularity, we must impose certain restrictions. The restrictions I propose are: * The only non-static members (of 'this') that may be used in a size expression must be const and have integral type. Using 'this' itself is also illegal. * If an non-static member is used in a size expression, there are restrictions on how that member is initialized: The expression used to initialize it (either in a constructor's mem-initializer (ARM, p. 291) or in an initializer-list (ARM, p. 148)) cannot reference *any* members of the object being constructed. * All constructors for variable-sized objects must be inline. I suggest that variable-sized unions not be allowed. THE SIZE OF A VARIABLE-SIZED OBJECT: When a variable-sized array or object is constructed, the value of the expression defining its size must be saved (in a hidden field in the case of a variable-sized object, or in a hidden local variable in the case of a varaible-sized local array). This is because we don't want to allow the "size" of an object to change after it has been allocated. (This is analoguous to how in 'new T[expr]' the value of 'expr' is saved in a hidden location.) Implementations are urged to optimize the special case of a size-expression which has one of the forms 'a*x+b', 'a*x-b', or 'a*x' (where 'a' and 'b' are constant-espressions and 'x' is a member). In that case no extra hidden variable should be used, but the member 'x' should be used (and the whole size-expression should be re-evaluated when needed). (The rationale for optimizing a case like 'x+b' is for classes like String, given in the introduction, where one wants to allocate space for a terminating NUL byte, but would prefer the "count" field to not include it in the size.) I propose that the sizeof operator may be applied to a variable-sized object, and yield the true "run-time" size of that specific object. In that case, sizeof may have to evaluate its argument, and the expression containing sizeof is *not* a constant-expression. I propose that it be illegal to apply sizeof to a variable-sized class. (Alternatively, one could define it to yield the size of the fixed portion.) CONSTRUCTION OF VARIABLE-SIZED OBJECTS (Implementation): The correct constructor (which must be inline) is determined as usual. All the actual parameters have been evaluated. We then evaluate all non-static members that are used in the size expression. These are evaluated into temporary integer variables, using the expressions in the constructor's appropriate mem-initializer (required, because the members must be const, and so cannot be initialized any other way). We next evaluate the size-expression itself, using these temporary integers. After adding the constant part of the size of the structure, we can allocate the object, either on the stack or using 'operator new'. We now copy the temporary integer values into their final location in the object, and finish evaluating the rest of the constructor. The value of the size-expression is conceptually saved in a hidden variable of the object (but see above). Initialization using an intializer-list (ARM, p. 148) is similar. INHERITANCE: A variable-sized class may be a base class. But in that case it cannot have any variable-sized data members. Also, a derived class can have at most one (non-virtual) variable-sized base class. RESTRICTIONS: There are a number of common-sense restrictions on using variable-sized objects. (Note that there are no such restrictions on using a *pointer* or *reference* to a variable-sized object.) Variable-sized objects are not rvalues. They cannot be assigned to, passed as parameters, and they cannot be returned by functions. The size of a variable-size global object must be a constant-expression. There can be no arrays of variable-sized objects. No address arithmetic is allowed on pointers to variable-sized objects. A structure can contain only one variable-sized member, and it must be the last member. A class can be derived from a variable-sized class, but it can have at most one variable-sized super-class, and in that case can have no variable-sized members. -- --Per Bothner bothner@cs.wisc.edu Computer Sciences Dept, U. of Wisconsin-Madison
leech@degas.cs.unc.edu (Jonathan Leech) (10/28/90)
In article <11578@spool.cs.wisc.edu>, bothner@sevenlayer.cs.wisc.edu (Per Bothner) writes: |> struct String { int len; char *c; } |> char *c = new char[len]; |> String *s = new String(len, c); |>But this has a number of disadvantages: |>* One extra 'new' is needed (and later an extra 'free'). |>* Extra space needed for an unnecessary pointer, plus the extra |>overhead imposed by the storage manager for an extra object. But your proposal needs extra space for the array length, so it's a wash in this respect.
diamond@tkou02.enet.dec.com (diamond@tkovoa) (10/29/90)
In article <11578@spool.cs.wisc.edu> bothner@sevenlayer.cs.wisc.edu (Per Bothner) writes: [perfectly reasonable suggestion for variable-sized array syntax and semantics, with perfectly reasonable justification] >PRIOR ART: >The GNU compilers gcc and g++ allow variable-sized auto (stack-based) >arrays. Otherwise, I know of no prior C++ art. Is the "Not Invented Here" disease really running so rampant in the C++ camp now? Yes, (dare I say "obviously") variable-sized auto arrays are a good idea, as has been proved in dozens of other languages. PL/I had it 26 years ago and there were surely others before that. If "Not Invented Here" were always so rampant, C would not have enumerated types (copied from a language which also sadly lacked variable-sized arrays in its early years), and it wouldn't use "*" as a multiply operator. -- Norman Diamond, Nihon DEC diamond@tkov50.enet.dec.com (tkou02 is scheduled for demolition) We steer like a sports car: I use opinions; the company uses the rack.
bothner@sevenlayer.cs.wisc.edu (Per Bothner) (11/05/90)
In article <17109@thorin.cs.unc.edu>, leech@degas.cs.unc.edu (Jonathan Leech) writes: > But your proposal needs extra space for the array length, >so it's a wash in this respect. The program *must* know the length of the array; otherwise the array is useless. If the array is variable-length, the program usually keeps this length in a variable. In addition, the ARM (p. 64-65) mandates that the array length of a (non-embedded, heap-allocated) array be squirelled away somewhere. So using an embedded array typically saves *two* "words:" one for the no-longer-needed pointer, and one for the no-longer-needed squirelled-away length. There also the extra space overhead inside the memory allocator that may depend on how it is implemented and alignment restrictions. Consider the case of a binary buddy system (rounds up to nearest power of two), and len==6: struct String : public Object { const int len; char c[len]; String(int l) : len(l) { } }; String *s = new String(6); The size of s is 4 (vtable) + 4 (len) + 6 (c), i.e. 16 bytes. The alternative: struct String : public Object { const int len; char *c; String(int l) : len(l) { c = new char[l]; } } String *s = new String(6); The s structure takes 3 words (with vtable), i.e. 16 bytes. The c string takes 6 bytes plus a 4-byte-length, i.e. 16 bytes. Total: 32 bytes, i.e. double. If no hidden count is allocated for s->c (since the compiler knows that char's destructor is empty), it still takes 24 bytes. The example is admittedly atypical, but it illustrates the overheads. (It is possible that the embedded implementation may take more room, if the string is just the "wrong" size such that the fixed fields ush it just beyond a power of two. This is statistically unlikely.) -- --Per Bothner bothner@cs.wisc.edu Computer Sciences Dept, U. of Wisconsin-Madison