[comp.std.c++] A Proposal to Extend C++ with Variable-length Arrays

bothner@sevenlayer.cs.wisc.edu (Per Bothner) (10/26/90)

[If I get some support for this proposal, I hope to present
it to the ANSI committee. But it has been suggested that
"they will never go for it." It seems that Algol60-style
variable-length arrays is too radical an idea...]

The problem: C++ provides only limited support for arrays
whose size is calculated at run-time, and provides no
support for objects that contain variable-size arrays.
Here is a reasonable-looking String implementation:
	struct String { int len; char c[len+1]; ...; };
	String *s = new String(len);
This is not allowed, so the standard work-around is:
	struct String { int len; char c[1]; }
	String *s = (String*)new char[sizeof String + len];
This is not typesafe, and is disparaged in ARM (p.172)
(ARM==Ellis&Stroustrup: The Annotated C++ Reference Manual),
which suggests this style:
	struct String { int len; char *c; }
	char *c = new char[len];
	String *s = new String(len, c);
But this has a number of disadvantages:
* One extra 'new' is needed (and later an extra 'free').
* Extra space needed for an unnecessary pointer, plus the extra
overhead imposed by the storage manager for an extra object.
* Extra indirection to get pointer.
* Cannot describe data structures used in existing applications.
* Usual type-safety problems because pointer-to-object and
pointer-to-array cannot be distinguished.

So I propose to explictly allow variable-length arrays and objects.
This allows the efficiency advantages of embedded arrays, but
in a type-safe manner.

SUMMARY:
Allow:
	declarator [ expression_opt ]
as a valid 'declarator' (ARM, p 130, 136.),
with certain restrictions to be discussed below.
(Currently, the 'expression' must be a 'constant-expression.')
If the declarator defines a non-static data member, the expression
may involve other data member of the same class that are const and integral.

PRIOR ART:
The GNU compilers gcc and g++ allow variable-sized auto (stack-based)
arrays. Otherwise, I know of no prior C++ art.

ARRAYS:
The following is a valid 'declarator':
	declarator [ expression_opt ]
The 'expression' must be a 'constant-expression', unless it defines
a local variable or a data member in a class. Specifically,
it must be constant in a typedef-declaration or if a global variable
or a parameter is being declared. (This may be overly restrictive,
but I don't see much need for variable-size typedefs, globals,
or parameters.)

It is legal for the expression to evaluate to zero.
For consistency, this should be allowed even if the expression
is constant and even if the declarator declares a typedef,
global, or a parameter. (The gcc/g++ compilers already
allow zero-length arrays.)

IMPLEMENTATION OF LOCAL VARIABLE-SIZED ARRAY:
The declaration:
	TYPE VAR[LENGTH];
can be implemented as if it were:
	const size_t __len = LENGTH;
	TYPE *VAR = new TYPE[__len];
	struct __dummy { ~__dummy() { delete [__len] VAR; }} __dummy_var;
(The last declaration is a trick to make sure VAR gets deleted
when it exits its scope.)
However: sizeof(VAR) is __len*sizeof(TYPE).
A more efficient implementation would allocate VAR on the stack
instead of the heap, perhaps using alloca() or something similar.
(Note: The latter is done in the g++ implementation.)

VARIABLE-SIZED CLASS MEMBERS:
The size of an array declared as a class member can be an expression
containing arbitrary in-scope variables. The scoping is the same as
default arguments (ARM 8.2.6), except that the expression may contain
local variables and non-static members. (Digression: I don't understand
why the latter aren't allowed for default arguments; I think they
should be. The bullet on page 142 makes the interpretation clear,
and also defines the meaning of formal arguments defined in
default expressions.)

The basic idea is the same as before:
The expression for the size of the variable-sized member must be evaluated,
before a variable-sized object is allocated. The problem is that the
size expression in general may depend on the object having already
been constructed. To break this cicularity, we must impose
certain restrictions.

The restrictions I propose are:
* The only non-static members (of 'this') that may be used in
a size expression must be const and have integral type.
Using 'this' itself is also illegal.
* If an non-static member is used in a size expression,
there are restrictions on how that member is initialized:
The expression used to initialize it (either in a
constructor's mem-initializer (ARM, p. 291) or in an
initializer-list (ARM, p. 148)) cannot reference *any*
members of the object being constructed.
* All constructors for variable-sized objects must be inline.

I suggest that variable-sized unions not be allowed.

THE SIZE OF A VARIABLE-SIZED OBJECT:

When a variable-sized array or object is constructed, the
value of the expression defining its size must be saved
(in a hidden field in the case of a variable-sized object,
or in a hidden local variable in the case of a varaible-sized
local array). This is because we don't want to allow the "size"
of an object to change after it has been allocated. (This is
analoguous to how in 'new T[expr]' the value of 'expr' is
saved in a hidden location.)

Implementations are urged to optimize the special case
of a size-expression which has one of the forms 'a*x+b', 'a*x-b',
or 'a*x' (where 'a' and 'b' are constant-espressions and 'x'
is a member). In that case no extra hidden variable should be
used, but the member 'x' should be used (and the whole size-expression
should be re-evaluated when needed).

(The rationale for optimizing a case like 'x+b' is for classes like
String, given in the introduction, where one wants to allocate
space for a terminating NUL byte, but would prefer the "count"
field to not include it in the size.)

I propose that the sizeof operator may be applied to a
variable-sized object, and yield the true "run-time" size
of that specific object. In that case, sizeof may have
to evaluate its argument, and the expression containing sizeof
is *not* a constant-expression.

I propose that it be illegal to apply sizeof to a variable-sized
class. (Alternatively, one could define it to yield the size of the
fixed portion.)

CONSTRUCTION OF VARIABLE-SIZED OBJECTS (Implementation):
The correct constructor (which must be inline) is determined as usual.
All the actual parameters have been evaluated.
We then evaluate all non-static members that are used
in the size expression. These are evaluated into temporary
integer variables, using the expressions in the constructor's
appropriate mem-initializer (required, because the members
must be const, and so cannot be initialized any other way).
We next evaluate the size-expression itself, using these
temporary integers. After adding the constant part of the
size of the structure, we can allocate the object,
either on the stack or using 'operator new'. We now copy the
temporary integer values into their final location in the
object, and finish evaluating the rest of the constructor.
The value of the size-expression is conceptually saved
in a hidden variable of the object (but see above).

Initialization using an intializer-list (ARM, p. 148) is similar.

INHERITANCE:
A variable-sized class may be a base class. But in that case
it cannot have any variable-sized data members.
Also, a derived class can have at most one (non-virtual)
variable-sized base class.

RESTRICTIONS:
There are a number of common-sense restrictions on using
variable-sized objects. (Note that there are no such
restrictions on using a *pointer* or *reference* to a
variable-sized object.)

Variable-sized objects are not rvalues. They cannot be assigned to,
passed as parameters, and they cannot be returned by functions.
The size of a variable-size global object must be a constant-expression.
There can be no arrays of variable-sized objects.
No address arithmetic is allowed on pointers to variable-sized objects.
A structure can contain only one variable-sized member, and it
must be the last member. A class can be derived from a variable-sized
class, but it can have at most one variable-sized super-class,
and in that case can have no variable-sized members.

-- 
	--Per Bothner
bothner@cs.wisc.edu Computer Sciences Dept, U. of Wisconsin-Madison

leech@degas.cs.unc.edu (Jonathan Leech) (10/28/90)

In article <11578@spool.cs.wisc.edu>, bothner@sevenlayer.cs.wisc.edu
(Per Bothner) writes:
|>	struct String { int len; char *c; }
|>	char *c = new char[len];
|>	String *s = new String(len, c);
|>But this has a number of disadvantages:
|>* One extra 'new' is needed (and later an extra 'free').
|>* Extra space needed for an unnecessary pointer, plus the extra
|>overhead imposed by the storage manager for an extra object.

	But your proposal needs extra space for the array length,
so it's a wash in this respect.

diamond@tkou02.enet.dec.com (diamond@tkovoa) (10/29/90)

In article <11578@spool.cs.wisc.edu> bothner@sevenlayer.cs.wisc.edu (Per Bothner) writes:

[perfectly reasonable suggestion for variable-sized array syntax and
semantics, with perfectly reasonable justification]

>PRIOR ART:
>The GNU compilers gcc and g++ allow variable-sized auto (stack-based)
>arrays. Otherwise, I know of no prior C++ art.

Is the "Not Invented Here" disease really running so rampant in the C++
camp now?  Yes, (dare I say "obviously") variable-sized auto arrays are
a good idea, as has been proved in dozens of other languages.  PL/I had
it 26 years ago and there were surely others before that.

If "Not Invented Here" were always so rampant, C would not have enumerated
types (copied from a language which also sadly lacked variable-sized arrays
in its early years), and it wouldn't use "*" as a multiply operator.

-- 
Norman Diamond, Nihon DEC    diamond@tkov50.enet.dec.com
                                    (tkou02 is scheduled for demolition)
We steer like a sports car:  I use opinions; the company uses the rack.

bothner@sevenlayer.cs.wisc.edu (Per Bothner) (11/05/90)

In article <17109@thorin.cs.unc.edu>, leech@degas.cs.unc.edu (Jonathan Leech) writes:
>	But your proposal needs extra space for the array length,
>so it's a wash in this respect.  

The program *must* know the length of the array;
otherwise the array is useless. If the array is variable-length,
the program usually keeps this length in a variable.

In addition, the ARM (p. 64-65) mandates that the array length of
a (non-embedded, heap-allocated) array be squirelled away somewhere.

So using an embedded array typically saves *two* "words:"
one for the no-longer-needed pointer, and one for the
no-longer-needed squirelled-away length.

There also the extra space overhead inside the memory
allocator that may depend on how it is implemented and
alignment restrictions.
Consider the case of a binary buddy system (rounds
up to nearest power of two), and len==6:

	struct String : public Object {
	    const int len;
	    char c[len];
	    String(int l) : len(l) { }
	};
	String *s = new String(6);
The size of s is 4 (vtable) + 4 (len) + 6 (c), i.e. 16 bytes.

The alternative:
	struct String : public Object {
	    const int len;
	    char *c;
	    String(int l) : len(l) { c = new char[l]; }
	}
	String *s = new String(6);
The s structure takes 3 words (with vtable), i.e. 16 bytes.
The c string takes 6 bytes plus a 4-byte-length, i.e. 16 bytes.
Total: 32 bytes, i.e. double.

If no hidden count is allocated for s->c (since the compiler
knows that char's destructor is empty), it still takes 24 bytes.

The example is admittedly atypical, but it illustrates the overheads.
(It is possible that the embedded implementation may take more room,
if the string is just the "wrong" size such that the fixed fields
ush it just beyond a power of two. This is statistically unlikely.)
-- 
	--Per Bothner
bothner@cs.wisc.edu Computer Sciences Dept, U. of Wisconsin-Madison