[comp.lang.c++] Variable sized objects

ben@duttnph.tudelft.nl (Ben Verwer) (12/05/89)

How do you implement variable sized objects in 2.0?
Consider the example of a stack from Stroustrup, page 165:

class char_stack {
	int size;
	char *top;
	char s[1];
public:
	char_stack(int sz);
	void push(char c);  // etc...
	char pop();         // etc...
};

char_stack::char_stack(int sz) {
	this = (char_stack*)new char[sizeof(char_stack)+sz-1];
	size = sz;
	top = s;
}

You can allocate a new stack by the statement:
char_stack* mystack = new char_stack(1000);

In 2.0 operator new has to be used instead of assignment to this.
But both operator new and the constructor need to know the size of the object.
Should it be something ugly like the following?  (Probably not: it does not
even compile, though I don't understand why.  The error is "too few
arguments for method `operator new'".)

char_stack* mystack = new(1000) char_stack(1000); // tell the same thing twice

With:
void* char_stack::operator new(size_t SizeWithOneChar, int sz) {
	return (char_stack*) new char[SizeWithOneChar+sz-1];
}

char_stack::char_stack(int sz) { size = sz; top = s; }
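For reference, a minimal sketch of the syntax being asked about, written
against present-day C++ rather than the 2.0 translator.  The extra byte
count is passed as a placement argument, so the new-expression is spelled
new (sz) char_stack(sz).  The two operator delete overloads, the
bool-returning push and pop, and the demo_char_stack helper are assumptions
added for illustration and are not in the original post.  Writing past s[1]
is still non-portable, so this shows the mechanism only.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class char_stack {
    int size;
    char* top;
    char s[1];   // first byte of the trailing storage
public:
    explicit char_stack(int sz) : size(sz), top(s) {}

    // The int argument comes from the new-expression: new (sz) char_stack(sz).
    // objectSize is sizeof(char_stack), supplied by the compiler.
    static void* operator new(std::size_t objectSize, int sz) {
        return ::operator new(objectSize + sz - 1);
    }
    // Matching placement delete; called only if the constructor throws.
    static void operator delete(void* p, int) { ::operator delete(p); }
    // Ordinary delete; called by a plain delete-expression.
    static void operator delete(void* p) { ::operator delete(p); }

    // Returns false instead of overflowing the allocated storage.
    bool push(char c) {
        if (top - s >= size) return false;
        *top++ = c;
        return true;
    }
    // Returns false when the stack is empty.
    bool pop(char& c) {
        if (top == s) return false;
        c = *--top;
        return true;
    }
};

// Pushes 'a', 'b', 'c', pops twice, and returns the second character popped.
char demo_char_stack() {
    char_stack* st = new (1000) char_stack(1000);
    st->push('a');
    st->push('b');
    st->push('c');
    char c = 0;
    st->pop(c);
    st->pop(c);
    delete st;
    return c;
}
```

The size still has to be stated twice, once for the allocation function and
once for the constructor, because the two are called independently.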


A user experiencing an upgrade in a language! Help me can onesome?

-----------------------------------------------------------------------------
Ben Verwer                                                       Lorentzweg 1
Pattern Recognition Group                                      2628 CJ  Delft
Faculty of Applied Physics                                    The Netherlands
Delft University of Technology                                  +31(15)783247
	

shopiro@alice.UUCP (Jonathan Shopiro) (12/05/89)

In article <1020@dutrun.UUCP>, ben@duttnph.tudelft.nl (Ben Verwer) writes:
> How do you implement variable sized objects in 2.0?
> Consider the example of a stack from Stroustrup, page 165:
> 
> class char_stack {
> 	int size;
> 	char *top;
> 	char s[1];
> public:
> 	char_stack(int sz);
> 	void push(char c)  // etc...
> 	char pop()         // etc...
> };
> 
> char_stack::char_stack(int sz) {
> 	this = (char_stack*)new char[sizeof(char_stack)+sz-1];
> 	size = sz;
> 	top = s;
> }
> 
> You can allocate a new stack by the statement:
> char_stack* mystack = new char_stack(1000);
> 
> In 2.0 operator new has to be used instead of assignment to this.

Operator new is supplied to support controlling where memory is
allocated for objects, not how much memory is allocated.  The trick
described in ``the book'' is non-portable, implementation-dependent,
and generally a bad idea.  Objects in C++ are always fixed-size.

You can easily make objects _appear_ to be variable sized by using
indirection, e.g.,

	class Char_stack {
		int	size;
		char*	top;
		char*	base;
	public:
			Char_stack(int);
			~Char_stack();
		// etc
	};
	Char_stack::Char_stack(int sz) : size(sz) {
		base = top = new char[size];
	}
	Char_stack::~Char_stack() {
		delete [size] base;
	}

Similar techniques can be used to make an object into a surrogate
for a linked structure such as a list or graph.
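A fuller sketch of the indirection approach, in present-day C++ syntax
(delete[] without an element count, deleted copy operations).  The push,
pop, and demo_indirect_stack members and helpers are assumptions filled in
for illustration; the post only shows the constructor and destructor.

```cpp
#include <cassert>

class Char_stack {
    int size;
    char* top;
    char* base;
public:
    explicit Char_stack(int sz) : size(sz), top(nullptr), base(nullptr) {
        base = top = new char[size];
    }
    ~Char_stack() { delete[] base; }

    // A memberwise copy would make two objects delete the same buffer.
    Char_stack(const Char_stack&) = delete;
    Char_stack& operator=(const Char_stack&) = delete;

    // Returns false when the buffer is full.
    bool push(char c) {
        if (top - base >= size) return false;
        *top++ = c;
        return true;
    }
    // Returns false when the stack is empty.
    bool pop(char& c) {
        if (top == base) return false;
        c = *--top;
        return true;
    }
};

// The object itself is automatic; only its buffer lives on the heap.
char demo_indirect_stack() {
    Char_stack st(3);
    st.push('a');
    st.push('b');
    st.push('c');
    bool overflow = !st.push('d');   // fourth push does not fit
    assert(overflow);
    char c = 0;
    st.pop(c);
    return c;
}
```

Because sizeof(Char_stack) is fixed, the class can be used as a local
variable, a member, or a base class, at the cost of one extra pointer load
per access.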

By the way, there are two contributors to this list that match
Jonathan Sh[ao]piro.  The twain have met, and will probably meet again,
but still hope not to be confused.
-- 
		Jonathan Shopiro
		AT&T Bell Laboratories, Warren, NJ  07060-0908
		research!shopiro   (201) 580-4229

keffer@blake.acs.washington.edu (Thomas Keffer) (12/08/89)

In article <10213@alice.UUCP> shopiro@alice.UUCP (Jonathan Shopiro) writes:
>In article <1020@dutrun.UUCP>, ben@duttnph.tudelft.nl (Ben Verwer) writes:
>> How do you implement variable sized objects in 2.0?
>
>Operator new is supplied to support controlling where memory is
>allocated for objects, not how much memory is allocated.  The trick
>described in ``the book'' is non-portable, implementation-dependent,
>and generally a bad idea.  Objects in C++ are always fixed-size.

Is this now the official "party line"?  I.e., the "trick" in The Book
is just that, and there will be no upwards compatibility from it?

Variable sized objects are used extensively in the Rogue Wave Vector
and Matrix Classes --- if they're an evolutionary dead-end I'll take
them out.

-tk

---
 Dr. Thomas Keffer          | Internet: keffer@ocean.washington.edu
 Rogue Wave                 | BITNET:   keffer%ocean.washington.edu@UWAVM
 Seattle, WA 98145          | uucp:     uw-beaver!ocean.washington.edu!keffer
 (206) 523-5831             | Telemail: T.KEFFER/OMNET

mat@zeus.opt-sci.arizona.edu (Mat Watson) (12/09/89)

In article <4798@blake.acs.washington.edu> keffer@blake.acs.washington.edu (Thomas Keffer) writes:

   In article <10213@alice.UUCP> shopiro@alice.UUCP (Jonathan Shopiro) writes:
   >In article <1020@dutrun.UUCP>, ben@duttnph.tudelft.nl (Ben Verwer) writes:
   >> How do you implement variable sized objects in 2.0?
   >
   >Operator new is supplied to support controlling where memory is
   >allocated for objects, not how much memory is allocated.  The trick
   >described in ``the book'' is non-portable, implementation-dependent,
   >and generally a bad idea.  Objects in C++ are always fixed-size.

   Is this now the official "party line"?  I.e., the "trick" in The Book
   is just that, and there will be no upwards compatibility from it?

   Variable sized objects are used extensively in the Rogue Wave Vector
   and Matrix Classes --- if they're an evolutionary dead-end I'll take
   them out.

   -tk

I too am using variable sized objects in my own vector and matrix
classes, and I'd be very disappointed if I couldn't use them.
But I don't see where, in what Jonathan Shopiro wrote, one is kept
from using variable sized objects.  He points out that the assignment
to "this" is non-portable, but he also gives an alternate method.
I just don't see how this implies that variable sized objects
are dead.  If they are I'll just go back to writing code in C.

--Mat

Mat Watson
mat@zeus.opt-sci.arizona.edu [128.196.128.219]
..{allegra,cmcl2,hao!noao}!arizona!zeus.opt-sci.arizona.edu!mat
Optical Sciences Center, Univ. of Arizona, Tucson, AZ 85721, USA

shap@delrey.sgi.com (Jonathan Shapiro) (12/10/89)

There is a lot of misunderstanding about the purpose of operator
new(). Let's see if I can help straighten some of it out.

There are three reasons for wanting to allocate your own storage for
objects, and they are distinct.

1. Multiple Arenas

The new version of operator new() is useful for an environment with
multiple heap arenas.

2. Collection Initialization Control

Operator new is also useful in managing collections, wherein you wish
to allocate some number of contiguous fixed-size objects, then
reallocate later and ensure that only the *new* objects are
constructed.  Consider the following example:

   myClass *cp = new myClass[10];  // allocates AND CONSTRUCTS 10 of them
   ... do some computation ...
   ... decide to realloc ...

   myClass *cp2 = new myClass[newSize];
   for(int i = 0; i < 10; i++)
      cp2[i] = cp[i];

   delete [] cp;
   cp = cp2;

There are three problems with this code.  First, many more
constructions are done than are necessary or desirable.  Second, it
depends on the user getting operator= right, which they most likely
didn't.  Third, it calls some destructors.  If myClass does reference
counting, lots of things will break.

Consider the following alternative:

    #include <new.h>
    #include <string.h>
    myClass *cp = new myClass[10];
    ... decide to realloc ...
    {
	// raw storage only: no constructors run here
	myClass *cp2 = (myClass *) new char[sizeof(myClass) * newSize];
	(void) memcpy(cp2, cp, sizeof(myClass) * 10);
        // construct only the new ones
	(void) new(&cp2[10]) myClass[newSize - 10];
	cp = cp2;
    }

No destructors are called, and only the new items are reinitialized.
This is a case that is handled well by the new variant of operator
new().
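The same idea as a self-contained sketch in present-day C++.  Item,
make_items, grow_items, free_items, and demo_grow are hypothetical names
introduced here.  The sketch assumes the element type is trivially
copyable, which is what makes the memcpy legitimate; a reference-counted
class would not qualify.  It also constructs the new elements one at a
time instead of with an array placement new.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical element type: trivially copyable, but with a constructor
// that counts how often it runs.
struct Item {
    int value;
    static int constructed;
    Item() : value(7) { ++constructed; }
};
int Item::constructed = 0;

// Allocate raw storage for count Items and construct each one in place.
Item* make_items(std::size_t count) {
    Item* p = static_cast<Item*>(::operator new(sizeof(Item) * count));
    for (std::size_t i = 0; i < count; ++i) new (p + i) Item();
    return p;
}

// Move oldCount Items into a larger block; construct only the added ones.
// Assumes old came from make_items or grow_items, not from new Item[n].
Item* grow_items(Item* old, std::size_t oldCount, std::size_t newCount) {
    Item* fresh = static_cast<Item*>(::operator new(sizeof(Item) * newCount));
    std::memcpy(static_cast<void*>(fresh), old, sizeof(Item) * oldCount);
    for (std::size_t i = oldCount; i < newCount; ++i) new (fresh + i) Item();
    ::operator delete(old);   // releases memory only; no destructors run
    return fresh;
}

// Item has a trivial destructor, so releasing the storage is enough.
void free_items(Item* p) { ::operator delete(p); }

// Returns how many constructor calls a 10 -> 25 growth costs in total.
int demo_grow() {
    int before = Item::constructed;
    Item* p = make_items(10);
    p[3].value = 42;
    p = grow_items(p, 10, 25);
    assert(p[3].value == 42);    // existing element carried over untouched
    assert(p[24].value == 7);    // added element freshly constructed
    int made = Item::constructed - before;
    free_items(p);
    return made;
}
```

With the copy-everything approach the same growth would run 35
constructors (10 plus 25) and 10 assignments; here it runs 25.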

3. Abuse of the construction mechanism.

This is the case of allocating truly variable-sized objects.  It is
not addressed by operator new(), nor should it be.  The existing
constructor technology does *not* support this concept.  The closest I
can find a way to come that works properly is as follows:

   class VarObject {
     public:
       VarObject(int bytesize);
   } ;

   VarObject *
   buildVarObject(int bytesize)
   {
       void *p = new char[bytesize];
       return new(p) VarObject(bytesize);
   }
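A slightly fuller sketch of the factory approach, in present-day C++.  It
assumes the variable part is a run of bytes placed directly after the
fixed part, reached through this + 1.  The build, destroy, payload, and
size members and the demo_var_object helper are hypothetical.  Making the
constructor and destructor private forces every object through the
factory, so none can end up on the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

class VarObject {
    std::size_t payloadSize;

    explicit VarObject(std::size_t n) : payloadSize(n) {}
    ~VarObject() {}
    VarObject(const VarObject&) = delete;
    VarObject& operator=(const VarObject&) = delete;
public:
    // Allocate the fixed part plus payloadBytes of trailing storage,
    // then construct the fixed part at the front with placement new.
    static VarObject* build(std::size_t payloadBytes) {
        void* raw = ::operator new(sizeof(VarObject) + payloadBytes);
        return new (raw) VarObject(payloadBytes);
    }
    // Undo build(): run the destructor, then release the raw block.
    static void destroy(VarObject* v) {
        if (!v) return;
        v->~VarObject();
        ::operator delete(v);
    }
    // The trailing bytes start immediately after the fixed part.
    char* payload() { return reinterpret_cast<char*>(this + 1); }
    std::size_t size() const { return payloadSize; }
};

// Builds an object with 16 trailing bytes and round-trips a string.
bool demo_var_object() {
    VarObject* v = VarObject::build(16);
    std::memcpy(v->payload(), "hello", 6);   // 5 characters plus the NUL
    bool ok = v->size() == 16 && std::strcmp(v->payload(), "hello") == 0;
    VarObject::destroy(v);
    return ok;
}
```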

Since one can always allocate the variable-sized component on the
heap, the objective is simply to eliminate the extra dereference.
There are several good reasons not to do this.

First, this object cannot be built on a stack, because its length
isn't known to the compiler.  The semantic implications of a heap-only
object aren't clear.

Second, the implementation of such objects tends to be convoluted to
later readers.

Finally, the savings obtained tend to be *very* small.  If you only
plan to access a single element in the variable sized portion, the
extra load probably doesn't matter, and if you plan to iterate through
it, you probably want to load the base address of the variable portion
anyway for efficiency.

The real issue is the (largely specious) argument about the cost of
malloc().  If you are truly concerned about the cost of doing the
malloc within new, arrange for new to be overloaded so that you can
allocate and manipulate your own arena.  This is a much smarter
strategy than trying to abuse the mechanisms that are present, and
puts the complexity in a place where the rationale for the complexity
is clear.
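A minimal sketch of that strategy in present-day C++: a class-specific
operator new that hands out fixed-size slots from a static arena and falls
back to the global allocator when the arena is full.  Node, the arena
namespace, the 64-slot limit, and demo_arena are all hypothetical; a real
arena would also have to consider threads and derived classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <new>

class Node {
public:
    int value;
    explicit Node(int v) : value(v) {}
    static void* operator new(std::size_t size);
    static void operator delete(void* p);
};

namespace arena {
    // A slot is either on the free list or holds one Node.
    union Slot {
        Slot* nextFree;
        alignas(Node) unsigned char storage[sizeof(Node)];
    };
    const std::size_t kSlots = 64;
    Slot slots[kSlots];
    Slot* freeList = nullptr;   // slots that were handed out and returned
    std::size_t used = 0;       // slots handed out at least once

    // std::less gives a total order even for unrelated pointers.
    bool owns(const void* p) {
        std::less<const void*> before;
        const void* first = slots;
        const void* last = slots + kSlots;
        return !before(p, first) && before(p, last);
    }
}

void* Node::operator new(std::size_t size) {
    if (size == sizeof(Node)) {
        if (arena::freeList) {               // reuse a returned slot
            arena::Slot* s = arena::freeList;
            arena::freeList = s->nextFree;
            return s;
        }
        if (arena::used < arena::kSlots)     // take a never-used slot
            return &arena::slots[arena::used++];
    }
    return ::operator new(size);             // arena full or odd size
}

void Node::operator delete(void* p) {
    if (!p) return;
    if (arena::owns(p)) {                    // push back on the free list
        arena::Slot* s = static_cast<arena::Slot*>(p);
        s->nextFree = arena::freeList;
        arena::freeList = s;
    } else {
        ::operator delete(p);
    }
}

// Checks that Nodes come from the arena and that a freed slot is reused.
bool demo_arena() {
    Node* a = new Node(1);
    Node* b = new Node(2);
    bool fromArena = arena::owns(a) && arena::owns(b);
    std::uintptr_t aAddr = reinterpret_cast<std::uintptr_t>(a);
    delete a;
    Node* c = new Node(3);
    bool reused = reinterpret_cast<std::uintptr_t>(c) == aAddr;
    bool ok = fromArena && reused && c->value == 3 && b->value == 2;
    delete b;
    delete c;
    return ok;
}
```

Callers keep writing new Node(x) and delete p; the allocation policy stays
inside the class.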

Jonathan Shapiro
Silicon Graphics, Inc.

bright@Data-IO.COM (Walter Bright) (12/12/89)

Variable sized objects are quite useful and work fine. The only thing
you cannot do is derive from a variable size class.

Does anyone know how to *prevent* a derivation from a particular class
(i.e. cause a compile-time error if you try it)?

jeenglis@nunki.usc.edu (Joe English) (12/12/89)

bright@dataio.Data-IO.COM (Walter Bright) writes:
>Variable sized objects are quite useful and work fine. The only thing
>you cannot do is derive from a variable size class.
>
>Does anyone know how to *prevent* a derivation from a particular class
>(i.e. cause a compile-time error if you try it)?

One way would be to add language support for variable-sized
objects by allowing zero-length arrays as the last
data member (and *only* as the last member) of a class.  
Then the compiler could take care of disallowing 
inheritance.

Then again, C++ already has enough tricky semantics...


--Joe English

  jeenglis@nunki.usc.edu

jimad@microsoft.UUCP (Jim Adcock) (12/14/89)

In article <2240@dataio.Data-IO.COM> bright@dataio.Data-IO.COM (Walter Bright) writes:
>Variable sized objects are quite useful and work fine. The only thing
>you cannot do is derive from a variable size class.
>Does anyone know how to *prevent* a derivation from a particular class
>(i.e. cause a compile-time error if you try it)?

 Either make the constructor private or make a pure virtual destructor.
(Either approach "does the right thing" in the case of variable sized
 objects)
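 A small sketch of the private-constructor approach in present-day C++,
 with hypothetical names HeapOnly, Derived, Sealed, and demo_no_derivation.
 Note that the derivation itself still compiles; what fails is any attempt
 to construct the derived class.  Since C++11 the final specifier rejects
 the derivation outright, which is shown as an alternative.

```cpp
#include <cassert>
#include <type_traits>

// Every constructor is private, so objects can only come from create().
class HeapOnly {
    HeapOnly() {}
public:
    static HeapOnly* create() { return new HeapOnly; }
};

// Declaring the derived class is accepted, but its implicit default
// constructor is deleted because it cannot call HeapOnly's private one.
struct Derived : HeapOnly {};

// C++11 alternative: with final, writing "struct Bad : Sealed {};"
// is itself a compile-time error.
class Sealed final {
public:
    int value = 0;
};

// True when the factory works and Derived cannot be default-constructed.
bool demo_no_derivation() {
    HeapOnly* h = HeapOnly::create();
    bool made = (h != nullptr);
    delete h;
    return made && !std::is_default_constructible<Derived>::value;
}
```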

 IMHO variable-sized objects are a hack, and a very bad hack.  They only
 work for objects made on the heap, and compilers don't really know how
 to correctly handle them in non-trivial cases.  Even if a user manages
 to correctly declare all variable-sized objects on the heap, the compiler
 could decide to generate a temporary of the class *not* on the heap,
 leading to incorrect behavior or at best a run-time crash.  No guarantee
 any of this is going to work on a given present compiler port, or future
 compiler, in any case.  C compilers are free to generate code that
 traps, crashes, or just doesn't do what you expect when you index off
 the end of a structure or an array.  One could expect variable sized
 objects to conflict with future compilers and/or memory management
 schemes.

 [standard disclaimer]

turk@Apple.COM (Ken "Turk" Turkowski) (12/20/89)

In article <1881@odin.SGI.COM> shap@delrey.sgi.com (Jonathan Shapiro) writes:
>First, this object cannot be built on a stack, because its length
>isn't known to the compiler.  The semantic implications of a heap-only
>object aren't clear.

This is one of the defects of C that C++ has inherited.  Consider:

	int MatrixInvert(double *M, int rows, int cols)
	{
		double LU[rows * cols];
	}

Here, we need a temporary (auto) array that is used only within this procedure.
Unfortunately, C won't let you do this, although Fortran will (is this a step
forward???).  Alloca (auto allocation, from the stack) was devised to overcome
this, but not all systems support it, so one is left with declaring a maximum
size, hoping that it will never need to be larger:

	#define MAXROWS 10
	#define MAXCOLS 10
	int MatrixInvert(double *M, int rows, int cols)
	{
	#ifdef ALLOCA
		double *LU = (double *) alloca(sizeof(double) * rows * cols);
	#else /* !ALLOCA */
		double LU[MAXROWS * MAXCOLS];
	#endif /* ALLOCA */
		...
	}

Now the analogy in C++ is:
	class Matrix {
		int _rows, _cols;
		double *_m;

	public:
		Matrix(int rows, int cols) {
			_rows = rows;
			_cols = cols;
			_m = new double[rows * cols];
		}

		~Matrix() { delete [] _m; }

		int Rows() { return _rows; }
		int Cols() { return _cols; }

		int Invert() {
			Matrix LU(Rows(), Cols());
			...
		}
	};

In the C++ version, the new operator is called, and the corresponding
delete operator is called automatically upon returning.

My question is:  is there a way to determine whether the matrix is auto
or "new"ed?  If it is possible to determine if it is auto, then
alloca() could be used for allocating memory, without the tremendous
overhead of new/malloc().
-- 
Ken Turkowski @ Apple Computer, Inc., Cupertino, CA
Internet: turk@apple.com
Applelink: TURKOWSKI1
UUCP: sun!apple!turk

bright@Data-IO.COM (Walter Bright) (12/21/89)

In article <5891@internal.Apple.COM> turk@Apple.COM (Ken "Turk" Turkowski) writes:
<My question is:  is there a way to determine whether the matrix is auto
<or "new"ed?  If it is possible to determine if it is auto, then
<alloca() could be used for allocating memory, without the tremendous
<overhead of new/malloc().

Yes, there definitely is a way. However, it is very compiler and machine
dependent. For instance, with ZTC in the C and L memory models, it is
sufficient to compare the segment of a pointer to the object with the
SS register. (ZTC large data models have a separate stack segment,
MSC doesn't, so you'll have to do it a different way for MSC).
For S and M models, you'll have to check the range of the
offset to see if it lies within the stack.

In general, the method would be to see if the pointer points into the
region of the stack. You could define a function:
	int isauto(void *p);
and then implement it for each machine you use.