[comp.lang.c++] Chameleon objects

keith@csli.Stanford.EDU (Keith Nishihara) (12/05/89)

When a class has a constructor and virtual functions, cfront1.2 generates
code within the constructor function to set up the virtual function pointers.
If a class is derived from such a base class, and one of the virtual
functions is called from the base class constructor, the wrong virtual
function is called, since during the execution of the base class constructor,
the virtual function pointer table is set up as for the base class,
and is not changed to show the derived class virtual functions until
the derived class constructor is entered.

Here is a real case:

    I have a hierarchical graphical editor in which the user
    manipulates prototypical objects in a layout.  There are several
    types of primitive proto objects, and also compound proto objects
    (which represent occurrences of other layouts included within a
    higher level layout).  These proto objects are derived from a base
    class `proto'.  In order to support execution, part of the object's
    state is represented as an `instance' class;  primitive proto
    classes create instance classes of themselves for each occurrence
    of a compound proto object representing the layout including the
    primitive proto.

    The implementation of this requires that each proto class define
    a virtual function Instantiate(context), which creates an instantiation
    in the given context.  *All* derived classes are required to have
    at least one instance, defined within a context called top_level.
    The most natural way to achieve this is to have the constructor
    for the base class call the virtual function Instantiate().
    However, at the time that this virtual function is called, the virtual
    function table is set up as if for the base class, and not for the
    derived class.

    class proto
    {
    public:
	proto() { ... Instantiate(top_level); ... }
	virtual void Instantiate(context) { }	// Empty.
	...
    };

    class type_a_proto : public proto
    {
    public:
	virtual void Instantiate(context) { <create instance> }
    };

    What happens is that the empty base class Instantiate function
    is called instead of the type_a_proto::Instantiate function when
    a new type_a_proto is created.

    (The fix, of course, is to call the Instantiate function from each
    derived constructor, instead of the base class.  The problem is that
    it is then not possible to derive further subclasses of a derived type,
    as the intermediate classes will create unwanted instantiations of
    themselves as their intermediate constructor functions are called.)
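A minimal sketch of the behaviour described above, written in present-day C++ rather than for cfront 1.2. The names `Proto`, `TypeAProto`, and the `created_by` member are stand-ins invented for illustration; they are not the poster's actual classes, and the `context` argument is omitted.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <string>

// Hypothetical stand-in for the poster's base class. `created_by` records
// which override of Instantiate() actually ran.
struct Proto {
    std::string created_by;
    Proto() { Instantiate(); }             // virtual call from the base constructor
    virtual ~Proto() = default;
    virtual void Instantiate() { created_by = "Proto"; }
};

struct TypeAProto : Proto {
    void Instantiate() override { created_by = "TypeAProto"; }
};

// Reports which override ran while a TypeAProto was being constructed.
std::string who_ran_during_construction() {
    TypeAProto p;
    return p.created_by;
}

// Reports which override runs for an ordinary virtual call made after
// construction has finished.
std::string who_runs_after_construction() {
    TypeAProto p;
    Proto& base = p;
    base.Instantiate();
    return p.created_by;
}
```

Calling `who_ran_during_construction()` should yield `"Proto"`, while `who_runs_after_construction()` should yield `"TypeAProto"`: the same call site dispatches differently depending on whether the derived part of the object exists yet.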

Is this reasonable semantics for virtual function calling?  I know that it
is somewhat bogus to do anything with the derived object before its
constructor has been called, but I do not find anything which says
that it is illegal.  In this case, my virtual function calls are not
actually operating on the derived object, but are rather creating a
parallel structure in a different context.

Do other C++ compilers exhibit the same behaviour?

Neil/.		Neil%teleos.com@ai.sri.com	...decwrl!argosy!teleos!neil

madany@m.cs.uiuc.edu (12/06/89)

/* Written  3:07 pm  Dec  4, 1989 by keith@csli.Stanford.EDU in m.cs.uiuc.edu:comp.lang.c++ */
/* ---------- "Chameleon objects (calling virtual" ---------- */

>>    (The fix, of course, is to call the Instantiate function from each
>>    derived constructor, instead of the base class.  The problem is that
>>    it is then not possible to derive further subclasses of a derived type,
>>    as the intermediate classes will create unwanted instantiations of
>>    themselves as their intermediate constructor functions are called.)

Not quite true.  You can define two constructors for each class,
with different numbers of parameters.
For example:

	SubClass::SubClass( parameters ) : Class( parameters, 0 )
	{
		Instantiate(context);
		...
	}
and
	SubClass::SubClass( parameters, int ) : Class( parameters, 0 )
	{
		...
	}

Then the compiler can distinguish between the two constructors.
For each class, use the constructor without the extra int when
creating objects of that class, and use the constructor with the
int when calling up the constructor chain.

Now you have a way to avoid calling some function twice: you can
insert code that should only be called once for an object in the constructor
used for object creation, and you can subclass to your heart's content.
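The two-constructor idea can be sketched as a compilable example. This is only one possible reading of the post: the class names follow the post, but `SubSubClass`, the global `g_log`, the helper `log_of_creating`, and the choice to make the chaining constructors `protected` are assumptions added so the effect can be observed.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <string>
#include <vector>

// Hypothetical log of Instantiate() calls.
static std::vector<std::string> g_log;

class Class {
public:
    Class() { Instantiate(); }              // creation constructor
    virtual ~Class() = default;
    virtual void Instantiate() { g_log.push_back("Class"); }
protected:
    explicit Class(int) {}                  // chaining constructor: no Instantiate()
};

class SubClass : public Class {
public:
    SubClass() : Class(0) { Instantiate(); }
    void Instantiate() override { g_log.push_back("SubClass"); }
protected:
    explicit SubClass(int) : Class(0) {}
};

class SubSubClass : public SubClass {
public:
    SubSubClass() : SubClass(0) { Instantiate(); }
    void Instantiate() override { g_log.push_back("SubSubClass"); }
};

// Creates one T and returns the Instantiate() calls that its construction made.
template <class T>
std::vector<std::string> log_of_creating() {
    g_log.clear();
    T object;
    return g_log;
}
```

With this arrangement `log_of_creating<SubSubClass>()` should contain the single entry `"SubSubClass"`: the intermediate constructors run, but only the most derived creation constructor calls `Instantiate()`.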

>> Is this reasonable semantics for virtual function calling?  I know that it
>> is somewhat bogus to do anything with the derived object before its
>> constructor has been called, but I do not find anything which says
>> that it is illegal.  In this case, my virtual function calls are not
>> actually operating on the derived object, but are rather creating a
>> parallel structure in a different context.

This semantics for calling virtual functions in constructors seems reasonable
to me, though it can be frustrating in cases similar to the one you described.
It makes sense that an object belongs to the class of a currently executing
constructor or destructor and not to any subclass.

-peter madany

roger@procase.UUCP (Roger H. Scott) (12/07/89)

In article <11266@csli.Stanford.EDU> Neil%teleos.com@ai.sri.com writes:
>When a class has a constructor and virtual functions, cfront1.2 generates
>code within the constructor function to set up the virtual function pointers.
>If a class is derived from such a base class, and one of the virtual
>functions is called from the base class constructor, the wrong virtual
>function is called, since during the execution of the base class constructor,
>the virtual function pointer table is set up as for the base class,
>and is not changed to show the derived class virtual functions until
>the derived class constructor is entered.

This is one of the really nasty unsolved problems in C++.  As much as I hate
this behavior, I have to admit that it is really the only correct behavior
for virtual functions.  The function that is called as a result of a virtual
call is determined by the dynamic type of the object, and it seems pretty
clear that the dynamic type of an object executing T::T() is T, regardless
of any subclassing.  The analogous thing holds true during destructors -
during the execution of T::~T() the dynamic type of an object is demoted to T.

Perhaps what is needed here is a "finalization" function that is automatically
invoked by the compiler immediately after normal construction of an object.
The programmer could declare this function to be virtual in the base class
and then redefine it in derived classes to "finalize" the object in the
appropriate way(s).  The syntax below is not a suggestion:

    class Base {
    public:
	Base(); // do invariant Base stuff
	virtual !Base(); // [finalizer] do Base variant of finalization
	...
    };

    class Derived : public Base {
    public:
	Derived(); // do invariant Derived stuff
	!Derived(); // do Derived variant of finalization
	...
    };

    ...
    Base *p = new Derived; // p = (tmp = new Derived, tmp->!Base(), tmp)
    ...

I'm not at all thrilled with the prospect of Yet Another Language Extension,
so here's an approach that works in 2.0 C++ as-is:

    // constructors are non-public so you won't "forget" to finalize ...
    class Base {
    protected:
	Base(); // do invariant Base stuff (protected, so Derived() can call it)
	virtual Base *Finalize(); // do Base variant of finalization
    public:
	static Base *New() {return (new Base)->Finalize();}
	...
    };

    class Derived : public Base {
	Derived(); // do invariant Derived stuff
    protected:
	Base *Finalize(); // do Derived variant of finalization
    public:
	static Derived *New() {return (Derived *)(new Derived)->Finalize();}
	...
    };

    ...
    Base *p = Derived::New();
    ...
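A compilable variant of the factory approach above, offered as a sketch rather than as the poster's code. It assumes a present-day compiler; the `finalized_as` accessor and the use of `std::unique_ptr` are additions for illustration, and `Finalize()` returns `void` here instead of a pointer.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <memory>
#include <string>

class Base {
public:
    virtual ~Base() = default;
    // Construct first, then finalize through an ordinary virtual call.
    static std::unique_ptr<Base> New() {
        std::unique_ptr<Base> p(new Base);
        p->Finalize();
        return p;
    }
    const std::string& finalized_as() const { return finalized_as_; }
protected:
    Base() = default;                       // not public: callers must go through New()
    virtual void Finalize() { finalized_as_ = "Base"; }
    std::string finalized_as_;
};

class Derived : public Base {
public:
    static std::unique_ptr<Derived> New() {
        std::unique_ptr<Derived> p(new Derived);
        p->Finalize();                      // object is complete, so this is Derived's version
        return p;
    }
protected:
    Derived() = default;
    void Finalize() override { finalized_as_ = "Derived"; }
};
```

`Derived::New()->finalized_as()` should be `"Derived"`, because `Finalize()` runs only after the whole object has been constructed. Plain `new` is used instead of `std::make_unique` because the constructors are not public.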

[Digression #1]
By the way, an advantage to using static member functions for public
construction rather than C++ constructors is that static member functions are
[more nearly] first-class entities in C++ than constructors - you can take
their address and treat (pointers to) them as variables.  Such is not the
case with T::T().

    typedef Base *BaseMaker();

    // Create a Base (or a subclass of Base) and use it ...
    void makeABaseAndDoSomethingWithIt(BaseMaker *makebase) {
	...
	Base *b = (*makebase)();
	...
    }

    void foo() {
	makeABaseAndDoSomethingWithIt(&Base::New);
	// The cast in the following line should not be necessary -
	// see (***) note following.
	makeABaseAndDoSomethingWithIt((BaseMaker *)&Derived::New);
    }

[Digression #2 - for Language Lawyers only]
(***) Note:
"Derived *(*)()" [pointer to function returning pointer to Derived]
should be type compatible with "Base *(*)()" [pointer to function
returning pointer to Base].  These types were compatible in 1.2.
AT&T maintains that these are incompatible for the same reasons that
"Derived **" is incompatible with "Base **", but the two cases are
*not* analogous - there is no danger of "unsafe" things happening
in the former case.  It is not as if you could assign to the
"object" pointed to by a pointer-to-function and thus alter what
will be returned when that p-to-f is called through.

Genuine unsafe example:
    Base *IPointToABase = new Base;
    void f(Base **pp) {
	*pp = IPointToABase; // BECAUSE YOU *CAN* DO THIS ...
    }
    Derived *IPointToADerived = new Derived;
    void g() {
	Derived **mypp = &IPointToADerived;
	f(mypp); // ... YOU *CAN'T* DO THIS, FOR FEAR OF ...
	Derived *dp = *mypp; // ... GETTING A "Base *" HERE!
    }

Bogus pseudo-analogy:
    Base *IReturnABase() {return new Base;}
    void f(Base *(*pf)()) {
	 *pf = IReturnABase; // BECAUSE YOU *CAN'T* DO THIS ...
	  ...
    }
    Derived *IReturnADerived() {return new Derived;}
    void g() {
	Derived *(*mypf)() = &IReturnADerived;
	f(mypf); // ... YOU *SHOULD* BE ABLE TO DO THIS, SECURE
		 // IN THE KNOWLEDGE THAT ...
	Derived *dp = (*mypf)(); // ... THIS CAN'T YIELD A "Base *"!
    }

jimad@microsoft.UUCP (Jim Adcock) (12/08/89)

In article <11266@csli.Stanford.EDU> Neil%teleos.com@ai.sri.com writes:
>When a class has a constructor and virtual functions, cfront1.2 generates
>code within the constructor function to set up the virtual function pointers.
>If a class is derived from such a base class, and one of the virtual
>functions is called from the base class constructor, the wrong virtual
>function is called, since during the execution of the base class constructor,
>the virtual function pointer table is set up as for the base class,
>and is not changed to show the derived class virtual functions until
>the derived class constructor is entered.
>
>Do other C++ compilers exhibit the same behaviour?
>
>Neil/.		Neil%teleos.com@ai.sri.com	...decwrl!argosy!teleos!neil

One would hope so, since this seems to be the behavior described in
section 12.7 page 83 of the C++ Reference Manual (short quote:)

"Member functions may be called in constructors and destructors.  This implies
 that virtual functions may be called (directly or indirectly).  The function
 called will be the one defined in the constructor's (or destructor's) own
 class or its bases, but *not* any function redefining it in a derived class.
 This ensures that unconstructed objects will not be accessed during
 construction or destruction.  For example: ...."

 ---

Given a class Base and a class Derived, an object of class Derived is
created by first invoking a Base constructor on the object, at which
time the object is a Base, and then invoking a Derived constructor on the
object, at which time it has become a Derived.  [Likewise in multiple
inheritance, a MIDerived starts off solely as a Base1 during the Base1
construction phase, is solely a Base2 during the Base2 construction
phase, and only becomes a MIDerived (and thus also a Base1 and Base2)
during the MIDerived construction phase.]  This description is also true
(in reversed order) during destruction.  After the Derived destructor is
called, the object has reverted to only being a Base; after the Base
destructor is called, it isn't (hardly(*)) anything.

(hardly(*))
   == fudge factor to account for static class member functions operator new
      and operator delete.
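The construction and destruction phases described above can be traced with a small program. This is a sketch under assumptions: `g_trace`, `who()`, and `trace_lifetime()` are invented for illustration, and the class names simply follow the wording of the post.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <string>
#include <vector>

// Hypothetical trace buffer; each constructor and destructor appends the
// type that a virtual call resolves to at that moment.
static std::vector<std::string> g_trace;

struct Base1 {
    Base1() { g_trace.push_back("ctor Base1 sees " + who()); }
    virtual ~Base1() { g_trace.push_back("dtor Base1 sees " + who()); }
    virtual std::string who() const { return "Base1"; }
};

struct Base2 {
    Base2() { g_trace.push_back("ctor Base2 sees " + who()); }
    virtual ~Base2() { g_trace.push_back("dtor Base2 sees " + who()); }
    virtual std::string who() const { return "Base2"; }
};

struct MIDerived : Base1, Base2 {
    MIDerived() { g_trace.push_back("ctor MIDerived sees " + who()); }
    ~MIDerived() override { g_trace.push_back("dtor MIDerived sees " + who()); }
    std::string who() const override { return "MIDerived"; }  // overrides both bases
};

// Creates and destroys one MIDerived and returns the recorded trace.
std::vector<std::string> trace_lifetime() {
    g_trace.clear();
    { MIDerived d; }
    return g_trace;
}
```

The trace should read Base1, Base2, MIDerived on the way up and MIDerived, Base2, Base1 on the way down, with each base phase seeing only its own version of `who()`.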

vinoski@apollo.HP.COM (Stephen Vinoski) (12/08/89)

In article <11266@csli.Stanford.EDU> Neil%teleos.com@ai.sri.com writes:
>If a class is derived from such a base class, and one of the virtual
>functions is called from the base class constructor, the wrong virtual
>function is called, since during the execution of the base class constructor,
>the virtual function pointer table is set up as for the base class,
>and is not changed to show the derived class virtual functions until
>the derived class constructor is entered.

From Lippman's C++ Primer, page 352:

  "There are three cases in which an invocation of a virtual function is
resolved statically at compile time:

  1.  When a virtual function is invoked through an object of the class type.
                                   .
                                   .
                                   .

  2.  When a virtual function is explicitly invoked through a pointer or
reference using the class scope operator.
                                   .
                                   .
                                   .

  3.  When a virtual function is invoked within either the constructor or the
destructor of a base class.  In both cases, the base class instance of the
virtual function is called since the derived class object is either not yet
constructed or already destructed."

I don't see how it could be done any other way.
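The three cases quoted from the Primer can be illustrated side by side. `Shape`, `Circle`, and the helper functions are hypothetical names chosen for this sketch; they do not come from the book.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <string>

struct Shape {
    std::string seen_in_ctor;
    Shape() { seen_in_ctor = name(); }      // case 3: call inside a base constructor
    virtual ~Shape() = default;
    virtual std::string name() const { return "Shape"; }
};

struct Circle : Shape {
    std::string name() const override { return "Circle"; }
};

// Case 1: the parameter is an object, not a pointer or reference, so the
// argument is sliced to a Shape and Shape::name is called.
std::string via_object(Shape s) { return s.name(); }

// Case 2: the class scope operator selects Shape::name explicitly.
std::string via_scope(const Shape& s) { return s.Shape::name(); }

// Ordinary case for comparison: dispatch on the dynamic type.
std::string via_reference(const Shape& s) { return s.name(); }
```

Given a `Circle c`, only `via_reference(c)` should answer `"Circle"`; `via_object(c)`, `via_scope(c)`, and `c.seen_in_ctor` should all be `"Shape"`.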


-steve

| Steve Vinoski       | Hewlett-Packard Apollo Div. | ARPA: vinoski@apollo.com |
| (508)256-6600 x5904 | Chelmsford, MA    01824     | UUCP: ...!apollo!vinoski |
| "My second wife isn't even born yet."                                        |

strick@osc.COM (henry strickland) (12/12/89)

In article <47479d37.12160@espol> roger@procase.UUCP (Roger H. Scott) writes:
>
>[Digression #2 - for Language Lawyers only]
>"Derived *(*)()" [pointer to function returning pointer to Derived]
>should be type compatible with "Base *(*)()" [pointer to function
>returning pointer to Base].  These types were compatible in 1.2.

In 1.2 you did get away with this, because there were no 
multiply inherited bases buried within an object -- all the
bases overlaid each other, starting at the same address.

I don't think this is for Language Lawyers only -- 
all 2.0 users need to understand [it took me a while to realize it,
and I'm still finding cases where I had missed one of the aftershocks
of it] that all pointers (of different pointer-to-base-class types) 
to an object do *not* necessarily contain the same absolute value.

The following program demonstrates a simple example where
(Base*)&x and (Derived*)&x are not the same absolute value,
for a Derived x.

==============================================

strick@gwarn /tmp 445 % cat > ex.c
        extern "C" void printf( char const*, ... );

        class Base { int b; };
        class Other { int o; };
        class Derived : public Other, public Base { int d; };

        Derived x;

        main(int, char*[] )
        {
                ::printf("%x %x\n", (Base*) &x, (Derived*) &x);
                return 0;
        }

strick@gwarn /tmp 446 % CC2 ex.c
CC2  ex.c:
/usr/local/bin/gcc     ex.c -lC2
strick@gwarn /tmp 447 % a.out
20094 20090
strick@gwarn /tmp 448 %

==============================================

If you add the following lines to the end of the above code,
you have a counterexample: code that would misbehave if the compiler accepted it.

	typedef Base *(*PFPBase)();
	typedef Derived *(*PFPDerived)();

	Base*	  BaseOfX() { return &x; }

	void func()
	{
		PFPBase f= BaseOfX;

		Base* mumble= (*f)();

		PFPDerived g= BaseOfX;  // error:
				// bad initializer type Base *(*)() 
				// for g ( PFPDerived  expected)

				// the above would lead to the next line not 
				// working right, if the compiler allowed 
				// it, because fleezle would get the absolute
				// value (Base*)&x rather than 
				// the absolute value (Derived*)&x .

		Derived* fleezle= (*g)();
	}

===============================================

Language Laypeople will be bitten by all this on simple chains like

	(Base*) (void*) &x;       // ERROR NOT CAUGHT BY COMPILER
				  // (because you used casting)

which could easily happen if you store the address of your x into
a (void*)-collection object, then take it from the collection and
cast it to a (Base*).  This will NOT work with multiple inheritance.
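One way to avoid the `void*` trap described above is to cast back to exactly the type that was stored and only then convert to the base. The sketch below reuses the class layout from the post; `store` and `retrieve_as_base` are hypothetical helpers standing in for a `void*`-based collection.

```cpp
#include <cassert>   // used by the checks in the usage note

struct Base  { int b = 0; };
struct Other { int o = 0; };
struct Derived : Other, Base { int d = 0; };

// Stores the address the way a void*-based collection would.
void* store(Derived* p) { return p; }

// Casts back to the exact type that was stored, then lets the compiler
// perform the Derived* -> Base* conversion, including any offset.
Base* retrieve_as_base(void* stored) {
    return static_cast<Derived*>(stored);
}
```

For a `Derived x`, `retrieve_as_base(store(&x))` should compare equal to `static_cast<Base*>(&x)`. The two base subobjects hold distinct data, so they cannot both start at the address of `x`; casting the stored `void*` straight to `Base*` skips the adjustment.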

===============================================

					strick@osc.com   uunet!osc!strick
					( formerly strick@gatech.edu )

dove@joker.uucp (Webster &) (12/17/89)

In article <47479d37.12160@espol> roger@procase.UUCP (Roger H. Scott) writes:

   From: roger@procase.UUCP (Roger H. Scott)
   Subject: Re: Chameleon objects (calling virtual functions from constructors)
   Date: 7 Dec 89 09:01:00 GMT

   Perhaps what is needed here is a "finalization" function that is automatically
   invoked by the compiler immediately after normal construction of an object.
   The programmer could declare this function to be virtual in the base class
   and then redefine it in derived classes to "finalize" the object in the
   appropriate way(s).

Yes Please!!  We should have a means of automatically performing
"final integration" of objects after construction is complete.
--
		Dr. Webster Dove
		Special Computing Applications
		Advanced Technology Engineering
		Sanders Associates (a Lockheed Company)
		uunet!rocket!dove

brad@sqwest.sq.com (Brad Might) (12/20/89)

>   From: strick@osc.COM (henry strickland)
>
>   The following program demonstrates a simple example where
>   (Base*)&x and (Derived*)&x are not the same absolute value,
>   for a Derived x.

>	... code example

	Can I do the following with multiple inheritance then ?

	class Derived : public Other, public Base ...

	Some function Foo returns Other *.

	Can I do

	Derived *d = (Derived *)Foo() ; 

	as I could if Derived was derived strictly from Other ?
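A sketch of the downcast being asked about, assuming the pointer really does refer to a base subobject of a `Derived` (otherwise the cast is not valid). `g_x`, `Bar`, `from_other`, and `from_base` are hypothetical names added here; `Foo` follows the question.

```cpp
#include <cassert>   // used by the checks in the usage note

struct Base  { int b = 1; };
struct Other { int o = 2; };
struct Derived : Other, Base { int d = 3; };

Derived g_x;   // hypothetical object handed out by Foo() and Bar()

Other* Foo() { return &g_x; }   // implicit Derived* -> Other* conversion
Base*  Bar() { return &g_x; }   // hypothetical companion returning the other base

// static_cast knows the layout of Derived, so it can undo whatever
// adjustment the upcast applied, for either base.
Derived* from_other(Other* p) { return static_cast<Derived*>(p); }
Derived* from_base(Base* p)   { return static_cast<Derived*>(p); }
```

Both `from_other(Foo())` and `from_base(Bar())` should give back `&g_x`. The cast has to start from the correctly typed base pointer; going through `void*` in between would lose the information needed for the adjustment.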

	
-- 
Brad Might					brad@sq.com (brad@sq ?)
SoftQuad West					brad!sq!utzoo!...
(604) 585-1999

roger@decvax.UUCP (Roger H. Scott) (01/04/90)

In article <1727@osc.COM> strick@osc.com (henry strickland) writes:
>In article <47479d37.12160@espol> roger@procase.UUCP (Roger H. Scott) writes:
>>
>>[Digression #2 - for Language Lawyers only]
>>"Derived *(*)()" [pointer to function returning pointer to Derived]
>>should be type compatible with "Base *(*)()" [pointer to function
>>returning pointer to Base].  These types were compatible in 1.2.
>
>In 1.2 you did get away with this, because there were no 
>multiply inherited bases buried within an object -- all the
>bases overlaid each other, starting at the same address.
>
>...
>The following program demonstrates a simple example where
>(Base*)&x and (Derived*)&x are not the same absolute value,
>for a Derived x.
==============================================
        extern "C" void printf( char const*, ... );

        class Base { int b; };
        class Other { int o; };
        class Derived : public Other, public Base { int d; };

        Derived x;

        main(int, char*[] )
        {
                ::printf("%x %x\n", (Base*) &x, (Derived*) &x);
                return 0;
        }

	typedef Base *(*PFPBase)();
	typedef Derived *(*PFPDerived)();

	Base*	  BaseOfX() { return &x; }

	void func()
	{
		PFPBase f= BaseOfX;

		Base* mumble= (*f)();

		PFPDerived g= BaseOfX;  // error:
				// bad initializer type Base *(*)() 
				// for g ( PFPDerived  expected)
		
				// the above would lead to the next line not 
				// working right, if the compiler allowed 
				// it, because fleezle would get the absolute
				// value (Base*)&x rather than 
				// the absolute value (Derived*)&x .
### Of course this is wrong - you are going the wrong way!  I claimed that
    "Derived *(*)()" should be compatible with "Base *(*)()", not vice versa
    as you are doing here.  With this clarification do you still maintain that
    I am wrong and that the 2.0 behavior is right and/or necessary?

		Derived* fleezle= (*g)();
	}
>

shopiro@alice.UUCP (Jonathan Shopiro) (01/06/90)

In article <64@espol.decvax.UUCP>, roger@decvax.UUCP (Roger H. Scott) writes:
> In article <1727@osc.COM> strick@osc.com (henry strickland) writes:
> >In article <47479d37.12160@espol> roger@procase.UUCP (Roger H. Scott) writes:
> >>
> >>[Digression #2 - for Language Lawyers only]
> >>"Derived *(*)()" [pointer to function returning pointer to Derived]
> >>should be type compatible with "Base *(*)()" [pointer to function
> >>returning pointer to Base].  These types were compatible in 1.2.
> >
> >In 1.2 you did get away with this, because there were no 
> >multiply inherited bases buried within an object -- all the
> >bases overlaid each other, starting at the same address.
> >
> >...
class Base { int b; };
class Other { int o; };
class Derived : public Other, public Base { int d; };

Derived x;

// Aren't these typedefs easier to read than what you wrote?
typedef Base*  FPBase();
typedef Derived*  FPDerived();

Base*	  BaseOfX() { return &x; }
Derived*  DerivedOfX() { return &x; }

void func()
{
	FPBase* f = &BaseOfX;	// okay
	FPDerived*  g = &DerivedOfX;	// okay

	Base* mumble = (*f)();	// okay
	Base* fleezle = (*g)();  // works okay, takes care of offset
	// compare the generated code with the previous

// If I remember correctly, this is what you'd like to be able to do.
	f = &DerivedOfX;  // or even harder ...
	f = g;
// While this is apparently type-safe, the compiler can't do it because
// FPBase and FPDerived are used differently, as the above example shows.
// The only way this could be done is for the compiler to lay down a little
// function that did the offset adjustment and then use the address of
// that.  The first assignment to f could be done with a hidden static
// function, but for the second assignment it would be much harder
// (suppose the value of g changes later, suppose func is called
// recursively).
}
-- 
		Jonathan Shopiro
		AT&T Bell Laboratories, Warren, NJ  07060-0908
		research!shopiro   (201) 580-4229
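The "little function" that does the offset adjustment can be written by hand when the callee is known at compile time. The sketch below reuses the declarations from the post; `DerivedOfXAsBase` and `wrap` are hypothetical additions. For the harder run-time case it falls back on `std::function`, which is a library wrapper and not a plain function pointer.

```cpp
#include <cassert>      // used by the checks in the usage note
#include <functional>

struct Base  { int b = 0; };
struct Other { int o = 0; };
struct Derived : Other, Base { int d = 0; };

Derived x;

using FPBase    = Base*();
using FPDerived = Derived*();

Base*    BaseOfX()    { return &x; }
Derived* DerivedOfX() { return &x; }

// Hand-written adapter: calls DerivedOfX and lets the compiler apply the
// Derived* -> Base* adjustment to the returned pointer.
Base* DerivedOfXAsBase() { return DerivedOfX(); }

// Run-time case: the target is only known as a value of g, so the adapter
// has to carry g with it. std::function stores g and converts the result.
std::function<Base*()> wrap(FPDerived* g) { return g; }
```

`FPBase* f = &DerivedOfXAsBase;` then behaves as the post wished `f = &DerivedOfX;` would, and `wrap(g)()` covers `f = g` at the cost of an object that is larger than a bare function pointer.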