[comp.lang.eiffel] gnu c++ mi class layout

strick@osc.COM (henry strickland) (07/15/89)

[]	CLASS LAYOUT FOR G++ (1.35.1- / sun3-os4-nfp)
[]	
[]	Henry Strickland    <strick@osc.com>   (also <strick@gatech.edu>)
[]	Object Sciences Corporation      Menlo Park  CA

I have attempted to reverse-engineer how G++ (v1.35.1-) allocates
storage for classes, including multiple inheritance and virtual base
cases.  This was done by trying things and looking at the output:  so
it could be wrong.  This is for sun3 with os level 4 and no floating
point processor, but I don't see why that would matter any.

IF ANYONE KNOWS WHERE THE FOLLOWING IS WRONG, PLEASE LET ME KNOW!

Also, if anyone knows if cfront 2.0 does things differently,
please let me know.  I do not have access to cfront 2.0 yet.

I am most hesitant about the algorithm at the end for determining 
virtual base placement.





1. a simple structure

	A struct has fields for the variables in order.  There may be
	some gaps (dead fields) to meet alignment constraints.  
	Funny cases exist,  e.g. structs with no fields, zero-length
	arrays, bit fields, and they're probably compiler-dependent.
	Nonvirtual methods (including inlines) require no storage in
	an instance.

	EXAMPLE

		struct A {
			char	a;		/* variables */
			int	x,y;
			short	b;
			double	z;

			void	print(Stream*);	  /* methods */
		};

	memory layout	 A* -->	| 1: a	
				| 1: gap
				| 4: x
				| 4: y
				| 2: b
				| 0: gap? (not on sun3, but on vax, i think)
				| 8: z

2. a simple structure with a virtual function table

	There will be ONE anonymous field ("v_ptr") of type (void*) at
	the end of the structure IF the struct has one or more virtual
	functions.  The format of this virtual function table dependent
	on whether multiple inheritance is used or not;  anyway, it is
	not described in this document.

	EXAMPLE
		struct B {
				char	a;		/* variables */
				int	x,y;
				short	b;
				float	z;

				void	print(Stream*); /* methods */

			virtual	void	one();		/* virtual methods */
			virtual	int	two(char*);
			virtual	int	one(int);
			virtual	char*	three(char*, int);
		};

	memory layout  B* -->	| 1: a	
		   		| 1: gap
				| 4: x
				| 4: y
				| 2: b
				| 0: gap?
				| 8: z
				| 4: vt$B	/* virtual func table ptr */
				|
				|                                |
				|                                |
				V   increasing memory addresses  V

3. single or multiple inheritance, non-virtual bases

	The non-virtual superclasses are allocated, in order of
	appearance on the class definition line, and then the member
	variables and possibly the virtual function pointer of the
	current struct.  If the superclasses themselves have 
	superclasses or virtual functions, then the layout method
	is recursive, with each superclass having a full layout
	within itself.

	EXAMPLE
		struct X { /* ... sizeof X is 20 ... */ };
		struct Y { /* ... sizeof X is 12 ... */ };
		struct Z { /* ... sizeof X is 36 ... */ };

		struct M : X, Y, Z {
				char	a;
				int	x,y;
				short	b;
				float	z;

				void 	print(Stream*);

			virtual	void	one();
			virtual	int	two(char*);
			virtual	int	one(int);
			virtual	char*	three(char*, int);
		};

	
	memory layout   M* -->	| 20: X		/* superclasses first */
				| 12: Y	
				| 36: Z	
				| 1: a		/* M's variables begins */
		   		| 1: gap
				| 4: x
				| 4: y
				| 2: b
				| 0: gap?
				| 8: z
				| 4: vt$M	/* virtual func table ptr */

	The case of single inheritance is just the multiple-inheritance
	case with only one superclass.  The supers have their virtual
	function table pointers if they ordinarily need it.  The class
	M only has its own virtual table pointer if M defines new
	virtual functions that did NOT exist in any of the supers.  If
	M only reimplements virtual functions declared in supers, then
	M has no v_ptr of its own.

4. multiple inheritance, with virtual bases 


	A class in which itself or ANY of its supers (or any supers of
	supers, transitively ) has one or more virtual bases, the
	struct will have two sizes:  A "lean" size, which does not
	count the storage for the virtual bases, and a "full" size,
	which does count storage for them.

	Virtual bases are NOT included inside the "lean" structure the
	way that normal bases are.   Instead, only a pointer to the
	virtual base is stored inside the "lean" structure.  These
	virtual base pointers come at the end of the normal
	superclasses and before any variables of the current class.
	There will be one pointer for each virtual base declared as a
	virtual super of the CURRENT class.  If there are virtual bases
	that are found in the closure of supers but NOT listed in the
	current struct as a virtual super, there will be no virtual
	base pointer for this base at this level.  These pointers
	appear in REVERSE order of their declaration:  see the diagram.  


	EXAMPLE
                struct F { /* ... sizeof X is 40 ... */ };
                struct G { /* ... sizeof X is 24 ... */ };

                struct X { /* ... sizeof X is 20 ... */ };
                struct Y { /* ... sizeof X is 12 ... */ };
                struct Z { /* ... sizeof X is 36 ... */ };


		struct M : virtual F, virtual G, X, Y, Z {
                                char    a;
                                int     x,y;
                                short   b;
                                float   z;

                                void    print(Stream*);

                        virtual void    one();
                        virtual int     two(char*);
                        virtual int     one(int);
                        virtual char*   three(char*, int);
                };

	
	*LEAN* layout   M* -->	| 20: X		/* non-virtual 
				| 12: Y		      superclasses first 
				| 36: Z	                    if any */
				| 1: a		/* M's variables begins 
		   		| 1: gap                    if any */
				| 4: x
				| 4: y
				| 2: b
				| 0: gap?
				| 8: z
				| 4: POINTER TO G  /* virtual base pointers */
				| 4: POINTER TO F  /* first last */
				| 4: v_ptr	   /* if any */



	When an object is instantiated (with operator new) a "full"
	structure of the specified class is allocated.  The full
	structure includes the lean structure, above, followed by lean
	structures of all the virtual bases in the superclass closure.

	The order of the virtual bases is not trivial.  The following
	algorithm FindVirtualBaseOrder() seems to find it.  This may
	not be right -- it is hard to check all the possibilities, but
	this explains what I've seen so far, and is a fairly likely
	implementation.  It does have its nutty aspects, though, and
	makes me nervous that I may have missed something.  I would
	have to delve into the compiler code to be absolutely sure.

//////////// BEGIN ALGORITHM

static List b;
static Class orig;

FindVirtualBaseOrder( Class c )
{
	b = NIL;	// collect virtual bases on this list
	orig = c;	// remember the original class

	TraverseClass( c );

	return b;

	// ... the order of the bases in the List b is the order of the	
	//	bases after the lean structure for class c 
	//  i.e.  the most recent addition to b appears first, and the
	//	very first thing that was added comes at the very
	//	end of the full structure.
}

TraverseClass(Class x)   
{
	// this implements a special kind of post-order traversal,
	// in which we visit traverse the supers of our current
	// class, and then record all of our virtual supers.
	// (it is different from a normal post-order in that we do not
	// record each super immediately after traversing it, but
	// rather come back and get them after doing all supers.)
	// It is also strange that the recording of virtual supers
	// is one backwards ONLY for the original class, and done
	// frontwards for virtual supers of any transitively-super class.

	FOR EACH superclass s OF x 
		in the order of declaration in the class definition of x
	{
		TraverseClass( s ) 
	}

	FOR EACH superclass s OF x
		IF s == orig THEN          // the original class
		 in the REVERSE order of declaration in the class def of x
		ELSE			// all supers
		 in the order of declaration in the class definition of x
	{
		IF s is a virtual superclass of x AND
		   s is not already in list b
		{
			add s to the FRONT of b
		}
	}
}

//////////// END ALGORITHM

	EXAMPLE

		class F {...};  
		class G {...}; 
		class H {...};
		class J {...};
		class K {...};
		class L {...};
		class U  {...};  	
		class V  {...};

		class A : virtual U {...};

		class B : virtual V {...};

		class X : virtual U, A, virtual K, B, virtual L {...};	

		class Y : virtual V, virtual F, A, B, virtual J {...};

		class Z : virtual G, virtual U, virtual V, X, Y,
					virtual F, virtual H {...};

 
	Flow of algorithm:

	
	FindVirtualBaseOrder Z 
	TraverseClass Z
		TraverseClass G
		TraverseClass U	
		TraverseClass V	
		TraverseClass X	
			TraverseClass U	
			TraverseClass A	
				TraverseClass U	
				add U			<== U
			TraverseClass B	
				TraverseClass V	
				add V			<== V
			add K				<== K  // frontwards
			add L				<== L
		TraverseClass Y	
			TraverseClass V	
			TraverseClass F	
			TraverseClass A	
				TraverseClass U	
			TraverseClass B	
				TraverseClass V	
			add F				<== F // frontwards
			add J				<== J  
		TraverseClass F	
		TraverseClass H	
		add H					<== H // backwards!!
		add G					<== G

	So the order of the virtual bases is G, H, J, F, L, K, V, U.

	layout of a full class Z: 

	Z* zp = new Z;

		      zp -->  | Z::X::A ->U     // begin A within X within Z
			      |	Z::X::A vars	// variables for A
			      |	Z::X::A vt$A	// virt func table for A
			      |	Z::X::B ->V
			      |	Z::X::B vars
			      |	Z::X::B vt$B
			      |	Z::X ->U      // i.e. pointer to U vars, below
			      |	Z::X vars
			      |	Z::X vt$X	// end X within Z
			      |	Z::Y::A ->U
			      |	Z::Y::A vars
			      |	Z::Y::A vt$A
			      |	Z::Y::B ->V
			      |	Z::Y::B vars
			      |	Z::Y::B vt$B
			      |	Z::Y ->V
			      |	Z::Y ->F
			      |	Z::Y vars
			      |	Z::Y vt$Y	// end Y within Z
			      |	Z ->H		// Z names 5 virtual bases
			      |	Z ->F		// (notice inverse order)
			      |	Z ->V
			      |	Z ->U
			      |	Z ->G
			      |	Z vars
			      |	Z vt$Z		///// end of lean Z ////
			      |	G vars		// begin of virtual bases
			      |	G vt$G		// run algorithm to 
			      |	H vars		// determine ordering
			      |	H vt$H
			      |	J vars
			      |	J vt$J
			      |	F vars
			      |	F vt$F
			      |	L vars
			      |	L vt$L
			      |	K vars
			      |	K vt$K
			      |	V vars
			      |	V vt$V
			      |	U vars
			      |	U vt$U		// end of full structure
				
				
5. other comments

	I have not been able to get a virtual base that has a virtual
	base to compile with "g++ version 1.35.1-".  I always get this
	from the compiler:

	> Failed assertion prev != NULL_TREE at line 2069 of `cplus-search.c'.
	> g++: Program cc1plus got fatal signal 6.

	From my understanding of C++, I see no reason why a virtual
	base cannot have a virtual base,  and I see no reason why the
	above algorithm wouldn't still hold, but I haven't been able to
	try it.

////////////////////////////////////////////////////////////////////////////

					strick@osc.osc.com     415-325-2300
					uunet!lll-winken!pacbell!osc!strick
					( strick@gatech.edu )

				They know their limits 'cause they
				cross 'em every night.           -- DEVO
-- 
					strick@osc.osc.com     415-325-2300
					uunet!lll-winken!pacbell!osc!strick
					( strick@gatech.edu )