pugh@mimsy.UUCP (Bill Pugh) (07/12/89)
I've heard several rumors about techniques for record layout with multiple
inheritance, but no solid information. Does anyone know of any results in
this area?

The basic problem is as follows. We have record types with the following
fields:

    type x = {a : int, b : int};
         y = {a : int, c : int};
         z = {a : int, b : int, c : int};

We wish to have z be a subtype of both x and y. How can we lay out the
fields in x, y and z so that, for example, we can extract the field c in
constant time from a record that might be either type y or z?

One answer is to place a at offset 0, b at offset 4 and c at offset -4.
By not always having pointers to records point to the first field, we can
get simple constant-time access to fields. It is not always possible to
lay out fields this way (for example, it is not possible in the case of
triple inheritance).

I heard that Bjarne Stroustrup originated this idea, but have been unable
to find any references. In his book, Bertrand Meyer also claims to have a
technique for doing constant-time access (section 15.4.2), although he
does not give any details. Is this described anywhere?

In summary, does anyone have pointers to published work on this technique?

    Bill Pugh
    Assistant Professor
    University of Maryland, College Park
    pugh@mimsy.umd.edu
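A minimal C sketch of the negative-offset layout described above, with the
offsets scaled to int-sized slots; the constructor and macro names are
illustrative only, not taken from any particular compiler. The point is that
a reference to a y or z record points at the `a' slot, which need not be the
first word of the allocated block:

    #include <stdlib.h>

    typedef int *obj;               /* a reference points at the 'a' slot   */

    #define A(p) ((p)[0])           /* a: offset  0 in x, y and z           */
    #define B(p) ((p)[1])           /* b: offset +1, valid for x and z      */
    #define C(p) ((p)[-1])          /* c: offset -1, valid for y and z      */

    obj make_x(int a, int b)        /* block: [a, b], reference -> a        */
    {
        int *block = malloc(2 * sizeof(int));
        block[0] = a;  block[1] = b;
        return block;
    }

    obj make_y(int a, int c)        /* block: [c, a], reference -> a        */
    {
        int *block = malloc(2 * sizeof(int));
        block[0] = c;  block[1] = a;
        return block + 1;
    }

    obj make_z(int a, int b, int c) /* block: [c, a, b], reference -> a     */
    {
        int *block = malloc(3 * sizeof(int));
        block[0] = c;  block[1] = a;  block[2] = b;
        return block + 1;
    }

    /* C(p) is one indexed load whether p was built by make_y or make_z,
       and a z reference can be passed wherever an x or a y is expected
       without any pointer adjustment.  Freeing a y or z requires
       recovering the start of the block (p - 1), which is one reason such
       interior pointers are normally managed by the language runtime
       rather than by user code.                                            */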
florman@randvax.UUCP (Bruce Florman) (07/13/89)
In article <18498@mimsy.UUCP> pugh@mimsy.UUCP (Bill Pugh) writes:
>I've heard several rumors about techniques for record layout with multiple
>inheritance, but no solid information. Does anyone know of any
>results in this area?
>
>The basic problem is as follows. We have record types with the following
>fields:
>    type x = {a : int, b : int};
>         y = {a : int, c : int};
>         z = {a : int, b : int, c : int};
>
> We wish to have z be a subtype of both x and y. How can we lay out the
>fields in x, y and z so that, for example, we can extract the field c
>in constant time from a record that might be either type y or z?
>
> One answer is to place a at offset 0, b at offset 4 and c at offset -4.
>By not always having pointers to records point to the first field, we
>can get simple constant-time access to fields. It is not always possible
>to lay out fields this way (for example, it is not possible in the case of
>triple inheritance).
>
> I heard that Bjarne Stroustrup originated this idea, but have been unable
>to find any references. In his book, Bertrand Meyer also claims to have a
>technique for doing constant-time access (section 15.4.2), although he
>does not give any details. Is this described anywhere?
>
>In summary, does anyone have pointers to published work on this technique?

I don't have any pointers to published material, but I can tell you how
I've gone about it in an object system that I wrote. As it happens, that
system was Lisp-based, but the techniques are applicable anywhere. I don't
know how Eiffel or C++ (or Smalltalk or Trellis-Owl or whatever) do it, but
I suspect that they use some variation on this approach.

First, the exact nature of the object system in your example above is not
clear to me. Does the fact that both x and y have a field named `a'
indicate that they are both subtypes of some other record with a field
named a? To facilitate my explanation, I will change the example a little
bit, so we won't have to deal with this question. Suppose the source
declarations are as follows (in some pseudo-Pascalish syntax).

    X = object()
          a: integer;
          b: integer;
        end;

    Y = object()
          c: integer;
          d: integer;
          e: integer;
        end;

    Z = object(X, Y)
          f: integer;
        end;

This syntax is intended to convey that type Z inherits from both X and Y,
and it also defines its own field named f.

At compile-time, we can determine an offset to each of these fields, which
I will call the "field offset." To simplify this example, I will assume
that the size of an integer is 1. You can scale the offsets appropriately
for your own architecture. So the offsets associated with each field are:

    X.a -> 0
    X.b -> 1
    Y.c -> 0
    Y.d -> 1
    Y.e -> 2
    Z.f -> 0

At link-time (system assembly time in Eiffel), it is possible to assign a
unique integer index, called the "class index", to every class in the
system. For this simple example the indices could be:

    X -> 0
    Y -> 1
    Z -> 2

At run-time, there is a data structure called the "class descriptor"
associated with each class in the system. One of the things stored in the
class descriptor is a dope vector, which is indexed by the class indices.
The values stored in this vector are offsets from the address of an object
to the start of that class' fields, and we'll call them the "dynamic
offsets." For our simple example, the class descriptors would look
something like this.

    X -> dope vector: [ 1, -, - ]
         other stuff: ...

    Y -> dope vector: [ -, 1, - ]
         other stuff: ...

    Z -> dope vector: [ 1, 3, 6 ]
         other stuff: ...
The dope vector for class X has a dynamic offset at index 0, because 0 is
the class index of X. The other entries are unused, because X does not
inherit from Y or Z. Similarly, the dope vector for Y has only one valid
entry, at index 1. Z, on the other hand, which inherits from both X and Y
and also has its own fields, has valid dynamic offsets at all the indices.

So finally, the run-time layout of the instances of these classes is as
follows.

    instance of X: [ X, a, b ]
    instance of Y: [ Y, c, d, e ]
    instance of Z: [ Z, a, b, c, d, e, f ]

The first chunk of data in an object is something which identifies the
class to which it belongs. This might be a pointer to the class descriptor
itself, or it might simply be a tag. The remainder of the fields are laid
out sequentially, keeping all the fields from any one class together.

Now, to access field d of an object which might be an instance of either
Y or Z, the following steps are executed.

    1. Fetch the class descriptor from the object.
    2. Fetch the dope vector from the class descriptor.
    3. Fetch the dynamic offset at index 1 from the dope vector (since 1
       is the "class index" of Y, and d was defined in class Y).
    4. Add 1 to the dynamic offset (since 1 is the "field offset" of d),
       giving d's actual offset from the address of the object.

In the case where the object is an instance of Y, the value fetched in
step 3 is 1, so the actual offset of d is 2. In the case where the object
is an instance of Z, the value fetched in step 3 is 3, so the actual
offset of d is 4. We can see from the run-time object structures that
these offsets are correct. If the compiler had chosen to lay out an
instance of Z differently, say like [ Z, f, c, d, e, a, b ], then Z's dope
vector would have looked like [ 5, 2, 1 ] and d's actual offset would be
2 + 1 = 3.

From an efficiency standpoint, this may look rather grim, since accessing
a field requires three memory fetches and another addition. Fortunately
there are ways to optimize some of this away. For instance, if the dope
vector is the first item within the class descriptor, then step 2 is
essentially a no-op. Also, in well-designed o-o code, most of the field
accesses will be from the "self" object ("current" in Eiffel, "this" in
C++), therefore the dope vector and even the dynamic offsets may be
cached. Thus the *average* cost of computing the address of a field may
approach the cost of the single addition, which isn't too bad.

As I said before, I don't know for sure what techniques are used in
Eiffel or C++. This is how I've done it. If anyone sees any glaring
inefficiencies in this system, I'd really like to hear about them. One
obvious place to look for improvement is in the storage of the dope
vectors. When the inheritance hierarchy is very wide but shallow, the
dope vectors will have a whole lot of empty space in them. I imagine that
someone can find a clever scheme to pack them into a smaller space. Any
suggestions?

>    Bill Pugh
>    Assistant Professor
>    University of Maryland, College Park
>    pugh@mimsy.umd.edu

    Bruce Florman
    Associate Computer Scientist
    The RAND Corporation
    florman@rand.org
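One way the scheme described above might look in C, using the class
indices and dope vectors from the example and the "tag" variant for the
first word of an object; the structure and function names are illustrative
only:

    #include <stdlib.h>

    #define N_CLASSES 3
    enum { CLASS_X = 0, CLASS_Y = 1, CLASS_Z = 2 };   /* class indices     */

    struct class_desc {
        int dope[N_CLASSES];    /* dynamic offsets, indexed by class index */
        /* other stuff: name, method table, ...                            */
    };

    /* -1 marks the unused entries (class not inherited from).             */
    static struct class_desc descriptors[N_CLASSES] = {
        /* X */ {{  1, -1, -1 }},
        /* Y */ {{ -1,  1, -1 }},
        /* Z */ {{  1,  3,  6 }},
    };

    /* An object is a vector of words: word 0 is the class tag, the rest   */
    /* are the fields, grouped by defining class.                          */
    typedef int word;
    typedef word *object;

    object make_Y(int c, int d, int e)    /* instance of Y: [ Y, c, d, e ] */
    {
        object p = malloc(4 * sizeof(word));
        p[0] = CLASS_Y;  p[1] = c;  p[2] = d;  p[3] = e;
        return p;
    }

    int fetch_field(object obj, int class_index, int field_offset)
    {
        struct class_desc *desc = &descriptors[obj[0]];   /* step 1 */
        int *dope = desc->dope;                           /* step 2 */
        int dynamic_offset = dope[class_index];           /* step 3 */
        return obj[dynamic_offset + field_offset];        /* step 4 */
    }

    /* Field d was defined in Y (class index 1) at field offset 1, so
       fetch_field(p, CLASS_Y, 1) fetches d whether p is a Y or a Z:
       obj[1 + 1] for a Y instance, obj[3 + 1] for a Z instance.            */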
enuutila@hila.hut.fi (Esko Nuutila) (07/21/89)
In article <2129@randvax.UUCP> florman@rand-unix.UUCP (Bruce Florman) writes:

[The description of the record layout mechanism deleted]

> As I said before, I don't know for sure what techniques are used in
>Eiffel or C++. This is how I've done it. If anyone sees any glaring
>inefficiencies in this system, I'd really like to hear about them. One
>obvious place to look for improvement is in the storage of the dope
>vectors. When the inheritance hierarchy is very wide but shallow, the
>dope vectors will have a whole lot of empty space in them. I imagine
>that someone can find a clever scheme to pack them into a smaller space.
>Any suggestions?

Here is my proposal. The basic assumption here is that the empty positions
in the dope vectors are not needed in any way. They might be used, for
example, for run-time type checking, and in that case the following
packing mechanism cannot be used.

I am going to explain the mechanism by giving an example. Suppose we have
classes A, B, C, D, E, F:

    class A       {var a1, a2, a3: integer}
    class B       {var b1, b2: integer}
    class C(A, B) {var c1, c2, c3, c4: integer}
    class D(C)    {}
    class E       {var e1: integer}
    class F(A, E) {var f1, f2: integer}

The run-time object layouts are (this is one possibility):

    instance of A: [A, a1, a2, a3]
    instance of B: [B, b1, b2]
    instance of C: [C, a1, a2, a3, b1, b2, c1, c2, c3, c4]
    instance of D: [D, a1, a2, a3, b1, b2, c1, c2, c3, c4]
    instance of E: [E, e1]
    instance of F: [F, a1, a2, a3, e1, f1, f2]

(In position 0, there is the class identifier.)

The dope vectors are:

         A   B   C   D   E   F
    A    1   -   -   -   -   -
    B    -   1   -   -   -   -
    C    1   4   6   -   -   -
    D    1   4   6  10   -   -
    E    -   -   -   -   1   -
    F    1   -   -   -   4   5

First, we notice that the D column is unnecessary. Class D has no local
variable declarations, so they cannot be accessed. All columns
corresponding to a class with no local variable declarations can be
deleted. The result is:

         A   B   C   E   F
    A    1   -   -   -   -
    B    -   1   -   -   -
    C    1   4   6   -   -
    D    1   4   6   -   -
    E    -   -   -   1   -
    F    1   -   -   4   5

This saves some space, but we can do better. First, we notice that those
positions in the dope vectors that are empty are never going to be
accessed. Then we drop the implicit assumption that different classes
should have different dope vectors, and try to merge the dope vectors into
a single dope vector that contains all the offsets. In our case this goes
as follows:

    A        1   -   -   -   -
    B        -   1   -   -   -
    C        1   4   6   -   -
    D        1   4   6   -   -
    E        -   -   -   1   -
    F        1   -   -   4   5

    Result   1   4   6   4   5

The class descriptor of class X then contains an offset to the starting
point of the dope vector for class X. In our case the offsets would be
A : 0, B : -1, C : 0, D : 0, E : -3, and F : 0. To achieve faster access
it would be better to store a memory address in the class descriptor
instead of the offset; in our case the addresses would be A : i, B : i-1,
C : i, D : i, E : i-3, and F : i, where i is the address of the merged
dope vector. The fields are accessed in the same way as in the original
mechanism; only the dope vector access is different.
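Sticking with the word-vector object representation sketched after the
previous article, the access with a merged dope vector might look like
this in C; integer base offsets are used here instead of addresses (to
avoid forming out-of-range pointers such as i-3), and all names are
illustrative:

    /* The single merged dope vector for the example, and a column index   */
    /* for every class that declares fields (D gets no column).            */
    static int merged_dope[] = { 1, 4, 6, 4, 5 };

    enum { COL_A = 0, COL_B = 1, COL_C = 2, COL_E = 3, COL_F = 4 };
    enum { CLASS_A, CLASS_B, CLASS_C, CLASS_D, CLASS_E, CLASS_F };

    struct class_desc {
        int dope_base;        /* this class's offset into merged_dope      */
        /* other stuff ...                                                 */
    };

    static struct class_desc descriptors[] = {
        /* A */ {  0 },
        /* B */ { -1 },
        /* C */ {  0 },
        /* D */ {  0 },
        /* E */ { -3 },
        /* F */ {  0 },
    };

    typedef int word;
    typedef word *object;     /* word 0 holds the class tag (CLASS_A..F)   */

    int fetch_field(object obj, int column, int field_offset)
    {
        int base = descriptors[obj[0]].dope_base;
        return obj[merged_dope[base + column] + field_offset];
    }

    /* Field e1 was declared in E (column COL_E, field offset 0).  For an
       instance of E this reads merged_dope[-3 + 3] = 1, i.e. obj[1]; for
       an instance of F it reads merged_dope[0 + 3] = 4, i.e. obj[4],
       matching the layouts given above.                                   */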
Computing an optimal merge is very likely an NP-complete problem. However,
a simple algorithm that merges the dope vectors one after another with the
result vector works well. When merging a dope vector with the result
vector, we first find the left-most position of the dope vector that
contains an offset. Then we match the dope vector against the result
vector so that this left-most position is compared with the left-most
position of the result vector, the next position with the next, and so on.
The comparison is successful if at every position the result vector and
the dope vector have the same offset, or one of them contains no offset at
that position. If the comparison succeeds, the dope vector is merged with
the result vector, yielding a new result vector. Otherwise we shift the
dope vector one position to the right and try again.

For example, if the result vector is currently

    1 3 - 4 - 2

and the next dope vector is

    - 1 4 - - 5

the first comparison is

      1 3 - 4 - 2
    - 1 4 - - 5

No match, so we shift the dope vector to the right:

    1 3 - 4 - 2
    - 1 4 - - 5

No match, shift right again:

    1 3 - 4 - 2
      - 1 4 - - 5

Match; the new result vector is

    1 3 1 4 - 2 5

What about the savings? In the example problem the unpacked dope vectors
take 6*6 = 36 positions. After merging, only 5 positions are needed, that
is, about 14% of the unpacked size. One might wonder whether this is only
a well-chosen example. To get a more realistic picture of the savings, I
implemented a Lisp program on the Symbolics that computes the dope vectors
for Flavors and merges them. As test input data I used
sys:*all-flavor-names*, i.e. a list that contains all flavors that are
defined in the system. In our case (length sys:*all-flavor-names*) = 2680,
and 1504 of these flavors contained no instance variables. With no packing
the dope vectors would have taken 7182400 positions. The result vector of
the merge takes 4027 positions (0.056%)! I have also tried smaller
examples, e.g. all flavors in package TCP-IP and their component flavors.
There were 215 flavors, and the result vector size was 477, i.e. 1% of the
unpacked size.

Bruce Florman told me that the actual implementation of his system
utilizes the fact that the "dope vector matrix" is triangular (see the
example above) and does not allocate space for the dope vectors beyond the
main diagonal. So the savings compared to his system are slightly
different; for example, in the sys:*all-flavor-names* case the figures
would be 3592540 instead of 7182400 positions compared to 4027, i.e.
0.112% instead of 0.056% of the unpacked size.

Similar techniques may have been used in compilers for multiple
inheritance languages, but I don't have any references. I got hold of the
article by Krogdahl in BIT vol. 25, 1985, that was mentioned in
comp.lang.misc, but the scheme used there was different. He claims that he
has a solution to the problem that takes no extra space, but it is
described in a technical report that I have not found yet.

>    Bruce Florman
>    Associate Computer Scientist
>    The RAND Corporation
>    florman@rand.org

    Esko Nuutila
    Helsinki University of Technology
    Department of Computer Science
    Internet: enuutila@hila.hut.fi
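A rough C sketch of the greedy first-fit merge described above (shift the
dope vector right until every overlapping position either agrees or is
empty); representing empty positions as -1 and using a fixed-size result
buffer are my own simplifications:

    #define EMPTY   (-1)      /* "no offset at this position"               */
    #define MAX_LEN 8192      /* generous bound, for the sketch only        */

    static int result[MAX_LEN];
    static int result_len = 0;

    /* Can vec (length len) be overlaid on the result starting at pos?      */
    /* Yes if every overlapping position agrees or is empty on one side.    */
    static int fits(const int *vec, int len, int pos)
    {
        int i;
        for (i = 0; i < len && pos + i < result_len; i++)
            if (result[pos + i] != EMPTY && vec[i] != EMPTY &&
                result[pos + i] != vec[i])
                return 0;
        return 1;
    }

    /* Merge one class's dope vector into the result, first fit from the    */
    /* left.  Returns the base offset to store in that class's descriptor,  */
    /* so that result[base + column] reproduces vec[column].                */
    static int merge(const int *vec, int len)
    {
        int first, pos, i, idx;

        for (first = 0; first < len && vec[first] == EMPTY; first++)
            ;                            /* left-most non-empty position    */
        if (first == len)
            return 0;                    /* class declares no fields        */

        for (pos = 0; !fits(vec + first, len - first, pos); pos++)
            ;                            /* shift right until it matches    */

        for (i = first; i < len; i++) {
            idx = pos + i - first;
            if (idx == result_len)
                result[result_len++] = EMPTY;    /* extend the result       */
            if (vec[i] != EMPTY)
                result[idx] = vec[i];
        }
        return pos - first;
    }

    /* Merging { EMPTY, 1, 4, EMPTY, EMPTY, 5 } into a result that currently
       holds { 1, 3, EMPTY, 4, EMPTY, 2 } yields { 1, 3, 1, 4, EMPTY, 2, 5 }
       with base 1, reproducing the worked example above.                   */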