[comp.std.c++] Tags, typecodes, experience with these things.

chased@rbbb.Eng.Sun.COM (David Chase) (03/13/91)

I've been trying to follow this discussion, but there's a bit too much
heat and a little too little content.  Could someone please sit down
and take the time to summarize what appear to be the two positions?

Now, a small contribution.  I implemented the C-generating back-end
for Modula-3 back when Olivetti had a research center in Menlo Park.
The language specified a "TYPECODE" for each type.  These were
implemented with substantial support from the language system.  The
major differences (for purposes of this discussion) between M-3 and
C++ are:

1) C++ has multiple inheritance, Modula-3 does not.
2) Modula-3 has more "interesting" ways to do information-hiding
   (Strictly speaking, the compiler might not know where member fields
   or functions are located.  This can be resolved at (pre-)link
   time.)
3) M-3 has no constructors or destructors (yet).
4) In M-3, ALL member functions are virtual.

Things to note:

a) If all member functions are virtual, the run-time cost for typecodes
is as small as it can get.  That information can be stored in the
virtual function table at little cost.

b) The (known) lookup algorithms for single inheritance are much
simpler and cheaper than those for multiple inheritance.  "Is x a T"
can be implemented with two comparisons.  "TYPECASE" is more complex,
but take time O(log # cases).  The data structure implementing these
things can be updated in the event of dynamically-loaded code (with
new and old types in it) in a multi-threaded system, with minimal
synchronization requirements (readers must see writes in the same
order than writers make them -- NARROW sometimes takes longer in the
event that a race is lost, but it never lies).

c) TYPECASE is NOT equivalent to "switch".

d) TYPECODES are NOT necessarily repeated from run to run.  Thus, they
are not useful by themselves for saving and restoring pickled
data.  (What if you relink in a different order?  The data should
still be good, etc.)

We used this information to help implement a "pickling" (storing data
to disk) system.  The pre-linker also generated "fingerprints" which
were in fact large (63-bit?) numbers carefully hashed from the
structure of a type, and these served to identify types from run to
run.  (We cheated, and did not also write out the type strings, so
there was a vanishingly small probability of our algorithm being
thwarted.)

At run-time, the interface went from [type or object] to typecode to
type-fingerprint.  The typecodes might vary, but the fingerprints did
not.  To read the data back in, fingerprints were mapped to typecodes
which gave access to a (low-level) interface to virtual function
tables, object size information, user-defined (*) unmarshalling code,
etc.

(*) the default marshalling and unmarshalling code was automatically
generated by a language-processing tool.  It is also possible to
perform a slower interpreted unmarshalling based on the structure of
the type.  Again, this information is stored behind an interface which
makes use of TYPECODEs.

A second use for typecodes was in implementing a verifiable type case
("NARROW") and TYPECASE.  One can argue that TYPECASE is not
object-oriented (I understand that argument, and TYPECASE clearly
becomes less necessary if you have correctly working multiple
inheritance), but NARROW appears to be necessary fairly often in C++
code (perhaps less so when templates are added, but I don't have any
experience with that, so I can't be sure).

We didn't use typecodes for garbage collection, but that was because
(i) it would have been too inefficient and (ii) we never had the time
to do that much engineering of the garbage collector.  In principle,
however, the information to help the GC could be attached to the
virtual function table along with the typecode.

----------------------------------------------------------------

In summary, TYPECODEs were fantastically useful, even if their only
use was to provide a nice name for a type in the interface to run-time
information about types.  Please consider the value of an interface to
run-time type information that actually hides some of the
implementation details.

I also think that the language system should support these features,
but that is probably a religious argument.  Certainly, if nifty
features are confined to objects with vptrs the costs can be kept low.

David Chase
Sun

jimad@microsoft.UUCP (Jim ADCOCK) (03/15/91)

In article <9661@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
|I also think that the language system should support these features,
|but that is probably a religious argument.  Certainly, if nifty
|features are confined to objects with vptrs the costs can be kept low.

Help me out.  The obvious implementations to me would be either to 
store such information in a given vtable slot [slot "0" say] or
indirect via a vtable function that is always automatically defined
[for classes that support runtime type info.]  Either approach would
seem to defeat the not-too-uncommon situation in C++ where two classes
can share a vtable in common.  Or am I missing something?

chased@rbbb.Eng.Sun.COM (David Chase) (03/19/91)

In article <71305@microsoft.UUCP> jimad@microsoft.UUCP (Jim ADCOCK) writes:
>In article <9661@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
>|I also think that the language system should support these features,
>|but that is probably a religious argument.  Certainly, if nifty
>|features are confined to objects with vptrs the costs can be kept low.
>
>Help me out.  The obvious implementations to me would be either to 
>store such information in a given vtable slot [slot "0" say] or
>indirect via a vtable function that is always automatically defined
>[for classes that support runtime type info.]  Either approach would
>seem to defeat the not-too-uncommon situation in C++ where two classes
>can share a vtable in common.  Or am I missing something?

No, you're not missing anything.  That's the technique I'd use.  I
wasn't aware that space consumed by a vtables was a big deal, except
in those cases where the compilation-and-linking system produced many
vtables for a single class (I've heard about that).  How big do
vtables usually get, how often can they be shared, and (given a
one-per-class implementation) how much space does this sharing usually
save?

David Chase
Sun

jimad@microsoft.UUCP (Jim ADCOCK) (03/23/91)

In article <9997@exodus.Eng.Sun.COM> chased@rbbb.Eng.Sun.COM (David Chase) writes:
|No, you're not missing anything.  That's the technique I'd use.  I
|wasn't aware that space consumed by a vtables was a big deal, except
|in those cases where the compilation-and-linking system produced many
|vtables for a single class (I've heard about that).  How big do
|vtables usually get, how often can they be shared, and (given a
|one-per-class implementation) how much space does this sharing usually
|save?

I guess I'm not convinced either that inability to share vtables is that
big a deal.  However, size of vtables can be limiting in the kind of 
programming that implements many many methods per class, and only changes
a few methods when deriving [ala Smalltalk.]   In such situations, vtable
costs can be large compared to the code size of the few methods being
changed.  For right now, C++ programmers avoid the problem by not using
such a coding style.  Eventually, smarter compilers may be necessary that
know how to dispatch methods using a variety of techniques automatically
chosen to match the situation.  A simple example: at the cost of an 
additional indirection, vtables could be broken into subtables, and the
common subtables shared between classes.  When a method is overridden,
only the subtable needs to change, not an entire vtable.  Still, I don't
think the situation comes up often today where a C++ programmer would be
willing to pay an additional indirection on every dispatch in order to get
smaller vtables.