[comp.lang.c++] Shippable C++ Objects

davidm@uunet.UU.NET (David S. Masterson) (11/16/89)

			Request for Comments
			  (*please post*)

Based on previous discussions within these groups and some work I am currently
doing, I'd like to get some comments (please post) on the following ideas for
the shipment of C++ objects between processes (and, by implication, the
persistence of the object).  Hopefully, this will generate some interesting
discussions.  Below I list some ideas/requirements for the shippability
between processes of general objects in C++:

1.  Assume an Event interface between processes.  Therefore, when shipping an
object, you can be very sure that something on the other side (in this case,
a callback) will understand what it is because of the event type.

2.  Assume each class of objects has methods for building a shippable byte
array representation of the object and returning a pointer to that shippable
form.  This means that each object is responsible for building a shippable
form of itself.  By implication, the object's parent (base) class and its
contained objects can build shippable representations of themselves and so are
called at appropriate times by the child's shippable method, which then copies the
information into its shippable form.  This capability MUST exist on an object
for an object to be shipped (the implications for storage are similar if not
the same).

3.  Assume each class of objects has methods for filling (or constructing)
itself from the shippable byte array representation made in (2).  Since the
object knows what it did to create (2), it should know what it can do to get
itself back from that (2) (most likely a reverse operation).  Therefore, all
initialization functions would only need to take a char* (void* ??) pointer.
This also implies that Event objects need only contain some header followed by
an object type and an unknown length byte array.

4.  All parts of an object must be handled in (2) and (3), even to the point
of just deciding that part of an object doesn't need to be shipped.
Therefore, even memory pointers must be resolved in the processing of (2) and
(3).

5.  Base type objects are copiable into shippable objects as is.  Their size
and type will be well-known by both sides of the shipping software, so the
only thing to resolve is recognizing them in the stream of bytes.  If all
other aspects of shippability are resolved, then recognition is not necessary
as they MUST be at well-known spots in the data stream.

6.  Copying of object contents out of shippable form MUST be done with
knowledge of how they were copied into shippable form.  One possibility is
that an object gets its parent's shippable form, followed by the top-to-bottom
contents of itself (NOTE: next paragraph), and copies those into the shippable
output form for the current object.  The initializer from a shippable form
would then unwind this in a similar manner.

7.  Memory pointers are the special case.  In the process of encountering a
memory pointer, converting to shippable form seems to require a symbol
table.  Basically, the value of the memory pointer would be given to
the symbol table which would return a symbol for that memory pointer (either a
new one or the previous one).  If the symbol table says that this memory
pointer has been seen before, then there is nothing to do other than copy the
symbol (with the Seen_Before flag) into the byte stream.  If the memory
pointer is new to the symbol table, then the symbol is copied to the byte
stream (with the New flag) and the pointer is then followed to produce the
shippable form of what it points to.

8.  This may lead to one extra copy of an object in the shippable stream due
to the potential for a circular pointer chain.  This should be resolvable by
well-known methods and object design criteria.  For instance, it is highly
unlikely (read NEVER HAPPEN) that one object would point into the middle of
another object (especially with the tendency toward private data), so putting
the memory address of each new object as a whole in the symbol table and a
corresponding symbol into the byte stream should eliminate the possibility of
circular reference problems in object pointer chains.  That is, when
processing an object, enter its memory address into the symbol table so that
if any succeeding objects point to it, no further processing need be done.  In
the case of circular pointers from ivar to ivar within an object, they should
be resolvable through the definition of the object (the designer should be
able to handle it).

9.  Reconstructing the object on the other side would mean nothing more than
copying well-known values from the byte stream into their proper places and
doing some special tracking of pointer symbols in the byte stream to make sure
they get relinked properly.  When a pointer type is encountered in
constructing an object, a symbol had better be on the input stream (trust the
transmission media).  This symbol would be stripped out and the symbol table
queried to determine if it has been seen before.  If not, an object is
new()'ed and a char* pointer to the area after the symbol in the byte stream
would be passed to its init function (after the value of new() is entered in
the symbol table).  If it has, then the value is copied from the symbol table
into the current pointer.

10.  If the methods for build_shippable() and init_from_shippable() are
virtual on the object, then object type for the shippable form will have to be
correct even if the object referred to when build_shippable() is invoked is of
pointer to parent type.  Because the object type will be correct, then the
proper event callback on the other side will get invoked.

11.  The implication of all this is that objects that are virtual in nature
and have memory pointers embedded in them can be shipped from process to
process without much work and, therefore, object instances may be passed back
and forth without limitation on their representation in the C++ sense.
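Points 7 and 8 can be sketched in C++ as follows. This is a minimal illustration, not the author's code: `Node` is a hypothetical class with a single pointer member, symbols are assumed to be 32-bit integers, and the names `SymbolTable`, `enter`, `encode`, `encode_pointer`, `put` and the flag values are invented for the example. Both sides are assumed to share type sizes and byte order, as point 5 presumes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;
using Symbol = std::uint32_t;

// Hypothetical flag byte written in front of every pointer symbol.
enum Flag : unsigned char { Null = 0, New = 1, SeenBefore = 2 };

// Point 5: fixed-size base types are copied as-is (both sides are
// assumed to share sizes and byte order).
template <class T>
void put(Bytes& out, const T& v) {
    unsigned char raw[sizeof(T)];
    std::memcpy(raw, &v, sizeof(T));
    out.insert(out.end(), raw, raw + sizeof(T));
}

// Sender-side symbol table: memory address -> symbol.
class SymbolTable {
    std::unordered_map<const void*, Symbol> table_;
public:
    // Returns {symbol, true} the first time p is seen, {symbol, false} after.
    std::pair<Symbol, bool> enter(const void* p) {
        auto result = table_.emplace(p, static_cast<Symbol>(table_.size()));
        return {result.first->second, result.second};
    }
};

// A hypothetical class with one pointer member.
struct Node {
    int value;
    Node* next;
};

void encode(const Node& n, Bytes& out, SymbolTable& syms);

// One pointer: flag, symbol, and the pointed-to object only if it is new.
void encode_pointer(const Node* p, Bytes& out, SymbolTable& syms) {
    if (p == nullptr) {
        put(out, Null);
        return;
    }
    std::pair<Symbol, bool> entry = syms.enter(p);  // entered BEFORE following p
    put(out, entry.second ? New : SeenBefore);
    put(out, entry.first);
    if (entry.second) encode(*p, out, syms);
}

// The fields of a Node, always in the same order.
void encode(const Node& n, Bytes& out, SymbolTable& syms) {
    put(out, n.value);
    encode_pointer(n.next, out, syms);
}
```

Because an address is entered into the table before its pointer is followed, a circular chain a -> b -> a ends at the second visit to a with a Seen_Before symbol instead of recursing forever.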

For example:

	class X { int abc, cde; };
	class Y { float m, n; };
	class Z { char x[10]; };

	class A { X *alpha; };
	class B : public A { Y *beta; };
	class C : public B { Z *gamma; };

would translate into:

	Shippable_of_C {		Symbol		Value
		Label(C);		Label(C)	&C
		Label(C);		Label(C)	&B
		Label(C);		Label(C)	&A
		Label(X);		Label(X)	&X
		Label(X);		Label(Y)	&Y
		int	abc, cde;	Label(Z)	&Z
		Label(Y);
		Label(Y);
		float	m, n;
		Label(Z);
		Label(Z);
		char	x[10];
	}

Decomposition of this should be relatively easy.  The internal format
of Shippable_of_C is not important, so long as (2) and (6) are
followed carefully.

Note that when C enters the address of itself into the Symbol Table,
the returned value will also suffice for both B and A.  The two extra
Label(C) values in the Shippable form are necessary, though, as the
init_from_shippable() function for each object cannot make assumptions
about whether it has children or not.  Therefore, initializing C will
strip the first Label(C) and call initialize of B (which will strip
the second and so on...).
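A sketch of this layering, under stated assumptions: the pointer members of the example are replaced by hypothetical `int` fields (`a_data`, `b_data`, `c_data`) so that only the label handling is shown, the label is a plain 32-bit symbol supplied by the caller, and `put`/`Reader` are invented helpers. Each `build_shippable()` writes the object's label before calling its parent, so a C produces three copies of Label(C), and each `init_from_shippable()` strips one copy before calling its parent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<unsigned char>;
using Symbol = std::uint32_t;

template <class T>
void put(Bytes& out, const T& v) {
    unsigned char raw[sizeof(T)];
    std::memcpy(raw, &v, sizeof(T));
    out.insert(out.end(), raw, raw + sizeof(T));
}

// Reads values back in exactly the order they were written.
struct Reader {
    const Bytes& in;
    std::size_t pos = 0;
    template <class T>
    T get() {
        assert(pos + sizeof(T) <= in.size());
        T v;
        std::memcpy(&v, in.data() + pos, sizeof(T));
        pos += sizeof(T);
        return v;
    }
};

class A {
public:
    int a_data = 0;
    virtual ~A() = default;
    // Each layer: the object's label, then the parent's part, then its own fields.
    virtual void build_shippable(Bytes& out, Symbol label) const {
        put(out, label);
        put(out, a_data);
    }
    // Each layer strips one label (a real receiver would look it up in its
    // symbol table), lets the parent read its part, then reads its own fields.
    virtual void init_from_shippable(Reader& r) {
        r.get<Symbol>();
        a_data = r.get<int>();
    }
};

class B : public A {
public:
    int b_data = 0;
    void build_shippable(Bytes& out, Symbol label) const override {
        put(out, label);
        A::build_shippable(out, label);
        put(out, b_data);
    }
    void init_from_shippable(Reader& r) override {
        r.get<Symbol>();
        A::init_from_shippable(r);
        b_data = r.get<int>();
    }
};

class C : public B {
public:
    int c_data = 0;
    void build_shippable(Bytes& out, Symbol label) const override {
        put(out, label);
        B::build_shippable(out, label);
        put(out, c_data);
    }
    void init_from_shippable(Reader& r) override {
        r.get<Symbol>();
        B::init_from_shippable(r);
        c_data = r.get<int>();
    }
};
```

Because both functions are virtual, calling `build_shippable()` through a reference to A still produces C's complete form, which is the property point 10 relies on.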

Also note that when (say) B is processing the pointer to Y, it knows
it is processing a pointer and, therefore, should expect Label(Y) on
the input stream.  It can then strip this value and enter it with the
address of the Y object that it will next allocate (with new()).  It
should not call a constructor for Y with the values in the stream
until it has entered the address of Y into the symbol table.  This is
in case of circular references within Y.  So it calls "new Y()",
enters the returned value into the symbol table, then calls "Y.init()"
with the char* (void* ??) pointer to the area after the Label(Y) to
get it initialized.  Y will then strip the expected Label(Y) and enter the
following information in the byte stream into itself.
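The receiving side of this procedure might look like the sketch below. It uses a hypothetical `Node` class (an `int` plus one pointer) and invented flag values; `decode_pointer`, `init_node`, `Table`, `put` and `Reader` are illustrative names, not an existing API. The order of the three steps in `decode_pointer` is the point: allocate, enter the address in the symbol table, and only then initialize, so that a pointer back to the object being built resolves to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

using Bytes = std::vector<unsigned char>;
using Symbol = std::uint32_t;
enum Flag : unsigned char { Null = 0, New = 1, SeenBefore = 2 };

// Used here only to build an input stream by hand.
template <class T>
void put(Bytes& out, const T& v) {
    unsigned char raw[sizeof(T)];
    std::memcpy(raw, &v, sizeof(T));
    out.insert(out.end(), raw, raw + sizeof(T));
}

struct Reader {
    const Bytes& in;
    std::size_t pos = 0;
    template <class T>
    T get() {
        assert(pos + sizeof(T) <= in.size());  // "trust the transmission media"
        T v;
        std::memcpy(&v, in.data() + pos, sizeof(T));
        pos += sizeof(T);
        return v;
    }
};

struct Node {
    int value;
    Node* next;
};

// Receiver-side symbol table: symbol -> address of the local object.
using Table = std::unordered_map<Symbol, Node*>;

Node* decode_pointer(Reader& r, Table& syms);

// The init() step: fill an already-allocated Node from the stream.
void init_node(Node* n, Reader& r, Table& syms) {
    n->value = r.get<int>();
    n->next = decode_pointer(r, syms);
}

Node* decode_pointer(Reader& r, Table& syms) {
    Flag flag = r.get<Flag>();
    if (flag == Null) return nullptr;
    Symbol sym = r.get<Symbol>();
    if (flag == SeenBefore) {
        auto it = syms.find(sym);
        assert(it != syms.end());  // must have arrived earlier flagged New
        return it->second;
    }
    Node* n = new Node{};   // 1. allocate, like "new Y()"
    syms[sym] = n;          // 2. enter the address before initializing
    init_node(n, r, syms);  // 3. fill it; a pointer back to n now resolves
    return n;
}
```

A stream describing a two-node cycle, written by hand with `put`, decodes into two objects whose `next` pointers point at each other.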


--
===================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mt. View, CA  94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"

richard@pantor.UUCP (Richard Sargent) (11/16/89)

[edited quote follows]

> From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
> Newsgroups: comp.lang.c++,comp.object
> Subject: Shippable C++ Objects (RFC)
> Message-ID: <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET>
> Date: 15 Nov 89 18:10:37 GMT
> 
> 			Request for Comments
> 			  (*please post*)
> 
> Based on previous discussions within these groups and some work I am currently
> doing, I'd like to get some comments (please post) on the following ideas for
> the shipment of C++ objects between processes (and, by implication, the
> persistence of the object).  Hopefully, this will generate some interesting
> discussions.  Below I list some ideas/requirements for the shippability
> between processes of general objects in C++:
> 

...

> 
> 3.  Assume each class of objects has methods for filling (or constructing)
> itself from the shippable byte array representation made in (2). ...

Here's the nub of the problem! The rest of the proposal is quite
straightforward. In fact, I believe that I read a paper on this
in one of the OOPSLA or C++ Workshop proceedings.

"The object can reconstruct itself from the byte stream" requires
that the object already exist. It still seems to me that you *have*
to have a case statement somewhere which knows about all (interesting)
object classes. The cases create each new type of object, probably using
a class constructor which has a ByteStreamRepresentation argument.
The created object is then added into the receiving program's
organization. How to do this from a general purpose class library
without encoding knowledge of the application is another (lesser)
problem.

The real trick will be to eliminate the need for such a switch!
If anyone figures this out for C++, I'm waiting with bated breath
for the answer.

...

I sure would like to see a viable general purpose solution to this
problem in C++. I'm afraid that the solution will require a lot
of the facilities from Smalltalk.


Richard Sargent                   Internet: richard@pantor.UUCP
Systems Analyst                   UUCP:     uunet!pantor!richard

peterd@cs.washington.edu (Peter C. Damron) (11/17/89)

In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>... I'd like to get some comments (please post) on the following ideas for
>the shipment of C++ objects between processes (and, by implication, the
>persistence of the object).

>1.  Assume an Event interface between processes.  Therefore, when shipping an
>object, you can be very sure that something on the other side (in this case,
>a callback) will understand what it is because of the event type.

How does the event type relate to the object type/class?
How do you assure that the object methods are the same in both processes?
I assume that the different processes are in different address spaces.

>2.  Assume each class of objects has methods for building a shippable byte
>array representation of the object ...

>3.  Assume each class of objects has methods for filling (or constructing)
>itself from the shippable byte array representation made in (2)...

It sounds to me like you are implementing write() and read() in 2 & 3.

>4.  All parts of an object must be handled in (2) and (3), even to the point
>of just deciding that part of an object doesn't need to be shipped.
>Therefore, even memory pointers must be resolved in the processing of (2) and
>(3).

If you want to ship a single node, do you have to ship the whole graph
that contains it?

>7.  Memory pointers are the special case.  In the process of encountering a
>memory pointer, converting to shippable form seems to require a need for a
>symbol table...

Now this is the hard part.  What you want is remote & local references
to objects (everything is an object right?).

>9.  Reconstructing the object on the other side would mean nothing more than
>copying well-known values from the byte stream into their proper places and
>doing some special tracking of pointer symbols in the byte stream to make sure
>they get relinked properly...

It sounds like you are limiting the shippable format to be an LL(0) language.
This may be too restrictive.

>10.  If the methods for build_shippable() and init_from_shippable() are
>virtual on the object, then object type for the shippable form will have to be
>correct even if the object referred to when build_shippable() is invoked is of
>pointer to parent type.  Because the object type will be correct, then the
>proper event callback on the other side will get invoked.

How do you represent virtual pointers across processes?
I get the feeling you thought about data but not code.

I suggest you read about systems that have already tried to deal with
distributed objects, like Eden and Emerald here at University of Washington.
Try the references below for starters.

Hope this helps,
Peter.

---------------
Peter C. Damron
Dept. of Computer Science, FR-35
University of Washington
Seattle, WA  98195

peterd@cs.washington.edu
{ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd
---------------

%T Distribution and Abstract Types in Emerald
%A Andrew P. Black
%A Norman C. Hutchinson
%A Eric Jul
%A Henry M. Levy
%A L. Carter
%J IEEETSE
%V 13
%N 1
%D JAN 1987

%T Fine-Grained Mobility in the Emerald System
%A Eric Jul
%A Henry M. Levy
%A Norman C. Hutchinson
%A Andrew P. Black
%J TOCS
%D FEB 1988

%T Replication in Distributed Systems: The Eden Experience
%A Jerre D. Noe
%A Andrew Proudfoot
%A Calton Pu
%J PROC Fall Joint Computer CONF (FJCC)
%C Dallas, TX
%D NOV 1986
%P 1197-1208

%T The Architecture of the Eden System
%A Edward D. Lazowska
%A Henry M. Levy
%A Guy T. Almes
%A Michael J. Fischer
%A Robert J. Fowler
%A Stephen C. Vestal
%J PROC 8th SYMP on Operating Systems Principles
%D DEC 1981
%P 148-159

db@helium.East.Sun.COM (David Brownell) (11/17/89)

In article <31.UUL1.3#5109@pantor.UUCP>
	richard@pantor.UUCP (Richard Sargent) writes:
>
>> From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
>> Subject: Shippable C++ Objects (RFC)
>> 
>> 3.  Assume each class of objects has methods for filling (or constructing)
>> itself from the shippable byte array representation made in (2). ...
>
>Here's the nub of the problem! The rest of the proposal is quite
>straightforward. In fact, I believe that I read a paper on this
>in one of the OOPSLA or C++ Workshop proceedings.

It's also, surprise!, essentially the same idea that shows up in
networking for how to serialize and deserialize RPC arguments.
Commercial implementations of the idea (not in C++) include
XDR, ASN.1, and an analogue that NCS's NIDL compiles into (sorry,
I forget the name).

If you've never looked at RPC systems before, do so -- you'll find
that they look very much like distributed object systems.  (Sun's
RPC has a limitation of one object per type per system, but then
again any OO programmer worth his/her salt can implement objects
that manage access to many other objects ... like NFS does for "file"
objects, for example.)

>"The object can reconstruct itself from the byte stream" requires
>that the object already exist. It still seems to me that you *have*
>to have a case statement somewhere which knows about all (interesting)
>object classes. The cases create each new type of object, probably using
>a class constructor which has a ByteStreamRepresentation argument.

Sort of; "object" is distinct from a data format.  The process is taking
data in one format from one data store into another, and constructing a
NEW object by binding methods to the new data representation.  What's
shipped is data, not data plus code, and the new methods might not
implement the class used by the message sender. The key points are needing
to use a constructor function, and needing to choose which of several
constructors to use.

No application will be able to understand all types/classes ... they get
incrementally added to real systems over time.  Apps need to be able to
reject or ignore message types they can't understand by examining a type
code stored early in the message; there's a "default" branch in that
case statement!
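A minimal version of that case statement, with its default branch, might look like this. The type codes, the `Shippable` base and the `Point`/`Name` classes are hypothetical, and each class's constructor from raw bytes stands in for the "ByteStreamRepresentation" constructor mentioned above. The message is assumed to start with a one-byte type code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical type codes stored in the first byte of each message.
enum TypeCode : unsigned char { kPoint = 1, kName = 2 };

struct Shippable {
    virtual ~Shippable() = default;
};

// Each class has a constructor that takes the bytes after the type code.
struct Point : Shippable {
    std::int32_t x = 0, y = 0;
    explicit Point(const unsigned char* p) {
        std::memcpy(&x, p, sizeof x);
        std::memcpy(&y, p + sizeof x, sizeof y);
    }
};

struct Name : Shippable {
    char text[8] = {};
    explicit Name(const unsigned char* p) { std::memcpy(text, p, sizeof text); }
};

// The case statement: one branch per class this program understands,
// plus a default branch for everything else.
std::unique_ptr<Shippable> reconstruct(const Bytes& msg) {
    if (msg.empty()) return nullptr;
    const unsigned char* body = msg.data() + 1;
    const std::size_t len = msg.size() - 1;
    switch (msg[0]) {
    case kPoint:
        if (len < 2 * sizeof(std::int32_t)) return nullptr;
        return std::make_unique<Point>(body);
    case kName:
        if (len < 8) return nullptr;
        return std::make_unique<Name>(body);
    default:
        return nullptr;  // unknown type: reject or ignore the message
    }
}
```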

>The created object is then added into the receiving program's
>organization. How to do this from a general purpose class library
>without encoding knowledge of the application is another (lesser)
>problem.

One interesting trick, irrelevant to today's C++, is to ship source
code implementing the class (including its deserializing code) to the
message recipient when it needs it.  This is what NeWS does, and
probably some other systems I don't know about.  (I can't see any
reasonable networked implementation of Lisp not supporting this,
for example!)  This still doesn't get away from needing to know the
type of the message ("source code followed by data").

    David Brownell			db@east.sun.com
    Sun Desktop Systems Software	sun!suneast!db
    Billerica, MA

vaughan@mcc.com (Paul Vaughan) (11/17/89)

	"The object can reconstruct itself from the byte stream" requires
	that the object already exist. It still seems to me that you *have*
	to have a case statement somewhere which knows about all (interesting)
	object classes.

Suppose that in the byte stream there is some token that uniquely
identifies the class of the object being transmitted.  Then that
token could be used to look up in a table a function to be called to
create such an object.  So, you don't have to have a case statement
(other techniques are possible), but you do have to somehow know of
all interesting classes.  A further technique would be to transmit a
pathname through the byte stream and to use that to dynamically load a
class definition.  It would still be important to have a table of the
classes that were already loaded.
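The table-driven variant could be sketched as follows. This assumes a message layout invented for the example (a one-byte token length, the class name as the token, then the object's bytes); `registry`, `Registrar`, `Factory` and `Counter` are hypothetical names. Each class registers its own creation function, so adding a class does not mean editing a central switch, although the receiving program still has to be linked with every class it can accept.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;

struct Shippable {
    virtual ~Shippable() = default;
    virtual std::string class_name() const = 0;
};

// Creation function: builds an object from the bytes after the class token.
using Factory =
    std::function<std::unique_ptr<Shippable>(const unsigned char*, std::size_t)>;

// Class token -> creation function.  A function-local static sidesteps
// static-initialization-order problems across translation units.
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> table;
    return table;
}

// Each class registers itself, so no central switch needs editing.
struct Registrar {
    Registrar(const std::string& token, Factory f) {
        registry()[token] = std::move(f);
    }
};

struct Counter : Shippable {
    std::int32_t n = 0;
    std::string class_name() const override { return "Counter"; }
};

static Registrar counter_registrar(
    "Counter",
    [](const unsigned char* p, std::size_t len) -> std::unique_ptr<Shippable> {
        if (len < sizeof(std::int32_t)) return nullptr;
        auto c = std::make_unique<Counter>();
        std::memcpy(&c->n, p, sizeof c->n);
        return c;
    });

// Assumed layout: 1-byte token length, the token, then the object's bytes.
std::unique_ptr<Shippable> reconstruct(const Bytes& msg) {
    if (msg.empty() || msg.size() < 1u + msg[0]) return nullptr;
    const std::size_t tok = msg[0];
    std::string token(msg.begin() + 1, msg.begin() + 1 + tok);
    auto it = registry().find(token);
    if (it == registry().end()) return nullptr;  // class not known here
    return it->second(msg.data() + 1 + tok, msg.size() - 1 - tok);
}
```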

 Paul Vaughan, MCC CAD Program | ARPA: vaughan@mcc.com | Phone: [512] 338-3639
 Box 200195, Austin, TX 78720  | UUCP: ...!cs.utexas.edu!milano!cadillac!vaughan

peterd@cs.washington.edu (Peter C. Damron) (11/18/89)

In article <1044@east.East.Sun.COM> db@helium.East.Sun.COM (David Brownell) writes:
>In article <31.UUL1.3#5109@pantor.UUCP>
>	richard@pantor.UUCP (Richard Sargent) writes:

>>> From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
>>> 3.  Assume each class of objects has methods for filling (or constructing)
>>> itself from the shippable byte array representation made in (2). ...

>>Here's the nub of the problem! ...

>It's also, surprise!, essentially the same idea that shows up in
>networking for how to serialize and deserialize RPC arguments.

>>"The object can reconstruct itself from the byte stream" requires
>>that the object already exist. It still seems to me that you *have*
>>to have a case statement somewhere which knows about all (interesting)
>>object classes. The cases create each new type of object, probably using
>>a class constructor which has a ByteStreamRepresentation argument.

It seems to me that this ByteStreamRepresentation is a string language
where the objects are the words or tokens of the language.  The only
way to parse that language so that you know the type of each object
before you see it is if the language is LL(0) and you are using
a top-down parser.  LL(0) is not a very powerful class of languages.

Given that you want to send your LL(1) or LR(1) string of objects,
you could first convert these objects into LL(0) objects by adding
"syntactic sugar" into the language.  This amounts to adding tokens/objects
that tell the type/class of the following object.  This does not eliminate
the "case statement" described above, but it potentially splits
the case statement across many syntactic sugar objects and it probably
makes it easier to adapt to changes in the language of the byte stream.

>Sort of; "object" is distinct from a data format.  The process is taking
>data in one format from one data store into another, and constructing a
>NEW object by binding methods to the new data representation.  What's
>shipped is data, not data plus code, and the new methods might not
>implement the class used by the message sender. The key points are needing
>to use a constructor function, and needing to choose which of several
>constructors to use.

This is a good point.  In OO systems, code is bundled with objects.
If you want to maintain the object type/class across an address boundary,
then code has to get moved/referenced as well as data.

>No application will be able to understand all types/classes ... they get
>incrementally added to real systems over time...

Good point.  See above about the syntactic sugar.


The point that everyone seems to be missing in this discussion is:

Why do you want to convert objects in the byte streams in the first place?

Why not just move objects "as is" to another address space?
Of course, this implies that you have ways to address objects in another
address space, and that is another problem.

Better yet, just get rid of the different address spaces.
After all, partitioning objects into address spaces in an object
oriented system is just an efficiency hack.

The object is the address space.

Hope this helps,
Peter.

davidm@uunet.UU.NET (David S. Masterson) (11/18/89)

In article <9832@june.cs.washington.edu> peterd@cs.washington.edu (Peter C. Damron) writes:
   In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
   >1.  Assume an Event interface between processes.

   How does the event type relate to the object type/class?
   How do you assure that the object methods are the same in both processes?

The natural assumption is that if I have a byte stream that represents an
object to be sent to another process, I will issue an event to that process
that will cause that process to invoke the reconstruction method.  My
assumption (perhaps invalid) was that all processes are built from the same
object library, therefore the object methods are the same in both processes
because its the same object class.  There has to be some assumptions that the
project that both processes are involved in are part of the same effort (in
other words, they reuse each others code).

   >2.  Assume each class of objects has methods for building a shippable byte
   >array representation of the object ...
   >3.  Assume each class of objects has methods for filling (or constructing)
   >itself from the shippable byte array representation made in (2)...

   It sounds to me like you are implementing write() and read() in 2 & 3.

Exactly!  Except that this write() and read() are to memory instead of to disk
(this should give a C++ program more flexibility in what to do with that
object).  Now the application, not the object, has the choice of where to send
the object and how to get it there (this might be wrapped in a higher object).

   >4.  All parts of an object must be handled in (2) and (3), even to the
   >  point of just deciding that part of an object doesn't need to be shipped.

   If you want to ship a single node, do you have to ship the whole graph
   that contains it?

Would the node make sense to the receiving process without the object that its
next pointer points to?  If so, then that comes under the heading of "doesn't
need to be shipped" (at least this particular time).  However, if you ship a
node to another process without the object that is referred to by the node's
next pointer (essentially just the data within the node), is it still a node?

   >7.  Memory pointers are the special case.

   Now this is the hard part.  What you want is remote & local references
   to objects (everything is an object right?).

I'm not sure what you mean here.  Each process will use the object in its
non-shipped, internal form, so to each process the object will be local at the
time of its reference.  It's up to the application to decide whether there are
any conflicts (from a lock standpoint) about this way of doing business.

   >9.  Reconstructing the object on the other side would mean nothing more
   >  than copying well-known values from the byte stream into their proper
   >  places and doing some special tracking of pointer symbols in the byte
   >  stream to make sure they get relinked properly...

   It sounds like you are limiting the shippable format to be an LL(0)
   language.  This may be too restrictive.

Why?  The shippable format only exists during the move of an object from one
process to another.  Breaking an object down into simple terms for this
purpose should make it easier to work with.

   >10.  If the methods for build_shippable() and init_from_shippable() are
   >  virtual on the object, then object type for the shippable form will have
   >  to be correct even if the object referred to when build_shippable() is
   >  invoked is of pointer to parent type.  Because the object type will be
   >  correct, then the proper event callback on the other side will get
   >  invoked.

   How do you represent virtual pointers across processes?
   I get the feeling you thought about data but not code.

Since the processes are dealing with the same class of object, then virtual
pointers should take care of themselves (they must be right for the particular
process).  Code cannot be shipped, therefore the implication is that both
processes are already compiled with the same code.  Since the "linearization"
of the object by the method I have suggested deals with each object in an
object to be shipped individually, virtual pointers can be safely ignored
(they are outside the definition of an object) because each object only deals
with what it knows about.  Therefore, the callback method via the event type
will know what object needs construction and so construct it with the proper
virtual functions already there.

   I suggest you read about systems that have already tried to deal with
   distributed objects...

I'll try, but the company I work for doesn't have ready access to a reference
library.

davidm@uunet.UU.NET (David S. Masterson) (11/18/89)

In article <31.UUL1.3#5109@pantor.UUCP> richard@pantor.UUCP (Richard Sargent) writes:

   [edited quote follows]

   > From: cimshop!davidm@uunet.UU.NET (David S. Masterson)

   ...

   > 
   > 3.  Assume each class of objects has methods for filling (or constructing)
   > itself from the shippable byte array representation made in (2). ...

   Here's the nub of the problem! ...
   The real trick will be to eliminate the need for [...] a switch!
   If anyone figures this out for C++, I'm waiting with bated breath
   for the answer.

That was why I suggested the Event interface (a la X windows) as the method
for passing things to new processes.  Each Event would be registered with an
appropriate callback (which might be wrappable as an object).  Therefore,
whenever an application takes on new functionality, it acquires a new callback
which would be registered with the event processor in the usual fashion.  So,
a program capable of accepting an object of (say) XClass would have already
registered the construction callback XClassConstruct with its Event processing mechanism.

dld@F.GP.CS.CMU.EDU (David Detlefs) (11/18/89)

Richard Sargent and Paul Vaughan, among others, have been having a
discussion under this Subject, in which there is a shared assumption
something like:

    When you ship an object across a network, the object needs to
  somehow contain information describing its type; otherwise, it
  cannot be reconstructed.

I think this is a (somewhat) false assumption.  Consider a strongly
typed RPC interface.  The sender must send the right type of object,
or else type-checking would fail.  The receiver knows the expected
type of the RPC argument, and can use his knowledge of the type to
generate (at compile time) the code that reconstructs the object from the
bit-stream.  As David Brownell pointed out, all RPC systems solve this
problem.  Masterson's proposal goes farther in that it posits shipping
the transitive pointer-closure of the object, while all RPC systems
that I know about require objects with "in-line" data.
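To illustrate the RPC view, here is a sketch of a stub pair for a hypothetical remote procedure `move_to(Point)`. Nothing about the type travels on the wire: the compiler picks the `marshal`/`unmarshal` overloads from the declared parameter type. The 4-byte big-endian integer encoding follows XDR's convention for integers, but the functions themselves are illustrative, not an XDR library API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<unsigned char>;

// A 32-bit integer in big-endian order, as XDR encodes integers.
void marshal(Bytes& out, std::int32_t v) {
    std::uint32_t u;
    std::memcpy(&u, &v, sizeof u);
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<unsigned char>(u >> shift));
}

struct Reader {
    const Bytes& in;
    std::size_t pos = 0;
};

void unmarshal(Reader& r, std::int32_t& v) {
    assert(r.pos + 4 <= r.in.size());
    std::uint32_t u = 0;
    for (int i = 0; i < 4; ++i) u = (u << 8) | r.in[r.pos++];
    std::memcpy(&v, &u, sizeof v);
}

// A hypothetical argument type with only in-line data (no pointers).
struct Point {
    std::int32_t x, y;
};

// The overload is chosen from the static type, so no type tag is sent.
void marshal(Bytes& out, const Point& p) {
    marshal(out, p.x);
    marshal(out, p.y);
}

void unmarshal(Reader& r, Point& p) {
    unmarshal(r, p.x);
    unmarshal(r, p.y);
}

// Stubs for a hypothetical remote procedure "void move_to(Point)".
Bytes client_move_to(const Point& p) {
    Bytes out;
    marshal(out, p);
    return out;
}

Point server_move_to(const Bytes& in) {
    Reader r{in};
    Point p{};
    unmarshal(r, p);
    return p;
}
```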

--
Dave Detlefs			Any correlation between my employer's opinion
Carnegie-Mellon CS		and my own is statistical rather than causal,
dld@cs.cmu.edu			except in those cases where I have helped to
				form my employer's opinion.  (Null disclaimer.)

tuck@zeta.cs.unc.edu (Russ Tuck) (11/19/89)

> (Original heading and author lost.  Sorry.)
>	"The object can reconstruct itself from the byte stream" requires
>	that the object already exist. It still seems to me that you *have*
>	to have a case statement somewhere which knows about all (interesting)
>	object classes.

As I understand it (and I'm sure someone will point it out if I'm wrong),
the receiving program only needs to know about the *top-level* classes it 
receives.  Those top-level classes will take care of the other classes
they may contain (ie, as member data), automatically and transparently
to the receiving program.

So there's a case statement or call table containing the few classes
the receiving program must know about, but there's no need for the receiving
program to know about the lower-level classes used by the implementation
of those top-level classes. 

This is just as a called subroutine must know about the classes named
in its argument list.
 
Russ Tuck		               tuck@cs.unc.edu
UNC Dept. of Computer Science          ...!mcnc!unc!tuck
CB# 3175 Sitterson Hall
Chapel Hill, NC 27599-3175, USA        (919) 962-1755 or 962-1932

davidm@uunet.UU.NET (David S. Masterson) (11/20/89)

In article <4042@cadillac.CAD.MCC.COM> vaughan@mcc.com (Paul Vaughan) writes:

   Suppose that in the byte stream there is some token that uniquely
   identifies the class of the object being transmitted.  Then that
   token could be used to look up in a table a function to be called to
   create such an object.  So, you don't have to have a case statement
   (other techniques are possible), but you do have to somehow know of
   all interesting classes.  A further technique would be to transmit a
   pathname through the byte stream and to use that to dynamically load a
   class definition.  It would still be important to have a table of the
   classes that were already loaded.

An interesting idea!  Is the capability of run-time loading of libraries of
objects supported within C++ yet?  I would think that, in order for a program
to make use of an object, its declaration would have to be known to the
program at compile time.  Therefore, I don't see how this could be done. :-(

graham@cabernet.newcastle.ac.uk (Graham D. Parrington) (11/20/89)

dld@F.GP.CS.CMU.EDU (David Detlefs) writes:

>I think this is a (somewhat) false assumption.  Consider a strongly
>typed RPC interface.  The sender must send the right type of object,
>or else type-checking would fail.  The receiver knows the expected
>type of the RPC argument, and can use his knowledge of the type to
>construct (at compile-time) code to reconstruct the object from the
>bit-stream.  As David Brownell pointed out, all RPC systems solve this
>problem.  Masterson's proposal goes farther in that it posits shipping
>the transitive pointer-closure of the object, while all RPC systems
>that I know about require objects with "in-line" data.

Dave is missing one vital point about O-O systems here - that of
inheritance. Just because I declare a routine to accept an X at
compile time does not mean that I am not free to pass a Y or Z in
its place (providing Y and Z are derived from X). The implication
of this (at least in a C++ context) is that the receiver will construct
the wrong receiving type (always a base type and never the correct
derived type) and hence all virtual functions will be dispatched
incorrectly (from the sender's point of view). Thus local and distributed
versions of the same program will exhibit different behaviour!
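A minimal sketch of the problem, assuming hypothetical classes X and Y and a plain text stream. A receiver that constructs the declared type turns a Y into an X. One that reads a type tag first constructs the right derived class. The tag strings are an illustrative assumption; a real system would look them up in a table rather than test them inline.

```cpp
#include <memory>
#include <sstream>
#include <string>

// Hypothetical hierarchy: a routine declared to take an X is passed a Y.
struct X {
    int a = 0;
    virtual ~X() = default;
    virtual std::string kind() const { return "X"; }
    virtual void writeTo(std::ostream& out) const { out << a << ' '; }
    virtual void readFrom(std::istream& in) { in >> a; }
};

struct Y : X {
    int b = 0;
    std::string kind() const override { return "Y"; }
    void writeTo(std::ostream& out) const override { X::writeTo(out); out << b << ' '; }
    void readFrom(std::istream& in) override { X::readFrom(in); in >> b; }
};

// Naive receiver: builds the *declared* type, so a Y arrives as an X and
// every later virtual call dispatches to X's versions.
std::unique_ptr<X> receiveByDeclaredType(const std::string& bytes) {
    std::istringstream in(bytes);
    std::unique_ptr<X> obj(new X);
    obj->readFrom(in);
    return obj;
}

// Sender writes the dynamic type's tag before the data.
void send(const X& obj, std::ostream& out) {
    out << obj.kind() << ' ';
    obj.writeTo(out);
}

// Tag-aware receiver: constructs the matching derived type, then reads.
std::unique_ptr<X> receiveByTag(const std::string& bytes) {
    std::istringstream in(bytes);
    std::string tag;
    in >> tag;
    std::unique_ptr<X> obj(tag == "Y" ? new Y : new X);
    obj->readFrom(in);
    return obj;
}
```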


Graham Parrington, Computing Laboratory, University of Newcastle upon Tyne
ARPA  = Graham.Parrington%newcastle.ac.uk@nsfnet-relay.ac.uk
UUCP  = ...!ukc!newcastle.ac.uk!Graham.Parrington
PHONE = +44 91 222 8067

peterson@choctaw.csc.ti.com (Bob Peterson) (11/20/89)

In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>
>Based on previous discussions within these groups and some work I am currently
>doing, I'd like to get some comments (please post) on the following ideas for
>the shipment of C++ objects between processes (and, by implication, the
>persistence of the object).
>
  You really should visit a library and review the last few proceedings of
relevant conferences and workshops, e.g., ACM OOPSLA Conferences since
1986, the OODB Workshops (one in '86 in California, and a second in '88 in
Germany), and recent ACM SIGMOD Conferences.

>2.  Assume each class of objects has methods for building a shippable byte
>array representation of the object and returning a pointer to that shippable
>form.
>
>3.  Assume each class of objects has methods for filling (or constructing)
>itself from the shippable byte array representation made in (2).
  Seems to me that much of this code should be encapsulated in a
translation class, with an object containing only a description of its
type.  The description is passed to the translation routines.  Now a class
implementor doesn't have to (again) write the same conversion routines,
but simply write a type description.  Even writing the type description
could be automated!
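One possible reading of the "type description" idea, sketched under loud assumptions. The description is a hypothetical table of (offset, size) pairs, and one generic pack/unpack pair serves every class. Because it copies raw bytes, it only suits fixed-size, pointer-free fields and assumes both ends share byte order and layout. Pointers, virtual-table pointers and variable-length data are exactly what it cannot describe.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical "type description": one entry per shipped field.
struct FieldDesc {
    std::size_t offset;   // where the field lives inside the object
    std::size_t size;     // how many bytes it occupies
};

// Generic translation routines, written once and shared by every class.
std::vector<unsigned char> pack(const void* obj, const std::vector<FieldDesc>& desc) {
    std::vector<unsigned char> bytes;
    const unsigned char* base = static_cast<const unsigned char*>(obj);
    for (const FieldDesc& f : desc)
        bytes.insert(bytes.end(), base + f.offset, base + f.offset + f.size);
    return bytes;
}

void unpack(void* obj, const std::vector<FieldDesc>& desc,
            const std::vector<unsigned char>& bytes) {
    unsigned char* base = static_cast<unsigned char*>(obj);
    std::size_t pos = 0;
    for (const FieldDesc& f : desc) {
        std::memcpy(base + f.offset, bytes.data() + pos, f.size);
        pos += f.size;
    }
}

// A class implementor now writes only the description, not the routines.
struct Account {
    int id;
    double balance;
    char cachedFlag;   // deliberately left out of the description
};

const std::vector<FieldDesc> accountDesc = {
    {offsetof(Account, id), sizeof(int)},
    {offsetof(Account, balance), sizeof(double)},
};
```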

>7.  Memory pointers are the special case.  ...
>                Basically, the value of the memory pointer would be given to
>the symbol table which would return a symbol for that memory pointer (either a
>new one or the previous one).
>
  This assumes that a memory address is constant over the life of the
program.  Is this a valid assumption?  For example, a String class might
reallocate storage if a string changes size, resulting in a different
memory address for what, logically, is the same entity.  Sending the same
object twice should result in the same destination structure, without
orphaned storage, regardless of reallocations that may happen. If a
different memory address results in a different symbol, the receiving end
can't recognize that the original value is no longer needed.
  How do you prevent your transfer mechanism from shipping senseless
values, i.e., values that make no sense to the receiving process? Examples
might be a window object, or an I/O buffer, or a hash table maintained as
redundant data purely for performance reasons.
  These and other identity issues are discussed in the paper, "Object
Identity," by Khoshafian and Copeland in the _OOPSLA '86 Conference
Proceedings_ (published as the November 1986 (Volume 21, Number 11) issue
of ACM SIGPLAN Notices), page 406.  This article is a good discussion of
the identity issue.

    Bob

Bob Peterson            Compuserve: 70235,326        Expressway Site,
Texas Instruments       USENET: peterson@csc.ti.com   North Building,
P.O. Box 655474, MS238  (214) 995-6080                 2nd Floor,
Dallas, Texas, USA 75265                                CSC Aisle C3

kipp@warp.sgi.com (Kipp Hickman) (11/21/89)

To eliminate the ``switch'' implied by the object reconstruction mechanism,
all you need is a dictionary.  The key is the tag which identifies the object
in its byte stream form (a string works nicely).  The value is a pointer to
a function to perform the reconstruction.  Usually, the function is declared
thusly:

		void (*reader)(ObjectReader& source, void* where);

The "source" argument specifies the source of the byte stream, and provides
operations to retrieve data (could be a stream, for instance).   The "where"
argument provides an (optional) address where the object should be placed,
for the cases where the object is embedded, and shouldn't allocate its
own memory.  In C++ 2.0, you implement the above function as follows:

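		#include <new>	// placement new

		void FooReader(ObjectReader& source, void* where)
		{
			// Construct a Foo from the byte stream directly in
			// the storage supplied by the caller.
			new(where) Foo(source);
		}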

Lots of details left as an exercise to the reader... :-)
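Here is a sketch that fills in a few of those details. ObjectReader here is a hypothetical minimal wrapper around std::istream, since the post doesn't define it. Tags are assumed to be whitespace-separated words, and the dictionary is a std::map from tag to reader function. An unknown tag is reported instead of crashing.

```cpp
#include <istream>
#include <map>
#include <new>
#include <sstream>
#include <string>

// Hypothetical stand-in for the post's ObjectReader.
class ObjectReader {
public:
    explicit ObjectReader(std::istream& in) : in_(in) {}
    std::string readTag() { std::string t; in_ >> t; return t; }
    int readInt() { int v = 0; in_ >> v; return v; }
private:
    std::istream& in_;
};

using Reader = void (*)(ObjectReader& source, void* where);

// The dictionary that replaces the switch: tag -> reconstruction function.
std::map<std::string, Reader>& readerTable() {
    static std::map<std::string, Reader> table;
    return table;
}

// Hypothetical class whose constructor reads itself from the source.
struct Foo {
    int value;
    explicit Foo(ObjectReader& source) : value(source.readInt()) {}
};

void FooReader(ObjectReader& source, void* where) {
    new (where) Foo(source);   // build in caller-supplied storage
}

// Read a tag, look it up, run the matching reader; false if unknown.
bool readObject(ObjectReader& source, void* where) {
    auto it = readerTable().find(source.readTag());
    if (it == readerTable().end()) return false;
    it->second(source, where);
    return true;
}
```

Registering a class is one line, e.g. `readerTable()["Foo"] = &FooReader;`.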

					kipp hickman
					silicon graphics inc.

davidm@uunet.UU.NET (David S. Masterson) (11/22/89)

In article <98968@ti-csl.csc.ti.com> peterson@choctaw.csc.ti.com (Bob Peterson) writes:
   In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
   >Based on previous discussions within these groups...

     You really should visit a library...

I'm getting so much good information here ;-).  But I am looking around...

   >2.  Assume each class of objects has methods for building a shippable byte
   >array representation of the object and returning a pointer to that 
   >shippable form.
   >
   >3.  Assume each class of objects has methods for filling (or constructing)
   >itself from the shippable byte array representation made in (2).

     Seems to me that much of this code should be encapsulated in a
   translation class, with an object containing only a description of its
   type.  The description is passed to the translation routines.  Now a class
   implementor doesn't have to (again) write the same conversion routines,
   but simply write a type description.  Even writing the type description
   could be automated!

Having attempted to do basically what you're saying with some people before
coming to the conclusions in my previous article, I really don't see how the
translation of any object into a shippable form could be encapsulated.  There
is just too much information within an object's definition to be passed
to another class.  The object itself has the best understanding of what it is
composed of and what would be needed to reconstruct it from the information
that it contains.  What form would the "description" of an object take in
order to pass it to the translation routines?  (I suspect the answer would be so
complex that you would have wound up doing most of the translation before
you give anything to the translation routines.)

   >7.  Memory pointers are the special case.  ...
   >              Basically, the value of the memory pointer would be given to
   >the symbol table which would return a symbol for that memory pointer
   >(either a new one or the previous one).
   >

     This assumes that a memory address is constant over the life of the
   program.  Is this a valid assumption?

I'm not sure about the life of the program, but the assumption for the symbol
table is that the memory pointer would remain constant for the length of time
it takes to build the shippable form of the object (the life of the symbol
table).  This, I think, is a valid assumption.
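A sketch of a symbol table whose lifetime is one shipment, assuming a hypothetical SymbolTable class. Within one encoding pass, the same address maps to the same symbol. The next pass gets a fresh table, so an address only has to stay put while the shippable form is being built.

```cpp
#include <map>

// Hypothetical per-shipment symbol table: address -> small symbol.
class SymbolTable {
public:
    // Returns the existing symbol for p, or assigns the next one.
    // 'isNew' tells the caller whether the pointee still has to be written.
    int symbolFor(const void* p, bool& isNew) {
        auto it = symbols_.find(p);
        if (it != symbols_.end()) { isNew = false; return it->second; }
        isNew = true;
        int sym = static_cast<int>(symbols_.size());
        symbols_[p] = sym;
        return sym;
    }
private:
    std::map<const void*, int> symbols_;
};
```

This settles the stability question but not the orphan question raised above. A reallocated String in a later shipment simply gets a new symbol, and the receiver still needs some other notion of identity to know the old copy is dead.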

   How do you prevent your transfer mechanism from shipping senseless
   values, i.e., values that make no sense to the receiving process? Examples
   might be a window object, or an I/O buffer, or a hash table maintained as
   redundant data purely for performance reasons.

My assumption was that each object would decide which parts of itself are
shippable and which are not (therefore, designer responsibility).
However, the issue of object identity, in the sense of working with the
object after it is shipped and telling this last object of class Foo apart
from the previous object of class Foo, was something I hadn't considered.
I believe this is a question for the application designer, but I will look
for the reference you gave.
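A small sketch of designer-decided shippability, assuming a hypothetical WordList class whose lookup index is redundant data kept only for speed. The index is never written; the receiver rebuilds it while reading. Words are assumed not to contain whitespace.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

class WordList {
public:
    void add(const std::string& w) {
        index_[w] = words_.size();
        words_.push_back(w);
    }
    bool contains(const std::string& w) const { return index_.count(w) != 0; }

    void writeTo(std::ostream& out) const {
        out << words_.size();
        for (const std::string& w : words_) out << ' ' << w;
        // index_ is deliberately not written: it is redundant data.
    }
    void readFrom(std::istream& in) {
        words_.clear();
        index_.clear();
        std::size_t n = 0;
        in >> n;
        for (std::size_t i = 0; i < n; ++i) {
            std::string w;
            in >> w;
            add(w);   // rebuilds the index as a side effect
        }
    }
private:
    std::vector<std::string> words_;
    std::map<std::string, std::size_t> index_;   // not shipped
};
```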

wsmith@mdbs.UUCP (Bill Smith) (11/22/89)

>In article <4042@cadillac.CAD.MCC.COM> vaughan@mcc.com (Paul Vaughan) writes:
>
>   Suppose that in the byte stream there is some token that uniquely
>   identifies the class of the object being transmitted.  Then that
>   token could be used to look up in a table a function to be called to
>   create such an object.  So, you don't have to have a case statement
>   (other techniques are possible), but you do have to somehow know of
>   all interesting classes.  A further technique would be to transmit a
>   pathname through the byte stream and to use that to dynamically load a
>   class definition.  It would still be important to have a table of the
>   classes that were already loaded.
>
>An interesting idea!  Is the capability of run-time loading of libraries of
>objects supported within C++ yet?  I would think that, in order for a program
>to make use of an object, its declaration would have to be known to the
>program at compile time.  Therefore, I don't see how this could be done. :-(

This discussion has been wandering around an idea that I integrated into
my MS thesis project, Leif.   Leif only solved a problem defined in
terms of C, but I think with the hints and ideas suggested so far, it
could be extended to C++ without too much work.  (If an implementor
of C++ wanted to integrate something like this into the language, it might 
be even better.  I'm not prepared to get involved with something as complex
as that, yet I thought I should contribute the ideas anyway.)

I'll try to give the problem statement that I tried to solve.   It's a
little different than the original problem we are discussing, but they
are closely related:

Problem:  How does one store into a flat file a complex data structure, 
including arbitrary pointer references, with the provision that the file 
must be accessible (to some degree) arbitrarily into the future, even if the 
data structures of the loading program no longer match the definitions
that were used to store the data.

If I abstract the term "into a file" to be "to another address space",
this more closely matches the shippable C++ objects problem.  This 
technique, I believe, also solves a problem that Dr. Stroustrup references 
in the first edition of the C++ book:  How can a compiler dynamically load
symbol tables and other language definitions to allow more flexibility
and better performance with commonly used libraries?

Solution:
	First, define the data structures so that the type can be unambiguously
detected from a pointer to the structure.  In Leif, I put a word
at the beginning of each structure.  This was acceptable because I needed the 
field anyway for the most common structs in Leif.  Also, I added a word 
that was used by the data structure traversal algorithms.  In C++, the type 
definition field is free (just assume each class processed in this way has a 
virtual function for the purpose).  This extra traversal field may or may not
be acceptable (but remember you can't get something for nothing).  It might
be possible to reduce it to a single bit per record, but I'm not positive
on that.

The problem is solved by traversing the structure in a fixed order(*) and then
reconstructing it in the same order when the new copy is built.   LL(0), LL(1) 
grammars and the like play no part in the algorithm and any arbitrary graph 
structure may be stored.  At the head of the file is a description of
the types stored, so that they may be input "formally" even if the classes
describing the data are unknown to the loading program (which does prevent
"casual" access).
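The description above leaves out the details, so this is only a sketch of the general idea, not Leif's dumpload code. A hypothetical Node graph, which may share nodes or contain cycles, is numbered in depth-first order. Each pointer is replaced by a number, and the copy is rebuilt from those numbers. The type-description header is not modelled. The rebuild allocates all nodes first and wires pointers in a second pass, which is one simple way to resolve forward references.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <vector>

// Hypothetical node: a type tag plus arbitrary outgoing pointers.
struct Node {
    int tag;
    std::vector<Node*> edges;
};

// Flat form: a tag and edge targets given as record numbers.
struct Record {
    int tag;
    std::vector<std::size_t> edges;
};

// Depth-first numbering: the order is fixed by the data's own structure.
static std::size_t number(Node* n, std::map<Node*, std::size_t>& ids,
                          std::vector<Node*>& order) {
    auto it = ids.find(n);
    if (it != ids.end()) return it->second;   // already visited
    std::size_t id = order.size();
    ids[n] = id;
    order.push_back(n);
    for (Node* e : n->edges) number(e, ids, order);
    return id;
}

std::vector<Record> flatten(Node* root) {
    std::map<Node*, std::size_t> ids;
    std::vector<Node*> order;
    number(root, ids, order);
    std::vector<Record> out;
    for (Node* n : order) {
        Record r{n->tag, {}};
        for (Node* e : n->edges) r.edges.push_back(ids[e]);
        out.push_back(r);
    }
    return out;
}

// Allocate every node, then wire edges by number; record 0 is the root.
std::vector<std::unique_ptr<Node>> rebuild(const std::vector<Record>& recs) {
    std::vector<std::unique_ptr<Node>> nodes;
    for (const Record& r : recs) nodes.emplace_back(new Node{r.tag, {}});
    for (std::size_t i = 0; i < recs.size(); ++i)
        for (std::size_t e : recs[i].edges)
            nodes[i]->edges.push_back(nodes[e].get());
    return nodes;
}
```

The recursive numbering is fine for a sketch; very deep structures would want an explicit stack.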

I don't really want to go through all of the gory details unless requested,
but it works pretty well.  It's not lightning fast, but I didn't really 
try to optimize it for speed either.   I was more interested in being able
to store a parse tree from one version of Leif and be able to retrieve the
text even if the parse tree structure has been altered in an incompatible
way, yet if the structure is still valid all of the data can be retrieved.  
This feature is also valuable in any rapidly changing software 
environment such as an object oriented program.  Otherwise, just one change 
in one class and the stored data goes up in smoke...  (Or you have a nasty 
version control problem that would make the cure about as bad as the disease.)

You can get the source to Leif fairly easily (writing to leif@cs.uiuc.edu
should be a way to get information).  I think the module that implements this 
technique (and some related (IMHO) nifty activities) is in the directory 
dumpload.  My MS thesis describes the method as part of one of its chapters
but it's not a UIUC tech report yet as far as I know. :-(

Let me know if you want a more clear description of the technique, but
it seems a little pretentious to go into all the details if the solution
doesn't meet an essential requirement that I haven't thought of.

Bill Smith
uunet!pur-ee!mdbs!wsmith
(Not mdbs opinions...)

(*) fixed order means that the order is set deterministically by the algorithm
and the structure of the data, not that different data use the same order.

davidm@uunet.UU.NET (David S. Masterson) (11/28/89)

In my previous article, I requested comments on an idea for shipping objects
between processes.  Having had a little time to play with the idea, I have
found a flaw in the methodology.  The problem concerns container classes of
virtual objects and that a receiving process would not know what type of
object was contained within the container it was requested to reconstruct.
The wish is that the container class should only know about the base virtual
object that it might contain -- not any of the possible derived children.
Part of my idea was based on the feeling that a receiving process, once
informed of the incoming object type, would know what to do with the incoming
stream.  The virtualness of internal objects in the stream means there are
more decision points than this idea could handle.
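One common way around this, sketched with hypothetical Shape, Circle, Square and ShapeList classes: the container writes each element's class name in front of its data and reads it back through a name-keyed factory. As a result, the container itself only knows the base class. Who fills in the factory is an assumption; in this sketch the caller registers each derived class.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual std::string className() const = 0;
    virtual void writeTo(std::ostream& out) const = 0;
    virtual void readFrom(std::istream& in) = 0;
};

struct Circle : Shape {
    int r = 0;
    std::string className() const override { return "Circle"; }
    void writeTo(std::ostream& out) const override { out << r << ' '; }
    void readFrom(std::istream& in) override { in >> r; }
};

struct Square : Shape {
    int side = 0;
    std::string className() const override { return "Square"; }
    void writeTo(std::ostream& out) const override { out << side << ' '; }
    void readFrom(std::istream& in) override { in >> side; }
};

// Factory keyed by class name; the container never names Circle or Square.
std::map<std::string, std::function<std::unique_ptr<Shape>()>>& factory() {
    static std::map<std::string, std::function<std::unique_ptr<Shape>()>> f;
    return f;
}

// Each element's class name travels in the stream, just ahead of its data.
struct ShapeList {
    std::vector<std::unique_ptr<Shape>> items;

    void writeTo(std::ostream& out) const {
        out << items.size() << ' ';
        for (const auto& s : items) { out << s->className() << ' '; s->writeTo(out); }
    }
    void readFrom(std::istream& in) {
        items.clear();
        std::size_t n = 0;
        in >> n;
        for (std::size_t i = 0; i < n; ++i) {
            std::string name;
            in >> name;
            std::unique_ptr<Shape> s = factory().at(name)();   // throws if unknown
            s->readFrom(in);
            items.push_back(std::move(s));
        }
    }
};
```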

Oh well, back to the drawing board...

...maybe extending the event callback methodology for both internal and
external events...   `:-/

vasey@gallo.ACA.MCC.COM (Ron Vasey) (11/29/89)

[Posted for Bryan Boreham -- please reply to him.]
---
I have been reading the discussion about shipping C++ objects with
some interest, and I feel that people would be interested in hearing
about the system used in the ET++ class library.

In ET++, almost everything inherits from class Object, and Object
declares these methods:

    virtual ostream& PrintOn(ostream& s);
    virtual istream& ReadFrom(istream& s);

The implementation of PrintOn in Object merely prints out the class
name; each subclass of Object overrides this to output its instance
variables, and call the superclass version.  Here is (a contrived) one
for a class Shape:

ostream &Shape::PrintOn(ostream &s)
{
    Object::PrintOn(s);
    return s << contentRect << pattern;
}
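A compilable sketch of that chaining pattern. Object, Shape and Rectangle here are minimal stand-ins assumed for illustration, not ET++'s own classes. In particular, a virtual ClassName() replaces the IsA() meta-class lookup that ET++ uses.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in: origin (x, y) and extent (w, h).
struct Rectangle { int x, y, w, h; };

std::ostream& operator<<(std::ostream& s, const Rectangle& r) {
    return s << '(' << r.x << ',' << r.y << ") (" << r.w << ',' << r.h << ") ";
}

class Object {
public:
    virtual ~Object() = default;
    virtual const char* ClassName() const { return "Object"; }
    // Base version: just the class name.
    virtual std::ostream& PrintOn(std::ostream& s) { return s << ClassName() << ' '; }
};

class Shape : public Object {
public:
    Shape(Rectangle r, int p) : contentRect(r), pattern(p) {}
    const char* ClassName() const override { return "Shape"; }
    // Chain to the superclass first, then add this class's own fields.
    std::ostream& PrintOn(std::ostream& s) override {
        Object::PrintOn(s);
        return s << contentRect << pattern;
    }
private:
    Rectangle contentRect;
    int pattern;
};
```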

Also in Object, there is a virtual method IsA(), which returns a
pointer to a meta-class structure that knows things like the name of
the class.  One such meta-class object is constructed statically at
program startup for each subclass of Object; the programmer must write
a simple line in her code to invoke this. 

More complicated objects have more sophisticated PrintOn methods, and
corresponding ReadFrom methods.  For example, class Collection outputs
each object it holds in turn.

During the outputting of a structure, the meta-class objects build a
table of all the objects output so far, and when the same one comes
round again, they just output an index number.  A sample output of a
collection of shapes might be:

{Collection #0  4
{Shape #0 (20,40) (80,80) }
{Shape #1 (0,40) (20,100) }
{Shape #2 (30,0) (0,50) }
{Shape #1}
}

This is slightly simplified over a real ET++ output.  Here, the second
shape appears twice in the collection, but is only output in full
once.  In this way, arbitrary cyclic structures are linearised and
converted into a form suitable to be transmitted over any channel, or
stored in a file.
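A sketch of the writing side, assuming a hypothetical Shape with two coordinate pairs and a Writer that keeps a table from object address to number. It reproduces the sample output above. For simplicity the collection's own number is fixed at 0; a fuller writer would take it from the table as well.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical leaf object: two coordinate pairs, printed as in the sample.
struct Shape {
    int x1, y1, x2, y2;
};

class Writer {
public:
    explicit Writer(std::ostream& out) : out_(out) {}

    void writeCollection(const std::vector<const Shape*>& items) {
        out_ << "{Collection #0  " << items.size() << '\n';
        for (const Shape* s : items) writeShape(*s);
        out_ << "}\n";
    }

private:
    void writeShape(const Shape& s) {
        auto it = seen_.find(&s);
        if (it != seen_.end()) {              // seen before: index only
            out_ << "{Shape #" << it->second << "}\n";
            return;
        }
        int id = static_cast<int>(seen_.size());
        seen_[&s] = id;                       // remember it for repeats
        out_ << "{Shape #" << id << " (" << s.x1 << ',' << s.y1 << ") ("
             << s.x2 << ',' << s.y2 << ") }\n";
    }

    std::ostream& out_;
    std::map<const Shape*, int> seen_;        // objects output so far
};
```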

For input, the routine to read in an Object* looks for the '{', then
finds the right meta-class; it constructs a blank object, and then
calls the virtual ReadFrom method on that blank object.  Our
Collection will read the number of elements, and then read in each one
separately.  The braces "{}" serve to make sure that nothing has been
dropped, and help humans to read the output.

If, when reading in a new object, the system cannot find a meta-class
with the right name, it tries to dynamically load that class into the
running program.  This is (at present) quite simplistic; for class
FooBar, it looks for a file FooBar.o.

So, there is a table containing a meta-class object for every class in
the system; this table (and the program) can be expanded by dynamic
linking; the system requires some work on the part of the programmer
to set up the meta-class objects and to write the PrintOn/ReadFrom
methods, and it all works pretty well.

ET++ was written at the University of Zurich; its main function is as
a toolkit for Macintosh-like interactive programs; it is in the public
domain and you can compile it with cfront 1.2 or g++ 1.36.1, if you
are not faint of heart.

I hope this is of interest,

Bryan Boreham			bryan@kewill.uucp  
Software Engineer	||	bryan%kewill@uunet.uu.net
Kewill Systems PLC	||  ... uunet!mcvax!ukc!root44!kewill!bryan
Walton-On-Thames	
Surrey, England		Telephone: (+44) 932 248 328