davidm@uunet.UU.NET (David S. Masterson) (11/16/89)
Request for Comments
(*please post*)

Based on previous discussions within these groups and some work I am
currently doing, I'd like to get some comments (please post) on the
following ideas for the shipment of C++ objects between processes (and, by
implication, the persistence of the object).  Hopefully, this will generate
some interesting discussions.  Below I list some ideas/requirements for the
shippability between processes of general objects in C++:

1.  Assume an Event interface between processes.  Therefore, when shipping
an object, you can be very sure that something on the other side (in this
case, a callback) will understand what it is because of the event type.

2.  Assume each class of objects has methods for building a shippable byte
array representation of the object and returning a pointer to that shippable
form.  This means that each object is responsible for building a shippable
form of itself.  By implication, the parent of the object and contained
objects can build shippable representations of themselves and, so, are
called at appropriate times by the child's shippable method, which then
copies the information into its shippable form.  This capability MUST exist
on an object for an object to be shipped (the implications for storage are
similar if not the same).

3.  Assume each class of objects has methods for filling (or constructing)
itself from the shippable byte array representation made in (2).  Since the
object knows what it did to create (2), it should know what it can do to get
itself back from that (2) (most likely a reverse operation).  Therefore, all
initialization functions would only need to take a char* (void* ??) pointer.
This also implies that Event objects need only contain some header followed
by an object type and an unknown length byte array.

4.  All parts of an object must be handled in (2) and (3), even to the point
of just deciding that part of an object doesn't need to be shipped.
Therefore, even memory pointers must be resolved in the processing of (2)
and (3).

5.  Base type objects are copiable into shippable objects as is.  Their size
and type will be well-known by both sides of the shipping software, so the
only thing to resolve is recognizing them in the stream of bytes.  If all
other aspects of shippability are resolved, then recognition is not
necessary as they MUST be at well-known spots in the data stream.

6.  Copying of object contents out of shippable form MUST be done with
knowledge of copying the object contents into shippable form.  Suggested
might be that an object gets its parent shippable form, followed by top to
bottom contents of itself (NOTE: next paragraph) and copies those into the
shippable output form for the current object.  The initializer from a
shippable form would then unwind this in a similar manner.

7.  Memory pointers are the special case.  In the process of encountering a
memory pointer, converting to shippable form seems to require a symbol
table.  Basically, the value of the memory pointer would be given to the
symbol table which would return a symbol for that memory pointer (either a
new one or the previous one).  If the symbol table says that this memory
pointer has been seen before, then there is nothing to do other than copy
the symbol (with the Seen_Before flag) into the byte stream.  If the memory
pointer is new to the symbol table, then the symbol is copied to the byte
stream (with the New flag) and the pointer is then followed to produce the
shippable form of what it points to.

8.  This may lead to one extra copy of an object in the shippable stream due
to the potential for a circular pointer chain.  This should be resolvable by
well-known methods and object design criteria.  For instance, it is highly
unlikely (read NEVER HAPPEN) that one object would point into the middle of
another object (especially with the tendency toward private data), so
putting the memory address of each new object as a whole in the symbol table
and a corresponding symbol into the byte stream should eliminate the
possibility of circular reference problems in object pointer chains.  That
is, when processing an object, enter its memory address into the symbol
table so that if any succeeding objects point to it, no further processing
need be done.  In the case of circular pointers from ivar to ivar within an
object, they should be resolvable through the definition of the object (the
designer should be able to handle it).

9.  Reconstructing the object on the other side would mean nothing more than
copying well-known values from the byte stream into their proper places and
doing some special tracking of pointer symbols in the byte stream to make
sure they get relinked properly.  When a pointer type is encountered in
constructing an object, a symbol had better be on the input stream (trust
the transmission media).  This symbol would be stripped out and the symbol
table queried to determine if it has been seen before.  If not, an object is
new()'ed and a char* pointer to the area after the symbol in the byte stream
would be passed to its init function (after the value of new() is entered in
the symbol table).  If it has, then the value is copied from the symbol
table into the current pointer.

10.  If the methods for build_shippable() and init_from_shippable() are
virtual on the object, then object type for the shippable form will have to
be correct even if the object referred to when build_shippable() is invoked
is of pointer to parent type.  Because the object type will be correct, then
the proper event callback on the other side will get invoked.

11.  The implication of all this is that objects that are virtual in nature
and have memory pointers embedded in them can be shipped from process to
process without much work and, therefore, object instances may be passed
back and forth without limitation on their representation in the C++ sense.

For example:

	class A {		class X {
		X *alpha;		int abc, cde;
	}			}
	class B:A {		class Y {
		Y *beta;		float m, n;
	}			}
	class C:B {		class Z {
		Z *gamma;		char x[10];
	}			}

would translate into:

	Shippable_of_C {		Symbol		Value
		Label(C);		Label(C)	&C
		Label(C);		Label(C)	&B
		Label(C);		Label(C)	&A
		Label(X);		Label(X)	&X
		Label(X);		Label(Y)	&Y
		int abc, cde;		Label(Z)	&Z
		Label(Y);
		Label(Y);
		float m, n;
		Label(Z);
		Label(Z);
		char x[10];
	}

Decomposition of this should be relatively easy.  The internal format of
Shippable_of_C is not important, so long as (2) and (6) are followed
carefully.  Note that when C enters the address of itself into the Symbol
Table, the returned value will also suffice for both B and A.  The two extra
Label(C) values in the Shippable form are necessary, though, as the
init_from_shippable() function for each object cannot make assumptions about
whether it has children or not.  Therefore, initializing C will strip the
first Label(C) and call initialize of B (which will strip the second and so
on...).

Also note that when (say) B is processing the pointer to Y, it knows it is
processing a pointer and, therefore, should expect Label(Y) on the input
stream.  It can then strip this value and enter it with the address of the Y
object that it will next allocate (with new()).  It should not call a
constructor for Y with the values in the stream until it has entered the
address of Y into the symbol table.  This is in case of circular references
within Y.  So it calls "new Y()", enters the returned value into the symbol
table, then calls "Y.init()" with the char* (void* ??) pointer to the area
after the Label(Y) to get it initialized.  Y will then strip the expected
Label(Y) and enter the following information in the byte stream into itself.
--
===================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mt. View, CA  94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
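
[Editorial sketch.]  Points 7 through 9 above can be illustrated with a
small, hedged example in present-day C++.  Nothing here is the poster's
code: the names Node, OutTable, InTable, put and get are hypothetical, the
flag values are arbitrary, and the sketch assumes sender and receiver share
byte order and type sizes (as point 5 assumes).  It shows one way the New /
Seen_Before scheme might ship a circular pointer chain, entering each object
in the symbol table before its pointers are followed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <unordered_map>
#include <vector>

// All names are hypothetical; this is a sketch under stated assumptions.
using Bytes = std::vector<unsigned char>;

enum Flag : unsigned char { kNull = 0, kNew = 1, kSeenBefore = 2 };

struct Node {
    int value = 0;
    Node* next = nullptr;
};

// Sender-side symbol table: memory address -> symbol.
struct OutTable {
    std::unordered_map<const void*, std::uint32_t> symbols;
};

// Receiver-side symbol table: symbol -> newly allocated object.
struct InTable {
    std::unordered_map<std::uint32_t, Node*> objects;
};

// Append the raw bytes of a base-type value to the stream.
template <typename T>
void put(Bytes& out, const T& v) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&v);
    out.insert(out.end(), p, p + sizeof(T));
}

// Read a base-type value from its well-known spot in the stream.
template <typename T>
T get(const Bytes& in, std::size_t& pos) {
    assert(pos + sizeof(T) <= in.size());
    T v;
    std::memcpy(&v, in.data() + pos, sizeof(T));
    pos += sizeof(T);
    return v;
}

void build_shippable(const Node* n, OutTable& table, Bytes& out) {
    if (n == nullptr) { put(out, kNull); return; }
    auto found = table.symbols.find(n);
    if (found != table.symbols.end()) {
        put(out, kSeenBefore);          // seen before: ship only the symbol
        put(out, found->second);
        return;
    }
    std::uint32_t symbol = static_cast<std::uint32_t>(table.symbols.size());
    table.symbols.emplace(n, symbol);   // enter BEFORE following pointers
    put(out, kNew);
    put(out, symbol);
    put(out, n->value);
    build_shippable(n->next, table, out);
}

Node* init_from_shippable(const Bytes& in, std::size_t& pos, InTable& table) {
    Flag flag = get<Flag>(in, pos);
    if (flag == kNull) return nullptr;
    std::uint32_t symbol = get<std::uint32_t>(in, pos);
    if (flag == kSeenBefore) return table.objects.at(symbol);
    Node* n = new Node;
    table.objects.emplace(symbol, n);   // register before initializing
    n->value = get<int>(in, pos);
    n->next = init_from_shippable(in, pos, table);
    return n;
}
```

Because the address is registered before the next pointer is followed, a
ring of nodes terminates at the first Seen_Before symbol instead of looping.
The recursion depth grows with chain length, so a real implementation would
probably want an explicit work list.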
richard@pantor.UUCP (Richard Sargent) (11/16/89)
[edited quote follows]

> From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
> Newsgroups: comp.lang.c++,comp.object
> Subject: Shippable C++ Objects (RFC)
> Message-ID: <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET>
> Date: 15 Nov 89 18:10:37 GMT
>
> Request for Comments
> (*please post*)
>
> Based on previous discussions within these groups and some work I am
> currently doing, I'd like to get some comments (please post) on the
> following ideas for the shipment of C++ objects between processes (and, by
> implication, the persistence of the object).  Hopefully, this will
> generate some interesting discussions.  Below I list some
> ideas/requirements for the shippability between processes of general
> objects in C++:
> ...
>
> 3.  Assume each class of objects has methods for filling (or constructing)
> itself from the shippable byte array representation made in (2).  ...

Here's the nub of the problem!  The rest of the proposal is quite
straightforward.  In fact, I believe that I read a paper on this in one of
the OOPSLA or C++ Workshop proceedings.

"The object can reconstruct itself from the byte stream" requires that the
object already exist.  It still seems to me that you *have* to have a case
statement somewhere which knows about all (interesting) object classes.
The cases create each new type of object, probably using a class
constructor which has a ByteStreamRepresentation argument.

The created object is then added into the receiving program's organization.
How to do this from a general purpose class library without encoding
knowledge of the application is another (lesser) problem.

The real trick will be to eliminate the need for such a switch!  If anyone
figures this out for C++, I'm waiting with bated breath for the answer.

...
> --
> ===================================================================
> David Masterson					Consilium, Inc.
> uunet!cimshop!davidm				Mt. View, CA  94043
> ===================================================================
> "If someone thinks they know what I said, then I didn't say it!"

I sure would like to see a viable general purpose solution to this problem
in C++.  I'm afraid that the solution will require a lot of the facilities
from Smalltalk.

Richard Sargent                   Internet: richard@pantor.UUCP
Systems Analyst                   UUCP:     uunet!pantor!richard
peterd@cs.washington.edu (Peter C. Damron) (11/17/89)
In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>... I'd like to get some comments (please post) on the following ideas for
>the shipment of C++ objects between processes (and, by implication, the
>persistence of the object).

>1.  Assume an Event interface between processes.  Therefore, when shipping
>an object, you can be very sure that something on the other side (in this
>case, a callback) will understand what it is because of the event type.

How does the event type relate to the object type/class?
How do you assure that the object methods are the same in both processes?
I assume that the different processes are in different address spaces.

>2.  Assume each class of objects has methods for building a shippable byte
>array representation of the object ...

>3.  Assume each class of objects has methods for filling (or constructing)
>itself from the shippable byte array representation made in (2)...

It sounds to me like you are implementing write() and read() in 2 & 3.

>4.  All parts of an object must be handled in (2) and (3), even to the
>point of just deciding that part of an object doesn't need to be shipped.
>Therefore, even memory pointers must be resolved in the processing of (2)
>and (3).

If you want to ship a single node, do you have to ship the whole graph
that contains it?

>7.  Memory pointers are the special case.  In the process of encountering a
>memory pointer, converting to shippable form seems to require a need for a
>symbol table...

Now this is the hard part.  What you want is remote & local references
to objects (everything is an object right?).

>9.  Reconstructing the object on the other side would mean nothing more
>than copying well-known values from the byte stream into their proper
>places and doing some special tracking of pointer symbols in the byte
>stream to make sure they get relinked properly...

It sounds like you are limiting the shippable format to be an LL(0)
language.  This may be too restrictive.

>10.  If the methods for build_shippable() and init_from_shippable() are
>virtual on the object, then object type for the shippable form will have to
>be correct even if the object referred to when build_shippable() is invoked
>is of pointer to parent type.  Because the object type will be correct,
>then the proper event callback on the other side will get invoked.

How do you represent virtual pointers across processes?
I get the feeling you thought about data but not code.

I suggest you read about systems that have already tried to deal with
distributed objects, like Eden and Emerald here at University of
Washington.  Try the references below for starters.

Hope this helps,
Peter.

---------------
Peter C. Damron
Dept. of Computer Science, FR-35
University of Washington
Seattle, WA  98195
peterd@cs.washington.edu
{ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd
---------------

%T Distribution and Abstract Types in Emerald
%A Andrew P. Black
%A Norman C. Hutchinson
%A Eric Jul
%A Henry M. Levy
%A L. Carter
%J IEEETSE
%V 13
%N 1
%D JAN 1987

%T Fine-Grained Mobility in the Emerald System
%A Eric Jul
%A Henry M. Levy
%A Norman C. Hutchinson
%A Andrew P. Black
%J TOCS
%D FEB 1988

%T Replication in Distributed Systems: The Eden Experience
%A Jerre D. Noe
%A Andrew Proudfoot
%A Calton Pu
%J PROC Fall Joint Computer CONF (FJCC)
%C Dallas, TX
%D NOV 1986
%P 1197-1208

%T The Architecture of the Eden System
%A Edward D. Lazowska
%A Henry M. Levy
%A Guy T. Almes
%A Michael J. Fischer
%A Robert J. Fowler
%A Stephen C. Vestal
%J PROC 8th SYMP on Operating Systems Principles
%D DEC 1981
%P 148-159
db@helium.East.Sun.COM (David Brownell) (11/17/89)
In article <31.UUL1.3#5109@pantor.UUCP> richard@pantor.UUCP (Richard Sargent) writes:
>
>> From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
>> Subject: Shippable C++ Objects (RFC)
>>
>> 3.  Assume each class of objects has methods for filling (or
>> constructing) itself from the shippable byte array representation made
>> in (2).  ...
>
>Here's the nub of the problem!  The rest of the proposal is quite
>straightforward.  In fact, I believe that I read a paper on this
>in one of the OOPSLA or C++ Workshop proceedings.

It's also, surprise!, essentially the same idea that shows up in
networking for how to serialize and deserialize RPC arguments.
Commercial implementations of the idea (not in C++) include XDR, ASN.1,
and an analogue that NCS's NIDL compiles into (sorry, I forget the name).

If you've never looked at RPC systems before, do so -- you'll find that
they look very much like distributed object systems.  (Sun's RPC has a
limitation of one object per type per system, but then again any OO
programmer worth his/her salt can implement objects that manage access to
many other objects ... like NFS does for "file" objects, for example.)

>"The object can reconstruct itself from the byte stream" requires
>that the object already exist.  It still seems to me that you *have*
>to have a case statement somewhere which knows about all (interesting)
>object classes.  The cases create each new type of object, probably using
>a class constructor which has a ByteStreamRepresentation argument.

Sort of; "object" is distinct from a data format.  The process is taking
data in one format from one data store into another, and constructing a
NEW object by binding methods to the new data representation.  What's
shipped is data, not data plus code, and the new methods might not
implement the class used by the message sender.  The key points are needing
to use a constructor function, and needing to choose which of several
constructors to use.

No application will be able to understand all types/classes ... they get
incrementally added to real systems over time.  Apps need to be able to
reject or ignore message types they can't understand by examining a type
code stored early in the message; there's a "default" branch in that case
statement!

>The created object is then added into the receiving program's
>organization.  How to do this from a general purpose class library
>without encoding knowledge of the application is another (lesser)
>problem.

One interesting trick, irrelevant to today's C++, is to ship source code
implementing the class (including its deserializing code) to the message
recipient when it needs it.  This is what NeWS does, and probably some
other systems I don't know about.  (I can't see any reasonable networked
implementation of Lisp not supporting this, for example!)

This still doesn't get away from needing to know the type of the message
("source code followed by data").

David Brownell				db@east.sun.com
Sun Desktop Systems Software		sun!suneast!db
Billerica, MA
vaughan@mcc.com (Paul Vaughan) (11/17/89)
	"The object can reconstruct itself from the byte stream" requires
	that the object already exist.  It still seems to me that you *have*
	to have a case statement somewhere which knows about all
	(interesting) object classes.

Suppose that in the byte stream there is some token that uniquely
identifies the class of the object being transmitted.  Then that token
could be used to look up in a table a function to be called to create such
an object.  So, you don't have to have a case statement (other techniques
are possible), but you do have to somehow know of all interesting classes.
A further technique would be to transmit a pathname through the byte stream
and to use that to dynamically load a class definition.  It would still be
important to have a table of the classes that were already loaded.

 Paul Vaughan, MCC CAD Program | ARPA: vaughan@mcc.com | Phone: [512] 338-3639
 Box 200195, Austin, TX 78720  | UUCP: ...!cs.utexas.edu!milano!cadillac!vaughan
peterd@cs.washington.edu (Peter C. Damron) (11/18/89)
In article <1044@east.East.Sun.COM> db@helium.East.Sun.COM (David Brownell) writes:
>In article <31.UUL1.3#5109@pantor.UUCP>
>	richard@pantor.UUCP (Richard Sargent) writes:
>>> From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
>>> 3.  Assume each class of objects has methods for filling (or
>>> constructing) itself from the shippable byte array representation made
>>> in (2).  ...

>>Here's the nub of the problem! ...

>It's also, surprise!, essentially the same idea that shows up in
>networking for how to serialize and deserialize RPC arguments.

>>"The object can reconstruct itself from the byte stream" requires
>>that the object already exist.  It still seems to me that you *have*
>>to have a case statement somewhere which knows about all (interesting)
>>object classes.  The cases create each new type of object, probably using
>>a class constructor which has a ByteStreamRepresentation argument.

It seems to me that this ByteStreamRepresentation is a string language
where the objects are the words or tokens of the language.  The only way
to parse that language in which you know the type of the object before you
see it occurs if the language is LL(0) and you are using a top-down parser.
LL(0) is not a very powerful class of languages.

Given that you want to send your LL(1) or LR(1) string of objects, you
could first convert these objects into LL(0) objects by adding "syntactic
sugar" into the language.  This amounts to adding tokens/objects that tell
the type/class of the following object.  This does not eliminate the "case
statement" described above, but it potentially splits the case statement
across many syntactic sugar objects and it probably makes it easier to
adapt to changes in the language of the byte stream.

>Sort of; "object" is distinct from a data format.  The process is taking
>data in one format from one data store into another, and constructing a
>NEW object by binding methods to the new data representation.  What's
>shipped is data, not data plus code, and the new methods might not
>implement the class used by the message sender.  The key points are needing
>to use a constructor function, and needing to choose which of several
>constructors to use.

This is a good point.  In OO systems, code is bundled with objects.
If you want to maintain the object type/class across an address boundary,
then code has to get moved/referenced as well as data.

>No application will be able to understand all types/classes ... they get
>incrementally added to real systems over time...

Good point.  See above about the syntactic sugar.

The point that everyone seems to be missing in this discussion is:
Why do you want to convert objects into byte streams in the first place?
Why not just move objects "as is" to another address space?  Of course,
this implies that you have ways to address objects in another address
space, and that is another problem.

Better yet, just get rid of the different address spaces.  After all,
partitioning objects into address spaces in an object oriented system is
just an efficiency hack.  The object is the address space.

Hope this helps,
Peter.

---------------
Peter C. Damron
Dept. of Computer Science, FR-35
University of Washington
Seattle, WA  98195
peterd@cs.washington.edu
{ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd
davidm@uunet.UU.NET (David S. Masterson) (11/18/89)
In article <9832@june.cs.washington.edu> peterd@cs.washington.edu (Peter C. Damron) writes:

   In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
   >1.  Assume an Event interface between processes.

   How does the event type relate to the object type/class?
   How do you assure that the object methods are the same in both
   processes?

The natural assumption is that if I have a byte stream that represents an
object to be sent to another process, I will issue an event to that process
that will cause that process to invoke the reconstruction method.  My
assumption (perhaps invalid) was that all processes are built from the same
object library, therefore the object methods are the same in both processes
because it's the same object class.  There has to be some assumption that
both processes are part of the same project effort (in other words, they
reuse each other's code).

   >2.  Assume each class of objects has methods for building a shippable
   >byte array representation of the object ...

   >3.  Assume each class of objects has methods for filling (or
   >constructing) itself from the shippable byte array representation made
   >in (2)...

   It sounds to me like you are implementing write() and read() in 2 & 3.

Exactly!  Except that this write() and read() are to memory instead of to
disk (this should give a C++ program more flexibility in what to do with
that object).  Now the application, not the object, has the choice of where
to send the object and how to get it there (this might be wrapped in a
higher object).

   >4.  All parts of an object must be handled in (2) and (3), even to the
   >    point of just deciding that part of an object doesn't need to be
   >    shipped.

   If you want to ship a single node, do you have to ship the whole graph
   that contains it?

Would the node make sense to the receiving process without the object that
its next pointer points to?  If so, then that comes under the heading of
"doesn't need to be shipped" (at least this particular time).  However, if
you ship a node to another process without the object that is referred to
by the node's next pointer (essentially just the data within the node), is
it still a node?

   >7.  Memory pointers are the special case.

   Now this is the hard part.  What you want is remote & local references
   to objects (everything is an object right?).

I'm not sure what you mean here.  Each process will use the object in its
non-shipped, internal form, so to each process the object will be local at
the time of its reference.  It's up to the application to decide whether
there are any conflicts (from a lock standpoint) about this way of doing
business.

   >9.  Reconstructing the object on the other side would mean nothing more
   >    than copying well-known values from the byte stream into their
   >    proper places and doing some special tracking of pointer symbols in
   >    the byte stream to make sure they get relinked properly...

   It sounds like you are limiting the shippable format to be an LL(0)
   language.  This may be too restrictive.

Why?  The shippable format only exists during the move of an object from
one process to another.  Breaking an object down into simple terms for this
purpose should make it easier to work with.

   >10.  If the methods for build_shippable() and init_from_shippable() are
   >    virtual on the object, then object type for the shippable form will
   >    have to be correct even if the object referred to when
   >    build_shippable() is invoked is of pointer to parent type.  Because
   >    the object type will be correct, then the proper event callback on
   >    the other side will get invoked.

   How do you represent virtual pointers across processes?
   I get the feeling you thought about data but not code.

Since the processes are dealing with the same class of object, then virtual
pointers should take care of themselves (they must be right for the
particular process).  Code cannot be shipped, therefore the implication is
that both processes are already compiled with the same code.  Since the
"linearization" of the object by the method I have suggested deals with
each object in an object to be shipped individually, virtual pointers can
be safely ignored (they are outside the definition of an object) because
each object only deals with what it knows about.  Therefore, the callback
method via the event type will know what object needs construction and so
construct it with the proper virtual functions already there.

   I suggest you read about systems that have already tried to deal with
   distributed objects...

I'll try, but the company I work for doesn't have ready access to a
reference library.

   Peter C. Damron
   Dept. of Computer Science, FR-35
   University of Washington
   Seattle, WA  98195
   peterd@cs.washington.edu
   {ucbvax,decvax,etc.}!uw-beaver!uw-june!peterd
--
===================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mt. View, CA  94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
davidm@uunet.UU.NET (David S. Masterson) (11/18/89)
In article <31.UUL1.3#5109@pantor.UUCP> richard@pantor.UUCP (Richard Sargent) writes:

   [edited quote follows]

   > From: cimshop!davidm@uunet.UU.NET (David S. Masterson)
   ...
   >
   > 3.  Assume each class of objects has methods for filling (or
   > constructing) itself from the shippable byte array representation made
   > in (2).  ...

   Here's the nub of the problem!  ...

   The real trick will be to eliminate the need for [...] a switch!  If
   anyone figures this out for C++, I'm waiting with bated breath

That was why I suggested the Event interface (a la X windows) as the method
for passing things to new processes.  Each Event would be registered with
an appropriate callback (which might be wrappable as an object).
Therefore, whenever an application takes on new functionality, it acquires
a new callback which would be registered with the event processor in the
usual fashion.  So, a program capable of accepting an object of (say)
XClass would have already registered the construction callback
XClassConstruct with its Event processing mechanism.
--
===================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mt. View, CA  94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
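
[Editorial sketch.]  The registration idea in this post can be illustrated
as a table from event type to construction callback, which replaces the
central switch.  This is a minimal sketch in present-day C++ under loud
assumptions: XClass and XClassConstruct are the names used in the post, but
their contents, the EventProcessor class, the integer event code
kXClassEvent, and the use of a decimal string as a stand-in for the
shippable byte array are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Common base so the event processor can hand back any constructed object.
struct Shippable {
    virtual ~Shippable() = default;
};

// Hypothetical contents for the XClass named in the post.
struct XClass : Shippable {
    int value = 0;
};

using EventType = int;
using ConstructCallback = std::unique_ptr<Shippable> (*)(const std::string& bytes);

class EventProcessor {
public:
    // An application gaining new functionality registers one more callback.
    void register_callback(EventType type, ConstructCallback cb) {
        callbacks_[type] = cb;
    }

    // Look up the callback for the event type; unknown events yield nullptr
    // so the receiver can ignore or reject them.
    std::unique_ptr<Shippable> dispatch(EventType type,
                                        const std::string& bytes) const {
        auto it = callbacks_.find(type);
        if (it == callbacks_.end()) return nullptr;
        return it->second(bytes);
    }

private:
    std::map<EventType, ConstructCallback> callbacks_;
};

const EventType kXClassEvent = 1;  // hypothetical event code

// Construction callback: builds an XClass from its (stand-in) shipped form.
std::unique_ptr<Shippable> XClassConstruct(const std::string& bytes) {
    assert(!bytes.empty());
    std::unique_ptr<XClass> obj(new XClass);
    obj->value = std::stoi(bytes);
    return std::unique_ptr<Shippable>(std::move(obj));
}
```

A receiver would call register_callback(kXClassEvent, &XClassConstruct)
once at startup and then pass each incoming event to dispatch().  The table
still has to list every interesting class; it only moves that knowledge out
of a hand-written case statement.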
dld@F.GP.CS.CMU.EDU (David Detlefs) (11/18/89)
Richard Sargent and Paul Vaughan, among others, have been having a
discussion under this Subject, in which there is a shared assumption
something like:

  When you ship an object across a network, the object needs to somehow
  contain information describing its type; otherwise, it cannot be
  reconstructed.

I think this is a (somewhat) false assumption.  Consider a strongly
typed RPC interface.  The sender must send the right type of object,
or else type-checking would fail.  The receiver knows the expected
type of the RPC argument, and can use his knowledge of the type
(at compile time) to reconstruct the object from the bit-stream.  As
David Brownell pointed out, all RPC systems solve this problem.
Masterson's proposal goes farther in that it posits shipping the
transitive pointer-closure of the object, while all RPC systems that I
know about require objects with "in-line" data.
--
Dave Detlefs            Any correlation between my employer's opinion
Carnegie-Mellon CS      and my own is statistical rather than causal,
dld@cs.cmu.edu          except in those cases where I have helped to
                        form my employer's opinion.  (Null disclaimer.)
tuck@zeta.cs.unc.edu (Russ Tuck) (11/19/89)
> (Original heading and author lost.  Sorry.)
> "The object can reconstruct itself from the byte stream" requires
> that the object already exist.  It still seems to me that you *have*
> to have a case statement somewhere which knows about all (interesting)
> object classes.

As I understand it (and I'm sure someone will point it out if I'm wrong),
the receiving program only needs to know about the *top-level* classes it
receives.  Those top-level classes will take care of the other classes they
may contain (i.e., as member data), automatically and transparently to the
receiving program.

So there's a case statement or call table containing the few classes the
receiving program must know about, but there's no need for the receiving
program to know about the lower-level classes used by the implementation of
those top-level classes.  This is just as a called subroutine must know
about the classes named in its argument list.

Russ Tuck                               tuck@cs.unc.edu
UNC Dept. of Computer Science           ...!mcnc!unc!tuck
CB# 3175 Sitterson Hall
Chapel Hill, NC 27599-3175, USA         (919) 962-1755 or 962-1932
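
[Editorial sketch.]  The point about top-level classes can be shown with a
small, hedged example in present-day C++.  The class names TopLevel and
Inner, the whitespace-separated text format, and the assumption that the
label contains no spaces are all hypothetical choices made for
illustration.  The receiver calls only TopLevel::read(); the member class
is reconstructed by the top-level class itself.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Lower-level class: the receiving program never names it directly.
class Inner {
public:
    Inner() = default;
    Inner(int a, int b) : a_(a), b_(b) {}
    void write(std::ostream& out) const { out << a_ << ' ' << b_ << ' '; }
    void read(std::istream& in) { in >> a_ >> b_; }
    int sum() const { return a_ + b_; }

private:
    int a_ = 0;
    int b_ = 0;
};

// Top-level class: the only one the receiving program must know about.
class TopLevel {
public:
    TopLevel() = default;
    TopLevel(const std::string& label, Inner inner)
        : label_(label), inner_(inner) {}

    // Writes its own data, then asks the member to write itself.
    void write(std::ostream& out) const {
        out << label_ << ' ';
        inner_.write(out);
    }

    // Reads in the same order the data was written.
    void read(std::istream& in) {
        in >> label_;
        inner_.read(in);
        assert(!in.fail());  // the stream must hold a complete object
    }

    const std::string& label() const { return label_; }
    int inner_sum() const { return inner_.sum(); }

private:
    std::string label_;
    Inner inner_;
};
```

The call table on the receiving side would need an entry for TopLevel only;
adding or changing member classes would not touch it.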
davidm@uunet.UU.NET (David S. Masterson) (11/20/89)
In article <4042@cadillac.CAD.MCC.COM> vaughan@mcc.com (Paul Vaughan) writes:
Suppose that in the byte stream there is some token that uniquely
identifies the class of the object being transmitted. Then that
token could be used to look up in a table a function to be called to
create such an object. So, you don't have to have a case statement
(other techniques are possible), but you do have to somehow know of
all interesting classes. A further technique would be to transmit a
pathname through the byte stream and to use that to dynamically load a
class definition. It would still be important to have a table of the
classes that were already loaded.
An interesting idea! Is the capability of run-time loading of libraries of
objects supported within C++ yet? I would think that, in order for a program
to make use of an object, its declaration would have to be known to the
program at compile time. Therefore, I don't see how this could be done. :-(
--
===================================================================
David Masterson Consilium, Inc.
uunet!cimshop!davidm Mt. View, CA 94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
graham@cabernet.newcastle.ac.uk (Graham D. Parrington) (11/20/89)
dld@F.GP.CS.CMU.EDU (David Detlefs) writes:

>I think this is a (somewhat) false assumption.  Consider a strongly
>typed RPC interface.  The sender must send the right type of object,
>or else type-checking would fail.  The receiver knows the expected
>type of the RPC argument, and can use his knowledge of the type
>(at compile time) to reconstruct the object from the bit-stream.  As
>David Brownell pointed out, all RPC systems solve this problem.
>Masterson's proposal goes farther in that it posits shipping the
>transitive pointer-closure of the object, while all RPC systems that I
>know about require objects with "in-line" data.

Dave is missing one vital point about O-O systems here: inheritance.  Just
because I declare a routine to accept an X at compile time does not mean
that I am not free to pass a Y or Z in its place (provided Y and Z are
derived from X).  The implication of this (at least in a C++ context) is
that the receiver will construct the wrong receiving type (always a base
type and never the correct derived type) and hence all virtual functions
will be dispatched incorrectly (from the sender's point of view).  Thus
local and distributed versions of the same program will exhibit different
behaviour!

Graham Parrington, Computing Laboratory, University of Newcastle upon Tyne
ARPA  = Graham.Parrington%newcastle.ac.uk@nsfnet-relay.ac.uk
UUCP  = ...!ukc!newcastle.ac.uk!Graham.Parrington
PHONE = +44 91 222 8067
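
[Editorial sketch.]  The inheritance problem described in this post can be
reproduced in a few lines of present-day C++.  Everything here is a
hypothetical illustration: the classes X and Y, the text wire form
"<tag> <id>", and the two receiver functions are assumptions, not any real
RPC system's API.  One receiver trusts only the declared parameter type,
the other reads a type tag from the stream.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct X {
    int id = 0;
    virtual ~X() = default;
    virtual std::string name() const { return "X"; }
    // Hypothetical wire form: "<tag> <id>".
    virtual std::string ship() const { return "X " + std::to_string(id); }
};

struct Y : X {
    std::string name() const override { return "Y"; }
    std::string ship() const override { return "Y " + std::to_string(id); }
};

// Extract the id that follows the tag.
int parse_id(const std::string& wire) {
    std::string::size_type space = wire.find(' ');
    assert(space != std::string::npos);
    return std::stoi(wire.substr(space + 1));
}

// Receiver that relies on the compile-time parameter type alone:
// it always builds the base class, whatever was actually sent.
std::unique_ptr<X> receive_by_static_type(const std::string& wire) {
    std::unique_ptr<X> obj(new X);
    obj->id = parse_id(wire);
    return obj;
}

// Receiver that inspects the tag and builds the matching derived class.
std::unique_ptr<X> receive_by_tag(const std::string& wire) {
    std::unique_ptr<X> obj;
    if (wire.compare(0, 2, "Y ") == 0) {
        obj.reset(new Y);
    } else {
        obj.reset(new X);
    }
    obj->id = parse_id(wire);
    return obj;
}
```

Shipping a Y through a routine declared to take an X gives name() == "X"
with the first receiver and "Y" with the second, so only the tagged version
behaves the same as the local program.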
peterson@choctaw.csc.ti.com (Bob Peterson) (11/20/89)
In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>
>Based on previous discussions within these groups and some work I am currently
>doing, I'd like to get some comments (please post) on the following ideas for
>the shipment of C++ objects between processes (and, by implication, the
>persistence of the object).
>

You really should visit a library and review the last few proceedings of relevant conferences and workshops, e.g., ACM OOPSLA Conferences since 1986, the OODB Workshops (one in '86 in California, and a second in '88 in Germany), and recent ACM SIGMOD Conferences.

>2. Assume each class of objects has methods for building a shippable byte
>array representation of the object and returning a pointer to that shippable
>form.
>
>3. Assume each class of objects has methods for filling (or constructing)
>itself from the shippable byte array representation made in (2).

Seems to me that much of this code should be encapsulated in a translation class, with an object containing only a description of its type. The description is passed to the translation routines. Now a class implementor doesn't have to (again) write the same conversion routines, but simply write a type description. Even writing the type description could be automated!

>7. Memory pointers are the special case. ...
> Basically, the value of the memory pointer would be given to
>the symbol table which would return a symbol for that memory pointer (either a
>new one or the previous one).
>

This assumes that a memory address is constant over the life of the program. Is this a valid assumption? For example, a String class might reallocate storage if a string changes size, resulting in a different memory address for what, logically, is the same entity. Sending the same object twice should result in the same destination structure, without orphaned storage, regardless of reallocations that may happen.
If a different memory address results in a different symbol, the receiving end can't recognize that the original value is no longer needed.

How do you prevent your transfer mechanism from shipping senseless values, i.e., values that make no sense to the receiving process? Examples might be a window object, or an I/O buffer, or a hash table maintained as redundant data purely for performance reasons.

These and other identity issues are discussed in the paper, "Object Identity," by Khoshafian and Copeland in the _OOPSLA '86 Conference Proceedings_ (published as the November 1986 (Volume 21, Number 11) issue of ACM SIGPLAN Notices), page 406. This article is a good discussion of the identity issue.

>===================================================================
>David Masterson Consilium, Inc.
>uunet!cimshop!davidm Mt. View, CA 94043
>===================================================================
>"If someone thinks they know what I said, then I didn't say it!"

Bob

Bob Peterson Compuserve: 70235,326 Expressway Site, Texas Instruments USENET: peterson@csc.ti.com North Building, P.O. Box 655474, MS238 (214) 995-6080 2nd Floor, Dallas, Texas, USA 75265 CSC Aisle C3
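One way to picture the "type description plus translation routines" suggestion is the sketch below. Everything in it is an assumption made for illustration: the FieldDesc layout, the encode/decode routines, and the Point class are hypothetical, and the raw byte copy presumes both processes share byte order and member sizes. It covers flat members only; pointer members would still need something like the symbol table under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description format: one entry per shippable member.
struct FieldDesc {
    std::size_t offset;  // byte offset of the member within the object
    std::size_t size;    // number of bytes to copy
};

// Generic translation routines: they know nothing about any particular
// class, only what the description tells them.
std::vector<unsigned char> encode(const void* obj,
                                  const std::vector<FieldDesc>& desc) {
    std::vector<unsigned char> out;
    const unsigned char* base = static_cast<const unsigned char*>(obj);
    for (const FieldDesc& f : desc)
        out.insert(out.end(), base + f.offset, base + f.offset + f.size);
    return out;
}

void decode(void* obj, const std::vector<FieldDesc>& desc,
            const std::vector<unsigned char>& in) {
    unsigned char* base = static_cast<unsigned char*>(obj);
    std::size_t pos = 0;
    for (const FieldDesc& f : desc) {
        assert(pos + f.size <= in.size());   // stream must hold every field
        std::memcpy(base + f.offset, in.data() + pos, f.size);
        pos += f.size;
    }
}

// The class implementor writes only the description.
struct Point {
    std::int32_t x;
    std::int32_t y;
    void* cache;   // left out of the description: not meaningful elsewhere
};

const std::vector<FieldDesc>& point_description() {
    static const std::vector<FieldDesc> d = {
        {offsetof(Point, x), sizeof(std::int32_t)},
        {offsetof(Point, y), sizeof(std::int32_t)},
    };
    return d;
}
```

Leaving `cache` out of the description is how this sketch keeps a "senseless" value from being shipped: the choice is made once, in the description, rather than in hand-written conversion code.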
kipp@warp.sgi.com (Kipp Hickman) (11/21/89)
To eliminate the ``switch'' implied by the object reconstruction mechanism,
all you need is a dictionary. The key is the tag which identifies the object
in its byte stream form (a string works nicely). The value is a pointer to
a function to perform the reconstruction. Usually, the function is encoded
thus:
void (*reader)(ObjectReader& source, void* where);
The "source" argument specifies the source of the byte stream, and provides
operations to retrieve data (could be a stream, for instance). The "where"
argument provides an (optional) address where the object should be placed,
for the cases where the object is imbedded, and shouldn't allocate its
own memory. In C++ 2.0, you implement the above function as follows:
void FooReader(ObjectReader& source, void* where)
{
    // Placement new: construct the Foo from the stream at the address given.
    new(where) Foo(source);
}
Lots of details left as an exercise to the reader... :-)
kipp hickman
silicon graphics inc.
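The dictionary idea can be sketched as follows. The reader signature is the one given in the post; everything else is an assumption for illustration: this ObjectReader is a hypothetical stand-in that parses whitespace-separated text, and Foo, readers(), and reconstruct() are invented names. The sketch handles only the embedded case, where the caller supplies storage.

```cpp
#include <cassert>
#include <map>
#include <new>
#include <sstream>
#include <string>

// Hypothetical stand-in for the ObjectReader mentioned in the post.
class ObjectReader {
public:
    explicit ObjectReader(const std::string& bytes) : in_(bytes) {}
    std::string readTag() { std::string t; in_ >> t; return t; }
    int readInt() { int v = 0; in_ >> v; return v; }
private:
    std::istringstream in_;
};

// Hypothetical shippable class that can build itself from a reader.
struct Foo {
    int value;
    explicit Foo(ObjectReader& source) : value(source.readInt()) {}
};

typedef void (*Reader)(ObjectReader& source, void* where);

void FooReader(ObjectReader& source, void* where) {
    assert(where != nullptr);    // this sketch requires caller storage
    new (where) Foo(source);     // construct in place
}

// The dictionary: tag string -> reconstruction function.
std::map<std::string, Reader>& readers() {
    static std::map<std::string, Reader> table;
    return table;
}

// Reads the tag, looks up its reader, and runs it. Returns false for an
// unknown tag rather than guessing at a type.
bool reconstruct(ObjectReader& source, void* where) {
    auto it = readers().find(source.readTag());
    if (it == readers().end()) return false;
    it->second(source, where);
    return true;
}
```

After registering with `readers()["Foo"] = FooReader;`, a stream such as "Foo 42" is rebuilt without any switch statement; supporting a new class means adding one table entry.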
davidm@uunet.UU.NET (David S. Masterson) (11/22/89)
In article <98968@ti-csl.csc.ti.com> peterson@choctaw.csc.ti.com (Bob Peterson) writes:

   In article <CIMSHOP!DAVIDM.89Nov15101037@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
   >Based on previous discussions within these groups...

   You really should visit a library...

I'm getting so much good information here ;-). But I am looking around...

   >2. Assume each class of objects has methods for building a shippable byte
   >array representation of the object and returning a pointer to that
   >shippable form.
   >
   >3. Assume each class of objects has methods for filling (or constructing)
   >itself from the shippable byte array representation made in (2).

   Seems to me that much of this code should be encapsulated in a translation class, with an object containing only a description of its type. The description is passed to the translation routines. Now a class implementor doesn't have to (again) write the same conversion routines, but simply write a type description. Even writing the type description could be automated!

Having attempted to do basically what you're saying with some people before coming to the conclusions in my previous article, I really don't see how the translation of any object into a shippable form could be encapsulated. There is just too much information within an object's definition to be passed to another class. The object itself has the best understanding of what it is composed of and what would be needed to reconstruct it from the information that it contains. What form would the "description" of an object take in order to pass it to the translation routines? (I suspect the answer would be so complex that you would wind up doing most of the translation before you gave anything to the translation routines.)

   >7. Memory pointers are the special case. ...
   > Basically, the value of the memory pointer would be given to
   >the symbol table which would return a symbol for that memory pointer
   >(either a new one or the previous one).
   >

   This assumes that a memory address is constant over the life of the program. Is this a valid assumption?

I'm not sure about the life of the program, but the assumption for the symbol table is that the memory pointer would remain constant for the length of time it takes to build the shippable form of the object (the life of the symbol table). This, I think, is a valid assumption.

   How do you prevent your transfer mechanism from shipping senseless values, i.e., values that make no sense to the receiving process? Examples might be a window object, or an I/O buffer, or a hash table maintained as redundant data purely for performance reasons.

My assumption was that each object would make the distinction about what of itself is shippable and what is not (therefore, designer responsibility). However, the issue of object identity from the standpoint of working with the object after it is shipped and determining the difference between this last object shipped of class Foo and the previous object of class Foo was something I hadn't considered. I believe this is a question for the application designer, but I will look for the reference you gave.
--
===================================================================
David Masterson Consilium, Inc.
uunet!cimshop!davidm Mt. View, CA 94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
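The per-shipment symbol table described in this exchange might look like the sketch below. The Node type, the SymbolTable class, and the text tokens NEW, SEEN, and NULL are all hypothetical choices made for illustration; the thread specifies only that a pointer maps to a symbol carrying a New or Seen_Before flag. The table is created inside ship(), so addresses need to stay stable only for the duration of that one call.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical node type with one pointer member.
struct Node {
    int value;
    Node* next;
};

// Lives for a single encoding pass only.
class SymbolTable {
public:
    // Returns the symbol for p and reports whether it was newly assigned.
    int symbolFor(const void* p, bool& isNew) {
        auto it = symbols_.find(p);
        isNew = (it == symbols_.end());
        if (!isNew) return it->second;
        int sym = static_cast<int>(symbols_.size());
        symbols_[p] = sym;
        return sym;
    }
private:
    std::map<const void*, int> symbols_;
};

void shipNode(const Node* n, SymbolTable& table, std::ostream& out) {
    if (n == nullptr) { out << "NULL "; return; }
    bool isNew = false;
    int sym = table.symbolFor(n, isNew);
    if (!isNew) {                        // Seen_Before: write the symbol only
        out << "SEEN " << sym << ' ';
        return;
    }
    out << "NEW " << sym << ' ' << n->value << ' ';  // New: symbol + contents
    shipNode(n->next, table, out);       // then follow the pointer
}

std::string ship(const Node* root) {
    SymbolTable table;                   // one table per shipment
    std::ostringstream out;
    shipNode(root, table, out);
    return out.str();
}
```

For a two-node cycle a -> b -> a, the stream is "NEW 0 1 NEW 1 2 SEEN 0 ": the cycle ends at the first repeated address instead of recursing forever. Because symbols are not kept across calls, the sketch does nothing for the cross-shipment identity question raised above.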
wsmith@mdbs.UUCP (Bill Smith) (11/22/89)
>In article <4042@cadillac.CAD.MCC.COM> vaughan@mcc.com (Paul Vaughan) writes:
>
>  Suppose that in the byte stream there is some token that uniquely
>  identifies the class of the object being transmitted. Then that
>  token could be used to look up in a table a function to be called to
>  create such an object. So, you don't have to have a case statement
>  (other techniques are possible), but you do have to somehow know of
>  all interesting classes. A further technique would be to transmit a
>  pathname through the byte stream and to use that to dynamically load a
>  class definition. It would still be important to have a table of the
>  classes that were already loaded.
>
>An interesting idea! Is the capability of run-time loading of libraries of
>objects supported within C++ yet? I would think that, in order for a program
>to make use of an object, its declaration would have to be known to the
>program at compile time. Therefore, I don't see how this could be done. :-(
>
>David Masterson Consilium, Inc.
>uunet!cimshop!davidm Mt. View, CA 94043

This discussion has been wandering around an idea that I integrated into my MS thesis project, Leif. Leif only solved a problem defined in terms of C, but I think with the hints and ideas suggested thus far, it could be extended to C++ without too much work. (If an implementor of C++ wanted to integrate something like this into the language, it might be even better. I'm not prepared to get involved with something as complex as that, yet I thought I should contribute the ideas anyway.)

I'll try to give the problem statement that I tried to solve.
It's a little different than the original problem we are discussing, but they are closely related:

Problem: How does one store into a flat file a complex data structure, including arbitrary pointer references, with the provision that the file must be accessible (to some degree) arbitrarily into the future, even if the data structures of the loading program no longer match the definitions that were used to store the data?

If I abstract the term "into a file" to be "to another address space", this more closely matches the shippable C++ objects problem. This technique, I believe, also solves a problem that Dr. Stroustrup references in the first edition of the C++ book: How can a compiler dynamically load symbol tables and other language definitions to allow more flexibility and better performance with commonly used libraries?

Solution: First, define the data structures so that the type can be unambiguously detected from a pointer to the structure. In Leif, I put a word at the beginning of each structure. This was acceptable because I needed the field anyway for the most common structs in Leif. Also, I added a word that was used by the data structure traversal algorithms. In C++, the type definition field is free (just assume each class processed in this way has a virtual function for the purpose). This extra traversal field may or may not be acceptable (but remember you can't get something for nothing). It might be possible to reduce it to a single bit per record, but I'm not positive on that.

The problem is solved by traversing the structure in a fixed order(*) and then reconstructing it in the same order when the new copy is built. LL(0), LL(1) grammars and the like play no part in the algorithm and any arbitrary graph structure may be stored. At the header of the file is a description of the types stored so that they may be "formal"ly input, even if the classes describing the data are unknown to the loading program, preventing "casual" access.
I don't really want to go through all of the gory details unless requested, but it works pretty well. It's not lightning fast, but I didn't really try to optimize it for speed either. I was more interested in being able to store a parse tree from one version of Leif and be able to retrieve the text even if the parse tree structure has been altered in an incompatible way, yet if the structure is still valid all of the data can be retrieved. This feature is also valuable in any rapidly changing software environment such as an object oriented program. Otherwise, just one change in one class and the stored data goes up in smoke... (Or you have a nasty version control problem that would make the cure about as bad as the disease.)

You can get the source to Leif fairly easily (writing to leif@cs.uiuc.edu should be a way to get information). I think the module that implements this technique (and some related (IMHO) nifty activities) is in the directory dumpload. My MS thesis describes the method as part of one of its chapters but it's not a UIUC tech report yet as far as I know. :-(

Let me know if you want a more clear description of the technique, but it seems a little pretentious to go into all the details if the solution doesn't meet an essential requirement that I haven't thought of.

Bill Smith
uunet!pur-ee!mdbs!wsmith
(Not mdbs opinions...)

(*) fixed order means that the order is set deterministically by the algorithm and the structure of the data, not that different data use the same order.
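The value of putting a type description at the head of the file can be shown with a toy sketch. This is not Leif's format, which the post does not spell out; the two-line text layout, the dump/load routines, and the TokenV2 record are all hypothetical, and only integer fields without pointers are handled. The point is just that a loader reading the description first can still recover matching fields after its own structures have changed.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical file layout: a header line naming the stored fields,
// then one line of values in the same fixed order.
std::string dump(const std::vector<std::string>& fieldNames,
                 const std::vector<int>& values) {
    assert(fieldNames.size() == values.size());
    std::ostringstream out;
    for (std::size_t i = 0; i < fieldNames.size(); ++i)
        out << (i ? " " : "") << fieldNames[i];
    out << '\n';
    for (std::size_t i = 0; i < values.size(); ++i)
        out << (i ? " " : "") << values[i];
    out << '\n';
    return out.str();
}

// The loader reads the header first, so it can pair each value with its
// name even if its own struct has gained or lost members since the dump.
std::map<std::string, int> load(const std::string& file) {
    std::istringstream in(file);
    std::string headerLine, valueLine;
    std::getline(in, headerLine);
    std::getline(in, valueLine);
    std::istringstream names(headerLine), values(valueLine);
    std::map<std::string, int> fields;
    std::string name;
    int value = 0;
    while (names >> name && values >> value) fields[name] = value;
    return fields;
}

// A newer version of a hypothetical record: "depth" was dropped and
// "column" was added after the file was written.
struct TokenV2 {
    int line;
    int column;
};

TokenV2 tokenFromFields(const std::map<std::string, int>& f) {
    TokenV2 t = {0, 0};   // defaults for members the file does not have
    auto it = f.find("line");
    if (it != f.end()) t.line = it->second;
    it = f.find("column");
    if (it != f.end()) t.column = it->second;
    return t;
}
```

A file written with fields "line" and "depth" still yields its "line" value when loaded into TokenV2; the dropped field is ignored and the added one takes its default.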
davidm@uunet.UU.NET (David S. Masterson) (11/28/89)
In my previous article, I requested comments on an idea for shipping objects between processes. Having had a little time to play with the idea, I have found a flaw in the methodology.

The problem concerns container classes of virtual objects: a receiving process would not know what type of object was contained within the container it was requested to reconstruct. The wish is that the container class should only know about the base virtual object that it might contain -- not any of the possible derived children. Part of my idea was based on the feeling that a receiving process, once informed of the incoming object type, would know what to do with the incoming stream. The virtualness of internal objects in the stream means there are more decision points than this idea could handle.

Oh well, back to the drawing board...

...maybe extending the event callback methodology for both internal and external events... `:-/
--
===================================================================
David Masterson Consilium, Inc.
uunet!cimshop!davidm Mt. View, CA 94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
vasey@gallo.ACA.MCC.COM (Ron Vasey) (11/29/89)
[Posted for Bryan Boreham -- please reply to him.]
---
I have been reading the discussion about shipping C++ objects with some interest, and I feel that people would be interested in hearing about the system used in the ET++ class library.

In ET++, almost everything inherits from class Object, and Object declares these methods:

    virtual ostream& PrintOn (ostream&s);
    virtual istream& ReadFrom(istream &);

The implementation of PrintOn in Object merely prints out the class name; each subclass of Object overrides this to output its instance variables, and call the superclass version. Here is (a contrived) one for a class Shape:

    ostream &Shape::PrintOn(ostream &s)
    {
        Object::PrintOn(s);
        return s << contentRect << pattern;
    }

Also in Object, there is a virtual method IsA(), which returns a pointer to a meta-class structure that knows things like the name of the class. One such meta-class object is constructed statically at program startup for each subclass of Object; the programmer must write a simple line in her code to invoke this.

More complicated objects have more sophisticated PrintOn methods, and their corresponding ReadFrom. For example, class Collection outputs each object it holds in turn.

During the outputting of a structure, the meta-class objects build a table of all the objects output so far, and when the same one comes round again, they just output an index number. A sample output of a collection of shapes might be:

    {Collection #0 4
        {Shape #0 (20,40) (80,80) }
        {Shape #1 (0,40) (20,100) }
        {Shape #2 (30,0) (0,50) }
        {Shape #1}
    }

This is slightly simplified over a real ET++ output. Here, the second shape appears twice in the collection, but is only output in full once. In this way, arbitrary cyclic structures are linearised and converted into a form suitable to be transmitted over any channel, or stored in a file.
For input, the routine to read in an Object* looks for the '{', then finds the right meta-class; it constructs a blank object, and then calls the virtual ReadFrom method on that blank object. Our Collection will read the number of rows, and then read in each one separately. The braces "{}" serve to make sure that nothing has been dropped, and help humans to read the output.

If, when reading in a new object, the system cannot find a meta-class with the right name, it tries to dynamically load that class into the running program. This is (at present) quite simplistic; for class FooBar, it looks for a file FooBar.o.

So, there is a table containing a meta-class object for every class in the system; this table (and the program) can be expanded by dynamic linking; the system requires some work on the part of the programmer to set up the meta-class objects and to write the PrintOn/ReadFrom methods, and it all works pretty well.

ET++ was written at the University of Zurich; its main function is as a toolkit for Macintosh-like interactive programs; it is in the public domain and you can compile it with cfront 1.2 or g++ 1.36.1, if you are not faint of heart.

I hope this is of interest,

Bryan Boreham  bryan@kewill.uucp
Software Engineer || bryan%kewill@uunet.uu.net
Kewill Systems PLC || ... uunet!mcvax!ukc!root44!kewill!bryan
Walton-On-Thames Surrey, England  Telephone: (+44) 932 248 328
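The output side of this scheme can be approximated in a few lines. The sketch below is not ET++ code: the OutputTable struct, the ClassName/PrintContents split, and the two-argument PrintOn are assumptions made to keep the example self-contained with a current standard library, and the meta-class objects are replaced by a plain table passed along with the stream. Only the brace-and-index output shape is taken from the sample above.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

class Object;

// Hypothetical per-output table of objects already written.
struct OutputTable {
    std::map<const Object*, int> index;    // object -> its number
    std::map<std::string, int> nextIndex;  // class name -> next free number
};

class Object {
public:
    virtual ~Object() {}
    virtual const char* ClassName() const = 0;
    virtual void PrintContents(std::ostream& s, OutputTable& t) const = 0;

    // Writes the full form the first time, "{Class #n}" afterwards.
    void PrintOn(std::ostream& s, OutputTable& t) const {
        auto it = t.index.find(this);
        if (it != t.index.end()) {
            s << '{' << ClassName() << " #" << it->second << "} ";
            return;
        }
        int n = t.nextIndex[ClassName()]++;
        t.index[this] = n;
        s << '{' << ClassName() << " #" << n << ' ';
        PrintContents(s, t);
        s << "} ";
    }
};

class Shape : public Object {
public:
    Shape(int x, int y) : x_(x), y_(y) {}
    const char* ClassName() const override { return "Shape"; }
    void PrintContents(std::ostream& s, OutputTable&) const override {
        s << '(' << x_ << ',' << y_ << ") ";
    }
private:
    int x_, y_;
};

// Knows only about Object*, never about concrete element classes.
class Collection : public Object {
public:
    void Add(const Object* o) { items_.push_back(o); }
    const char* ClassName() const override { return "Collection"; }
    void PrintContents(std::ostream& s, OutputTable& t) const override {
        s << items_.size() << ' ';
        for (std::size_t i = 0; i < items_.size(); ++i)
            items_[i]->PrintOn(s, t);
    }
private:
    std::vector<const Object*> items_;
};
```

A Collection holding shape a once and shape b twice prints b in full once and then as "{Shape #1}". Because every element writes its own class name, a reader can pick the right constructor per element, which is the decision point a container of virtual objects needs.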