thomasw@hpcupt1.cup.hp.com (Thomas Wang) (05/07/91)
I am looking at the issues of value semantic versus reference semantic. The value semantic says that a := b; means the value of 'b' is copied to 'a'. The reference semantic says that a := b; means 'a' is pointing to the same place as 'b'. It has occurred to me that value semantic is equivalent to reference semantic if the value of 'b' can never change. So one can use reference semantic on immutable classes, and expect them to be well behaved. However, it seems the running speed of program using immutable classes can be slower than an equivalent program using mutable classes, since so many garbage objects can be generated by the immutable version. The question is 'are there optimization algorithms that can improve the performance of immutable classes?' -Thomas Wang (Everything is an object.) wang@hpdmsjlm.cup.hp.com thomasw@hpcupt1.cup.hp.com
sakkinen@jyu.fi (Markku Sakkinen) (05/14/91)
In article <68780001@hpcupt1.cup.hp.com> thomasw@hpcupt1.cup.hp.com (Thomas Wang) writes: > >I am looking at the issues of value semantic versus reference semantic. > >The value semantic says that a := b; means the value of 'b' is copied >to 'a'. > >The reference semantic says that a := b; means 'a' is pointing to the >same place as 'b'. Right, and note that almost all OOPL's offer only reference semantics for objects. C++ is one of the refreshing exceptions. >It has occurred to me that value semantic is equivalent to reference >semantic if the value of 'b' can never change. So one can use reference >semantic on immutable classes, and expect them to be well behaved. To avoid all ambiguity, it might be better to speak of immutable _objects_. I must admit another good point in C++ (newer versions): immutability ('const') is not an attribute of a class but rather of a variable or object. There seems to be the bad point in Smalltalk that one really cannot define sensible immutable classes, because there is no distinction between the initialisation and the modification of an object. (Smalltalk experts: is there a way around this problem?) In reference-based languages, for an assignment to be equivalent to value semantics, it is usually not sufficient for the assigned object to be immutable itself; it must be "recursively immutable" in the worst case. But note that with reference semantics there is even no general definition of the "value" of an object: usually it is something between "shallow value" and "deep value". >However, it seems the running speed of program using immutable classes >can be slower than an equivalent program using mutable classes, since so many >garbage objects can be generated by the immutable version. >The question is 'are there optimization algorithms that can improve >the performance of immutable classes?' There should be people in this group who know some answers to the question, but so far I haven't seen any. However, I have some doubts about the relevance of the problem: 1. If you are modelling mutable entities with your software, why should you use immutable objects in the first place? 2. If you really need immutable objects, there will not be generated many garbage objects. I.e., use both mutable and immutable objects according to where each alternative is more appropriate. Personally, I do not believe in "applicative object-oriented languages" (if that's what you had at the back of your mind), except perhaps for very special applications (pun not originally intended, but it came for free). Markku Sakkinen Department of Computer Science and Information Systems University of Jyvaskyla (a's with umlauts) PL 35 SF-40351 Jyvaskyla (umlauts again) Finland SAKKINEN@FINJYU.bitnet (alternative network address)
eliot@cs.qmw.ac.uk (Eliot Miranda) (05/15/91)
>In article <68780001@hpcupt1.cup.hp.com> thomasw@hpcupt1.cup.hp.com (Thomas Wang) writes: >>It has occurred to me that value semantic is equivalent to reference >>semantic if the value of 'b' can never change. So one can use reference >>semantic on immutable classes, and expect them to be well behaved. This is not true of typical OOPLs without some effort. Systems which give access to an object's identity (C++ & Smalltalk) or allow enumeration of a class's instances (Smalltalk) can allow the programmer to show that two or more equi-valued immutable objects exist. In article <1991May14.093053.3017@jyu.fi> sakkinen@jytko.jyu.fi (Markku Sakkinen) writes: >To avoid all ambiguity, it might be better to speak of immutable _objects_. >I must admit another good point in C++ (newer versions): immutability >('const') is not an attribute of a class but rather of a variable or object. >There seems to be the bad point in Smalltalk that one really cannot define >sensible immutable classes, because there is no distinction between >the initialisation and the modification of an object. (Smalltalk experts: >is there a way around this problem?) Indeed. Note that in standard Smalltalk-80 & Smalltalk/V systems the symbols are a set of unique strings. These symbols are tested for equality by looking at their pointer. Message lookup matches the pointer of the message selector symbol with an occurence of the pointer in some dictionary. One simple way to implement 'private' messages is to create a private set of objects (they don't even have to be strings, they could be e.g. SmallIntegers). Within some compilation group (e.g. a class & its metaclass) certain message selectors can be taken from this private set. This means that only instances of the class or the class itself can send & get understood messages in this set. To implement truly immutable objects in Smalltalk arrange that a) the initialization message selector is private to the object's class b) override all mutating primitives with shouldNotImplement versions This gives truly immutable objects provided that you trust the object's classes to correctly initialize them & not diddle about later. To implement truly immutable values (e.g. which can be arbitrarily duplicated without being able to distinguish between equi-valued instances) a) override the == primitive with a method that compares the values of the receiver & anObject. This could be primitively implemented for simple values (e.g. strings) b) stumble across the instance enumeration facilities: someInstance nextInstance allInstances allInstancesDo: and realise you have a problem :-) Part a) is a problem in some Smalltalk virtual machines because they don't bother to look up the #== message. On these platforms you typically have to change the set of special selectors so it doesn't include #== and recompile the entire system. This ensures that the special selector send of #== which isn't looked up is nolonger generated by the compiler. Too many people assume the Smalltalk compilation system as a given (harder not to assume this in Smalltalk/V). Being open, Smalltalk-80 is much more flexible. > >In reference-based languages, for an assignment to be equivalent to >value semantics, it is usually not sufficient for the assigned object to be >immutable itself; it must be "recursively immutable" in the worst case. >But note that with reference semantics there is even no general definition >of the "value" of an object: usually it is something between "shallow value" >and "deep value". More than that. In systems like Smalltalk with many reflective facilities you must also think hard about e.g. allInstancesDo:. Its no good not being able to distinguish between two immutable objects with the same value if you can demonstrate that in fact more than one exists. e.g. in Smalltalk, if running the following | count | count := 0. ImmutableStringValue allInstancesDo: [:s| (s size = 3 and: [(s at: 1) = $Y and: [(s at: 2) = $E and: [(s at: 3) = $S]]]) ifTrue: [ count := count + 1]]. count evaluates to something more than 1 then your implementation of immutable values is flawed. I haven't thought of a good solution to this. > >>However, it seems the running speed of program using immutable classes >>can be slower than an equivalent program using mutable classes, since so many >>garbage objects can be generated by the immutable version. >>The question is 'are there optimization algorithms that can improve >>the performance of immutable classes?' Certainly. The use of hash tables to hold all instances (a la Smalltalk-80 symbol table) where the hash tables are 'weak arrays' so that instances only referenced from the table will get collected. When you want to create an instance of some immutable value class given its intended value first look in the hash table. I think the same technique could be used to solve the allInstancesDo: problem either by arranging that duplicates are never created or by ignoring instances not in the table during enumeration. >There should be people in this group who know some answers to the question, >but so far I haven't seen any. However, I have some doubts about >the relevance of the problem: >1. If you are modelling mutable entities with your software, why should > you use immutable objects in the first place? >2. If you really need immutable objects, there will not be generated > many garbage objects. > >I.e., use both mutable and immutable objects according to where >each alternative is more appropriate. Personally, I do not believe >in "applicative object-oriented languages" (if that's what you had >at the back of your mind), except perhaps for very special applications >(pun not originally intended, but it came for free). > >Markku Sakkinen We're doing work on a distributed Smalltalk and for us immutable values can be freely copied between machines, which is much cheaper than creating proxies. Consider e.g. a database that returns a string in response to a query from another machine. If the string is mutable & referenced by other objects then a proxy must be created. (If the string is not referenced by other objects it can be migrated). The proxy will reference the string via some global id. When the client wants to fetch the characters in the string a remote message send will be needed for each character (using some sort of concurrency control e.g. read & write batons in distributed filing systems, can optimize this). If the string is an immutable value then it can simply be copied. In our distributed Smalltalk we intend to arrange that substancial parts of classes (especially methods) will be 'recursively immutable' by Markku's definition. Changes to classes will be probably be done via a version system. This should make the management of a large shared code-base much simpler. -- Eliot Miranda email: eliot@dcs.qmw.ac.uk Dept of Computer Science ARPA: eliot%dcs.qmw.ac.uk@nsf.ac.uk Queen Mary Westfield College UUCP: eliot@qmw-dcs.uucp Mile End Road Fax: 081 980 6533 (+44 81 980 6533) LONDON E1 4NS Tel: 071 975 5229 (+44 71 975 5229)
bertrand@eiffel.UUCP (Bertrand Meyer) (05/19/91)
From <1991May14.093053.3017@jyu.fi> by sakkinen@jyu.fi (Markku Sakkinen): > note that almost all OOPL's offer only reference semantics > for objects. C++ is one of the refreshing exceptions. There are two categories of Eiffel types: reference types and expanded types. For assignment as well as argument passing (collectively called ``reattachment operations''), the former follow reference semantics, the latter follow copy semantics. Predefined expanded types include the arithmetic types INTEGER etc.; but given a class C, you can define the new expanded type `expanded C', or `expanded C [T, ...]' if C is generic. In other words, programmers get to control the exact semantics of reattachment - reference assignment or copy. -- -- Bertrand Meyer Interactive Software Engineering Inc., Santa Barbara bertrand@eiffel.uucp
jgk@osc.COM (Joe Keane) (05/21/91)
The whole idea of asking for all instances of a class is wrong. Such a facility should not be provided. It's part of Smalltalk, and certainly Smalltalk programmers will exploit it to do whatever nasty little tricks they feel they need to do. Fortunately, in my opinion, it's not part of C++. People complain that they want it, and even come up with situations where it appears to be useful. If you want to put such a facility in your class, that's your own business, and no one should try to stop you. But it shouldn't be a part of the language. Please note that i'm not claiming that C++ is better than Smalltalk. Let's not get into a language war. I just prefer the way it does some things. A class is an abstract entity. It defines a set of possible instances, and a protocol which works on these instances. You may have instances of this class in your workspace, on your disk, on another machine, and so on. This doesn't change the class. What does it mean to ask for the instances of a class? We may allow that the state of a class includes a list of its instances. Then the class changes when instances are created or destroyed. What's more, the `same' class on two different machines will actually be two different things. Like i said, some people may think this is a good idea, but i'd rather stick with the old abstract definition in terms of possible instances, rather than the instances which currently exist in some workspace. Perhaps for some low-level operations it may be useful to determine which instances of some class are in some area of memory. But surely this is an implementation detail, and such an ugly thing should not be exposed in a high-level language. We also want to talk about sharing value. Relating object-oriented and functional terminology, an object is a reference, which refers to some value. Values are immutable, while references are not. People have talked about immutable objects. These are a special case: since the reference always refers to the same value, we can eliminate a level of indirection, and say that the object is a value. It's certainly useful to be able to share values between different, possibly unrelated, references. You can save space, and it also helps out caching/memoization if you do that. But the same argument applies to mutable objects too. If two mutable objects happen to have the same value, they can still share space for it. Then if one of the objects is changed, you have to copy its value before making the change. This is analagous to `copy on write' for pages in a virtual memory system. Or you may want to have some sort of delta versioning scheme, if you don't expect both objects to have heavy use. Note that `object identity' is determined by the reference, not the value, so there is no problem with that. -- Joe Keane, professional C++ programmer jgk@osc.com (...!uunet!stratus!osc!jgk)
diamond@jit533.swstokyo.dec.com (Norman Diamond) (05/21/91)
In article <3683@sequent.cs.qmw.ac.uk> eliot@cs.qmw.ac.uk (Eliot Miranda) writes: >More than that. In systems like Smalltalk with many reflective facilities >you must also think hard about e.g. allInstancesDo:. Its no good not being >able to distinguish between two immutable objects with the same value if you >can demonstrate that in fact more than one exists. e.g. in Smalltalk, >if running the following > | count | > count := 0. > ImmutableStringValue allInstancesDo: [:s| > (s size = 3 > and: [(s at: 1) = $Y > and: [(s at: 2) = $E > and: [(s at: 3) = $S]]]) ifTrue: [ > count := count + 1]]. > count >evaluates to something more than 1 then your implementation of immutable values >is flawed. I haven't thought of a good solution to this. By this reasoning, your implementation of immutable values is flawed if count evaluates to something LESS than 1, too. I think there are a lot of types (or classes or many other categories of objects) for which there are an infinite number of immutable VALUES, though only a few of these values are actually assigned to objects at any given time. Now, if you want to perform some operation on the values that are actually assigned to objects, then allInstancesDo: will do that. You probably want to repeat your operation on all equal (pseudo-equivalent) instances. I don't see any problem there. If you want a count of how many objects are assigned the same value, then allInstancesDo: will not be sufficient -- but it never was. If you want to know whether any object is assigned the value, then compare count to 0. If you want to know whether a value is a value, the answer is always yes. I don't think you should have to insist on count evaluating to 1. -- Norman Diamond diamond@tkov50.enet.dec.com If this were the company's opinion, I wouldn't be allowed to post it. Permission is granted to feel this signature, but not to look at it.
sakkinen@jyu.fi (Markku Sakkinen) (05/22/91)
In article <4851@osc.COM> jgk@osc.COM (Joe Keane) writes: >The whole idea of asking for all instances of a class is wrong. Such a >facility should not be provided. This is somewhat too strong an opinion. Funny that it comes from a person working for an OODB company (right?) -- especially many object database people often think that the "class as collection" aspect is also necessary, and it is analogous to most other database models. Example advantage: - class (or schema) evolution: one often has to convert all existing instances to fit the new class definition. Example disadvantage: - clashes with access rights in a database: e.g. what is a user entitled to know about possible instances not accessible to him/her in a class? >It's part of Smalltalk, and certainly Smalltalk programmers will exploit it to >do whatever nasty little tricks they feel they need to do. Rather natural in a language in which even classes are first-rate runtime objects; e.g. the mentioned advantage for class evolution holds. (In fact one can contend that ordinary classes are the _only_ first-rate objects in Smalltalk, and both instances and metaclasses somewhat crippled, but that's another story.) >Fortunately, in my opinion, it's not part of C++. People complain that they >want it, and even come up with situations where it appears to be useful. If >you want to put such a facility in your class, that's your own business, and >no one should try to stop you. But it shouldn't be a part of the language. Perhaps it could be a part of a language, but only as an option for each class. Everybody who needs it would then not have to "roll his own". > ... >A class is an abstract entity. It defines a set of possible instances, and a > ... No, it isn't only an abstract entity even in C++ if it has "static members" (class variables). > ... >We also want to talk about sharing value. Relating object-oriented and >functional terminology, an object is a reference, which refers to some value. > ... Well ... Conceptually, from an Algol 68 -like viewpoint, you can say that. But in actual OOPL's (most certainly in C++) objects are not references but containers of values. This makes an important difference in the following. > ... >But the same argument applies to mutable objects too. If two mutable objects >happen to have the same value, they can still share space for it. Then if one >of the objects is changed, you have to copy its value before making the >change. This is analagous to `copy on write' for pages in a virtual memory > ... Not quite so, although the main principle is good. The programmer must define a "front end" class for the objects that clients are allowed to see, and a "back end" class for hidden objects to which the visible objects contain references. The back end objects then contain the actual data and reference counts, and the above tactic can be applied to them. Markku Sakkinen Department of Computer Science and Information Systems University of Jyvaskyla (a's with umlauts) PL 35 SF-40351 Jyvaskyla (umlauts again) Finland SAKKINEN@FINJYU.bitnet (alternative network address)
jgk@osc.COM (Joe Keane) (05/25/91)
In article <4851@osc.COM> some radical guy writes: >The whole idea of asking for all instances of a class is wrong. Such a >facility should not be provided. In article <1991May22.053938.20827@jyu.fi> sakkinen@jytko.jyu.fi (Markku Sakkinen) writes: >This is somewhat too strong an opinion. Funny that it comes from a person >working for an OODB company (right?) -- especially many object database >people often think that the "class as collection" aspect is also necessary, >and it is analogous to most other database models. I'd say that's what containers are for. Suppose you want to be be able to find something like resistors. If you're smart you'd put them together in a box, and maybe sort them by value. You don't find all the resistors in the universe, and then pick out those that have the right value and belong to you. If you want to find all the instances of a given class in some container, that's fine with me. We even let you find all the instances of a given class in a given database. I don't think this is a great idea, but you can do it. But how about `find all instances of a given class'? What does it mean? Find all instances in memory? In general only some of the objects you're working with are actually in main memory at a given time, and the whole point of an ODBMS is to hide whether objects are `in' or `out'. Find all instances in all databases you're connected to? Again, which databases you're connected to should not be such an important part of your state. Find all instances in all databases in existence? This makes sense, but it has some obvious practical problems. >Example advantage: > - class (or schema) evolution: one often has to convert all existing > instances to fit the new class definition. No, that's the old way to do it. Relational database users tell horror stories about how it took a day to add a column to a large table. Of course, during this time, the database is unusable. The same criticism applies at least as much to objects. Without getting to much into details, i'll just note that our product has lazy schema evolution. You can make a new version of class, but the old instances aren't converted to the new schema until they're used. >Example disadvantage: > - clashes with access rights in a database: e.g. what is a user entitled > to know about possible instances not accessible to him/her in a class? Right. On another topic, i wrote: >But the same argument applies to mutable objects too. If two mutable objects >happen to have the same value, they can still share space for it. Then if one >of the objects is changed, you have to copy its value before making the >change. This is analagous to `copy on write' for pages in a virtual memory Markku writes: >Not quite so, although the main principle is good. The programmer must >define a "front end" class for the objects that clients are allowed >to see, and a "back end" class for hidden objects to which the visible >objects contain references. The back end objects then contain the >actual data and reference counts, and the above tactic can be applied >to them. I think we're saying the same thing: `front end' = reference = mutable object `back end' = value = immuable object -- Joe Keane, amateur mathematician jgk@osc.com (...!uunet!stratus!osc!jgk)