[comp.object] value semantic versus reference semantic

thomasw@hpcupt1.cup.hp.com (Thomas Wang) (05/07/91)

I am looking at the issues of value semantic versus reference semantic.

The value semantic says that  a := b;  means the value of 'b' is copied
to 'a'.

The reference semantic says that  a := b;  means 'a' is pointing to the
same place as 'b'.

It has occurred to me that value semantic is equivalent to reference
semantic if the value of 'b' can never change.  So one can use reference
semantic on immutable classes, and expect them to be well behaved.

However, it seems the running speed of program using immutable classes
can be slower than an equivalent program using mutable classes, since so many
garbage objects can be generated by the immutable version.
The question is 'are there optimization algorithms that can improve
the performance of immutable classes?'
 

 -Thomas Wang
              (Everything is an object.)
                                                     wang@hpdmsjlm.cup.hp.com
                                                     thomasw@hpcupt1.cup.hp.com

sakkinen@jyu.fi (Markku Sakkinen) (05/14/91)

In article <68780001@hpcupt1.cup.hp.com> thomasw@hpcupt1.cup.hp.com (Thomas Wang) writes:
>
>I am looking at the issues of value semantic versus reference semantic.
>
>The value semantic says that  a := b;  means the value of 'b' is copied
>to 'a'.
>
>The reference semantic says that  a := b;  means 'a' is pointing to the
>same place as 'b'.

Right, and note that almost all OOPL's offer only reference semantics
for objects.  C++ is one of the refreshing exceptions.

>It has occurred to me that value semantic is equivalent to reference
>semantic if the value of 'b' can never change.  So one can use reference
>semantic on immutable classes, and expect them to be well behaved.

To avoid all ambiguity, it might be better to speak of immutable _objects_.
I must admit another good point in C++ (newer versions): immutability
('const') is not an attribute of a class but rather of a variable or object.
There seems to be the bad point in Smalltalk that one really cannot define
sensible immutable classes, because there is no distinction between
the initialisation and the modification of an object.  (Smalltalk experts:
is there a way around this problem?)

In reference-based languages, for an assignment to be equivalent to
value semantics, it is usually not sufficient for the assigned object to be
immutable itself;  it must be "recursively immutable" in the worst case.
But note that with reference semantics there is even no general definition
of the "value" of an object:  usually it is something between "shallow value"
and "deep value".

>However, it seems the running speed of program using immutable classes
>can be slower than an equivalent program using mutable classes, since so many
>garbage objects can be generated by the immutable version.
>The question is 'are there optimization algorithms that can improve
>the performance of immutable classes?'

There should be people in this group who know some answers to the question,
but so far I haven't seen any.  However, I have some doubts about
the relevance of the problem:
1. If you are modelling mutable entities with your software, why should
   you use immutable objects in the first place?
2. If you really need immutable objects, there will not be generated
   many garbage objects.

I.e., use both mutable and immutable objects according to where
each alternative is more appropriate.  Personally, I do not believe
in "applicative object-oriented languages" (if that's what you had
at the back of your mind), except perhaps for very special applications
(pun not originally intended, but it came for free).

Markku Sakkinen
Department of Computer Science and Information Systems
University of Jyvaskyla (a's with umlauts)
PL 35
SF-40351 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

eliot@cs.qmw.ac.uk (Eliot Miranda) (05/15/91)

>In article <68780001@hpcupt1.cup.hp.com> thomasw@hpcupt1.cup.hp.com (Thomas Wang) writes:
>>It has occurred to me that value semantic is equivalent to reference
>>semantic if the value of 'b' can never change.  So one can use reference
>>semantic on immutable classes, and expect them to be well behaved.

This is not true of typical OOPLs without some effort.  Systems which give
access to an object's identity (C++ & Smalltalk) or allow enumeration of a
class's instances (Smalltalk) can allow the programmer to show that two or
more equi-valued immutable objects exist.


In article <1991May14.093053.3017@jyu.fi> sakkinen@jytko.jyu.fi (Markku Sakkinen) writes:
>To avoid all ambiguity, it might be better to speak of immutable _objects_.
>I must admit another good point in C++ (newer versions): immutability
>('const') is not an attribute of a class but rather of a variable or object.
>There seems to be the bad point in Smalltalk that one really cannot define
>sensible immutable classes, because there is no distinction between
>the initialisation and the modification of an object.  (Smalltalk experts:
>is there a way around this problem?)

Indeed. Note that in standard Smalltalk-80 & Smalltalk/V systems the symbols
are a set of unique strings.  These symbols are tested for equality by looking
at their pointer.  Message lookup matches the pointer of the message selector
symbol with an occurence of the pointer in some dictionary.  One simple way
to implement 'private' messages is to create a private set of objects (they
don't even have to be strings, they could be e.g. SmallIntegers).  Within some
compilation group (e.g. a class & its metaclass) certain message selectors can
be taken from this private set.  This means that only instances of the class
or the class itself can send & get understood messages in this set.

To implement truly immutable objects in Smalltalk arrange that
	a) the initialization message selector is private to the object's class
	b) override all mutating primitives with shouldNotImplement versions

This gives truly immutable objects provided that you trust the object's classes
to correctly initialize them & not diddle about later.

To implement truly immutable values (e.g. which can be arbitrarily duplicated
without being able to distinguish between equi-valued instances)
	a) override the == primitive with a method that compares the values of
	   the receiver & anObject.  This could be primitively implemented for
	   simple values (e.g. strings)
	b) stumble across the instance enumeration facilities:
		someInstance nextInstance allInstances allInstancesDo:
	   and realise you have a problem :-)
Part a) is a problem in some Smalltalk virtual machines because they don't
bother to look up the #== message.  On these platforms you typically have to
change the set of special selectors so it doesn't include #== and recompile the
entire system.  This ensures that the special selector send of #== which isn't
looked up is nolonger generated by the compiler.

Too many people assume the Smalltalk compilation system as a given (harder not
to assume this in Smalltalk/V).  Being open, Smalltalk-80 is much more flexible.

>
>In reference-based languages, for an assignment to be equivalent to
>value semantics, it is usually not sufficient for the assigned object to be
>immutable itself;  it must be "recursively immutable" in the worst case.
>But note that with reference semantics there is even no general definition
>of the "value" of an object:  usually it is something between "shallow value"
>and "deep value".
More than that. In systems like Smalltalk with many reflective facilities
you must also think hard about e.g. allInstancesDo:.  Its no good not being
able to distinguish between two immutable objects with the same value if you
can demonstrate that in fact more than one exists. e.g. in Smalltalk,
if running the following

	| count |
	count := 0.
	ImmutableStringValue allInstancesDo: [:s|
		(s size = 3
		and: [(s at: 1) = $Y
		and: [(s at: 2) = $E
		and: [(s at: 3) = $S]]]) ifTrue: [
			count := count + 1]].
	count

evaluates to something more than 1 then your implementation of immutable values
is flawed.  I haven't thought of a good solution to this.
		
>
>>However, it seems the running speed of program using immutable classes
>>can be slower than an equivalent program using mutable classes, since so many
>>garbage objects can be generated by the immutable version.
>>The question is 'are there optimization algorithms that can improve
>>the performance of immutable classes?'
Certainly. The use of hash tables to hold all instances (a la Smalltalk-80
symbol table) where the hash tables are 'weak arrays' so that instances only
referenced from the table will get collected.  When you want to create an
instance of some immutable value class given its intended value first look in
the hash table.  I think the same technique could be used to solve the
allInstancesDo: problem either by arranging that duplicates are never created
or by ignoring instances not in the table during enumeration.


>There should be people in this group who know some answers to the question,
>but so far I haven't seen any.  However, I have some doubts about
>the relevance of the problem:
>1. If you are modelling mutable entities with your software, why should
>   you use immutable objects in the first place?
>2. If you really need immutable objects, there will not be generated
>   many garbage objects.
>
>I.e., use both mutable and immutable objects according to where
>each alternative is more appropriate.  Personally, I do not believe
>in "applicative object-oriented languages" (if that's what you had
>at the back of your mind), except perhaps for very special applications
>(pun not originally intended, but it came for free).
>
>Markku Sakkinen
We're doing work on a distributed Smalltalk and for us immutable values can be
freely copied between machines, which is much cheaper than creating proxies.
Consider e.g. a database that returns a string in response to a query from
another machine.  If the string is mutable & referenced by other objects then
a proxy must be created. (If the string is not referenced by other objects it
can be migrated).  The proxy will reference the string via some global id.
When the client wants to fetch the characters in the string a remote message
send will be needed for each character (using some sort of concurrency control
e.g. read & write batons in distributed filing systems, can optimize this).
If the string is an immutable value then it can simply be copied.

In our distributed Smalltalk we intend to arrange that substancial parts of
classes (especially methods) will be 'recursively immutable' by Markku's
definition.  Changes to classes will be probably be done via a version system.
This should make the management of a large shared code-base much simpler.
-- 
Eliot Miranda			email:	eliot@dcs.qmw.ac.uk
Dept of Computer Science	ARPA:	eliot%dcs.qmw.ac.uk@nsf.ac.uk
Queen Mary Westfield College	UUCP:	eliot@qmw-dcs.uucp
Mile End Road			Fax:	081 980 6533 (+44 81 980 6533)
LONDON E1 4NS			Tel:	071 975 5229 (+44 71 975 5229)

bertrand@eiffel.UUCP (Bertrand Meyer) (05/19/91)

From <1991May14.093053.3017@jyu.fi> by sakkinen@jyu.fi (Markku Sakkinen):

> note that almost all OOPL's offer only reference semantics
> for objects.  C++ is one of the refreshing exceptions.

There are two categories of Eiffel types: reference types and expanded
types. For assignment as well as argument passing (collectively
called ``reattachment operations''), the former follow
reference semantics, the latter follow copy semantics. Predefined
expanded types include the arithmetic types INTEGER etc.; but
given a class C, you can define the new expanded type `expanded C',
or `expanded C [T, ...]' if C is generic.  In other words, programmers
get to control the exact semantics of reattachment - reference
assignment or copy. 
-- 
-- Bertrand Meyer
Interactive Software Engineering Inc., Santa Barbara
bertrand@eiffel.uucp

jgk@osc.COM (Joe Keane) (05/21/91)

The whole idea of asking for all instances of a class is wrong.  Such a
facility should not be provided.

It's part of Smalltalk, and certainly Smalltalk programmers will exploit it to
do whatever nasty little tricks they feel they need to do.

Fortunately, in my opinion, it's not part of C++.  People complain that they
want it, and even come up with situations where it appears to be useful.  If
you want to put such a facility in your class, that's your own business, and
no one should try to stop you.  But it shouldn't be a part of the language.

Please note that i'm not claiming that C++ is better than Smalltalk.  Let's
not get into a language war.  I just prefer the way it does some things.

A class is an abstract entity.  It defines a set of possible instances, and a
protocol which works on these instances.  You may have instances of this class
in your workspace, on your disk, on another machine, and so on.  This doesn't
change the class.  What does it mean to ask for the instances of a class?

We may allow that the state of a class includes a list of its instances.  Then
the class changes when instances are created or destroyed.  What's more, the
`same' class on two different machines will actually be two different things.
Like i said, some people may think this is a good idea, but i'd rather stick
with the old abstract definition in terms of possible instances, rather than
the instances which currently exist in some workspace.

Perhaps for some low-level operations it may be useful to determine which
instances of some class are in some area of memory.  But surely this is an
implementation detail, and such an ugly thing should not be exposed in a
high-level language.

We also want to talk about sharing value.  Relating object-oriented and
functional terminology, an object is a reference, which refers to some value.
Values are immutable, while references are not.  People have talked about
immutable objects.  These are a special case: since the reference always
refers to the same value, we can eliminate a level of indirection, and say
that the object is a value.  It's certainly useful to be able to share values
between different, possibly unrelated, references.  You can save space, and it
also helps out caching/memoization if you do that.

But the same argument applies to mutable objects too.  If two mutable objects
happen to have the same value, they can still share space for it.  Then if one
of the objects is changed, you have to copy its value before making the
change.  This is analagous to `copy on write' for pages in a virtual memory
system.  Or you may want to have some sort of delta versioning scheme, if you
don't expect both objects to have heavy use.  Note that `object identity' is
determined by the reference, not the value, so there is no problem with that.
--
Joe Keane, professional C++ programmer
jgk@osc.com (...!uunet!stratus!osc!jgk)

diamond@jit533.swstokyo.dec.com (Norman Diamond) (05/21/91)

In article <3683@sequent.cs.qmw.ac.uk> eliot@cs.qmw.ac.uk (Eliot Miranda) writes:

>More than that. In systems like Smalltalk with many reflective facilities
>you must also think hard about e.g. allInstancesDo:.  Its no good not being
>able to distinguish between two immutable objects with the same value if you
>can demonstrate that in fact more than one exists. e.g. in Smalltalk,
>if running the following
>	| count |
>	count := 0.
>	ImmutableStringValue allInstancesDo: [:s|
>		(s size = 3
>		and: [(s at: 1) = $Y
>		and: [(s at: 2) = $E
>		and: [(s at: 3) = $S]]]) ifTrue: [
>			count := count + 1]].
>	count
>evaluates to something more than 1 then your implementation of immutable values
>is flawed.  I haven't thought of a good solution to this.

By this reasoning, your implementation of immutable values is flawed if count
evaluates to something LESS than 1, too.  I think there are a lot of types
(or classes or many other categories of objects) for which there are an
infinite number of immutable VALUES, though only a few of these values are
actually assigned to objects at any given time.

Now, if you want to perform some operation on the values that are actually
assigned to objects, then allInstancesDo: will do that.  You probably want
to repeat your operation on all equal (pseudo-equivalent) instances.  I
don't see any problem there.

If you want a count of how many objects are assigned the same value, then
allInstancesDo: will not be sufficient -- but it never was.  If you want to
know whether any object is assigned the value, then compare count to 0.
If you want to know whether a value is a value, the answer is always yes.
I don't think you should have to insist on count evaluating to 1.
--
Norman Diamond       diamond@tkov50.enet.dec.com
If this were the company's opinion, I wouldn't be allowed to post it.
Permission is granted to feel this signature, but not to look at it.

sakkinen@jyu.fi (Markku Sakkinen) (05/22/91)

In article <4851@osc.COM> jgk@osc.COM (Joe Keane) writes:
>The whole idea of asking for all instances of a class is wrong.  Such a
>facility should not be provided.

This is somewhat too strong an opinion.  Funny that it comes from a person
working for an OODB company (right?) -- especially many object database
people often think that the "class as collection" aspect is also necessary,
and it is analogous to most other database models.  Example advantage:
 - class (or schema) evolution:  one often has to convert all existing
   instances to fit the new class definition.
Example disadvantage:
 - clashes with access rights in a database:  e.g. what is a user entitled
   to know about possible instances not accessible to him/her in a class?

>It's part of Smalltalk, and certainly Smalltalk programmers will exploit it to
>do whatever nasty little tricks they feel they need to do.

Rather natural in a language in which even classes are first-rate
runtime objects; e.g. the mentioned advantage for class evolution holds.
(In fact one can contend that ordinary classes are the _only_ first-rate
objects in Smalltalk, and both instances and metaclasses somewhat crippled,
but that's another story.)

>Fortunately, in my opinion, it's not part of C++.  People complain that they
>want it, and even come up with situations where it appears to be useful.  If
>you want to put such a facility in your class, that's your own business, and
>no one should try to stop you.  But it shouldn't be a part of the language.

Perhaps it could be a part of a language, but only as an option
for each class.  Everybody who needs it would then not have to
"roll his own".

> ...
>A class is an abstract entity.  It defines a set of possible instances, and a
> ...

No, it isn't only an abstract entity even in C++ if it has "static members"
(class variables).

> ...
>We also want to talk about sharing value.  Relating object-oriented and
>functional terminology, an object is a reference, which refers to some value.
> ...

Well ...  Conceptually, from an Algol 68 -like viewpoint, you can say that.
But in actual OOPL's (most certainly in C++) objects are not references
but containers of values.  This makes an important difference in the
following.

> ...
>But the same argument applies to mutable objects too.  If two mutable objects
>happen to have the same value, they can still share space for it.  Then if one
>of the objects is changed, you have to copy its value before making the
>change.  This is analagous to `copy on write' for pages in a virtual memory
> ...

Not quite so, although the main principle is good.  The programmer must
define a "front end" class for the objects that clients are allowed
to see, and a "back end" class for hidden objects to which the visible
objects contain references.  The back end objects then contain the
actual data and reference counts, and the above tactic can be applied
to them.

Markku Sakkinen
Department of Computer Science and Information Systems
University of Jyvaskyla (a's with umlauts)
PL 35
SF-40351 Jyvaskyla (umlauts again)
Finland
          SAKKINEN@FINJYU.bitnet (alternative network address)

jgk@osc.COM (Joe Keane) (05/25/91)

In article <4851@osc.COM> some radical guy writes:
>The whole idea of asking for all instances of a class is wrong.  Such a
>facility should not be provided.

In article <1991May22.053938.20827@jyu.fi> sakkinen@jytko.jyu.fi (Markku
Sakkinen) writes:
>This is somewhat too strong an opinion.  Funny that it comes from a person
>working for an OODB company (right?) -- especially many object database
>people often think that the "class as collection" aspect is also necessary,
>and it is analogous to most other database models.

I'd say that's what containers are for.  Suppose you want to be be able to
find something like resistors.  If you're smart you'd put them together in a
box, and maybe sort them by value.  You don't find all the resistors in the
universe, and then pick out those that have the right value and belong to you.

If you want to find all the instances of a given class in some container,
that's fine with me.  We even let you find all the instances of a given class
in a given database.  I don't think this is a great idea, but you can do it.

But how about `find all instances of a given class'?  What does it mean?  Find
all instances in memory?  In general only some of the objects you're working
with are actually in main memory at a given time, and the whole point of an
ODBMS is to hide whether objects are `in' or `out'.  Find all instances in all
databases you're connected to?  Again, which databases you're connected to
should not be such an important part of your state.  Find all instances in all
databases in existence?  This makes sense, but it has some obvious practical
problems.

>Example advantage:
> - class (or schema) evolution:  one often has to convert all existing
>   instances to fit the new class definition.

No, that's the old way to do it.  Relational database users tell horror
stories about how it took a day to add a column to a large table.  Of course,
during this time, the database is unusable.  The same criticism applies at
least as much to objects.  Without getting to much into details, i'll just
note that our product has lazy schema evolution.  You can make a new version
of class, but the old instances aren't converted to the new schema until
they're used.

>Example disadvantage:
> - clashes with access rights in a database:  e.g. what is a user entitled
>   to know about possible instances not accessible to him/her in a class?

Right.

On another topic, i wrote:
>But the same argument applies to mutable objects too.  If two mutable objects
>happen to have the same value, they can still share space for it.  Then if one
>of the objects is changed, you have to copy its value before making the
>change.  This is analagous to `copy on write' for pages in a virtual memory

Markku writes:
>Not quite so, although the main principle is good.  The programmer must
>define a "front end" class for the objects that clients are allowed
>to see, and a "back end" class for hidden objects to which the visible
>objects contain references.  The back end objects then contain the
>actual data and reference counts, and the above tactic can be applied
>to them.

I think we're saying the same thing:
`front end' = reference = mutable object
`back end' = value = immuable object
--
Joe Keane, amateur mathematician
jgk@osc.com (...!uunet!stratus!osc!jgk)