steveb@tekcsc.WV.TEK.COM (04/20/91)
A few days ago I posted a query in comp.lang.smalltalk because I was confused
about deep and shallow equality. I was hoping to either get references to
OOPSLA type papers or a good definition. Judging from what I received, most
people either didn't understand my question or didn't understand deep and
shallow equality. I got all manner of answers including such things as
"equality is shallowEquality and identity is deepEquality". So, in case
anyone is interested here is the best answer I received. It is from Harry
Porter who teaches Object-Oriented Design at Portland State University. He
addresses the subject with examples you might not fully understand unless
you've taken his class but they're relatively clear and obvious. So, thanks
to Harry for taking the time to write this up, and thanks to everyone else who
responded. I'm posting this to comp.lang.c++ also because it's related to
OOD in general.
Steve
==============================
There are several kinds of "equality" in object-oriented
programming. In an attempt to clarify the different underlying
concepts, I use the following terms for the different meanings of the
term "equality:"
object identity
shallow equality
deep equality
similiar-for-the-purposes-of-the-application
First, "object identity" is just a pointer comparison: two objects are
identical if they are the same object. Often times, people say "two
objects are identical if they are the same object." By "the same
object," we mean they are stored in the same memory location. A
change to one object will always affect an identical object since there
is only one piece of memory involved. When we say two variables (x
and y) are identical, we mean that they contain the same pointer.
That is, they point to the identical object.
Second, there is the concept of "shallow equality." Two objects are
shallow equal if they look alike. Two variables are shallow equal if
they point to two regions of memory that are bit-wise equal to each
other. Thus, a change to one object will NOT be seen by processes
holding pointers to the other. The definition of shallow equality is
usually enlarged to include "object identity" as well. If two objects
are identical then they are generally said to be shallow equal, too. A
formal definition might be something like: "Two objects are shallow
equal if either: (1) they are identical, or (2) each corresponding pair
of fields in the objects is identical."
Third, there is the concept of "deep equal." The idea here is that the
fields in the two object do not need to be identical (i.e., they don't
need to point to the very same sub-objects) but, instead, they must
point to objects that are themselves "equal." And what do we mean
by "equal?" Namely, the sub-objects are themselves deep equal.
This loose definition leads to a more formal recursive definition: Two
objects are deep equal if either: (1) they are identical, (2) each field
is deep equal. As an example, imagine two trees represented with
instances of our Tree class. If the two tree have the same shapes and
each corresponding pair of nodes is the same, then the trees are deep
equal. Thus, we have two completely different sets of objects: one
representing one tree and the other set representing the second tree.
(The definition of deep equality given above is not entirely
satisfactory, since it doesn't exactly state that the shapes must be the
same. It works well for our simple tree example, but imagine that
we are representing a Directed Acyclic Graph (DAG) in which one
node may be the child of several parents. As a simple example, first
consider a node (labeled "A") whose left-child and right-child
pointers both point to the same (identical) node (labeled "B"). Now
imagine a second set of nodes. The top node (labeled "A") has a left-
child pointer which points to a node (labeled "B") and a right-child
pointer that points to another, non-identical node also labeled "B".
The two "A" nodes are not deep equal. Thus, a rigorous definition of
deep equality would require the same "shape" - or "co-reference
structure" - among two deep-equal structures.)
Fourth, there is the concept of "similar-for-the-purposes-of-the-
application." (Sometimes, this is called "equivalence," but Smalltalk
uses that term to mean "identical" so I would avoid it.) The example
here is comparing a Fraction with a Float. The two objects are very
different - they belong to different classes and have differing
numbers of fields - but one Fraction object might represent the same
number as some Float object. Here, the definition of equality is
application dependent. For example, we might want a Fraction be
equal to a Float if they are within some small tolerance.
Finally, there is the implementation of == and = in Smalltalk. == is
implemented primitively as a pointer comparison and therefore ==
means object identity. == is implemented in class Object and should
not be overridden.
On the other hand, = is used to mean "similiar-for-the-purposes-of-
the-application." An implementation of = is provided in various
classes. (As an example, take a look at the implementation of = in
class Number. I actually found the method in a class called
ArithmeticValue, which sits between Magnitude and Number. Not all
class hierarchies are (deep) equal!) A default implementation is
provided for = ("similiar-for-the-purposes-of-the-application") in
class Object. This default just invokes ==.
Shallow equality and deep equality are not provided in Smalltalk, at
least not in class Object. You may always implement them in any
class.
Equality is related to copying. When you make a copy of an object,
what exactly do you get? Again, there are several related concepts:
"shallow equality", "deep equality" and "pointer copying." In
Smalltalk, the assignment statement (left-arrow) performs pointer
copying: no new objects are allocated on the heap. Pointer copying is
also used for parameter passing and returned values.
If you want to create new objects, you can send a message to the
object you want to copy. All objects understand the messages
"shallowCopy", "deepCopy" and "copy." ShallowCopy will create
exactly one new object and the fields of the original will be moved
(via pointer copying) into the new object. Thus, sub-objects will be
shared. The implementation of deepCopy in Smalltalk allocates a
new object and then calls itself recursively for each field. Thus, all
sub-objects will be copied and not shared. If there is no pointer co-
reference in the object being copied, this works fine. On the other
hand, the implementation of deepCopy in Smalltalk will fail to
produce an exact copy of the DAG example mentioned above. Finally,
the message "copy" is intended to be used to create a copy as
appropriate for the application. The copy message is implemented in
class Object (as a shallowCopy), but should be overridden in classes
where other behavior is appropriate.
The importance of getting all these concepts straight cannot be over-
emphasized. When constructing complex programs, you must know
whether objects are identical or not. Changes made to one object
may be felt elsewhere if both parts of the program hold pointers to
an identical object. Also, some programs will use the identity
comparison (==) for efficiency and clarity, but this can be confusing
when objects are equal but not the same object.jimad@microsoft.UUCP (Jim ADCOCK) (04/23/91)
In article <10646@orca.wv.tek.com| steveb@tekcsc.WV.TEK.COM () writes: (quoting generally good stuff from Harry Porter) |First, "object identity" is just a pointer comparison: two objects are |identical if they are the same object. Often times, people say "two |objects are identical if they are the same object." By "the same |object," we mean they are stored in the same memory location. I disagree, rather: "An object's identity is that which distinguishes an object from all other objects." I caution C++ programmers that while defining "object identity" to be the same as a memory address works pretty well for Smalltalk, there are many problems with such a definition in C++. The reasons that using a memory address as the representation of an object's identity works well in Smalltalk include: 1) Smalltalk has garbage collection. Thus if an object no longer exists, we are guaranteed that its "identity" no longer exists anywhere in the system, thus we are free to re-assign that identity to some newly created object. 2) Smalltalk [generally] doesn't have multiple inheritence, thus avoiding the C++ problem that a MI object may have several addresses each of which can be considered as representing the object. Some examples of common "identity traps" that C++ programmers constantly fall into by equating 'this' to be the same thing as an object's 'identity.' * An embedded object may have the same address as an embedding object. * MI objects may have several significant addresses. * Compilers differ in how they lay out objects, thus these problems change with compiler. * C++ has VERY ill-defined rules about when and where compilers may make a copy of an object, changing "its 'identity.'" * Two objects may occupy the same address at different points in time, and C++ doesn't guarantee that all references to the prior object have gone away. etc, etc, etc. Thus I claim an object's address in C++ IS NOT its identity. To again quote Khoshofian and Abnous: "An object's identity is that which distiguishes an object from all other objects." Clearly, in C++, a memory address is frequently insufficient to correctly make this distinction. |The importance of getting all these concepts straight cannot be over- |emphasized. When constructing complex programs, you must know |whether objects are identical or not. Agreed. Thus I warn C++ programmers that frequently it is inappropriate to consider a C++ object's address[es] to be its "identity." For more complete discussions see: "Is 'this' identity?" in Feb 1991 'The C++ Report' and Object Orienteering by Khoshafian and Abnous
chip@tct.com (Chip Salzenberg) (05/01/91)
In an article with which I otherwise agree, Jim Adcock writes: >* C++ has VERY ill-defined rules about when and where compilers may make > a copy of an object, changing "its 'identity.'" Those rules describe the creation of new objects, not the changing of old objects' identities. So I would consider object copying rules to be a separate issue from object identity determination. -- Brand X Industries Sentient and Semi-Sentient Being Resources Department: Because Sometimes, "Human" Just Isn't Good Enough [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
jimad@microsoft.UUCP (Jim ADCOCK) (05/04/91)
In article <281EE52A.672F@tct.com> chip@tct.com (Chip Salzenberg) writes: |In an article with which I otherwise agree, Jim Adcock writes: | |>* C++ has VERY ill-defined rules about when and where compilers may make |> a copy of an object, changing "its 'identity.'" | |Those rules describe the creation of new objects, not the changing of |old objects' identities. So I would consider object copying rules to |be a separate issue from object identity determination. This kind of depends on one's perspective. As a programmer, if you get a copy made unexpectedly by some compiler, where you expect the original instead, then the identity you get is not the one you expected. The fact that you get the wrong object along with the wrong identity doesn't help you much. What you'd really like is clearer rules about if and when its legal for a compiler to make a copy. The idea that a compiler can introduce a "temporary" copy if and when it chooses kind of makes OOP difficult -- or at least non-portable.
chip@tct.com (Chip Salzenberg) (05/10/91)
According to jimad@microsoft.UUCP (Jim ADCOCK): >In article <281EE52A.672F@tct.com> chip@tct.com (Chip Salzenberg) writes: >|I would consider object copying rules to be a separate issue from >|object identity determination. > >This kind of depends on one's perspective. As a programmer, if you get >a copy made unexpectedly by some compiler, where you expect the original >instead, then the identity you get is not the one you expected. The fact >that you get the wrong object along with the wrong identity doesn't help >you much. If the copy can be identified as a copy because its identity is detectably different (which, in ARM C++, it is), then copying and identity become entirely unrelated issues (which, in ARM C++, they are). >What you'd really like is clearer rules about if and when its legal for >a compiler to make a copy. I am happy with the clarity of the rules: I think that I understand them. I am not happy with their laxity, however, and wish that the implementor's latitude in this area were reduced. >The idea that a compiler can introduce a "temporary" copy if and when it >chooses kind of makes OOP difficult -- or at least non-portable. Some things are difficult, yes; though I tend to avoid circumstances in which copying is optional. Non-portable, no -- just be careful not to ass_u_me that a copy will or will not be made in circumstances which allow both. -- Brand X Industries Sentient and Semi-Sentient Being Resources Department: Because Sometimes, "Human" Just Isn't Good Enough [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>