steveb@tekcsc.WV.TEK.COM (04/20/91)
A few days ago I posted a query in comp.lang.smalltalk because I was confused
about deep and shallow equality. I was hoping to either get references to
OOPSLA type papers or a good definition. Judging from what I received, most
people either didn't understand my question or didn't understand deep and
shallow equality. I got all manner of answers including such things as
"equality is shallowEquality and identity is deepEquality". So, in case
anyone is interested here is the best answer I received. It is from Harry
Porter who teaches Object-Oriented Design at Portland State University. He
addresses the subject with examples you might not fully understand unless
you've taken his class but they're relatively clear and obvious. So, thanks
to Harry for taking the time to write this up, and thanks to everyone else who
responded. I'm posting this to comp.lang.c++ also because it's related to
OOD in general.
Steve
==============================
There are several kinds of "equality" in object-oriented
programming. In an attempt to clarify the different underlying
concepts, I use the following terms for the different meanings of the
term "equality:"
object identity
shallow equality
deep equality
similiar-for-the-purposes-of-the-application
First, "object identity" is just a pointer comparison: two objects are
identical if they are the same object. Often times, people say "two
objects are identical if they are the same object." By "the same
object," we mean they are stored in the same memory location. A
change to one object will always affect an identical object since there
is only one piece of memory involved. When we say two variables (x
and y) are identical, we mean that they contain the same pointer.
That is, they point to the identical object.
Second, there is the concept of "shallow equality." Two objects are
shallow equal if they look alike. Two variables are shallow equal if
they point to two regions of memory that are bit-wise equal to each
other. Thus, a change to one object will NOT be seen by processes
holding pointers to the other. The definition of shallow equality is
usually enlarged to include "object identity" as well. If two objects
are identical then they are generally said to be shallow equal, too. A
formal definition might be something like: "Two objects are shallow
equal if either: (1) they are identical, or (2) each corresponding pair
of fields in the objects is identical."
Third, there is the concept of "deep equal." The idea here is that the
fields in the two object do not need to be identical (i.e., they don't
need to point to the very same sub-objects) but, instead, they must
point to objects that are themselves "equal." And what do we mean
by "equal?" Namely, the sub-objects are themselves deep equal.
This loose definition leads to a more formal recursive definition: Two
objects are deep equal if either: (1) they are identical, (2) each field
is deep equal. As an example, imagine two trees represented with
instances of our Tree class. If the two tree have the same shapes and
each corresponding pair of nodes is the same, then the trees are deep
equal. Thus, we have two completely different sets of objects: one
representing one tree and the other set representing the second tree.
(The definition of deep equality given above is not entirely
satisfactory, since it doesn't exactly state that the shapes must be the
same. It works well for our simple tree example, but imagine that
we are representing a Directed Acyclic Graph (DAG) in which one
node may be the child of several parents. As a simple example, first
consider a node (labeled "A") whose left-child and right-child
pointers both point to the same (identical) node (labeled "B"). Now
imagine a second set of nodes. The top node (labeled "A") has a left-
child pointer which points to a node (labeled "B") and a right-child
pointer that points to another, non-identical node also labeled "B".
The two "A" nodes are not deep equal. Thus, a rigorous definition of
deep equality would require the same "shape" - or "co-reference
structure" - among two deep-equal structures.)
Fourth, there is the concept of "similar-for-the-purposes-of-the-
application." (Sometimes, this is called "equivalence," but Smalltalk
uses that term to mean "identical" so I would avoid it.) The example
here is comparing a Fraction with a Float. The two objects are very
different - they belong to different classes and have differing
numbers of fields - but one Fraction object might represent the same
number as some Float object. Here, the definition of equality is
application dependent. For example, we might want a Fraction be
equal to a Float if they are within some small tolerance.
Finally, there is the implementation of == and = in Smalltalk. == is
implemented primitively as a pointer comparison and therefore ==
means object identity. == is implemented in class Object and should
not be overridden.
On the other hand, = is used to mean "similiar-for-the-purposes-of-
the-application." An implementation of = is provided in various
classes. (As an example, take a look at the implementation of = in
class Number. I actually found the method in a class called
ArithmeticValue, which sits between Magnitude and Number. Not all
class hierarchies are (deep) equal!) A default implementation is
provided for = ("similiar-for-the-purposes-of-the-application") in
class Object. This default just invokes ==.
Shallow equality and deep equality are not provided in Smalltalk, at
least not in class Object. You may always implement them in any
class.
Equality is related to copying. When you make a copy of an object,
what exactly do you get? Again, there are several related concepts:
"shallow equality", "deep equality" and "pointer copying." In
Smalltalk, the assignment statement (left-arrow) performs pointer
copying: no new objects are allocated on the heap. Pointer copying is
also used for parameter passing and returned values.
If you want to create new objects, you can send a message to the
object you want to copy. All objects understand the messages
"shallowCopy", "deepCopy" and "copy." ShallowCopy will create
exactly one new object and the fields of the original will be moved
(via pointer copying) into the new object. Thus, sub-objects will be
shared. The implementation of deepCopy in Smalltalk allocates a
new object and then calls itself recursively for each field. Thus, all
sub-objects will be copied and not shared. If there is no pointer co-
reference in the object being copied, this works fine. On the other
hand, the implementation of deepCopy in Smalltalk will fail to
produce an exact copy of the DAG example mentioned above. Finally,
the message "copy" is intended to be used to create a copy as
appropriate for the application. The copy message is implemented in
class Object (as a shallowCopy), but should be overridden in classes
where other behavior is appropriate.
The importance of getting all these concepts straight cannot be over-
emphasized. When constructing complex programs, you must know
whether objects are identical or not. Changes made to one object
may be felt elsewhere if both parts of the program hold pointers to
an identical object. Also, some programs will use the identity
comparison (==) for efficiency and clarity, but this can be confusing
when objects are equal but not the same object.