[comp.lang.smalltalk] deepEquality and shallowEquality revisited.

steveb@tekcsc.WV.TEK.COM (04/20/91)

A few days ago I posted a query in comp.lang.smalltalk because I was confused
about deep and shallow equality.  I was hoping to either get references to
OOPSLA type papers or a good definition.  Judging from what I received, most
people either didn't understand my question or didn't understand deep and 
shallow equality.  I got all manner of answers including such things as
"equality is shallowEquality and identity is deepEquality".  So, in case
anyone is interested here is the best answer I received.  It is from Harry
Porter who teaches Object-Oriented Design at Portland State University.  He
addresses the subject with examples you might not fully understand unless
you've taken his class but they're relatively clear and obvious.  So, thanks
to Harry for taking the time to write this up, and thanks to everyone else who
responded.  I'm posting this to comp.lang.c++ also because it's related to
OOD in general.

Steve

==============================

There are several kinds of "equality" in object-oriented 
programming.  In an attempt to clarify the different underlying 
concepts, I use the following terms for the different meanings of the 
term "equality:"
	object identity
	shallow equality
	deep equality
	similiar-for-the-purposes-of-the-application

First, "object identity" is just a pointer comparison:  two objects are 
identical if they are the same object.  Often times, people say "two 
objects are identical if they are the same object."  By "the same 
object," we mean they are stored in the same memory location.  A 
change to one object will always affect an identical object since there 
is only one piece of memory involved.  When we say two variables (x 
and y) are identical, we mean that they contain the same pointer.  
That is, they point to the identical object. 

Second, there is the concept of "shallow equality."  Two objects are 
shallow equal if they look alike.  Two variables are shallow equal if 
they point to two regions of memory that are bit-wise equal to each 
other.  Thus, a change to one object will NOT be seen by processes 
holding pointers to the other.  The definition of shallow equality is 
usually enlarged to include "object identity" as well.  If two objects 
are identical then they are generally said to be shallow equal, too.  A 
formal definition might be something like: "Two objects are shallow 
equal if either: (1) they are identical, or (2) each corresponding pair 
of fields in the objects is identical."

Third, there is the concept of "deep equal."  The idea here is that the 
fields in the two object do not need to be identical (i.e., they don't 
need to point to the very same sub-objects) but, instead, they must 
point to objects that are themselves "equal."  And what do we mean 
by "equal?"  Namely, the sub-objects are themselves deep equal.  
This loose definition leads to a more formal recursive definition:  Two 
objects are deep equal if either: (1) they are identical, (2) each field 
is deep equal.  As an example, imagine two trees represented with 
instances of our Tree class.  If the two tree have the same shapes and 
each corresponding pair of nodes is the same, then the trees are deep 
equal.  Thus, we have two completely different sets of objects: one 
representing one tree and the other set representing the second tree.  
(The definition of deep equality given above is not entirely 
satisfactory, since it doesn't exactly state that the shapes must be the 
same.  It works well for our simple tree example, but imagine that 
we are representing a Directed Acyclic Graph (DAG) in which one 
node may be the child of several parents.  As a simple example, first 
consider a node (labeled "A") whose left-child and right-child 
pointers both point to the same (identical) node (labeled "B").  Now 
imagine a second set of nodes.  The top node (labeled "A") has a left-
child pointer which points to a node (labeled "B") and a right-child 
pointer that points to another, non-identical node also labeled "B".  
The two "A" nodes are not deep equal.  Thus, a rigorous definition of 
deep equality would require the same "shape" - or "co-reference 
structure" - among two deep-equal structures.)

Fourth, there is the concept of "similar-for-the-purposes-of-the-
application."  (Sometimes, this is called "equivalence," but Smalltalk 
uses that term to mean "identical" so I would avoid it.)  The example 
here is comparing a Fraction with a Float.  The two objects are very 
different - they belong to different classes and have differing 
numbers of fields - but one Fraction object might represent the same 
number as some Float object.  Here, the definition of equality is 
application dependent.  For example, we might want a Fraction be 
equal to a Float if they are within some small tolerance.

Finally, there is the implementation of == and = in Smalltalk.  == is 
implemented primitively as a pointer comparison and therefore == 
means object identity.  == is implemented in class Object and should 
not be overridden.

On the other hand, = is used to mean "similiar-for-the-purposes-of-
the-application."  An implementation of = is provided in various 
classes.  (As an example, take a look at the implementation of = in 
class Number. I actually found the method in a class called 
ArithmeticValue, which sits between Magnitude and Number.  Not all 
class hierarchies are (deep) equal!)  A default implementation is 
provided for = ("similiar-for-the-purposes-of-the-application") in 
class Object.  This default just invokes ==.

Shallow equality and deep equality are not provided in Smalltalk, at 
least not in class Object.  You may always implement them in any 
class.

Equality is related to copying.  When you make a copy of an object, 
what exactly do you get?  Again, there are several related concepts: 
"shallow equality", "deep equality" and "pointer copying."  In 
Smalltalk, the assignment statement (left-arrow) performs pointer 
copying: no new objects are allocated on the heap.  Pointer copying is 
also used for parameter passing and returned values.

If you want to create new objects, you can send a message to the 
object you want to copy.  All objects understand the messages 
"shallowCopy", "deepCopy" and "copy."  ShallowCopy will create 
exactly one new object and the fields of the original will be moved 
(via pointer copying) into the new object.  Thus, sub-objects will be 
shared.  The implementation of deepCopy in Smalltalk allocates a 
new object and then calls itself recursively for each field.  Thus, all 
sub-objects will be copied and not shared.  If there is no pointer co-
reference in the object being copied, this works fine.  On the other 
hand, the implementation of deepCopy in Smalltalk will fail to 
produce an exact copy of the DAG example mentioned above.  Finally, 
the message "copy" is intended to be used to create a copy as 
appropriate for the application.  The copy message is implemented in 
class Object (as a shallowCopy), but should be overridden in classes 
where other behavior is appropriate.

The importance of getting all these concepts straight cannot be over-
emphasized.  When constructing complex programs, you must know 
whether objects are identical or not.  Changes made to one object 
may be felt elsewhere if both parts of the program hold pointers to 
an identical object.  Also, some programs will use the identity 
comparison (==) for efficiency and clarity, but this can be confusing 
when objects are equal but not the same object.