steveb@tekcsc.WV.TEK.COM (04/20/91)
A few days ago I posted a query in comp.lang.smalltalk because I was confused about deep and shallow equality. I was hoping to either get references to OOPSLA type papers or a good definition. Judging from what I received, most people either didn't understand my question or didn't understand deep and shallow equality. I got all manner of answers including such things as "equality is shallowEquality and identity is deepEquality". So, in case anyone is interested here is the best answer I received. It is from Harry Porter who teaches Object-Oriented Design at Portland State University. He addresses the subject with examples you might not fully understand unless you've taken his class but they're relatively clear and obvious. So, thanks to Harry for taking the time to write this up, and thanks to everyone else who responded. I'm posting this to comp.lang.c++ also because it's related to OOD in general. Steve ============================== There are several kinds of "equality" in object-oriented programming. In an attempt to clarify the different underlying concepts, I use the following terms for the different meanings of the term "equality:" object identity shallow equality deep equality similiar-for-the-purposes-of-the-application First, "object identity" is just a pointer comparison: two objects are identical if they are the same object. Often times, people say "two objects are identical if they are the same object." By "the same object," we mean they are stored in the same memory location. A change to one object will always affect an identical object since there is only one piece of memory involved. When we say two variables (x and y) are identical, we mean that they contain the same pointer. That is, they point to the identical object. Second, there is the concept of "shallow equality." Two objects are shallow equal if they look alike. Two variables are shallow equal if they point to two regions of memory that are bit-wise equal to each other. Thus, a change to one object will NOT be seen by processes holding pointers to the other. The definition of shallow equality is usually enlarged to include "object identity" as well. If two objects are identical then they are generally said to be shallow equal, too. A formal definition might be something like: "Two objects are shallow equal if either: (1) they are identical, or (2) each corresponding pair of fields in the objects is identical." Third, there is the concept of "deep equal." The idea here is that the fields in the two object do not need to be identical (i.e., they don't need to point to the very same sub-objects) but, instead, they must point to objects that are themselves "equal." And what do we mean by "equal?" Namely, the sub-objects are themselves deep equal. This loose definition leads to a more formal recursive definition: Two objects are deep equal if either: (1) they are identical, (2) each field is deep equal. As an example, imagine two trees represented with instances of our Tree class. If the two tree have the same shapes and each corresponding pair of nodes is the same, then the trees are deep equal. Thus, we have two completely different sets of objects: one representing one tree and the other set representing the second tree. (The definition of deep equality given above is not entirely satisfactory, since it doesn't exactly state that the shapes must be the same. It works well for our simple tree example, but imagine that we are representing a Directed Acyclic Graph (DAG) in which one node may be the child of several parents. As a simple example, first consider a node (labeled "A") whose left-child and right-child pointers both point to the same (identical) node (labeled "B"). Now imagine a second set of nodes. The top node (labeled "A") has a left- child pointer which points to a node (labeled "B") and a right-child pointer that points to another, non-identical node also labeled "B". The two "A" nodes are not deep equal. Thus, a rigorous definition of deep equality would require the same "shape" - or "co-reference structure" - among two deep-equal structures.) Fourth, there is the concept of "similar-for-the-purposes-of-the- application." (Sometimes, this is called "equivalence," but Smalltalk uses that term to mean "identical" so I would avoid it.) The example here is comparing a Fraction with a Float. The two objects are very different - they belong to different classes and have differing numbers of fields - but one Fraction object might represent the same number as some Float object. Here, the definition of equality is application dependent. For example, we might want a Fraction be equal to a Float if they are within some small tolerance. Finally, there is the implementation of == and = in Smalltalk. == is implemented primitively as a pointer comparison and therefore == means object identity. == is implemented in class Object and should not be overridden. On the other hand, = is used to mean "similiar-for-the-purposes-of- the-application." An implementation of = is provided in various classes. (As an example, take a look at the implementation of = in class Number. I actually found the method in a class called ArithmeticValue, which sits between Magnitude and Number. Not all class hierarchies are (deep) equal!) A default implementation is provided for = ("similiar-for-the-purposes-of-the-application") in class Object. This default just invokes ==. Shallow equality and deep equality are not provided in Smalltalk, at least not in class Object. You may always implement them in any class. Equality is related to copying. When you make a copy of an object, what exactly do you get? Again, there are several related concepts: "shallow equality", "deep equality" and "pointer copying." In Smalltalk, the assignment statement (left-arrow) performs pointer copying: no new objects are allocated on the heap. Pointer copying is also used for parameter passing and returned values. If you want to create new objects, you can send a message to the object you want to copy. All objects understand the messages "shallowCopy", "deepCopy" and "copy." ShallowCopy will create exactly one new object and the fields of the original will be moved (via pointer copying) into the new object. Thus, sub-objects will be shared. The implementation of deepCopy in Smalltalk allocates a new object and then calls itself recursively for each field. Thus, all sub-objects will be copied and not shared. If there is no pointer co- reference in the object being copied, this works fine. On the other hand, the implementation of deepCopy in Smalltalk will fail to produce an exact copy of the DAG example mentioned above. Finally, the message "copy" is intended to be used to create a copy as appropriate for the application. The copy message is implemented in class Object (as a shallowCopy), but should be overridden in classes where other behavior is appropriate. The importance of getting all these concepts straight cannot be over- emphasized. When constructing complex programs, you must know whether objects are identical or not. Changes made to one object may be felt elsewhere if both parts of the program hold pointers to an identical object. Also, some programs will use the identity comparison (==) for efficiency and clarity, but this can be confusing when objects are equal but not the same object.
jimad@microsoft.UUCP (Jim ADCOCK) (04/23/91)
In article <10646@orca.wv.tek.com| steveb@tekcsc.WV.TEK.COM () writes: (quoting generally good stuff from Harry Porter) |First, "object identity" is just a pointer comparison: two objects are |identical if they are the same object. Often times, people say "two |objects are identical if they are the same object." By "the same |object," we mean they are stored in the same memory location. I disagree, rather: "An object's identity is that which distinguishes an object from all other objects." I caution C++ programmers that while defining "object identity" to be the same as a memory address works pretty well for Smalltalk, there are many problems with such a definition in C++. The reasons that using a memory address as the representation of an object's identity works well in Smalltalk include: 1) Smalltalk has garbage collection. Thus if an object no longer exists, we are guaranteed that its "identity" no longer exists anywhere in the system, thus we are free to re-assign that identity to some newly created object. 2) Smalltalk [generally] doesn't have multiple inheritence, thus avoiding the C++ problem that a MI object may have several addresses each of which can be considered as representing the object. Some examples of common "identity traps" that C++ programmers constantly fall into by equating 'this' to be the same thing as an object's 'identity.' * An embedded object may have the same address as an embedding object. * MI objects may have several significant addresses. * Compilers differ in how they lay out objects, thus these problems change with compiler. * C++ has VERY ill-defined rules about when and where compilers may make a copy of an object, changing "its 'identity.'" * Two objects may occupy the same address at different points in time, and C++ doesn't guarantee that all references to the prior object have gone away. etc, etc, etc. Thus I claim an object's address in C++ IS NOT its identity. To again quote Khoshofian and Abnous: "An object's identity is that which distiguishes an object from all other objects." Clearly, in C++, a memory address is frequently insufficient to correctly make this distinction. |The importance of getting all these concepts straight cannot be over- |emphasized. When constructing complex programs, you must know |whether objects are identical or not. Agreed. Thus I warn C++ programmers that frequently it is inappropriate to consider a C++ object's address[es] to be its "identity." For more complete discussions see: "Is 'this' identity?" in Feb 1991 'The C++ Report' and Object Orienteering by Khoshafian and Abnous
chip@tct.com (Chip Salzenberg) (05/01/91)
In an article with which I otherwise agree, Jim Adcock writes: >* C++ has VERY ill-defined rules about when and where compilers may make > a copy of an object, changing "its 'identity.'" Those rules describe the creation of new objects, not the changing of old objects' identities. So I would consider object copying rules to be a separate issue from object identity determination. -- Brand X Industries Sentient and Semi-Sentient Being Resources Department: Because Sometimes, "Human" Just Isn't Good Enough [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>
jimad@microsoft.UUCP (Jim ADCOCK) (05/04/91)
In article <281EE52A.672F@tct.com> chip@tct.com (Chip Salzenberg) writes: |In an article with which I otherwise agree, Jim Adcock writes: | |>* C++ has VERY ill-defined rules about when and where compilers may make |> a copy of an object, changing "its 'identity.'" | |Those rules describe the creation of new objects, not the changing of |old objects' identities. So I would consider object copying rules to |be a separate issue from object identity determination. This kind of depends on one's perspective. As a programmer, if you get a copy made unexpectedly by some compiler, where you expect the original instead, then the identity you get is not the one you expected. The fact that you get the wrong object along with the wrong identity doesn't help you much. What you'd really like is clearer rules about if and when its legal for a compiler to make a copy. The idea that a compiler can introduce a "temporary" copy if and when it chooses kind of makes OOP difficult -- or at least non-portable.
chip@tct.com (Chip Salzenberg) (05/10/91)
According to jimad@microsoft.UUCP (Jim ADCOCK): >In article <281EE52A.672F@tct.com> chip@tct.com (Chip Salzenberg) writes: >|I would consider object copying rules to be a separate issue from >|object identity determination. > >This kind of depends on one's perspective. As a programmer, if you get >a copy made unexpectedly by some compiler, where you expect the original >instead, then the identity you get is not the one you expected. The fact >that you get the wrong object along with the wrong identity doesn't help >you much. If the copy can be identified as a copy because its identity is detectably different (which, in ARM C++, it is), then copying and identity become entirely unrelated issues (which, in ARM C++, they are). >What you'd really like is clearer rules about if and when its legal for >a compiler to make a copy. I am happy with the clarity of the rules: I think that I understand them. I am not happy with their laxity, however, and wish that the implementor's latitude in this area were reduced. >The idea that a compiler can introduce a "temporary" copy if and when it >chooses kind of makes OOP difficult -- or at least non-portable. Some things are difficult, yes; though I tend to avoid circumstances in which copying is optional. Non-portable, no -- just be careful not to ass_u_me that a copy will or will not be made in circumstances which allow both. -- Brand X Industries Sentient and Semi-Sentient Being Resources Department: Because Sometimes, "Human" Just Isn't Good Enough [tm] Chip Salzenberg <chip@tct.com>, <uunet!pdn!tct!chip>