[comp.lang.lisp] EQ and EQUAL

jeff@aiva.ed.ac.uk (Jeff Dalton, JANET: J.Dalton@uk.ac.ed ) (01/01/70)

The discussion of EQ vs. EQUAL seems to be assuming that

   -- EQ on lists is just an efficiency hack.

   -- EQ is unreliable and implementation-dependent.

So EQ gets classed with goto as something that we'd be better off
without.  However, EQ is not just a fast-but-unreliable EQUAL.

It is right to say that EQ is implementation-dependent, but it still
has certain properties that allow it to be used in portable ways.
Whether it is still useful to use EQ depends on other properties of
the Lisp systems in question.

In the "usual" Lisps, there is a ordering of equality predicates.  In
Common Lisp, the order from most to least strict is: EQ, EQL, EQUAL,
EQUALP.  (I'm leaving out type- specific tests like = and CHAR=.)  If
a more strict predicate returns true for some arguments, all less
strict predicates will also return true.  So, if EQ returns true, this
result is no more unreliable or implementation-dependent that one from
EQUAL -- because EQUAL will always return true as well.

But, one might argue, EQUAL calls EQ first: if you trust EQ only when
it returns true, you might as well use EQUAL instead.  Here I agree.
So the question is: is it ever useful to make the kind of distinction
made by EQ?  The answer to this question will depend on what other
operations are available in your Lisp.

In a sense, EQ tests whether two objects are the same identical object.
But how, apart from using EQ (or EQL), would one know whether objects
were identical?

One method would be to modify one of the objects and see whether the
other changes too.  If your Lisp includes RPLACA or RPLACD, two EQUAL
conses can be distinguished by this means, and so it is possible to
define a CONS-EQ that is stricter than EQUAL.  CONS-EQ would be a
useful test for "operational equivalence", not a fast-but-unreliable
EQUAL.

In the "usual" Lisps RPLACA and RPLACD *are* available, EQ returns true
for conses that are the same under destructive modification (and false
for conses that are different), and CONS always returns a new, EQ-unique,
object each time it's called.  Using EQ in this way works in all of these
Lisps.  You do not need to understand internals of the implementation,
only that EQ, CONS, etc. have certain properties.

Now, there are some who would like to get rid of RPLACA and RPLACD as
well as EQ.  And if you do so, it is not longer necessary for CONS to
create a new object each time.  It might ("hashed cons") quickly look
to see if the right (EQUAL) thing already existed.  And if CONS always
did this, EQUAL for conses could just be a pointer comparison, as EQ
is in more standard Lisps.

So the remaining question is to ask why anyone would want to keep
the operations for destructive modification.

One answer is that having objects that can change over time is an
effective technique for modeling real-world objects that change over
time.  For example, if we represent an person as a Lisp object, and the
person's address changes, we can follow the change by changing the Lisp
object instead of making a new object with a different address (but
otherwise the same) and then doing everything else required to give
the new object the same role in the database that the old one had.
-- 
Jeff Dalton,                          JANET: J.Dalton@uk.ac.ed             
AI Applications Institute,            ARPA:  J.Dalton%uk.ac.ed@cs.ucl.ac.uk
Edinburgh University.                 UUCP:  ...!ukc!ed.ac.uk!J.Dalton

bart@reed.UUCP (Bart Massey) (07/19/87)

In article <3927@spool.WISC.EDU> neves@ai.WISC.EDU (David M. Neves) writes:
> In article <6601@reed.UUCP> bart@reed.UUCP (Bart Massey, me) writes:
> >At Reed college we are currently doing almost all of our lisp
> >programming in an in-house dialect which supports neither destructive
> >list functions nor dotted pairs.
> ...
> >In case anyone is interested, the following major changes in the
> >language were also implemented, and have proven useful:
> >
> >- Elimination of eq: Using a properly written version of equal, eq is
> >the first thing to happen anyhow, and we've never encountered a
> >situation where knowing that two lists were non-eq is "good enough".

I've gotten several replies already to this statement, so I thought
I'd try to give a little more background and detail about what I meant...

There are three explicit Reed Lisp design goals that are relevant to
this question.  These are:

-- Lisp should be easy for non-lisp-programmers, and non-programmers,
to learn and to read.

-- The internal operation of a lisp interpreter/compiler should, as
much as possible, be hidden from the user, for the convenience of the
implementor as well as the user himself.

-- Efficiency is unimportant up to a constant.  I'm not sure how to
put this clearly...  If the language is designed in such a way as to
force certain algorithms to be higher "order" or to take more than an
"order of magnitude" longer to run, the design should be reconsidered.
If, on the other hand, a language design change would allow algorithms
to run a "small constant" amount faster, it should not be implemented
unless there are other compelling reasons for it.  This still isn't
very well said, but I think you can appreciate what I mean...

> I can construct such a situation.  Imagine a list of lists.  You want
> to find out if a list you have a pointer to is contained within that
> list (i.e. MEMBER).  You want to use EQ as the comparison function for
> efficiency -- especially if the first few characters of the non eq
> lists are the same as the one you are looking for.
> 
> e.g.
> look for (a b c) object in
> List:  ( (a b z) (a b a) (a b c d) (a b q) (a b r) (a b c) )

Yep, this is the classic case.  Let's look at a little different one,
just for the sake of clarity:

	look for (a b c) in target
	( (a b c d) (a b c) (a b c d) (a b c) )

OK, here's what I'm thinking when I see this:

Do I know that both instances of (a b c) in target are the same
pointer?  If not, is this a problem? (In other words, do I really want
to look for (a b c) in target, or for a *specific* (a b c)?)  There
are two cases:

	I)  I want *any* (a b c).  This is by *far* the most common
case in my (admittedly somewhat limited) experience.  In this case, I
can still use eq, but I have several problems:

		I must be very careful that I don't insert (a b c)
into target in any way except by inserting the appropriate pointer.

		Anyone trying to read or debug my code must also be very
careful that I have done so.  This is generally non-trivial.

		I must understand the internals of my implementation
*very well* to insure to myself that I am always putting the proper
pointer in.  Such details actually vary a great deal from
implementation to implementation, so I must write for a specific
interpreter.

	Case II)  I am only interested in finding my pointer.  I don't
really care that it happens to be pointing to the list (a b c).  This
is the case which I have never seen occur in practice, and have a hard
time thinking of an example of.  Cases that *look* like this
generally look that way because the programmer has spent time
trying to figure out where his (a b c)s are.

With respect to the three design goals stated above, I suspect it is
now clear why we didn't implement eq.  It makes lisp harder to
understand, the code harder to read, is implementation dependent.
Finally, note what kind of efficiency increase you get from its use in
the above example.  

In compiled lisp, you get effectively a factor of 5, during this one
line of code, because you are doing 5 comparisons (the list and four
of its elements) in the equal version, to each single test (the list)
in the eq version.  But note that these 5 comparisons are
approximately 20 machine instructions (5 compares, 5 cdrs, and a
(machine-level) recursion), versus the single machine instruction of
eq.  The overhead of executing compiled lisp code effectively swamps
this gain in all but the largest of cases, and thus you're not likely
to be able to measure the speedup in a piece of real code.

	The interpreted case is even worse.  Ours is a fast lisp
interpreter -- approximately 2-10 times as fast as Franz.  In our
lisp, an equal on even a 1000 element deeply recursive list with all
but the last element succeeding is completely swamped by the overhead
of the eval associated with executing the equal, in spite of the fact
that much of our speed advantage is due to a very fast eval.

	I hope this clarifies why we left eq out of our lisp
interpreter.  My friends and I are awfully fond of the concept of
"fast enough", and "The Law Of Virtual Efficiency: A "Real Programmer"
will spend a week to save the user 10 seconds over the life of the
program."

But then again, what do we know???  Precious little!  Please send
replies to me, I will summarize.  BTW, thanks much for all the
flattering and interested responses I've received so far!  This was
the first net article I'd posted in a long time that I didn't receive
*a single flame* about!!  Some useful criticisms, but no flames! How
very pleasant!...

					Bart the Long-Winded :-)

weltyc@b.nyser.net (Christopher A. Welty) (07/20/87)

In article <6638@reed.UUCP> bart@reed.UUCP (Bart Massey) writes:
> ...
>	I)  I want *any* (a b c).  This is by *far* the most common
>case in my (admittedly somewhat limited) experience.  In this case, I

	It is true that this is more common.

>
>		I must understand the internals of my implementation
>*very well* to insure to myself that I am always putting the proper
>pointer in.  Such details actually vary a great deal from
>implementation to implementation, so I must write for a specific
>interpreter.
>

	Well, this is not really all true.  If you just understand the
way LISP is supoosed to work, and you are using a more standard form
of LISP, there are certain things that are guaranteed to work in a
certain way.  Of course you are changing this guarantee.  I have
written portable LISP code using complex pointer manipulations.

>
>	Case II)  I am only interested in finding my pointer.  I don't
>really care that it happens to be pointing to the list (a b c).  This
>is the case which I have never seen occur in practice, and have a hard
>time thinking of an example of.

	There are applications of this in practice.  They are not
examples that are easy to think of, nor are they simple enough to
explain briefly, but they come up when doing very large real
applications in LISP.  Being able to get to your pointers is very
important.  Why do you think C is so popular?  Because pointers are
very useful and allow you to do flexible things.  Sure, some people
view them as "too low level" for a language that is supposed to
support high level abstractions.  And using this apprach can sometimes
make *code* less readable.  [Of course, if programmers were taught
earlier on how to correctly DOCUMENT code as they are writing it, this
wouldn't be a problem, and we wouldn't have to worry about making
computer languages that read like english (COBOL was an attempt at
this).]  Hardware, however, has not gotten fast enough that
programmers can afford to write LISP code that is not
efficient...

	Don't get me wrong this is not a flame.  I use LISP quite a
lot, I teach it, I think it's a great language with a lot of great
ideas.  I think it will be around for a long time.  But there are some
SERIOUS problems with using it for real work, and you have proposed
good ideas for helping it past some of its major problems, this is
good!  But don't eliminate important elements of the language because
you can't think of an example of how it could be used.  If it's there,
SOMEONE must be using it!



Christopher Welty - Asst. Director, RPI CS Labs
weltyc@cs.rpi.edu       ...!seismo!rpics!weltyc

wagle@iucs.cs.indiana.edu (07/21/87)

  I view two objects that are otherwise EQUAL, but not EQ as having
different "timestamps".  The two objects look the same, but they are
different because two different place/time's created them.  Hash-tables on
symbols and the like add some texture to this model by allowing two
different creation place/time's to point to the same object (this is the
case for symbols in particular, often subsets of the numbers, etc), which is
useful only if done in a predictable (easy to understand) way.

  Conceptually, these pointers often act as keys for various searches (MEMQ,
ASSQ, or homebrew tree searches).  Also, since when one is destructively
modifying a tree (copying entire trees to modify one piece can sometimes be
(1) prohibitly time-expensive (2) eats cons cells and causes the garbage
collector to be called more often and (3) isn't necessarily more
understandable, and often requires more code to support) one's invariants
have a lot of EQ's of subtrees in them.  Having a close correspondance
between one's invariants and the actual code being written makes life more
pleasant, easier, and the code easier to read and understand.

  While appealing intuitively, your concept of efficiency is horrifying.
Roughly, your MEMQ search is order N, while the equivalent MEMBER search
(for the example given) is order N squared (roughly, I said).  For the small
N in your examples, N squared minus N isn't very large and IS negligible.
But what about large N?  How about N = 1000?  1000000?  Just eight times
slower means what could have taken 1 hour now takes 8 hours!  Try using a
300 BAUD terminal and a 9600 BAUD terminal to print a long file.  What takes
a minute on the 9600 BAUD terminal takes over half an hour on the 300 BAUD
terminal.  Even small factors can add up.

  What you've done by removing EQ is to remove an opportunity for writing
more efficient programs, WITHOUT REPLACING IT WITH SOMETHING EQUIVALENT OR
BETTER!  *THIS* is why people mutter and flame about your LISP being a "toy"
language or "fundamentally brain damaged".  You've shut out whole classes of
large programs without providing any alternative.

  Now, I don't have an alternative.  I think the equivalence classes
introduced by pointers on objects by creation times and hash-tables is an
essential aspect of LISP-like languages.  My programming style is such that
correctness is natural, obvious, and automatic once you understand the
paradigm -- I have never had problems with objects not being EQ when they
should or EQ when they shouldn't be (nor have I had to think about it much).
I don't even think I've used EQUAL in a long time.  Your challenge is to
change my mind by providing a faster or a equal but more elegant
alternative.  You can't change my mind by taking it away, for I simply won't
use your language.

  By the way, you're stuck with von Neumann architectures for the next
decade at least.  I realize my pointer semantics are ugly and unneccessary
on certain non-standard architectures where efficient alternatives exist.
In the meantime, my pointer semantics are explicit in my programs because I
haven't yet built optimizing compilers that understand it well enough to
allow it to be implicit (and possibly non-existant on appropriate
architectures).  I'm trying though.

-  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -  -

  If you want to bend your mind thinking about other things, consider the
idea of HASH-CONS.  HASH-CONS on a particular pair of values always returns
the same cons cell.  What does this allow you to do?  What problems does it
solve?  What problems does it create?  Should you be able to RPLACA and
RPLACD a cell cell produced by HASH-CONS?

  But thats trivia (and possibly bogus).  If you are really concerned about
good language design for LISP, look into SCHEME where you have lexical
scoping for identifiers, first class functions (can be passed as arguments
as well as returned as values), and many other goodies.  Other functional
languages (like ML, f'instance) are worth looking into.  Here you are
talking major overhaul of the entire language, not nit picking certain
aspects.

  Like the man said, its good that you are thinking about these things, but
don't get bogged down.

  Your goal as a language designer should be to allow the programmer to do
what he wants, but to give him the best, cleanest, and simplest tools for
doing that.  You might change HOW he does what he wants, but never WHAT he
wants.  If you start trying to dictate what he wants to do, you've just
murdered your language, because the programmer's just going to get pissed
off, and go to a more pleasant language at his first opportunity.

roberts@cognos.uucp (Robert Stanley) (07/30/87)

In article <290@nysernic> weltyc@nic.nyser.net (Christopher A. Welty) writes:

>...Why do you think C is so popular?  Because pointers are
>very useful and allow you to do flexible things.  Sure, some people
>view them as "too low level" for a language that is supposed to
>support high level abstractions....

Until we get much more powerful computing systems out in the field, remembering
that the lag to plain old boring commercial DP in the field is about 15 years,
all persons writing 'real' applications will at some point be obsessed by
efficiency.  Further, human nature tends to leave us doing things the way we've
always done them.  How long has it taken for the concept of GOTO-less programs
to reach grunt level?  Has it?  It will take a long time to wean programmers
from pointers, even if there is an adequate alternative.  Adequate of course is
a mutable measurement, depending on local context.

>...But don't eliminate important elements of the language because
>you can't think of an example of how it could be used.  If it's there,
>SOMEONE must be using it!

Not so at all.  Nearly everything in large-scale software is prey to creeping
featurism, and the 'it seemed like a good idea at the time' syndrome.  Tools,
in which I include all programming languages, are no exception to this.  I can
name at least one heavy-duty bug that remained undetected for years for each of
several different language implementations.  Undetected for years means that
the feature was essentially unused.  We are suffering from the legacy of
solutions in search of a problem, and I welcome initiatives that offer a
departure from this sorry state of affairs.

The key issue is not to remove a feature from the language without providing a
way of simulating it.  If there are large cries from users that the only way to
do something is too slow/clumsy/impractical/etc. then is the time to take note
and introduce release two.

-- 
Robert Stanley           Compuserve: 76174,3024        Cognos Incorporated
 uucp: decvax!utzoo!dciem!nrcaer!cognos!roberts        3755 Riverside Drive 
                   or  ...nrcaer!uottawa!robs          Ottawa, Ontario
Voice: (613) 738-1440 - Tuesdays only (don't ask)      CANADA  K1G 3N3