[comp.lang.lisp] EQUAL on circular lists

kostelij@apolloway.prl.philips.nl (T. Kostelijk 43897) (06/05/91)

Subject: EQUAL on circular lists

The function EQUAL may not terminate when two non-eq trees of conses
are circular. For example:

(setf p         (cons 'a nil) 
      (cdr p)   p
      q         (cons 'a nil)
      (cdr q)   q)

Now both p and q look alike, but (equal p q) won't terminate.
The (standard) printed representation of both p and q is

 (a a a ..<etc>.. )
 

QUESTION:

Does anyone have an extended version of EQUAL which can determine in 
finite time whether two trees of conses have the same printed
(possibly infinite) representation?


TESTCASE:

As a simple testcase, both p and q are also "EQUAL" to r, which is defined by:

(setf r          (list a a a a)
      (cddddr r) (cdr r))


Other examples:    v and w are "EQUAL"

(setf v         '(a nil)
      (cadr v)  v
      w         '(a (a nil))
      (cdadr w) w)



FROM:

Ton Kostelijk, 
email adress: kostelij@apolloway.prl.philips.nl

fy@lucid.com (Frank Yellin) (06/08/91)

> QUESTION:
> 
> Does anyone have an extended version of EQUAL which can determine in 
> finite time whether two trees of conses have the same printed
> (possibly infinite) representation?

I once asked the same question.

Jim Boyce, formerly of Lucid, now at Oracle, pointed out to me that this
problem is pretty much the same as determining whether two deterministic
finite state automata are equivalent.

Here's a rough implementation.  I haven't tested it thoroughly, but it
should give you an approximate idea of what you need to do.

The idea is objects "a" and "b" are super-equal if 
    1) If either a or b isn't a cons, then (eql a b). [or whatevever choice of
       base comparisons you want to use.
    2) If both "a" and "b" are cons cells, then it must be the case that
       (super-equal (car a) (car b)) and (super-equal (cdr a) (cdr b)).

We keep a queue of all the pairs (one a cell from a, one a cell from b) that we
still need to look at.  In addition, we keep a list of all the pairs that we
have every looked at, so that we don't try to look at them again.

The how field in the code is so that when the code returns nil, you'll know
why the two differ.  It can easily be purged.

The code is based on an algorithm I just stole from "Mathematical Theory of
Computation" by Zohar Manna.


(defstruct pair first second how)

(defun eq-pair (a b)
  (and (eq (pair-first a) (pair-first b))
       (eq (pair-second a) (pair-second b))))

(defun super-equal (a b)
  (let* ((initial-pair (make-pair :first a :second b))
	 (queue (list initial-pair))
	 (everything-seen (list initial-pair)))
    (loop
      (when (null queue) (return 't)) ; nothing more to look at.  Most be okay!
      (let* ((pair (pop queue))		; an item to look at
	     (pair-first (pair-first pair))
	     (pair-second (pair-second pair))
	     (pair-how (pair-how pair)))
	;; pair-first and pair-second must be super-equal.
	(if (and (consp pair-first) (consp pair-second))
	    ;; the cons cell is super equal if both the car and cdr are
	    (let ((car-pair (make-pair :first (car pair-first)
				       :second (car pair-second)
				       :how (cons 'car pair-how)
				       ))
		  (cdr-pair (make-pair :first (cdr pair-first)
				       :second (cdr pair-second)
				       :how (cons 'cdr pair-how)
				       )))
	      ;; If we've already seen the car pair, then don't bother.
	      ;; Otherwise put it on the queue.  Also add it to the list of
	      ;; things seen.
	      (unless (member car-pair everything-seen :test 'eq-pair)
		(push car-pair everything-seen) (push car-pair queue))
	      ;;; Ditto for the cdr pair.
	      (unless (member cdr-pair everything-seen :test 'eq-pair)
		(push cdr-pair everything-seen) (push cdr-pair queue)))
	    ;; One of the items isn't a cons cell.  Just see if they're eql.
	    (unless (eq pair-first pair-second)
	      (return (values nil pair-how))))))))


Here are the test cases you asked about.

> (setf p         (cons 'a nil) 
      (cdr p)   p
      q         (cons 'a nil)
      (cdr q)   q)
#1=(A . #1#)
> (setf r          (list 'a 'a 'a 'a)
      (cddddr r) (cdr r))
#1=(A A A . #1#)
> (setf v         '(a nil)
      (cadr v)  v
      w         '(a (a nil))
      (cdadr w) w)
#1=(A (A . #1#))
> (super-equal p q)
T
> (super-equal p r)
T
> (super-equal q r)
T
> (super-equal v w)
NIL
(CDR CDR CAR CDR)
> (cdr (cdr (car (cdr v))))
NIL
> (cdr (cdr (car (cdr w))))
#1=((A A . #1#))
> 


-- Frank Yellin
   fy@lucid.com

gat@forsight.jpl.nasa.gov (Erann Gat) (06/08/91)

In article <2849@prles2.prl.philips.nl> kostelij@apolloway.prl.philips.nl (T. Kostelijk 43897) writes:
>Does anyone have an extended version of EQUAL which can determine in 
>finite time whether two trees of conses have the same printed
>(possibly infinite) representation?

Try:

(defun printed-representations-equal-p (l1 l2)
  (let ( (*print-circle* t) )
    (string= (format nil "~S" l1) (format nil "~S" l2))))

fy@lucid.com (Frank Yellin) (06/08/91)

A few comments on the super-equal algorithm I just sent.

1) Yes, I do know the difference between a stack and a queue and should
have used the term "stack" throughout.  The Finite State Automata algorithm
I emulated uses a queue.  Stacks are easier to implement in Lisp.  

It's not immediately obvious to me whether a depth-first search (from a
stack) or a breadth-first search (from a queue) will give a faster result.

Sorry for the name confusion.

2)  The FSA algorithm is attributed to E.F. Moore.  Don't know any more.

3) The algorithm must terminate, since the number of pairs you need to look
at is at most sizeof(a) * sizeof(b), where "size" is the number of cons
cells and atoms in the object.

fy@lucid.com (Frank Yellin) (06/11/91)

>> kostelij@apolloway.prl.philips.nl (T. Kostelijk 43897) writes:
>> Does anyone have an extended version of EQUAL which can determine in 
>> finite time whether two trees of conses have the same printed
>> (possibly infinite) representation?

> gat@forsight.jpl.nasa.gov (Erann Gat) replies:
> (defun printed-representations-equal-p (l1 l2)
>   (let ( (*print-circle* t) )
>    (string= (format nil "~S" l1) (format nil "~S" l2))))

I don't really think that the above code will work.  But the problem, as
originally stated, was rather vague.

For example, Kostelijk explicitly wanted the following two lists to be
equal:
    #1=(a a a . #1#)
    #1=(a . #1#)
since they both print out as (a a a a a a a a ...).  The function
#'printed-representations-equal-p will return nil, since with
*print-circle* bound to 't, one prints out as "#1=(a a a . #1#)" and the
other as "#1=(a . #1#)" 

Then again, my solution, posted a while ago, doesn't really solve what the
original poster asked for either.  

The two lists
    (#1=(a . #1) b)
    (#1=(a . #1) c)
both print out as ((a a a a a a .....)) since the printer never gets to the
second element of the list.  So reading Kostelijk's query literally, the
above two lists should be extended-equal.  Both my code and Gat's code will
return nil.  

Maybe I'm beginning to believe in program specification again.

-- Frank Yellin
   fy@lucid.com

kostelij@apolloway.prl.philips.nl (T. Kostelijk 43897) (06/13/91)

Subject: EQUAL on circular lists

> kostelij@apolloway.prl.philips.nl's message of 5 Jun 91 14:08:48 GMT

> QUESTION:
> 
> Does anyone have an extended version of EQUAL which can determine in 
> finite time whether two trees of conses have the same printed
> (possibly infinite) representation?
>

The reactions I received confirmed me that not only novice Lispers read rn, 
and that it is useful for exchanging information about common problems.

A short answer to reactions received on the above question, that might be
of general use.

Frank Yellin (uunet!stanford.edu!lucid.com!karoshi!fy) asked 
for a more formal specification. 


MORE FORMAL DEFINITION OF EQUAL-CIRCLE:

Consider the infinite set S of functions denoted as C{A|D}*R,
and without loss of generality, only consider (possibly circular) lists and symbols.
Any two objects a and b are defined EQUAL-CIRCLE iff the following holds:
  for every function f in S for which (f a) is 
    a. a symbol,  1. (f b) is a symbol and
                  2. (eql (f a) (f b)),
    b. a cons,    1. (f b) is a cons,
    c. not defined (an error), (f b) must be not defined.

Notes: 1. Ofcourse I know that only a few functions of S are usually implemented,
          but that is irrelevant for formal statements.
       2. The definition could be extended easily to other atoms than symbols.
       3. (to Frank) So now obviously the lists (#1=(a . #1) b) and (#1=(a . #1) c)
          are not EQUAL-CIRCLE, take f = CADR


DETERMINISTIC FINITE STATE AUTOMATA:

Frank also confirmed my own impression that the problem seems pretty
much the same as the problem of determining whether two deterministic
finite state automata are equivalent.
This problem has a worst case complexity of sizeof(a) * sizeof(b),
where 'size' is the number of cons cells and atoms in the object.

Perhaps someone has a proof of the FSA statement?


THE BEST ALGORTIHM:

The algorithm I like most was send in by Bernhard Pfahringer
(bernhard@ai-vie.uucp), from Vienna, by using the trick to make conses 
EQ temporarily, thus preventing EQUAL-CIRCLE from going into endless loops.
It has the following definition:

(defun EQUAL-CIRCLE (cons1 cons2)
  "Equal-circle determines equivalence between possibly circular trees"
  (or (eql cons1 cons2)
      (and (consp cons1)
	   (consp cons2)
	   (let ((f2 (first cons2))
		 (r2 (rest cons2)))
	     (unwind-protect
		  (progn (setf (first cons2) (first cons1) ; trick
			       (rest cons2) (rest cons1))      
			 (and (equal-circle (first cons1) f2)
			      (equal-circle (rest cons1) r2)))
	       (setf (first cons2) f2                      ; untrick
		     (rest cons2) r2))))))

It works quite efficiently for usual cases and is not hard to understand.     
The runtime can be reduced up to 50% by leaving the UNWIND-PROTECT out,
but then the process should not be interrupted for data consistency.

I want to thank for all reactions, especially from Bernhard and Frank.


FROM:

Ton Kostelijk, 
email adress: kostelij@apolloway.prl.philips.nl

ceb@csli.Stanford.EDU (Charles Buckley) (06/16/91)

In article <FY.91Jun11102247@hardwick.lucid.com> fy@lucid.com (Frank Yellin) writes:

   From: fy@lucid.com (Frank Yellin)
   Newsgroups: comp.lang.lisp

   Then again, my solution, posted a while ago, doesn't really solve what the
   original poster asked for either. . . . Maybe I'm beginning to
   believe in program specification again.

Gee, isn't solving this problem about as much work as
reference-counting (as in for garbage collection)?