[comp.ai] A pattern recognition algorithm, II

markh@csd4.milw.wisc.edu (Mark William Hopkins) (12/23/88)
     Presented below is an extended version of the algorithm presented in
a previous article.  In that article, it was mentioned that the pattern
recognition algorithm presented there could only deal with exact matches.
This algorithm corrects that flaw with an "optimization" that uses the lambda 
calculus.

     Consider the LISP example of the previous article.  The input:

	 (CONS (CAR a) (CONS (CAR a) (f (CDR a))))

is "parsed" into a graph consisting of string nodes:

	       GRAPH:  (0) CONS (1) (2)
		       (1) CAR a
		       (2) CONS (3) (4)
		       (3) CAR a
		       (4) f (5)
		       (5) CDR a

The algorithm presented previously failed to recognize the similarity between
strings (0) and (2), simply because their last components differed.  What it
produced was the following graph:

		       (0) CONS (1) (2)
		       (1) CAR a
		       (2) CONS (1) (4)
		       (4) f (5)
		       (5) CDR a

     The "optimization" will combine nodes (0) and (2), and also, (1) and (5)
to produce the following:

		       (0) *F[(2)]
		       (2) *F[(4)]
                       (4) f (5)
		       (5) *G[CDR]
                       where
			  *F = lambda ?x. (CONS (1) ?x)
			       where (1): *G[CAR]
                          *G = lambda ?x. (?x a)

These lambda abstractions are lambda abstractions of *entire graphs*.  The
bound variable ?x represents an arbitrary string component: either a
lexical item or a pointer to another string node.  The square brackets
denotes lambda application, so that, for example, 

	  *G[CDR] reduces to (CDR a)
   and    *G[CAR] reduces to (CAR a)

     The optimization illustrated here will form abstract patterns that
consist of single parameters.  It cannot recognize similarities of a higher
order, such as that which would occur among strings (0) - (3) in this graph:

INPUT: 
(f (f (f (f (CAR a)(CDR a))(CDR a))(f (CAR a)(CDR a)))(f (CAR a)(CDR a)))
GRAPH:
	      (0) f (1) (2)
	      (1) f (3) (2)
	      (2) f (4) (5)
              (3) f (2) (5)
	      (4) CAR a
	      (5) CDR a

In fact, it will produce the following lambda abstractions:

	      *F = lambda ?x. (f ?x (2)) ... to merge nodes (0) and (1)
	      *G = lambda ?x. (f ?x (5)) ... to merge nodes (2) and (3)
              *H = lambda ?x. (?x a)     ... to merge nodes (4) and (5)

It fails to see that *F and *G can also be merged, thereby producing a
pattern with two parameters:

		    *F = *J[(2)]
		    *G = *J[(5)]
		    where
		       *J = lambda ?y. lambda ?x. (f ?x ?y)

to produce the graph:

	      (0) *F[(1)]
	      (1) *F[(3)]
	      (2) *G[(4)]
              (3) *G[(2)]
	      (4) *H[a]
	      (5) *H[a]

     There is a different kind of "pattern unification" algorithm underlying
these processes, which will be detailed in a subsequent article.