markh@csd4.milw.wisc.edu (Mark William Hopkins) (12/23/88)
Presented below is an extended version of the algorithm presented in a previous article. In that article, it was mentioned that the pattern recognition algorithm presented there could only deal with exact matches. This algorithm corrects that flaw with an "optimization" that uses the lambda calculus. Consider the LISP example of the previous article. The input: (CONS (CAR a) (CONS (CAR a) (f (CDR a)))) is "parsed" into a graph consisting of string nodes: GRAPH: (0) CONS (1) (2) (1) CAR a (2) CONS (3) (4) (3) CAR a (4) f (5) (5) CDR a The algorithm presented previously failed to recognize the similarity between strings (0) and (2), simply because their last components differed. What it produced was the following graph: (0) CONS (1) (2) (1) CAR a (2) CONS (1) (4) (4) f (5) (5) CDR a The "optimization" will combine nodes (0) and (2), and also, (1) and (5) to produce the following: (0) *F[(2)] (2) *F[(4)] (4) f (5) (5) *G[CDR] where *F = lambda ?x. (CONS (1) ?x) where (1): *G[CAR] *G = lambda ?x. (?x a) These lambda abstractions are lambda abstractions of *entire graphs*. The bound variable ?x represents an arbitrary string component: either a lexical item or a pointer to another string node. The square brackets denotes lambda application, so that, for example, *G[CDR] reduces to (CDR a) and *G[CAR] reduces to (CAR a) The optimization illustrated here will form abstract patterns that consist of single parameters. It cannot recognize similarities of a higher order, such as that which would occur among strings (0) - (3) in this graph: INPUT: (f (f (f (f (CAR a)(CDR a))(CDR a))(f (CAR a)(CDR a)))(f (CAR a)(CDR a))) GRAPH: (0) f (1) (2) (1) f (3) (2) (2) f (4) (5) (3) f (2) (5) (4) CAR a (5) CDR a In fact, it will produce the following lambda abstractions: *F = lambda ?x. (f ?x (2)) ... to merge nodes (0) and (1) *G = lambda ?x. (f ?x (5)) ... to merge nodes (2) and (3) *H = lambda ?x. (?x a) ... to merge nodes (4) and (5) It fails to see that *F and *G can also be merged, thereby producing a pattern with two parameters: *F = *J[(2)] *G = *J[(5)] where *J = lambda ?y. lambda ?x. (f ?x ?y) to produce the graph: (0) *F[(1)] (1) *F[(3)] (2) *G[(4)] (3) *G[(2)] (4) *H[a] (5) *H[a] There is a different kind of "pattern unification" algorithm underlying these processes, which will be detailed in a subsequent article.