restivo@POLYA.STANFORD.EDU (Chuck Restivo) (04/19/88)
Date: Fri 8 Apr 1988 19:29-PDT >From: Chuck Restivo (The Moderator) <PROLOG-REQUEST@SUSHI.STANFORD.EDU> Reply-to: PROLOG@SUSHI.STANFORD.EDU US-Mail: P.O. Box 4584, Stanford CA 94305 Subject: PROLOG Digest V6 #18 To: PROLOG@SUSHI.STANFORD.EDU PROLOG Digest Thursday, 14 Apr 1988 Volume 6 : Issue 18 Today's Topics: Implementation - Grammar Rules & Yacc ---------------------------------------------------------------------- Date: 25 Feb 88 04:53:57 GMT >From: ed298-ak@violet.Berkeley.EDU (Edouard Lagache) Subject: Re: Prolog Grammar Rules (a question on interpreters) I greatly admired Richard O'Keefe's use of grammer rules, and I have almost convinced myself that I understand what is going on. However, my PROLOG interpreter doesn't understand - it crashes! Now my interpreter is very good at crashing so that doesn't mean much (yes I know, I need a new interpreter, computer, production rule system ..........), but I was curious, has anyone else had problems running Richard O'Keefe's examples. I have a feeling that a number of PROLOG systems may have taken Clocksin and Mellish too seriously as the definitive characterization of the language. -- Edouard Lagache ------------------------------ Date: 19 Feb 88 10:47:13 GMT >From: quintus!ok@sun.com (Richard A. O'Keefe) Subject: Re: Prolog Grammar Rules In article <489@ashton.UUCP>, dwiggins@ashtate (Don Dwiggins) writes: > In the translation from DCG rules to clauses, O'Keefe has terminals being > translated to goals of the form "connects(Terminal, Pos1, Pos2)", ... > ... Actually, this predicate can be > "folded into" the translator, so that no calls to connects/3 remain in the > translated clauses (this is the translation presented by Clocksin & Mellish). This is true, but you have to be *extremely* careful, and I'm afraid the version in Clocksin & Mellish isn't. The trouble is cuts. Consider p(1) --> !, [a]. p(_) --> []. What happens if you call | ?- p(1, [], []). The answer, in DEC-10 Prolog, C Prolog, Quintus Prolog, and some others, is that, as you would expect by analogy with ordinary Prolog rules, it fails. If you use the Clocksin & Mellish translator, it succeeds. To use terminology I introduced a while back: the Clocksin & Mellish translator will turn a steadfast predicate into one which is not steadfast. This is the basic problem with macro-expansion in Prolog: you must take care not to push bindings back over side-effective operations. Don Dwiggins also suggests that it would sometimes be useful to put the list/state arguments first so that they can be indexed on. Recall that many Prolog implementations (DEC-10 Prolog included) only index on principal functors, and that in any case indexing doesn't buy you much if you have clauses with variables there as well. Let's look at an example from Michael McCord's parser. topic(Type, Topsubj, hold(Top), _, _, Qaux, [Top|Mods], Mods) --> ( nounphr(Top) ; there(Top) ), topic1(Type, Topsubj, Top, Qaux). topic(Type, Topsubj, nil, C, X, Qaux, [Top|Mods], Mods) --> adverbial(C, X, Top), topic1(Type, Topsubj, Top, Qaux). topic(q, f, nil, _, _, pre(V), [syn([yesno],applyto(Y),yesno(Y),[])|Mods], Mods) --> [ V ], { finiteaux(V) }. The list/state arguments for these rules would be S0, S S0, S [V|S1], S respectively. Indexing really isn't going to pay off much here. This seems to be typical of natural language parsers: here's a nonterminal from a grammar for (a fragment of) Maaori: after_interjection(-, -) --> !. after_interjection(koia, Mod) --> [ Mod ], { positional_particle(Mod) }, !. after_interjection(_, anoo) --> [ anoo ]. Here the list/state arguments would be S0, S { note that S0=S must be postponed to AFTER the cut! } [Mod|S1], S [anoo|S1], S It happens that 'anoo' is not a positional particle, but the indexer doesn't know that... Since early arguments often carry agreement information, there is some hope of indexing off them. The conclusion is that wherever we put the list/state arguments, we can't expect much help from indexing in parsers for natural langauges, so we might as well keep those arguments out of harm's way. Now programming languages are a different story: they tend to be rather over-endowed with keywords. Suppose you want to recognise statements in C, and would like to exploit indexing. You might start with this (where I've suppressed irrelevant detail and a lot of rules): statement --> ['{'], rest_of_block. statement --> [if], rest_of_if. statement --> [return], rest_of_return. statement --> [goto], rest_of_goto. statement --> [id(X),:], statement. statement --> expression, [';']. statement --> [';']. This isn't going to be indexed in most Prolog implementations. Note that the indexer doesn't know what an expression can start with. The list is rather long. A program which computed FIRST sets of definite clause grammars would be handy here...) What can we do? We can read ahead one token. statement --> expression, [;], !. statement --> [First], statement(First). statement('{') --> rest_of_block. statement(if) --> rest_of_if. statement(return) --> rest_of_return. statement(goto) --> rest_of_goto. statement(id(X)) --> [:], statement. statement(';') --> []. Note that if you are parsing expressions, the cleanest way of handling operator precedence involves consulting a table of operators, so once again there really isn't anything to index on. Given that there is a general convention in Prolog that "extra" arguments are added on the right (for example, maplist(p(1), [a,b], [X,Y]) will call p(1,a,X) and then p(1,b,Y)) there doesn't seem to be any compelling reason for grammar rules to be different. ------------------------------ Date: 18 Feb 88 01:00:46 GMT >From: quintus!ok@unix.sri.com (Richard A. O'Keefe) Subject: Yacc in Prolog, Parsing and Dali In article <3833@watvlsi.waterloo.edu>, mccool@watvlsi.waterloo.edu (Michael McCool) writes: > Does anyone know if there's anything like YACC written for Prolog? There are several largish grammar-handling systems written in/for Prolog, but all the ones I know of are intended for processing natural languages. Since the "native" method of parsing in Prolog is recursive descent, the techniques developed for LL(k) parser generators should be relevant. I would be interested to hear of any such programs too. > My wish list is: > 2) can accept a token-production function instead of a list > this is to save memory. WHY does everybody assume > that the ENTIRE token list is in memory??? The underlying formalism assumes no such thing, and provided you do not use explicit lists of terminals in your rules, you should be able to use ordinary grammar rules without ever having a list. For example, suppose you have a table token(Index, TheToken) where Index is an integer ranging from 1 to some N with no gaps. Then you could define token(Token, I0, I) :- token(I0, Token), I is I0+1. and you could write grammar rules like if_statement --> token(if), % would normally be [if] expression, token(then), % would normally be [then] statement, ( token(else) -> % " " " [else] statement | [] % this doesn't imply lists either ). This particular version would of course cost a LOT more memory than a list, but an implicit representation might well cost very little space. So what? If you are building any sort of parse tree (or pushing attributes up and down even a virtual tree), the token list is unlikely to be the major space consumer. Historically, the main use of Prolog grammar rules has been for processing natural language statements and Prolog clauses, both of which tend to be short. (Counting "id(" as a single token, the clause above would have only 23 tokens in its list.) The availability of the tokens in SOME sort of data structure has a major beneficial effect: it makes backtracking easy and cheap. For example, DEC-10 Prolog syntax requires unbounded lookahead. It's interesting that one of the nicest Algol compilers it has ever been my pleasure to use (the Burroughs B6700 Algol compiler) built an array of tokens. This enabled them to do things like trying to compile the subscripts of an array backwards, and then doing it in the usual order if that didn't work. {It was more economical to keep a sequence of tokens and "backtrack" over that than it would have been to build a parse tree and "backtrack" over that.} > 3) has a simple way to handle syntax errors --> > a way to resynch, e.g. skip tokens until you see > a token. This should be done in a way similar to > YACC's "error ;" so I don't have to fool around with > writing tail-recursive loops to scan for these kind of > constructs. I wouldn't call YACC's method "simple": a combination of falling back and skipping "until 3 tokens have been correctly parsed", with a special hack for stopping early. Even the System V Programmer's Guide calls it "crude". I've never managed to get it to do anything more complicated than skip to the next semicolon. Does anyone know what the BSD EYACC does and how to use it? The promise of better error recovery in YACC has been tantalising me since 4.1BSD! If anyone is interested in adding some sort of error-handling method to Prolog-coded parsers, I heartily recommend chapter 6 of Syntax of Programming Languages -- Theory and Practice, by Roland C. Backhouse Prentice-Hall, 1979 ISBN 0-13-879999-7 UK 12.95 pounds There are more recent papers by him. There is a conflict between Prolog style and the demands of error repair: in order to do error repair you need to KNOW that there is an error, but failure in a Prolog grammar rule usually means no more than "oops, wrong guess, try again". (This is one of the reasons why read/1 gives such poor diagnostics: while there are some local errors which *must* be wrong--such as two variables with no intervening operator--many errors involving operators cannot be locally detected.) It would be interesting to have a tool which checked whether the context-free skeleton of a collection of Prolog grammar rules defined an LL(1) language--program 3.2 in chapter 3 of the reference above would be a good place to start. > 4) precedence/associativity parser > for operations, I would like to give only one rule like > expr(binary(Op,E1,E2)) --> > $left,expr(E1),['+'],expr(E2),{Op = plus} | > $left,expr(E1),['-'],expr(E2),{Op = minus} | > $left,expr(E1),['*'],expr(E2),{Op = times} | > $left,expr(E1),['/'],expr(E2),{Op = divide} |... > with "$left" indicating left associativity, and the order > indicating precedence. I do hope that you don't put the vertical bars at the right like that; they are very hard to see there. The usual way of doing this in Prolog is to have a table such as operator(Token, Priority, Type, Op). e.g. operator(+, 3, left, plus). operator(-, 3, left, minus). operator(*, 2, left, times). operator(/, 2, left, divide). operator(^, 1, right, power). then one has rules such as expr(Expr) --> expr(9, Expr). expr(MaxLevel, Expr) --> primary(Expr0), rest_expr(MaxLevel, Expr0, Expr). rest_expr(MaxLevel, Expr0, Expr) --> [ Token ], { operator(Token, Priority, Type, Op) }, { Priority < MaxLevel }, !, { argument_priority(Type, Priority, ArgumentPriority) }, expr(ArgumentPriority, Expr1), rest_expr(MaxLevel, binary(Op,Expr0,Expr1), Expr). rest_expr(_, Expr, Expr) --> /* no next token, not an operator, or wrong priority */ []. argument_priority(left, X, X). argument_priority(right, X, Y) :- Y is X+1. Testing this (with primary//1 recognising integers): | ?- expr(X, [2,^,3,^,4,*,5,*,6], []). X = binary(times,binary(times,binary(power,2,binary(power,3,4)),5),6) The really nice thing about this is that it separates the operators themselves out, making it crystal clear what they have in common (the *shape* of the rule is the same for all the operators) and where they are different (what they look like and their priority). It also makes it easier to add new operators, even dynamically. The fact that YACC forces you to write a separate rule for each priority level (you can fold operators of the same priority together) is one of the reasons why I don't use it. ------------------------------ End of PROLOG Digest ********************