ok@quintus (07/27/88)
In his recent posting about the Oxford Prolog Library, Jocelyn Paine said that one of the components was a grammar rule translator. The description says 'The translator is essentially the same as that published in "Programming in Prolog", by Clocksin & Mellish'.

The grammar rule translator which appeared in the 1st and 2nd editions of Clocksin & Mellish had the following flaws:
-- there was a variable name spelling mistake in one of the clauses, which broke the translation of rules with pushback (fixed in the 3rd edition, I think);
-- the translator did not expand (If->Then;Else)s (still not handled in the 3rd edition);
-- the translator did not use the 'C'/3 expansion, but tried to do the terminal matching in the translator, which meant that code with cuts and ``meta-logical'' operations did not work as expected (still a problem in the 3rd edition);
-- the translator does nothing useful with variables appearing as non-terminals (in the 3rd edition it goes into a loop);
-- the definition of phrase/2 is wrong--it only accepts nonterminals but should accept grammar rule bodies (i.e. phrase((p,q), S) will call ','(p,q,S,[]) when it should call (p(S,S1),q(S1,[]))) (still the case in the 3rd edition).

This is understandable, because Clocksin & Mellish do not purport to provide a specification of the Prolog language, only instruction in how to use it. Their grammar rule translator is presented only as the answer to an exercise.

The lack of a reliable specification means that some vendors have left grammar rules out entirely, and others have botched the job. Others have added "features". For example, ALS decided that calls to certain built-in predicates should be quietly translated as themselves.
The trouble with this is that it is done _quietly_ (so you never know when it has happened without looking at the code in the data base) and that there is no apparent pattern to the choice of which builtins are left alone and which are changed (so that you, or at any rate I, can never recall which are which). In the absence of an agreed standard, however, nobody can say that this was wrong.

There is a group of people who purport to be developing a Prolog standard at the taxpayer's expense. You would expect that group to have addressed the grammar rule translation issue, ideally by distributing a draft specification in the form of usable Prolog code. The BSI Prolog substandard committee, however, in their great wisdom, have preferred to do other things.

There's only one thing to do, then, and that's to publish source code. The code which follows in this message was rewritten from scratch for this purpose. The purpose of this code is to serve as a workable specification of phrase/3, phrase/2, 'C'/3, and the translation from Lhs-->Rhs to Head:-Body form. Error checking and reporting have quite deliberately been left out, and a couple of cases have been over-generalised to accommodate this.

I have supplied the dcg rule translation as a separate predicate 'dcg rule'/2 to decouple the question of what DCG rules are translated to from the question of how DCG rule translation is integrated into the rest of a Prolog system; I am not proposing that 'dcg rule'/2 should be directly available under that name. phrase/[2,3] and 'C'/3 _should_ be directly available.

There may be mistakes in the code that follows. There may be things it does that it shouldn't do. There may be things it doesn't do that you think it should. Let's hear about them. There are only 66 lines of Prolog code; if we can't get something this size satisfactory, we'll _deserve_ the BSI substandard.

% File   : DCG.PL
% Author : Richard A. O'Keefe
% Updated: Tuesday July 26th, 1988.
% Purpose: Definite Clause Grammar rule to Prolog clause translator.

/*  This file is written in the ISO 8859/1 character set. The
    "Copyright" line contains after the right parenthesis a character
    which when transmitted was character 169, the international
    copyright symbol.

    Copyright (C)© 1988, Quintus Computer Systems, Inc.

    This file is distributed in the hope that it will be useful, but
    without any warrantee. You are expressly warned that it is meant to
    serve as a model, not for direct use: all error checking and
    reporting has been omitted, and mistakes are almost surely present.

    Permission is granted to anyone to distribute verbatim copies of
    this source code as received, in any medium, provided that the
    copyright notice, the nonwarrantee warning, and this permission
    notice are preserved. Permission is granted to distribute modified
    versions of this source code, or of portions of it, under the above
    conditions, plus the conditions that all changed files carry
    prominent notices stating who last changed them and that the
    derived material is subject to this same permission notice.
    Permission is granted to include this material in products for
    which money is charged, provided that the customer is given written
    notice that the code is (or is derived from) material provided by
    Quintus Computer Systems, Inc., and that the customer is given this
    source code on request.

    ----------------------------------------------------------------

    Now that we've got that (adapted from the GNU copyright notice)
    out of the way, here are the technical comments.

    The predicates are all named 'dcg <something>'/<some arity> in
    order to keep out of the way, with the exception of phrase/2 and
    phrase/3 which bear their proper names. Only phrase/[2,3] and
    'dcg rule'/2 are meant to be called directly, and 'dcg rule'/2 is
    meant to be called from expand_term/2.
    You need to keep 'dcg body'/4 and its dependents around at run
    time so that variables as nonterminals in DCG rule bodies will
    work correctly.

    So that Quintus have _something_ left to sell, this code has been
    rewritten from scratch with no error checking or reporting code at
    all, and a couple of places accept general grammar rule bodies
    where they are really supposed to demand lists of terminals.
    However, any rule which is valid according to the Quintus Prolog
    manual will be translated correctly, except that this code makes
    no attempt to handle module: prefixes. (The change is trivial.)

    Note that 'dcg rule'/2 and phrase/[2,3] are steadfast.
*/

% dcg rule(+Grammar Rule, -Equivalent Clause)

'dcg rule'(-->(Head0,Body0), Clause) :-
    'dcg head'(Head0, Head, PushBack, S0, S),
    'dcg body'(Body0, Body1, S0, S),
    'dcg conj'(Body1, PushBack, Body),
    Clause = :-(Head,Body).


% dcg head(+Head0, -Head, -PushBack, -S0, -S)
% recognises both
%     NonTerminal, [PushBackList] -->
% and
%     NonTerminal -->
% It returns the difference pair S0\S which the body is to parse.
% To avoid error checking, it will accept an arbitrary body in place
% of a pushback list, but it should demand a proper list.

'dcg head'((Head0,PushBack0), Head, PushBack, S0, S1) :- !,
    'dcg goal'(Head0, Head, S0, S),
    'dcg body'(PushBack0, PushBack, S, S1).
'dcg head'(Head0, Head, true, S0, S) :-
    'dcg goal'(Head0, Head, S0, S).


% dcg goal(+Goal0, -Goal, +S0, +S)
% adds the arguments S0, S at the end of Goal0, giving Goal.
% It should check that Goal0 is a callable term.

'dcg goal'(Goal0, Goal, S0, S) :-
    functor(Goal0, F, N),
    N1 is N+1,
    N2 is N+2,
    functor(Goal, F, N2),
    arg(N2, Goal, S),
    arg(N1, Goal, S0),
    'dcg args'(N, Goal0, Goal).


% dcg args(+N, +Goal0, +Goal)
% copies the first N arguments of Goal0 to Goal.

'dcg args'(N, Goal0, Goal) :-
    (   N =:= 0 -> true
    ;   arg(N, Goal0, Arg),
        arg(N, Goal, Arg),
        M is N-1,
        'dcg args'(M, Goal0, Goal)
    ).

% dcg body(+Body0, -Body, +S0, +S)
% translates Body0 to Body, adding arguments as needed to parse S0\S.
% It should complain about bodies (such as 2) which are not callable
% terms, and about lists of terminals which are not proper lists.
% To avoid error checking, [a|foo] is accepted as [a],foo, but it
% really should complain. ONLY the forms listed here should be treated;
% other non-terminals which look like calls to built-ins could well be
% commented on (no error reporting here) but should be expanded even
% so. Thus X=Y as a nonterminal is to be rewritten as =(X,Y,S0,S),
% perhaps with a warning. If you want the translation X=Y, use {X=Y}.

'dcg body'(Var, Body, S0, S) :- var(Var), !,
    Body = phrase(Var,S0,S).
'dcg body'((A0,B0), Body, S0, S) :- !,
    'dcg body'(A0, A, S0, S1),
    'dcg body'(B0, B, S1, S),
    'dcg conj'(A, B, Body).
'dcg body'(->(A0,B0), ->(A,B), S0, S) :- !,
    'dcg body'(A0, A, S0, S1),
    'dcg body'(B0, B, S1, S).
'dcg body'(;(A0,B0), ;(A,B), S0, S) :- !,
    'dcg disj'(A0, A, S0, S),
    'dcg disj'(B0, B, S0, S).
'dcg body'({}(A), A, S, S) :- !.
'dcg body'(!, !, S, S) :- !.
'dcg body'([], true, S, S) :- !.
'dcg body'([H0|T0], Body, S0, S) :- !,
    'dcg term'(H0, H, S0, S1),
    'dcg body'(T0, T, S1, S),
    'dcg conj'(H, T, Body).
'dcg body'(NT0, NT, S0, S) :-
    'dcg goal'(NT0, NT, S0, S).


% dcg term(+T0, -T, +S0, +S)
% generates code (T) which succeeds when there is a terminal T0
% between S0 and S. This version uses the DEC-10 Prolog predicate
% 'C'/3 for compatibility with DEC-10 Prolog, C Prolog, Quintus Prolog.
% This is the only place that knows how terminals are translated, so
% you could supply instead the definition
%     'dcg term'(T0, S0=[T0|S], S0, S).
% and reap the same benefits. The one thing you must not do is
% NO! 'dcg term'(T0, true, [T0|S], S). DON'T DO THAT!

'dcg term'(T0, 'C'(S0,T0,S), S0, S).


% To see why dcg disj/4 is needed, consider the translation of
% ( [] | [a] ). We have to insert S1=S0 somewhere, but we do it at
% "compile"-time if we can.

'dcg disj'(Body0, Body, S0, S) :-
    'dcg body'(Body0, Body1, S1, S),
    (   S1 == S -> 'dcg conj'(S1=S0, Body1, Body)
    ;   S1 = S0, Body = Body1
    ).


% dcg conj(+A, +B, -C)
% combines two conjunctions A, B, giving C. Basically, we want to
% ensure that there aren't any excess 'true's kicking around (in a
% compiled system, that shouldn't matter). There is room for some
% leeway here: I have chosen to flatten A completely.

'dcg conj'(A, true, A) :- !.
'dcg conj'(A, B, C) :-
    'dcg CONJ'(A, B, C).

'dcg CONJ'(true, C, C) :- !.
'dcg CONJ'((A,As), C0, (A,C)) :- !,
    'dcg CONJ'(As, C0, C).
'dcg CONJ'(A, C, (A,C)).


% 'C'(S0, T, S)
% is true when the terminal T "Connects" the "string positions" S0 and S.

'C'([T|S], T, S).


% phrase(+NT0, ?S0)
% is true when the list S0 is in the language defined by the
% grammar rule body NT0. E.g. phrase(([a],[b]), [a,b]).

phrase(NT0, S0) :-
    phrase(NT0, S0, []).


% phrase(+NT0, ?S0, ?S)
% is true when the list S0\S is in the language defined by the
% grammar rule body NT0. E.g. phrase(([a],[b]), [a,b|X], X).

phrase(NT0, S0, S) :-
    'dcg body'(NT0, NT, T0, T),
    T0 = S0, T = S,
    call(NT).
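
To make the target of the translation concrete, here is a small hand-written sketch of the clauses the rules above should yield for a toy grammar. The grammar (greeting, name) is invented for illustration and does not appear in the post, and the clauses were derived by hand from the definitions of 'dcg body'/4 and 'dcg term'/4 rather than by running the code. In a system that already provides 'C'/3, leave that clause out.

```prolog
% Hypothetical grammar (not from the post):
%     greeting --> [hello], name.
%     name     --> [world].
%     name     --> [prolog].

% 'C'/3 exactly as defined in the post.
'C'([T|S], T, S).

% Hand translation: every nonterminal gains the two arguments S0 and S,
% and every terminal becomes a call to 'C'/3.
greeting(S0, S) :- 'C'(S0, hello, S1), name(S1, S).

name(S0, S) :- 'C'(S0, world, S).
name(S0, S) :- 'C'(S0, prolog, S).

% ?- greeting([hello,world], []).      succeeds
% ?- greeting([hello,prolog|X], X).    succeeds, X stays unbound
% ?- greeting([goodbye,world], []).    fails
```

Because the terminals stay in the body as 'C'/3 goals, the clause heads carry no list patterns at all, which is the property the rest of this thread argues about.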
evan@csli.STANFORD.EDU (Evan Kirshenbaum) (07/28/88)
In article <196@quintus.UUCP> ok@quintus () writes:
>-- the translator did not use the 'C'/3 expansion, but tried to do
>   the terminal matching in the translator, which meant that code with
>   cuts and ``meta-logical'' operations did not work as expected
>   (still a problem in the 3rd edition);

A couple of years ago, I was working on a project which involved a lot of DCGs. The Prolog I was working on (a DEC-10 Prolog) used 'C'/3 for terminals, and I was going crazy debugging the parser (some of the nonterminals had twenty or so rules, almost all of them distinguishable by the first terminal, which I had to step through each time). I found that the Prolog on another DEC-20 (an earlier version of DEC-10 Prolog) did what I considered "the right thing" and folded terminals into the head. However, one of my parsers, which had rules like

    def(pure_funct(F))-->[id(pure)],!,
        [id(function)],funct_id(F).

no longer worked (it still worked when run under the other system). What this DCG translator was doing, of course, was folding the terminals even over a cut, so this was translated to

    def(pure_funct(F),[id(pure),id(function)|S0],S):-!,
        funct_id(F,S0,S).

(The string "pure procedure" should match this clause and fail. It didn't match and so matched a default case and did the wrong thing.)

I figured there had to be a middle ground. I wrote my own translator which runs roughly as follows (this is from memory):

    'dcg body'(V,B,S0,S):-
        var(V),!,
        B = phrase(V,S0,S).
    'dcg body'((A0,B0),(A,B),S0,S):-!,
        'dcg body'(A0,A,S0,S1),
        'dcg body'(B0,B,S1,S).
    'dcg body'([],true,S,S):-!.
    'dcg body'([T|Ts],B,[T|S0],S):-!,
        'dcg body'(Ts,B,S0,S).
    'dcg body'(!,(!,S0=S),S0,S):-!.
    'dcg body'({A},(A,S0=S),S0,S):-!.
    'dcg body'(T,B,S0,S):-
        T=..L0,
        append(L0,[S0,S],L1),
        B=..L1.

(roughly; it pulled out true's and used functor rather than =.., but this is the gist). This allows the terminal folding into the head, but treats cut (and anything which can contain cut) as explicitly blocking such folding.

With this translator, the above code becomes

    def(pure_funct(F),[id(pure)|S0],S):-
        !,S0=[id(function)|S1],
        funct_id(F,S1,S).

which seems closer to what is wanted. This scheme has worked quite well. I'm kind of curious as to whether there's a good reason not to do it this way.

---
Evan Kirshenbaum
Stanford University
evan@csli.Stanford.EDU
...!ucbvax!csli.stanford.edu!evan

If you think my opinions represent this university, you haven't been on campus recently!
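
The difference between the two translations can be tried out directly. The sketch below puts both clause shapes from this post side by side under different names; def_folded/3, def_blocked/3, the catch-all "unknown" clause standing in for the default case, and the one-clause funct_id/3 are all assumptions made for the sake of a runnable example, not part of the original parser.

```prolog
% Hypothetical stand-in for the real funct_id nonterminal.
funct_id(F, [id(F)|S], S).

% Terminals folded into the head even across the cut.
def_folded(pure_funct(F), [id(pure),id(function)|S0], S) :- !,
    funct_id(F, S0, S).
def_folded(unknown, _, _).            % hypothetical default case

% Folding stopped at the cut: only [id(pure)] is matched in the head.
def_blocked(pure_funct(F), [id(pure)|S0], S) :-
    !, S0 = [id(function)|S1],
    funct_id(F, S1, S).
def_blocked(unknown, _, _).           % hypothetical default case

% Input "pure procedure f":
% ?- def_folded(D, [id(pure),id(procedure),id(f)], _).
%        the head fails to match, so the default case is taken
% ?- def_blocked(D, [id(pure),id(procedure),id(f)], _).
%        commits at the cut, then fails
```

With the folded version the clause is never selected, so the cut never runs and the default clause answers; with the blocked version the cut commits before the second terminal is checked, which is the behaviour the original grammar rule asked for.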
ok@quintus.uucp (Richard A. O'Keefe) (07/28/88)
In article <4751@csli.STANFORD.EDU> evan@csli.UUCP (Evan Kirshenbaum) writes:
>In article <196@quintus.UUCP> ok@quintus () writes:
>>-- the translator did not use the 'C'/3 expansion, but tried to do
>>   the terminal matching in the translator, which meant that code with
>>   cuts and ``meta-logical'' operations did not work as expected
>>   (still a problem in the 3rd edition);
>A couple of years ago, I was working on a project which involved a
>lot of dcg's.
...
>I figured there had to be a middle ground. I wrote my own translator
>which runs roughly as follows (this is from memory):
...
> 'dcg body'([],true,S,S):-!.
> 'dcg body'([T|Ts],B,[T|S0],S):-!,
>     'dcg body'(Ts,B,S0,S).
> 'dcg body'(!,(!,S0=S),S0,S):-!.
...
>This allows the terminal folding into the head,
>but treats cut (and anything which can contain cut) as explicitly
>blocking such folding.
...
>This scheme has worked quite well. I'm kind of curious as to whether
>there's a good reason not to do it this way.

When compiled with the current Quintus Prolog compiler,

    p(..., S0, S) :- 'C'(S0, X, S1), ...

and

    p(..., [X|S1], S) :- ...

run at the same speed. I do not know whether this was the case in DEC-10 Prolog. At any rate, moving the terminals into the head should not be necessary for efficiency.

Kirshenbaum's problem wasn't really with the translation as such, but with what the debugger showed him. Speaking for myself, I _like_ seeing the calls to 'C'/3, and wish I could set a spy-point on it. There is a small set of built-in operations (such as ','/2) which the debugger does not display: if the debugger had let Kirshenbaum say "don't show me 'C'/3" he would have got pretty much the behaviour that he wanted.

The trouble with Kirshenbaum's translator is that cut is not the only thing which can go wrong. Consider a stupid example:

    p --> {var(X)}, [X].

His translator will turn that into

    p(S0, S) :- var(X), S0=[X|S].

which is correct. But now suppose it is

    p --> var(X), [X].
    var(X) --> {var(X)}.

which his translator will turn into

    p([X|S0], S) :- var(X, S0, S).
    var(X, S0, S) :- var(X), S0 = S.

This is not a correct translation. It is only safe to move the unifications back over pure predicates (and pure nonterminals).

I think that the grammar rule translator is not the place for optimisations like unfolding 'C'/3 and pushing it back into the head, but that transformations like that belong in some other tool which can apply them to all sorts of clauses, not just former grammar rules. NU Prolog :- pure declarations would be important for such a tool.
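
The effect of moving a terminal back over an impure nonterminal can be checked with a few clauses. In this sketch the pushed-back clause is copied from the form quoted in this post (it was not produced by running either translator), and the names p_c/2, p_head/2 and var_nt/3 are hypothetical renamings chosen so that both versions can be loaded together and so that the nonterminal is not confused with the built-in var/1.

```prolog
% 'C'/3 as in the posted translator; omit it if your system supplies it.
'C'([T|S], T, S).

% var(X) --> {var(X)}.   renamed var_nt/3 for this sketch
var_nt(X, S0, S) :- var(X), S0 = S.

% p --> var(X), [X].   with the terminal kept in place as a 'C'/3 goal
p_c(S0, S) :- var_nt(X, S0, S1), 'C'(S1, X, S).

% The same rule with the terminal pushed back into the head.
p_head([X|S0], S) :- var_nt(X, S0, S).

% ?- p_c([a], R).       succeeds with R = []: var(X) runs before X is bound
% ?- p_head([a], R).    fails: head unification binds X = a before var(X)
```

The two clauses differ only in when X becomes bound, which is exactly the kind of difference a meta-logical test such as var/1 can observe.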
lee@mulga.oz (Lee Naish) (07/29/88)
In article <202@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>When compiled with the current Quintus Prolog compiler,
>    p(..., S0, S) :- 'C'(S0, X, S1), ...
>and
>    p(..., [X|S1], S) :- ...
>run at the same speed.

That is because (correct me if I am wrong) Quintus currently indexes only on the top level functor of the first argument. This has several consequences:

1) there is no point in pushing back 'C'
2) the most naturally written grammars are not efficient
3) grammars are rewritten so that simple indexing is effective
       p --> "+"...
       p --> "-"...
   is rewritten
       p --> [X], p1(X).
       p1(0'+) --> ...
       p1(0'-) --> ...
4) it is best to put S0 (and S) after other arguments

In NU-Prolog 'C' is pushed back (when it's definitely safe to do so) and it is possible to do indexing on subterms of any argument:

    ?- p(..., [X|S1], S) when X.

will cause indexing on X (it also delays calls to p until X is instantiated, so the grammar cannot be used as a generator). It would seem like a good idea to do that kind of indexing on DCG rules by default.

Anyway, the fact that most Prolog systems currently don't have good indexing and some people write INELEGANT grammars (:-) is not a good reason for a standard DCG translator to convert decent grammars into code which decent compilers can't take advantage of, though, if you can understand this sentence without backtracking, you probably don't need the advantage of simpler grammars so you can write grammars which don't backtrack using Quintus Prolog DCGs.

lee
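
The rewriting described in point 3 can be sketched with a complete, if tiny, grammar. The nonterminal names (sign, sign_ix, sign_ix1) and the result atoms plus and minus are made up for this example, and it assumes a system that translates --> rules on loading and provides phrase/2. Whether the second form actually runs faster depends on the indexing the system performs; nothing here measures that.

```prolog
% Natural form: the distinguishing terminal sits inside the list argument,
% out of reach of first-argument indexing.
sign(plus)  --> [0'+].
sign(minus) --> [0'-].

% Rewritten form: read one character, then dispatch on it as the first
% argument of an auxiliary nonterminal.
sign_ix(Sign) --> [C], sign_ix1(C, Sign).

sign_ix1(0'+, plus)  --> [].
sign_ix1(0'-, minus) --> [].

% ?- atom_codes('-', Cs), phrase(sign_ix(S), Cs).    S = minus
% ?- atom_codes('+', Cs), phrase(sign(S), Cs).       S = plus
```

Both versions accept the same strings; the rewritten one simply moves the choice point to a place where a first-argument index can resolve it.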
ok@quintus.uucp (Richard A. O'Keefe) (07/30/88)
In article <2924@mulga.oz> lee@mulga.UUCP (Lee Naish) writes:
>In article <202@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>>When compiled with the current Quintus Prolog compiler,
>>    p(..., S0, S) :- 'C'(S0, X, S1), ...
>>and
>>    p(..., [X|S1], S) :- ...
>>run at the same speed.
>
>That is because (correct me if I am wrong) Quintus currently indexes
>only on the top level functor of the first argument.

Lee Naish is saying
1) Quintus Prolog currently indexes on the principal functor of the first argument (this is true)
2) My statement above is true (which is so)
3) 1) explains 2). This is absolutely wrong.

The reason that the two fragments I sketched run at the same speed is that they turn into identical code, and will still turn into identical code if/when better indexing is provided. This applies to *everything* that uses 'C'/3, not just things that happen to use it in the head of a grammar rule. That was my point: the grammar rule translator should not know how smart the compiler is; neat hacks for speeding things up belong in optimising tools that apply to *every* sort of code.
jonasn@ttds.UUCP (Jonas Nygren) (08/18/88)
In article <202@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>In article <4751@csli.STANFORD.EDU> evan@csli.UUCP (Evan Kirshenbaum) writes:
>When compiled with the current Quintus Prolog compiler,
>    p(..., S0, S) :- 'C'(S0, X, S1), ...
>and
>    p(..., [X|S1], S) :- ...
>run at the same speed. I do not know whether this was the case in
>DEC-10 Prolog. At any rate, moving the terminals into the head should
>not be necessary for efficiency.
>

I tried Kirshenbaum's translator with a lexical analyzer for C (sans preprocessor) and compared it with the AAIS DCG translator, which uses 'C'/3. The times for breaking sieve.c into tokens were as follows:

    AAIS DCG                    17.0 s
    Kirshenbaum's translator    10.6 s

which amounts to a 38% speedup, which seems worthwhile to me.

Jonas Nygren