[comp.lang.prolog] PROLOG DIGEST V6 #18

restivo@POLYA.STANFORD.EDU (Chuck Restivo) (04/19/88)

Date: Fri  8 Apr 1988 19:29-PDT
>From: Chuck Restivo (The Moderator) <PROLOG-REQUEST@SUSHI.STANFORD.EDU>
Reply-to: PROLOG@SUSHI.STANFORD.EDU
US-Mail: P.O. Box 4584, Stanford CA  94305
Subject: PROLOG Digest   V6 #18
To: PROLOG@SUSHI.STANFORD.EDU


PROLOG Digest           Thursday, 14 Apr 1988      Volume 6 : Issue 18

Today's Topics:
                Implementation - Grammar Rules & Yacc
----------------------------------------------------------------------

Date: 25 Feb 88 04:53:57 GMT
>From: ed298-ak@violet.Berkeley.EDU  (Edouard Lagache)
Subject: Re: Prolog Grammar Rules (a question on interpreters)

I greatly admired Richard O'Keefe's use of grammer rules, and I have
almost convinced myself that I understand what is going on.  However,
my PROLOG interpreter doesn't understand - it crashes!  Now my
interpreter is very good at crashing so that doesn't mean much (yes I
know, I need a new interpreter, computer, production rule system
..........), but I was curious, has anyone else had problems running
Richard O'Keefe's examples.  I have a feeling that a number of PROLOG
systems may have taken Clocksin and Mellish too seriously as the
definitive characterization of the language.

-- Edouard Lagache

------------------------------

Date: 19 Feb 88 10:47:13 GMT
>From: quintus!ok@sun.com  (Richard A. O'Keefe)
Subject: Re: Prolog Grammar Rules

In article <489@ashton.UUCP>, dwiggins@ashtate (Don Dwiggins) writes:
> In the translation from DCG rules to clauses, O'Keefe has terminals being
> translated to goals of the form "connects(Terminal, Pos1, Pos2)", ...
> ... Actually, this predicate can be
> "folded into" the translator, so that no calls to connects/3 remain in the
> translated clauses (this is the translation presented by Clocksin & Mellish).

This is true, but you have to be *extremely* careful, and I'm afraid the
version in Clocksin & Mellish isn't.  The trouble is cuts.

Consider
        p(1) --> !, [a].
        p(_) --> [].
What happens if you call
        | ?- p(1, [], []).
The answer, in DEC-10 Prolog, C Prolog, Quintus Prolog, and some others,
is that, as you would expect by analogy with ordinary Prolog rules, it
fails.  If you use the Clocksin & Mellish translator, it succeeds.
To use terminology I introduced a while back:  the Clocksin & Mellish
translator will turn a steadfast predicate into one which is not steadfast.

This is the basic problem with macro-expansion in Prolog: you must take
care not to push bindings back over side-effective operations.

Don Dwiggins also suggests that it would sometimes be useful to put
the list/state arguments first so that they can be indexed on.

Recall that many Prolog implementations (DEC-10 Prolog included) only
index on principal functors, and that in any case indexing doesn't buy
you much if you have clauses with variables there as well.  Let's look
at an example from Michael McCord's parser.

        topic(Type, Topsubj, hold(Top), _, _, Qaux, [Top|Mods], Mods) -->
                (   nounphr(Top)
                ;   there(Top)
                ),
                topic1(Type, Topsubj, Top, Qaux).
        topic(Type, Topsubj, nil, C, X, Qaux, [Top|Mods], Mods) -->
                adverbial(C, X, Top),
                topic1(Type, Topsubj, Top, Qaux).
        topic(q, f, nil, _, _, pre(V),
              [syn([yesno],applyto(Y),yesno(Y),[])|Mods], Mods) -->
                [ V ],
                { finiteaux(V) }.

The list/state arguments for these rules would be
        S0, S
        S0, S
        [V|S1], S
respectively.  Indexing really isn't going to pay off much here.
This seems to be typical of natural language parsers: here's a
nonterminal from a grammar for (a fragment of) Maaori:

        after_interjection(-, -) --> !.
        after_interjection(koia, Mod) -->
                [ Mod ],
                { positional_particle(Mod) },
                !.
        after_interjection(_, anoo) -->
                [ anoo ].

Here the list/state arguments would be
        S0, S   { note that S0=S must be postponed to AFTER the cut! }
        [Mod|S1], S
        [anoo|S1], S
It happens that 'anoo' is not a positional particle, but the indexer
doesn't know that...

Since early arguments often carry agreement information, there is some
hope of indexing off them.  The conclusion is that wherever we put the
list/state arguments, we can't expect much help from indexing in
parsers for natural langauges, so we might as well keep those arguments
out of harm's way.

Now programming languages are a different story:  they tend to be rather
over-endowed with keywords.  Suppose you want to recognise statements in
C, and would like to exploit indexing.  You might start with this (where
I've suppressed irrelevant detail and a lot of rules):

        statement --> ['{'], rest_of_block.
        statement --> [if], rest_of_if.
        statement --> [return], rest_of_return.
        statement --> [goto], rest_of_goto.
        statement --> [id(X),:], statement.
        statement --> expression, [';'].
        statement --> [';'].

This isn't going to be indexed in most Prolog implementations.
Note that the indexer doesn't know what an expression can start with.
The list is rather long.  A program which computed FIRST sets of
definite clause grammars would be handy here...)  What can we do?
We can read ahead one token.

        statement --> expression, [;], !.
        statement --> [First], statement(First).

        statement('{')    --> rest_of_block.
        statement(if)     --> rest_of_if.
        statement(return) --> rest_of_return.
        statement(goto)   --> rest_of_goto.
        statement(id(X))  --> [:], statement.
        statement(';')    --> [].

Note that if you are parsing expressions, the cleanest way of handling
operator precedence involves consulting a table of operators, so once
again there really isn't anything to index on.

Given that there is a general convention in Prolog that "extra"
arguments are added on the right (for example,
        maplist(p(1), [a,b], [X,Y])
will call p(1,a,X) and then p(1,b,Y)) there doesn't seem to be
any compelling reason for grammar rules to be different.

------------------------------

Date: 18 Feb 88 01:00:46 GMT
>From: quintus!ok@unix.sri.com  (Richard A. O'Keefe)
Subject: Yacc in Prolog, Parsing and Dali

In article <3833@watvlsi.waterloo.edu>,
mccool@watvlsi.waterloo.edu (Michael McCool) writes:
> Does anyone know if there's anything like YACC written for Prolog?
There are several largish grammar-handling systems written in/for Prolog,
but all the ones I know of are intended for processing natural languages.
Since the "native" method of parsing in Prolog is recursive descent, the
techniques developed for LL(k) parser generators should be relevant.
I would be interested to hear of any such programs too.

>       My wish list is:
>       2) can accept a token-production function instead of a list
>               this is to save memory.  WHY does everybody assume
>               that the ENTIRE token list is in memory???
The underlying formalism assumes no such thing, and provided you do
not use explicit lists of terminals in your rules, you should be able
to use ordinary grammar rules without ever having a list.  For
example, suppose you have a table
        token(Index, TheToken)
where Index is an integer ranging from 1 to some N with no gaps.
Then you could define

        token(Token, I0, I) :-
                token(I0, Token),
                I is I0+1.

and you could write grammar rules like

        if_statement -->
                token(if),      % would normally be [if]
                expression,
                token(then),    % would normally be [then]
                statement,
                (   token(else) ->      % " " " [else]
                    statement
                |   []          % this doesn't imply lists either
                ).

This particular version would of course cost a LOT more memory than
a list, but an implicit representation might well cost very little
space.  So what?  If you are building any sort of parse tree (or
pushing attributes up and down even a virtual tree), the token list
is unlikely to be the major space consumer.

Historically, the main use of Prolog grammar rules has been for
processing natural language statements and Prolog clauses, both
of which tend to be short.  (Counting "id(" as a single token, the
clause above would have only 23 tokens in its list.)  The availability
of the tokens in SOME sort of data structure has a major beneficial
effect: it makes backtracking easy and cheap.  For example, DEC-10
Prolog syntax requires unbounded lookahead.

It's interesting that one of the nicest Algol compilers it has ever
been my pleasure to use (the Burroughs B6700 Algol compiler) built
an array of tokens.  This enabled them to do things like trying to
compile the subscripts of an array backwards, and then doing it in
the usual order if that didn't work.  {It was more economical to
keep a sequence of tokens and "backtrack" over that than it would
have been to build a parse tree and "backtrack" over that.}

>       3) has a simple way to handle syntax errors -->
>               a way to resynch, e.g. skip tokens until you see
>               a token.  This should be done in a way similar to
>               YACC's "error ;" so I don't have to fool around with
>               writing tail-recursive loops to scan for these kind of
>               constructs.
I wouldn't call YACC's method "simple": a combination of falling back
and skipping "until 3 tokens have been correctly parsed", with a special
hack for stopping early.  Even the System V Programmer's Guide calls it
"crude".  I've never managed to get it to do anything more complicated
than skip to the next semicolon.  Does anyone know what the BSD EYACC
does and how to use it?  The promise of better error recovery in YACC
has been tantalising me since 4.1BSD!

If anyone is interested in adding some sort of error-handling method
to Prolog-coded parsers, I heartily recommend chapter 6 of
        Syntax of Programming Languages -- Theory and Practice,
        by Roland C. Backhouse
        Prentice-Hall, 1979
        ISBN 0-13-879999-7
        UK 12.95 pounds
There are more recent papers by him.

There is a conflict between Prolog style and the demands of error
repair:  in order to do error repair you need to KNOW that there is
an error, but failure in a Prolog grammar rule usually means no more
than "oops, wrong guess, try again".  (This is one of the reasons why
read/1 gives such poor diagnostics:  while there are some local
errors which *must* be wrong--such as two variables with no
intervening operator--many errors involving operators cannot be
locally detected.)  It would be interesting to have a tool which
checked whether the context-free skeleton of a collection of Prolog
grammar rules defined an LL(1) language--program 3.2 in chapter 3 of
the reference above would be a good place to start.

>       4) precedence/associativity parser
>               for operations, I would like to give only one rule like
>                       expr(binary(Op,E1,E2)) -->
>                               $left,expr(E1),['+'],expr(E2),{Op = plus} |
>                               $left,expr(E1),['-'],expr(E2),{Op = minus} |
>                               $left,expr(E1),['*'],expr(E2),{Op = times} |
>                               $left,expr(E1),['/'],expr(E2),{Op = divide} |...
>               with "$left" indicating left associativity, and the order
>               indicating precedence.
I do hope that you don't put the vertical bars at the right like that;
they are very hard to see there.

The usual way of doing this in Prolog is to have a table such as

        operator(Token, Priority, Type, Op).

e.g.
        operator(+, 3, left,  plus).
        operator(-, 3, left,  minus).
        operator(*, 2, left,  times).
        operator(/, 2, left,  divide).
        operator(^, 1, right, power).

then one has rules such as

        expr(Expr) -->
                expr(9, Expr).

        expr(MaxLevel, Expr) -->
                primary(Expr0),
                rest_expr(MaxLevel, Expr0, Expr).

        rest_expr(MaxLevel, Expr0, Expr) -->
                [ Token ],
                { operator(Token, Priority, Type, Op) },
                { Priority < MaxLevel },
                !,
                { argument_priority(Type, Priority, ArgumentPriority) },
                expr(ArgumentPriority, Expr1),
                rest_expr(MaxLevel, binary(Op,Expr0,Expr1), Expr).
        rest_expr(_, Expr, Expr) -->
                /* no next token, not an operator, or wrong priority */
                [].

        argument_priority(left, X, X).
        argument_priority(right, X, Y) :- Y is X+1.

Testing this (with primary//1 recognising integers):
        | ?- expr(X, [2,^,3,^,4,*,5,*,6], []).
        X = binary(times,binary(times,binary(power,2,binary(power,3,4)),5),6)


The really nice thing about this is that it separates the operators
themselves out, making it crystal clear what they have in common
(the *shape* of the rule is the same for all the operators) and
where they are different (what they look like and their priority).
It also makes it easier to add new operators, even dynamically.
The fact that YACC forces you to write a separate rule for each
priority level (you can fold operators of the same priority together)
is one of the reasons why I don't use it.

------------------------------

End of PROLOG Digest
********************