[comp.lang.prolog] grammar for prolog

henks@nikhefk.UUCP (Henk Schouten) (10/03/89)

Hello netlanders,

I am looking for a description of the grammar of the Prolog language to be
used in building a syntax checker with Lex/Yacc. Since building the syntax
checker is a learning project for students any pointers to existing syntax
checkers for Prolog won't help.





+-----------------------------------------------------------------------------+
| Henk Schouten			Software Expertise Center                     |
|				Haagse Hogeschool - Intersector Informatica   |
|				Louis Couperusplein 1-19                      |
| henks@hhinsi@hp4nl		2419 AP Den Haag - The Netherlands            |
| henks@nikhef@hp4nl		tel: (31) 70 618419   fax: (31) 70 618599     |
+-----------------------------------------------------------------------------+

ok@cs.mu.oz.au (Richard O'Keefe) (10/04/89)

In article <585@nikhefk.UUCP>, henks@nikhefk.UUCP (Henk Schouten) writes:
> I am looking for a description of the grammar of the Prolog language to be
> used in building a syntax checker with Lex/Yacc. Since building the syntax
> checker is a learning project for students any pointers to existing syntax
> checkers for Prolog won't help.

(A) There is a public-domain tokeniser for Prolog and a public-domain
    parser for Prolog.  I wrote the tokeniser and adapted the parser,
    and posted both to the net in '84.  Both are Prolog source code.
    I know you say "any pointers to existing .. won't help", but the
    tokeniser and parser are about as concise a specification of Prolog
    syntax as you will get.

(B) However, Prolog is an extensible language.  The lexical structure
    is not extensible, so you could use Lex.  I have a table-driven
    tokeniser in C (I've been using it in an editor for years, and it
    was donated to Stony Brook Prolog), so I know you can do that part
    statically as Lex requires.  The extension mechanism of Prolog is
    that you can declare new operators, which can be prefix, infix,
    postfix, or any mix of them.  For example, after doing
	:- op(100, fx, a).
    the expression "a X" is legal, "a a X" is illegal, "a(X)" and "a = X"
    are legal and would have been without the declaration.
    Had the declaration been
	:- op(800, fy, a).
    the expression "a a X" would have been legal and "a = X" illegal.
    After the declarations
	:- op(100, fx, a), op(100, xf, a), op(100, xfx, a)
    the expressions "a X", "X a", and "X a X" are all legal.
    It is easy to find small sets of operator declarations which make
    Prolog not LR(k) for any fixed k, still less LR(1).

    Some extensible languages (such as Algol 68) can be hacked by having
    a fixed set of operator meta-tokens, PREFIX_1, ... INFIX_9 and the
    like, and having the tokeniser return those.  However, Prolog has
    1200 precedence levels, so that would be rather a strain for Yacc.

(C) The BSI/ISO committees have produced several different grammars for
    Prolog.  I believe the current version to be the one in a document
    ISO/IEC JTC1 SC22 WG17 N40, dated July 1989.  This one is pretty
    straightforward.  (Unlike the version which I criticsed so strongly
    in this newsgroup, the current version makes no attempt to describe
    the syntax of Prolog code.  It is frankly a syntax for TERMS, and as
    such nearly all of my objections no longer apply.)  This document
    should be available from NPL.

(D) Prolog is much more like Lisp than it is like Pascal.  It is
    possible to define a (non-LR(1)) grammar for Prolog *data*, but
    every Prolog procedure is made of Prolog data, and "telling the
    birds from the flowers" is not always possible.  Consider the fact
    that many Prolog systems have a macro-expansion predicate called
    term_expansion/2 which users can define at run-time.  For example,
    after defining

	:- op(10, fx, a).
	:- op(10, fx, an).

	term_expansion(X is a Y,  Fact) :- Fact =.. [Y,X].
	term_expansion(X is an Y, Fact) :- Fact =.. [Y,X].

    the clauses

	clyde is an elephant.
	lr(1) is a restriction.	

    behave to a Prolog system exactly as if they had been written

	elephant(clyde).
	restriction(lr(1)).

    For semantic analysis, it is the term which results from such
    user-defined macro-processing (this includes grammar rule expansion)
    that matters.  There can be no difference in meaning between

	X is (Y+1)/2

    and

	is(X, /(+(Y,1),2))

    because they are the SAME term, even though there is a syntactic
    difference.

    There are Prolog systems which do not offer term_expansion/2, and they
    can be very useful if they are otherwise excellent, but it is a
    definite restriction to leave this facility out.

(E) The bottom line is that Yacc is not able to handle the full syntax
    of Prolog.  You would have to restrict it by leaving out user-defined
    operators and user-defined macro-processing.  The result would still
    be a useful language.  But it would not be a very interesting one to
    use as a class project, because there is very little that a syntactic
    analysis can reject.  Consider:

	p :- 2.

    does not make sense as a Prolog rule, because 2 is not something that
    can be called.  (Although some Prolog systems use precisely this
    syntax to tie basic operations to special instructions.)  However,

	q :- nonvar(( p :- 2 )).

    is perfectly sensible.  The Prolog parser in my editor is, excluding
    the tokeniser, about 200 lines of C.  About a third of that is
    disambiguating operators.  That really doesn't sound as though it
    would make an interesting Yacc grammar.

he@stollco.UUCP (Helge Oldach) (10/10/89)

In article <2293@munnari.oz.au>, ok@cs.mu.oz.au (Richard O'Keefe) writes:
> In article <585@nikhefk.UUCP>, henks@nikhefk.UUCP (Henk Schouten) writes:
> > I am looking for a description of the grammar of the Prolog language to be
> > used in building a syntax checker with Lex/Yacc. 
> [...]
> (E) The bottom line is that Yacc is not able to handle the full syntax
>     of Prolog.  You would have to restrict it by leaving out user-defined
>     operators and user-defined macro-processing.  The result would still
>     be a useful language.  But it would not be a very interesting one to
>     use as a class project, because there is very little that a syntactic
>     analysis can reject.

I agree. My solution (some years ago) was to write an operator precedence
parser for Prolog. The syntax is very straightforward: you just have
operands, operators and brackets. And (this was my main motivation):
there was no harm implementing extensible sets of operators.

>     The Prolog parser in my editor is, excluding
>     the tokeniser, about 200 lines of C.  About a third of that is
>     disambiguating operators.  That really doesn't sound as though it
>     would make an interesting Yacc grammar.

I agree, again. My parser was (I hope, it still *is* :-)) very small, too.