henks@nikhefk.UUCP (Henk Schouten) (10/03/89)
Hello netlanders, I am looking for a description of the grammar of the Prolog language to be used in building a syntax checker with Lex/Yacc. Since building the syntax checker is a learning project for students any pointers to existing syntax checkers for Prolog won't help. +-----------------------------------------------------------------------------+ | Henk Schouten Software Expertise Center | | Haagse Hogeschool - Intersector Informatica | | Louis Couperusplein 1-19 | | henks@hhinsi@hp4nl 2419 AP Den Haag - The Netherlands | | henks@nikhef@hp4nl tel: (31) 70 618419 fax: (31) 70 618599 | +-----------------------------------------------------------------------------+
ok@cs.mu.oz.au (Richard O'Keefe) (10/04/89)
In article <585@nikhefk.UUCP>, henks@nikhefk.UUCP (Henk Schouten) writes: > I am looking for a description of the grammar of the Prolog language to be > used in building a syntax checker with Lex/Yacc. Since building the syntax > checker is a learning project for students any pointers to existing syntax > checkers for Prolog won't help. (A) There is a public-domain tokeniser for Prolog and a public-domain parser for Prolog. I wrote the tokeniser and adapted the parser, and posted both to the net in '84. Both are Prolog source code. I know you say "any pointers to existing .. won't help", but the tokeniser and parser are about as concise a specification of Prolog syntax as you will get. (B) However, Prolog is an extensible language. The lexical structure is not extensible, so you could use Lex. I have a table-driven tokeniser in C (I've been using it in an editor for years, and it was donated to Stony Brook Prolog), so I know you can do that part statically as Lex requires. The extension mechanism of Prolog is that you can declare new operators, which can be prefix, infix, postfix, or any mix of them. For example, after doing :- op(100, fx, a). the expression "a X" is legal, "a a X" is illegal, "a(X)" and "a = X" are legal and would have been without the declaration. Had the declaration been :- op(800, fy, a). the expression "a a X" would have been legal and "a = X" illegal. After the declarations :- op(100, fx, a), op(100, xf, a), op(100, xfx, a) the expressions "a X", "X a", and "X a X" are all legal. It is easy to find small sets of operator declarations which make Prolog not LR(k) for any fixed k, still less LR(1). Some extensible languages (such as Algol 68) can be hacked by having a fixed set of operator meta-tokens, PREFIX_1, ... INFIX_9 and the like, and having the tokeniser return those. However, Prolog has 1200 precedence levels, so that would be rather a strain for Yacc. (C) The BSI/ISO committees have produced several different grammars for Prolog. I believe the current version to be the one in a document ISO/IEC JTC1 SC22 WG17 N40, dated July 1989. This one is pretty straightforward. (Unlike the version which I criticsed so strongly in this newsgroup, the current version makes no attempt to describe the syntax of Prolog code. It is frankly a syntax for TERMS, and as such nearly all of my objections no longer apply.) This document should be available from NPL. (D) Prolog is much more like Lisp than it is like Pascal. It is possible to define a (non-LR(1)) grammar for Prolog *data*, but every Prolog procedure is made of Prolog data, and "telling the birds from the flowers" is not always possible. Consider the fact that many Prolog systems have a macro-expansion predicate called term_expansion/2 which users can define at run-time. For example, after defining :- op(10, fx, a). :- op(10, fx, an). term_expansion(X is a Y, Fact) :- Fact =.. [Y,X]. term_expansion(X is an Y, Fact) :- Fact =.. [Y,X]. the clauses clyde is an elephant. lr(1) is a restriction. behave to a Prolog system exactly as if they had been written elephant(clyde). restriction(lr(1)). For semantic analysis, it is the term which results from such user-defined macro-processing (this includes grammar rule expansion) that matters. There can be no difference in meaning between X is (Y+1)/2 and is(X, /(+(Y,1),2)) because they are the SAME term, even though there is a syntactic difference. There are Prolog systems which do not offer term_expansion/2, and they can be very useful if they are otherwise excellent, but it is a definite restriction to leave this facility out. (E) The bottom line is that Yacc is not able to handle the full syntax of Prolog. You would have to restrict it by leaving out user-defined operators and user-defined macro-processing. The result would still be a useful language. But it would not be a very interesting one to use as a class project, because there is very little that a syntactic analysis can reject. Consider: p :- 2. does not make sense as a Prolog rule, because 2 is not something that can be called. (Although some Prolog systems use precisely this syntax to tie basic operations to special instructions.) However, q :- nonvar(( p :- 2 )). is perfectly sensible. The Prolog parser in my editor is, excluding the tokeniser, about 200 lines of C. About a third of that is disambiguating operators. That really doesn't sound as though it would make an interesting Yacc grammar.
he@stollco.UUCP (Helge Oldach) (10/10/89)
In article <2293@munnari.oz.au>, ok@cs.mu.oz.au (Richard O'Keefe) writes: > In article <585@nikhefk.UUCP>, henks@nikhefk.UUCP (Henk Schouten) writes: > > I am looking for a description of the grammar of the Prolog language to be > > used in building a syntax checker with Lex/Yacc. > [...] > (E) The bottom line is that Yacc is not able to handle the full syntax > of Prolog. You would have to restrict it by leaving out user-defined > operators and user-defined macro-processing. The result would still > be a useful language. But it would not be a very interesting one to > use as a class project, because there is very little that a syntactic > analysis can reject. I agree. My solution (some years ago) was to write an operator precedence parser for Prolog. The syntax is very straightforward: you just have operands, operators and brackets. And (this was my main motivation): there was no harm implementing extensible sets of operators. > The Prolog parser in my editor is, excluding > the tokeniser, about 200 lines of C. About a third of that is > disambiguating operators. That really doesn't sound as though it > would make an interesting Yacc grammar. I agree, again. My parser was (I hope, it still *is* :-)) very small, too.