pgl@cup.portal.com (Peter G Ludemann) (12/04/89)
A Prolog standard must be unambiguous in three respects: - syntax - semantics of the syntax - semantics of the built-in predicates ("BIPs"). I wish to discuss some aspects of the syntax. Unambiguous syntax is much easier to define than unambiguous semantics, so once that is resolved, we can move on to more difficult items. In an earlier article, I said that I had tried to make an LR(k) grammar for Prolog. Richard O'Keefe said that no programming language is LR(k). Of course, Algol-60, C, Pascal, etc. are not LR(k) or even context-free. Like all computer languages, they are context-sensitive. But they can easily be made context-free by adding a bit of intelligence to the lexical analyzer. And the same can be done for Prolog. [No sane person would attempt to make an LR(k) parser for Prolog -- the job is much better done with operator precedence (which are a subset of LR(k) languages). But I don't trust myself to check out a grammar for ambiguity by hand -- I like to have an algorithm check it out. LR(k) is the most general parser generator I have. And I can easily construct a subset of Prolog grammar, say with precedence 1200, 1000, 700 and 500 operators, and use that to verify the non-ambiguity.] If I were starting a new Prolog implementation, I would very much want a precise, formal definition of Prolog syntax. So far, I have not seen one -- what I have seen are all ambiguous with some words added which still don't clear up all the ambiguities. [Yes, Richard, I have implemented a Prolog parser or two -- in Prolog and in C. It's not hard -- but it is hard to verify exactly what grammar the parser accepts.] In particular, the following potentially ambiguous situations are not mentioned in N40. Assume that `p', `l', `r' and `s' are prefix (fy), left-to-right (yfx), right-to-left (xfy) and suffix operators (yf), all of the same priority (less than 1000). Here are the interesting cases, not covered by any of the grammars which I have seen. [I have tried these out on a variety of Prologs and they have given differing answers, including "syntax error" for some of them]: p a l b p a r b a l b s a r b s p a s a l b r c a r b l c a l p b a l b s I don't see an intuitive parse for any of these and I therefore propose that all of them be forbidden in the Standard. [For example, if Edinburgh had prefix `-' and infix `-' at the same priority, the term `- a - b' would parse in a non-obvious way, as '-'(a-b).] - Peter Ludemann pgl@cup.portal.com ...!sun!portal!cup!pgl --- my opinions are my sole responsibility, of course ---
micha@ecrcvax.UUCP (Micha Meier) (12/08/89)
In article <24702@cup.portal.com> pgl@cup.portal.com (Peter G Ludemann) writes: >In an earlier article, I said that I had tried to make an LR(k) >grammar for Prolog. Richard O'Keefe said that no programming >language is LR(k). Of course, Algol-60, C, Pascal, etc. are >not LR(k) or even context-free. Like all computer languages, >they are context-sensitive. But they can easily be made >context-free by adding a bit of intelligence to the lexical >analyzer. And the same can be done for Prolog. I'm afraid this is not the case. No matter how clever your lexical analyzer is, the parser has to cope with terms like prefix infix infix infix ... which cannot be resolved in k steps. In the following 'p' means prefix, 'i' infix, upper case letters mean that the corresponding symbol is parsed as functor, lower case means that it is parsed as an atom, ! means the main functor of the whole term. 1) p. 2) P! i. 3) p I! i. 4) P! i I i. or P i I! i. 5) p I! i I i. or p I i I! i. 6) P! i I i I i. or P i I! i I i. or P i I i I! i. ... As you can see, on even lines the odd symbols represent functors and the even ones atoms, on odd lines it is the opposite. If there are several alternatives in one line, they can be resolved using the precedence and associativity, but until your parser knows how many symbols in sequence there are, it cannot start to apply these rules. This is an instance of the general problem that the associativity and precedence rules allow you to specify which functor binds more than the other one, but it does not tell you which symbol is to be taken as functor. This problem can be resolved e.g. by imposing that an atom which has been declared as a prefix operator always keeps its precedence, even if used as atom, like in Quintus. This of course makes the grammar less clear to normal users but seems to be a good compromise. Another problem concerns operators which are both infix and postfix. When you immagine a sequence of such operators, there is 2^(n-1) of possible interpretations, e.g. for 4: a Ip! ip Ip ip a Ip ip Ip! ip a Ip! ip iP iP a Ip ip iP iP! a iP Ip! ip iP a iP Ip ip iP! a iP iP Ip! ip a iP iP iP iP! (Ip means it is taken as infix, iP as postfix). Here, every symbol, except the first one, can be interpreted either as atom or as functor, although the precedence and associativity can help. Quintus 2.0 and others have the non-documented feature that (a ip ip) is parsed as ip(a, ip) even if the postfix precedence is lower than the infix one: | ?- op(200, yf, a), op(300, xfy, a). yes | ?- display(b a a). a(b,a) In fact, nobody has yet specified whether in this case the precedence applies or whether the term is ambiguous. We have solved this problem in Sepia by forbidding the infix/postfix combination and this seems to be a reasonable restriction. --Micha Meier