[comp.lang.prolog] Ambiguities in WG17's syntax

pgl@cup.portal.com (Peter G Ludemann) (12/04/89)

A Prolog standard must be unambiguous in three respects:
- syntax
- semantics of the syntax
- semantics of the built-in predicates ("BIPs").

I wish to discuss some aspects of the syntax.  Unambiguous
syntax is much easier to define than unambiguous semantics, so
once that is resolved, we can move on to more difficult items.

In an earlier article, I said that I had tried to make an LR(k)
grammar for Prolog.  Richard O'Keefe said that no programming
language is LR(k).  Of course, Algol-60, C, Pascal, etc. are
not LR(k) or even context-free.  Like all computer languages,
they are context-sensitive.  But they can easily be made
context-free by adding a bit of intelligence to the lexical
analyzer.  And the same can be done for Prolog.

  [No sane person would attempt to make an LR(k) parser for
  Prolog -- the job is much better done with operator precedence
  (which are a subset of LR(k) languages).  But I don't trust
  myself to check out a grammar for ambiguity by hand -- I like
  to have an algorithm check it out.  LR(k) is the most general
  parser generator I have.  And I can easily construct a subset
  of Prolog grammar, say with precedence 1200, 1000, 700 and 500
  operators, and use that to verify the non-ambiguity.]

If I were starting a new Prolog implementation, I would very
much want a precise, formal definition of Prolog syntax.  So
far, I have not seen one -- what I have seen are all ambiguous
with some words added which still don't clear up all the
ambiguities.  [Yes, Richard, I have implemented a Prolog parser
or two -- in Prolog and in C.  It's not hard -- but it is hard
to verify exactly what grammar the parser accepts.]

In particular, the following potentially ambiguous situations
are not mentioned in N40.  Assume that `p', `l', `r' and `s'
are prefix (fy), left-to-right (yfx), right-to-left (xfy) and
suffix operators (yf), all of the same priority (less than
1000).  Here are the interesting cases, not covered by any of
the grammars which I have seen.  [I have tried these out on a
variety of Prologs and they have given differing answers,
including "syntax error" for some of them]:

p a l b      p a r b
a l b s      a r b s
p a s
a l b r c    a r b l c
a l p b      a l b s

I don't see an intuitive parse for any of these and I therefore
propose that all of them be forbidden in the Standard.  [For
example, if Edinburgh had prefix `-' and infix `-' at the same
priority, the term `- a - b' would parse in a non-obvious way,
as '-'(a-b).]


- Peter Ludemann      pgl@cup.portal.com
                      ...!sun!portal!cup!pgl

   --- my opinions are my sole responsibility, of course ---

micha@ecrcvax.UUCP (Micha Meier) (12/08/89)

In article <24702@cup.portal.com> pgl@cup.portal.com (Peter G Ludemann) writes:
>In an earlier article, I said that I had tried to make an LR(k)
>grammar for Prolog.  Richard O'Keefe said that no programming
>language is LR(k).  Of course, Algol-60, C, Pascal, etc. are
>not LR(k) or even context-free.  Like all computer languages,
>they are context-sensitive.  But they can easily be made
>context-free by adding a bit of intelligence to the lexical
>analyzer.  And the same can be done for Prolog.

I'm afraid this is not the case. No matter how clever your lexical
analyzer is, the parser has to cope with terms like

	prefix infix infix infix ...

which cannot be resolved in k steps. In the following 'p' means prefix,
'i' infix, upper case letters mean that the corresponding symbol
is parsed as functor, lower case means that it is parsed
as an atom, ! means the main functor of the whole term.

1)	p.
2)	P! i.
3)	p  I! i.
4)	P! i  I  i.		or P  i  I! i.
5)	p  I! i  I  i.		or p  I  i  I! i.
6)	P! i  I  i  I  i.	or P  i  I! i  I  i.	or P  i  I  i  I! i.
	...
As you can see, on even lines the odd symbols represent functors
and the even ones atoms, on odd lines it is the opposite. If there are several
alternatives in one line, they can be resolved using the precedence
and associativity, but until your parser knows how many symbols
in sequence there are, it cannot start to apply these rules.

This is an instance of the general problem that the associativity
and precedence rules allow you to specify which functor
binds more than the other one, but it does not tell you
which symbol is to be taken as functor.

This problem can be resolved e.g. by imposing that an atom which has
been declared as a prefix operator always keeps its precedence,
even if used as atom, like in Quintus. This of course makes
the grammar less clear to normal users but seems to be a good compromise.

Another problem concerns operators which are both infix and postfix.
When you immagine a sequence of such operators, there is 2^(n-1)
of possible interpretations, e.g. for 4:

	a  Ip! ip  Ip  ip
	a  Ip  ip  Ip! ip
	a  Ip! ip  iP  iP
	a  Ip  ip  iP  iP!
	a  iP  Ip! ip  iP
	a  iP  Ip  ip  iP!
	a  iP  iP  Ip! ip
	a  iP  iP  iP  iP!

(Ip means it is taken as infix, iP as postfix).
Here, every symbol, except the first one, can be interpreted either
as atom or as functor, although the precedence and associativity
can help.

Quintus 2.0 and others have the non-documented feature that (a ip ip)
is parsed as ip(a, ip) even if the postfix precedence is lower than
the infix one:
| ?- op(200, yf, a), op(300, xfy, a).

yes
| ?- display(b a a).
a(b,a)
In fact, nobody has yet specified whether in this case the precedence
applies or whether the term is ambiguous.
We have solved this problem in Sepia by forbidding the infix/postfix
combination and this seems to be a reasonable restriction.

--Micha Meier