[comp.lang.prolog] canonical syntax

matsc@sics.se (Mats Carlsson) (12/08/89)

There is a problem with the "Syntax of Terms as Tokens" of Common
Prolog, which complicates the definition of a canonical syntax.

The canonical syntax is normally defined as quoting atoms that need it
and ignoring op declarations, writing all compound terms as
	<functor>(<arg 1>, ..., <arg n>)

The culprit is the set of syntax rules for term(N), which typically looks like

@var{term(N)}     --> @var{op(N,fx)}				(**)
                   |  @var{op(N,fy)}				(**)
                   |  @var{op(N,fx)} @var{subterm(N-1)}
                         @{ except the case @kbd{-} @var{number} @}
                         @{ if subterm starts with a @kbd{(},
                           @var{op} must be followed by a @var{space} @}
                   |  @var{op(N,fy)} @var{subterm(N)}
                         @{ if subterm starts with a @kbd{(},
                           @var{op} must be followed by a @var{space} @}
                   |  @var{subterm(N-1)} @var{op(N,xfx)} @var{subterm(N-1)}
                   |  @var{subterm(N-1)} @var{op(N,xfy)} @var{subterm(N)}
                   |  @var{subterm(N)} @var{op(N,yfx)} @var{subterm(N-1)}
                   |  @var{subterm(N-1)} @var{op(N,xf)}
                   |  @var{subterm(N)} @var{op(N,yf)}

@var{term(1000)}  --> @var{subterm(999)} @kbd{,} @var{subterm(1000)}

@var{term(0)}     --> @var{functor} @kbd{(} @var{arguments} @kbd{)}
                         @{ provided there is no space between
                           the @var{functor} and the @kbd{(} @}
                   |  @kbd{(} @var{subterm(1200)} @kbd{)}
                   |  @kbd{@{} @var{subterm(1200)} @kbd{@}}
                   |  @var{list}
                   |  @var{string}
                   |  @var{constant}
                   |  @var{variable}

@var{constant}          --> @var{atom} | @var{number}

@var{number}            --> @var{integer} | @var{float}

@var{atom}              --> @var{name}
                         @{ where @var{name} is not a prefix operator @}

The two rules marked (**) cause prefix operators used as atoms to have
precedence N, whereas all other atoms have precedence 0.

Another rule, not included above, defines the "context precedence" of
arguments of compound terms as 999.  Consequently, prefix operators
with precedence > 999 must be parenthesized *in the canonical syntax*
when they occur in compound terms, as in

	p((public)).
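
To see what a given system does here, something like the following can
be tried.  It is only a sketch: it assumes the system provides the usual
op/3, writeq/1 and write_canonical/1, and the predicate name show_public
is made up for illustration.  The output is deliberately not stated,
since it depends on the system's writer.

```prolog
% Declare public as a prefix operator with precedence above 999.
% The atom is parenthesized so that the directive reads correctly even
% where public is already a prefix operator.
:- op(1150, fx, (public)).

% show_public/0 is a hypothetical name used only for this illustration.
% The term is entered in parenthesized form, which the rules above accept.
show_public :-
    T = p((public)),
    writeq(T), nl,            % operator-aware output
    write_canonical(T), nl.   % canonical output
```

A writer that follows the rules above prints p((public)) as the
canonical form; one that treats all atoms alike prints p(public).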


Questions:

What is the point of the above distinction between prefix operators and
other atoms?  To simplify the parser?

The canonical syntax would become independent of op declarations if
the two (**) rules and the constraint on @var{atom} not being a prefix
operator were deleted.  Would this measure make it more difficult to
parse or cause any other bad effects?
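
The effect of this change on a writer can be sketched as follows.  This
is a minimal model, not any system's actual writer: the predicate names
canon/3, canon/4, canon_args/3, atom_precedence/4 and prefix_prec/3 are
made up, the prefix operators are passed in explicitly as a list of
prefix_op(Precedence, Name) terms rather than taken from op/3, and lists
and strings get no special treatment.  The regime argument is strict for
the grammar as given above, and relaxed for the grammar with the two (**)
rules and the constraint on atoms deleted.

```prolog
% canon(+Term, +PrefixOps, +Regime)
% Write Term in functional notation, starting at context precedence 1200.
canon(Term, Ops, Regime) :-
    canon(Term, 1200, Ops, Regime).

% canon(+Term, +Context, +PrefixOps, +Regime)
canon(Term, _, _, _) :-
    var(Term), !,
    write(Term).
canon(Term, Context, Ops, Regime) :-
    atom(Term), !,
    atom_precedence(Term, Ops, Regime, P),
    (   P > Context                       % atom does not fit the context
    ->  write('('), writeq(Term), write(')')
    ;   writeq(Term)
    ).
canon(Term, _, _, _) :-
    atomic(Term), !,                      % numbers and other constants
    writeq(Term).
canon(Term, _, Ops, Regime) :-
    Term =.. [F|Args],                    % compound: <functor>(<args>)
    writeq(F), write('('),
    canon_args(Args, Ops, Regime),
    write(')').

% Arguments of a compound term are written at context precedence 999.
canon_args([A], Ops, Regime) :- !,
    canon(A, 999, Ops, Regime).
canon_args([A|As], Ops, Regime) :-
    canon(A, 999, Ops, Regime),
    write(','),
    canon_args(As, Ops, Regime).

% strict:  a prefix operator used as an atom has its operator precedence.
% relaxed: every atom has precedence 0, so the operator list is ignored.
atom_precedence(A, Ops, strict, P) :-
    prefix_prec(A, Ops, P0), !,
    P = P0.
atom_precedence(_, _, _, 0).

% Look up the precedence of a prefix operator in the explicit list.
prefix_prec(A, [prefix_op(P, A)|_], P) :- !.
prefix_prec(A, [_|Ops], P) :-
    prefix_prec(A, Ops, P).
```

With the goals

	?- canon(p((public)), [prefix_op(1150, (public))], strict), nl.
	?- canon(p((public)), [prefix_op(1150, (public))], relaxed), nl.

the first should print p((public)) and the second p(public).  In the
relaxed regime the operator list is never consulted, which is the
independence from op declarations asked about above.
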
--
Mats Carlsson
SICS, PO Box 1263, S-164 28  KISTA, Sweden    Internet: matsc@sics.se
Tel: +46 8 7521543      Ttx: 812 61 54 SICS S      Fax: +46 8 7517230

lee@munnari.oz.au (Lee Naish) (12/09/89)

In article <MATSC.89Dec8125828@vishnu.sics.se> matsc@sics.se (Mats Carlsson) writes:
>prefix operators
>with precedence > 999 must be parenthesized *in the canonical syntax*
>when they occur in compound terms, as in
>
>	p((public)).

In the NU-Prolog parser, parentheses are not needed in this context,
which is certainly an advantage for canonical I/O.  Unfortunately, some
input that "standard" parsers can handle is not accepted by NU-Prolog.
The main reason for the nonstandard parser is to allow prefix
binary operators (for quantifiers).

>The canonical syntax would become independent of op declarations if
>the two (**) rules and the constraint on @var{atom} not being a prefix
>operator were deleted.  Would this measure make it more difficult to
>parse or cause any other bad effects?

Not being an expert in parsing (or Prolog syntax), I don't know
what bad effects there might be (possible ambiguity?).  However, the
grammar modifications suggested by Mats don't correspond to what
NU-Prolog is able to parse.  "not ?- ." causes an error, for example,
even though it is unambiguous.  By adopting Mats' scheme, more terms
that are unambiguous but currently unparsable could be parsed.

	lee