[comp.lang.icon] encompassing formalism

ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (05/09/90)

In article <9004301628.AA03522@sophist.uchicago.edu>, goer@SOPHIST.UCHICAGO.EDU (Richard Goerwitz) writes:
> The other formalism is more process-oriented (hence it is ironic that
> Prolog is the main language which implements it).  You say -
>       S -> NP VP

*process*-oriented?  It's just a declarative constraint on a node in a
constituent structure:  a node labelled S may
dominate two daughters, one labelled NP and one labelled VP, and the
one labelled NP must precede the one labelled VP.  Rules like this can
be used as easily for generation as for parsing.
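
For illustration only (a toy Python sketch, not anything from Prolog or
Icon; the grammar and function names are invented): one and the same
rule table can drive both generation and recognition.

```python
# Toy grammar: the single rule S -> NP VP, plus made-up terminal rules.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "dog"], ["the", "cat"]],
    "VP": [["barks"], ["sleeps"]],
}

def generate(symbol):
    """Yield every terminal string the grammar derives from `symbol`."""
    if symbol not in RULES:              # a terminal symbol derives itself
        yield [symbol]
        return
    for expansion in RULES[symbol]:
        # combine the daughters' derivations left to right
        results = [[]]
        for daughter in expansion:
            results = [r + d for r in results for d in generate(daughter)]
        yield from results

def parses(symbol, words):
    """Same rules, other direction: is `words` derivable from `symbol`?"""
    return words in generate(symbol)
```

The point is that nothing in the rule table itself says "parse" or
"generate"; the direction lives entirely in the procedure that walks it.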

> Particularly interesting for us here is the way Prolog implements this
formalism.  Assuming the Prolog you use implements definite clause
> grammar notation, you can say,

Since there is a *freely* distributable DCG->Prolog rule-at-a-time translator
which has been broadcast over the net, I think it's safe to say that any
Prolog system which _hasn't_ got DCG rules isn't trying.  The really
exciting thing about grammar rules in Prolog is that (if you avoid cuts
and non-logical features of Prolog) you have a non-directional declarative
formalism which can be processed in a variety of ways:  one and the same
set of rules may be loaded directly as Prolog code, or parsed with a left-
corner parser, or given to a chart parser, or (and this _happens_) used
for generation.  In each case one "compiles" rather easily from rules to
Prolog.
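
To make the "compilation" concrete: the standard DCG translation threads
the unconsumed remainder of the input through each nonterminal, so the
rule s --> np, vp becomes the clause s(S0, S) :- np(S0, S1), vp(S1, S).
Here is a hypothetical Python analogue of that threading (the lexical
rules are invented, and failure is signalled with None rather than by
backtracking):

```python
# Each "nonterminal" takes the remaining word list and returns what is
# left after consuming its phrase, or None on failure -- the Python
# counterpart of a DCG difference-list pair (S0, S).

def np(words):
    if words[:2] == ["the", "dog"]:
        return words[2:]
    return None

def vp(words):
    if words[:1] == ["barks"]:
        return words[1:]
    return None

def s(words):
    rest = np(words)
    if rest is not None:
        return vp(rest)
    return None

# A parse of the whole input succeeds when the remainder is empty:
# s(["the", "dog", "barks"]) == []
```

What the real translator buys you, of course, is that the threading is
generated mechanically from the rules rather than written by hand.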

> I don't see any reason that this couldn't be implemented EASILY in Icon.
> And Icon has some neat advantages over Prolog,
> such as very good string handling.

I think Icon is a _wonderful_ language.  But it isn't supposed to be a
declarative language.  Most of the time when I write grammar rules in
Prolog I am using them to _generate_ lists.  Some of the rest of the
time I don't know whether the code I write will generate or parse, and
have no reason to care.  In the case of PATR-II, people are writing
large grammars where one and the same grammar is used for both parsing
and generation, basically by switching control strategies in a kind of
interpreter.

Yes, Icon has very good string handling.  Anyone with substantial
string-handling problems would be crazy not to use Icon if they had
the chance.  But what has that to do with parsing?  I think that the
most important lesson I ever learned about SNOBOL was when I enthused
about it to an anthropologist, who said "it can parse sequences of
characters?  Great!  Can it parse sequences of words?  No?  Then it's
no use to me!"  That is one of (several) respects in which Icon
improves dramatically on SNOBOL:  you _can_ parse a sequence of words
in Icon using the same basic mechanisms that you use for string
scanning.  I imagine that someone writing a parser for English (or
Akkadian!) in Icon would represent a sentence as a list of (pointers
to) dictionary entries, where a dictionary entry might be a record
or quite possibly a set of "senses".
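
As a sketch of that representation (the lexicon and its sense tuples are
purely illustrative): a sentence becomes a list of pointers into a
lexicon, and an ambiguous word simply carries more than one sense.

```python
# Invented toy lexicon: each word maps to a set of "senses".
LEXICON = {
    "the": {("det", "definite")},
    "dog": {("noun", "canine")},
    "saw": {("noun", "tool"), ("verb", "see-past")},
}

def lookup(sentence):
    """Turn a list of words into a list of sense-sets."""
    return [LEXICON[w] for w in sentence]

senses = lookup(["the", "dog", "saw"])
# "saw" is ambiguous: its entry carries both a noun and a verb sense
assert len(senses[2]) == 2
```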

> [still talking about grammar rules in Prolog]
> There needs to be some research on just
> how far these indexed grammars can represent natural languages.

Prolog grammar rules have the full power of Turing machines,
because the additional arguments may be arbitrarily complex.
(So may the attribute/value matrices used in several current formalisms.)

> Recently a formalism called PATR has been developed.

PATR is based on the idea of "complex categories".
The label on a node of the constituent structure is taken to be,
not a simple name as in BNF, but an attribute/value matrix in which
the traditional category label itself, if there is such a thing at
all, is merely one of the attributes.  For example, instead of the
simple categories S(entence), V(erb)P(hrase), V(erb), it is common
to talk about [cat=v,bar=2], [cat=v,bar=1], [cat=v,bar=0] in order
to capture certain regularities.  For example, there is something
called the Head Feature Convention in GPSG, which basically says
that in a meaningful rule X0 -> X1 ... Xn there is a distinguished
daughter Xi called the "head" of the phrase and X0 and Xi have
certain features in common (such as 'cat' but not 'bar').
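
A minimal sketch of the idea, with categories as attribute/value dicts
(the particular feature names and the HEAD_FEATURES set are assumptions
for illustration, not GPSG's actual inventory):

```python
# Features shared between mother and head daughter; 'bar' is
# deliberately excluded, as in the example above.
HEAD_FEATURES = {"cat"}

S  = {"cat": "v", "bar": 2}      # S  ~ [cat=v, bar=2]
VP = {"cat": "v", "bar": 1}      # VP ~ [cat=v, bar=1]
NP = {"cat": "n", "bar": 2}

def hfc_ok(mother, head_daughter):
    """Mother and head daughter agree on every head feature."""
    return all(mother.get(f) == head_daughter.get(f) for f in HEAD_FEATURES)

assert hfc_ok(S, VP)             # in S -> NP VP, the VP is the head
assert not hfc_ok(S, NP)         # the NP is not a possible head of S
```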

Information is passed around in PATR by a method similar to unification.
Icon can certainly implement this, but so can Pascal...  The point is
that it isn't directional.  In one use of a rule, an attribute may be
in effect copied from the parent to its head daughter; in another use
of the same rule in the same parse, the same attribute may be in
effect copied from the daughter to the parent.
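
The non-directionality is easy to see in a toy unifier for flat
attribute/value dicts (a deliberately simplified sketch -- real PATR
feature structures are recursive and use structure sharing):

```python
def unify(a, b):
    """Unify two flat attribute/value dicts; None marks failure."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None          # clash: incompatible values
        result[key] = value
    return result

parent   = {"cat": "v", "num": "sg"}   # the parent supplies 'num'
daughter = {"cat": "v", "per": 3}      # the daughter supplies 'per'

# One and the same call copies 'num' "down" and 'per' "up":
# neither argument is designated as the source.
assert unify(parent, daughter) == {"cat": "v", "num": "sg", "per": 3}
assert unify({"cat": "v"}, {"cat": "n"}) is None
```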

The fact that PATR-II has been implemented in Lisp as well as Prolog
shows that backtracking built into the implementation language is
not necessary.  Icon may well make a good base for such parsers and
generators, but don't expect it to have any advantage over Lisp
(other than size, and of course price...).