ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (05/09/90)
In article <9004301628.AA03522@sophist.uchicago.edu>, goer@SOPHIST.UCHICAGO.EDU (Richard Goerwitz) writes:
> The other formalism is more process-oriented (hence it is ironic that
> Prolog is the main language which implements it).

You say

	S -> NP VP

is *process*-oriented? It's just a _declarative_ constraint on a node in a constituent structure: a node labelled S may dominate two daughters, one labelled NP and one labelled VP, and the one labelled NP must precede the one labelled VP. Rules like this can be used as easily for generation as for parsing.

> Particularly interesting for us here is the way Prolog implements this
> formalism. Assuming the Prolog you use implements definite clause
> grammar notation, you can say,

Since there is a *freely* distributable DCG->Prolog rule-at-a-time translator which has been broadcast over the net, I think it's safe to say that any Prolog system which _hasn't_ got DCG rules isn't trying.

The really exciting thing about grammar rules in Prolog is that (if you avoid cuts and non-logical features of Prolog) you have a non-directional declarative formalism which can be processed in a variety of ways: one and the same set of rules may be loaded directly as Prolog code, or parsed with a left-corner parser, or given to a chart parser, or (and this _happens_) used for generation. In each case one "compiles" rather easily from rules to Prolog.

> I don't see any reason that this couldn't be implemented EASILY in Icon.
> And Icon has some neat advantages over Prolog,
> such as very good string handling.

I think Icon is a _wonderful_ language. But it isn't supposed to be a declarative language. Most of the time when I write grammar rules in Prolog I am using them to _generate_ lists. Some of the rest of the time I don't know whether the code I write will generate or parse, and have no reason to care.
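To make the non-directional point concrete, here is a minimal DCG sketch of the S -> NP VP rule. The toy lexicon (cat, dog, sees) is invented purely for illustration; any standard Prolog with grammar-rule support will translate these rules and run phrase/2 in either direction.

```prolog
% S -> NP VP as a DCG rule, plus a tiny invented lexicon.
s  --> np, vp.
np --> [the], n.
vp --> v, np.
n  --> [cat].
n  --> [dog].
v  --> [sees].

% Parsing:    ?- phrase(s, [the,cat,sees,the,dog]).   % succeeds
% Generation: ?- phrase(s, Words).
%             enumerates [the,cat,sees,the,cat], [the,cat,sees,the,dog], ...
```

One and the same set of rules serves both queries; nothing in the rules themselves says "parse" or "generate".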
In the case of PATR-II, people are writing large grammars where one and the same grammar is used for both parsing and generation, basically by switching control strategies in a kind of interpreter.

Yes, Icon has very good string handling. Anyone with substantial string-handling problems would be crazy not to use Icon if they had the chance. But what has that to do with parsing? I think that the most important lesson I ever learned about SNOBOL was when I enthused about it to an anthropologist, who said "it can parse sequences of characters? Great! Can it parse sequences of words? No? Then it's no use to me!" That is one of (several) respects in which Icon improves dramatically on SNOBOL: you _can_ parse a sequence of words in Icon using the same basic mechanisms that you use for string scanning. I imagine that someone writing a parser for English (or Akkadian!) in Icon would represent a sentence as a list of (pointers to) dictionary entries, where a dictionary entry might be a record or quite possibly a set of "senses".

> [still talking about grammar rules in Prolog]
> There needs to be some research on just
> how far these indexed grammars can represent natural languages.

Prolog grammar rules have the full power of Turing machines, because the additional arguments may be arbitrarily complex. (So may the attribute/value matrices used in several current formalisms.)

> Recently a formalism called PATR has been developed.

PATR is based on the idea of "complex categories". The label on a node of the constituent structure is taken to be, not a simple name as in BNF, but an attribute/value matrix in which the traditional category label itself, if there is such a thing at all, is merely one of the attributes. For example, instead of the simple categories S(entence), V(erb)P(hrase), V(erb), it is common to talk about [cat=v,bar=2], [cat=v,bar=1], [cat=v,bar=0] in order to capture certain regularities.
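The "complex category" idea carries straight over to Prolog grammar rules: instead of atomic non-terminals, use compound terms whose arguments play the role of the attribute/value matrix. A sketch (the predicate name x/2 and the sample words are invented for illustration):

```prolog
% Complex categories x(Cat,Bar) in place of atomic labels.
% x(v,2) plays the role of S, x(v,1) of VP, x(v,0) of V,
% x(n,2) of NP, x(n,0) of N -- i.e. [cat=v,bar=2] etc.
x(v,2) --> x(n,2), x(v,1).      % S  -> NP VP
x(v,1) --> x(v,0), x(n,2).      % VP -> V NP
x(n,2) --> [the], x(n,0).       % NP -> the N
x(n,0) --> [cat].
x(v,0) --> [sees].

% ?- phrase(x(v,2), [the,cat,sees,the,cat]).   % succeeds
```

The same extra-argument mechanism is what gives grammar rules their Turing power: the arguments may be arbitrarily complex terms, built and inspected as the parse proceeds.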
For example, there is something called the Head Feature Convention in GPSG, which basically says that in a meaningful rule X0 -> X1 ... Xn there is a distinguished daughter Xi called the "head" of the phrase, and X0 and Xi have certain features in common (such as 'cat' but not 'bar'). Information is passed around in PATR by a method similar to unification. Icon can certainly implement this, but so can Pascal... The point is that it isn't directional. In one use of a rule, an attribute may be in effect copied from the parent to its head daughter; in another use of the same rule in the same parse, the same attribute may be in effect copied from the daughter to the parent. The fact that PATR-II has been implemented in Lisp as well as Prolog shows that backtracking built into the implementation language is not necessary. Icon may well make a good base for such parsers and generators, but don't expect it to have any advantage over Lisp (other than size, and of course price...).