[comp.lang.prolog] Difficulty of DCGS -- a clarification

ok@quintus.UUCP (Richard A. O'Keefe) (04/30/88)

Fernando has pointed out that my recent posting may be confusing.
For a graph with |V| vertices and |E| edges, it is obviously
possible to parse the list that represents it using the grammar
Gk in time O(|V|**k x |E|).  For fixed k, this is polynomial in
the size of the *input*.  What it is exponential in is the size
of the *grammar*.

It should be obvious that function-free DCGs can be parsed in time
polynomial in the size of the input.  Suppose that there are |C|
distinct constants, and that a particular rule contains N variables.
Then that rule corresponds to at most |C|**N rules with all arguments
ground.  In the worst case, |C| and |N| can both be O(|G|).
Now Earley's algorithm takes time O(|G|**2 x |L|**3) to parse a list
of length |L|, so we can parse a string of length |L| according to
a function-free DCG in time O(|G|**|G| x |L|**3), which is polynomial
in |L|.  But the multiplier can be _huge_.

Hmmm.  I think there is a big difference here between function-free DCGs
where terminals can only be constants (to which the |G|**|G| x |L|**3
bound is applicable) and ones where terminals in rules can be variables
(still, I think, polynomial, but the degree depends on the grammar).

There's an interesting correspondence between this example and BB&R's
analysis of Sheiber's method for parsing ID/LP grammars.  They show
that his method can require exponential time, but to do that you have
to devote most of the bulk of the grammar to a few very long rules.
If there is a smallish length bound on rules (and practical grammars
tend to have lots of short rules) Sheiber's algorithm is polynomial,
even in grammar size.

It's not without significance that the transitive closure and vertex
cover examples both required the arity of the terms to be unbounded.
Natural language grammars tend to have lots of rules all of which are
fairly short (I think any rule with more than say 6 elements on its
right would be pretty grossly implausible), and the number of agreement
features is usually pretty small.

I should make it clear that these postings are to be read as the results
of a beginner in the field who is hoping to provoke some discussion.