ok@quintus.UUCP (Richard A. O'Keefe) (04/29/88)
Difficulty of DCGs
A question which Fernando Pereira has raised (and studied) is "what
classes of DCGs are polynomial-time parsable"? This note presents a
negative result.
Let us consider only pure DCGs. That is, there are no escapes to
Prolog {...}, no cuts ! or (if->then;else)s, and no built-in predicates,
and there is no push-back. Arbitrary pure DCGs are obviously as
powerful as pure Prolog, and thus intractable.
Definition: a "context-free" DCG is one where no non-terminal has
any arguments.
Since context-free DCGs coincide with context-free grammars, they
are polynomial-time parsable.
Any syntactic restriction on DCGs defines a corresponding class of
logic programs: consider the subset of DCGs which do not contain [...],
only the empty terminal list []. We can set up a preliminary table:
context-free DCGs (cubic) propositional Horn clauses (linear)
? function-free Horn clauses (np-hard)
pure DCGs (intractable) pure Prolog (intractable)
The point of this note is to fill in the "?" slot and show directly
that this class is NP-hard. This is redundant, because it can't be easier
than function-free Horn clauses, but my hope is that seeing some of the
things that can go wrong may suggest methods for avoiding them.
Definition: a "function-free" DCG is one where the arguments of
every non-terminal are variables and constants, but not compound terms,
and terminals are constants or variables.
Definition: a vertex cover of a graph (Vertices,Edges) is a subset
V of Vertices such that for each edge From->To in Edges, either From
is in V or To is in V. The VERTEX COVER problem is, given a graph,
to determine whether it has a vertex cover with |V| <= k. This is
known to be NP-complete.
For fixed k, define a function-free DCG Gk thus:
There is a non-terminal e of arity k.
For 0 <= i <= k there is a non-terminal v[i] of arity i and rules
v[i](X1,..) --> [Xi,=,=], v[i+1](X1,..,Xi). % except v[k]
v[i](X1,..) --> [ _,=,=], v[i](X1,..)
v[i](X1,..) --> [ *,*,*], e(X1,..,--,..,--).
For example, for k=2, we would have
v0 --> [A,=,=], v1(A).
v0 --> [_,=,=], v0.
v0 --> [*,*,*], e(--,--).
v1(A) --> [B,=,=], v2(A,B).
v1(A) --> [_,=,=], v1(A).
v1(A) --> [*,*,*], e(A,--).
v2(A,B) --> [_,=,=], v2(A,B).
v2(A,B) --> [*,*,*], e(A,B).
There are 2k+1 rules
e(X1,..,Xk) --> [*, *, *].
e(X1,..,Xk) --> [X1,>, _], e(X1,..,Xk).
...
e(X1,..,Xk) --> [Xk,>, _], e(X1,..,Xk).
e(X1,..,Xk) --> [_, >,X1], e(X1,..,Xk).
...
e(X1,..,Xk) --> [_, >,Xk], e(X1,..,Xk).
For example, for k=2, we would have
e(A,B) --> [*,*,*].
e(A,B) --> [A,>,_], e(A,B).
e(A,B) --> [B,>,_], e(A,B).
e(A,B) --> [_,>,A], e(A,B).
e(A,B) --> [_,>,B], e(A,B).
Given a graph (V,E), convert it to a list thus:
[V1,=,=,...,Vn,=,=,*,*,*,F1,>,T1,...,Fm,>,Tm,*,*,*].
\____vertices___/ \______edges______/
It follows that phrase(v0, [....]) precisely when the graph has a
vertex cover of size <= k, and the arguments e//k carries around
form such a cover. Since the size of Gk is O(k**2), we have a
polynomial reduction from VERTEX COVER to function-free DCGs, and
parsing the latter is NP-hard.
What is particularly distressing about this is that the grammar is so
very simple: not only is it function-free, but none of the non-terminals
can produce the empty string, and more than that, every single reduction
consumes a triple of terminals. A successful parse takes |V|+|E|+2 steps,
which is independent of k. Before I started to write this note, I thought
that requiring every rule to match at least one terminal might be a useful
constraint.
The underlying problem in this example is that the vi//i rules can
guess an arbitrary subset of k vertices which the e//k rules then check.
I have pointed out a similar problem with transitive closure. If we
place an arbitrary bound on the arity of predicates and non-terminals,
the problem is defined away. But the bound has to be very low in order
to be useful: there are O(|V|**k) subsets to guess from.