ok@quintus.UUCP (Richard A. O'Keefe) (04/29/88)
Difficulty of DCGs A question which Fernando Pereira has raised (and studied) is "what classes of DCGs are polynomial-time parsable"? This note presents a negative result. Let us consider only pure DCGs. That is, there are no escapes to Prolog {...}, no cuts ! or (if->then;else)s, and no built-in predicates, and there is no push-back. Arbitrary pure DCGs are obviously as powerful as pure Prolog, and thus intractable. Definition: a "context-free" DCG is one where no non-terminal has any arguments. Since context-free DCGs coincide with context-free grammars, they are polynomial-time parsable. Any syntactic restriction on DCGs defines a corresponding class of logic programs: consider the subset of DCGs which do not contain [...], only the empty terminal list []. We can set up a preliminary table: context-free DCGs (cubic) propositional Horn clauses (linear) ? function-free Horn clauses (np-hard) pure DCGs (intractable) pure Prolog (intractable) The point of this note is to fill in the "?" slot and show directly that this class is NP-hard. This is redundant, because it can't be easier than function-free Horn clauses, but my hope is that seeing some of the things that can go wrong may suggest methods for avoiding them. Definition: a "function-free" DCG is one where the arguments of every non-terminal are variables and constants, but not compound terms, and terminals are constants or variables. Definition: a vertex cover of a graph (Vertices,Edges) is a subset V of Vertices such that for each edge From->To in Edges, either From is in V or To is in V. The VERTEX COVER problem is, given a graph, to determine whether it has a vertex cover with |V| <= k. This is known to be NP-complete. For fixed k, define a function-free DCG Gk thus: There is a non-terminal e of arity k. For 0 <= i <= k there is a non-terminal v[i] of arity i and rules v[i](X1,..) --> [Xi,=,=], v[i+1](X1,..,Xi). % except v[k] v[i](X1,..) --> [ _,=,=], v[i](X1,..) v[i](X1,..) --> [ *,*,*], e(X1,..,--,..,--). For example, for k=2, we would have v0 --> [A,=,=], v1(A). v0 --> [_,=,=], v0. v0 --> [*,*,*], e(--,--). v1(A) --> [B,=,=], v2(A,B). v1(A) --> [_,=,=], v1(A). v1(A) --> [*,*,*], e(A,--). v2(A,B) --> [_,=,=], v2(A,B). v2(A,B) --> [*,*,*], e(A,B). There are 2k+1 rules e(X1,..,Xk) --> [*, *, *]. e(X1,..,Xk) --> [X1,>, _], e(X1,..,Xk). ... e(X1,..,Xk) --> [Xk,>, _], e(X1,..,Xk). e(X1,..,Xk) --> [_, >,X1], e(X1,..,Xk). ... e(X1,..,Xk) --> [_, >,Xk], e(X1,..,Xk). For example, for k=2, we would have e(A,B) --> [*,*,*]. e(A,B) --> [A,>,_], e(A,B). e(A,B) --> [B,>,_], e(A,B). e(A,B) --> [_,>,A], e(A,B). e(A,B) --> [_,>,B], e(A,B). Given a graph (V,E), convert it to a list thus: [V1,=,=,...,Vn,=,=,*,*,*,F1,>,T1,...,Fm,>,Tm,*,*,*]. \____vertices___/ \______edges______/ It follows that phrase(v0, [....]) precisely when the graph has a vertex cover of size <= k, and the arguments e//k carries around form such a cover. Since the size of Gk is O(k**2), we have a polynomial reduction from VERTEX COVER to function-free DCGs, and parsing the latter is NP-hard. What is particularly distressing about this is that the grammar is so very simple: not only is it function-free, but none of the non-terminals can produce the empty string, and more than that, every single reduction consumes a triple of terminals. A successful parse takes |V|+|E|+2 steps, which is independent of k. Before I started to write this note, I thought that requiring every rule to match at least one terminal might be a useful constraint. The underlying problem in this example is that the vi//i rules can guess an arbitrary subset of k vertices which the e//k rules then check. I have pointed out a similar problem with transitive closure. If we place an arbitrary bound on the arity of predicates and non-terminals, the problem is defined away. But the bound has to be very low in order to be useful: there are O(|V|**k) subsets to guess from.