[comp.lang.prolog] Complexity of DCGs

ok@quintus.UUCP (Richard A. O'Keefe) (04/29/88)

			Difficulty of DCGs

    A question which Fernando Pereira has raised (and studied) is "what
classes of DCGs are polynomial-time parsable"?  This note presents a
negative result.

    Let us consider only pure DCGs.  That is, there are no escapes to
Prolog {...}, no cuts ! or (if->then;else)s, and no built-in predicates,
and there is no push-back.  Arbitrary pure DCGs are obviously as
powerful as pure Prolog, and thus intractable.

    Definition:  a "context-free" DCG is one where no non-terminal has
any arguments.

    Since context-free DCGs coincide with context-free grammars, they
are polynomial-time parsable.

    Any syntactic restriction on DCGs defines a corresponding class of
logic programs: consider the subset of DCGs which do not contain [...],
only the empty terminal list [].  We can set up a preliminary table:

	context-free DCGs (cubic)	propositional Horn clauses (linear)
	?				function-free Horn clauses (np-hard)
	pure DCGs (intractable)		pure Prolog (intractable)

    The point of this note is to fill in the "?" slot and show directly
that this class is NP-hard.  This is redundant, because it can't be easier
than function-free Horn clauses, but my hope is that seeing some of the
things that can go wrong may suggest methods for avoiding them.

    Definition:  a "function-free" DCG is one where the arguments of
every non-terminal are variables and constants, but not compound terms,
and terminals are constants or variables.

    Definition: a vertex cover of a graph (Vertices,Edges) is a subset
V of Vertices such that for each edge From->To in Edges, either From
is in V or To is in V.  The VERTEX COVER problem is, given a graph,
to determine whether it has a vertex cover with |V| <= k.  This is
known to be NP-complete.

For fixed k, define a function-free DCG Gk thus:

    There is a non-terminal e of arity k.
    For 0 <= i <= k there is a non-terminal v[i] of arity i and rules 

	v[i](X1,..) --> [Xi,=,=], v[i+1](X1,..,Xi).	% except v[k]
	v[i](X1,..) --> [ _,=,=], v[i](X1,..)
	v[i](X1,..) --> [ *,*,*], e(X1,..,--,..,--).

    For example, for k=2, we would have

	v0 --> [A,=,=], v1(A).
	v0 --> [_,=,=], v0.
	v0 --> [*,*,*], e(--,--).

	v1(A) --> [B,=,=], v2(A,B).
	v1(A) --> [_,=,=], v1(A).
	v1(A) --> [*,*,*], e(A,--).

	v2(A,B) --> [_,=,=], v2(A,B).
	v2(A,B) --> [*,*,*], e(A,B).

    There are 2k+1 rules

	e(X1,..,Xk) --> [*, *, *].
	e(X1,..,Xk) --> [X1,>, _], e(X1,..,Xk).
	...
	e(X1,..,Xk) --> [Xk,>, _], e(X1,..,Xk).
	e(X1,..,Xk) --> [_, >,X1], e(X1,..,Xk).
	...
	e(X1,..,Xk) --> [_, >,Xk], e(X1,..,Xk).

    For example, for k=2, we would have

	e(A,B) --> [*,*,*].
	e(A,B) --> [A,>,_], e(A,B).
	e(A,B) --> [B,>,_], e(A,B).
	e(A,B) --> [_,>,A], e(A,B).
	e(A,B) --> [_,>,B], e(A,B).

Given a graph (V,E), convert it to a list thus:
    [V1,=,=,...,Vn,=,=,*,*,*,F1,>,T1,...,Fm,>,Tm,*,*,*].
     \____vertices___/       \______edges______/

It follows that phrase(v0, [....]) precisely when the graph has a
vertex cover of size <= k, and the arguments e//k carries around
form such a cover.  Since the size of Gk is O(k**2), we have a
polynomial reduction from VERTEX COVER to function-free DCGs, and
parsing the latter is NP-hard.

    What is particularly distressing about this is that the grammar is so
very simple:  not only is it function-free, but none of the non-terminals
can produce the empty string, and more than that, every single reduction
consumes a triple of terminals.  A successful parse takes |V|+|E|+2 steps,
which is independent of k.  Before I started to write this note, I thought
that requiring every rule to match at least one terminal might be a useful
constraint.

    The underlying problem in this example is that the vi//i rules can
guess an arbitrary subset of k vertices which the e//k rules then check.
I have pointed out a similar problem with transitive closure.  If we
place an arbitrary bound on the arity of predicates and non-terminals,
the problem is defined away.  But the bound has to be very low in order
to be useful: there are O(|V|**k) subsets to guess from.