[comp.lang.prolog] Designing a type checker/inference system?

ok@cs.mu.oz.au (Richard O'Keefe) (09/26/89)
Thank you to the people who replied to my question about simple versions
of NP-complete problems in Horn clauses.  What I am doing can be described
as "non-recursive query complexity for datalog".  Some of the answers I
got addressed this, some didn't.  I have some good news and some bad news.

The bad news is that apart from "number problems", an amazing number of
NP-complete problems can be coded as simple non-recursive sets of function
free Horn clauses.  This means that it is not enough for an abstract
interpretation system to summarise the success sets of predicates, it
must also approximate conjunctions in clause bodies.

The good news is that apart from "number problems", an amazing number of
NP-complete problems can be coded as simple non-recursive sets of function
free Horn clauses.  Well, _I'm_ pleased, anyway.

If you place a bound on both the arity of predicates and the length of
a clause, function-free Horn clauses can be handled in polynomial time.
I have been corresponding with someone who believes that a type inference
system which is exponential in the length of clauses is ok because you
can always split a clause body into simpler pieces and call them.  The
trouble with that is that you _can't_ always do it if predicate arity is
bounded, and (unbounded arity, =< 3 literals in a clause) is just as
hard as (bounded arity, unbounded literals).

My correspondent also seems to believe that real programs have short
clauses anyway.  The reason I am posting this message is to remind anyone
out there who may be designing a type checker or type inference system
or even thinking of doing it that some Prolog code is written by machine.
In fact one of the intended applications of the Mycroft & O'Keefe type
checker was precisely that:  to type check code that had been generated
by another program.  It is not unreasonable for a program to generate a
clause that has thousands of variables in it.  (There is at least one
VLSI description program written in Prolog which represents wires as
Prolog variables, and if you think VLSI designs have a small number of
wires, let me tell you you're wrong.)

In fact it is not sufficient for a type checker or inference system
to operate in polynomial time, it must operate in LINEAR time and space
if it is to be practical.  Most of the ``interesting'' Prolog type
systems are easy to attack as being NP-complete or worse (including
the Mycroft & O'Keefe system because it allows constructor functions
to be overloaded, sigh).  But even passing that test isn't enough.  If
you want to claim that your type system is practical, you really have
to show that it will work in linear time and space.

It may perhaps be adequate if the programmer (whether human or machine)
can plant ``hints'' to the type checker or type inference system.  For
example, the Mycroft & O'Keefe type checker lets you write
	Term: Type
in your source code.  The :Type part is stripped off and is not used at
run time.  The point is that the annotation says ``don't bother trying
to work out which type Term might be, I tell you it is this Type''.  At
the cost of writing enough of these wretched hints, you can get any
well-typed clause through the type checker in linear time (basically the
programmer is playing Oracle).  I can't say that I like this, but at
least it does address the problem.  It would be interesting to know
whether other published type systems can be salvaged by similar means.

[I still haven't figured out how to encode NP-complete problems based
 on regular expressions in the very restricted logic I'm using, but
 that's a purely artificial restriction.  Type inference systems based
 on regular trees cannot afford to be exact...]

-- 
GNUs are more derived than other extant alcelaphines,| Richard A. O'Keefe
such as bonteboks, and show up later in the fossil   | visiting Melbourne
record than less highly derived species.  (Eldredge) | ok@munmurra.cs.mu.OZ.au