[comp.lang.prolog] Prolog and occur checks.

harald@mulga.oz (Harald Sondergaard) (07/22/88)
This is long, so here is a summary: Prolog may not need to abandon
occur checks. It could be that Prolog compilers can generate sound
and efficient target programs after all.

In a reply to a posting about unification algorithms, Fernando
Pereira discusses "Robinson unification" and its drawbacks.
He explains why "RWOC" - Robinson unification without the occur
check - has come to be the de facto standard in Prolog. Realizing
that this is not a very happy situation, he mentions two ways out:

(A) the semantics can be changed so as to allow for infinite terms,

(B) the number of occur checks can be reduced by dataflow analysis.

To me it seems wrong to regard these two options as solutions to the
same problem. In fact (B) only makes sense once (A) has been rejected.
(A) is a possible answer to a question "what should the semantics be?"
(B) is a possible solution to the problem "if we want to maintain
Prolog's claimed affinity with first order logic (via Skolemization)
and thus regard programs as denoting computations over a domain of
finite terms, how can we compute efficiently?"
    For the sake of discussion, I shall here rule out (A). Then,
leaving out occur checks (RWOT) is unfortunate for at least three
reasons:

 1) "Unification without occur check" is ambiguous to begin with.
    Otherwise equivalent unification algorithms behave differently
    once their occur checks are removed. They all either succeed
    (with a wrong answer) or fail to terminate in cases where they
    should fail, but they differ in which of the two wrong behaviours
    they favour. (Extremes are Herbrand's algorithm as given in [3]
    which will always fail to terminate, and the algorithm in [6]
    which will always succeed. Different versions of Robinson's
    algorithm lie in between. Note that this says nothing about
    the algorithms, only about the danger of removing occur checks.)
    This seems to me to imply that a definition of standard Prolog
    will have to specify exactly how unification should be done.
    That is, if Prolog is meant to lack occur checks, yet have its
    usual semantics, unification must be defined operationally
    (which was not exactly what we hoped for).

 2) Prolog becomes unsound. Simple false statements like

      (for-all x there-is y : p(x,y)) => there-is z : p(z,z)

    can readily be "proved" using Prolog without occur checks:
    just code the premise as the clause p(x,f(x)) and query p(z,z).

 3) Prolog without occur checks lacks fundamental substitutivity
    properties that Prolog _with_ occur checks possesses. In
    particular, unfolding can introduce occur check problems in
    initially innocent programs [4].

Of the two wrong behaviours mentioned under 1), nontermination
is the least evil. I still find it hard to accept the fact that
most Prolog systems can return wrong answers without giving the
slightest indication of error. I cannot be the only one who has
experienced this problem in practice.
    The tale we are all told is that once upon a time, the
elders of Logicia had to choose between a sound language with
nice properties or one that would allow them to say "list
concatenation" as fast as the people in the neighbouring
Functionia, and they wisely chose the latter. But are these
options _necessarily_ mutually exclusive?
    Two observations about the exponential time complexity of
Robinson's algorithm [9] are as old as the algorithm itself:
occur checks take up much time and many "subunifications"
performed are redundant. Implementors chose to remove the
occur check while others designed linear algorithms by
avoiding redundant computation and keeping occur checks,
i.e., guarantee soundness [5,7].
    Apparently the linear algorithms were not very successful.
As Fernando Pereira points out, ROWC is superior _in_practice_.
(It does not seem fair to argue this point by showing the O(n^2)
worst-case nature of list concatenation (here n is the length of
the first list) using a linear algorithm. This is an upper bound
and one could still expect that a computation using one of the
linear algorithms would perform better than O(n^2) _in_practice_,
which is the issue here.)
    An interesting point is made in [1]: Robinson unification
_with_ occur checks can have time complexity as low as O(p^2),
where p is the number of symbols in the terms to be unified,
and for practical purposes this gives (slightly) faster
performance than using the linear algorithm of [5].
    Still, however, there seems to be a consensus that this
does not suffice. Of course, anything else being equal, omitting
occur checks will save time. But since I believe Prolog should
be sound, I would argue that we should attempt to get rid of the
occur checks in a safe way, by omitting them only when we know
that they are not needed.
    This brings us to the idea of using dataflow analysis to
detect cases where occur checks can be omitted. It is due to
Plaisted [7] and the underlying theory has been studied in some
detail [1,9]. I have, however,  heard of no experimental work to
investigate its usefulness in practice. Such work is needed,
because it might turn out to demonstrate that Prolog can employ
occur checks without Prolog compilers having to generate
inefficient target programs.
    It would be nice to hear from people who have done
_experimental_ work in one of the two areas "Linear Unification"
(like [1]) and "Dataflow Analysis for Occur Check Elimination".
Hopefully there is still room for improving the Prolog state-of-art.

[1]  Jones, N. D. & H. Sondergaard, A semantics-based framework
     for the abstract interpretation of Prolog, in: Abstract
     Interpretation of Declarative Languages (S. Abramsky & C.
     Hankin, eds.), Ellis Horwood (1987) 123-142.

[2]  Corbin, J. & M. Bidoit, A rehabilitation of Robinson's
     unification algorithm, Information Processing 83 (R. Mason,
     ed.), North-Holland (1983).

[3]  Lassez, J.-L., M. Maher & K. Marriott, Unification revisited,
     in: Foundations of Deductive Databases and Logic Programming
     (J. Minker, ed.), Morgan Kaufmann (1988) 587-625.

[4]  Marriott, K. & H. Sondergaard, On Prolog and the Occur Check
     Problem, Draft note, Dept. of Computer Science, University of
     Melbourne (1988).

[5]  Martelli, A. & U. Montanari, An efficient unification algorithm,
     ACM Trans. Programming Languages and Systems 4, 2 (1982) 250-282.

[6]  Nilsson, N., Principles of Artificial Intelligence, Tioga (1980).

[7]  Paterson, M. S. & M. N. Wegman, Linear unification, J. of Computer
     and System Sciences 16 (1978) 158-167.

[8]  Plaisted, D., The occur-check problem in Prolog, Proc. Symp. Logic
     Programming, Atlantic City, New Jersey (1984) 272-280.

[9]  Robinson, J. A., A machine-oriented logic based on the resolution
     principle, J. ACM 12, 1 (1965) 23-41.

[10] Sondergaard, H., An application of abstract interpretation of
     logic programs: occur check reduction, Proc. ESOP 86 (B. Robinet
     & R. Wilhelm, eds.), Springer LNCS 213 (1986) 327-338.

-- Harald Sondergaard