harald@mulga.oz (Harald Sondergaard) (07/22/88)
This is long, so here is a summary: Prolog may not need to abandon occur checks. It could be that Prolog compilers can generate sound and efficient target programs after all.

In a reply to a posting about unification algorithms, Fernando Pereira discusses "Robinson unification" and its drawbacks. He explains why "RWOC" - Robinson unification without the occur check - has come to be the de facto standard in Prolog. Realizing that this is not a very happy situation, he mentions two ways out: (A) the semantics can be changed so as to allow for infinite terms, or (B) the number of occur checks can be reduced by dataflow analysis.

To me it seems wrong to regard these two options as solutions to the same problem. In fact (B) only makes sense once (A) has been rejected. (A) is a possible answer to the question "what should the semantics be?". (B) is a possible solution to the problem "if we want to maintain Prolog's claimed affinity with first order logic (via Skolemization), and thus regard programs as denoting computations over a domain of finite terms, how can we compute efficiently?". For the sake of discussion, I shall here rule out (A). Then, leaving out occur checks (RWOC) is unfortunate for at least three reasons:

1) "Unification without occur check" is ambiguous to begin with. Otherwise equivalent unification algorithms behave differently once their occur checks are removed. In cases where they should fail, they all either succeed (with a wrong answer) or fail to terminate, but they differ in which of the two wrong behaviours they favour. (Extremes are Herbrand's algorithm as given in [3], which will always fail to terminate, and the algorithm in [6], which will always succeed. Different versions of Robinson's algorithm lie in between. Note that this says nothing about the algorithms, only about the danger of removing occur checks.) This seems to me to imply that a definition of standard Prolog will have to specify exactly how unification should be done.
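To make the danger concrete, here is a minimal Python sketch of a Robinson-style unifier with the occur check as a switch. The term representation and function names are mine, purely for illustration; no real Prolog system works this way. With the check enabled, unifying p(X, f(X)) with p(Z, Z) correctly fails; with the check removed, this particular variant "succeeds" with a cyclic substitution (other variants would instead fail to terminate).

```python
# Terms: variables are ("var", name); structures are (functor, arg1, ..., argN).
# Illustrative sketch only - not how actual Prolog implementations represent terms.

def walk(t, s):
    """Dereference a variable through the substitution s, one binding at a time."""
    while isinstance(t, tuple) and t[0] == "var" and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """True iff variable v occurs in term t under substitution s."""
    t = walk(t, s)
    if t == v:
        return True
    if isinstance(t, tuple) and t[0] != "var":
        return any(occurs(v, arg, s) for arg in t[1:])
    return False

def unify(t1, t2, s, occur_check=True):
    """Return an extended substitution, or None on failure."""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    for a, b in ((t1, t2), (t2, t1)):
        if isinstance(a, tuple) and a[0] == "var":
            if occur_check and occurs(a, b, s):
                return None          # sound: X = f(X) has no finite solution
            return {**s, a: b}       # without the check this may be a cyclic binding
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s, occur_check)
            if s is None:
                return None
        return s
    return None

X, Z = ("var", "X"), ("var", "Z")
goal = (("p", X, ("f", X)), ("p", Z, Z))     # p(X, f(X)) vs p(Z, Z)

print(unify(*goal, {}, occur_check=True))    # None: correctly fails
print(unify(*goal, {}, occur_check=False))   # "succeeds": X -> Z, Z -> f(X) is cyclic
```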
If Prolog is meant to lack occur checks, yet have its usual semantics, unification must then be defined operationally (which was not exactly what we hoped for).

2) Prolog becomes unsound. Simple false statements like

    (for-all x there-is y : p(x,y)) => there-is z : p(z,z)

can readily be "proved" using Prolog without occur checks: just code the premise as the clause p(x,f(x)) and query p(z,z).

3) Prolog without occur checks lacks fundamental substitutivity properties that Prolog _with_ occur checks possesses. In particular, unfolding can introduce occur check problems in initially innocent programs [4].

Of the two wrong behaviours mentioned under 1), nontermination is the lesser evil. I still find it hard to accept that most Prolog systems can return wrong answers without giving the slightest indication of error. I cannot be the only one who has experienced this problem in practice.

The tale we are all told is that once upon a time, the elders of Logicia had to choose between a sound language with nice properties and one that would let them say "list concatenation" as fast as the people in neighbouring Functionia, and they wisely chose the latter. But are these options _necessarily_ mutually exclusive?

Two observations about the exponential time complexity of Robinson's algorithm [9] are as old as the algorithm itself: occur checks take up much time, and many "subunifications" performed are redundant. Some implementors chose to remove the occur check, while others designed linear algorithms that keep the occur checks, and hence guarantee soundness, by avoiding the redundant computation [5,7]. Apparently the linear algorithms were not very successful. As Fernando Pereira points out, RWOC is superior _in_practice_. (It does not seem fair to argue this point by showing the O(n^2) worst-case behaviour on list concatenation (here n is the length of the first list) using a linear algorithm.
This is an upper bound, and one could still expect a computation using one of the linear algorithms to perform better than O(n^2) _in_practice_, which is the issue here.)

An interesting point is made in [2]: Robinson unification _with_ occur checks can have time complexity as low as O(p^2), where p is the number of symbols in the terms to be unified, and for practical purposes this gives (slightly) faster performance than using the linear algorithm of [5]. Still, however, there seems to be a consensus that this does not suffice. Of course, everything else being equal, omitting occur checks will save time. But since I believe Prolog should be sound, I would argue that we should attempt to get rid of the occur checks in a safe way, by omitting them only when we know that they are not needed.

This brings us to the idea of using dataflow analysis to detect cases where occur checks can be omitted. It is due to Plaisted [8], and the underlying theory has been studied in some detail [1,10]. I have, however, heard of no experimental work investigating its usefulness in practice. Such work is needed, because it might demonstrate that Prolog can employ occur checks without Prolog compilers having to generate inefficient target programs.

It would be nice to hear from people who have done _experimental_ work in one of the two areas "Linear Unification" (like [2]) and "Dataflow Analysis for Occur Check Elimination". Hopefully there is still room for improving the Prolog state of the art.

[1] Jones, N. D. & H. Sondergaard, A semantics-based framework for the abstract interpretation of Prolog, in: Abstract Interpretation of Declarative Languages (S. Abramsky & C. Hankin, eds.), Ellis Horwood (1987) 123-142.

[2] Corbin, J. & M. Bidoit, A rehabilitation of Robinson's unification algorithm, in: Information Processing 83 (R. Mason, ed.), North-Holland (1983).

[3] Lassez, J.-L., M. Maher & K. Marriott, Unification revisited, in: Foundations of Deductive Databases and Logic Programming (J. Minker, ed.), Morgan Kaufmann (1988) 587-625.

[4] Marriott, K. & H. Sondergaard, On Prolog and the occur check problem, draft note, Dept. of Computer Science, University of Melbourne (1988).

[5] Martelli, A. & U. Montanari, An efficient unification algorithm, ACM Trans. Programming Languages and Systems 4, 2 (1982) 250-282.

[6] Nilsson, N., Principles of Artificial Intelligence, Tioga (1980).

[7] Paterson, M. S. & M. N. Wegman, Linear unification, J. Computer and System Sciences 16 (1978) 158-167.

[8] Plaisted, D., The occur-check problem in Prolog, Proc. Symp. Logic Programming, Atlantic City, New Jersey (1984) 272-280.

[9] Robinson, J. A., A machine-oriented logic based on the resolution principle, J. ACM 12, 1 (1965) 23-41.

[10] Sondergaard, H., An application of abstract interpretation of logic programs: occur check reduction, in: Proc. ESOP 86 (B. Robinet & R. Wilhelm, eds.), Springer LNCS 213 (1986) 327-338.

--
Harald Sondergaard