[net.lang.prolog] Defining functor/3

OKeefe.R.A.@EDXA@sri-unix.UUCP (10/04/83)
From:  Richard HPS (on ERCC DEC-10) <OKeefe.R.A.@EDXA>

I recently had the opportunity of looking at a Prolog system
which had been developed from the Clocksin & Mellish textbook.
The trouble is that the Clocksin & Mellish book was written to
help beginners learn how to use three existing Prolog interpreters
( the DEC-10, PDP-11, and EMAS Prolog systems ), and not to help
Prolog implementers produce exactly the same language.  So they
didn't discuss fine points.

An obvious example of this is the fact that in the DEC-10, EMAS,
& C-Prolog systems, disjunction is transparent to cut just like
conjunction.  What I mean by this is that in

        p :- a, !, b; c.
        p :- d.

if 'a' is true but 'b' is false, 'p' will fail, because the cut
cut right back to p, and didn't stop at the semicolon.  If you
are an implementor, you will regret this, because otherwise you
could write

        (A ; B) :- call(A).
        (A ; B) :- call(B).

But if you are NOT an implementor, there is no reason for you
to expect disjunction to be any different from conjunction in
this respect.  The transparency of semicolon falls quite
naturally out of the way the DEC-10 compiler and interpreter work.
C-Prolog ( based on EMAS Prolog ) gets it right, but it is much
harder.  Because PDP-11 Prolog Sacrificed Semicolon ( and a lot
of other things ) to keep its size down, and because disjunction
is not central to Prolog, the Clocksin & Mellish book doesn't
mention this "detail".  ( Of course, if Prolog didn't have cuts,
the problem wouldn't arise... )

A less obvious question is what functor, arg, and univ should do.
The point of this message is that there is a clear design principle
which can settle the question, regardless of what any particular
implementation happens to do.  { I haven't tried all possible
combinations on the DEC-10.  If it doesn't act in accord with this
principle, I shall regard DEC-10 Prolog as wrong. }

The principle is this: as far as possible, every system predicate
should behave AS IF it were defined by an explicit table.  The
interpreter or whatever should only produce an error message when
it is unable to simulate the table.

Here is a simple example.  The predicate succ/2 is "notionally"
defined by the infinite table

        succ(0, 1).
}i      succ(1, 2).
        ...
        succ(56432, 56433).
        ...

We can use this + the principle to specify exactly how succ(X,Y)
should behave in all circumstances:

        X instantiated, but not a non-negative integer => fail
        Y instantiated, but not a positive integer => fail
        X unbound, Y > 0 => bind X to Y-1
        Y unbound, X >= 0 => bind Y to X+1
        X and Y both unbound => apologise and abort

A more than acceptable alternative in the last case would be to
enumerate all possible solutions by backtracking, but this is
not always possible.  The point that I am making is that
succ(foo,Y) is not in error, it is simply false.  There are only
two "error" messages that succ is entitled to produce:

        sorry, succ(<maxint>,_) overflows
        sorry, succ(_,_) needs at least one argument instantiated

and in both cases the error is the interpreter's, not the
programmer's. Of course, it is not particularly sensible to call
succ(a,b), but it is perfectly well defined.  It is probably a
good idea to have a type-checking option which will print a warning
message if either argument of succ is bound to a non-integer, but
it MUST not abort the computation or cause an exception, it must
just fail.  Type-checking is better done at compile-time ( see
files <Prolog>TYPECH.PL and <Prolog>PROLOG.TYP at SU-SCORE ).

Now we come to functor/3.  The only thing that the principle
doesn't tell us is what is to be done with integers (or floating
point numbers, or data base references, or whatever).  The DEC-10
and C-Prolog rule is

        functor(X, X, 0) :- atomic(X).

so that for example functor(9, 9, 0) is true.  This convention
greatly simplifies term hacking.  So we imagine a table

        functor(0, 0, 0).
        functor(654987, 654987, 0).
        functor('', '', 0).
        functor(asdflkjhfdsa, asdflkjhfdsa, 0).
        functor(A+B, +, 2).
        ...

Since functor(Term, never_before, 1297) may ( in implementation
terms) create a new functor never before seen, it is clear that
this table cannot be confined to the functors that appear in the
program.  There is another predicate for that, called current_functor,
which depends on the current state of the program.  Taking this
together with the principle tells us what functor(Term,F,N)
must do:

        Term atomic => unify F=Term, N=0
        Term compound => unify F=function symbol, N=arity
        Term unbound, F unbound => apologise and abort
        Term unbound, F compound => fail
        Term unbound, F atomic but not an atom => unify N=0, Term=F
        Term unbound, N bound but not a non-negative integer => fail
        Term unbound, F atom, N unbound => apologise and abort
        Term unbound, F atom, N non-negative integer =>
                        unify Term=F(_1,...,_N)

Without some sort of principle available to judge by, it would be
difficult to decide what functor(X, 9, N) should do.  With this
principle, it is easy to see that there is exactly one solution,
and we can cheaply find it, so it should succeed and bind X=9, N=0.

A similar analysis can be applied to arg/3 and =../2.  For example,
Term =.. [f,a|_] is "apologise and abort", while 1 =.. [1] is true.

If we have a system which follows the principle, it follows that
any call on one of these evaluable predicates will have one of
three outcomes:

        - the goal is true in the table, and succeeds corretly
        - the goal is false in the table, and fails correctly
        - there is insufficient information for the interpreter
          to tell

In the last case, it would not be correct to fail.  If the
interpreter has enough information to tell that the goal is
false, it should fail with no further ado.  So there might
be true instances of the goal. But the interpreter lacks
the information to find such an instance ( or the patience ),
so it would not be correct to fail either.  That is why
I suggest "apologise and abort".  DEC-10 compiled code does
not abide by this rule.  If there is an uninstantiated variable
in an arithmetic expression, for example, it calmly substitutes
0 ( but doesn't bind the variable ) and continues, though it
does warn you.  "abort" need not be taken too literally: ideally
the interpreter should enter a debugging mode, at the very least
you should be able to inspect the goal stack.

There are lots of things in Prolog that probably won't be in DeuLog
( ho deuteros logos ), but I think functor/arg/univ/succ/plus and
so on probably will be.  They make sense even in a parallel
system,  and are much closer to logic than say I/O is.  So
it is worth  taking some trouble over their definitions.