[net.lang.prolog] More About bagof

OKeefe.R.A.%EDXA@sri-unix.UUCP (10/19/83)

From:  Richard HPS (on ERCC DEC-10) <OKeefe.R.A.@EDXA>

A certain puzzle solution in this Digest contained the following
rather odd goal:

        bagof(M, M^(x(M, Z), nonvar(Z), Z = N), L)

[BTW: that program is an *excellent* debugging tool for Prolog
implementors.  If your assert and retract implementations stand
up to amazing thrashing this program gives them, they'll stand
up to any amount of normal use.  End of insult.  End of tip.]

One of the odd things is just the x(M,Z), nonvar(Z), Z=N bit
itself.  N is always instantiated to an integer, so x(M,Z),
Z==N would be exactly what is wanted.  However, the x(M,Z)
facts in the data base always have M an integer, and Z either
an integer or a variable.  Being a variable indicates "not yet
assigned", and writing x(M,unbound) for that case would (a)
eliminate the var/nonvar tests in the program and (b) let this
goal be written as just x(M,N).

But back to bagof.  If you haven't got a copy of the 15-Dec-81
manual, here are pages 51 & 52, retyped:

setof(X, P, S)

Read this as "S is the set of all instances of X such that P is
provable, where that set is non-empty".  The term P specifies a
goal or goals as in call(P).  S is a set of terms represented as
a list of those terms, without duplicates, in the standard order
for terms [this is defined elsewhere in the manual, and I'm not
going to type that in].  If there are no instances of X such
that P is provable the predicate *fails*.

The variables appearing in the term X should not appear anywhere
else in the clause except within the term P.  Obviously, the set
to be enumerated should be finite, and should be enumerable by
Prolog in finite time.  It is possible for the provable
instances to contain variables, but in this case the list S will
only provide an imperfect representation of what is in reality
an infinite set.  [There is no real need for the variables in X
to be absent from the rest of the clause.  If a variable is
bound, all that means is that its binding will be copied into
the instances in the list S.  If a variable is not bound, it
will still be unbound after the call to setof, just like 'not'.
And I don't know that I'd call a non-ground term an "imperfect
representation" of an infinite set.]

                 [*** This is the key paragraph ***]

If there are uninstantiated variables in P which do not also
appear in X, then a call to this evaluable predicate may
**backtrack**, generating alternative values for S corresponding
to different instantiations of the free variables of P.  (It is
to cater for such usage that the set S is constrained to be
non-empty.)  For example, the call

        ?- setof(X, X likes Y, S).

might produce two alternative solutions via backtracking:

        Y = beer,  S = [dick, harry, tom]
        Y = cider, S = [bill, jan, tom]

(X remains unbound in both cases).  The call

        ?- setof((Y,S), setof(X, X likes Y, S), SS).

would then produce

        SS = [(beer,[dick,harry,tom]), (cider,[bill,jan,tom])]

                 [*** This paragraph explains ^ ***]

Variables occurring in P will not be treated as free if they are
explicitly bound within P by an existential quantifier.  An
existential quantification is written

        Y^Q

meaning "there exists a Y such that Q is true", where Y is some
Prolog variable.  For example,

        ?- setof(X, Y^(X likes Y), S).

would produce the single result

        X = [bill, dick, harry, jan, tom]

in contrast to the earlier example.

bagof(X, P, Bag)

This is exactly the same as setof except that the list (or
alternative lists) returned will not be ordered, and may contain
duplicates.  The effect of this relaxation is to save
considerable time and space in execution.  [This may be
misleading.  There are three components to the cost of setof,
and two to bagof.  Both of them enumerate all solutions like
findall.  That is one component.  Both then do a sort to bring
together solutions belonging to the same alternative (bindings
for free variables in P).  That is the second component.  setof
then does another sort for each list of alternatives.  So you
save O(N.lgN) time and space by using bagof instead of setof.
But that is less than 50% of the cost of bagof.  When there are
NO free variables in P, bagof and setof don't do the first sort,
so the saving for this common case can be considerable.  Also
bear in mind that the cost of sorting is strongly implementation
dependent; it could take O(N) space.]

The interpreter recognises this as meaning "there exists an X
such that P is true", and treats it as equivalent to call(P).
The use of this explicit existential quantifier outside the
setof and bagof constructs is superfluous.

                       [End of manual extract.]

What is odd about bagof(M, M^(x(M, Z), nonvar(Z), Z=N), L) is
thus that a variable which ought NOT to be explicitly quantified
IS (M), and that a variable which SHOULD be (Z) ISN'T.  How come
it works ?

To see why quantifying a variable that appears in the template
doesn't confuse Prolog at all while it confuses people badly,
you'll have to look at the code I published here recently.  To
see why not quantifying Z does no harm, you have only to realise
that all instances of Z are unified with the same *integer* N,
so that bagof does a findall and a sort, and then discovers that
all of the solutions have the same value for Z.  If the goal had
been instead

        bagof(M, Z^(x(M, Z), nonvar(Z), Z=N), L)

bagof would just have done a findall.

Now I don't mean to imply that the author of the program in
question doesn't understand the X^P notation perfectly well.
This may be just the kind of typing mistake we all make,
which just increased the running time without producing wrong
answers, and could easily go unnoticed for 20 years.  But
there seem to be too many Prologs that lack bagof, and I
wouldn't want anyone to take this particular goal as an example
to be imitated.  By the same token, there could well be similar
mishaps in some of my utilities, and I would be grateful if
someone spotted them while I still have access to this net.

One thing which the manual should mention and doesn't is that
when you compile a DEC-10 Prolog program that uses setof, bagof,
or for that matter findall, it often stops working.  The problem
is that the interface between compiled and interpreted code is
mostly one-way:

compiled code can get at any interpreted code, either by using
call or just by calling a predicate that happens not to be
interpeted.  But the interpreter can only get at compiled code
which has :-public declarations.  And call is the interpreter.
So the generators in

        setof(Template, Generator, Set)
        bagof(Template, Generator, Bag)
        findall(Template, Generator, List)

and the goals in

        \+ Goal
        not Goal
        once(Goal)              % once(X) :- call(X), !.
        forall(Generator, Test) % forall(G,T) :- \+ (G, \+ T).

are handled by the *interpreter*, and MUST have :-public declarations.
Omitting these declarations is a very common mistake.  You don't learn
to avoid it, but you learn to look for it when your compiled program
doesn't work but it did when interpreted.  If you haven't got a
compiler, the problem doesn't arise, and there are a couple of other
compilers than DEC-10 Prolog which use a different interface between
compiled and interpreted code (in fact they are just incremental
compilers), and again the problem doesn't arise.