[comp.lang.prolog] semantics of bagof and setof

matsc@sics.se (Mats Carlsson) (09/07/88)

This note is about the operational semantics of bagof/3 and setof/3.

As we all know, a goal
	setof(Y, p(X,Y), L)
yields alternative solutions corresponding to different values of X.
But just what is a "different value"?

The public domain code for bagof(Term, Goal, Solutions) uses the
following method:
  1. compute a tuple K of those variables which are global in Goal,
  2. compute L1 as findall(K-Term, Goal, L1),
* 3. for each member K1-Term1 in L1, if an argument of K1 is unbound, bind it
     to the corresponding argument of the original tuple K,
  4. keysort the K1-Term1 pairs and group them by identical K1 into a list
     Groups of pairs K1-TermList,   
  5. member(K-Solutions, Groups) yields a solution to bagof/3.

This is all well as long as a global variable is either bound to a
ground term or not bound at all in a solution.  But when bound to a
non-ground term, I find the result counterintuitive.  Consider the
program

	p(X, a) :- p(X).
	p(X, b) :- p(X).

	p(_).
	p(f(_)).
	p(f(_,_)).
	p(f(X,X)).

	:- setof(Y, p(X,Y), L).

which yields

	X = _356, L = [a,b] ;
	X = f(_608), L = [b] ;
	X = f(_712), L = [a] ;
	X = f(_554,_554), L = [b] ;
	X = f(_581,_582), L = [b] ;
	X = f(_658,_658), L = [a] ;
	X = f(_685,_686), L = [a] ;

Thus the values of X were considered equivalent for both values of Y
only when X was not bound at all.

I would like to argue for another criterion for treating two instances
of the global variables as equivalent: treat them as equivalent if
they are identical, _modulo renaming of variables_.  This relation
between terms is usually called "variant".  Thus 'p(_10,_11)' is a
variant of 'p(_12,_13)' but not of 'p(_14,_14)'.

Using this criterion, the solutions to the above query becomes

	X = _49, L = [a,b] ;
	X = f(_141), L = [a,b] ;
	X = f(_141,_141), L = [a,b] ;
	X = f(_165,_141), L = [a,b] ;

an answer which I find rather less surprising.

It takes a change in step 3 of the above algorithm to enforce this
behaviour.  Admittedly, it makes the algorithm more expensive.  I have
an implementation which is O(m*n**2) in time and O(n) in space, where
m is the number of proofs of Goal and n is the average number of
variable occurrences in an instance of the "global variables tuple".
I'm sure there are more clever algorithms.

Are there any other snags?

Needless to say, this discussion is only relevant for unsound uses of
bagof/3 and setof/3.