[comp.lang.prolog] Duplicate Solutions

patch@hpldola.HP.COM (Pat Chkoreff) (03/14/90)

Occasionally I have written a predicate that yields duplicate solutions, and
I've wondered how to make each solution occur uniquely.  For example, I have
a multigraph:

	node(X).             % X is a node.
	edge(E, X, Y).       % E is a directed edge from node X to node Y.

I write a predicate:

	joined(X, Y).        % There exists an edge from node X to node Y.

Initially I write:

	joined(X, Y) :-
		edge(_, X, Y).

But this version has duplicate solutions when there are multiple edges from
X to Y.  The most straightforward way I have been able to avoid this is:

	joined(X, Y) :-
		bagof(E, edge(E,X,Y), _).

The use of bagof/3 seems wasteful.  I could use setof/3 instead and think of
the set of edges as a "hyperedge" from X to Y, but I don't really need to
compute the hyperedge -- just determine the existence of one.

Usually, the existence of duplicate solutions makes me change either my data
model or my algorithms to avoid them.  But in this example, my data model is
straightforward, joined/2 seems like a natural thing to have, and duplicate
solutions bother me (I'm not sure why, though).  What would you do to avoid
duplicate solutions, or would you even bother?


Patrick Chkoreff   719-590-5983
{hpfcla,hplabs}!hpldola!patch

finin@prc.unisys.com (Tim Finin) (03/15/90)

In article <11500023@hpldola.HP.COM>, patch@hpldola (Pat Chkoreff) writes:
>Occasionally I have written a predicate that yields duplicate solutions, and
>I've wondered how to make each solution occur uniquely.  For example, I have
>a multigraph: ...
>
>Usually, the existence of duplicate solutions makes me change either my data
>model or my algorithms to avoid them.  But in this example, my data model is
>straightforward, joined/2 seems like a natural thing to have, and duplicate
>solutions bother me (I'm not sure why, though).  What would you do to avoid
>duplicate solutions, or would you even bother?

You are correct, I think, in looking to change your data model and/or
algorithm to avoid generating duplicate solutions when they are not
appropriate.  Other than that, there is no good way to do this in a
pure logic programming language. In practice, this is usually done
with a cut or some related operator, such as if-then-else.  You might
use one of the following approaches, for example:

	joined(X,Y) :- edge(_,X,Y),!.

	joined(X,Y) :- edge(_,X,Y) -> true ; fail.

	joined(X,Y) :- once(edge(_,X,Y))
	once(P) :- P,!.

All of these have problems, however, if you call joined/2 with
uninstantiated arguments and expect it to generate a stream of
answers.  To support this use, you could use bagof/setof (or the
equivalent) or, as you suggest, revise your representation scheme.
The latter is preferable in my mind.  
-- 
 Tim Finin                                   finin@prc.unisys.com
 Center for Advanced Information Technology  215-648-2840, 215-648-2288 (fax)
 Unisys, PO Box 517, Paoli, PA 19301         215-386-1749 (home)

ken@aiai.ed.ac.uk (Ken Johnson) (03/16/90)

In article <11500023@hpldola.HP.COM> patch@hpldola.HP.COM (Pat Chkoreff) writes:
>Occasionally I have written a predicate that yields duplicate solutions, and
>I've wondered how to make each solution occur uniquely.  For example, I have
>a multigraph:
>
>	node(X).             % X is a node.
>	edge(E, X, Y).       % E is a directed edge from node X to node Y.


Instead, try

	n_edges(X, Y, [E1,E2,E3...]).

E1 is a directed edge that goes from X to Y; so are E2, E3 etc.

	n_edges(X, Y, []).

unnecessary, but would mean that X and Y are not connected. Then

	joined(X,Y) :-
		n_edges(X,Y,[_|_]).

True if X and Y are joined.


-- 
Ken Johnson, AI Applications Institute, 80 South Bridge, Edinburgh EH1 1HN
E-mail ken@aiai.ed.ac.uk, phone 031-225 4464 extension 212
`I have read your article, Mr Johnson, and I am no wiser now than when I
started'.  -- `Possibly not, sir, but far better informed.'

correll@sun.acs.udel.edu (Sharon J Correll) (03/16/90)

In article <11500023@hpldola.HP.COM> patch@hpldola.HP.COM (Pat Chkoreff) writes:
>Occasionally I have written a predicate that yields duplicate solutions, and
>I've wondered how to make each solution occur uniquely.  For example, I have
>a multigraph:
>
>	node(X).             % X is a node.
>	edge(E, X, Y).       % E is a directed edge from node X to node Y.
>
>Initially I write:
>
>	joined(X, Y) :-
>		edge(_, X, Y).


How about:

	joined(X,Y) :-
		node(X),
		node(Y),
		an_edge(X,Y).

	an_edge(X,Y) :- edge(_,X,Y), !.

Calling "node" first will instantiate X and Y and then produce exactly one
edge between them.  Two problems with this are:

	1) Both joined(X,Y) and joined(Y,X) could succeed, which
	   probably isn't what you want if they graph is undirected.

	2) All nodes will be examined whether they have edges or not,
	   which isn't terribly efficient. 
-- 
---\  Sharon Correll                                   \---------------
----\  University of Delaware                           \--------------
-----\  Academic Computing and Instructional Technology  \-------------
------\  correll@sun.acs.udel.edu                         \------------

ok@goanna.oz.au (Richard O'keefe) (03/16/90)

In article <11500023@hpldola.HP.COM>,
patch@hpldola.HP.COM (Pat Chkoreff) writes:
> Occasionally I have written a predicate that yields duplicate solutions, and
> I've wondered how to make each solution occur uniquely.  For example, I have
> a multigraph:
> 	node(X).             % X is a node.
> 	edge(E, X, Y).       % E is a directed edge from node X to node Y.
> 	joined(X, Y).        % There exists an edge from node X to node Y.

The ideal thing, of course, would be to write down directly exactly what
you mean:

	joined(X, Y) :- E^edge(E, X, Y).

and have the system figure it out automatically.  Alas, even in NU Prolog
(where you write "some E edge(E, X, Y)" you still get the duplicates.
setof/3 *is* a good way to do it (bagof/3 won't do, precisely because
it *doesn't* eliminate duplicates):

	joined(X, Y) :-
		setof((X->Y), E^edge(E,X,Y), Arcs),
		member((X->Y), Arcs).

This is a quite general way of computing a projection without
duplicates:
	p(X1, ..., Xn) :-
		setof(*(X1,...,Xn), Y1^...^Ym^q(X1,...,Xn,Y1,...,Ym), L),
		member(*(X1,...,Xn), L).

This does not compute any unwanted answers, and it's even one of the few
cases where a broken setof/3 can get away with yielding L=[].

Don't knock this until you have tried it.  Any method which can handle
queries to joined(_,_) which are not ground is going to have to keep
track of the answers _somehow_.

Another approach is to distinguish between the repeated solutions
somehow, and return a particular representative.   For argument's sake,
let's suppose we add a new predicate
	representative(E)
which is to be true when E is the chosen representative for a group of
edges between the same nodes.  We have the constraint
	edge(E1, X, Y) & representative(E1) &
	edge(E2, X, Y) & representative(E2) -> E1 = E2
Then we can define
	joined(X, Y) :-
		edge(E, X, Y),
		representative(E).

This doesn't look at any solutions of edge/3 which are not relative,
and does not have to store or sort solutions.

Tim Finin wrote
:   You are correct, I think, in looking to change your data model
:   and/or algorithm to avoid generating duplicate solutions when
:   they are not appropriate.
Agreed.

:   joined(X, Y) :- edge(_, X, Y), !.
:   joined(X, Y) :- once(edge(_, X, Y)).
He also points out the problem, which is that these approaches only
work when called with X, Y both ground.

Sharon J Correll wrote
:   How about:
:   joined(X,Y) :- node(X), node(Y), an_edge(X, Y).
:   an_edge(X, Y) :- edge(_, X, Y), !.
which is fine.  This has no unexpected solutions.  The cost issue is
interesting, though.  Assuming that X and Y are both variables (the
worst case), that there are N nodes, E edges, and J solutions of
joined/2, we find

	setof method	cost = O(E + E.lg E + J)
	representative	cost = O(E + J)
	node/node/test	cost = O(N*N + J)

What the coefficients are depends on the Prolog you are using.  (A nice
fast setof wouldn't hurt).

patch@hpldola.HP.COM (Pat Chkoreff) (03/23/90)

Thank you for your responses, both here and by mail.

A number of responses suggested using a cut.  What I failed to specify in my
posting was that cuts, and red ones in particular, are undesirable.  Sharon
Correll's solution, which filters the square of the set of nodes, moves the
cut down into an auxiliary predicate so that it's green from the perspective
of the main predicate.  This is a natural solution, but it still has a cut,
and as Richard O'Keefe pointed out, it has performance problems in certain
call modes.

What's interesting about this problem is that duplicate solutions have no
LOGICAL consequence.  That is why I wondered aloud whether or not I should
care.  Most seem to think that I should, and I agree primarily for aesthetic
reasons.  To me, duplicate solutions are ugly and indicate potential logical
problems.

My solution required the use of bagof (or setof).  Ken Johnson's solution
required pre-grouping the class of edges between any pair of nodes.  Richard
O'Keefe's solution suggested using a distinguished representative of each
class of edges.

All three of these solutions involve the common notion of a partitioned set,
or a set of equivalence classes.  Therefore, the best solution is to
represent the concept of partitioned sets in a purely logical manner.  This
concept is extremely useful -- I'm using it for lots of different things in
my current work.  In a partitioned set, each "element" (e.g., edge) resides
in a single "class" (e.g., hyperedge between nodes), and each class contains
zero or more elements.  It is easy to constrain the classes to be nonempty,
which is appropriate for my directed acyclic graph example.

This scheme allows one to:

    1.  Determine the class to which an element belongs.

    2.  Recurse over the elements of a class.

All of this can be done without using cuts, leaving choice points, or
requiring the use of setof/3 and the like.  And if you don't mind using
assert and retract, it's possible to transfer the elements of a class to
another class in constant time.  To support such updates in pure logic, I
would have to use a structural representation instead of a relational one,
and that's farther than I'm willing to go.  This, by the way, is an
unpleasant dichotomy -- what's going on in higher-order logic programming
these days?

When you get right down to it, I'm using doubly-linked "lists" which are
represented using relations.  The use of dual links is NOT just an
implementation trick, however -- it is essential for expressing the
constraint that every element is in exactly one class.  Without dual links,
I would have to resort to setof/3 to express this constraint, and it would
be slow.  Once again, efficiency and pure logic demonstrate their affinity
for one another, as O'Keefe is so fond of noting (:-).

Enough talk.  When I get an example ready, I'll post it.


Patrick Chkoreff   719-590-5983
{hpfcla,hplabs}!hpldola!patch