[comp.lang.prolog] Compiling away bagof

narain@randvax.UUCP (Sanjai Narain) (06/01/90)

This is in response to Richard O'Keefe's request to outline my approach
for compiling away bagof.  It has been developed in the context of DMOD
(Declarative MODeling) a technique for modeling dynamic systems.  The
optimization essentially consists of compiling away OR-parallelism when
the rules are deterministic.  A DMOD model of a dynamic system consists of
rules, roughly, of the form:

	causes(E,F) if Body.

where E and F are event templates, each of the form f(A1,..,Am,T), where T
denotes a real-valued, non-negative time instant.  Informally, the rule
means that event E causes event F provided Body. An example is:

	causes(falls(Domino,T),falls(NextDomino,T+delay)) if
		successor(Domino,NextDomino).

Event occurrences are (roughly) computed by repeatedly using the rule:

	occurs(F) if initial(F) or causes(E,F) and occurs(E).

Given, that e has occurred, an important problem is to compute the bag of
all events g such that causes(e,g). Suppose we have the following rules:

	causes(f(A1,..,Am,T),F1) if B1
	..
	..
	causes(f(A1,..,Am,T),Fk) if Bk

***where each Ai,T is a variable***.  Suppose f(a1,..,am,t) has occurred
where each ai, and t is ground.  Then, one can compute all the effects of
it using:

	bagof(F,causes(f(a1,..,am,t),F),Bag).

Given the overhead of calling bagof, and that this call must be made after
each event occurrence, and that the number of event occurrences can be
tens of thousands, this approach would be extremely slow.  We can do quite
a lot better.

Provided each of these causality rules is deterministic, i.e. produces at
most one effect for a given cause, we can collapse all of these rules into
a single one:

	brings_about(f(A1,..,Am,T),List) if
		cond(B1,F1,L1),
		..
		..
		cond(Bk,Fk,Lk),
		append_all([L1,..,Lk],List).

Here each Li is a variable.  Now brings_about returns a list of caused
events.  The ith call to cond calls Bi and if successful, binds Li to
[Fi].  Otherwise it binds Li to []. append_all appends all of these lists.
The definition of cond is:

	cond(Body,Event,List):-Body,!,List=[Event].
	cond(Body,Event,[]).

Now, to compute the list of all events caused by f(a1,..,am,t) one can
type:

	brings_about(f(a1,..,am,t),List).

No bagof is needed. For our current model, this is about 20 times faster.

LIMITATION: It is entirely possible that a causes rule be non-deterministic,
e.g. in:

	causes(falls(Domino,T),falls(NextDomino,T+delay)) if
		successor(Domino,NextDomino).

successor(A,B) could succeed more than once for a given value of A.  Then
we have to apply such a rule in all possible ways for a given causing
event.  This could be done by redefining the first rule for cond to:

	cond_1(Body,Event,List) :- bagof(Event,Body,List).

This reintroduces bagof. I am not sure how to eliminate this without an
intricate analysis of the program. Any suggestions would be greatly
appreciated.

My current solution is to have the user declare the event templates for
which all causality rules are deterministic, and then use cond for these,
and cond_1 for the rest.

Regards.

Sanjai Narain

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/02/90)

In article <2567@randvax.UUCP>, narain@randvax.UUCP (Sanjai Narain) writes:
> This is in response to Richard O'Keefe's request to outline my approach
> for compiling away bagof.

Whew!  Had me worried for a minute there, but it _is_ in the book!

> Given, that E has occurred, an important problem is to compute the bag of
> all events G such that causes(E, G).

It isn't clear to me why bagof/3 is being used here.  What is the
significance of the rule ordering?  If we have
	causes(ying, tong) :- true.
	...
	causes(ying, iddle) :- true.
why does it _matter_ that the bag look like [...,tong,...,iddle,...]
rather than [...,iddle,...,tong,...]?  Again, suppose we have
	causes(kill(X,Y), dies(Y)) :- true.
	causes(kill(X,Y), dies(X)) :- good_police, death_penalty.
and the event kill(john,john) occurs.  (We are to suppose that john
succeeds in killing himself, and that he is caught, convicted, and
sentenced to death for murder.)  Why is it important to have dies(john)
appear twice in the bag rather than just once?

Absent any reason for recording multiple references to the same event,
it would appear to be appropriate to use setof/3 rather than bagof/3.
Indeed, it is almost _always_ appropriate to use setof/3 rather than
bagof/3, both for theoretical reasons and for practical reasons.
Theoretical reasons:  see Chapter 4 "Why Duplicate Rows are Prohibited"
of C.J.Date's "Relational Database writings 1985--1989".
Practical reasons:  if you want to combine the results of an all-
solutions predicate with other such collections, it is much more
efficient to use set operations based on merging, which need the result
to be sorted anyway, so you might as well let setof/3 do it.

> Suppose we have the following rules:
> 	causes(f(A1,..,Am,T), F1) :- B1.
> 	..
> 	..
> 	causes(f(A1,..,Am,T), Fk) :- Bk.
> ***where each Ai,T is a variable***.  Suppose f(a1,..,am,t) has occurred
> where each ai, and t is ground.  Then, one can compute all the effects of
> it using:
> 	bagof(F, causes(f(a1,..,am,t),F), Bag).

> Provided each of these causality rules is deterministic, i.e. produces at
> most one effect for a given cause, we can collapse all of these rules into
> a single one:

> 	brings_about(f(A1,..,Am,T), List) :-
> 		cond(B1, F1, L1),
> 		..
> 		..
> 		cond(Bk, Fk, Lk),
> 		append_all([L1,..,Lk], List).

(1) We don't need the restriction to A1,...,Am,T being variables in
    the original rules.  We _do_ need the restriction to the call
    having ground arguments a1,...,am,t.
(2) We don't need append_all/2 (called append/2 in library(lists)).

	brings_about(f(A1_0,...,Am_0,T_0), L_1) :-
		(   A1_1 = A1_0, ..., Am_1 = Am_0, T_1 = T_0, B1 ->
		    L_1 = [F_1|L_2]
		;   L_1 = L_2
		),
		...
		(   A1_k = A1_0, ..., Am_k = Am_0, T_k = T_k, Bk ->
		    L_k = [F_k]
		;   L_k = []
		).

    Each L_{i}\L_{i+1} is a list difference pair.  Assuming a Prolog
    compiler that open-codes if->then;else (as they all should...) this
    also avoids the overhead of building a copy of each B_i and passing
    through call/1.

> LIMITATION: It is entirely possible that a causes rule be non-deterministic,
> e.g. in:
> 
> 	causes(falls(Domino,T), falls(NextDomino,T+delay)) :-
> 		successor(Domino, NextDomino).
> 
> successor(A,B) could succeed more than once for a given value of A.  Then
> we have to apply such a rule in all possible ways for a given causing
> event.  This could be done by redefining the first rule for cond to:

> 	cond_1(Body,Event,List) :- bagof(Event,Body,List).	/* MISTAKE */

Er, if you have a DEC-10-compatible implementation of bagof/3, this isn't
going to do what you think it's going to do.  If the Body has any variables
which are strictly local to the Body (i.e. in the original
	causes(Cause, Effect) :- Body
if there are any variables in the Body which appear neither in Cause nor
Effect) then bagof/3 is going to say "ah hah, these are free variables,
the caller wants to know their bindings, I shall enumerate bindings for
these free variables and for each binding list only the Events that
corresond to that binding."  In fact, what you want here, if indeed you
want bagof-like behaviour at all, is findall/3.  There's another obvious
problem which is avoided by using findall/3.

	cond_1(Body, Template, Events) :-
		findall(Template, Body, Events).	/* CORRECTED */

-- 
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."

narain@randvax.UUCP (Sanjai Narain) (06/05/90)

In article <3129@goanna.cs.rmit.oz.au>, ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) writes:
> 
> Whew!  Had me worried for a minute there, but it _is_ in the book!
> 

Nicht verstanden.

> It isn't clear to me why bagof/3 is being used here.

Efficiency.  I quote from the Quintus Prolog manual (v. 2.5, p. 213) "this
relaxation (that lists are not ordered) saves considerable time and space
in execution".

Merging of sorted lists is not done.  Each item in the list returned by
bagof is merged into a leftist tree.  Thus, the overhead of sorting does
not appear to be necessary.

> (1) We don't need the restriction to A1,...,Am,T being variables in

Perhaps not.  It is there just to keep the compiler simple.

> 
>     Each L_{i}\L_{i+1} is a list difference pair.  Assuming a Prolog
>     compiler that open-codes if->then;else (as they all should...) this
>     also avoids the overhead of building a copy of each B_i and passing
>     through call/1.

This is a good suggestion, and I will try to incorporate it.

> 	cond_1(Body, Template, Events) :-
> 		findall(Template, Body, Events).	/* CORRECTED */

Thanks again.

> "A 7th class of programs, correct in every way, is believed to exist by a
> few computer scientists.  However, no example could be found to include here."

Nicht verstanden.