[comp.lang.prolog] Need style advice

stuart (07/15/88)

From: stuart

I'd like some advice about how to clean up a relation.

foo_value/2 is intended to be a proper, single-valued, function.
Several clauses can be satisfied at the same time, so I'm depending on
the lexical ordering of clauses and the use of cut to give the proper
precedence among multiple satisfiable clauses and enforce a single
value.

  foo_value(X, value_1) :- forced_to_be_1(X), !.

  foo_value(X, value_2) :- relation_one(X, Y), not(forced_to_be_1(Y)), !.
  foo_value(X, value_2) :- relation_two(X, Y), foo_value(Y, value_2), !.

  foo_value(X, value_1) :- relation_one(X, Y), forced_to_be_1(Y), !.
  foo_value(X, value_1) :- relation_two(X, Y), foo_value(Y, value_1), !.

  foo_value(X, value_3).

Is there a better way to get the same effect?  "Better" can mean
more efficient, more aesthetic, closer to purely logical, fewer uses
of cut and not, or whatever metric experienced Prolog users use.  :-)

Some other notes:
  relation_two/2 contains no cycles.  (It is neither reflexive
  nor symmetric, nor are there any longer sequences XrY YrZ ... WrX.)

  forced_to_be_1/1 is a collection of ground literals.

  relation_one/2 and relation_two/2 can satisfy multiple second
  arguments for a fully instantiated first argument.

  It would be nice, but not essential, if both arguments could be
  given as unbound variables.  Normally, I try to satisfy
  foo_value(literal, V).

Stu Friedberg  stuart@cs.rochester.edu  {ames,cmcl2,rutgers}!rochester!stuart

ok@quintus.uucp (Richard A. O'Keefe) (07/16/88)

In article <1988Jul14.231619.22145@cs.rochester.edu> stuart writes:
>  foo_value(X, value_1) :- forced_to_be_1(X), !.
>
>  foo_value(X, value_2) :- relation_one(X, Y), not(forced_to_be_1(Y)), !.
>  foo_value(X, value_2) :- relation_two(X, Y), foo_value(Y, value_2), !.
>
>  foo_value(X, value_1) :- relation_one(X, Y), forced_to_be_1(Y), !.
>  foo_value(X, value_1) :- relation_two(X, Y), foo_value(Y, value_1), !.
>
>  foo_value(X, value_3).

>  forced_to_be_1/1 is a collection of ground literals.
>  relation_two/2 contains no cycles.
>  relation_one/2 and relation_two/2 can satisfy multiple second
>  arguments for a fully instantiated first argument.

Note that the code as written is not steadfast:
	?- foo_value(X, value_3).
will succeed for ANY value of X, even when forced_to_be_1(X).
Remember: a cut at the end of a clause is almost always in the wrong place.

Would it be possible for you to tell us what the relation is supposed to
_mean_, rather than showing us the current version of the code?  For
example, the following values for forced_to_be_1/1 and relation_one/2
	forced_to_be_1(jim).		% "male"
	forced_to_be_1(tom).
	relation_one(tom, jim).		% "sibling"
	relation_one(jim, tom).
	relation_one(tom, mary).
	...
	relation_two(_, _) :- fail.	% "parent"
appears to satisfy the description given, yet both
	relation_one(tom, mary), \+forced_to_be_1(mary)
and	relation_one(tom, jim), forced_to_be_1(jim)
are true, so should foo_value(tom, value_2) or foo_value(tom, value_1)?

Presumably there is some finite number of Xs which you intend to assign
foo_values to.  If that number is not too big, why not just write a
program to generate the table and write the results to a file, then
compile that file?

stuart (07/17/88)

From: stuart

Summary: More information about intended behavior
References: <1988Jul14.231619.22145@cs.rochester.edu>, <170@quintus.UUCP>

> Richard O'Keefe responds to my query with:
> Note that the code as written is not steadfast:
> 	?- foo_value(X, value_3).
> will succeed for ANY value of X, even when forced_to_be_1(X).
> Remember: a cut at the end of a clause is almost always in the wrong place.

Urk.  Yes.  Ok, that's the first problem.  My intent is to use the
value in the head of the first satisfied clause, using value_3 only
when no previous clause can be satisfied.

Your reminder is appreciated, but it would be more useful if I were
also made acquainted with how things *should* be done.

> Would it be possible for you to tell us what the relation is supposed to
> _mean_, rather than showing us the current version of the code?

Yes, my apologies if this description includes irrelevant detail.  I
have a graph with two colors of arcs, and two kinds of colors of
nodes.  foo_value/2 is to give values to nodes based on graph
connectivity and coloring.

The first kind node color distinguishes between "forced_to_be_1" and
otherwise.  If a node is forced_to_be_1, it receives value_1, otherwise
we consider its neighbors.  We look, first, for neighbors from whom X
can inherit value_2.  If no such neighbors can be found, we look for
neighbors from whom to inherit value_1.  If we fond no neighbors from
whom value_2 or value_1 can be inherited, we give X the value_3.

This is what I intended to express through the ordering of clauses.
(1) Is the decision forced, (2) Can we find a value_2, (3) Can
we find a value_1, (4) In all other cases, value_3.

There are two ways for X to inherit a value from a neighbor, and that
is what the pairs of clauses (2&3 and 4&5) were intended to express.

relation_one(X, Y) is satisfied when node X is connected to node Y by
an arc of the first color, and node Y has a particular value of the
second kind of node color.  X receives a value based directly on
the first kind of node coloring of Y.

relation_two(X, Y) is satisfied when node X is connected to a node _
by an arc of the first color, and node _ is connected to node Y by
an arc of the second color.  (The second kind of node coloring is
irrelevant in this case.)  X receives the same value that Y receives.

> For example, the following values for forced_to_be_1/1 and relation_one/2
> [ some clauses deleted ]
> appears to satisfy the description given, yet both
> 	relation_one(tom, mary), \+forced_to_be_1(mary)
> and	relation_one(tom, jim), forced_to_be_1(jim)
> are true, so should foo_value(tom, value_2) or foo_value(tom, value_1)?

foo_value(tom, value_2).  My intent is to try the four steps I gave
above, stopping the first time a step succeeds.  I suppose I should stress
that I want foo_value to succeed exactly *once* in all cases.

> Presumably there is some finite number of Xs which you intend to assign
> foo_values to.  If that number is not too big, why not just write a
> program to generate the table and write the results to a file, then
> compile that file?

Well, if there were only one graph of interest, I'd do that, but this
*is* a (evidently incorrect) program to generate the table!  (Actually,
it's my attempt to express more-or-less declaratively what a C program
is busy doing on a dynamically changing graph structure, but you get
the idea.)

****************

Clearly, I have not gone about things the right way.  My suspicions
that things could be improved motivated my query in the first place.
I frankly hadn't noticed the "steadfast"ness bug.  So, what is the
recommended form for simple decision trees in Prolog?  How do I
obtain the intended behavior?

Unless I receive strong urging to the contrary, I would like to *avoid*
rewriting the clause bodies to something like the following:
  foo_value(X, a) :-   pred_a(X).
  foo_value(X, b) :- \+pred_a(X),   pred_b(X).
  foo_value(X, c) :- \+pred_a(X), \+pred_b(X),   pred_c(X).
  foo_value(X, d) :- \+pred_a(X), \+pred_b(X), \+pred_c(X).
This is "pure", but I find it harder to read and understand.  I suspect
that it's also considerably less efficient.  If not so, please urge otherwise.

Stu Friedberg  {ames,cmcl2,rutgers}!rochester!stuart  stuart@cs.rochester.edu

ok@quintus.uucp (Richard A. O'Keefe) (07/19/88)

In article <1988Jul17.003242.1874@cs.rochester.edu> stuart writes:
>> Richard O'Keefe responds to my query with:
...
>> Remember: a cut at the end of a clause is almost always in the wrong place.
>Your reminder is appreciated, but it would be more useful if I were
>also made acquainted with how things *should* be done.

You put a cut at _precisely_ the point at which it becomes known that
this is the right clause to use:  no sooner, and no later.  You take
care to ensure that _only_ the conditions which are relevant to
determining that this is the right clause are tested before the cut.
As a special case of this, if you have a predicate p/2 where the first
argument is relevant to determining the clause but the second argument
is not, you should inspect/bind the second argument _after_ the cut.

>> Would it be possible for you to tell us what the relation is supposed to
>> _mean_...
>Yes,...  I have a graph...

In that case, foo_value(X, Y) should only succeed when X is (the name of)
a node in the graph.

>I have a graph with two colors of arcs, and two kinds of colors of
>nodes.  foo_value/2 is to give values to nodes based on graph
>connectivity and coloring.

In that case, instead of forced_to_be_1/1, let's start with relations
which directly express what you want to say:

	node_colours(Node, FirstColour, SecondColour)
is to be true when Node is the name of a node in the graph,
FirstColour is the first colour (black or white) which is assigned
to that node, and SecondColour is the second colour (red or green)
which is assigned to that node.

	arc_colour(?From, ?To, ?Colour)
is to be true when From and To are the names of nodes in the graph,
there is an arc From->To in the graph, and Colour is the colour which
is assigned to that arc.  For argument's sake, the arc colours are
yellow "first type" and orange "second type".

If I've understood the specification correctly:

	relation_one(From, To) :-
		arc_colour(From, To, yellow),
		node_colours(To, _, green).

	relation_two(From, To) :-
		arc_colour(From, Link, yellow),
		/* Link may be any colours */
		arc_colour(Link, To, orange).


	/*  1: value_1 is assigned to black nodes  */

	node_value(Node, value_1) :-
		node_colours(Node, black, _).

	/*  2: value_2 is assigned to white nodes
	       which are linked to white nodes by relation_one */

	node_value(Node, value_2) :-
		node_colours(Node, white, _),
		relation_one(Node, Next),
		node_colours(Next, white, _).

	/*  3: value_2 is assigned to white nodes
	       which are linked by relation_two to other value_2 nodes */

	node_value(Node, value_2) :-
		node_colours(Node, white, _),
		relation_two(Node, Next),
		node_value(Next, value_2).

	/*  4: value_1 is assigned to white nodes
	       which are linked by relation_one to black nodes */

	node_value(Node, value_1) :-
		node_colours(Node, white, _),
		relation_one(Node, Next),
		node_colours(Next, black, _).

	/*  5: value_1 is assigned to white nodes
	       which are linked by relation_two to other value_1 nodes */

	node_value(Node, value_1) :-
		node_colours(Node, white, _),
		relation_two(Node, Next),
		node_value(Next, value_1).

	/*  6: value_3 is assigned otherwise  */
		/* deferred */

Let's split this up and unfold relation_one and relation_two.

	node_value(Node, Value) :-
		node_colours(Node, Colour, _),
		node_value_case(Colour, Node, Value).

	node_value_case(black, _, value_1).		/* 1 */
	node_value_case(white, Node, OneOrTwo) :-	/* 2-5 */
		white_node_value(Node, Value),
		!,
		OneOrTwo = Value.
	node_value_case(white, _, value_3).		/* 6 */

	white_node_value(Node, value_2) :-		/* 2 */
		arc_colour(Node, Next, yellow),
		node_colours(Next, white, green).
	white_node_value(Node, value_2) :-		/* 3 */
		arc_colour(Node, Link, yellow),
		arc_colour(Link, Next, orange),
		node_value(Next, value_2).
	white_node_value(Node, value_1) :-		/* 4 */
		arc_colour(Node, Next, yellow),
		node_colours(Next, black, green).
	white_node_value(Node, value_1) :-		/* 5 */
		arc_colour(Node, Link, yellow),
		arc_colour(Link, Next, orange),
		node_value(Next, value_1).

I'm still rather unhappy about this.  Why do you look for a value_2,
chasing all through those orange links, and only if that fails look
for a value_1?  Why not just explore out from the node and pick whichever
you meet first?  (For example, if a white node is immediately adjacent to
a green/black node, and 500 steps away from a green/white node, why is
the green/white one preferable?)  It seems rather odd that 'value_2'
can be inherited _through_ black nodes.  E.g.
	a-(yellow)->b-(orange)->c-(yellow)->d
	a: white, red		[value 2, by /* 3 */]
	b: black, green		[value 1, by /* 1 */]
	c: white, red		[value 2, by /* 2 */]
	d: white, green		[value 3, by /* 6 */]

This is single-valued, and can solve for the Node as well as the Value.
I want to stress that I am not happy with it:  the idea of a "default"
value value_3 bothers me (this can only be eliminated by rewriting the
caller).  What most bothers me is that value_2 is a simple transitive
closure, but value_1 has no simple independent characterisation.

>So, what is the
>recommended form for simple decision trees in Prolog?

This is _not_ a simple decision tree, so the answer I am about to
give isn't really relevant, but here it is, for what it's worth.

	dt(Object, Value) :-			% ROOT
		test_1(Object, Tag),
		dt_1(Tag, Object, Value).

	dt_1(val_1_1, Object, Value) :-		% INTERNAL NODE
		test_1_1(Object, Tag),
		dt_1_1(Tag, Object, Value).
	...
	dt_1(val_1_n, Object, Value) :-
		test_1_n(Object, Tag),
		dt_1_n(Tag, Object, Value).

	dt_1_2_1_2(val_1_2_1_2_1, Object, Value) :-	% LEAF
		value_1_2_1_2_1(Object, Value).
	...
	dt_1_2_1_2(val_1_2_1_2_m, Object, Value) :-
		value_1_2_1_2_m(Object, Value).

The root looks like dt/2.  The internal nodes look like dt_1/3.  Each
test_*/2 is a function which classifies the Object and returns a Tag
saying which case it found.  The leaf nodes look lik d1_1_2_1_2/3.
Each value_*/2 is a function which computes the appropriate value for
that leaf.  Typically, no cuts at all are needed.