[net.lang.prolog] PROLOG Digest V4 #38

PROLOG-REQUEST@SU-SCORE.ARPA (Chuck Restivo, The Moderator) (08/15/86)

PROLOG Digest            Friday, 15 Aug 1986       Volume 4 : Issue 38

Today's Topics:
                     Puzzle - Lemmas & Searches,
                    Implementation - Parsing `->'
----------------------------------------------------------------------

Date: 13 Aug 86 12:45:36 GMT
From: David Sherman <mnetor!lsuc!dave@seismo.css.gov>
Subject: Lemmas

Mario O. Bourgoin writes:

>If Prolog refused to solve for goals already encountered,
>this problem would not exist. A hash table of  calls  would
>provide  an efficient  solution  to  the  problem  of  whether
>a state has been met already given a consistent  encoding  of
>variable  names.  Can  someone provide me with a reason other
>than efficiency for why this is not done?

I'd be interested in hearing the answer to this too. I believe
this is what Kowalski describes as "lemmas" in Logic for Problem
Solving - that is, whenever a goal is resolved (to either yes or
no), that fact can be recorded.

I suppose it might be possible to program this into Prolog
without modifying Prolog itself - i.e., with predicates. Has
anyone  done this?  (It's of interest to me for the income tax
analysis system I'm working on.)

One obvious problem that I can see is what to do with assert and
retract. For the lemma-system to be correct, it would have to be
able to figure out the implications of calls to assert and
retract, *including* how these affect predicates several calls
away. For example:

        tired(today) :- toomuch(netnews, yesterday).
        toomuch(netnews, Day) :-
                read(netnews, Minutes, Day),
                Minutes > 120.
        read(netnews, 180, yesterday).

        ?- tired(today).
        yes.

        ?- retract(read(netnews, 180, yesterday)).


        ?- tired(today).

If the implementation of retract has to search the entire
database and start making changes to what's been put into the
lemma hash table, the whole point of the system (improving
efficiency) could be defeated.
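
To make the worry concrete, here is the tired/toomuch example
transcribed into Python (an illustration of the hazard, not Prolog;
the dictionary-based lemma table is my own hypothetical device): with
a naive table, the retract leaves a stale "yes" behind, and the only
cheap fix is to throw the whole table away.

```python
# The tired/toomuch example with a naive lemma table (Python used for
# illustration). After "retracting" the read/3 fact, the cached answer
# for tired(today) is stale; the coarse fix is to flush the whole table,
# which forfeits every lemma, related to the retracted fact or not.

facts = {("read", "netnews", 180, "yesterday")}
lemmas = {}

def toomuch(what, day):
    return any(f[0] == "read" and f[1] == what and f[3] == day and f[2] > 120
               for f in facts)

def tired(day):
    goal = ("tired", day)
    if goal not in lemmas:
        lemmas[goal] = toomuch("netnews", "yesterday") if day == "today" else False
    return lemmas[goal]

print(tired("today"))      # True: the read/3 fact supports the goal

facts.discard(("read", "netnews", 180, "yesterday"))   # the retract
print(tired("today"))      # still True: a stale lemma two calls away

lemmas.clear()             # coarse invalidation: discard ALL lemmas
print(tired("today"))      # now correctly False
```

Anything finer than `lemmas.clear()` would need dependency tracking
from retracted facts to every lemma they support, which is exactly the
database search whose cost could defeat the system.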

Comments?

-- David Sherman

------------------------------

Date: Thu, 14 Aug 86 15:15:38 pdt
From: Allen VanGelder <avg@diablo>
Subject: Exponential searches in the Farmer problem

Mario Bourgoin made a very interesting observation in his summary of
the Farmer problem.

> The number of repetitions has an upper bound of 2^58 as near
> as I can calculate. The 58 was obtained by counting the
> number of legal second states produced by Prolog using
> linked/2 and subtracting the total number of legal
> states from the result. Can someone substantiate this answer?

This is not correct.  2^58 is roughly the number of nodes in a balanced
BINARY tree of DEPTH 58.  The search tree for "Farmer" is neither balanced
nor binary, nor is its depth 58.  However, the size of the search tree
IS exponential in the number of legal states.  The analysis is a bit
involved.  Many readers may wish to skip to "-------" at this point.

The general rule is that if the search has b branches at each node and
(always) goes to depth d, then there are about b^d nodes.  In these
path/state problems, the branching factor b corresponds to the number
of states adjacent to a given state, and the depth d corresponds to
the path length, which is bounded by the number of legal states if the
search is restricted to simple paths.  Mario appears to be estimating
b = 58, so it should be in the base, not the exponent.

However, not all searches go to the same depth, so it is not clear
what the correct estimate of d is. Taking the average is WRONG:
consider a search tree that looks like a long list of length d.  Then
b=2 and the average depth is d/2, but the number of nodes is 2d - 1,
which is linear, and NOT the exponential 2^(d/2).
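
The count for that degenerate tree can be checked directly (a sketch;
I am assuming the "long list" shape means a spine of d nodes in which
each of the first d-1 nodes also has one immediately-failing leaf
child, which is what makes b = 2):

```python
# Node count for a search tree that is a "long list" of length d:
# each of the first d-1 spine nodes has one immediately-failing leaf
# child, so the branching factor is 2 but the tree is linear in size.

def spine_tree_nodes(d):
    if d == 1:
        return 1                            # the final spine node
    return 2 + spine_tree_nodes(d - 1)      # spine node + failing leaf + rest

d = 20
print(spine_tree_nodes(d))   # 2d - 1 = 39: linear in d
print(2 ** (d // 2))         # the misleading 2^(d/2) estimate: 1024
```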

In any event, assume there are 10 DISTINCT legal states, but 58 
apparent states due to redundancy in the way they are specified.  If
the program allows only simple paths, no path is longer than 9, and we
get 58^9 as a crude upper bound.  That can be improved to 43 * 18^9 by
assuming the redundancy factor is always 6.  (See below.)  Well, that
number is still quite unacceptable, and shows the high price of
redundancy.

Without any redundancy in the statement of the 10 legal states, 8 *
3^9 looks like an upper bound, assuming the (naive) rules

        canGo(Goal, Goal, _).
        canGo(S1, Goal, Visited) :-
                legal(S2), linked(S1, S2), not member(S2, Visited),
                canGo(S2, Goal, [S2 | Visited]).

(These rules assume "legal" is a "generator" and "linked" is a 
"tester".  If "linked" can be formulated as a "generator", we may be
able to do better.  See below.)

I got the 8 * 3^9 estimate as follows:  from each internal node in the
search, there are 10 branches to try -- 1 per legal state.  In this
problem, exactly 4 are linked to S1, the current state.  Thus 7
branches fail immediately (i.e. within the rule):  6 because they are
not linked to the current state, and (at least) 1 because it goes back
to the state visited just before S1.  (Notice how all the "domain
knowledge" enters the analysis; we are using the knowledge that the
links are symmetric.)  So charge the node 8, 1 for itself and 1 for
each of its predictable failures.  The "true" branching factor is at
most 3.

If legal states are specified redundantly by a factor of 6, then we
multiply both the 7 and the 3 by 6, and get 43 * 18^9, which is about
6^10 times the non-redundant figure.

Actually, Mario's final program is even better than (1+7) * 3^9 
because he is able to make "linked" a "generator" and eliminate the
"legal" subgoal.  Thus, the estimate is (1+1) * 3^9 because he
generates 4 LINKED states and fails on (at least) 1.  Thus careful
programming saves a factor of 4 right there.

While the search tree is actually much smaller because of the regular
structure of "linked", the bound of 2 * 3^9 is about 40,000.  This
illustrates the danger that search algorithms will get out of hand.
Recall that if there are d states and the branching factor is b, we
get an estimate of b^d nodes.  This is huge for moderate sized d, even
if b is only 2 or 3.
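
The successive bounds quoted above can be tallied directly (simple
arithmetic, nothing more):

```python
# The successive upper bounds from the analysis above, tallied directly.

redundant    = 58 ** 9         # crude bound with 58 apparent states
redundant_6x = 43 * 18 ** 9    # redundancy factor 6: (1 + 7*6) * (3*6)^9
naive        = 8 * 3 ** 9      # (1 + 7) * 3^9: legal as generator, linked as tester
generator    = 2 * 3 ** 9      # (1 + 1) * 3^9: linked as generator

print(f"{redundant:.2e}")      # about 7e15
print(f"{redundant_6x:.2e}")   # about 8.5e12
print(naive)                   # 157464
print(generator)               # 39366 -- "about 40,000"
# The redundant figure exceeds the non-redundant one by roughly 6^10:
print(redundant_6x // naive)   # about 5.4e7, versus 6^10 = 6.0e7
```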

To achieve a quantum jump in (guaranteed) efficiency, we need to 
switch to a depth first search (or breadth first search) that 
remembers ALL the visited nodes, not just the ones on the currently
successful path.  While this can easily be done (for bounded degree
graphs) in linear time in a language with arrays and assignments, in
Prolog it is not so easy, and may take quadratic time.  If you use
"assert" to remember, it depends on how fast the interpreter can
retrieve an asserted fact, something over which the user has no
control.  Even so, d^3 is less than 2^d for problems beyond "toy"
size.
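
As a sketch of the visited-set search (written in Python rather than
Prolog, with the standard farmer/wolf/goat/cabbage encoding assumed;
the state representation and all names here are my own), breadth-first
search that remembers ALL visited states touches each of the legal
states at most once:

```python
from collections import deque

# Breadth-first search for the farmer/wolf/goat/cabbage puzzle,
# remembering ALL visited states. A state is (items on the left bank,
# farmer on left?); the farmer ferries at most one passenger per trip.

ITEMS = {"wolf", "goat", "cabbage"}

def unsafe(bank, farmer_here):
    # A bank without the farmer must not leave the goat with the wolf
    # or with the cabbage.
    return not farmer_here and ("goat" in bank) and (bank & {"wolf", "cabbage"})

def moves(state):
    left, farmer_left = state
    here = left if farmer_left else ITEMS - left
    for passenger in [None] + sorted(here):
        new_left = set(left)
        if passenger:
            (new_left.remove if farmer_left else new_left.add)(passenger)
        new_state = (frozenset(new_left), not farmer_left)
        if not unsafe(new_state[0], new_state[1]) and \
           not unsafe(ITEMS - new_state[0], not new_state[1]):
            yield new_state

def solve():
    start = (frozenset(ITEMS), True)      # everyone on the left bank
    goal = (frozenset(), False)           # everyone on the right bank
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        state, crossings = queue.popleft()
        if state == goal:
            return crossings
        for nxt in moves(state):
            if nxt not in visited:        # the crucial visited-set check
                visited.add(nxt)
                queue.append((nxt, crossings + 1))

print(solve())    # shortest solution: 7 crossings
```

The visited set bounds the work by the number of legal states times
the branching factor, instead of by b^d.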

------------------------------

Date: 13 Aug 1986 07:20-EST
From: Saumya Debray <debray%suny-sb@csnet-relay>
Subject: Parsing Prolog's ->

I got the following message from Fernando Pereira, which I
think is a reasonable defense for the way Prolog's -> is
currently parsed:

From: Fernando Pereira <pereira%sri-stinson.arpa>
Subject: Manipulating Programs with `->'

The explanation in terms of cut was just a "crutch".
Maybe a better explanation is in terms of a guarded
command. The goal

        ( C1 -> G1; ... ; Cn -> Gn )

is analogous to the Dijkstra guarded command

        [ C1 -> G1 [] ... [] Cn -> Gn ]

and also to the Lisp COND

        (COND ((C1) G1) ... ((Cn) Gn))

The alternative syntax makes the goal analogous to

[ C1 -> G1 [] true -> [ C2 -> G2 [] true -> ...
                        [ Cn -> Gn ] ... ] ]

which is a different beast altogether. The latter
reading is fatally sequential, whereas the former
has at least a potential for parallel evaluation
of the guards.

-- Fernando

------------------------------

End of PROLOG Digest
********************