laurent@ecrcvax.UUCP (Laurent Vieille) (06/12/90)
As a follow-up to several recent mails discussing query evaluation in deductive (recursive) databases, this note discusses the two major approaches to this issue and briefly presents the EKS-V1 deductive database system, which has recently become operational.

I) ISSUES AND SOLUTIONS IN DEDUCTIVE QUERY ANSWERING:
=====================================================

The problem of answering deductive (recursive) queries in database systems has several facets, which we illustrate with the classical academic example:

    anc(X,Y) <- father(X,Y).
    anc(X,Y) <- father(X,Z), anc(Z,Y).

A) COMPLETENESS and TERMINATION of query evaluation - even in the case of cyclic data! This is always achievable in the Datalog case.

B) LOGICAL EFFICIENCY. By this, I mean that the evaluation has to focus on the data relevant to the query. In the case of "anc(X,b)", the evaluation should essentially consider only the descendants of "b". This corresponds to the "push selections first" heuristic in (relational) database systems.

C) PHYSICAL EFFICIENCY. The size of a database kept in secondary storage, and the peculiarities of I/O with secondary storage, call for specialized techniques to perform, for instance, joins. Further, type-checking and consultation of the database schema can be done at compile time in a database system, rather than dynamically at query evaluation time. My understanding is that this is why database implementors 1) prefer to think in a set-oriented manner, which permits the implementation of, e.g., sort-merge joins; and 2) are critical of using, as the low-level engine for a database system, a simple connection of a file system to the WAM - at least in its current form.

FORWARD-CHAINING vs BACKWARD-CHAINING APPROACHES:
=================================================

Over the last couple of years, two different approaches to these issues have been pursued: 1) improving forward-chaining methods, and 2) improving backward-chaining methods.
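To make the discussion concrete, here is a minimal Python sketch (not EKS-V1, LDL or NAIL! code; the relation contents and function names are invented for illustration) of semi-naive bottom-up evaluation of the anc rules, reading father(X,Y) as "the father of X is Y":

```python
# Semi-naive bottom-up (forward-chaining) evaluation of
#   anc(X,Y) <- father(X,Y).
#   anc(X,Y) <- father(X,Z), anc(Z,Y).
# Each round joins father only against the tuples derived in the
# previous round (the "delta"), so no derivation is repeated; the
# fixpoint is reached even on cyclic data, which gives point (A).

def semi_naive_anc(father):
    anc = set(father)       # base rule: every father tuple is an anc tuple
    delta = set(father)     # tuples new in the current round
    while delta:
        # recursive rule restricted to the new tuples:
        # father(X,Z), delta_anc(Z,Y)  =>  anc(X,Y)
        new = {(x, y) for (x, z1) in father
                      for (z2, y) in delta if z1 == z2}
        delta = new - anc   # keep only genuinely new tuples
        anc |= delta
    return anc
```

Note that, without a rewriting of the rules for a particular query, this computes the whole anc relation - exactly the weakness on point (B) that rewriting methods such as Magic Sets are designed to repair.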
Curiously, it seems that the first approach is better known in the logic programming community than the second one. Let us comment briefly on both of them.

FORWARD-CHAINING methods a priori fail on point (B): in general, they will compute everything. Hence the general framework advocated by this approach: 1) rely on a bottom-up evaluator (e.g. semi-naive evaluation); 2) rewrite the query and rules into a set of rules which, when evaluated by this bottom-up evaluator, correctly addresses point (B) above. Magic Sets is the best known of these rewriting methods (see in particular the NAIL! project at Stanford, the LDL project at MCC, and the Aditi project in Melbourne). I will not discuss them any further here, as they have recently been discussed on the net several times (see also Ullman's paper quoted below).

The BACKWARD-CHAINING approach has had to address the two other issues (completeness and termination, physical efficiency) along the two following lines:

1. Algorithmic aspects. The issue here is to design formal algorithms based on backward chaining that guarantee termination and completeness of the evaluation - at least for Datalog. This issue has received many similar solutions, all based on a quasi-identical dynamic programming scheme: answer a given subgoal only once and reuse its answers for all occurrences of this subgoal. These solutions include OLDT and SLD-AL resolution, extension tables, QSQ methods, generalizations of parsing methods (Earley), memoization, etc. Other issues, such as coordinating the resolution of answers and subqueries, or various optimizations (e.g. tail-recursion optimization), have been less discussed in papers but are being worked on in the corresponding prototypes (see below). It should be stressed that these algorithms guarantee termination and completeness whatever (local) selection function is applied to choose subgoals from the body of a clause. As an example, consider the query 'anc(X,b)'.
Using these methods, the query evaluator can first choose the literal 'anc(Z,b)' (from the body of the recursive rule) without entering an infinite loop. This property is crucial for solving issues (A) and (B) together. (Indeed, if 'father(X,Z)' is selected first from the recursive rule, as with Prolog's left-to-right order, then the whole database is actually consulted - and the evaluation scheme badly fails on point (B).)

2. Engineering aspects. Implementation techniques for this dynamic programming scheme first include WAM-based work (e.g. the XWAM of D.S. Warren - NACLP'89), which enhances the WAM engine with a tabulation technique implementing the dynamic programming scheme mentioned above. Another approach, more database-oriented, consists of reusing compilation and data manipulation techniques from database systems. These techniques are now used for backward-chaining evaluation of deductive queries over large amounts of data. An example is our work on the DedGin* system (Lefebvre/Vieille - DOOD'89 - Kyoto, December 1989). In this system, all manipulation of data is done by the database subsystem, using join/difference algorithms implemented directly on the file system (see below).

Formally, the rationale behind this second kind of engineering solution is that backward chaining is not bound to the Prolog/WAM depth-first search of the SLD tree (i.e. corresponding to nested-loop joins); other search strategies can be used (corresponding, for instance, to sort-merge joins). Engineering-wise, the long-term goal of this research is to develop efficient compilation/optimization techniques for logic queries over large databases. There is still quite a lot to do!
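To illustrate the dynamic programming scheme on the query 'anc(X,b)', here is a minimal Python sketch (invented names; not the XWAM or DedGin* implementation, and the index on father stands in for a database access path). With the recursive literal anc(Z,b) selected first, every recursive subgoal is the same subgoal as the query itself, so its answers are taken from a single growing answer table instead of being re-derived: the evaluation terminates on cyclic data and, once father is indexed, visits only the descendants of b - points (A) and (B) together.

```python
from collections import defaultdict

# Tabled backward-chaining evaluation of the query anc(X, b) for
#   anc(X,Y) <- father(X,Y).
#   anc(X,Y) <- father(X,Z), anc(Z,Y).
# reading father(X,Y) as "the father of X is Y".  The table holds the
# answers of the single subgoal pattern anc(_, b); each answer is
# produced once and reused (the dynamic programming scheme).

def tabled_anc(b, father):
    children = defaultdict(set)        # index: parent -> {x : father(x, parent)}
    for (x, y) in father:
        children[y].add(x)
    answers = set(children[b])         # base rule: father(X, b)
    worklist = list(answers)           # answers not yet fed back
    while worklist:
        z = worklist.pop()
        # recursive rule with anc(Z, b) selected first: Z is an
        # already-tabled answer, then father(X, Z) yields X
        for x in children[z] - answers:
            answers.add(x)
            worklist.append(x)
    return answers
```

A tuple unrelated to b (e.g. a 'lot'/'haran' pair in another family) is indexed but never visited by the loop, which is the focusing behaviour of point (B).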
WORKS COMPARING THE TWO APPROACHES:
===================================

Rather than give an exhaustive list of references to these works (I am bound to forget some), I will just point to a couple of recent papers discussing the relationship between (and/or the equivalence of) forward-chaining methods with rewriting and backward-chaining methods; further references can be found there - or ask us. Indeed, the field is now mature enough for these comparisons to be fruitful:

  Bry F.: "Query Evaluation in Recursive Databases: Bottom-up and Top-down Reconciled", DOOD'89
  Miyazaki N.: "Distribution of Selections: The Missing Link between Strategies for Relational and Deductive Databases", DOOD'89
  Seki H.: "On the Power of Alexander Templates", PODS'89
  Ullman J.D.: "Bottom-up beats Top-down", PODS'89

  DOOD = Conference on Deductive and Object-Oriented Databases.
  PODS = ACM Conference on the Principles Of Database Systems.

To clarify my position with regard to the above discussion, let me add that I have investigated (and implemented) backward-chaining methods. See for instance:

  Vieille L.: "Recursive Query Processing: the Power of Logic", Theoretical Computer Science, Vol. 69 (1), 1989, pp. 1-53.

II) EKS-V1: A DEDUCTIVE DATABASE SYSTEM:
========================================

I would also like to take advantage of this note to publicize the EKS-V1 system (EKS = ECRC Knowledge base management System), which was recently demonstrated at international database conferences (EDBT - Extending Database Technology - Venice, Italy, March 1990; ACM-SIGMOD, Atlantic City, USA, May 1990). EKS-V1 is essentially a deductive database system and offers extensive declarative deductive capabilities over a (secondary-storage-based) database:

1) Its declarative rule language (without cut) is essentially an extension of Datalog and includes: full recursion, negation, universal/existential quantifiers, aggregates, and evaluable predicates.
Recursion and aggregates can be mixed, allowing, for instance, the 'bill-of-materials' query to be expressed in compact form.

2) It handles and enforces ANY INTEGRITY CONSTRAINT over the explicit and/or implicit database (i.e. a constraint is any closed formula referring to base and/or virtual - recursive/aggregate - predicates). The system then rejects any update or transaction violating these constraints. To help the user write high-level transactions, EKS-V1 provides sophisticated primitives for PRE/POST-CONDITIONAL UPDATES and for HYPOTHETICAL REASONING (the 'assume' primitive).

The core inference engine of EKS-V1 is a rule evaluator derived from the DedGin* prototype mentioned above: rules given to the database are NOT compiled into WAM-like code, but into set-oriented operations to be evaluated by the database subsystem at query evaluation time. The user interacts with EKS-V1 from within a Prolog-based environment (MEGALOG, developed at ECRC by Jorge Bocca et al.) whose memory-management design - full garbage collection - allows safe interaction with the fact base. The file system used is the BANG multi-dimensional file system (developed at ECRC by Mike Freeston et al.).

EKS-V1 should be seen as one of the results of a more global project, called EKS, which aims at designing and developing a fully-fledged Knowledge-Base Management System (including, in particular, semantic modelling aspects). More information about these systems can be obtained from us.

Laurent Vieille

ACKNOWLEDGMENTS: Alexandre Lefebvre made comments on early versions of this note. The works and arguments presented here benefited from many discussions with our fellow researchers of the KB group at ECRC.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
ECRC, Arabellastr. 17
8000 Munich 81, West Germany
Tel: (+49) 89 92 699 117
FAX: (+49) 89 92 699 170
E-Mail:  USA     laurent%ecrc.de@pyramid.com
         EUROPE  laurent@ecrc.de