[comp.lang.prolog] Please help me with T-joins

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (02/20/91)
I'm trying to implement the inner T-joins as Prolog predicates
on lists, and I have a problem.  I don't quite understand what
the non-strict inner T-joins are supposed to do.  Help!

%   File   : tjoin.pl (VERY EARLY DRAFT)
%   Author : Richard A. O'Keefe
%   Updated: 20-Feb-91
%   Defines: tjoin/4 (the inner T-joins from Codd's RM/V2).
%   Copyright (C) 1991 Richard A. O'Keefe

/*  The purpose of this file is to implement the four "inner T-join"
    operations defined in 
	The Relational Model for Database Management/Version 2
	E. F. Codd
	Addison-Wesley 1990
	ISBN 0-201-14192-2
    in section 5.7, pages 123--136.

    Letting S(...A...) and T(...B...) be relations with compatible 
    orderable columns A and B, there are four inner T-joins:
	S [[A < B]] T		-- currently implemented
	S [[A > B]] T		-- not here, but I know how to do it
	S [[A =< B]] T		-- not yet understood
	S [[A >= B]] T		-- not yet understood

    There is to be a predicate
	t_join(+Ordering, +S, +T, -Result)
    where
	Ordering is one of {<, >, =<, >=}
	S and T are lists of Key-Value pairs "suitably ordered"
	Result will be a list of Spair+Tpair pairs (that is,
	each element of Result will be (KeyS-ValS)+(KeyT-ValT)).

    What is "suitably ordered"?  In Codd's definition of inner T-join,
    you need a total order on the tuples in a relation.  The tuples
    are ordered using '<' on the comparison columns (so S is ordered
    on A and T is ordered on B), and some other part of the tuples is
    used as a "tie breaker" for tuples with equal values of A (or B).
    The representation I use here is one where the comparison columns
    have been "promoted" out to the Key position, so that if we had a
    relation s(Id,Age,Experience,Pay) and wanted to use Experience as
    comparison column we'd use Experience-s(Id,Age,Experience,Pay) as
    the Prolog representation.  "Suitably ordered" means that the
    elements of the lists are in non-decreasing order of Key, and that
    two list elements with the same Key value are in the right order
    with respect to the "tie breaker".  (In effect, we are defining
    the tie breaker to be position in the list.)

    The Value fields of S and T "just go along for the ride"; they
    are not examined in any way and need not be ground.  The Key
    fields _will_ be term-compared and should be ground.

    This file implements the '<' and '>' cases.  The '>' case
    reverses the lists so that the ordering (including tie breaking)
    is inverted.  Note that there is NO simple relationship between
    S [[A < B]] T	-- t_join(<, S, T, V) -- and
    T [[B > A]] S	-- t_join(>, T, S, W) -- as the test shows.
    The '=<' and '>=' cases are not implemented because I don't yet
    understand them:  I believe that the description 130--134 of
    RM/V2 is incomplete.  (If there is a key `k' which occurs once
    in S and twice in T, should I pick up the first or the last
    instance from T?)

    Why am I providing a file that implements only two of the four
    operations?  In the hope that someone who *does* understand
    the [[ =< ]] and [[ >= ]] versions of inner T-join will be able
    to explain to me (in terms of the representation used here) how
    they work.
*/

t_join(<, S, T, J) :-
	t_join_less(S, T, J).
t_join(>, S, T, J) :-
	rev(S, [], Sr),
	rev(T, [], Tr),
	t_join_greater(Sr, Tr, [], J).

t_join_less([], _, []).
t_join_less([X|S], T, J) :-
	t_join_less(T, X, J, S).

t_join_less([], _, [], _).
t_join_less([Y|T], X, J, S) :-
	key_compare(R, X, Y),
	t_join_less(R, T, J, S, X, Y).

t_join_less(<, T, [X+Y|J], S, X, Y) :-
	t_join_less(S, T, J).
t_join_less(=, T, J, S, X, _) :-
	t_join_less(T, X, J, S).
t_join_less(>, T, J, S, X, _) :-
	t_join_less(T, X, J, S).


t_join_greater([], _, J, J).
t_join_greater([X|S], T, J0, J) :-
	t_join_greater(T, X, J0, J, S).

t_join_greater([], _, J, J, _).
t_join_greater([Y|T], X, J0, J, S) :-
	key_compare(R, X, Y),
	t_join_greater(R, T, J0, J, S, X, Y).

t_join_greater(>, T, J0, J, S, X, Y) :-
	t_join_greater(S, T, [X+Y|J0], J).
t_join_greater(=, T, J0, J, S, X, _) :-
	t_join_greater(T, X, J0, J, S).
t_join_greater(<, T, J0, J, S, X, _) :-
	t_join_greater(T, X, J0, J, S).


key_compare(R, X-_, Y-_) :-
	compare(R, X, Y).

rev([], R, R).
rev([H|T], R0, R) :-
	rev(T, [H|R0], R).


test :-
	/* S, T from page 123 */
	S = [4-k1,6-k2,12-k3,18-k4,20-k5],
	T = [3-m1,5-m2,9-m3,11-m4,13-m5,15-m6],
	t_join(<, S, T, V),
	/* V from page 127 */
	V = [(4-k1) + (5-m2),(6-k2) + (9-m3),(12-k3) + (13-m5)],
	t_join(>, T, S, W),
	/* W from page 128 */
	W = [(11-m4) + (4-k1),(13-m5) + (6-k2),(15-m6) + (12-k3)].

end_of_file.
-- 
Professional programming is paranoid programming