[comp.databases] Dumb Question

davidm@cimshop.UUCP (David Masterson) (06/09/89)

I know I'm going to kick myself when I find the answer to this, but...

What is the method for doing an outer join in SQL?  For instance, given:

	A 1 2		B 1 2
	  a b		  a a
	  c d		  c c

The outer join of "A.1 = B.1" should be:

		a b a a
		a b _ _
		c d c c
		c d _ _

but how do you express it in standard SQL????

aTdHvAaNnKcSe

David Masterson					(preferred address)
uunet!cimshop!davidm		or		DMasterson@cup.portal.com

jlh@stech.UUCP (Jan Harrington) (06/09/89)

in article <423@cimshop.UUCP>, davidm@cimshop.UUCP (David Masterson) says:
> 
> I know I'm going to kick myself when I find the answer to this, but...
> 
> What is the method for doing an outer join in SQL?  For instance, given:
 Add (*) after the join condition, as in:

select people.name, orders.invoice#, orders.item
from people, orders
where people.id = orders.id (*);

Jan Harrington
Scholastech Telecommunications
husc6!stech!jlh or allegra!stech!jlh

*************************************************************************
	Miscellaneous profundidity:
		"No matter where you go, there you are..."
			Buckaroo Banzai
*************************************************************************

jkrueger@daitc.daitc.mil (Jonathan Krueger) (06/09/89)

In article <423@cimshop.UUCP>, davidm@cimshop (David Masterson) writes:
>I know I'm going to kick myself when I find the answer to this, but...

No, you mustn't, you mustn't!!  Your mother and I thought you were
over your autism!  :-)

>What is the method for doing an outer join in SQL?  For instance, given:

There is none.

NOW will you STOP KICKING YOURSELF???

OK, for better exposition, I'll clip the relevant RTI technical note.

-------------------------cut here-----------------------------------

                       INGRES Technical Note #79
                       =========================

                              Outer Joins
                              -----------

INGRES Version      : all
Operating System    : VMS
Date Written        : 26-May-1987
Date Modified       :

What is an Outer Join?
----------------------

When a relational join is performed,  data  from  either  table  can  be
obtained  only  if  the  joining field(s) contains matching data in both
tables.  There are circumstances, however, when one,  or  both,  of  the
columns to be joined has values that the other column does not have.

In such situations, it is not unusual to require that  a  query  give  a
listing  of  both  matched and non-matched values, along with other data
values from those rows where possible. This is an "outer join".

As an example,  suppose  a  PERSONNEL  table  lists  all  employees.  An
APPRAISAL  table  lists  the names of employees who have been evaluated,
with their ratings.  You want a list of all employees, with ratings  for
those  evaluated, and a "0" rating for those not yet evaluated. For this
outer join, we can refer to the PERSONNEL table as  the  "outer"  table,
and the APPRAISAL table as the "inner" table.

This paper describes two ways to accomplish the outer  join.  The  first
method  involves  the  creation of temporary tables to hold intermediate
results and to store the final outer join. The  second  method  involves
the use of QUEL's aggregate functions, without temporary tables.

Sample Data
-----------

The data for the following discussion are the two tables:
PERSONNEL                     APPRAISAL
------------                  -------------------
| name     |                  |name      |rating|
------------                  -------------------
| Adams    |                  |Adams     |7.5   |
| Brown    |                  |Dixon     |9.2   |
| Campbell |                  -------------------
| Dixon    |
| Estes    |
------------

                                                      Technical Note #79

Outer Joins                                                       Page 2

Method 1 - Using Temporary Tables
---------------------------------

Below is a description of how to do an outer join  in  four  steps.  The
first  column describes the step, the second column shows the results of
that step, the third column shows the QUEL code that performs that step,
and the fourth column shows the SQL code that performs that step.
Description           Result                QUEL            SQL
===========           ======                ====            ===

Make a temporary      TEMP                  range of p      create table
table that has the    -------               is personnel    temp as
unique values of      Adams                 retrieve into   select name
of the join field     Brown                 temp            from personnel p
from the outer        Campbell              (p.name)
table.                Dixon
                      Estes

Perform the join      FINAL                 range of a      create table
of the outer and      -------               is appraisal    final as select
inner tables; put     Adams   7.5           retrieve into   p.name,a.rating
the result in a       Dixon   9.2           final (p.name,  from appraisal a,
table.                                      a.rating)       personnel p
                                            where p.name=   where p.name=
                                            a.name          a.name

Delete from the       TEMP                  range of t      delete from
temporary table       -------               is temp         temp t
the rows that have    Brown                 delete t        where
at least one          Campbell              where t.name=   t.name=
matching row in       Estes                 a.name          any(select
the inner                                                   name from
table. The                                                  appraisal)
temporary table now
contains the values
in the outer table
that have no match
in the inner table.

Append the            FINAL                 append to       insert into
temporary table       -----                 final           final
to the join result    Adams    7.5          (t.name,        select name
table. Use a          Dixon    9.2          rating=0)       rating=0
code or value for     Brown    0.0                          from temp;
the column that has   Campbell 0.0
no value.             Estes    0.0

Let's modify our sample data slightly to show an extension of the  outer
join  example  above.  Suppose  the PERSONNEL table is updated daily. It
always contains all of the currently  active  employees.  The  APPRAISAL
table,  however, is updated monthly and may contain the names of employ-
ees no longer with the company. These names would not be in the  PERSON-
NEL  table.  In the method above, the outer join would contain non-match

                                                      Technical Note #79

Outer Joins                                                       Page 3

names from PERSONNEL, but not the non-match names  from  APPRAISAL.  How
could  we  obtain   the non-match names from APPRAISAL, as well as those
from PERSONNEL? The four steps below describe how to do it. We've  added
one  row to APPRAISAL to show this. The data are: name = Farrell, rating
= 8.3 .
Description           Result                QUEL            SQL
===========           ======                ====            ===

Create a temporary    TEMP                  range of p      create table temp
table containing      ----                  is personnel    (name c20,
all rows from both    Adams    0.0          range of a      rating f4);
tables.               Brown    0.0          is appraisal    insert into temp
                      Campbell 0.0          create temp     select name,
                      Dixon    0.0          (name=c20,      rating=0 from
                      Estes    0.0          rating=f4)      personnel;
                      Adams    7.5          append to       insert into temp
                      Dixon    9.2          temp            select name,
                      Farrell  8.3          (name=p.name,   rating from
                                            rating=0)       appraisal;
                                            append to
                                            temp
                                            (name=a.name,
                                            rating=
                                            a.rating)

Create a second       FINAL                 retrieve into   create table final
table that holds      -------               final (p.name   as select name,
the rows having       Adams    7.5          a.rating)       rating from
the joining field     Dixon    9.2          where p.name=   personnel p,
values in both                              a.name          appraisal a
tables.                                                     where p.name=
                                                            a.name;

Delete the matches    TEMP                  range of t      delete from temp
from the first        -------               is temp         where temp.name=
temporary table.      Brown    0.0          range of f      final.name;
                      Campbell 0.0          is final
                      Estes    0.0          delete t
                      Farrell  8.3          where t.name=
                                            f.name

Append the            FINAL                 append to       insert into final
remaining rows        -------               final           select * from
in the first          Adams    7.5          (v.all)         temp;
temporary table       Dixon    9.2
to the second         Brown    0.0
temporary table.      Campbell 0.0
                      Estes    0.0
                      Farrell  8.3

                                                      Technical Note #79

Outer Joins                                                       Page 4

Method 2 - Using Aggregate Functions
------------------------------------

The QUEL language (unlike SQL) offers an alternative method for perform-
ing  an  outer  join. This method relies on syntax rather than temporary
tables.

With the same data that we used in the previous method, we can do a sim-
ple  outer  join  that  counts  the number of matching rows in the inner
table, and includes a count of 0 for values in the outer table that have
no  match  in  the  inner table. Here is the retrieval and the resulting
table:

range of p is personnel
range of a is appraisal
retrieve (p.name,n=count(a.name by p.name where a.name=p.name))

---------------------
|name       |n      |
|-----------|-------|
|Adams      |1      |
|Brown      |0      |
|Campbell   |0      |
|Dixon      |1      |
|Estes      |0      |
---------------------

If we want to obtain the actual rows, and not just count them,  then  we
have  to use the aggregate function "any". If a retrieved row passes the
qualification, then "any" returns a value of "1". A "0" is  returned  if
the qualification is not passed. If we put a join in the "any" function,
we are, in effect, testing whether a value is shared between two tables.
By  requiring  that  the  function return a "0", we are asking for those
values not shared by the two tables. If we  combine  that  qualification
with  the  usual  join  syntax, we obtain the rows that do have a shared
value, as well as those that do not - an outer join.

There is one extra step required to use this method. A dummy row must be
added  to the inner table to provide values for the data when no joining
values are found. These can be codes or standard values  that  you  have
decided represent "missing data".

Here is the QUEL syntax for performing the outer join:

                                                      Technical Note #79

Outer Joins                                                       Page 5

range of p is personnel
range of a is appraisal
append to appraisal (name=" ",rating=0.0)
retrieve (p.name,a.rating) where (p.name=a.name) or
 ((any(p.name by p.name where p.name=a.name) = 0) and a.name= " ")

The result of this retrieve is the following table:

+----------------------+
|name         |rating  |
|-------------|--------|
|Adams        |7.5     |
|Brown        |0.0     |
|Campbell     |0.0     |
|Dixon        |9.2     |
|Estes        |0.0     |
+----------------------+

First, the retrieve does a standard join by the "where  (p.name=a.name)"
syntax.  Then,  rows  in PERSONNEL that do not have a match in APPRAISAL
are included because of the "any...= 0" syntax. When they are  included,
they  are  joined  to  the dummy row in APPRAISAL by the "and a.name..."
syntax so the dummy values can be included for the data fields.

Summary
-------

An outer join is a join of two tables in a database in which the  values
of  the joining field(s) do not always occur in both tables. Outer joins
can be performed in INGRES using QUEL or SQL  code  to  build  temporary
tables  that  will hold intermediate data and the final outer join. QUEL
is capable of performing an outer join without temporary  tables,  using
the aggregate function "any" .

                                                      Technical Note #79
-------------------------cut here-----------------------------------

Hope this helps.

-- Jon
--

mike@blipyramid.BLI.COM (Mike Ubell) (06/10/89)

In article <1118@stech.UUCP> jlh@stech.UUCP (Jan Harrington) writes:
>in article <423@cimshop.UUCP>, davidm@cimshop.UUCP (David Masterson) says:
>> 
>> I know I'm going to kick myself when I find the answer to this, but...
>> 
>> What is the method for doing an outer join in SQL?  For instance, given:
> Add (*) after the join condition, as in:
>
>select people.name, orders.invoice#, orders.item
>from people, orders
>where people.id = orders.id (*);
>
>Jan Harrington
>Scholastech Telecommunications
>husc6!stech!jlh or allegra!stech!jlh

This is only true in Oracle (I think).  The ANSI sql 1986 standard does
not define an outer join.  The proposed sql2 standard has a very differnet
syntax.  Something like:
	select...
	form A left join B on <predicate>
	where

norm@oglvee.UUCP (Norman Joseph) (06/13/89)

From article <106@blipyramid.BLI.COM>, by mike@blipyramid.BLI.COM (Mike Ubell):
# In article <1118@stech.UUCP> jlh@stech.UUCP (Jan Harrington) writes:
#>in article <423@cimshop.UUCP>, davidm@cimshop.UUCP (David Masterson) says:
#>> 
#>> What is the method for doing an outer join in SQL?  For instance, given:
#>
#> Add (*) after the join condition, as in:
#>
#>select people.name, orders.invoice#, orders.item
#>from people, orders
#>where people.id = orders.id (*);
# 
# This is only true in Oracle (I think).

The last time I used Oracle, version 4.<something> on an HP 925 last fall,
the magic join symbol was (+), not (*), but still used in the way described
by Jan above.  I cannot say that this is standard syntax across different
SQL implementations.  In fact, (cynic that I am :-) I doubt it.
-- 
Norm Joseph - Oglevee Computer System, Inc.
  UUCP: ...!{pitt,cgh}!amanue!oglvee!norm
    /* you are not expected to understand this */

jkrueger@daitc.daitc.mil (Jonathan Krueger) (06/19/89)

In article <1591@munnari.oz.au>, kemp@munnari (David Kemp) writes:
>I have been disappointed at the complete lack of discussion about deductive
>databases in this news group.  I realize they are yet to hit the commercial
>world (except in the form of Prolog front ends to existing relational
>database products), but huge amounts of research is being done in the area.
>So why don't any of you researchers use the net?

Researchers, their peers, tenure committees, grant reviewers, and
employers don't consider USENET a technical or academic forum.  In
fact, they don't consider it at all.  A moment's calm thought should
convince anyone that they're quite right to think so.  But nothing
stops anyone from reading the literature.  You just won't find it in
comp.databases.  Usually.

>I wish to give a "deductive database" type solution [to outer joins]
>	answer(X, Y, nil, nil) :- relA(X, Y).
>	answer(X, Y, X, Z) :- relA(X, Y), relB(X, Z).

This syntax is indeed simpler.  However, no one ever claimed SQL has a
good syntax.  And you've left unspecified how Z gets instantiated or
not.  As a prolog dilettante, this has always been a source of great
mystery to me.  For instance, does it matter if rules are written in
the opposite order?  If so, the result is not the union of the results
of the two rules, but rather a bayes function.

>Using the data given in David's posting, the first rule would give:
>	a b _ _
>	c d _ _
>and the second rule would give
>	a b a a
>	c d c c

Seems to me the first rule would give
	a b
	c d

No?  Actually, checking through David's example, I see

> 	A 1 2		B 1 2
> 	  a b		  a a
> 	  c d		  c c
> 
> The outer join of "A.1 = B.1" should be:
> 
> 		a b a a
> 		a b _ _
> 		c d c c
> 		c d _ _

Isn't that the cartesian product?  Isn't the outer join rather:

 		a b a a
 		c d c c

To amplify the point, let's add another row:

 	A 1 2		B 1 2
	  a b		  a a
	  c d		  c c
	  e f		  g h

Is not the outer join of "A.1 = B.1":

 		a b a a
 		c d c c
		e f _ _
		_ _ g h

>Those of you who are familiar with Prolog may scream that Prolog is slow
>since it will use a very naive "loop join" of relA and relB.

No, we know it can't be any slower than temporary tables.  But it
would be comparable to performance of (nonstandard) SQL engines with
support for outer joins.  In general the syntax that lets you express
clearly and concisely what you want is likely to achieve efficient
implementation.

>However, the new deductive databases being developed can execute this
>query just as fast (if not faster) as any SQL based product.

Of course.  Better yet, they're not mutually exclusive, any more than
use of awk excludes use of sed on textual data.

>The outer joins in [RTI's technical note] are expressed as follows:
>Example 1. (The case where we are not interested in entries in appraisal
>that do not have corresponding personnel entries.)
>	answer(X, Y) :- personnel(X), appraisal(X, Y).
>	answer(X, 0) :- personnel(X), not some Y appraisal(X, Y).
>("some Y" means "there exists a Y")

So, which prolog is this with negation :-) ?  Seriously, "not exists"
seems as awkward to express in prolog as outer joins are in standard
SQL.  Prolog's limits appear to be straightforward consequences of the
expressive power of Horne clauses, however.  SQL limits appear to be
casualties of poor language design, unrelated to the relational model.
Although hardly to SQL's credit, this implies we might fix outer joins
for SQL more easily than add negation to prolog.  I'd be interested in
seeing the full example in prolog using cut and fail, and comparing
that syntax to the typical SQL superset for the same.

>Why have I posted this?  Is it an attempt to justify all the work we
>have been putting into developing one of these monsters?  Are
>deductive databases just another neat idea destined to never make much
>impact in the real world?

Well, I'd say the interest is there.  Database programmers need better
tools.  It just takes time to make them available off-the-shelf, to
evaluate them, and to relate them to existing problems and solutions.
Remember where we were only ten years ago.  Or where most systems and
applications still are.

-- Jon
--

hughes@math.Berkeley.EDU (Eric Hughes) (07/22/89)

In article <423@cimshop.UUCP>, davidm@cimshop (David Masterson) writes:
>	A 1 2		B 1 2
>	  a b		  a a
>	  c d		  c c
>
>The outer join of "A.1 = B.1" should be:
>
>		a b a a
>		a b _ _
>		c d c c
>		c d _ _

If I remember this correctly, what you have specified is a left handed
outer join.  For a full outer join, you also need the lines

		_ _ a a
		_ _ c c

Admittedly, a one-sided outer join is more useful.

Eric Hughes
hughes@math.berkeley.edu   ucbvax!math!hughes

kemp@munnari.cs.mu.oz (David Kemp) (07/22/89)

From article <423@cimshop.UUCP>, by davidm@cimshop.UUCP (David Masterson):
> What is the method for doing an outer join in SQL?  For instance, given:
> 
> 	A 1 2		B 1 2
> 	  a b		  a a
> 	  c d		  c c
> 
> The outer join of "A.1 = B.1" should be:
> 
> 		a b a a
> 		a b _ _
> 		c d c c
> 		c d _ _
> 
> but how do you express it in standard SQL????
> 

jkrueger@daitc.daitc.mil (Jonathan Krueger) answered David Masterson's
question by posting INGRES Technical Note #79. (Article: <538@daitc.daitc.mil>)

I wish to give a "deductive database" type solution.

I have been disappointed at the complete lack of discussion about deductive
databases in this news group.  I realize they are yet to hit the commercial
world (except in the form of Prolog front ends to existing relational
database products), but huge amounts of research is being done in the area.
So why don't any of you researchers use the net?

Now to the point of my posting.
Without trying to claim that languages based on relational calculus, or
symbolic logic are any better than SQL, I am posting a solution to
David's question which uses a deductive database language (just a pure
form of Prolog with some logical extensions).

The outer join of "A.1 = B.1", as defined by David Masterson, is expressed
using just two rules.  Those of you who are not familiar with Prolog, may
not understand what follows,  but believe me, it is far simpler than
explicitly creating temporary relations as suggested in the INGRES
Technical Note #79.
My solution:
	answer(X, Y, nil, nil) :- relA(X, Y).
	answer(X, Y, X, Z) :- relA(X, Y), relB(X, Z).

The result is the union of the results from the two rules. Using the data
given in David's posting, the first rule would give:
	a b _ _
	c d _ _
and the second rule would give
	a b a a
	c d c c

Those of you who are familiar with Prolog may scream that Prolog is slow
since it will use a very naive "loop join" of relA and relB.  However,
the new deductive databases being developed can execute this query just
as fast (if not faster) as any SQL based product.
A deductive database would actually execute the same relational algebra
operations as a relational one with a SQL front end.

The outer joins in Jonathan Krueger's posting are expressed as follows:
Example 1. (The case where we are not interested in entries in appraisal
that do not have corresponding personnel entries.)
	answer(X, Y) :- personnel(X), appraisal(X, Y).
	answer(X, 0) :- personnel(X), not some Y appraisal(X, Y).

("some Y" means "there exists a Y")

Example 2. (For cases where appraisal may have entries without corresponding
personnel entries and we want those included in the answer)...
	answer(X, Y) :- appraisal(X, Y).
	answer(X, 0) :- personnel(X), not some Y appraisal(X, Y).

By the way, this is only just the tip of the ice-burg.  Deductive databases
allow recursive rules and some will even allow structured data (non-first
normal form) and even nested relations.  This allows simple solutions to
problems like the "Correlated Update Problem" (posted to comp.databases by
carl.pedersen@dartmouth.edu (Carl Pedersen) on 24 May 89 article <13626@dartvax.Dartmouth.EDU>).

Why have I posted this?  Is it an attempt to justify all the work we have
been putting into developing one of these monsters?  Are deductive databases
just another neat idea destined to never make much impact in the real world?
------------------------------
David B. Kemp
University of Melbourne
Parkville 3052
Australia

e_mail: kemp@cs.mu.oz.au
------------------------------