davidm@cimshop.UUCP (David Masterson) (06/09/89)
I know I'm going to kick myself when I find the answer to this, but... What is the method for doing an outer join in SQL? For instance, given: A 1 2 B 1 2 a b a a c d c c The outer join of "A.1 = B.1" should be: a b a a a b _ _ c d c c c d _ _ but how do you express it in standard SQL???? aTdHvAaNnKcSe David Masterson (preferred address) uunet!cimshop!davidm or DMasterson@cup.portal.com
jlh@stech.UUCP (Jan Harrington) (06/09/89)
in article <423@cimshop.UUCP>, davidm@cimshop.UUCP (David Masterson) says: > > I know I'm going to kick myself when I find the answer to this, but... > > What is the method for doing an outer join in SQL? For instance, given: Add (*) after the join condition, as in: select people.name, orders.invoice#, orders.item from people, orders where people.id = orders.id (*); Jan Harrington Scholastech Telecommunications husc6!stech!jlh or allegra!stech!jlh ************************************************************************* Miscellaneous profundidity: "No matter where you go, there you are..." Buckaroo Banzai *************************************************************************
jkrueger@daitc.daitc.mil (Jonathan Krueger) (06/09/89)
In article <423@cimshop.UUCP>, davidm@cimshop (David Masterson) writes: >I know I'm going to kick myself when I find the answer to this, but... No, you mustn't, you mustn't!! Your mother and I thought you were over your autism! :-) >What is the method for doing an outer join in SQL? For instance, given: There is none. NOW will you STOP KICKING YOURSELF??? OK, for better exposition, I'll clip the relevant RTI technical note. -------------------------cut here----------------------------------- INGRES Technical Note #79 ========================= Outer Joins ----------- INGRES Version : all Operating System : VMS Date Written : 26-May-1987 Date Modified : What is an Outer Join? ---------------------- When a relational join is performed, data from either table can be obtained only if the joining field(s) contains matching data in both tables. There are circumstances, however, when one, or both, of the columns to be joined has values that the other column does not have. In such situations, it is not unusual to require that a query give a listing of both matched and non-matched values, along with other data values from those rows where possible. This is an "outer join". As an example, suppose a PERSONNEL table lists all employees. An APPRAISAL table lists the names of employees who have been evaluated, with their ratings. You want a list of all employees, with ratings for those evaluated, and a "0" rating for those not yet evaluated. For this outer join, we can refer to the PERSONNEL table as the "outer" table, and the APPRAISAL table as the "inner" table. This paper describes two ways to accomplish the outer join. The first method involves the creation of temporary tables to hold intermediate results and to store the final outer join. The second method involves the use of QUEL's aggregate functions, without temporary tables. Sample Data ----------- The data for the following discussion are the two tables: PERSONNEL APPRAISAL ------------ ------------------- | name | |name |rating| ------------ ------------------- | Adams | |Adams |7.5 | | Brown | |Dixon |9.2 | | Campbell | ------------------- | Dixon | | Estes | ------------ Technical Note #79 Outer Joins Page 2 Method 1 - Using Temporary Tables --------------------------------- Below is a description of how to do an outer join in four steps. The first column describes the step, the second column shows the results of that step, the third column shows the QUEL code that performs that step, and the fourth column shows the SQL code that performs that step. Description Result QUEL SQL =========== ====== ==== === Make a temporary TEMP range of p create table table that has the ------- is personnel temp as unique values of Adams retrieve into select name of the join field Brown temp from personnel p from the outer Campbell (p.name) table. Dixon Estes Perform the join FINAL range of a create table of the outer and ------- is appraisal final as select inner tables; put Adams 7.5 retrieve into p.name,a.rating the result in a Dixon 9.2 final (p.name, from appraisal a, table. a.rating) personnel p where p.name= where p.name= a.name a.name Delete from the TEMP range of t delete from temporary table ------- is temp temp t the rows that have Brown delete t where at least one Campbell where t.name= t.name= matching row in Estes a.name any(select the inner name from table. The appraisal) temporary table now contains the values in the outer table that have no match in the inner table. Append the FINAL append to insert into temporary table ----- final final to the join result Adams 7.5 (t.name, select name table. Use a Dixon 9.2 rating=0) rating=0 code or value for Brown 0.0 from temp; the column that has Campbell 0.0 no value. Estes 0.0 Let's modify our sample data slightly to show an extension of the outer join example above. Suppose the PERSONNEL table is updated daily. It always contains all of the currently active employees. The APPRAISAL table, however, is updated monthly and may contain the names of employ- ees no longer with the company. These names would not be in the PERSON- NEL table. In the method above, the outer join would contain non-match Technical Note #79 Outer Joins Page 3 names from PERSONNEL, but not the non-match names from APPRAISAL. How could we obtain the non-match names from APPRAISAL, as well as those from PERSONNEL? The four steps below describe how to do it. We've added one row to APPRAISAL to show this. The data are: name = Farrell, rating = 8.3 . Description Result QUEL SQL =========== ====== ==== === Create a temporary TEMP range of p create table temp table containing ---- is personnel (name c20, all rows from both Adams 0.0 range of a rating f4); tables. Brown 0.0 is appraisal insert into temp Campbell 0.0 create temp select name, Dixon 0.0 (name=c20, rating=0 from Estes 0.0 rating=f4) personnel; Adams 7.5 append to insert into temp Dixon 9.2 temp select name, Farrell 8.3 (name=p.name, rating from rating=0) appraisal; append to temp (name=a.name, rating= a.rating) Create a second FINAL retrieve into create table final table that holds ------- final (p.name as select name, the rows having Adams 7.5 a.rating) rating from the joining field Dixon 9.2 where p.name= personnel p, values in both a.name appraisal a tables. where p.name= a.name; Delete the matches TEMP range of t delete from temp from the first ------- is temp where temp.name= temporary table. Brown 0.0 range of f final.name; Campbell 0.0 is final Estes 0.0 delete t Farrell 8.3 where t.name= f.name Append the FINAL append to insert into final remaining rows ------- final select * from in the first Adams 7.5 (v.all) temp; temporary table Dixon 9.2 to the second Brown 0.0 temporary table. Campbell 0.0 Estes 0.0 Farrell 8.3 Technical Note #79 Outer Joins Page 4 Method 2 - Using Aggregate Functions ------------------------------------ The QUEL language (unlike SQL) offers an alternative method for perform- ing an outer join. This method relies on syntax rather than temporary tables. With the same data that we used in the previous method, we can do a sim- ple outer join that counts the number of matching rows in the inner table, and includes a count of 0 for values in the outer table that have no match in the inner table. Here is the retrieval and the resulting table: range of p is personnel range of a is appraisal retrieve (p.name,n=count(a.name by p.name where a.name=p.name)) --------------------- |name |n | |-----------|-------| |Adams |1 | |Brown |0 | |Campbell |0 | |Dixon |1 | |Estes |0 | --------------------- If we want to obtain the actual rows, and not just count them, then we have to use the aggregate function "any". If a retrieved row passes the qualification, then "any" returns a value of "1". A "0" is returned if the qualification is not passed. If we put a join in the "any" function, we are, in effect, testing whether a value is shared between two tables. By requiring that the function return a "0", we are asking for those values not shared by the two tables. If we combine that qualification with the usual join syntax, we obtain the rows that do have a shared value, as well as those that do not - an outer join. There is one extra step required to use this method. A dummy row must be added to the inner table to provide values for the data when no joining values are found. These can be codes or standard values that you have decided represent "missing data". Here is the QUEL syntax for performing the outer join: Technical Note #79 Outer Joins Page 5 range of p is personnel range of a is appraisal append to appraisal (name=" ",rating=0.0) retrieve (p.name,a.rating) where (p.name=a.name) or ((any(p.name by p.name where p.name=a.name) = 0) and a.name= " ") The result of this retrieve is the following table: +----------------------+ |name |rating | |-------------|--------| |Adams |7.5 | |Brown |0.0 | |Campbell |0.0 | |Dixon |9.2 | |Estes |0.0 | +----------------------+ First, the retrieve does a standard join by the "where (p.name=a.name)" syntax. Then, rows in PERSONNEL that do not have a match in APPRAISAL are included because of the "any...= 0" syntax. When they are included, they are joined to the dummy row in APPRAISAL by the "and a.name..." syntax so the dummy values can be included for the data fields. Summary ------- An outer join is a join of two tables in a database in which the values of the joining field(s) do not always occur in both tables. Outer joins can be performed in INGRES using QUEL or SQL code to build temporary tables that will hold intermediate data and the final outer join. QUEL is capable of performing an outer join without temporary tables, using the aggregate function "any" . Technical Note #79 -------------------------cut here----------------------------------- Hope this helps. -- Jon --
mike@blipyramid.BLI.COM (Mike Ubell) (06/10/89)
In article <1118@stech.UUCP> jlh@stech.UUCP (Jan Harrington) writes: >in article <423@cimshop.UUCP>, davidm@cimshop.UUCP (David Masterson) says: >> >> I know I'm going to kick myself when I find the answer to this, but... >> >> What is the method for doing an outer join in SQL? For instance, given: > Add (*) after the join condition, as in: > >select people.name, orders.invoice#, orders.item >from people, orders >where people.id = orders.id (*); > >Jan Harrington >Scholastech Telecommunications >husc6!stech!jlh or allegra!stech!jlh This is only true in Oracle (I think). The ANSI sql 1986 standard does not define an outer join. The proposed sql2 standard has a very differnet syntax. Something like: select... form A left join B on <predicate> where
norm@oglvee.UUCP (Norman Joseph) (06/13/89)
From article <106@blipyramid.BLI.COM>, by mike@blipyramid.BLI.COM (Mike Ubell):
# In article <1118@stech.UUCP> jlh@stech.UUCP (Jan Harrington) writes:
#>in article <423@cimshop.UUCP>, davidm@cimshop.UUCP (David Masterson) says:
#>>
#>> What is the method for doing an outer join in SQL? For instance, given:
#>
#> Add (*) after the join condition, as in:
#>
#>select people.name, orders.invoice#, orders.item
#>from people, orders
#>where people.id = orders.id (*);
#
# This is only true in Oracle (I think).
The last time I used Oracle, version 4.<something> on an HP 925 last fall,
the magic join symbol was (+), not (*), but still used in the way described
by Jan above. I cannot say that this is standard syntax across different
SQL implementations. In fact, (cynic that I am :-) I doubt it.
--
Norm Joseph - Oglevee Computer System, Inc.
UUCP: ...!{pitt,cgh}!amanue!oglvee!norm
/* you are not expected to understand this */
jkrueger@daitc.daitc.mil (Jonathan Krueger) (06/19/89)
In article <1591@munnari.oz.au>, kemp@munnari (David Kemp) writes: >I have been disappointed at the complete lack of discussion about deductive >databases in this news group. I realize they are yet to hit the commercial >world (except in the form of Prolog front ends to existing relational >database products), but huge amounts of research is being done in the area. >So why don't any of you researchers use the net? Researchers, their peers, tenure committees, grant reviewers, and employers don't consider USENET a technical or academic forum. In fact, they don't consider it at all. A moment's calm thought should convince anyone that they're quite right to think so. But nothing stops anyone from reading the literature. You just won't find it in comp.databases. Usually. >I wish to give a "deductive database" type solution [to outer joins] > answer(X, Y, nil, nil) :- relA(X, Y). > answer(X, Y, X, Z) :- relA(X, Y), relB(X, Z). This syntax is indeed simpler. However, no one ever claimed SQL has a good syntax. And you've left unspecified how Z gets instantiated or not. As a prolog dilettante, this has always been a source of great mystery to me. For instance, does it matter if rules are written in the opposite order? If so, the result is not the union of the results of the two rules, but rather a bayes function. >Using the data given in David's posting, the first rule would give: > a b _ _ > c d _ _ >and the second rule would give > a b a a > c d c c Seems to me the first rule would give a b c d No? Actually, checking through David's example, I see > A 1 2 B 1 2 > a b a a > c d c c > > The outer join of "A.1 = B.1" should be: > > a b a a > a b _ _ > c d c c > c d _ _ Isn't that the cartesian product? Isn't the outer join rather: a b a a c d c c To amplify the point, let's add another row: A 1 2 B 1 2 a b a a c d c c e f g h Is not the outer join of "A.1 = B.1": a b a a c d c c e f _ _ _ _ g h >Those of you who are familiar with Prolog may scream that Prolog is slow >since it will use a very naive "loop join" of relA and relB. No, we know it can't be any slower than temporary tables. But it would be comparable to performance of (nonstandard) SQL engines with support for outer joins. In general the syntax that lets you express clearly and concisely what you want is likely to achieve efficient implementation. >However, the new deductive databases being developed can execute this >query just as fast (if not faster) as any SQL based product. Of course. Better yet, they're not mutually exclusive, any more than use of awk excludes use of sed on textual data. >The outer joins in [RTI's technical note] are expressed as follows: >Example 1. (The case where we are not interested in entries in appraisal >that do not have corresponding personnel entries.) > answer(X, Y) :- personnel(X), appraisal(X, Y). > answer(X, 0) :- personnel(X), not some Y appraisal(X, Y). >("some Y" means "there exists a Y") So, which prolog is this with negation :-) ? Seriously, "not exists" seems as awkward to express in prolog as outer joins are in standard SQL. Prolog's limits appear to be straightforward consequences of the expressive power of Horne clauses, however. SQL limits appear to be casualties of poor language design, unrelated to the relational model. Although hardly to SQL's credit, this implies we might fix outer joins for SQL more easily than add negation to prolog. I'd be interested in seeing the full example in prolog using cut and fail, and comparing that syntax to the typical SQL superset for the same. >Why have I posted this? Is it an attempt to justify all the work we >have been putting into developing one of these monsters? Are >deductive databases just another neat idea destined to never make much >impact in the real world? Well, I'd say the interest is there. Database programmers need better tools. It just takes time to make them available off-the-shelf, to evaluate them, and to relate them to existing problems and solutions. Remember where we were only ten years ago. Or where most systems and applications still are. -- Jon --
hughes@math.Berkeley.EDU (Eric Hughes) (07/22/89)
In article <423@cimshop.UUCP>, davidm@cimshop (David Masterson) writes: > A 1 2 B 1 2 > a b a a > c d c c > >The outer join of "A.1 = B.1" should be: > > a b a a > a b _ _ > c d c c > c d _ _ If I remember this correctly, what you have specified is a left handed outer join. For a full outer join, you also need the lines _ _ a a _ _ c c Admittedly, a one-sided outer join is more useful. Eric Hughes hughes@math.berkeley.edu ucbvax!math!hughes
kemp@munnari.cs.mu.oz (David Kemp) (07/22/89)
From article <423@cimshop.UUCP>, by davidm@cimshop.UUCP (David Masterson): > What is the method for doing an outer join in SQL? For instance, given: > > A 1 2 B 1 2 > a b a a > c d c c > > The outer join of "A.1 = B.1" should be: > > a b a a > a b _ _ > c d c c > c d _ _ > > but how do you express it in standard SQL???? > jkrueger@daitc.daitc.mil (Jonathan Krueger) answered David Masterson's question by posting INGRES Technical Note #79. (Article: <538@daitc.daitc.mil>) I wish to give a "deductive database" type solution. I have been disappointed at the complete lack of discussion about deductive databases in this news group. I realize they are yet to hit the commercial world (except in the form of Prolog front ends to existing relational database products), but huge amounts of research is being done in the area. So why don't any of you researchers use the net? Now to the point of my posting. Without trying to claim that languages based on relational calculus, or symbolic logic are any better than SQL, I am posting a solution to David's question which uses a deductive database language (just a pure form of Prolog with some logical extensions). The outer join of "A.1 = B.1", as defined by David Masterson, is expressed using just two rules. Those of you who are not familiar with Prolog, may not understand what follows, but believe me, it is far simpler than explicitly creating temporary relations as suggested in the INGRES Technical Note #79. My solution: answer(X, Y, nil, nil) :- relA(X, Y). answer(X, Y, X, Z) :- relA(X, Y), relB(X, Z). The result is the union of the results from the two rules. Using the data given in David's posting, the first rule would give: a b _ _ c d _ _ and the second rule would give a b a a c d c c Those of you who are familiar with Prolog may scream that Prolog is slow since it will use a very naive "loop join" of relA and relB. However, the new deductive databases being developed can execute this query just as fast (if not faster) as any SQL based product. A deductive database would actually execute the same relational algebra operations as a relational one with a SQL front end. The outer joins in Jonathan Krueger's posting are expressed as follows: Example 1. (The case where we are not interested in entries in appraisal that do not have corresponding personnel entries.) answer(X, Y) :- personnel(X), appraisal(X, Y). answer(X, 0) :- personnel(X), not some Y appraisal(X, Y). ("some Y" means "there exists a Y") Example 2. (For cases where appraisal may have entries without corresponding personnel entries and we want those included in the answer)... answer(X, Y) :- appraisal(X, Y). answer(X, 0) :- personnel(X), not some Y appraisal(X, Y). By the way, this is only just the tip of the ice-burg. Deductive databases allow recursive rules and some will even allow structured data (non-first normal form) and even nested relations. This allows simple solutions to problems like the "Correlated Update Problem" (posted to comp.databases by carl.pedersen@dartmouth.edu (Carl Pedersen) on 24 May 89 article <13626@dartvax.Dartmouth.EDU>). Why have I posted this? Is it an attempt to justify all the work we have been putting into developing one of these monsters? Are deductive databases just another neat idea destined to never make much impact in the real world? ------------------------------ David B. Kemp University of Melbourne Parkville 3052 Australia e_mail: kemp@cs.mu.oz.au ------------------------------