aaron@grad2.cis.upenn.edu (Aaron Watters) (08/14/90)
The various varieties of outerjoins are interesting operations. In particular, they are interesting because they don't act like relational operations in the sense that (r outerjoin s) minus t or (r outerjoin s) join t don't seem to be well defined. To be fanciful, I would hypothesize that the outerjoin lies on the fringes of the standard relational paradigm, and its usefulness indicates useful directions in which the paradigm should be broadened. Comments? What are interesting (simple) examples of the uses for outerjoins? -aaron
swfc@ulysses.att.com (Shu-Wie F Chen) (08/15/90)
In article <28462@netnews.upenn.edu>, aaron@grad2.cis.upenn.edu (Aaron Watters) writes: |>The various varieties of outerjoins are interesting operations. The only reference that I could find on outer joins was in Elmasri and Navathe's Fundamentals of Database Systems (page 164-165). Since I didn't know what they were, I am providing a definition: "The OUTER JOIN operation was developed to take the union of tuples from two relations that are not union compatible. This operation will take the UNION of tuples in two relations that are partially compatible, meaning that only some of their attributes are union compatible. The attributes that are not union compatible from either relation are kept in the result, and tuples that have no values for these attributes are padded with null values..." |>In particular, they are interesting because they don't act like |>relational operations in the sense that |> (r outerjoin s) minus t |>or (r outerjoin s) join t |>don't seem to be well defined. To be fanciful, I would hypothesize I don't see how they are not well-defined given the above definition. |>that the outerjoin lies on the fringes of the standard relational paradigm, |>and its usefulness indicates useful directions in which the paradigm |>should be broadened. Comments? I don't quite see why the outerjoin lies on the fringes of the *standard relational paradigm*. What differentiates it from other operators with respect to the paradigm? It is certainly useful, but adding new operators within the relational model does not broaden the paradigm. |> |>What are interesting (simple) examples of the uses for outerjoins? -aaron Elmasri and Navathe give several examples: 1. Suppose we want a list of all employee names and also the name of the departments they manage *if they happen to manage a department*. The input relations are EMPLOYEE(ENAME, ESSN,...) and DEPARTMENT(DNAME, MGRSSN...). 2. Suppose we want a list of all the people at a school with their name, SSN, and department, their advisor if they are students, and their rank if they are faculty. The input relations are STUDENT(NAME, SSN, DEPARTMENT, ADVISOR) and FACULTY(NAME, SSN, DEPARTMENT, RANK). Cheers, *swfc
aaron@grad2.cis.upenn.edu (Aaron Watters) (08/15/90)
In article <13592@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes: > >I don't quite see why the outerjoin lies on the fringes of the *standard >relational paradigm*. What differentiates it from other operators with >respect to the paradigm? > The outerjoin(s) introduce null values. If you treat null values as you would any other value then then the outerjoin doesn't cause any problems. More intelligent systems (for example Informix -- since this is the only manual I have handy) treat null values as a special `unknown' value and, justifiably, do not define certain relational operations on relations that include nulls in certain circumstances. Consider a single relation Parental(parent,child) where either parent or child is allowed to be null whenever the parents or children for an individual aren't known. Using datalog type notatoin the natural way to find grandparents is with the rule Grandparent(Person) if Parental(Person,Parent) and Parental(Parent,Child). But this will give the wrong answer if we treat nulls as ordinary values, what you need, apparently, is CertainGparent(Person) if Parental(Person,Parent) and Parental(Parent,Child) and Parent<>NULL. which gives you people who are certainly grandparents, or alternatively MaybeGparent(Person) if Parental(Person,NULL). MaybeGparent(Person) if Parental(Person,Parent) and Parental(Parent,Child). which gives you people who might well be grandparents. Since the first rule is dangerous and misleading in the presence of NULLs, some relational systems require the user to explicitly handle NULLs, as in the latter two examples (correctly, I think). If anyone is interested in extensions to the relational approach that deal with nulls in a more automatic and uniform manner, let me know. -aaron (PS: Apologies to those not used to datalog-type notations. Also, this problem occurs if we interpret NULLs in other ways.)
tom@eds.com (Tom H. Meyer) (08/16/90)
In article <28462@netnews.upenn.edu> aaron@grad2.cis.upenn.edu.UUCP (Aaron Watters) writes: >The various varieties of outerjoins are interesting operations. >In particular, they are interesting because they don't act like >relational operations in the sense that > (r outerjoin s) minus t >or (r outerjoin s) join t >don't seem to be well defined. To be fanciful, I would hypothesize >that the outerjoin lies on the fringes of the standard relational paradigm, >and its usefulness indicates useful directions in which the paradigm >should be broadened. Comments? > >What are interesting (simple) examples of the uses for outerjoins? -aaron I recommend you get a hold of this year's SIGMOD proceedings. Arnon Rosenthal and Cesar Galindo-Legaria presented a paper, "Query Graphs, Implementing Trees, and Treely-Reorderable Outerjoins" which addresses your questions directly. To summarize, outerjoins are useful when you wish to see all the fields values of the join keys whether they matched something in the other table or not. As an example, perhaps a query which joined classes to students might not display all the classes if they had no students enrolled in them. If this is not acceptable, an outerjoin will solve the problem. A good introduction to outerjoin theory can be found in C.J. Date, "The Outer Join", Proc @nd International Conf on Databases (ICOD-2), Cambridge, England, Sept. 1983 There are many types of outer joins and Date introduces them all in this paper. However, the "left" and "right" outerjoins (which some SQL implementations denote: A.a *= B.a and A.a =* B.a) have some nasty properties like non-associativity and distributivity. This makes query optimization of outerjoins extremely hard. Dr. Rosenthal presents a complete algebra of these outerjoins including a characterization of when they are well behaved. Hope this helps tom meyer, EDS Research | If I don't see you in the future tom@edsr.eds.com or ...uunet!edsr!tom | I'll see you in the pasture
davidm@uunet.UU.NET (David S. Masterson) (08/16/90)
In article <28462@netnews.upenn.edu> aaron@grad2.cis.upenn.edu (Aaron Watters) writes: The various varieties of outerjoins are interesting operations. In particular, they are interesting because they don't act like relational operations in the sense that (r outerjoin s) minus t or (r outerjoin s) join t don't seem to be well defined. To be fanciful, I would hypothesize that the outerjoin lies on the fringes of the standard relational paradigm, and its usefulness indicates useful directions in which the paradigm should be broadened. Comments? Read Codd's new book on RMV2 before saying that outer-joins are on the fringes of the relational model. The outer join also goes far beyond just the outer equijoin to include all operators (<, >, !=, etc). What are interesting (simple) examples of the uses for outerjoins? When dealing with object-orientation in a relational database, normalization of the database may tend to break up an object into several tables. Not all of these tables may be filled in for an object. Trying to put the object back together at any point in time will be best done with an outer equijoin (based on the key of the object). -- ==================================================================== David Masterson Consilium, Inc. uunet!cimshop!davidm Mtn. View, CA 94043 ==================================================================== "If someone thinks they know what I said, then I didn't say it!"
garyp@cognos.UUCP (Gary Puckering) (08/21/90)
In article <28462@netnews.upenn.edu> aaron@grad2.cis.upenn.edu.UUCP (Aaron Watters) writes: >What are interesting (simple) examples of the uses for outerjoins? -aaron Examples are abundant in real data processing applications. Almost any application where there are master files and transaction files will yield outer join queries, because it is often the case that some masters will not have any transactions but must still be included in the processing. Consider a time reporting system where Employees work on Projects and Time Transactions are generated for the hours worked. The database might look like this: EMPLOYEES ( EMP_NO, EMP_NAME, ... ) PROJECTS ( PROJECT_NO, PROJECT_NAME, ... ) TIME_TXN ( EMP_NO, PROJECT_NO, HOURS ) One of the application requirements is to produce a report which shows the hours worked on each project. A simple join query would do the trick: SELECT PROJECT_NAME, SUM(HOURS) FROM PROJECTS P, TIME_TXN T GROUP BY PROJECT ORDER BY PROJECT WHERE P.PROJECT_NO = T.PROJECT_NO Unfortunately, this query will not report on those projects which have no time transactions in the current reporting period. In applications like this one it is often very important to report, for example, that Project 123 had no activity. Users sometimes get upset when they can't find what they are looking for. The simplest way to address this requirement is to use an outer join for this query, keeping the unmatched rows in the PROJECTS table. Without the outer join operation, it would be necessary to write this request as a parameterized query on TIME_TXN nested within a query on PROJECTS. For such a simple application, this is considerably more work -- and might not be executed very efficiently by the database engine. An advantage to having an outer join operation is that applications that need them can express them within a single query, thus providing the optimizer with more semantics about the request. If an outer join has to be broken up into two queries, the optimizer can't know what is really going on and therefore can't do a good job of optimizing such situations. This could become even more important if the database is distributed sometime in the future. The outer join I've described here, where unmatched rows in the leftmost table are preserved, seems to be the most useful of the various forms of outer join. The need for preserving unmatched rows in the left-hand table, or in both tables, seems far less prevalent. I can't even think of any good examples off the top of my head. Can anyone else? -- Gary Puckering Cognos Incorporated VOICE: (613) 738-1338 x6100 P.O. Box 9707 UUCP: uunet!mitel!sce!cognos!garyp Ottawa, Ontario INET: garyp%cognos.uucp@uunet.uu.net CANADA K1G 3Z4
aaron@grad2.cis.upenn.edu (Aaron Watters) (08/24/90)
In article <8720@cognos.UUCP> garyp@cognos.UUCP (Gary Puckering) writes: > >The outer join I've described here, where unmatched rows in the >leftmost table are preserved, seems to be the most useful of the >various forms of outer join. The need for preserving unmatched rows >in the left-hand table, or in both tables, seems far less >prevalent. I can't even think of any good examples off the top of >my head. Can anyone else? Well, if you have two tables (say Faculty and Student) sharing a common key (SS#) and representing the same kind of object (person) but having different attribute sets ([ss#, department, status] and [ss#, year, phone] respectively]) you might want to create a `union' of these tables, listing all faculty and students. However, if we assume that ss# identifies an individual uniquely you might want any entries with the same ss# to be combined. This is a hypothetical example for the `both table' case. -aaron.