ecec@ur-tut.UUCP (Eric Carleen) (08/14/86)
I've found that apparently I have a severe misunderstanding of the use of a relational data base, specifically Ingres. I'm sending out a summary of my question because I've found that a number of other people have the same misunderstanding. Suppose that I have 3 tables of names, call them table1, table2, and table3. The names in tables 2 or 3 may or may not be in table1. I would like to delete all those records in table1 that are already listed in the other two tables. If I give the command: delete table1 where table1.name = table2.name or table1.name = table3.name then I get what I was after: entries that match EITHER table2 OR table3 are deleted. unless... table3 is empty. If table3 is empty, then nothing is deleted from table1, regardless of whether or not there are any matching entries in table2. (The work-around is obvious, but I thought unnecessary.) Technical support at Relational Technologies insists that this is the correct way for a deletion to occur. I guess that I need to learn more about relational databases - so would people please mail me names of references that are readable? Thanks much. I want to emphasize that I am generally very happy with Ingres, just surprised at the behavior of this particular delete command. Eric Carleen University of Rochester Medical Center UUCP: {seismo|allegra}!rochester!ur-msbvax!edc Bitnet: heartedc@uorhbv Phone: (716)-275-5391
hoffman@hdsvx1.UUCP (Richard Hoffman) (08/26/86)
Eric Carleen at Univ. of Rochester Medical Center writes [edited slightly]: > Suppose that I have 3 tables of names, call them t1, t2, and t3. > The names in tables 2 or 3 may or may not be in t1. > I would like to delete all those records in t1 that are already listed > in the other two tables. > > If I give the command: > > delete t1 where t1.name = t2.name or t1.name = t3.name > > then I get what I was after: entries that match EITHER t2 OR t3 > are deleted. > > unless... t3 is empty. If t3 is empty, then nothing is deleted > from t1, regardless of whether or not there are any matching entries > in t2. (The work-around is obvious, but I thought unnecessary.) > > Technical support at Relational Technologies insists that this is the > correct way for a deletion to occur. And they are, unfortunately, right. This is because of the way that the relational theory (upon which Ingres is based) works. Relational theory starts with the assumption of sets, called domains, which contain all the legal values for a given attribute. For instance, the domain of sex = {male, female}. The domain of primary colors = {red, blue, yellow}. The domain of the national debt is probably the set containing all positive real numbers taken to two decimal places. Any set may be a domain. A table is defined as the subset of a cartesian product of domain sets. So consider a family whose members are listed in the domain F = {Sara, David, Jon}. Each of these members can be associated with a sex chosen from S = {M, F}. The Cartesian product of FxS is {(Sara, M), (Sara, F), (David, M), (David, F), (Jon, M), (Jon, F)}. However, a table which gave the sexes of the members of these families would be a subset of FxS, namely {(Sara, F), (David, M), (Jon, M)}. This formulation sheds light on a number of practices common in relational data bases, such as the use of domains for guaranteeing integrity (this should be required, but is often allowed to default to meaninglessly broad domains such as "all integers", "all strings", etc.) and the refusal of most RDBs to safeguard order inside a table (the table is just a set, and order is unimportant in a set). It also shows you why the Ingres way of handling your query is correct. What you are really doing when you run your query is forming a cartesian product on the sets represented by t1.name, t2.name and t3.name. Then you are removing the subset of this set in which t1.name = t2.name or t1.name = t3.name. If t3 is empty, it is an EMPTY SET! The cartestian product of any number of sets with an empty set MUST BE EMPTY! So there is nothing to remove. It's not that Ingres doesn't process your query correctly, it's just that it abandons processing as soon as it spots the empty set in the cartesian product. Sort of like a compiler whose optimizer was clever enough to spot that hundreds of operations on x followed by x = 0 meant that the hundreds of operations could be omitted. But the compiler would probably flag this, and RTI probably errs slightly by not pointing out joins involving empty sets. I'm sure you know the two work-arounds: either make t3 non-empty, with a dummy value, or make two queries, so that the emptiness of either t2 or t3 does not effect the deletions based on the other table. This last procedure will probably be more efficient anyway (even for non-empty sets) for sufficiently large tables, because the single query method will require on the order of S1xS2xS3 operations (where Sn is the size in rows of table n), while the double query method will only require on the order of S1x(S2+S3) operations, plus a couple of extraneous file opens and closes. I hear that RTI has really cleaned up their join optimizer in their most recent release, so the performance difference between the two queries may be much smaller than I report. Also, primary and secondary indexes may make a big difference. Still, if the tables are large enough to obviate the extra file opens and closes, you're always safe using the two-query method. -- Richard Hoffman | "Oh life is a wonderful cycle of song, Schlumberger Well Services | A medley of extemporanea. hoffman%hdsvx1@slb-doll.csnet | And Love is a thing that can never go wrong PO Box 2175, Houston, TX 77252 | ... And I am Marie of Roumania." --D. PARKER
avg@navajo.STANFORD.EDU (Allen Van Gelder) (09/06/86)
I disagree with Hoffman's attempted justification of the way Eric Carleen's query is handled. Having a calculus-based language loses its point if the user has to know how the system is going to translate it to relational algebra, possibly incorrectly. His query, essentially: retrieve t1 where t1.name = t2.name or t1.name = t3.name has an obvious meaning in relational calculus, and he has a right to expect the answer to be based on this meaning. The correct way to translate an "or" in a selection is as a union. This is shown in "Principles of DB Systems," by J.D. Ullman in proving that relational algebra has the same power as relational calculus. If some system chooses a different route, it is that system's responsibility to get it right. Anything else is like defining the meaning of a program to be the machine code produced by the compiler -- easy to implement, but not very useful. Maybe IBM can run this bluff, but I doubt RTI can.
rts@unc.UUCP (Rick Snodgrass) (09/08/86)
In article <819@navajo.STANFORD.EDU> avg@navajo.UUCP (Allen Van Gelder) writes: >I disagree with Hoffman's attempted justification of the way >Eric Carleen's query is handled. Having a calculus-based language >loses its point if the user has to know how the system is going >to translate it to relational algebra, possibly incorrectly. > >His query, essentially: > retrieve t1 where t1.name = t2.name or t1.name = t3.name >has an obvious meaning in relational calculus, and he has a right >to expect the answer to be based on this meaning. > >The correct way to translate an "or" in a selection is as a union. >This is shown in "Principles of DB Systems," by J.D. Ullman in >proving that relational algebra has the same power as relational >calculus. One must be careful in deciding what the meaning of a statement in any language is. The best way is to go back to the formal semantics of the language. In his book, Ullman gives the semantics for Quel (Section 6.4). For this query, it is {u | (Exists t1)(Exists t2)(Exists t3) (R(t1) and R(t2) and R(t3) and u = t1 and (t1[name] = t2[name] or t1[name] = t3[name]) ) } This semantics, which supports Hoffman's reply, is independent of the relational algebra. Rick Snodgrass Department of Computer Science, University of North Carolina at Chapel Hill
hoffman@hdsvx1.UUCP (Richard Hoffman) (09/09/86)
In article <819@navajo.STANFORD.EDU> avg@navajo.UUCP (Allen Van Gelder) writes: >I disagree with Hoffman's attempted justification of the way >Eric Carleen's query is handled. Having a calculus-based language >loses its point if the user has to know how the system is going >to translate it to relational algebra, possibly incorrectly. I wasn't attempting to justify it, merely to explain it. I agree with you that QUEL should be kinder to the non-RC-aware user; nevertheless, the story I told is the reason that Eric's query was not handled the way he expected it to be. >[Eric's] query, essentially: > retrieve t1 where t1.name = t2.name or t1.name = t3.name >has an obvious meaning in relational calculus, and he has a right >to expect the answer to be based on this meaning. >The correct way to translate an "or" in a selection is as a union. RTI implements "or" across multiple tables as a join (or, as I explained, cartesian product followed by selection). They do not implement unions at all (although I imagine that they will have to do so before they finish their fully distributed data base system). There you have part of the problem with Ingres -- it does not fully support the relational model, in that important operators such as "union" and "difference" and "divide" are missing (note that most of these can be obtained through user work-around, or multiple queries, but that's not the point). In fact, after 15 years of work and talk and usage, I believe there is still not a single commercially available RDB which successfully implements the entire relational model. After listening to a steady stream of user complaints about and problems with Ingres for almost three years, I am well aware of its many faults. However, I have yet to find a better overall alternative in terms of cost, performance, ease-of-use and maturity of the human interface. -- Richard Hoffman | "Oh life is a wonderful cycle of song, Schlumberger Well Services | A medley of extemporanea. hoffman%hdsvx1@slb-doll.csnet | And Love is a thing that can never go wrong PO Box 2175, Houston, TX 77252 | ... And I am Marie of Roumania." --D. PARKER
ark@ut-sally.UUCP (Arthur M. Keller) (09/12/86)
Query: retrieve t1 where t1.name = t2.name or t1.name = t3.name In article <316@unc.unc.UUCP> rts@unc.UUCP (Rick Snodgrass) writes: >One must be careful in deciding what the meaning of a statement in >any language is. The best way is to go back to the formal semantics >of the language. In his book, Ullman gives the semantics for Quel >(Section 6.4). For this query, it is > {u | (Exists t1)(Exists t2)(Exists t3) > (R(t1) and R(t2) and R(t3) and u = t1 > and (t1[name] = t2[name] or t1[name] = t3[name]) > ) } >This semantics, which supports Hoffman's reply, is independent of >the relational algebra. > >Rick Snodgrass >Department of Computer Science, University of North Carolina at Chapel Hill The problem is where the quantifiers go. Consider: {u | (Exists t1)(R(t1) and u = t1 and [(Exists t2) (S(t2) and t1[name]=t2[name]) or (Exists t3) (T(t3) and t1[name]=t3[name])])} This query has ordinary union semantics, even if either S or T is empty. It is not clear that when given a quantifier-free expression that quantifiers should always be introduced at the beginning. Arthur -- ------------------------------------------------------------------------------ Arpanet: ARK@SALLY.UTEXAS.EDU UUCP: {gatech,harvard,ihnp4,pyramid,seismo}!ut-sally!ark
rts@unc.UUCP (Rick Snodgrass) (09/15/86)
In article <5717@ut-sally.UUCP> ark@sally.UUCP (Arthur M. Keller) writes: >The problem is where the quantifiers go. Consider: > >{u | (Exists t1)(R(t1) and u = t1 and > [(Exists t2) (S(t2) and t1[name]=t2[name]) or > (Exists t3) (T(t3) and t1[name]=t3[name])])} > >This query has ordinary union semantics, even if either S or T is >empty. It is not clear that when given a quantifier-free expression >that quantifiers should always be introduced at the beginning. The Quel semantics states very clearly that quantifiers, one per tuple variable, are to be introduced at the beginning--see (6.5) on page 190 in Ullman's book. Allowing them to be put in arbitrary positions may solve this one confusion but introduces others, and is counter to the design of Quel.
aubery@oracle.UUCP (Eric aubery) (09/15/86)
In article <595@ur-tut.UUCP> ecec@ur-tut.UUCP (Eric Carleen) writes: > > >I've found that apparently I have a severe misunderstanding of >the use of a relational data base, specifically Ingres. I'm sending >out a summary of my question because I've found that a number of >other people have the same misunderstanding. > >Suppose that I have 3 tables of names, call them table1, table2, and >table3. The names in tables 2 or 3 may or may not be in table1. I >would like to delete all those records in table1 that are already listed >in the other two tables. > >If I give the command: > > delete table1 where table1.name = table2.name > or table1.name = table3.name > >then I get what I was after: entries that match EITHER table2 OR table3 >are deleted. > >unless... table3 is empty. If table3 is empty, then nothing is deleted >from table1, regardless of whether or not there are any matching entries >in table2. (The work-around is obvious, but I thought unnecessary.) > Here is one way to specify the delete in question in Oracle, and presumably with any SQL implementation. delete from table1 where name in ( select table1.name from table1,table2 where table1.name = table2.name union select table1.name from table1,table3 where table1.name = table3.name ) -- Eric N. Aubery ORACLE Corporation (415)598-8022 Belmont, California hplabs!oracle!aubery