egn@laidbak.UUCP (E. G. Nadhan) (04/28/89)
I came across an article in the March issue of DATABASE Programming and Design which discussed the performance advantages of using a correlated subquery using the EXISTS clause over the IN,ANY or ALL clauses in the ORACLE environment. The gist of the article is given below: Case 1: SELECT X FROM TABLEA WHERE X IN (SELECT Y FROM TABLEB WHERE (condition) ) Case 2: SELECT X FROM TABLEA WHERE EXISTS (SELECT * FROM TABLEB WHERE X = TABLEB.Y AND (condition) ) In the first case, ORACLE executes the subquery once and creates a temporary table with no indexes and compares each row of the temporary table with each of the rows generated from the main query. In the second case, the subquery is executed once for each of the rows generated from the main query. No temporary table is created. All indexes defined by the user are available during execution of these queries. Given this, Case 2 is much faster than Case 1. I was wondering if anybody out there knew the manner in which SQL has been implemented in the other RDBMS's around -- UNIFY, INFORMIX, INGRES etc. Certainly seems like this would be the case in all these packages unless of course, some of them provide the capability of creating an index for the temporary table too. Thanks, E.G.Nadhan {amdahl|att|cbosgd|spl1|sun|uwmcsd1|yclept|nucsrl} !laidbak!egn
jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/03/89)
In article <2285@laidbak.UUCP>, egn@laidbak (E. G. Nadhan) writes: > I came across an article in the March issue of DATABASE Programming >and Design which discussed the performance advantages of using a >correlated subquery using the EXISTS clause over the IN,ANY or ALL clauses >in the ORACLE environment. > > The gist of the article is given below: > > Case 1: SELECT X FROM TABLEA > WHERE X IN > (SELECT Y FROM TABLEB > WHERE (condition) > ) > > Case 2: SELECT X FROM TABLEA > WHERE EXISTS > (SELECT * FROM TABLEB > WHERE X = TABLEB.Y > AND (condition) > ) > > In the first case, ORACLE executes the subquery once and >creates a temporary table with no indexes and compares each row of the >temporary table with each of the rows generated from the main query. > > In the second case, the subquery is executed once for each of >the rows generated from the main query. No temporary table is created. >All indexes defined by the user are available during execution of these >queries. If you convolute your code to get vendor-specific, version-specific performance, at the cost of clean, maintainable, reliable code, why stop at poorly expressed queries? Why not hardwire your applications to the tracks and cylinders on which your data currently reside? Hey, another article in Database Programming and Design laments SQL's "inability" to specify this. And why not tie your application to a particular instruction set, a specific operating system, and within that, a nontrivial and vendor-specific set of conventions for navigating a shared address space with varying degrees of unsafety? Yes, if you want CICS, you know where to find it. >[How do other DBMS's handle this] INGRES [5.0/05a (pyr.u42/04)] generates the following search strategy: Join(XY)(CO) / \ Proj-rest TABLEA / TABLEB For which "case" does INGRES choose this strategy? Either. INGRES has a query optimizer which recognizes that the two "cases" are identities. I would be interested in what vendor products fail to do this. Is it true that Oracle is thus handicapped? Have you timed the two "cases" on Oracle and come up with different performance results? If so, with what version, on what platform? -- Jon --
egn@laidbak.UUCP (E. G. Nadhan) (05/03/89)
In article <503@daitc.daitc.mil> jkrueger@daitc.daitc.mil (Jonathan Krueger) writes: >I would be interested in what vendor products fail to do this. Is it >true that Oracle is thus handicapped? Have you timed the two "cases" >on Oracle and come up with different performance results? If so, with >what version, on what platform? > >-- Jon >-- Like I mentioned earlier, I did not perform this tests but was quoting from an article in Database Programming and Design. To answer your question, I again quote from the article: The two cases that the article has taken into consideration are: Case 1: SELECT staff_num,job_num,job_phase FROM Time WHERE (Job_Num,Job_Phase) NOT IN (SELECT Job_Num,Job_Phase FROM Phase ) ; Case 2: SELECT Staff_Num,Job_Num,Job_Phase FROM Time WHERE NOT EXISTS (SELECT 'X' FROM Phase WHERE Time.Job_Num = Phase.Job_Num AND Time.Job_Phase = Phase.Job_Phase ) ; Case 1 Performance figures: =========================== On an IBM 4331(small mainframe) in a VM/CMS environment, with no other users on the system, the Case 1 query processed for 30 minutes, 25 seconds, selecting 19 rows. With other users active, the average time to complete was 37 minutes. On the PS/2, the query took six minutes 34 seconds. Case 1 Performance figures: =========================== On the IBM 4331, the Case 2 query selected the identical 19 rows in one minute, five seconds -- 34 times faster than the Case 1 query. When executed on the PS/2, this query finished in 13 seconds, a 30-fold improvement. E.G.Nadhan {amdahl|att|cbosgd|spl1|sun|uwmcsd1|yclept|nucsrl} !laidbak!egn