[comp.databases] ORACLE SQL subqueries performance

egn@laidbak.UUCP (E. G. Nadhan) (04/28/89)

	I came across an article in the March issue of DATABASE Programming
and Design which discussed the performance advantages of using a 
correlated subquery using the EXISTS clause over the IN,ANY or ALL clauses
in the ORACLE environment.

	The gist of the article is given below:

	Case 1:		SELECT X FROM TABLEA 
			WHERE X IN
				(SELECT Y FROM TABLEB
				 WHERE (condition)
				)

	Case 2:		SELECT X FROM TABLEA
			WHERE EXISTS
				(SELECT * FROM TABLEB
				 WHERE X = TABLEB.Y
				 AND (condition)
				)

	In the first case, ORACLE executes the subquery once and 
creates a temporary table with no indexes and compares each row of the 
temporary table with each of the rows generated from the main query.

	In the second case, the subquery is executed once for each of 
the rows generated from the main query.  No temporary table is created.
All indexes defined by the user are available during execution of these
queries.

	Given this, Case 2 is much faster than Case 1.

	I was wondering if anybody out there knew the manner in which SQL
has been implemented in the other RDBMS's around -- UNIFY, INFORMIX, INGRES etc.

	Certainly seems like this would be the case in all these packages
unless of course, some of them provide the capability of creating an 
index for the temporary table too.

Thanks,

E.G.Nadhan

{amdahl|att|cbosgd|spl1|sun|uwmcsd1|yclept|nucsrl} !laidbak!egn

jkrueger@daitc.daitc.mil (Jonathan Krueger) (05/03/89)

In article <2285@laidbak.UUCP>, egn@laidbak (E. G. Nadhan) writes:
>	I came across an article in the March issue of DATABASE Programming
>and Design which discussed the performance advantages of using a 
>correlated subquery using the EXISTS clause over the IN,ANY or ALL clauses
>in the ORACLE environment.
>
>	The gist of the article is given below:
>
>	Case 1:		SELECT X FROM TABLEA 
>			WHERE X IN
>				(SELECT Y FROM TABLEB
>				 WHERE (condition)
>				)
>
>	Case 2:		SELECT X FROM TABLEA
>			WHERE EXISTS
>				(SELECT * FROM TABLEB
>				 WHERE X = TABLEB.Y
>				 AND (condition)
>				)
>
>	In the first case, ORACLE executes the subquery once and 
>creates a temporary table with no indexes and compares each row of the 
>temporary table with each of the rows generated from the main query.
>
>	In the second case, the subquery is executed once for each of 
>the rows generated from the main query.  No temporary table is created.
>All indexes defined by the user are available during execution of these
>queries.

If you convolute your code to get vendor-specific, version-specific
performance, at the cost of clean, maintainable, reliable code, why
stop at poorly expressed queries?  Why not hardwire your applications
to the tracks and cylinders on which your data currently reside?  Hey,
another article in Database Programming and Design laments SQL's
"inability" to specify this.  And why not tie your application to a
particular instruction set, a specific operating system, and within
that, a nontrivial and vendor-specific set of conventions for
navigating a shared address space with varying degrees of unsafety?
Yes, if you want CICS, you know where to find it.

>[How do other DBMS's handle this]

INGRES [5.0/05a (pyr.u42/04)] generates the following search strategy:

                        Join(XY)(CO)
                     /             \
                 Proj-rest         TABLEA
               /
          TABLEB

For which "case" does INGRES choose this strategy?  Either.  INGRES
has a query optimizer which recognizes that the two "cases" are
identities.

I would be interested in what vendor products fail to do this.  Is it
true that Oracle is thus handicapped?  Have you timed the two "cases"
on Oracle and come up with different performance results?  If so, with
what version, on what platform?

-- Jon
--

egn@laidbak.UUCP (E. G. Nadhan) (05/03/89)

In article <503@daitc.daitc.mil> jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:

>I would be interested in what vendor products fail to do this.  Is it
>true that Oracle is thus handicapped?  Have you timed the two "cases"
>on Oracle and come up with different performance results?  If so, with
>what version, on what platform?
>
>-- Jon
>-- 

	Like I mentioned earlier, I did not perform this tests but was quoting
from an article in Database Programming and Design.

	To answer your question, I again quote from the article:

	The two cases that the article has taken into consideration are:

		Case 1:	SELECT staff_num,job_num,job_phase
			FROM   Time
			WHERE (Job_Num,Job_Phase)
				NOT IN 
				(SELECT Job_Num,Job_Phase
				 FROM Phase
				)
			;

		Case 2:	SELECT Staff_Num,Job_Num,Job_Phase
			FROM Time
			WHERE NOT EXISTS
				(SELECT 'X'
					FROM Phase
					WHERE Time.Job_Num = Phase.Job_Num
					AND   Time.Job_Phase = Phase.Job_Phase
				)
			;

Case 1 Performance figures:
===========================

	On an IBM 4331(small mainframe) in a VM/CMS environment, with no other
	users on the system, the Case 1 query processed for 30 minutes,
	25 seconds, selecting 19 rows.  With other users active, the 
	average time to complete was 37 minutes.  On the PS/2, the query took
	six minutes 34 seconds.  

Case 1 Performance figures:
===========================

	On the IBM 4331, the Case 2 query selected the identical 19 rows in one
	minute, five seconds -- 34 times faster than the Case 1 query.  When
	executed on the PS/2, this query finished in 13 seconds, a 30-fold
	improvement.



E.G.Nadhan 

{amdahl|att|cbosgd|spl1|sun|uwmcsd1|yclept|nucsrl} !laidbak!egn