[comp.databases] Is order important in SQL?

poage@sunny.UUCP (Tom Poage) (11/06/88)

I'm writing a number of SQL scripts to perform queries
involving multiple tables.  My question is this:

	Is the order of entries in the "select", "from", and 
	"where" clauses important in terms of performance?

For example,

	select a,b,c,d,x,y,z,...
	from t1,t2,t3,...
	where t1.a = t2.a
		and t1.x = t2.x
		and z in <1,2,4,10>
		and ...
	/

If table t1 is the primary table from which I retrieve
references to entries in other tables, should it be located
first in the "from" clause?  Is order really important for table
names?  Further, can candidate selections can be eliminated
"earlier" in the qualification process by placing certain
comparisons before others in the where clause?  I.e., reduce the
number of comparisons before deciding on rejection based on my
knowledge of the data distribution.

I assume that order should, in a theoretical sense anyway,
make no difference in this type of HLL.  However, you
never know....

Any hints, references, etc. are appreciated.  Tom.
-- 

Tom Poage, UCDMC Clinical Engineering, Sacto., CA
ucdavis.ucdavis.edu!sunny!{poage,root,postmaster,news}
ucbvax!ucdavis!sunny!{poage,root,postmaster,news}

jkrueger@daitc.daitc.mil (Jonathan Krueger) (11/07/88)

In article <344@sunny.UUCP>, poage@sunny (Tom Poage) writes:
>	Is the order of entries in the "select", "from", and 
>	"where" clauses important in terms of performance?

If order affects performance, your system lacks a query optimizer.
I've heard there are still vendor products for which this is true.
Empirical testing can reveal such shoddy products.  Lack of evidence
does not constitute an endorsement, however.

The following are two QUEL queries put to RTI INGRES to answer the
question "who among us can afford his preferred style of wallpaper?":

range of w is wallpaper		/* styles and costs */
range of p is prefer		/* people's style preferences */
range of b is budgets		/* names of budgets and balences in them */
range of a is authorized	/* who's authorized to draw from what budget */

/* one way to put the query */
retrieve
	(b.account, w.cost, w.color, p.person)
where
	b.account = a.account and
	w.cost <= b.balence and
	w.color = p.color

/* semantically identical way (produces identical result relation) */
retrieve
	(w.color, w.cost, p.person, b.account)
where
	w.cost <= b.balence and
	b.account = a.account and
	w.color = p.color

As you see, order of qualifications, including those for joins, is
varied.  Order of columns named in the target list is varied, also
including those later joined on.  (Translate to SQL if you like, it
doesn't make any difference.)  Results look like this (abbreviated):

|color          |cost                |person              |accoun|
|----------------------------------------------------------------|
|blue           |              $10.00|bill                |disc  |
|blue           |              $10.00|darryl              |disc  |
|----------------------------------------------------------------|

|accoun|cost                |color          |person              |
|----------------------------------------------------------------|
|disc  |              $10.00|blue           |bill                |
|disc  |              $10.00|blue           |darryl              |
|----------------------------------------------------------------|

In other words, go for the cheap blue stuff, then you can hide it in
your discretionary account :-)

As an unsupported feature, RTI offers a map of its query execution
plan (QEP).  This shows in what order it plans to perform the search.
The QEP for the query above is shown below.  Which query?  Doesn't
matter.  Order of searching is quite independent of ordering of table
and column names in the two queries.  This confirms that RTI has a
query optimizer.

If your product doesn't provide a way to display QEP's or their ilk,
you can measure cpu time required.  You'll have to establish a "fuzz
factor", a threshold for differences smaller than which are not
considered significant.  You'll find this a tedious process, not
nearly as straightforward as saving QEP's to files and diff'ing them.

The QEP generated for either query is:

                        Join(account)
                        Sorted(account)
                        Pages 1 Tups 25
                        D12 C0
             /                                  \
            Proj-rest                           Sort(account)
            Sorted(account)                     Pages 1 Tups 25
            Pages 1 Tups 8                      D8 C0
            D4 C0
 /                                   /
authorized                          Cart-Prod
B-Tree(account)                     Heap
Pages 4 Tups 8                      Pages 1 Tups 25
                                    D8 C0
                         /                      \
                        Join(color)             budgets     
                        Heap                    Heap
                        Pages 1 Tups 5          Pages 1 Tups 5
                        D3 C0
             /                      \
            Proj-rest               Sort(color)
            Heap                    Pages 1 Tups 5
            Pages 1 Tups 5          D1 C0
            D2 C0
 /                       /
wallpaper               Proj-rest
Heap                    Heap
Pages 2 Tups 5          Pages 1 Tups 5
                        D1 C0
             /
            prefer      
            Heap
            Pages 1 Tups 5


You read this from botttom to top.  E.g. the first thing it plans to
do is a projection-restriction on prefer (w.color = p.color) before
jointing prefer to wallpaper.  The last thing it plans to do is join
budgets to authorized account on account.  "Heap" is the unkeyed,
unindexed storage structure of the table; i.e., not B-Tree, ISAM, or
Hash.  Note therefore that I'm doing a three-way join, two equijoins
and a reljoin, on columns not understood to be primary or secondary
keys.  Simpler tasks have more readable QEP's.  But then they wouldn't
be interesting enough to vary order in.

-- Jon
-- 

moiram@tekcae.CAX.TEK.COM (Moira Mallison) (11/10/88)

>In article <344@sunny.UUCP>, poage@sunny (Tom Poage) writes:
>>	Is the order of entries in the "select", "from", and 
>>	"where" clauses important in terms of performance?

Jon Krueger responds with a good explanation of QEPs.  And what
he states about order not making a difference is right most of
the time (> 95%), but we had a documented case in which the
ordering of operands made a significant difference as measured
in elapsed time to run the query.

As I recall, the query was a complex join on multiple relations.
Our "naive" user didn't know that ordering makes no difference (:-)
and switched the operands on either side of the equal sign.  That
is,  changed "where a = b" to "where b = a".  We carefully analyzed
the resulting QEPs and could find no clues.

The QEPs are good performance analysis tools and will most frequently
point our where the DBA can make improvements in storage structures
or data base design.  But if the query optimizer is making a really
bad choice, it wouldn't hurt to change the order of operands.  

Moira Mallison
CAX Data Management
Tektronix, Inc

allbery@ncoast.UUCP (Brandon S. Allbery) (11/14/88)

As quoted from <344@sunny.UUCP> by poage@sunny.UUCP (Tom Poage):
+---------------
| 	Is the order of entries in the "select", "from", and 
| 	"where" clauses important in terms of performance?
| 
| For example,
| 
| 	select a,b,c,d,x,y,z,...
| 	from t1,t2,t3,...
| 	where t1.a = t2.a
| 		and t1.x = t2.x
| 		and z in <1,2,4,10>
| 		and ...
| 	/
+---------------

I don't know what the ANSI standard says, but it's moot:  Unify SQL makes no
claim of ANSI compliance.

I've never seen any effect of ordering in Unify SQL.  (I *have* seen an
effect in Informix-SQL:  "where" clauses seem to be executed in "bottom-up"
order of conjunctions.)

+---------------
| names?  Further, can candidate selections can be eliminated
| "earlier" in the qualification process by placing certain
| comparisons before others in the where clause?  I.e., reduce the
| number of comparisons before deciding on rejection based on my
| knowledge of the data distribution.
+---------------

In theory, the query optimizer is better able to judge the distribution of
data then you are, assuming that commonly-used join fields are indexed (note
that explicit relationships and hashed primary keys count as a form of
"indexing").  In practice, Unify does a pretty good job no matter what the
order of conjunctions in a "where" clause is.

+---------------
| I assume that order should, in a theoretical sense anyway,
| make no difference in this type of HLL.  However, you
| never know....
+---------------

Theoretical, h*ll.  If an SQL doesn't optimize the WHERE clause properly,
it's broken.

++Brandon
-- 
Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X
uunet!hal.cwru.edu!ncoast!allbery  <PREFERRED!>	    ncoast!allbery@hal.cwru.edu
allberyb@skybridge.sdi.cwru.edu	      <ALSO>		   allbery@uunet.uu.net
comp.sources.misc is moving off ncoast -- please do NOT send submissions direct
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>.

allbery@ncoast.UUCP (Brandon S. Allbery) (11/14/88)

As quoted from <13087@ncoast.UUCP> by allbery@ncoast.UUCP (Brandon S. Allbery):
+---------------
| I've never seen any effect of ordering in Unify SQL.  (I *have* seen an
| effect in Informix-SQL:  "where" clauses seem to be executed in "bottom-up"
| order of conjunctions.)
+---------------

I should point out that the version of Informix-SQL where this was observed
is neither current nor even particularly recent.  It *is* one of the 2.x
releases, though.  I have no idea how well more recent versions optimize.

++Brandon
-- 
Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X
uunet!hal.cwru.edu!ncoast!allbery  <PREFERRED!>	    ncoast!allbery@hal.cwru.edu
allberyb@skybridge.sdi.cwru.edu	      <ALSO>		   allbery@uunet.uu.net
comp.sources.misc is moving off ncoast -- please do NOT send submissions direct
      Send comp.sources.misc submissions to comp-sources-misc@<backbone>.