poage@sunny.UUCP (Tom Poage) (11/06/88)
I'm writing a number of SQL scripts to perform queries involving multiple tables. My question is this: Is the order of entries in the "select", "from", and "where" clauses important in terms of performance? For example, select a,b,c,d,x,y,z,... from t1,t2,t3,... where t1.a = t2.a and t1.x = t2.x and z in <1,2,4,10> and ... / If table t1 is the primary table from which I retrieve references to entries in other tables, should it be located first in the "from" clause? Is order really important for table names? Further, can candidate selections can be eliminated "earlier" in the qualification process by placing certain comparisons before others in the where clause? I.e., reduce the number of comparisons before deciding on rejection based on my knowledge of the data distribution. I assume that order should, in a theoretical sense anyway, make no difference in this type of HLL. However, you never know.... Any hints, references, etc. are appreciated. Tom. -- Tom Poage, UCDMC Clinical Engineering, Sacto., CA ucdavis.ucdavis.edu!sunny!{poage,root,postmaster,news} ucbvax!ucdavis!sunny!{poage,root,postmaster,news}
jkrueger@daitc.daitc.mil (Jonathan Krueger) (11/07/88)
In article <344@sunny.UUCP>, poage@sunny (Tom Poage) writes: > Is the order of entries in the "select", "from", and > "where" clauses important in terms of performance? If order affects performance, your system lacks a query optimizer. I've heard there are still vendor products for which this is true. Empirical testing can reveal such shoddy products. Lack of evidence does not constitute an endorsement, however. The following are two QUEL queries put to RTI INGRES to answer the question "who among us can afford his preferred style of wallpaper?": range of w is wallpaper /* styles and costs */ range of p is prefer /* people's style preferences */ range of b is budgets /* names of budgets and balences in them */ range of a is authorized /* who's authorized to draw from what budget */ /* one way to put the query */ retrieve (b.account, w.cost, w.color, p.person) where b.account = a.account and w.cost <= b.balence and w.color = p.color /* semantically identical way (produces identical result relation) */ retrieve (w.color, w.cost, p.person, b.account) where w.cost <= b.balence and b.account = a.account and w.color = p.color As you see, order of qualifications, including those for joins, is varied. Order of columns named in the target list is varied, also including those later joined on. (Translate to SQL if you like, it doesn't make any difference.) Results look like this (abbreviated): |color |cost |person |accoun| |----------------------------------------------------------------| |blue | $10.00|bill |disc | |blue | $10.00|darryl |disc | |----------------------------------------------------------------| |accoun|cost |color |person | |----------------------------------------------------------------| |disc | $10.00|blue |bill | |disc | $10.00|blue |darryl | |----------------------------------------------------------------| In other words, go for the cheap blue stuff, then you can hide it in your discretionary account :-) As an unsupported feature, RTI offers a map of its query execution plan (QEP). This shows in what order it plans to perform the search. The QEP for the query above is shown below. Which query? Doesn't matter. Order of searching is quite independent of ordering of table and column names in the two queries. This confirms that RTI has a query optimizer. If your product doesn't provide a way to display QEP's or their ilk, you can measure cpu time required. You'll have to establish a "fuzz factor", a threshold for differences smaller than which are not considered significant. You'll find this a tedious process, not nearly as straightforward as saving QEP's to files and diff'ing them. The QEP generated for either query is: Join(account) Sorted(account) Pages 1 Tups 25 D12 C0 / \ Proj-rest Sort(account) Sorted(account) Pages 1 Tups 25 Pages 1 Tups 8 D8 C0 D4 C0 / / authorized Cart-Prod B-Tree(account) Heap Pages 4 Tups 8 Pages 1 Tups 25 D8 C0 / \ Join(color) budgets Heap Heap Pages 1 Tups 5 Pages 1 Tups 5 D3 C0 / \ Proj-rest Sort(color) Heap Pages 1 Tups 5 Pages 1 Tups 5 D1 C0 D2 C0 / / wallpaper Proj-rest Heap Heap Pages 2 Tups 5 Pages 1 Tups 5 D1 C0 / prefer Heap Pages 1 Tups 5 You read this from botttom to top. E.g. the first thing it plans to do is a projection-restriction on prefer (w.color = p.color) before jointing prefer to wallpaper. The last thing it plans to do is join budgets to authorized account on account. "Heap" is the unkeyed, unindexed storage structure of the table; i.e., not B-Tree, ISAM, or Hash. Note therefore that I'm doing a three-way join, two equijoins and a reljoin, on columns not understood to be primary or secondary keys. Simpler tasks have more readable QEP's. But then they wouldn't be interesting enough to vary order in. -- Jon --
moiram@tekcae.CAX.TEK.COM (Moira Mallison) (11/10/88)
>In article <344@sunny.UUCP>, poage@sunny (Tom Poage) writes: >> Is the order of entries in the "select", "from", and >> "where" clauses important in terms of performance? Jon Krueger responds with a good explanation of QEPs. And what he states about order not making a difference is right most of the time (> 95%), but we had a documented case in which the ordering of operands made a significant difference as measured in elapsed time to run the query. As I recall, the query was a complex join on multiple relations. Our "naive" user didn't know that ordering makes no difference (:-) and switched the operands on either side of the equal sign. That is, changed "where a = b" to "where b = a". We carefully analyzed the resulting QEPs and could find no clues. The QEPs are good performance analysis tools and will most frequently point our where the DBA can make improvements in storage structures or data base design. But if the query optimizer is making a really bad choice, it wouldn't hurt to change the order of operands. Moira Mallison CAX Data Management Tektronix, Inc
allbery@ncoast.UUCP (Brandon S. Allbery) (11/14/88)
As quoted from <344@sunny.UUCP> by poage@sunny.UUCP (Tom Poage): +--------------- | Is the order of entries in the "select", "from", and | "where" clauses important in terms of performance? | | For example, | | select a,b,c,d,x,y,z,... | from t1,t2,t3,... | where t1.a = t2.a | and t1.x = t2.x | and z in <1,2,4,10> | and ... | / +--------------- I don't know what the ANSI standard says, but it's moot: Unify SQL makes no claim of ANSI compliance. I've never seen any effect of ordering in Unify SQL. (I *have* seen an effect in Informix-SQL: "where" clauses seem to be executed in "bottom-up" order of conjunctions.) +--------------- | names? Further, can candidate selections can be eliminated | "earlier" in the qualification process by placing certain | comparisons before others in the where clause? I.e., reduce the | number of comparisons before deciding on rejection based on my | knowledge of the data distribution. +--------------- In theory, the query optimizer is better able to judge the distribution of data then you are, assuming that commonly-used join fields are indexed (note that explicit relationships and hashed primary keys count as a form of "indexing"). In practice, Unify does a pretty good job no matter what the order of conjunctions in a "where" clause is. +--------------- | I assume that order should, in a theoretical sense anyway, | make no difference in this type of HLL. However, you | never know.... +--------------- Theoretical, h*ll. If an SQL doesn't optimize the WHERE clause properly, it's broken. ++Brandon -- Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X uunet!hal.cwru.edu!ncoast!allbery <PREFERRED!> ncoast!allbery@hal.cwru.edu allberyb@skybridge.sdi.cwru.edu <ALSO> allbery@uunet.uu.net comp.sources.misc is moving off ncoast -- please do NOT send submissions direct Send comp.sources.misc submissions to comp-sources-misc@<backbone>.
allbery@ncoast.UUCP (Brandon S. Allbery) (11/14/88)
As quoted from <13087@ncoast.UUCP> by allbery@ncoast.UUCP (Brandon S. Allbery): +--------------- | I've never seen any effect of ordering in Unify SQL. (I *have* seen an | effect in Informix-SQL: "where" clauses seem to be executed in "bottom-up" | order of conjunctions.) +--------------- I should point out that the version of Informix-SQL where this was observed is neither current nor even particularly recent. It *is* one of the 2.x releases, though. I have no idea how well more recent versions optimize. ++Brandon -- Brandon S. Allbery, comp.sources.misc moderator and one admin of ncoast PA UN*X uunet!hal.cwru.edu!ncoast!allbery <PREFERRED!> ncoast!allbery@hal.cwru.edu allberyb@skybridge.sdi.cwru.edu <ALSO> allbery@uunet.uu.net comp.sources.misc is moving off ncoast -- please do NOT send submissions direct Send comp.sources.misc submissions to comp-sources-misc@<backbone>.