[comp.databases] optimising sql

robf@squid.rtech.com (Robert Fair) (02/03/90)

>From: may@28182.dec.com (Patrick May) writes
>>In article <1990Jan31.010510.15242@welch.jhu.edu>, francois@welch.jhu.edu (Francois Schiettecatte) writes...
>>Does anyone out there know of any books which might help
>>me to optimise sql statements which are to be sent to a 
>
>An issue of _Database Programming and Design_ sometime around October or
>so 1988 had an article on this.  I'll try and dig up a more precise
>reference.
>
>Their major points concerned avoiding the use of IN, ANY, and ALL in the
>WHERE clause -- the same functionality can be achieved with EXISTS and
>at the same time force the use of an index if available.  There were
>also some suggestions on ordering the table names in the FROM clause,
>but these were more implementation dependent.

1) The order in the FROM clause should make NO DIFFERENCE to the query plan
   generated if the DBMS has any kind of decent optimizer. Having said that,
   some DBMS products today still don't have proper optimizers, so you
   may still be forced to munge the FROM clause ordering to get decent
   performance.

2) IN, ANY, and ALL are indeed equivalent to EXISTS in many cases, but that
   doesn't mean you shouldn't use them. If the DBMS has a good optimizer,
   equivalent expressions should usually generate very similar
   query plans. It sounds as though the article was written from experience
   with a DBMS whose optimizer was only capable of handling EXISTS
   effectively.
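
As a small illustration (the table and column names are invented for the
example), here is the same restriction written with IN and with a
correlated EXISTS:

    SELECT e.name
    FROM   emp e
    WHERE  e.dept_no IN
           (SELECT d.dept_no
            FROM   dept d
            WHERE  d.location = 'NEW YORK');

    SELECT e.name
    FROM   emp e
    WHERE  EXISTS
           (SELECT *
            FROM   dept d
            WHERE  d.dept_no  = e.dept_no
            AND    d.location = 'NEW YORK');

A good optimizer should come up with much the same plan for both; with a
weak one, the second form may be the only way to get an index on
dept.dept_no used.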

So, look for a product which has a smart optimizer and save yourself
a lot of hassle. Things to look for include:

- A statistical optimizer which has data distribution information
  available for smart use of indexes.
   
- Flattening subqueries into joins automatically when appropriate (see
  the example after this list)

- Multiple join strategies (tuple substitution, sort-merge, etc.)

- A cost-based method which accounts for factors like CPU overhead
  and disk I/O when choosing the best query plan.
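
As an example of the flattening point (again, the names are invented),
the two statements below ask the same question, and a smart optimizer
should derive the second, join form from the first by itself - provided
cust_no is unique in customer, so that the join cannot introduce
duplicate rows:

    SELECT o.order_no
    FROM   orders o
    WHERE  o.cust_no IN
           (SELECT c.cust_no
            FROM   customer c
            WHERE  c.region = 'WEST');

    SELECT o.order_no
    FROM   orders o, customer c
    WHERE  o.cust_no = c.cust_no
    AND    c.region  = 'WEST';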

If you have a distributed database (spread over several different machines),
the optimizer should also be capable of allowing for factors like
network overhead.

Given a smart optimizer which turns your SQL into an intelligent query
plan, a more important issue becomes your database schema (layout) -
do you have appropriate indexes/storage structures on columns which can use
them? Is the data normalized appropriately? And so on. If you have a huge
database with absolutely no indexes, the smartest optimizer in the world
won't help (although it'll find the best way of getting the result
without using any indexes!)
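
A trivial example of the "appropriate indexes" point (names invented;
index and date-literal syntax vary between products): if a large table
is restricted and joined on the same columns hundreds of times a day,
give the optimizer something to work with:

    SELECT e.name, d.dept_name
    FROM   emp e, dept d
    WHERE  e.dept_no   = d.dept_no
    AND    e.hire_date > '01-jan-1989';

    CREATE INDEX emp_dept_ix ON emp (dept_no);
    CREATE INDEX emp_hire_ix ON emp (hire_date);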

Robert L. Fair
Ingres Corporation
Technical Support Group.

jkrueger@dev.dtic.dla.mil (Jonathan Krueger) (02/05/90)

francois@welch.jhu.edu (Francois Schiettecatte) writes:

>Does anyone out there know of any books which might help
>me to optimise sql statements which are to be sent to a 
>database. What I really need are a set of guidelines, of
>dos and don'ts, of what type of query to use, etc.

It isn't this simple.

First design your database.  There are books on this.  If you do it
right, you'll find most commonly executed queries can be made to go
fast without special attention from the application programmer.

Next get the right performance analysis tools.  At a minimum, you need
a way to measure how expensive your queries are and a way to find out
how your engine is executing them.  Use these tools.  You'll likely
find that the queries you thought were problems can indeed be made to
go faster, but that it won't help your application nearly as much as
optimizing others you never considered.  This may lead you back to
database design again.
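
The exact tools are product-specific, but most engines give you some way
of displaying the plan they intend to use and some way of timing or
tracing a statement.  Purely as a sketch (the plan-display statement
differs by product - EXPLAIN, EXPLAIN PLAN, SET QEP, and so on - and the
table names here are invented):

    EXPLAIN
    SELECT e.name
    FROM   emp e, dept d
    WHERE  e.dept_no = d.dept_no
    AND    e.salary  > 50000;

Run the plan display before and after each change; it is the plan, not
the SQL text, that tells you whether the engine did what you hoped.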

Then, and only then, is it worth playing games with "exists" versus
"in".  Yes, certain families of queries will be poorly optimized across
broad classes of vendor products and versions.  You're unlikely to
learn which ones in advertiser rags like "Database Programming and
Design".

-- Jon
-- 
Jonathan Krueger    jkrueger.dtic.dla.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?

nigelc@cognos.UUCP (Nigel Campbell) (02/12/90)

From experience with Interbase, DG SQL, VAX Rdb, and a little Oracle,
I would strongly recommend that you go on the vendors' courses.  There
are some very subtle features to each query optimizer that a generic
SQL book cannot hope to bring out.
 

E.g. does the optimizer process multiple indexes in parallel?  Does it
     place more significance on a unique versus a repeating (non-unique)
     index?  Does an index segment with a null value affect the
     decision?  These questions and many more cannot be found in a
     generic SQL book.
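
A concrete case of the unique-versus-repeating point (names invented):
the two indexes below describe very different data, and a good optimizer
should exploit that difference:

    CREATE UNIQUE INDEX cust_no_ix ON customer (cust_no);
    CREATE INDEX cust_rgn_ix ON customer (region);

    SELECT * FROM customer WHERE cust_no = 1042;
    SELECT * FROM customer WHERE region = 'WEST';

The first lookup can return at most one row, so the index is almost
always worth using; whether the second index helps depends entirely on
how many customers fall in 'WEST', which is exactly the sort of thing
the vendor course (or the product's plan display) will explain.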
 

The major success lies in investing the time to get a correct logical
entity-relationship model (ERM) that is then transposed into a
normalised database design.  It is also worthwhile determining what the
nature and volume of the data will be, as this can have a long-term
impact on the performance of the system.  Many sites I have visited have
not considered archival of data, or, due to a poor design, have not
investigated the life cycle of an entity and the impact of its death on
others.  This usually leads to unwanted data being left in tables.
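
For instance (tables invented): if an order "dies" but nobody thought
through what happens to its detail rows, they quietly accumulate.  Once
the life cycle is understood, the clean-up is straightforward:

    DELETE FROM order_line
    WHERE  NOT EXISTS
           (SELECT *
            FROM   orders o
            WHERE  o.order_no = order_line.order_no);

Better still, design the archival step so that the rows never become
orphans in the first place.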

sweeneyd@handel.CS.ColoState.Edu (daniel j sweeney) (02/12/90)

In article <3601@dev.dtic.dla.mil> jkrueger@dev.dtic.dla.mil (Jonathan Krueger) writes:
>francois@welch.jhu.edu (Francois Schiettecatte) writes:
>
>>Does anyone out there know of any books which might help
>>me to optimise sql statements which are to be sent to a 
>>database. What I really need are a set of guidelines, of
>>dos and don'ts, of what type of query to use, etc.
>
>It isn't this simple.

    [some of the response deleted]

>Next get the right performance analysis tools.  At a minimum, you need
>a way to measure how expensive your queries are and a way to find out
>how your engine is executing them.  Use these tools.  You'll likely
>find that the queries you thought were problems can indeed be made to
>go faster, but that it won't help your application nearly as much as
>optimizing others you never considered.  This may lead you back to
>database design again.

    [some of the response deleted]

>Jonathan Krueger    jkrueger.dtic.dla.mil   uunet!dgis!jkrueger


I'm a student at CSU taking my first database class.  I recently read
an article in ACM Transactions on Database Systems, March 1989, entitled:

"On Estimating the Cardinality of the Projection of a Database Relation"
by Ahad, Rao, and McLeod.  The authors discuss their formula for
estimating the cost of a query (in disk accesses) with acceptable error
results.  The formula does not require a scan of the relation.
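
To get a feel for the kind of estimate involved (this is only the
classic uniform-independence approximation, not the formula from the
paper): if a column can take d = 500 distinct values and the relation
holds n tuples whose values are chosen independently and uniformly, the
expected cardinality of the projection on that column is about

    d * (1 - (1 - 1/d)^n)

which for n = 200 gives roughly 500 * (1 - 0.998^200), about 165
distinct values, and for n = 10,000 gives essentially all 500, since
almost every value then appears at least once.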

This article has intrigued me.  I would like to do a project for my class
on estimating the cost of queries.  However, I am not familiar with
what performance analysis tools are available or any schools doing
research in this area.  If anyone can point me towards articles, research
and software (for Vax, HP, Sequent Balance or any UNIX based OSs), I
would be grateful.

Thanks,

-- 
Dan Sweeney                            sweeneyd@handel.cs.colostate.edu
Department of Computer Science         sweeneyd%colostate.csnet@relay.cs.net 
Colorado State University (RAMS)       Fort Collins, CO 80523