[comp.databases] smart queries and scientists

jean@beno.CSS.GOV (Jean Anderson) (12/05/90)

References: <1453@beaudin.UUCP> <49175@seismo.CSS.GOV>

DC> Query optimization is a tough task - with or without statistics - and I
DC> look forward to the day when Oracle makes it a standard on our platform,
DC> but for now, building a finely structured database and developing smart
DC> queries are our key to success.

JB> If you have the time, would you mind posting/mailing an english (non-sql)
JB> example of an inefficient query and then an example of a more efficient
JB> query? You may choose an extreme case to make a point, but a real life
JB> example would be more educational. Thank you for your input.

I'm in Dale's developement arm in San Diego and I have a real life 
seismology query that I think fits your request.

We have communications code that transfers raw signaling processing data 
from the ORACLE database in Norway to the Expert System database in D.C. 
on an hourly basis.  [Note in the sample below we don't transfer a count 
of the records, we transfer the actual records.  The actual transfer 
involves 5 tables, not 1, so the time delay to transfer gets more interesting.]

The seismologist complained that the transfer was taking too long for 
the amount of data in the table (only several thousand rows).

The detection table has one index on 'time', another on 'sta'.  He
wanted only two out of four stations transferred for a given time period:

          SQL> set timing on
          SQL> select count(*) from detection
            2  where sta in ('ARC', 'NRS')
            3* and time > 648519000  and time <= 648524000;

            COUNT(*)
          ----------
                  28

          Elapsed: 00:01:33.48

I have SQL_TRACE set to TRUE on our development database so I tkprof'd 
the trace file and discovered ORACLE did two separate scans of the sta 
index and ignored the time index.  I disabled ORACLE's use of the sta index 
by applying a function to the sta column, forcing ORACLE to use the index 
on time:

          SQL> select count(*) from detection
            2  where upper(sta) in ('ARC', 'NRS')
            3* and time > 648519000.000  and time <= 648524000;

            COUNT(*)
          ----------
                  28

          Elapsed: 00:00:00.61

After going to the IOUG in September and attending the session on "How
ORACLE uses indexes", it is now obvious to me what happened.  ORACLE prefers
a constant check above all and will instantly chose that index first.
Since doesn't gather statistics yet, it can't know that the distribution
of values for time is nearly unique and makes the better index.

The distribution of values is about 25% of the table per sta; time is 
nearly unique.  The end result--we eliminated the index on sta (we may 
create a concatenated index on time/sta).  But the seismologist gagged 
when I told him what the fix to his performance problem was.

Our scientists have to be smarter about their SQL queries--and they aren't.  
They don't have time to be.  Their goal in life is to do seismology,
not become expert SQL-ers.

   -	Jean Anderson
	DBA, SAIC Geophysics Division
	jean@seismo.css.gov -or- jean@esosun.css.gov
	(619)458-2727



            +++++++++++++++++++++++++++++++++++++++++++++++++
            ++   Any opinions are mine, not my employer's. ++
            +++++++++++++++++++++++++++++++++++++++++++++++++