[comp.databases] Need to optimize this incredibly S L O W update

bill@twg.bc.ca (Bill Irwin) (02/03/91)

I have an SQL that is supposed to select the clients that are due
for Preventative Maintenance (PM) in the coming month and, for
each client selected, add a new PM record to the pm table for each
hardware item whose most recently completed PM record is more
than 60 days old.  It actually works, but...

The first time this SQL was run with the pm table empty, it
selected 11 clients and about 200 items to add pm records for.
It took about 5 minutes to run.  The 2nd time it was run it
selected about 7 clients and still hasn't completed running as I
write this article.  The elapsed time has been over 4 hours
already, with the CPU time being 150 minutes - almost 3 hours!

With this kind of dramatic increase between the first two runs of
this SQL, I am visualizing a week to run the third one.  Below is
the update SQL.  Any suggestions on ways to make it run faster
would be greatly appreciated.  I am using Unify Turbo-SQL.

insert into pm (item_num):
select item.item_number from item, client, inv
        [the (inv)entory table stores item descriptions]
        [the item table records individual components]
where
        client.key = item.key and       # item belongs to client
        inv.key = item_id and           # item key matches inv key
        inv_group = 'H' and             # hardware items only
        quit_date = **/**/** and        # client hasn't terminated support
        item.inst_date > **/**/** and   # item has been installed
        [
        pm_month_1 = 2 or               # one of the 3 PM months for the
        pm_month_2 = 2 or               # client is February
        pm_month_3 = 2
        ]
        and [
           item.item_number =           # don't add a new pm record if
           select pm.item_num from pm   # the most recent one is within
           where pm.item_num = item.item_number  # 2 months
           group by pm.item_num having 02/02/91 - max(pm.pm_date) > 60
           ;
        or 0 =                                  # handles the case where the
           select count(*) from pm              # item is new and has no PMs
           where pm.item_num = item.item_number # on file
           ;
        ]
/

I know it is the two inner selects that are slowing it down, but
I need to be selective about whether to add a new PM record for
an item.  Thanks for any suggestions.
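For readers following along: the two inner selects (one for a stale
last PM, one for the no-PM-on-file case) can usually be folded into a
single NOT EXISTS anti-join.  Here is a small sketch of the idea using
Python's sqlite3 rather than Unify Turbo-SQL; the table and column
names follow the post, but the exact schema, the NULL quit_date
convention, and the julianday() date arithmetic are my assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item   (item_number INTEGER, key INTEGER, item_id INTEGER,
                     inst_date TEXT);
CREATE TABLE client (key INTEGER, quit_date TEXT,
                     pm_month_1 INT, pm_month_2 INT, pm_month_3 INT);
CREATE TABLE inv    (key INTEGER, inv_group TEXT);
CREATE TABLE pm     (item_num INTEGER, pm_date TEXT);
""")

# One client due in February, two hardware items: item 1 has a recent
# completed PM, item 2 has never had one.
con.executescript("""
INSERT INTO client VALUES (10, NULL, 2, 6, 10);
INSERT INTO inv    VALUES (5, 'H');
INSERT INTO item   VALUES (1, 10, 5, '1990-01-01'),
                          (2, 10, 5, '1990-01-01');
INSERT INTO pm     VALUES (1, '1991-01-20');
""")

# Anti-join: add a PM row unless a PM within the last 60 days exists.
# This one subquery covers both of the original inner selects -- the
# "no PM at all" case passes NOT EXISTS trivially.
con.execute("""
INSERT INTO pm (item_num)
SELECT item.item_number
FROM item
JOIN client ON client.key = item.key
JOIN inv    ON inv.key    = item.item_id
WHERE inv.inv_group = 'H'
  AND client.quit_date IS NULL
  AND 2 IN (client.pm_month_1, client.pm_month_2, client.pm_month_3)
  AND NOT EXISTS (
        SELECT 1 FROM pm
        WHERE pm.item_num = item.item_number
          AND julianday('1991-02-02') - julianday(pm.pm_date) <= 60)
""")

# Newly inserted rows have a NULL pm_date.
due = [r[0] for r in con.execute(
    "SELECT item_num FROM pm WHERE pm_date IS NULL ORDER BY item_num")]
print(due)  # item 2 only: item 1's last PM is within 60 days
```

The anti-join lets the engine probe pm once per candidate item instead
of evaluating two separate correlated subqueries per row.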
-- 
Bill Irwin    -       The Westrheim Group     -    Vancouver, BC, Canada
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uunet!van-bc!twg!bill     (604) 431-9600 (voice) |     Your Computer  
bill@twg.bc.ca            (604) 430-4329 (fax)   |    Systems Partner

lugnut@sequent.UUCP (Don Bolton) (02/05/91)

In article <600@twg.bc.ca> bill@twg.bc.ca (Bill Irwin) writes:
>
>I have an SQL that is supposed to select the clients that are due
>for Preventative Maintenance (PM) in the coming month and for
>each selected, add a new PM record to the pm table for each
>hardware item whose most recently completed pm record is 60 days
>old.  It actually works, but...
>
>The first time this SQL was run with the pm table empty, it
>selected 11 clients and about 200 items to add pm records for.
>It took about 5 minutes to run.  The 2nd time it was run it
>selected about 7 clients and still hasn't completed running as I
>write this article.  The elapsed time has been over 4 hours
>already, with the CPU time being 150 minutes - almost 3 hours!
>
>With this kind of dramatic increase between the first two runs of
>this SQL, I am visualizing a week to run the third one.  Below is
>the update SQL.  Any suggestions on ways to make it run faster
>would be greatly appreciated.  I am using Unify Turbo-SQL.
>

This sort of thing seems common to SQL-based products in general. Had
an instance with Oracle where I had 20,000 rows, about 7,000 of which
needed updating with values from another table (had a zip+4 pass done
on our contact lists by an external source).
                     An update where exists select blah blah blah chewed
and chewed and chewed. The solution was to create YAT (yet another table)
as the "sum" of the selection criteria and do renames. Went from 20 hours
to 20 minutes.

I've encountered a similar situation with Informix and opted for a similar
type solution.
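Don's YAT trick generalizes: materialize the result of the slow
correlated lookup once into a scratch table, then swap names.  A sketch
of the same move with sqlite3 (the contacts/zip4 schema here is
invented for illustration; it is not Don's actual Oracle layout):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, zip TEXT);
CREATE TABLE zip4     (id INTEGER, zip_plus4 TEXT);
INSERT INTO contacts VALUES (1, '97077'), (2, '98052'), (3, '97077');
INSERT INTO zip4     VALUES (1, '97077-0001'), (3, '97077-0003');
""")

# "Yet another table": compute the merged result in one set-oriented
# pass instead of a per-row UPDATE ... WHERE EXISTS, then rename.
con.executescript("""
CREATE TABLE contacts_new AS
SELECT c.id, COALESCE(z.zip_plus4, c.zip) AS zip
FROM contacts c LEFT JOIN zip4 z ON z.id = c.id;

DROP TABLE contacts;
ALTER TABLE contacts_new RENAME TO contacts;
""")

rows = con.execute("SELECT zip FROM contacts ORDER BY id").fetchall()
print(rows)  # [('97077-0001',), ('98052',), ('97077-0003',)]
```

The single join builds every updated row at once, which is the "20
hours to 20 minutes" effect Don describes.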

Good luck

bill@twg.bc.ca (Bill Irwin) (02/05/91)

While I was waiting for responses to come in from the Net, I
thought I'd poke around with the SQL a bit.  I remembered
something that I read in a DB theory guide, to this effect:  when
you are listing selection criteria, list the criterion that is
most likely to fail first and the criterion that is least likely
to fail last.  As soon as a record fails a test, the remaining
criteria are not evaluated for that record.  By listing the
criterion most likely to fail first, you stop many records from
even being evaluated at the second step.
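That ordering rule is the database cousin of short-circuit boolean
evaluation.  A toy Python illustration of why it pays off (the
cheap/expensive functions are stand-ins, not anything from the pm
schema):

```python
checked = []

def cheap(r):
    """Highly selective test -- fails for 99% of records."""
    return r % 100 == 0

def expensive(r):
    """Stand-in for a costly correlated subquery; logs each call."""
    checked.append(r)
    return r % 7 == 0

# `and` short-circuits: expensive() runs only for records that
# already passed cheap(), i.e. 10 of the 1000.
hits = [r for r in range(1000) if cheap(r) and expensive(r)]
print(hits, len(checked))  # [0, 700] 10
```

Put the expensive test first instead and it runs 1000 times for the
same answer, which is exactly the trap the original query fell into.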

I also thought that perhaps the query was getting royally
confused because the inner query uses the same table as the outer
one, so I decided to try a self join.

I wrote:

:insert into pm (item_num):
:select item.item_number from item, client, inv
:        [the (inv)entory table stores item descriptions]
:        [the item table records individual components]
:where
=============================================
This next section used to be last, now first.
Records are most likely to fail this test.
=============================================
:        [
:        pm_month_1 = 2 or               # one of the 3 PM months for the
:        pm_month_2 = 2 or               # client is February
:        pm_month_3 = 2
:        ] and
:        client.key = item.key and       # item belongs to client
:        inv.key = item_id and           # item key matches inv key
:        inv_group = 'H' and             # hardware items only
:        quit_date = **/**/** and        # client hasn't terminated support
:        item.inst_date > **/**/** and   # item has been installed
:        [
=============================================
Did a self join on the pm table, since this is the one I am adding to.
Called the temp pm table "x".
=============================================
:           item.item_number =           # don't add a new pm record if
:           select x.item_num from pm x  # the most recent one is within
:           where x.item_num = item.item_number  # 2 months
:           group by x.item_num having 02/02/91 - max(x.pm_date) > 60
:           ;
:        or 0 =                                  # handles the case where the
:           select count(*) from pm              # item is new and has no PMs
:           where pm.item_num = item.item_number # on file
:           ;
:        ]
:/

Guess what?  It now runs in under three minutes, compared to over
four hours previously (I never did wait to see if it would
finish, as it had brought the system to its knees and response
time was nonexistent).

Steven List offered a suggestion that agreed with one of the two
changes I had made.

:Here's what occurs to me:  this query is recursive.  That is, it
:is inserting into a table that it is also searching.  Based on
:my knowledge of Unify, it is possible that Bill is creating all
:sorts of confusing loops for SQL and causing it to lose its
:place.

:That is, Unify tracks the CURRENT RECORD in each table.  When
:you traverse a set, unless you're using an index, my
:recollection is that that CURRENT RECORD pointer is used too.
:So if both the subquery and the insertion are working with the
:same table, then the key question is:  What's the CURRENT RECORD
:at any given moment, and how does changing it affect the query
:at that moment?

:Let's see if I can clarify...  If I'm doing a simple search
:through a set, and as a result of finding a record I insert a
:record, then I've changed the CURRENT RECORD to the one I've
:just added.  But I'm still traversing a "set".  So my position
:in the set has changed.  If I'm doing a sequential search, and
:the new record is added (through the delete chain) BEFORE the
:record that triggered the insertion, then will I look at all the
:intervening records again?  If I'm traversing an explicit
:relation (link) and the same thing happens such that the
:inserted record is in a different set, then what's the effect?
:It gets VERY confusing!!

:I hope Bill will let us all know what the resolution of this is
:- it's fascinating!

:--
:+----------------------------------------------------------------------------+
::                Steven List @ Transact Software, Inc. :^>~                  :
::           Chairman, Unify User Group of Northern California                :
::    {apple,coherent,limbo,mips,pyramid,ubvax}!itkin@guinan.Transact.COM     :
::                        Voice: (415) 961-6112                               :
:+----------------------------------------------------------------------------+

Here's your wish, Steven.  I hope this exercise is of use to
others in optimizing their queries.  It sure can make a huge
difference to processing time to write the thing correctly!

Thanks to all who responded.
-- 
Bill Irwin    -       The Westrheim Group     -    Vancouver, BC, Canada
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uunet!van-bc!twg!bill     (604) 431-9600 (voice) |     Your Computer  
bill@twg.bc.ca            (604) 430-4329 (fax)   |    Systems Partner