citron@cit-vax.Caltech.Edu (Mark C. Citron) (12/24/88)
Hope someone can understand what is going wrong here. I have a table in
a database that has three indices:
   a composite unique index made up of date, name1 and slotnumber
   an index of name2
   an index of unitnumber
There are about 50,000 rows in the table. In about 90% of the rows the
unitnumber is 0. In time these will be changed to other numbers (the
slots will be "filled").

Now here is the problem: when I update the unitnumber field and that
field already contains a number OTHER THAN 0, the update goes very
fast. GREAT! Updating the unitnumber field to 0 (independent of what
it was) also goes very fast (less than a second). HOWEVER, when I
update the unitnumber field and that field already contains a 0, the
update takes forever (about 30 seconds). There is a lot of disk
accessing.

This is an Informix database. I might guess that the problem is that
although unitnumber is indexed, there are so many rows with the same
unit number that Informix has trouble locating the record to update.
Is this correct? How do I get around this? This performance is awful!

Thanks for any (ANY!) suggestions. Write or post.
citron@csvax.caltech.edu
citron@cit-vax.UUCP
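For concreteness, the setup described above can be sketched with sqlite3 standing in for Informix. The table name, the scaled-down row count, and all values are invented for illustration; only the column names and the three indices come from the post:

```python
import sqlite3

# sqlite3 stands in for Informix; "slots" and all values are invented.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE slots (
    date INTEGER, name1 TEXT, slotnumber INTEGER,
    name2 TEXT, unitnumber INTEGER)""")

# The three indices described in the post.
con.execute("CREATE UNIQUE INDEX idx_comp ON slots(date, name1, slotnumber)")
con.execute("CREATE INDEX idx_name2 ON slots(name2)")
con.execute("CREATE INDEX idx_unit ON slots(unitnumber)")

# 10,000 rows, roughly 90% of them with unitnumber = 0 ("empty" slots).
rows = [(d, "n1", s, "n2", 0 if s % 10 else 7)
        for d in range(100) for s in range(100)]
con.executemany("INSERT INTO slots VALUES (?,?,?,?,?)", rows)

# The problematic shape of update: the row being "filled" currently has
# unitnumber = 0, along with ~90% of the table.
con.execute("UPDATE slots SET unitnumber = 42 "
            "WHERE unitnumber = 0 AND date = 5 AND slotnumber = 3")
changed = con.execute("SELECT changes()").fetchone()[0]
print(changed)  # exactly one row is filled
```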
ben@calvin.sybase.com (ben ullrich) (12/26/88)
In article <8973@cit-vax.Caltech.Edu> citron@cit-vax.UUCP (Mark C. Citron) writes:
>Hope someone can understand what is going wrong here. I have a table in
>a database that has three indices:
>a composite unique index made up of date, name1 and slotnumber
>an index of name2
>an index of unitnumber
>
>Now here is the problem: When I update the unitnumber field
>and that field already contains a number OTHER THAN 0
>the update goes very fast. GREAT!
>Updating the unitnumber field to 0 (independent of what it was) also
>goes very fast (less than a sec).

yes.  the length of time involves finding the row to update, not what
it ends up being.

>HOWEVER, when I update the unitnumber field
>and that field already contains a 0, the update takes forever (about 30
>sec). There is a lot of disk accessing.
>
>This is an Informix database. I might guess that the problem is
>that although unitnumber is indexed, there are so many rows with
>the same unit number that Informix has trouble locating the
>record to update. Is this correct? How do I get around this?
>This performance is awful!

your assumption about ``having trouble locating the record to update''
sounds correct to me.  with all those rows having the same unit number,
it probably has to do something really time-consuming like a table scan.

it also sounds to me like you're using only the unit number to qualify
the row.  i'd suggest using other fields in the row (preferably ones
contained in one of the other indexes described above) to further
qualify it.  hopefully informix is smart enough to use one of the other
indexes when you give it fields within them, instead of doing a table
scan through all those 0 unit numbers.

...ben
---
ben ullrich        consider my words disclaimed
sybase, inc.       "everybody gets so much information all day long that
emeryville, ca      they lose their common sense."  -- gertrude stein
(415) 596 - 3654
ben%sybase.com@sun.com
{pyramid,pacbell,sun,lll-tis,capmkt}!sybase!ben
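ben's suggestion, further qualifying the row with columns from another index, can be illustrated with sqlite3 standing in for Informix. The schema is rebuilt from the original post; the query-plan strings are sqlite's, shown only to make the point that extra predicates let the planner pick a selective index:

```python
import sqlite3

# sqlite3 stands in for Informix; schema reconstructed from the post.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE slots (date INTEGER, name1 TEXT, "
            "slotnumber INTEGER, name2 TEXT, unitnumber INTEGER)")
con.execute("CREATE UNIQUE INDEX idx_comp ON slots(date, name1, slotnumber)")
con.execute("CREATE INDEX idx_unit ON slots(unitnumber)")

def plan(sql):
    """Return the query plan as one string."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Qualified by unitnumber alone: the planner has only the
# low-selectivity index to work with.
p1 = plan("UPDATE slots SET unitnumber = 1 WHERE unitnumber = 0")

# Further qualified by the unique composite-index columns: the planner
# can pinpoint the row without wading through the duplicates.
p2 = plan("UPDATE slots SET unitnumber = 1 WHERE unitnumber = 0 "
          "AND date = 5 AND name1 = 'x' AND slotnumber = 3")

print(p1)
print(p2)
```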
davek@infmx.UUCP (David Kosenko) (12/28/88)
In article <8973@cit-vax.Caltech.Edu>, citron@cit-vax.Caltech.Edu (Mark C. Citron) writes:
> Hope someone can understand what is going wrong here. I have a table in
> a database that has three indices:
> an index of unitnumber
>
> there are about 50,000 rows in the table. In about 90% of the rows the
> unitnumber is 0. In time these will be changed to other numbers (the
> slots will be "filled").
	[ problem description deleted ]

	An index on a field that has 90% duplicate entries is only
useful when accessing (or updating) records where the value is other
than that field; the index can be used to get to the record quickly.
Thus, update where unitnumber = 1 (or any value other than 0) will
work quite nicely.  When the value is 0, however, the index becomes
not only useless, but detrimental; a sequential access would be better
in this case.  Unfortunately, the optimizer does not have the means to
detect this (at least not now).  Using the index to grab all those
records means a disk read (often) for each index entry, thus the poor
performance.  This is expected behavior.

	You may want to reconsider the index on unitnumber.  If most
of your accesses are where unitnumber = 0, the index is doing more
harm than good.  If, however, you tend to access where unitnumber != 0
most often, you probably want to keep it around, though at the cost of
the performance you are now seeing.  Also, if you can provide values
for the other indexed fields, the queries may run faster (i.e., cause
other indexes to be used when unitnumber = 0); if you can provide
values matching the composite index, that would be best, since it is
unique.

Hope this helps.

Dave
--
Disclaimer: The opinions expressed herein  |"As we hang beneath the heavens
are by no means those of Informix Software | and we hover over hell
(though they make you wonder about the     | our hearts become the instruments
strange people they hire).                 | we learn to play so well"
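Dave's point, a disk read for (nearly) each index entry versus one read per page for a sequential scan, can be made concrete with a little arithmetic model. Page size, row count, and the one-page cache are all invented for illustration; this is not Informix internals:

```python
import random

# Toy numbers: 50,000 rows at 50 rows per data page => 1,000 pages.
ROWS, ROWS_PER_PAGE = 50_000, 50
scan_reads = ROWS // ROWS_PER_PAGE   # sequential scan: one read per page

# Index order bears no relation to page order, so driving the 45,000
# duplicate entries (90% of rows have unitnumber = 0) through the index
# re-fetches pages over and over; model a minimal one-page cache.
random.seed(1)
zero_rowids = random.sample(range(ROWS), int(ROWS * 0.9))
cached_page, index_reads = None, 0
for rowid in zero_rowids:
    page = rowid // ROWS_PER_PAGE
    if page != cached_page:          # cache miss: another disk read
        index_reads += 1
        cached_page = page

print(scan_reads, index_reads)       # index-driven access reads far more
```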
davek@infmx.UUCP (David Kosenko) (12/28/88)
In article <710@infmx.UUCP>, davek@infmx.UUCP (David Kosenko) writes:
> An index on a field that has 90% duplicate entries is only
> useful when accessing (or updating) records where the value is other
> than that field; the index can be used to get to the record quickly.
            ^^^^^
This should be "value" (i.e., 0), not "field".  My apologies for the
flub.

Dave
--
Disclaimer: The opinions expressed herein  |"As we hang beneath the heavens
are by no means those of Informix Software | and we hover over hell
(though they make you wonder about the     | our hearts become the instruments
strange people they hire).                 | we learn to play so well"
cjh@hpausla.HP.COM (Clifford Heath) (12/28/88)
Sorry, couldn't get to you by E-mail.  Anyhow, this is interesting to
others.

> Hope someone can understand what is going wrong here. I have a table in
> a database that has three indices:
> a composite unique index made up of date, name1 and slotnumber
> an index of name2
> an index of unitnumber
> there are about 50,000 rows in the table. In about 90% of the rows the
> unitnumber is 0. In time these will be changed to other numbers (the
> slots will be "filled").
> HOWEVER, when I update the unitnumber field
> and that field already contains a 0, the update takes forever (about 30
> sec). There is a lot of disk accessing.

This is because Informix uses a 16-bit "duplicate number" in order to
maintain the proper (ANSI COBOL required) historical sequencing of
duplicates.  In order to find the B-tree index entry to delete, it
cannot search for the appropriate duplicate by number, since that
information is only in the index; rather, it must search the entire
chain of duplicates to find the index entry that matches the record
number being deleted.  In this case up to 45,000 entries, or on
average over 22,000 entries, must be searched.  Because of the
time-ordering, recent entries are at the end of the list, resulting in
worst-case searching.

This is a good case for requesting Informix to add a new flag for the
creation of duplicate indices, saying "don't bother time-ordering
duplicates".

BTW, if you reach the limit of 65K duplicates in a chain, bcheck will
*ALWAYS* report "index out of order", simply because the duplicate
number wraps around and bcheck notices it.  I don't know what the
wrapping does to the time-ordering of duplicates, but it doesn't seem
to corrupt the index otherwise.

Disclaimer: This information is based on version 2.10 of C/ISAM*
source code, and so may be out of date.
* C/ISAM is a trademark of Informix Corp.
This article contains only opinions that are my own, and may be wrong.

Clifford Heath, HP Australian Software Operation.
ACSnet: cjh@hpausla.oz, Internet: cjh%hpausla@hplabs.HP.COM, UUCP: hplabs!hpfcla!hpausla!cjh
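The duplicate-chain walk Clifford describes can be modelled in a few lines. This is a toy illustration of the search, not C/ISAM code; the 45,000 figure comes from the thread:

```python
# Index entries for one key value, kept in historical (insertion)
# order; each entry holds the record number it points at.
chain = list(range(45_000))

def delete_entry(chain, recno):
    """Walk the duplicate chain until the record number matches, then
    unlink that entry; return how many entries were examined."""
    for examined, entry in enumerate(chain, start=1):
        if entry == recno:
            chain.remove(entry)
            return examined
    raise KeyError(recno)

# A recently filled slot was inserted last, so its entry sits at the
# end of the chain: a near-worst-case search.
examined = delete_entry(chain, 44_999)
print(examined)  # every entry in the chain is examined
```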
aland@infmx.UUCP (Dr. Scump) (12/28/88)
In article <8973@cit-vax.Caltech.Edu>, citron@cit-vax.Caltech.Edu (Mark C. Citron) writes:
> Hope someone can understand what is going wrong here. I have a
> table in a database that has three indices:
> a composite unique index made up of date, name1 and slotnumber
> an index of name2
> an index of unitnumber
>
> there are about 50,000 rows in the table. In about 90% of the rows the
> unitnumber is 0. In time these will be changed to other numbers (the
> slots will be "filled").

I am assuming that you query on this field often, looking for the
"exceptions" (e.g. those rows with "filled" slots).  If you do *not*
search on this field often, then having an index on it is
counterproductive.

> Now here is the problem: When I update the unitnumber field
> and that field already contains a number OTHER THAN 0
> the update goes very fast. GREAT!

Well, that's nice to hear :-]

> Updating the unitnumber field to 0 (independent of what it was) also
> goes very fast (less than a sec).

Now I'm confused.  Well, maybe I'm not.  Lessee... in other words, as
long as the field's value *before the update* was other than 0, it
goes fast.  OK, makes sense...

> HOWEVER, when I update the unitnumber field
> and that field already contains a 0, the update takes forever (about 30
> sec). There is a lot of disk accessing.

Yikes!  30 seconds is forever?  I'm older than I thought! :-]

Seriously, though, this makes sense.  When there is a lot of
duplication within a field, those "identical" index entries (since you
had just the one field in that index) must be traversed sequentially,
since it can't be determined whether other WHERE clause criteria (if
any) are satisfied just by looking at that particular index.  When you
run your update, the proper row(s) must be identified first.  *That*
is what takes so much time; since you have an index on that exact
field already, the optimizer will use that index first to qualify the
needed row(s).

The high degree of disk usage that you see is the traversal of these
identical entries in the B+ tree index, checking the other search
criteria for each of the rows which have unitnumber = 0.

> This is an Informix database. I might guess that the problem is
> that although unitnumber is indexed, there are so many rows with
> the same unit number that Informix has trouble locating the
> record to update. Is this correct? How do I get around this?
> This performance is awful!

Exactly.  Not so much that there is "trouble locating the record" as
"we have to check every record qualified by this value in the index."
Same effect.

To avoid this duplicate search, the best bet is to make that index a
composite index also.  I would guess that this update probably uses
the same criteria each time, just with different values; you probably
search for something like:

   .. WHERE unitnumber = 0 AND slotnumber = 32768
or
   .. WHERE unitnumber = 0 AND date = today

or whatever.  The solution is simple: create that index on unitnumber
as a *composite* index, tacking whatever other field is your most
frequent "additional" search criterion on after unitnumber.  For
example, if the first criterion above is what you use, create a
composite of unitnumber and slotnumber, thus:

   CREATE INDEX unitnslot ON tablename (unitnumber, slotnumber)

or, if the second criterion above is what you use, create a composite
of unitnumber and date:

   CREATE INDEX unitndate ON tablename (unitnumber, date)

*instead of* the singleton index on unitnumber.  Now there will be
very few duplicates (or none at all) in the index.  Also, for a
compound condition search (e.g. WHERE unitnumber = 0 AND slotnumber =
32768), the composite index will be used directly to find what should
be only a few rows (and therefore find them quickly).

If a search is performed on unitnumber alone, it will still be fast,
since the optimizer can use a composite index for a search on the
first column of the index as easily as if it were a singleton index on
that column.  (Of course, if you search on unitnumber = 0 alone, it
will be slower, because you will be qualifying some 45,000 rows.)

By the way, there is a limit of 64K occurrences of the same value in a
C-ISAM index (in the current and past releases, anyway).  If you had a
column with, say, the value 0 in 70,000 of the rows and had a
singleton index on that column, you would have problems with that
index (BCHECK and CHECK TABLE would complain).

> Thanks for any (ANY!) suggestions. Write or post.
> citron@csvax.caltech.edu
> citron@cit-vax.UUCP

You're welcome.

--
 Alan S. Denney  |  Informix Software, Inc.  |  {pyramid|uunet}!infmx!aland
 Disclaimer: These opinions are mine alone.  If I am caught or killed,
             the secretary will disavow any knowledge of my actions.
 Santos' 4th Law: "Anything worth fighting for is worth fighting *dirty* for"
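The composite-index suggestion above can be checked with sqlite3 standing in for Informix. The table and index names follow the examples in the post; the query-plan strings are sqlite's, used only to show that one composite index serves both the compound condition and a search on its leading column:

```python
import sqlite3

# sqlite3 stands in for Informix; names follow the post's examples.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tablename (date INTEGER, name1 TEXT, "
            "slotnumber INTEGER, name2 TEXT, unitnumber INTEGER)")
# The composite index, replacing the singleton index on unitnumber.
con.execute("CREATE INDEX unitnslot ON tablename(unitnumber, slotnumber)")

def plan(sql):
    """Return the query plan as one string."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

# Compound condition: both columns of the composite index qualify rows.
p_compound = plan("SELECT * FROM tablename "
                  "WHERE unitnumber = 0 AND slotnumber = 32768")

# Leading column alone: the composite index still applies.
p_leading = plan("SELECT * FROM tablename WHERE unitnumber = 0")

print(p_compound)
print(p_leading)
```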