greg@harvey.qualcomm.com (greg Noel) (03/14/91)
Please forgive me if this is the wrong news group; I'm not normally embroiled in SQL problems. If this is not the right newsgroup, I'd appreciate being told what the correct one is. I have an Oracle data base table, with three columns, which I will call A, B, and C. There are multiple thousands of value A. For each value A, there are a handful of records, perhaps as many as a dozen. Within each value A, value B is almost always unique (it's really a time value when the record became active). If value B is not unique within A, the tie is broken by value C, which is unique within each A/B combination (it's the timestamp of when the record was modified). Here's the problem: I want to define a view on this table that only consists of the most recent active record for each value of A. How do I define this view such that the lookup is optimal? Intuitivly, what I would like to tell the data base engine is to look up all records with the same value of A, riffle through them, and select the one with the largest B.C value. What I am working with right now is select * from TBL T1 where A = 'value' and B = (select MAX(B) from TBL T2 where T2.B = T1.B) which works except for the rare case when there is more than one maximum value of B, which I handle programmatically. This query is also not optimal as it requires two indexed lookups within TBL. Is there a way to do do the query more optimally or to do the whole query in one request, so that the execptions don't need to be handled by the program? Tks,
miket@blia.sharebase.com (Mike Tossy) (03/15/91)
In article <1991Mar13.224746.5293@qualcomm.com>, greg@harvey.qualcomm.com (greg Noel) writes: > ... > I have an Oracle data base table, with three columns, which I will call A, B, > and C. There are multiple thousands of value A. For each value A, there are > a handful of records, perhaps as many as a dozen. Within each value A, value > B is almost always unique (it's really a time value when the record became > active). If value B is not unique within A, the tie is broken by value C, > which is unique within each A/B combination (it's the timestamp of when the > record was modified). > > Here's the problem: I want to define a view on this table that only consists > of the most recent active record for each value of A. How do I define this > view such that the lookup is optimal? Intuitivly, what I would like to tell > the data base engine is to look up all records with the same value of A, riffle > through them, and select the one with the largest B.C value. > > What I am working with right now is > select * from TBL T1 where A = 'value' and B = > (select MAX(B) from TBL T2 where T2.B = T1.B) > which works except for the rare case when there is more than one maximum value > of B, which I handle programmatically. This query is also not optimal as it > requires two indexed lookups within TBL. > > Is there a way to do do the query more optimally or to do the whole query in > one request, so that the execptions don't need to be handled by the program? > > Tks, This is a fairly common problem. I believe it is caused by a database design problem. The problem is that B (and C) tells when a particular record became valid, but the time when the record became invalid is stored in another record (or implied by the absence of any record with a higher time stamp). Notice that your SQL demonstrates this: it joins the table to itself to create the record you need. I suggest that the solution is to add another column B'. Column B is the time the record became active and B' is the time the record became inactive (was replaced by a more current record). The active row should have a null value (or a VERY LARGE value) for B'. Depending on your system, this may elimate the need for C or you may have to fix C in an analogous way. Your SQL becomes (probably not exact Oracle syntax): select * from TBL T1 where A = 'value' and B <= now() and B' = null; or select * from TBL T1 where A = 'value' and B <= now() and now() <= B'; This should be a very fast query and it is easy to build the view you want. This solution is not without cost. Inserting a new record becomes a longer process - you must update the old current record also. But, the update should be very fast. The database does becomes larger. But, before you reject this solution out of hand because of the "extra disk space" required. Do yourself a favor and do a back of the envolope calculation of how much extra space is required. Disk space is cheap. If it really is only "multiple thousands" of values of A, you will problably never even notice the extra disk space utilized by this solution. I suggest that my solution is the correct one from a database design point of view and you should only use your solution for an overwhelming reason. -- >>>>>> The above statements are only my opinions <<<<<< Mike Tossy ShareBase Coropration miket@sharebase.com 14600 Wichester Blvd (408) 378-7575 ext2200 Los Gatos, CA 95030 (ShareBase is a subsidiary of Teradata Corportation)
manatt@sirius.llnl.gov (Doug Manatt) (03/16/91)
In article <13659@blia.sharebase.com>, miket@blia.sharebase.com (Mike Tossy) writes: >In article <1991Mar13.224746.5293@qualcomm.com>, greg@harvey.qualcomm.com (greg Noel) writes: >> ... >>Intuitivly, what I would like to tell >>the data base engine is to look up all records with the same value of A riffle >> through them, and select the one with the largest B.C value. >>... >> Is there a way to do do the query more optimally or to do the whole query in >> one request, so that the execptions don't need to be handled by the program? >... >I suggest that the solution is to add another column B'. Column B is the >time the record became active and B' is the time the record became inactive The quick and cheap solution is to add a 1 byte boolean column that tells only if the record is currently valid or invalid. In our design, many of our timestamped entries have a supercede column that takes on the two values ("T" or "F") and when an existing entry is updated all "F"s are changed to "T"s before the insert. Doug Manatt Lawrence Livermore Natl Lab Human -- (415) 422-7257 FAX -- (415) 422-3160 manatt@lll-winken.llnl.gov -- Doug Manatt Lawrence Livermore Natl Lab Human -- (415) 422-7257 FAX -- (415) 422-3160 manatt@lll-winken.llnl.gov
greg@harvey.qualcomm.com (Greg Noel) (03/21/91)
I wrote: >I have an Oracle data base table, with three columns, which I will call A, B, >and C. .... I want to define a view on this table that only consists >of the most recent active record for each value of A. How do I define this >view such that the lookup is optimal? Intuitively, what I would like to tell >the data base engine is to look up all records with the same value of A, >riffle through them, and select the one with the largest B.C value. I received several replies to this; my thanks for the interest. Here's what was suggested: J. Pitt <jmp@stan.xx.swin.OZ.AU> (isn't OZ.AU redundant???) asks: >Have you tried the obvious ? > SELECT A, B, MAX(C) > FROM TBL > GROUP BY A,B; >(include other columns as required of course) Unfortunately, it's the last line that's the killer -- it doesn't seem to be possible to include other columns, except as summary functions (such as the MAX or MIN functions). What I need is the record corresponding to MAX(B).MAX(C). Mike Tossy <miket@blia.sharebase.com> suggests adding another column B' that contains the end of the range for which the record is valid. He notes: >This solution is not without cost. Inserting a new record becomes a longer >process - you must update the old current record also. But, the update >should be very fast. .... In a similar vein, someone at U.Iowa <cmdkrj@pmvax.weeg.uiowa.edu> and Doug Manatt <manatt@sirius.llnl.gov> both suggest adding a flag to the table that marks the current record. These are nice solutions, but ones that won't work in this case. Originally, we had a historical table containing date ranges and a current table with the most-recent value. It was a maintenance nightmare, and this design is an attempt to eliminate some of those difficulties. (Mike worries that we might be put off by the extra disk space involved; that's the least of our worries!) There are two reasons why these won't work: - I'm not the data base administrator; I can't add another column just to improve the processing of a single program. And because of the maintenance hassle described in the next point, it's probably not worth the complexity elsewhere in the system. - There are several programs that update this table, and because of the order in which the data may be processed, the update is not always to the "current" record -- that is, sometimes it is the history that is being changed. The updating programs must be very careful to notice when they are not updating the current record and correctly splice in the dates. I do agree with the point Mike makes that this is the "true" solution to the data base problem; we may consider adopting it at a later date. Cmdkrj also quotes Codd, "The only time not to normalize is when you can make an efficiency gain." I agree, and this might be one of the times. Lastly, Edward Screven <escreven@us.oracle.com> suggests: > select A, B, C from T t1 > where not exists > (select 'x' from T t2 > where t2.A = t1.A > and (t2.B > t1.B or > (t2.B >= t1.B and > t2.C > t1.C))); I've looked at this until I'm cross-eyed and still have no intuition as to why this works, but it does. It correctly selects the most-recent record. It's not optimal in that it joins the table with itself (requiring two indexed lookups to select on a value from column A), but it does capture both of the constraints from columns B and C in a single sub-query. For the moment, I'm using this last solution, but once things settle down a bit from this conversion, I plan to examine a solution involving date ranges. Thanks for all the response! -- Greg Noel, Unix Guru greg@qualcomm.com or greg@noel.cts.com