hemphill@csc000.csc.ti.com (Charles Hemphill) (11/18/89)
Can the following type of question be answered with an SQL expression? List the 10 oldest employees. Basically, I'd like the top N tuples from an ordered table. I don't want to use embedded SQL.
itkin@mrspoc.Transact.COM (Steven M. List) (11/20/89)
hemphill@csc000.csc.ti.com (Charles Hemphill) writes: >Can the following type of question be answered with an SQL expression? > List the 10 oldest employees. >Basically, I'd like the top N tuples from an ordered table. I don't >want to use embedded SQL. Wouldn't this be something like select name, birthdate from employee order by birthdate desc and pipe the result through "head -10"? Otherwise, there's no way to set a limit in SQL. -- +----------------------------------------------------------------------------+ : Steven List @ Transact Software, Inc. :^>~ : : Chairman, Unify User Group of Northern California : : {apple,coherent,limbo,mips,pyramid,ubvax}!itkin@guinan.Transact.COM :
ltung@oracle.com (Lei Tung) (11/21/89)
In article <7@csc000.csc.ti.com>, hemphill@csc000.csc.ti.com (Charles Hemphill) writes: > Can the following type of question be answered with an SQL > expression? > > List the 10 oldest employees. The following works in Oracle's SQL (I really have no idea if it is ANSI compliant). It is tricky since a subquery is being compared to a constant. I make no claims as to this statement's performance. select name, birthdate from employee e1 where 10 > (select count(*) from employee e2 where e2.birthdate < e1.birthdate); Note that results may be screwed if employees share birthdays. Hope this helps, Lei Tung Oracle Corp. -- *------------------------------------------------------------------------* | email: ltung@oracle.com | | or: {hplabs,apple,pacbell,mit-eddie,uunet}!oracle!ltung | | | | It'll all work out, | | you know I'm a little freaked out . . . | | - D. B. | *------------------------------------------------------------------------*
gernot@hpuviea.UUCP (Gernot Kunz) (11/21/89)
itkin@mrspoc.Transact.COM (Steven M. List) writes: >hemphill@csc000.csc.ti.com (Charles Hemphill) writes: >> List the 10 oldest employees. > select name, birthdate > from employee > order by birthdate desc >and pipe the result through "head -10"? >Otherwise, there's no way to set a limit in SQL. Oops, what a tricky UNIX hack to circumvent an obviously missing SQL functionality. In ORACLE you could do THIS: create table temptable(name ..., birthdate ...); insert into temptable select name, birthdate from employee order by birthdate desc; select name,birthdate from temptable where rownum <= 10 ; and remain in pure SQL. It requires a temporary table, though. --- Gernot Kunz gernot@hpuviea.UUCP Hewlett-Packard Austria GmbH, ...mcvax!tuvie!hpuviea!gernot A-1220 Vienna, Austria ...hplabs!hpfcla!hpbbn!hpuviea!gernot (0043) (222) 2500 x428 (9-18 CET)
scottj@ncrcae.Columbia.NCR.COM (L. Scott Johnson) (11/21/89)
In article <98813@ti-csl.csc.ti.com> hemphill@ti.com (Charles Hemphill) writes: > >Can the following type of question be answered with an SQL expression? > > List the 10 oldest employees. > >Basically, I'd like the top N tuples from an ordered table. I don't >want to use embedded SQL. Since the table is ordered, you could use the following in ORACLE (which supplies a column called rownum, I'm not sure how or if other products offer this feature): select empname from emp where rownum <=10 Hope this helps (despite it's lack of generality) ---------- L. Scott
nico@unify.uucp (Nico Nierenberg) (11/22/89)
In article <1217@hpuviea.UUCP> gernot@hpuviea.UUCP (Gernot Kunz) writes: > >Oops, what a tricky UNIX hack to circumvent an obviously >missing SQL functionality. In ORACLE you could do THIS: > > create table temptable(name ..., birthdate ...); > > insert into temptable > select name, birthdate > from employee > order by birthdate desc; > > select name,birthdate > from temptable > where rownum <= 10 ; > >and remain in pure SQL. It requires a temporary table, though. > You have got to be kidding to call this "pure" SQL. You are relying on an artifact of the Oracle system that the rownum can be reliably counted on in a new table. I actually think that this is less desirable than Steven's original idea. If you don't like the Unix flavor then use some other filter or embedded SQL.
john@anasaz.UUCP (John Moore) (11/22/89)
In article <1217@hpuviea.UUCP> gernot@hpuviea.UUCP (Gernot Kunz) writes: ]itkin@mrspoc.Transact.COM (Steven M. List) writes: ] ]>hemphill@csc000.csc.ti.com (Charles Hemphill) writes: ] ]>> List the 10 oldest employees. ] ]> select name, birthdate ]> from employee ]> order by birthdate desc ]>and pipe the result through "head -10"? ]>Otherwise, there's no way to set a limit in SQL. ] ]Oops, what a tricky UNIX hack to circumvent an obviously ]missing SQL functionality. In ORACLE you could do THIS: ] ] create table temptable(name ..., birthdate ...); ] ] insert into temptable ] select name, birthdate ] from employee ] order by birthdate desc; ] ] select name,birthdate ] from temptable ] where rownum <= 10 ; ] ]and remain in pure SQL. It requires a temporary table, though. In practical terms, however, this means that all rows were read from the database, inserted into the temptable, after which ten rows were extracted. If one is concerned about per-query performance (as we are when we are doing 20 queries per second!), this doesn't do the job. Try this for an even harder one: Table Frod: Location: Name: Day_of_Birth: Query: Give me the first ten names by date of birth where location is one of a set of locations which is a small subset of all the locations. Does anyone know how do do this one (you may choose any index you want) in either SQL or embedded SQL without the RDBMS having to do a lot of extra work? One way I know is to have N cursors open, one per location, and then do a "merge" on them. This, however, is hard to generalize in embedded C (especially in Informix, which we are using). Even then, RDBMS' tend to do read-ahead on cursors, causing extra I/O. The other approach I thought of is to write a query like the one below, and hope the optimizer is smart enough to make it efficient. Does someone have an optimizer which would generate a GOOD strategy for: SELECT Name, Location FROM Frod WHERE Location in ( "Phoenix", "Atlanta", "Denver", "Seattle" ) AND Day_of_Birth >= 12/31/47 ORDER BY Day_of_Birth ; Note: 500 locations, 1000 names at each, days distributed unevenly over 1 year from 6/30/46 to 6/30/47. CREATE INDEX ...any way you want, but keep in mind that this data is dynamic, with lots of insertions to the table. Definition of good: performance >= what could be achieved by doing it by hand using low level operations. If your optimizer generates a nice strategy for this, please post what the strategy would be - especially if there are neat tricks. I am assuming that the RDBMS has RAM cache, logging etc. as are used in OLTP versions such as Oracle 6 or Informix Turbo or others. -- John Moore (NJ7E) mcdphx!anasaz!john asuvax!anasaz!john (602) 861-7607 (day or eve) long palladium, short petroleum 7525 Clearwater Pkwy, Scottsdale, AZ 85253 The 2nd amendment is about military weapons, NOT JUST hunting weapons!
hargrove@harlie.sgi.com (Mark Hargrove) (11/23/89)
In article <1217@hpuviea.UUCP> gernot@hpuviea.UUCP (Gernot Kunz) writes: >itkin@mrspoc.Transact.COM (Steven M. List) writes: > >>hemphill@csc000.csc.ti.com (Charles Hemphill) writes: > >>> List the 10 oldest employees. > >> select name, birthdate >> from employee >> order by birthdate desc >>and pipe the result through "head -10"? >>Otherwise, there's no way to set a limit in SQL. > >Oops, what a tricky UNIX hack to circumvent an obviously >missing SQL functionality. In ORACLE you could do THIS: > > create table temptable(name ..., birthdate ...); > > insert into temptable > select name, birthdate > from employee > order by birthdate desc; > > select name,birthdate > from temptable > where rownum <= 10 ; > >and remain in pure SQL. It requires a temporary table, though. ^^^^^^^^ Just which *pure* SQL are you referring to?!? I think your version is just as much a "hack" as you claim Steven's is. Foo. His may take advantage of a tool outside of SQL, but at least it will work for ANY dialect of SQL (under Unix, obviously; but there's a way to do the same thing under VMS). -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Mark Hargrove Silicon Graphics, Inc. email: hargrove@harlie.corp.sgi.com 2011 N.Shoreline Drive voice: 415-962-3642 Mt.View, CA 94039
jkrueger@dgis.dtic.dla.mil (Jon) (11/23/89)
john@anasaz.UUCP (John Moore) writes: [For a db like +------+ | Frod | Location Name Birthdate +------+----------+------+------------+ | Atlanta | Doe | 1 Sep 1960 | | Boston | Ray | 2 Sep 1960 | +----------+------+------------+ ] >Query: Give me the first ten names by date of birth where > location is one of a set of locations which is a small subset > of all the locations. select * from frod where location = "Boston" or location = "Hartford" order by birthdate; Alternatively: select * from frod f, locations l where f.location = l.location order by birthdate; No cursors necessary or desirable. No 3GL necessary. The problem is trivial. This is exactly what DBMS are designed to do. Most are quite good at it. >Does someone have an optimizer which would generate a GOOD strategy for: > SELECT Name, Location > FROM Frod > WHERE Location in ( "Phoenix", "Atlanta", "Denver", "Seattle" ) > AND Day_of_Birth >= 12/31/47 > ORDER BY Day_of_Birth ; Yes. Most of us do. This is a simple single-table query. The optimization possibilities are small and amount to deciding whether to select first on location or birth date. If exactly one of them has an index, it's a good candidate. All commercial products can do this much. If both or neither have indices, or as an additional check, we could examine data distributions in each column and predict which would return fewer rows. In this example, if locations were evenly distributed among cities but only 10% of the birth dates were after 12/31/47, birth date would be a better candidate. To my knowledge, only INGRES's query optimizer does this. After that, we could check integrities defined on the columns. If location draws from a list of valid locations, and "Phoenix" isn't on the list, we can skip checking for it. To my knowledge, no commercial product does this. Probably none supports integrities in a sufficiently well-defined manner to make this optimization safe anyway. For instance, consider integrities maintained by triggers. In summary, three optimizations is about it, and they all boil down to deciding which of two columns to select on first. After that, execution speed is a near-linear function of disk-memory bandwidth. If the table size is small with respect to memory size, caching can change this to a very-near-linear function of processor speed. >Definition of good: performance >= what could be achieved by doing it >by hand using low level operations. Will you add in the labor costs of maintaining the low level optimizations? If not, long will your hand crafted solution compete on existing hardware with the portable solution as it continues to run on faster hardware? -- Jon -- Jonathan Krueger jkrueger@dtic.dla.mil uunet!dgis!jkrueger Isn't it interesting that the first thing you do with your color bitmapped window system on a network is emulate an ASR33?
jim@bilpin.UUCP (JimG) (12/01/89)
{v_databases.10} IN ARTICLE <98813@ti-csl.csc.ti.com>, hemphill@csc000.csc.ti.com (Charles Hemphill) WRITES: > Can the following type of question be answered with an SQL expression? > > List the 10 oldest employees. > > Basically, I'd like the top N tuples from an ordered table. I don't > want to use embedded SQL. It can be done in one SQL statement as follows : SELECT * FROM tablename tablealias WHERE 10 > ( SELECT COUNT(*) FROM tablename WHERE tablename.age > tablealias.age ) Now isn't that neat? How it works is : for each row, select the number of rows where the age is greater than the current row; if there are less than 10 rows which satisfy this condition, then the current row must be amongst the 10 oldest, so print it. -- Another Fine Product from <mcvax!ukc!icdoc!bilpin!jim> <jim@bilpin.uucp> {JimG : Hatfield, England} Programmers' Maxim If it's not aesthetically pleasing, it's probably wrong
monty@delphi.uchicago.edu (Monty Mullig) (12/03/89)
In article <2510@bilpin.UUCP> jim@bilpin.UUCP (JimG) writes: > IN ARTICLE <98813@ti-csl.csc.ti.com>, > hemphill@csc000.csc.ti.com (Charles Hemphill) WRITES: > >> Can the following type of question be answered with an SQL expression? >> >> List the 10 oldest employees. >> > > It can be done in one SQL statement as follows : > > SELECT * FROM tablename tablealias > WHERE 10 > ( SELECT COUNT(*) FROM tablename > WHERE tablename.age > tablealias.age > ) > > Now isn't that neat? you mean, a statement like this: select * from master m1, master m2 where 10 > (select count(*) from m1 where m1.bdate < m2.bdate); funny, but INGRES (5.0) on my machine choked on this query. now, is INGRES at fault or is the SQL wrong ? --monty
jim@bilpin.UUCP (JimG) (12/04/89)
#{v_databases.10a} IN ARTICLE <6518@tank.uchicago.edu>, monty@delphi.uchicago.edu (Monty Mullig) WRITES: > IN ARTICLE <2510@bilpin.UUCP> jim@bilpin.UUCP ( that's me ) WRITES: > > > IN ARTICLE <98813@ti-csl.csc.ti.com>, > > hemphill@csc000.csc.ti.com (Charles Hemphill) WRITES: > >> Can the following type of question be answered with an SQL expression? > >> List the 10 oldest employees. > > > > It can be done in one SQL statement as follows : > > SELECT * FROM tablename tablealias > > WHERE 10 > ( SELECT COUNT(*) FROM tablename > > WHERE tablename.age > tablealias.age > > ) > > > you mean, a statement like this: > select * from master m1, master m2 > where 10 > (select count(*) from m1 > where m1.bdate < m2.bdate); > funny, but INGRES (5.0) on my machine choked on this query. now, is > INGRES at fault or is the SQL wrong ? Well, no, I didn't mean that, which is why I didn't write that. The query structured as I entered it worked fine in Oracle 5 - obviously I checked it before posting - the approach is sound. Please e-mail if you want to discuss the problems with your alternative - they aren't relevant to the original question, so I won't cover them here. Some points to bear in mind with regard to this solution: a) the 'age' would obviously not be held as a column in the table - 'age' would be a function calculating the age from birth date and current date - preferably to an accuracy which would minimise the possibility of two or more people having the same age; because b) if two or more people do have the same age, even to the day (minute, picosecond, ...) the query may return more than 10 rows, if some of those people occur in the oldest 10; eg. if age is only calculated to the year, and the 8 oldest are 62, and the next four oldest are 60, you will get back 12 rows - in these circumstances the query can only return the set of rows within which the oldest 10 will be contained, as the data is not sufficiently accurate to do otherwise; c) the same approach should be feasible for cutting out a section of rows, by setting both top and bottom limits, with two sub-queries, although it's going to be reading the table an awful lot of times. -- Another Fine Product from <mcvax!ukc!icdoc!bilpin!jim> <jim@bilpin.uucp> {JimG : Hatfield, England} Programmers' Maxim If it's not aesthetically pleasing, it's probably wrong
dlw@odi.com (Dan Weinreb) (12/05/89)
In article <2510@bilpin.UUCP> jim@bilpin.UUCP (JimG) writes: > Basically, I'd like the top N tuples from an ordered table. I don't > want to use embedded SQL. It can be done in one SQL statement as follows : SELECT * FROM tablename tablealias WHERE 10 > ( SELECT COUNT(*) FROM tablename WHERE tablename.age > tablealias.age ) Now isn't that neat? Yes, it's neat. I'd be interested in knowing how fast it is, compared to the using embedded SQL with cursors (asking for them all, and just using a cursor to get the first ten), on some real-world commercial RDBMS's. It seems to me that an extremely clever query optimizer could do a great job here, by realizing that all it needs to do is sort and grab the top ten, but I don't know whether or not any of the commercial products do this kind of optimization well.
mikei@ctdi.UUCP (Mike Israel) (12/07/89)
In article <2511@bilpin.UUCP> you answered the question:: >> >> >> Can the following type of question be answered with an SQL expression? >> >> List the 10 oldest employees. >> > >> > It can be done in one SQL statement as follows : >> > SELECT * FROM tablename tablealias >> > WHERE 10 > ( SELECT COUNT(*) FROM tablename >> > WHERE tablename.age > tablealias.age >> > ) > You then stated that: > > Some points to bear in mind with regard to this solution: > > b) if two or more people do have the same age, even to the day (minute, > picosecond, ...) the query may return more than 10 rows, if some of > those people occur in the oldest 10; eg. if age is only calculated to > the year, and the 8 oldest are 62, and the next four oldest are 60, > you will get back 12 rows - in these circumstances the query can only > return the set of rows within which the oldest 10 will be contained, > as the data is not sufficiently accurate to do otherwise; I am not familiar with Oracle, but I can confirm that the query as written above does not work in Ingres (at least not as entered). I posted a follow up msg with a similiar solution that would work for Ingres: select * from mytable d where 10>=all (select count(age) from mytable d2 where d2.age >= d.age); I have run this query against a table containing multiple duplicate birthdates. I always get the OLDEST TEN employees, I never get back more than ten rows. Perhaps if Oracle will allow you to enter a query as I have shown it then you will get the desired ten rows without any unexpected extras. I would be interested to know if this works out. Regards, -- Michael A. Israel || uucp: mikei@ctdi.UUCP || ...!uunet!cbmvax!ctdi1!ctdi Communications Test Design Inc. || West Chester, PA || Please direct all complaints to /dev/null