[comp.databases] Simple SQL question

hemphill@csc000.csc.ti.com (Charles Hemphill) (11/18/89)

Can the following type of question be answered with an SQL expression?

	List the 10 oldest employees.

Basically, I'd like the top N tuples from an ordered table.  I don't
want to use embedded SQL.

itkin@mrspoc.Transact.COM (Steven M. List) (11/20/89)

hemphill@csc000.csc.ti.com (Charles Hemphill) writes:

>Can the following type of question be answered with an SQL expression?

>	List the 10 oldest employees.

>Basically, I'd like the top N tuples from an ordered table.  I don't
>want to use embedded SQL.

Wouldn't this be something like

	select name, birthdate
	from employee
	order by birthdate desc

and pipe the result through "head -10"?

Otherwise, there's no way to set a limit in SQL.
-- 
 +----------------------------------------------------------------------------+
 :                Steven List @ Transact Software, Inc. :^>~                  :
 :           Chairman, Unify User Group of Northern California                :
 :     {apple,coherent,limbo,mips,pyramid,ubvax}!itkin@guinan.Transact.COM    :

ltung@oracle.com (Lei Tung) (11/21/89)

In article <7@csc000.csc.ti.com>, hemphill@csc000.csc.ti.com (Charles
Hemphill) writes:

> Can the following type of question be answered with an SQL
> expression?
>
>     List the 10 oldest employees.

The following works in Oracle's SQL (I really have no idea if it is
ANSI compliant).  It is tricky since a subquery is being compared to a
constant.  I make no claims as to this statement's performance.


select name, birthdate
  from employee e1
 where 10 >
       (select count(*)
          from employee e2
         where e2.birthdate < e1.birthdate);

Note that results may be screwed if employees share birthdays.

Hope this helps,

Lei Tung
Oracle Corp.
--
  *------------------------------------------------------------------------*
  |  email:  ltung@oracle.com                                              |
  |     or:  {hplabs,apple,pacbell,mit-eddie,uunet}!oracle!ltung           |
  |                                                                        |
  |  It'll all work out,                                                   |
  |      you know I'm a little freaked out . . .                           |
  |                             - D. B.                                    |
  *------------------------------------------------------------------------*

gernot@hpuviea.UUCP (Gernot Kunz) (11/21/89)

itkin@mrspoc.Transact.COM (Steven M. List) writes:

>hemphill@csc000.csc.ti.com (Charles Hemphill) writes:

>>	List the 10 oldest employees.

>	select name, birthdate
>	from employee
>	order by birthdate desc
>and pipe the result through "head -10"?
>Otherwise, there's no way to set a limit in SQL.

Oops, what a tricky UNIX hack to circumvent an obviously
missing SQL functionality. In ORACLE you could do THIS:

       create table temptable(name ..., birthdate ...);

       insert into temptable
       select name, birthdate
       from employee
       order by birthdate desc;

       select name,birthdate
       from temptable
       where rownum <= 10 ;

and remain in pure SQL. It requires a temporary table, though.

---
Gernot Kunz				gernot@hpuviea.UUCP 
Hewlett-Packard Austria GmbH, 		...mcvax!tuvie!hpuviea!gernot
A-1220 Vienna, Austria			...hplabs!hpfcla!hpbbn!hpuviea!gernot
(0043) (222) 2500 x428 (9-18 CET)

scottj@ncrcae.Columbia.NCR.COM (L. Scott Johnson) (11/21/89)

In article <98813@ti-csl.csc.ti.com> hemphill@ti.com (Charles Hemphill) writes:
>
>Can the following type of question be answered with an SQL expression?
>
>	List the 10 oldest employees.
>
>Basically, I'd like the top N tuples from an ordered table.  I don't
>want to use embedded SQL.

Since the table is ordered, you could use the following in ORACLE (which
supplies a column called rownum, I'm not sure how or if other products offer
this feature):

	select empname
	from emp
	where rownum <=10

Hope this helps (despite it's lack of generality)
----------
L. Scott

nico@unify.uucp (Nico Nierenberg) (11/22/89)

In article <1217@hpuviea.UUCP> gernot@hpuviea.UUCP (Gernot Kunz) writes:
>
>Oops, what a tricky UNIX hack to circumvent an obviously
>missing SQL functionality. In ORACLE you could do THIS:
>
>       create table temptable(name ..., birthdate ...);
>
>       insert into temptable
>       select name, birthdate
>       from employee
>       order by birthdate desc;
>
>       select name,birthdate
>       from temptable
>       where rownum <= 10 ;
>
>and remain in pure SQL. It requires a temporary table, though.
>

You have got to be kidding to call this "pure" SQL.  You are relying on
an artifact of the Oracle system that the rownum can be reliably
counted on in a new table.  I actually think that this is less desirable
than Steven's original idea.  If you don't like the Unix flavor then
use some other filter or embedded SQL.

john@anasaz.UUCP (John Moore) (11/22/89)

In article <1217@hpuviea.UUCP> gernot@hpuviea.UUCP (Gernot Kunz) writes:
]itkin@mrspoc.Transact.COM (Steven M. List) writes:
]
]>hemphill@csc000.csc.ti.com (Charles Hemphill) writes:
]
]>>	List the 10 oldest employees.
]
]>	select name, birthdate
]>	from employee
]>	order by birthdate desc
]>and pipe the result through "head -10"?
]>Otherwise, there's no way to set a limit in SQL.
]
]Oops, what a tricky UNIX hack to circumvent an obviously
]missing SQL functionality. In ORACLE you could do THIS:
]
]       create table temptable(name ..., birthdate ...);
]
]       insert into temptable
]       select name, birthdate
]       from employee
]       order by birthdate desc;
]
]       select name,birthdate
]       from temptable
]       where rownum <= 10 ;
]
]and remain in pure SQL. It requires a temporary table, though.

In practical terms, however, this means that all rows were
read from the database, inserted into the temptable, after which
ten rows were extracted. If one is concerned about per-query
performance (as we are when we are doing 20 queries
per second!), this doesn't do the job.

Try this for an even harder one:

Table Frod:
    Location:
    Name:
    Day_of_Birth:

Query: Give me the first ten names by date of birth where
  location is one of a set of locations which is a small subset
  of all the locations.

Does anyone know how do do this one (you may choose any index you want)
in either SQL or embedded SQL without the RDBMS having to do a lot of
extra work? One way I know is to have N cursors open, one per
location, and then do a "merge" on them. This, however, is hard to
generalize in embedded C (especially in Informix, which we are using).
Even then, RDBMS' tend to do read-ahead on cursors, causing extra I/O. 
The other approach I thought of is to write a query like the one below, 
and hope the optimizer is smart enough to make it efficient.

Does someone have an optimizer which would generate a GOOD strategy for:
  SELECT Name, Location 
    FROM Frod 
    WHERE Location in ( "Phoenix", "Atlanta", "Denver", "Seattle" )
     AND  Day_of_Birth >= 12/31/47
    ORDER BY Day_of_Birth ;

Note: 500 locations, 1000 names at each, days distributed unevenly
  over 1 year from 6/30/46 to 6/30/47.

CREATE INDEX ...any way you want, but keep in mind that this data is 
   dynamic, with lots of insertions to the table.

Definition of good: performance >= what could be achieved by doing it
  by hand using low level operations.

If your optimizer generates a nice strategy for this, please
post what the strategy would be - especially if there are
neat tricks. I am assuming that the RDBMS has RAM cache,
logging etc. as are used in OLTP versions such as Oracle 6
or Informix Turbo or others.
-- 
John Moore (NJ7E)           mcdphx!anasaz!john asuvax!anasaz!john
(602) 861-7607 (day or eve) long palladium, short petroleum
7525 Clearwater Pkwy, Scottsdale, AZ 85253
The 2nd amendment is about military weapons, NOT JUST hunting weapons!

hargrove@harlie.sgi.com (Mark Hargrove) (11/23/89)

In article <1217@hpuviea.UUCP> gernot@hpuviea.UUCP (Gernot Kunz) writes:

>itkin@mrspoc.Transact.COM (Steven M. List) writes:
>
>>hemphill@csc000.csc.ti.com (Charles Hemphill) writes:
>
>>>	List the 10 oldest employees.
>
>>	select name, birthdate
>>	from employee
>>	order by birthdate desc
>>and pipe the result through "head -10"?
>>Otherwise, there's no way to set a limit in SQL.
>
>Oops, what a tricky UNIX hack to circumvent an obviously
>missing SQL functionality. In ORACLE you could do THIS:
>
>      create table temptable(name ..., birthdate ...);
>
>      insert into temptable
>      select name, birthdate
>      from employee
>      order by birthdate desc;
>
>      select name,birthdate
>      from temptable
>      where rownum <= 10 ;
>
>and remain in pure SQL. It requires a temporary table, though.
               ^^^^^^^^

Just which *pure* SQL are you referring to?!?  I think your version
is just as much a "hack" as you claim Steven's is.  Foo.  His may
take advantage of a tool outside of SQL, but at least it will work for
ANY dialect of SQL (under Unix, obviously; but there's a way to do the
same thing under VMS).

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Mark Hargrove                                       Silicon Graphics, Inc.
email: hargrove@harlie.corp.sgi.com                 2011 N.Shoreline Drive
voice: 415-962-3642                                 Mt.View, CA 94039

jkrueger@dgis.dtic.dla.mil (Jon) (11/23/89)

john@anasaz.UUCP (John Moore) writes:

[For a db like
	+------+
	| Frod | Location   Name   Birthdate
	+------+----------+------+------------+
               | Atlanta  |  Doe | 1 Sep 1960 |
               |  Boston  |  Ray | 2 Sep 1960 |
	       +----------+------+------------+
]
>Query: Give me the first ten names by date of birth where
>  location is one of a set of locations which is a small subset
>  of all the locations.

select * from frod where location = "Boston" or location = "Hartford"
	order by birthdate;

Alternatively:

select * from frod f, locations l where f.location = l.location
	order by birthdate;

No cursors necessary or desirable.  No 3GL necessary.  The problem
is trivial.  This is exactly what DBMS are designed to do.  Most
are quite good at it.

>Does someone have an optimizer which would generate a GOOD strategy for:
>  SELECT Name, Location 
>    FROM Frod 
>    WHERE Location in ( "Phoenix", "Atlanta", "Denver", "Seattle" )
>     AND  Day_of_Birth >= 12/31/47
>    ORDER BY Day_of_Birth ;

Yes.  Most of us do.  This is a simple single-table query.  The
optimization possibilities are small and amount to deciding whether to
select first on location or birth date.  If exactly one of them has an
index, it's a good candidate.  All commercial products can do this
much.

If both or neither have indices, or as an additional check, we could
examine data distributions in each column and predict which would
return fewer rows.  In this example, if locations were evenly
distributed among cities but only 10% of the birth dates were after
12/31/47, birth date would be a better candidate.  To my knowledge,
only INGRES's query optimizer does this.

After that, we could check integrities defined on the columns.  If
location draws from a list of valid locations, and "Phoenix" isn't on
the list, we can skip checking for it.  To my knowledge, no commercial
product does this.  Probably none supports integrities in a
sufficiently well-defined manner to make this optimization safe
anyway.  For instance, consider integrities maintained by triggers.

In summary, three optimizations is about it, and they all boil down to
deciding which of two columns to select on first.  After that, execution
speed is a near-linear function of disk-memory bandwidth.  If the table
size is small with respect to memory size, caching can change this to
a very-near-linear function of processor speed.

>Definition of good: performance >= what could be achieved by doing it
>by hand using low level operations.

Will you add in the labor costs of maintaining the low level
optimizations?  If not, long will your hand crafted solution compete on
existing hardware with the portable solution as it continues to run on
faster hardware?

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?

jim@bilpin.UUCP (JimG) (12/01/89)

    {v_databases.10}


    IN ARTICLE <98813@ti-csl.csc.ti.com>, 
		hemphill@csc000.csc.ti.com (Charles Hemphill) WRITES:
 
>   Can the following type of question be answered with an SQL expression?
>   
>   	List the 10 oldest employees.
>   
>   Basically, I'd like the top N tuples from an ordered table.  I don't
>   want to use embedded SQL.

    It can be done in one SQL statement as follows :

    SELECT * FROM tablename tablealias
	WHERE 10 > (	SELECT COUNT(*) FROM tablename
				WHERE tablename.age > tablealias.age
		   )

    Now isn't that neat?

    How it works is :
     for each row, select the number of rows where the age is greater than 
     the current row;
     if there are less than 10 rows which satisfy this condition, then the
     current row must be amongst the 10 oldest, so print it.
-- 
    Another Fine Product from <mcvax!ukc!icdoc!bilpin!jim> <jim@bilpin.uucp>
			   {JimG : Hatfield, England}
			       Programmers' Maxim
	    If it's not aesthetically pleasing, it's probably wrong

monty@delphi.uchicago.edu (Monty Mullig) (12/03/89)

In article <2510@bilpin.UUCP> jim@bilpin.UUCP (JimG) writes:

>    IN ARTICLE <98813@ti-csl.csc.ti.com>, 
>		hemphill@csc000.csc.ti.com (Charles Hemphill) WRITES:
> 
>>   Can the following type of question be answered with an SQL expression?
>>   
>>   	List the 10 oldest employees.
>>   
>
>    It can be done in one SQL statement as follows :
>
>    SELECT * FROM tablename tablealias
>	WHERE 10 > (	SELECT COUNT(*) FROM tablename
>				WHERE tablename.age > tablealias.age
>		   )
>
>    Now isn't that neat?


you mean, a statement like this:

select * from master m1, master m2 
	where 10 > (select count(*) from m1 
		    where m1.bdate < m2.bdate);


funny, but INGRES (5.0) on my machine choked on this query.  now, is
INGRES at fault or is the SQL wrong ?

--monty

jim@bilpin.UUCP (JimG) (12/04/89)

    #{v_databases.10a}

    IN ARTICLE <6518@tank.uchicago.edu>, 
		monty@delphi.uchicago.edu (Monty Mullig) WRITES:
>   IN ARTICLE <2510@bilpin.UUCP> jim@bilpin.UUCP ( that's me ) WRITES:
> 
> >    IN ARTICLE <98813@ti-csl.csc.ti.com>, 
> >		hemphill@csc000.csc.ti.com (Charles Hemphill) WRITES:
> >>   Can the following type of question be answered with an SQL expression?
> >>   	List the 10 oldest employees.
> >
> >    It can be done in one SQL statement as follows :
> >    SELECT * FROM tablename tablealias
> >	WHERE 10 > (	SELECT COUNT(*) FROM tablename
> >				WHERE tablename.age > tablealias.age
> >		   )
> 
> 
> you mean, a statement like this:
> select * from master m1, master m2 
> 	where 10 > (select count(*) from m1 
> 		    where m1.bdate < m2.bdate);
> funny, but INGRES (5.0) on my machine choked on this query.  now, is
> INGRES at fault or is the SQL wrong ?

    Well, no, I didn't mean that, which is why I didn't write that. The
    query structured as I entered it worked fine in Oracle 5 - obviously I
    checked it before posting - the approach is sound. Please e-mail if you
    want to discuss the problems with your alternative - they aren't
    relevant to the original question, so I won't cover them here.

    Some points to bear in mind with regard to this solution:
    a) the 'age' would obviously not be held as a column in the table -
       'age' would be a function calculating the age from birth date and
       current date - preferably to an accuracy which would minimise the
       possibility of two or more people having the same age; because
    b) if two or more people do have the same age, even to the day (minute,
       picosecond, ...) the query may return more than 10 rows, if some of
       those people occur in the oldest 10; eg. if age is only calculated to
       the year, and the 8 oldest are 62, and the next four oldest are 60,
       you will get back 12 rows - in these circumstances the query can only
       return the set of rows within which the oldest 10 will be contained,
       as the data is not sufficiently accurate to do otherwise;
    c) the same approach should be feasible for cutting out a section of
       rows, by setting both top and bottom limits, with two sub-queries,
       although it's going to be reading the table an awful lot of times.
-- 
    Another Fine Product from <mcvax!ukc!icdoc!bilpin!jim> <jim@bilpin.uucp>
			   {JimG : Hatfield, England}
			       Programmers' Maxim
	    If it's not aesthetically pleasing, it's probably wrong

dlw@odi.com (Dan Weinreb) (12/05/89)

In article <2510@bilpin.UUCP> jim@bilpin.UUCP (JimG) writes:

   >   Basically, I'd like the top N tuples from an ordered table.  I don't
   >   want to use embedded SQL.

       It can be done in one SQL statement as follows :

       SELECT * FROM tablename tablealias
	   WHERE 10 > (	SELECT COUNT(*) FROM tablename
				   WHERE tablename.age > tablealias.age
		      )

       Now isn't that neat?

Yes, it's neat.  I'd be interested in knowing how fast it is, compared
to the using embedded SQL with cursors (asking for them all, and just
using a cursor to get the first ten), on some real-world commercial
RDBMS's.  It seems to me that an extremely clever query optimizer
could do a great job here, by realizing that all it needs to do is
sort and grab the top ten, but I don't know whether or not any of the
commercial products do this kind of optimization well.

mikei@ctdi.UUCP (Mike Israel) (12/07/89)

In article <2511@bilpin.UUCP> you answered the question::
>> 
>> >>   Can the following type of question be answered with an SQL expression?
>> >>   	List the 10 oldest employees.
>> >
>> >    It can be done in one SQL statement as follows :
>> >    SELECT * FROM tablename tablealias
>> >	WHERE 10 > (	SELECT COUNT(*) FROM tablename
>> >				WHERE tablename.age > tablealias.age
>> >		   )
>

You then stated that:
>
>    Some points to bear in mind with regard to this solution:
>
>    b) if two or more people do have the same age, even to the day (minute,
>       picosecond, ...) the query may return more than 10 rows, if some of
>       those people occur in the oldest 10; eg. if age is only calculated to
>       the year, and the 8 oldest are 62, and the next four oldest are 60,
>       you will get back 12 rows - in these circumstances the query can only
>       return the set of rows within which the oldest 10 will be contained,
>       as the data is not sufficiently accurate to do otherwise;


I am not familiar with Oracle, but I can confirm that the query as written
above does not work in Ingres (at least not as entered).  I posted a follow
up msg with a similiar solution that would work for Ingres:

select * from mytable d
where 10>=all
        (select count(age)
         from mytable d2
         where d2.age >= d.age);

I have run this query against a table containing multiple duplicate birthdates.
I always get the OLDEST TEN employees, I never get back more than ten rows.
Perhaps if Oracle will allow you to enter a query as I have shown it then you 
will get the desired ten rows without any unexpected extras.  I would be
interested to know if this works out.

Regards,
-- 
Michael A. Israel               ||  uucp: mikei@ctdi.UUCP
                               	||        ...!uunet!cbmvax!ctdi1!ctdi
Communications Test Design Inc.	||
West Chester, PA                || Please direct all complaints to /dev/null