[comp.databases] SQL GROUP BY queries and empty tables

kent@manzi.unx.sas.com (Paul Kent) (05/17/91)

hello,

SQL queries with summary functions when the where clause 
eliminates all rows from consideration have a strange
quirk that i'd like to know more about.


> create table gr ( a int, b char );
> insert into gr values( 1, 'a' );
> insert into gr values( 2, 'b' );


> select count(*) from gr;

this gets predictable results.. a single row with "2"



> select count(*) from gr where a > 10;

this too. there are no rows that satisfy the where clause, so we
get a single row showing that "0" records were counted.



> select    count(*) from gr where a > 10 group by b ;
> select b, count(*) from gr where a > 10 group by b ;

this is wierd. there were no rows, so presumably there are
no groups. however 

  o) DB2(v2r2)
  o) Oracle(v6 on a vax)
  o) RDB(sorry, no version details, vms5.3 however)

dont produce any output for these latter queries.

does anyone know the rationale behind this. 
can they point at the rules in the standard 
(ansi 135.1 or draft stndard for sql2)
that say that this they way it should be?

our implementation of SQL produces a single row for all
of these queries. and a user has asked why the difference.
unfortunately, thats the way we read it in the standard.


what does your favorite SQL database do?
and do you like it that way?


thanks,
paul.

--
Paul Kent (SQL r&d)                   " nothing ventured, nothing disclaimed "
kent@unx.sas.com         SAS Institute Inc, SAS Campus Dr, Cary NC 27513-2414.

jfr@locus.com (Jon Rosen) (05/17/91)

In article <1991May16.212202.22391@unx.sas.com> kent@manzi.unx.sas.com (Paul Kent) writes:
>SQL queries with summary functions when the where clause 
>eliminates all rows from consideration have a strange
>quirk that i'd like to know more about.
>> create table gr ( a int, b char );
>> insert into gr values( 1, 'a' );
>> insert into gr values( 2, 'b' );
>
>> select count(*) from gr;
>this gets predictable results.. a single row with "2"
>> select count(*) from gr where a > 10;
>this too. there are no rows that satisfy the where clause, so we
>get a single row showing that "0" records were counted.
>> select    count(*) from gr where a > 10 group by b ;
>> select b, count(*) from gr where a > 10 group by b ;
>this is wierd. there were no rows, so presumably there are
>no groups. however 
>  o) DB2(v2r2)
>  o) Oracle(v6 on a vax)
>  o) RDB(sorry, no version details, vms5.3 however)
>dont produce any output for these latter queries.
>does anyone know the rationale behind this. 
>can they point at the rules in the standard 
>(ansi 135.1 or draft stndard for sql2)
>that say that this they way it should be?
>our implementation of SQL produces a single row for all
>of these queries. and a user has asked why the difference.
>unfortunately, thats the way we read it in the standard.
>what does your favorite SQL database do?
>and do you like it that way?

I went and looked at Date's book on the SQL Standard (I unfortunately
don't have the actual Standard definition book here at work today).
It is obvious that the issues involving COUNT have all sorts of
weird exceptions...
 
  (a) for other aggregates, you can add DISTINCT before the column
      name; for COUNT, the DISTINCT is required UNLESS you use
      COUNT(*) in which case the DISTINCT is NOT ALLOWED.
  (b) Nulls are eliminated in other expressions EXCEPT for COUNT
      where nulls are never eliminated.
  (c) If the argument happens to be a empty set, other functions
      return null; COUNT returns 0.

This last one is probably on point here.  SELECT COUNT(*) without
a GROUP BY is specially defined to return one row with the COUNT
of all rows in the table (after the WHERE clause has been applied).
Even if the WHERE clause eliminates all of the rows, you still get
a value of zero returned.  Apparently, when the GROUP BY is added,
the WHERE clause is used PRIOR to the grouping and there are no
rows left to partition so no results are returned.  This is clearly
one of the problems with SQL's lack of orthogonality.
 
By the way, one could argue that your implementation is incorrect also.
Since SELECT COUNT(*) returns on row (representing the rows of the
table) why shouldn't the GROUP BY clause then partition the table 
and count the rows for each group, returning all B's with the
COUNT of zero for each row (this would mostly make sense in your
4th example where B is included in the query).  
 
I will further verify this in my standard definition manual.  
 
Jon Rosen