[comp.databases] Informix sql sum/group by question

carver@bsadrc.UUCP (Darrel R. Carver) (03/21/89)

I have a order entry application written in Informix SQL.  The cash
transaction system has a table that looks somewhat like this

	create table tr_dat (
		tr_num	serial(1000),	
		tr_vendor integer,	{ link to vendor table }
		tr_date	date,		{ Date of transaction	}
		tr_amt	money(5,2)	{ Amount of transaction }
	);

I wish to sum up and display all the transactions for a particular
vendor with the following statement

	select sum(amt), amt from tr_dat where tr_vendor = 5;

Informix gives me an error saying the amt must be in a group by
statement, i.e.

	select sum(amt), amt from tr_dat where tr_vendor = 5 group by amt;

My question is why?  Why should I have to have the amt in a group by
clause?
-- 
Darrel R Carver	            -	A clever quote goes here.  Have we got any in
CommTek Publishing Company  -	the unquotable quotes?
Vienna, Va  22180           -
uunet!bsadrc!carver	    -			Mr. Bear

jeffl@sybase.Sybase.COM (Jeff Lichtman) (03/22/89)

> 	select sum(amt), amt from tr_dat where tr_vendor = 5;
> 
> Informix gives me an error saying the amt must be in a group by
> statement....
> 
> My question is why?  Why should I have to have the amt in a group by
> clause?
> -- 
> Darrel R Carver	            -	A clever quote goes here.  Have we got any in

Informix does it according to the ANSI standard.  Sybase doesn't have this
restriction, but I know why it exists, and I will try to explain.

Loosely, the rule is that if you have aggregates in your query, no columns may
appear in the query that are not inside an aggregate or part of the
group by clause.  The rule makes it impossible to write queries that do not
associate aggregated values with un-aggregated columns.

In ANSI SQL, the result of a query is either grouped or not grouped.  If it
is not grouped, all the aggregates in the query produce single values.  If
it is grouped, each aggregate produces one value for each group.

Suppose you have the following table, TABX:

	COL1 | COL2 | COL3
	------------------
	  1  |   1  |  1
	  2  |   1  |  1
	  3  |   1  |  2
	  4  |   2  |  1
	  5  |   2  |  2

The query:

	select sum(COL1) from TABX

will produce:

	-----
	 15

The query:

	select sum(COL1), COL2 from TABX
	group by COL2

will produce:

	     | COL2
	------------
	  6  |  1
	  9  |  2

Now, what should the following produce:

	select sum(COL1), COL2 from TABX

There is no association between the aggregate and COL2.  In ANSI SQL,
not only is the syntax illegal, but there is no definition for what this
should produce if it were legal.  In Sybase, we give the answer:

	     | COL2
	------------
	  15 |  1
	  15 |  2
	  15 |  3
	  15 |  4
	  15 |  5

That is, since there is no grouping, Sybase simply duplicates the scalar
(single-value) result for each value of COL2.  This is the only way I
can think of to interpret this query, but I can't imagine why someone
would want this result.  It would make more sense to do two selects:

	select sum(COL1) from TABX
	select COL1 from TABX

Now, try to think of what the following query should produce:

	select sum(COL1), COL2, COL3 from TABX
	group by COL2

This query is also illegal under ANSI SQL.  COL3 is neither in an aggregate
nor in the "group by" clause.  Not only does the standard make this illegal,
it doesn't define what such a query should do if it were legal, because
there is no association between COL3 and the groups produced by the query.

Sybase produces this result:

	     | COL2 | COL3
	-------------------
	  6  |   1  |  1
	  6  |   1  |  1
	  6  |   1  |  2
	  9  |   2  |  1
	  9  |   2  |  2

Eh?  How do we get that?

In Sybase, if you have a query with columns that aren't in the "group by"
clause or an aggregate, we first form an intermediate result (which you
can see is the same as if we had left COL3 out of the query):

	     | COL2
	------------
	  6  |  1
	  9  |  2
	
and then join this with the base table (TABX) on the grouped column (COL2).
I maintain that this is a reasonable way to interpret this query - since
there is no explicit association of the aggregate values with the column
SUM3, we match each aggregate value with all of the rows where the grouping
columns are equal.

I think most people would admit that there is no intuitively obvious result
for such a query.  Not only that, but it's hard to implement the interpretation
I have given above.  That's why ANSI and many products disallow it.
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."

wcb@sorsac.UUCP (20SO940210) (03/24/89)

In article <3462@sybase.sybase.com> jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
>In article <whatever> (Darrel R Carver) writes:
>> 	select sum(amt), amt from tr_dat where tr_vendor = 5;
>> 
>> Informix gives me an error saying the amt must be in a group by
>> statement....
>> 
>> My question is why?  Why should I have to have the amt in a group by
>> clause?
>> -- 
>> Darrel R Carver	            -	A clever quote goes here.  Have we got any in

>Informix does it according to the ANSI standard.  Sybase doesn't have this
>restriction, but I know why it exists, and I will try to explain.

An excellent explanation then follows, of which I have deleted all but the
following portion:

>- - -
>Suppose you have the following table, TABX:
>
>	COL1 | COL2 | COL3
>	------------------
>	  1  |   1  |  1
>	  2  |   1  |  1
>	  3  |   1  |  2
>	  4  |   2  |  1
>	  5  |   2  |  2
>- - -
>Now, what should the following produce:
>
>	select sum(COL1), COL2 from TABX
>
>There is no association between the aggregate and COL2.  In ANSI SQL,
>not only is the syntax illegal, but there is no definition for what this
>should produce if it were legal.  In Sybase, we give the answer:
>
>	     | COL2
>	------------
>	  15 |  1
>	  15 |  2
>	  15 |  3
>	  15 |  4
>	  15 |  5
>
>That is, since there is no grouping, Sybase simply duplicates the scalar
>(single-value) result for each value of COL2.  This is the only way I
>can think of to interpret this query, but I can't imagine why someone
>would want this result.  It would make more sense to do two selects:
>
>	select sum(COL1) from TABX
>	select COL1 from TABX
>
>- - -

With the above values in TABX, it seems the statement

	select sum(COL1), COL2 from TABX

would produce the result:

	     | COL2
	------------
	  15 |  1
	  15 |  1
	  15 |  1
	  15 |  2
	  15 |  2

Which implies to me that Sybase will make more than one pass through
the data, per select statement - which, of course, Informix will not do.

To try to get around this, I came up with the following:

	select col1 from tabx into temp temptab;
	insert into temptab values ("");
	insert into temptab select sum(col1) from tabx;
	select * from temptab;

Which will produce something like:

	col1

	   1
	   2
	   3
	   4
	   5

	  15

with a report of having retreived 7 rows. (In "running" this as a
".sql" file, the "retreive" statements that are produced can be
redirected to /dev/null - assuming you are running UNIX.)

If Sybase performs differently than I have indicated, I would be
interested in hearing about it.
 
-------------------------------------------------------------------------

-- 
"With friends like these, who needs hallucinations?" - Buddy Ryan in
                                                      "Night Court"

Bill Barksdale	        AT&T Network Systems            Atlanta, Ga.

jeffl@sybase.Sybase.COM (Jeff Lichtman) (03/26/89)

||	COL1 | COL2 | COL3
||	------------------
||	  1  |   1  |  1
||	  2  |   1  |  1
||	  3  |   1  |  2
||	  4  |   2  |  1
||	  5  |   2  |  2
|| - -
||	select sum(COL1), COL2 from TABX
|| - -
| 
| With the above values in TABX, it seems the statement
| would produce the result:
| 
| 	     | COL2
| 	------------
| 	  15 |  1
| 	  15 |  1
| 	  15 |  1
| 	  15 |  2
| 	  15 |  2

Oops.  Yes, you're right.  My original posting had the wrong result for
this query.  The result I gave was for the query:

	select sum(COL1), COL1 from TABX
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."