carver@bsadrc.UUCP (Darrel R. Carver) (03/21/89)
I have a order entry application written in Informix SQL. The cash transaction system has a table that looks somewhat like this create table tr_dat ( tr_num serial(1000), tr_vendor integer, { link to vendor table } tr_date date, { Date of transaction } tr_amt money(5,2) { Amount of transaction } ); I wish to sum up and display all the transactions for a particular vendor with the following statement select sum(amt), amt from tr_dat where tr_vendor = 5; Informix gives me an error saying the amt must be in a group by statement, i.e. select sum(amt), amt from tr_dat where tr_vendor = 5 group by amt; My question is why? Why should I have to have the amt in a group by clause? -- Darrel R Carver - A clever quote goes here. Have we got any in CommTek Publishing Company - the unquotable quotes? Vienna, Va 22180 - uunet!bsadrc!carver - Mr. Bear
jeffl@sybase.Sybase.COM (Jeff Lichtman) (03/22/89)
> select sum(amt), amt from tr_dat where tr_vendor = 5; > > Informix gives me an error saying the amt must be in a group by > statement.... > > My question is why? Why should I have to have the amt in a group by > clause? > -- > Darrel R Carver - A clever quote goes here. Have we got any in Informix does it according to the ANSI standard. Sybase doesn't have this restriction, but I know why it exists, and I will try to explain. Loosely, the rule is that if you have aggregates in your query, no columns may appear in the query that are not inside an aggregate or part of the group by clause. The rule makes it impossible to write queries that do not associate aggregated values with un-aggregated columns. In ANSI SQL, the result of a query is either grouped or not grouped. If it is not grouped, all the aggregates in the query produce single values. If it is grouped, each aggregate produces one value for each group. Suppose you have the following table, TABX: COL1 | COL2 | COL3 ------------------ 1 | 1 | 1 2 | 1 | 1 3 | 1 | 2 4 | 2 | 1 5 | 2 | 2 The query: select sum(COL1) from TABX will produce: ----- 15 The query: select sum(COL1), COL2 from TABX group by COL2 will produce: | COL2 ------------ 6 | 1 9 | 2 Now, what should the following produce: select sum(COL1), COL2 from TABX There is no association between the aggregate and COL2. In ANSI SQL, not only is the syntax illegal, but there is no definition for what this should produce if it were legal. In Sybase, we give the answer: | COL2 ------------ 15 | 1 15 | 2 15 | 3 15 | 4 15 | 5 That is, since there is no grouping, Sybase simply duplicates the scalar (single-value) result for each value of COL2. This is the only way I can think of to interpret this query, but I can't imagine why someone would want this result. It would make more sense to do two selects: select sum(COL1) from TABX select COL1 from TABX Now, try to think of what the following query should produce: select sum(COL1), COL2, COL3 from TABX group by COL2 This query is also illegal under ANSI SQL. COL3 is neither in an aggregate nor in the "group by" clause. Not only does the standard make this illegal, it doesn't define what such a query should do if it were legal, because there is no association between COL3 and the groups produced by the query. Sybase produces this result: | COL2 | COL3 ------------------- 6 | 1 | 1 6 | 1 | 1 6 | 1 | 2 9 | 2 | 1 9 | 2 | 2 Eh? How do we get that? In Sybase, if you have a query with columns that aren't in the "group by" clause or an aggregate, we first form an intermediate result (which you can see is the same as if we had left COL3 out of the query): | COL2 ------------ 6 | 1 9 | 2 and then join this with the base table (TABX) on the grouped column (COL2). I maintain that this is a reasonable way to interpret this query - since there is no explicit association of the aggregate values with the column SUM3, we match each aggregate value with all of the rows where the grouping columns are equal. I think most people would admit that there is no intuitively obvious result for such a query. Not only that, but it's hard to implement the interpretation I have given above. That's why ANSI and many products disallow it. --- Jeff Lichtman at Sybase {mtxinu,pacbell}!sybase!jeffl -or- jeffl@sybase.com "Saints should always be judged guilty until they are proved innocent..."
wcb@sorsac.UUCP (20SO940210) (03/24/89)
In article <3462@sybase.sybase.com> jeffl@sybase.Sybase.COM (Jeff Lichtman) writes: >In article <whatever> (Darrel R Carver) writes: >> select sum(amt), amt from tr_dat where tr_vendor = 5; >> >> Informix gives me an error saying the amt must be in a group by >> statement.... >> >> My question is why? Why should I have to have the amt in a group by >> clause? >> -- >> Darrel R Carver - A clever quote goes here. Have we got any in >Informix does it according to the ANSI standard. Sybase doesn't have this >restriction, but I know why it exists, and I will try to explain. An excellent explanation then follows, of which I have deleted all but the following portion: >- - - >Suppose you have the following table, TABX: > > COL1 | COL2 | COL3 > ------------------ > 1 | 1 | 1 > 2 | 1 | 1 > 3 | 1 | 2 > 4 | 2 | 1 > 5 | 2 | 2 >- - - >Now, what should the following produce: > > select sum(COL1), COL2 from TABX > >There is no association between the aggregate and COL2. In ANSI SQL, >not only is the syntax illegal, but there is no definition for what this >should produce if it were legal. In Sybase, we give the answer: > > | COL2 > ------------ > 15 | 1 > 15 | 2 > 15 | 3 > 15 | 4 > 15 | 5 > >That is, since there is no grouping, Sybase simply duplicates the scalar >(single-value) result for each value of COL2. This is the only way I >can think of to interpret this query, but I can't imagine why someone >would want this result. It would make more sense to do two selects: > > select sum(COL1) from TABX > select COL1 from TABX > >- - - With the above values in TABX, it seems the statement select sum(COL1), COL2 from TABX would produce the result: | COL2 ------------ 15 | 1 15 | 1 15 | 1 15 | 2 15 | 2 Which implies to me that Sybase will make more than one pass through the data, per select statement - which, of course, Informix will not do. To try to get around this, I came up with the following: select col1 from tabx into temp temptab; insert into temptab values (""); insert into temptab select sum(col1) from tabx; select * from temptab; Which will produce something like: col1 1 2 3 4 5 15 with a report of having retreived 7 rows. (In "running" this as a ".sql" file, the "retreive" statements that are produced can be redirected to /dev/null - assuming you are running UNIX.) If Sybase performs differently than I have indicated, I would be interested in hearing about it. ------------------------------------------------------------------------- -- "With friends like these, who needs hallucinations?" - Buddy Ryan in "Night Court" Bill Barksdale AT&T Network Systems Atlanta, Ga.
jeffl@sybase.Sybase.COM (Jeff Lichtman) (03/26/89)
|| COL1 | COL2 | COL3 || ------------------ || 1 | 1 | 1 || 2 | 1 | 1 || 3 | 1 | 2 || 4 | 2 | 1 || 5 | 2 | 2 || - - || select sum(COL1), COL2 from TABX || - - | | With the above values in TABX, it seems the statement | would produce the result: | | | COL2 | ------------ | 15 | 1 | 15 | 1 | 15 | 1 | 15 | 2 | 15 | 2 Oops. Yes, you're right. My original posting had the wrong result for this query. The result I gave was for the query: select sum(COL1), COL1 from TABX --- Jeff Lichtman at Sybase {mtxinu,pacbell}!sybase!jeffl -or- jeffl@sybase.com "Saints should always be judged guilty until they are proved innocent..."