[comp.databases] INGRES oddity

seligson@nunki.usc.edu (Richard Seligson) (12/18/88)

Hello,

I am using the interactive SQL interface to Ingres Version 5.0/01.

My database contains the following tables:

Dept (dname, dcode)
Class (classnum, cnum, dcode, instructor, location, mtgtimes)
Enrollment (socsec, classnum)
... and others, but not referenced here.

I want to create a relation of departments, classes, and the number of 
students in each class.  Based on that relation, I want to create a second
relation of departments and total number of students in its classes.

When I create a table, and then a second table based on the first, I get
the correct results.  However, when I create a view and then create a 
second view based upon the first view, the second view contains incorrect data
(the first view is correct).

I am wondering if it is incorrect to build a relation based on a view (Ingres
doesn't flag any error), or if I've done something wrong.

The SQL statements are:

create table t1 as
select dept.dname, class.classnum, students=count(enrollment.socsec)
from dept, enrollment, class
where class.dcode = dept.dcode
and   enrollment.classnum = class.classnum
group by dept.dname, class.classnum;

create table t2 as
select dname, tot_students=sum(students)
from t1
group by dname;

The (almost) identical statements to create the views are:

create view v1 as
select dept.dname, class.classnum, students=count(enrollment.socsec)
from dept, enrollment, class
where class.dcode = dept.dcode
and   enrollment.classnum = class.classnum
group by dept.dname, class.classnum;

create view v2 as
select dname, tot_students=sum(students)
from v1
group by dname;


The relations created are:

t1 & v1 (they are identical):
|dname               |classnum     |students     |
|------------------------------------------------|
|Comparative Literatu|        92882|            2|
|Comparative Literatu|        92886|            1|
|Computer Science    |        11929|            2|
|Computer Science    |        11932|            1|
|Computer Science    |        11934|            2|
|Computer Science    |        11936|            1|
|Computer Science    |        11938|            1|
|Economics           |        34633|            1|
|Economics           |        73652|            1|
|Economics           |        89234|            3|
|Economics           |        93825|            1|
|Philosophy          |        92822|            3|
|Philosophy          |        92824|            1|
|Philosophy          |        92826|            2|
|Philosophy          |        92830|            1|
|Physical Education  |        74397|            2|
|Physical Education  |        74401|            1|
|Physical Education  |        74403|            1|
|------------------------------------------------|

t2:
|dname               |tot_students |  <-- These are correct
|----------------------------------|
|Comparative Literatu|            3|
|Computer Science    |            7|
|Economics           |            6|
|Philosophy          |            7|
|Physical Education  |            4|
|----------------------------------|

v2:
|dname               |tot_students |
|----------------------------------|
|Comparative Literatu|            3|
|Computer Science    |            3| <-- tot_students is incorrect
|Economics           |            4| <-- tot_students is incorrect
|Philosophy          |            6| <-- tot_students is incorrect
|Physical Education  |            3| <-- tot_students is incorrect
|----------------------------------|


Thanks for your help.  If you respond to me via Email, I will post a
summary.

Rich Seligson    seligson@oberon.usc.edu  - InterNet
                 ...uunet!oberon!seligson - UUCP

p.s.  I may not be able to respond until after Jan. 8th.  I'll
      be away for some of winter break - thanks again!

debra@alice.UUCP (Paul De Bra) (12/19/88)

In article <2183@nunki.usc.edu> seligson@nunki.usc.edu (Richard Seligson) writes:
>Hello,
>
>I am using the interactive SQL interface to Ingres Version 5.0/01.
>
>My database contains the following tables:
>
> ... tables deleted ...
>
>The SQL statements are:
>
>create table t1 as
>select dept.dname, class.classnum, students=count(enrollment.socsec)
>from dept, enrollment, class
>where class.dcode = dept.dcode
>and   enrollment.classnum = class.classnum
>group by dept.dname, class.classnum;
>
>create table t2 as
>select dname, tot_students=sum(students)
>from t1
>group by dname;
>
>The (almost) identical statements to create the views are:
> ... nearly identical statements deleted...

The problem with the creation of t2, or v2, is that there are 2 ways
to interpret this query:
1) Perform the "sum" operation on the original tuples from t1 and then
   remove the classnum attribute
2) First remove the classnum attribute and then perform the "sum" operation
   on the subtuples.

The first operation will yield the "correct" result, whereas the second
may count only the classes with different numbers of students, since after
the projection one may have duplicate tuples, and an SQL implementation has
the freedom of deciding whether to remove duplicates or not. (unless the
"unique" keyword is used forcing the removal of duplicates)

The problem with your SQL implementation is that in the relational algebra
(or domain calculus) the selection and projection operator may be performed
in either order, yielding the same result. But this property is no longer
true if "counting" is added to the domain calculus. SQL is an extention
of the domain calculus for which this property of the domain calculus no
longer holds, and obviously the implementors overlooked this.

An additional problem is that the syntax of SQL does not really indicate
whether it is the "select" (the projection) or the "where" (the selection)
should be executed first.

What you may try to solve this problem is:

create table t2 as
select dname, tot_students
from
select dname, classnum, tot_students=sum(students)
from t1
group by dname;

This may help indicate to SQL to evaluate the sum before the projection, but
a (bogus) optimizer may remove the extra step again.

Paul.
-- 
------------------------------------------------------
|debra@research.att.com   | uunet!research!debra     |
------------------------------------------------------

jkrueger@daitc.daitc.mil (Jonathan Krueger) (12/20/88)

In article <8558@alice.UUCP>, debra@alice (Paul De Bra) writes:
>[masterful summary of SQL syntax ambiguity deleted]

>[possible workaround:]
>create table t2 as
>select dname, tot_students
>from
>select dname, classnum, tot_students=sum(students)
>from t1
>group by dname;
>
>This may help indicate to SQL to evaluate the sum before the projection, but
>a (bogus) optimizer may remove the extra step again.

I believe QUEL disambiguates such queries:
	retrieve into t2 (t1.dname, t1.classnum, tot_students = sum(students))
since each aggregation is independent of the others in the target list.

The cost of this is great difficulty getting table names, domains, and
restrictions to act on all clauses in common.  Best example is the
awkward syntax for a difference operation
	/* find Republicans with acceptable pasts */
	retrieve (person.all) where person.party = "GOP" and
		any(crimes.severity by crimes.person where
			person.party = "GOP" and
			crimes.person = person.name and
			crimes.severity > "misdemeanor") = 0

The repeated GOP qualifier is needed because the person domain inside
the aggregation bears no relation to the person domain elsewhere.  The
any(foo) = 0 is a separate awkwardness.  Much nicer is the SQL

	select * from person where party = "GOP" and not exists
		(select * from crimes where severity > "misdemeanor"
			and crimes.person = person.name);

So I'd say the tradeoffs all circle around the semantics of
aggregation.  If aggregation is isolated from select and project
elsewhere in the query, it saves us from unpredictable order of query
evaluation/execution effects, but it pushes the limits of grouping
syntax and leads to replicate code.

-- Jon
-- 

ajs@datlog.co.uk ( Andy Simms ) (12/21/88)

In article <2183@nunki.usc.edu> seligson@nunki.usc.edu (Richard Seligson) writes:
>I am using the interactive SQL interface to Ingres Version 5.0/01.
>
>I want to create a relation of departments, classes, and the number of 
>students in each class.  Based on that relation, I want to create a second
>relation of departments and total number of students in its classes.
>
>When I create a table, and then a second table based on the first, I get
>the correct results.  However, when I create a view and then create a 
>second view based upon the first view, the second view contains incorrect data
>(the first view is correct).
>
>I am wondering if it is incorrect to build a relation based on a view (Ingres
>doesn't flag any error), or if I've done something wrong.
>
>create view v1 as
>select dept.dname, class.classnum, students=count(enrollment.socsec)
>from dept, enrollment, class
>where class.dcode = dept.dcode
>and   enrollment.classnum = class.classnum
>group by dept.dname, class.classnum;
>
>create view v2 as
>select dname, tot_students=sum(students)
>from v1
>group by dname;
>

Most SQLs disallow GROUP BY if the table referenced in the FROM clause
is itself a "grouped view" (i.e. a view with a GROUP BY). To quote the
ANSI standard (section 5.19 defining <table expression> - the FROM ...
WHERE ... GROUP BY ... HAVING ... stuff):

	"If the table identified in the <from clause> is a grouped
	view, then the <table expression> shall not contain a
	<where clause>, <group by clause>, or <having clause>."

I suspect this restriction is present mainly for implementation reasons.
No SQL that I know of materialises views at the definition stage. Instead
the definition is stored away and incorporated into the overall query
at run time, so the resultant query must be syntactically valid. In your
example you are effectively asking for

    tot_students=sum(count(enrollment.socsec))

which is invalid syntax. WHERE is banned for similar reasons, e.g.

	select dname, students
	from v1
	where students > 1

This ends up as

	where count(enrollment.socsec) > 1

and you can't have COUNT in a WHERE clause. Note that you can't get round
this by using a HAVING clause because that's banned too, which is particularly
annoying.

Finally, you might also note how the lack of outer joins in INGRES means
that you never see the department/class combinations for which there are
no enrolled students.

seligson@sal44.usc.edu (Richard Seligson) (01/12/89)

A while back I had posted a question about views based upon views in Ingres.
I'd like to thank all those who replied.

The two basic points that were brought up were:

1) There is an ambiguity in SQL regarding whether projection is done before
   or after selection. Pointed out by Paul De Bra (debra@alice.UUCP).

2) And as pointed out by Andy Simms (ajs@datlog.co.uk):

> Most SQLs disallow GROUP BY if the table referenced in the FROM clause
> is itself a "grouped view" (i.e. a view with a GROUP BY). To quote the
> ANSI standard (section 5.19 defining <table expression> - the FROM ...
> WHERE ... GROUP BY ... HAVING ... stuff):
> 
> 	"If the table identified in the <from clause> is a grouped
>	view, then the <table expression> shall not contain a
>	<where clause>, <group by clause>, or <having clause>."

Unfortunately, INGRES is violating both.  It allows me to use a
"grouped query" based on a "grouped view", and it interprets the 
ambiguity differently depending upon whether a "grouped view" was 
involved or not.

If my "grouped query" is based on a "grouped view" then INGRES projects
then selects, otherwise INGRES selects then projects.

Has anyone else found this?  I can supply a small INGRES script that
demonstrates this.

Rich Seligon                seligson@oberon.usc.edu  - ARPA
                            ...uunet!oberon!seligson - UUCP