[comp.databases] Sybase bug?

christie@kylie.oz (Chris Tham) (07/03/89)

I found the following anomaly and wondered if it is a bug in Sybase or
normal in SQL:
(All lines beginning with '>' are typed by me)

> create table hello(a int)
> create table world(b int)
> insert into hello(a) values (1)
(1 row affected)
> select * from hello where a = 1
 a
 -----------
           1

(1 row affected)
> select * from hello where a = 1 or a in (select b from world)
 a
 -----------

(0 rows affected)
> insert into world(b) values (2)
(1 row affected)
> select * from hello where a = 1 or a in (select b from world)
 a
 -----------
           1

(1 row affected)
> drop table hello,world

As you can see, the "select" with a subselect in an "or" clause fails if the
subselect fails but succeeds if the subselect succeeds.

If this is in fact not a bug but proper behaviour, can anyone tell me
the proper way to phrase this query?

I await comments with anticipation.  Thank you.
-- 
Internet: christie@kylie.otr.oz		"I'm Pink, therefore I'm Spam"
JANET: christie%kylie.oz@uk.cc.ucl.cs	Phone: +612 235-0255
UUCP: {uunet,hplabs,mcvax,ukc,nttlab}!munnari!christie@kylie.otr.oz
Mail: Optech Research Pty Ltd, Level 60 MLC Centre Sydney NSW 2000 AUSTRALIA

jkrueger@daitc.daitc.mil (Jonathan Krueger) (07/04/89)

In article <613@kylie.oz>, christie@kylie (Chris Tham) writes:
>I found the following anomaly and wondered if it is a bug in Sybase or
>normal in SQL:  [test case looks like:

	+-------+			+-------+
	| hello |    a          	| world |    b
	+-------+------+		+-------+------+
		|    1 |			+------+
		+------+

	select * from hello where a = 1 or a in (select b from world);

			+--------+
			| result |    a
			+--------+------+
				 +------+

It's normal.  The semantics are usually unintended, however.  The
subquery implies a join to an empty table.  This means a cartesian
product where one matrix has size zero.  Or if you prefer, in the
relational model intersection is undefined on two relations if either
is empty.

To force the semantics you intend, use UNION:

	select a from hello where a = 1
	UNION
	select a from hello where a in (select b from world);

			+--------+
			| result |    a
			+--------+------+
				 |    1 |
				 +------+

-- Jon
-- 
--

jklein@oracle.uucp (Jonathan Klein) (07/04/89)

st! omp: O: O:___rrrssstttu-I-I--s:


Fon:on:on@orps s sO18181on:IDArlelelOra 8 t 8 t 9 ": w: w: 1 16 16  8@o@o@ubs n@eiMs sOr.c.c."FctZylttGMFFFU-! atiatiauburfffbutnRep: jZy-R-R-up'''B Kgegeg1rssq=n&g?sa1S
hverouleinn g Kgpppsgl 8 P
P
anF
Danijkjm(J:
P&gcppppp`lrism( thag?ttclczRrZCa:2hzzzacle veraclFrourB= 8 ga:! F 8 t B=48Refr(J:aoooclcZyp.g?sq22abagup'e:st-I-T>.d19,j2news.daceMna1natiJbui$flq%#!INriF#!v#!v#acle ttlqlqlO
O
tiontime:acle tnttld
jkjmsgl

lcain@cuc1.UUCP (Leroy Cain) (07/05/89)

In article <571@daitc.daitc.mil>, jkrueger@daitc.daitc.mil (Jonathan Krueger) writes:
> In article <613@kylie.oz>, christie@kylie (Chris Tham) writes:
> >I found the following anomaly and wondered if it is a bug in Sybase or
> >normal in SQL:  [test case looks like:
> 
> 	+-------+			+-------+
> 	| hello |    a          	| world |    b
> 	+-------+------+		+-------+------+
> 		|    1 |			+------+
> 		+------+
> 
> 	select * from hello where a = 1 or a in (select b from world);
> 
> 			+--------+
> 			| result |    a
> 			+--------+------+
> 				 +------+
> 
> It's normal.  The semantics are usually unintended, however.  The
> subquery implies a join to an empty table.  This means a cartesian
> product where one matrix has size zero.  Or if you prefer, in the
> relational model intersection is undefined on two relations if either
> is empty.


That is an interesting explanation but I don't buy it.  I checked
Informix-SQL and Unify 3.2 and they both product the correct results.
 
 			+--------+
 			| result |    a
 			+--------+------+
				 |   1  |
 				 +------+

[ example of union delete ]

So why the example using a union ( which is correct ) if you think there is
an intersection?  Or am I just misunderstand you?


MS-DOS Just say NO!!!!!			      OS/2 Why????
Leroy Cain;      Columbia Union College;      Mathematical Sciences Department
7600 Flower Ave. WH406;  	              Takoma Park, Md 20912
(301) 891-4172				      uunet!cucstud!lcain

gupta@cullsj.UUCP (Yogesh Gupta) (07/08/89)

In article <613@kylie.oz>, christie@kylie.oz (Chris Tham) writes:
> I found the following anomaly and wondered if it is a bug in Sybase or
> normal in SQL:
> [...] 
> > select * from hello where a = 1 or a in (select b from world)
>  a
>  -----------
> 
> (0 rows affected)
> -- 

According to the ANSI SQL86 standard, a the result of a <search condition>
is
  "The result derived by the application of the specified boolean
  operators to the conditions that result from the application of
  EACH specified <predicate> to a given row of a table or a given
  group of a grouped table."
(General Rule #1).  [Emphasis on EACH is mine].

Thus, the predicate a = 1 should be applied to each row of the table
hello, and would be true for the one row that exists in table hello.

Also, according to the truth tables described in General Rule #2,
the search condition would be true for the row.  Thus, the result
should contain 1 row.

Yogesh Gupta.
----
The above opinions are my own, and not those of Cullinet.

jeffl@sybase.Sybase.COM (Jeff Lichtman) (07/09/89)

>> > select * from hello where a = 1 or a in (select b from world)
> 
> Thus, the predicate a = 1 should be applied to each row of the table
> hello, and would be true for the one row that exists in table hello...
> Thus, the result should contain 1 row.
> 
> Yogesh Gupta.

The result of a SQL query is supposed to be as if the following operations
were performed (in this order):

	1) Cartesian product
	2) Restriction (where clause)
	3) Projection (select list)

(Note that this doesn't necessarily describe the actual processing strategy.)
In this case, the Cartesian product of the tables contains no rows, so the
answer contains no rows.

A previous posting had it right: 'or' does not mean 'union'.
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."

lcain@cucstud.UUCP (Leroy Cain) (07/10/89)

In article <4982@sybase.sybase.com>, jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
> >> > select * from hello where a = 1 or a in (select b from world)
> > 
> > Thus, the predicate a = 1 should be applied to each row of the table
> > hello, and would be true for the one row that exists in table hello...
> > Thus, the result should contain 1 row.
> >
> > Yogesh Gupta.

This is correct!


[ stuff about query plan deleted]

> A previous posting had it right: 'or' does not mean 'union'.

Well in that case a number of text books need rewriting.  A quote from
"An Introduction to Database Systems" by C.J. Date page 264:

"2. R WHERE c1 OR c2
    is defined to be equivalent to
       ( R WHERE c1 ) UNION ( R WHERE c2 )"

> ---
> Jeff Lichtman at Sybase
> {mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
> "Saints should always be judged guilty until they are proved innocent..."


MS-DOS Just say NO!!!!!			      OS/2 Why????
Leroy Cain;      Columbia Union College;      Mathematical Sciences Department
7600 Flower Ave. WH406;  	              Takoma Park, Md 20912
(301) 891-4172				      uunet!cucstud!lcain

ajs@datlog.co.uk ( Andy Simms ) (07/10/89)

In article <4982@sybase.sybase.com> jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
>>> > select * from hello where a = 1 or a in (select b from world)
>> 
>
>The result of a SQL query is supposed to be as if the following operations
>were performed (in this order):
>
>	1) Cartesian product
>	2) Restriction (where clause)
>	3) Projection (select list)
>
>In this case, the Cartesian product of the tables contains no rows, so the
>answer contains no rows.
>
>A previous posting had it right: 'or' does not mean 'union'.
>---
>Jeff Lichtman at Sybase

You're wrong, because there is NO cartesian product involved here. The
(conceptual) cartesian product only applies when there is more than one
table named in the FROM clause; here we have a predicate based on a single
table with two conditions joined by a disjunction, the second condition
being a subquery. Now, given that the predicate "a = 1" is TRUE for the
single row in table "hello" and that "TRUE OR x" is TRUE for all three truth
values of x, the result of the subquery is immaterial: the predicate is
satisfied. In fact, irrelevant though it may be, the value of x is treated as
FALSE (rather then MAYBE) by ANSI SQL (see under <quantified predicate> 
case 2(d)).

If the Sybase query optimiser is transforming subqueries into joins, then
I suspect you have a nice little problem to solve here. Furthermore, lazy
evaluation suggests the subquery does not need to be executed at all (so
you certainly don't need to do any transformation).

Andy Simms
Data Logic (ajs@datlog.co.uk).

gupta@cullsj.UUCP (Yogesh Gupta) (07/11/89)

In article <4982@sybase.sybase.com>, jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
> >> > select * from hello where a = 1 or a in (select b from world)
> > 
> > Thus, the predicate a = 1 should be applied to each row of the table
> > hello, and would be true for the one row that exists in table hello...
> > Thus, the result should contain 1 row.
> > 
> > Yogesh Gupta.
> 
> The result of a SQL query is supposed to be as if the following operations
> were performed (in this order):
> 
> 	1) Cartesian product

You missed a very important point though - The cartesian product is of
only those tables that are in the <from clause>. (see lengthy ANSI quote
below).  Thus, in our query, there is NO cartesian product involved!

> 	2) Restriction (where clause)
> 	3) Projection (select list)
> 
> (Note that this doesn't necessarily describe the actual processing strategy.)
> In this case, the Cartesian product of the tables contains no rows, so the
> answer contains no rows.

Again, there is no Cartesian product involved.

The following is a long discussion which tries to explain how subqueries
are different from simple joins:

What is the difference between the following the queries?

Q1.	Select a
	  from hello
	  where a = 1
		 or a in (select b from world);

Q2.	Select a
	  from hello, world
	  where a = 1
		 or a = b;

Note that the behaviour of the subquery is different in the sense that there
is NO initial cartesian product!  Thus, to quote ANSI again (X3.135-1986):

	<table expression> ::=
		<from clause>
		[<where clause>]
		[<group by clause>]
		[<having clause>]

	General Rules

	1) If all the optional clauses are omitted, then the table is the
	   result of the <from clause>.  Otherwise, each specified clause
	   is applied to the result of the previously specified clause and
	   the table is the result of of the application of the last
	   specified clause. ...



	<from clause>

	General Rules:
	 2) Case:
	    a)  If the <from clause> contains a single <table name>, then
		the result of the <from clause> is the table identified by
		that <table name>.


	
	<where clause>

	General Rules:

	1)  Let R denote the result of the <from clause>.

	2)  The <search condition> is applied to each row of R.  The
	    result of the <where clause> is a table of those rows of
	    R for which the result of the <search condition> is true.

	3)  Each <subquery> in the <search condition> is effectively
	    executed for each row of R and the results used in the
	    application of the <search condition> to the given row of
	    R.  If any executed <subquery> contains an outer reference
	    ...

From the above, the logical order of processing (not necessarily the
actual order of execution) of the query Q1 is as follows:

	Step A)  The <from clause> has the table "hello".  The result
		 of the from clause is the single row with the value
		 of "a" being 1.

	Step B)  The <where clause> contains two predicates.  Applying
		 each of them to the value of the single row, the first
		 predicate (a = 1) yields true while the second (subquery)
		 yields false.  The result of True OR False is True.
		 The the result of applying the <where clause> is also
		 the single row with the value of "a" being 1.

However, the logical order of processing of the query Q2 is as follows:

	Step A)  The <from clause> has the tables "hello" and "world".
		 The result of the from clause is the extended cartesian
		 product of the two tables.  As the number of rows in the
		 table "world" is zero, the result has no rows.

I hope this makes it clear.
(I shouldn't have to explain this to two people from Sybase, should I :-)

Yogesh Gupta.

Cullinet Software, Inc.
-----
The opinions expressed in this article are those of the author only and do
not represent those of Cullinet.

jeffl@sybase.Sybase.COM (Jeff Lichtman) (07/11/89)

I hate being wrong....

My previous analysis of the problem was incorrect.  The query:

	select * from hello where a = 1 or a in
		(select b from world)

should return rows even if the table "world" contains no rows.
The rule about forming the Cartesian product first applies to
tables in the same "from" clause.  The SQL 86 standard says that
each row from the Cartesian product is applied to the where clause
one at a time.  If there is a subquery, it is evaluated once per 
Cartesian product row from the outer query.

> "2. R WHERE c1 OR c2
>     is defined to be equivalent to
>        ( R WHERE c1 ) UNION ( R WHERE c2 )"

I can't find this in my edition of Date, but I have an old 3rd edition.
I don't think it is correct.  At least, it doesn't match the ANSI
standard.  What edition of Date do you have, and what section are
you quoting?
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."

lcain@cuc1.UUCP (Leroy Cain) (07/12/89)

In article <5003@sybase.sybase.com>, jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
> I hate being wrong....
> 
> My previous analysis of the problem was incorrect.  The query:
> 
> 	select * from hello where a = 1 or a in
> 		(select b from world)

[ a clarification of SQL 86 delete]

Here another way of viewing the problem.

first the query in SQL.

1. select * from hello where a = 1 or a in (select b from world)

Then we converted it into logical equivalent form. ( Relation Calculus )

2. FORALL (hello) ( hello.a = 1 OR EXIST(world) (hello.a = world.b) )

We now convert the relational calculus into relation algebra.

3. (hello WHERE a=1) UNION ( hello X world WHERE hello.a=world.b )[hello.a] 

As you can see we move from an OR to a UNION.  My quote below was not
completely relavent to the discussion since R in the case we are examining
are two different things.  Though the general principal seems to still
apply.

> > "2. R WHERE c1 OR c2
> >     is defined to be equivalent to
> >        ( R WHERE c1 ) UNION ( R WHERE c2 )"
> 
> I can't find this in my edition of Date, but I have an old 3rd edition.
> I don't think it is correct.  At least, it doesn't match the ANSI
> standard.  What edition of Date do you have, and what section are
> you quoting?

I am sorry it is the 4rd edition section 13.4.

> ---
> Jeff Lichtman at Sybase
> {mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
> "Saints should always be judged guilty until they are proved innocent..."

Leroy Cain;      Columbia Union College;      Mathematical Sciences Department
7600 Flower Ave. WH406;  	              Takoma Park, Md 20912
(301) 891-4172				      uunet!cucstud!lcain

jkrueger@daitc.daitc.mil (Jonathan Krueger) (07/13/89)

Leroy Cain writes:
>in the 4rd edition section 13.4.
>	"2. R WHERE c1 OR c2
>	is defined to be equivalent to
>	 ( R WHERE c1 ) UNION ( R WHERE c2 )"

This is a relational algebra used by Date.  SQL may agree.  Then again
it may not.  Perhaps someone with the ANSI spec could say.

-- Jon
--