[comp.databases] Ease of SQL with an ugly schema

ajayshah@aludra.usc.edu (Ajay Shah) (07/09/90)

I'm pretty naive of databases and SQL, and wondered how this
query would be represented using SQL:

	There is a table containing fields X Y and FLAG.  FLAG
	takes values zero or one.  One wishes to create the
	summation over the table of a variable Z where Z is defined
	as follows:

		If (flag = 0) then Z := X - Y
			else Z := X + Y

	So the query is essentially implemented by the algorithm:

		sum := 0;
		for (all records in table) do
			with current_record do
			begin
				If (flag = 0) then Z := X - Y
						else Z := X + Y
				sum := sum + Z
			end;
		writeln(sum);


	Representing this in SQL wasn't immediately obvious to
	me... can someone produce a demo?


Thankx
_______________________________________________________________________________
Ajay Shah, (213)747-9991, ajayshah@usc.edu
                              The more things change, the more they stay insane.
_______________________________________________________________________________

jkrueger@dgis.dtic.dla.mil (Jon) (07/10/90)

ajayshah@aludra.usc.edu (Ajay Shah) writes (my rewrites):

> table containing fields X Y and FLAG.  FLAG
> takes values zero or one.  One wishes to create the
> summation over the table of a variable Z defined as:
>	If (flag = 0) then
>		Z := X - Y
>	else
>		Z := X + Y

One way to do it:

	select sum(x + (y * ((2 * z) - 1))) from t;

The expression (y * ((2 * z) - 1)) is of course the 
transform usually written in algebraic form 2z-1, which
maps flag values {0, 1} to {1, -1}.  This takes a boolean
flag column and plugs it into a mathematical evaluation.

This is a kludge.  It works because of a convention for
representing boolean values in integers; break the convention
and the database will be none the wiser, but the result of
the query becomes unpredictable.

An additional penalty is paid in performance; the query as written
hides a useful boolean expression:

	((flag) and (z = (x - y)) or ((flag) and (z = (x + y))))

in some useless arithmetic calculations (above).  Given the math, the
query optimiser is not likely to find any shortcuts.  Although
aggregates without selects aren't likely to get optimized much.

But the performance problem is a symptom of the correctness problem.
Expressing the query need not be so prone to kludges.  Improved
database design might represent things without flags.  Or the problem
may be a poor fit for the data model.  Or the object may be susceptible
to representing as an ADT which returns values without all the math
or logic in each query.  The point here is, yes you can write the
query, no it's not a good solution, but without knowing more about
your problem one can't say how to get a better solution.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!

daveb@comspec.uucp (dave berman) (07/11/90)

In article <10698@chaph.usc.edu>, ajayshah@aludra.usc.edu (Ajay Shah) writes:
> I'm pretty naive of databases and SQL, and wondered how this
> query would be represented using SQL:
>       There is a table containing fields X Y and FLAG.  FLAG
>       takes values zero or one.  One wishes to create the
>       summation over the table of a variable Z where Z is defined
>       as follows:
>               If (flag = 0) then Z := X - Y
>                       else Z := X + Y
> Ajay Shah, (213)747-9991, ajayshah@usc.edu

Change your original premise, of FLAG equal to 0 or 1, to FLAG equal to -1 or
+1. The new SQL becomes

SQL> SELECT TOTAL( X + FLAG*Y ) FROM tablename;

To change the entire file to the correct format simply

SQL> CHANGE FLAG TO -1 FOR FLAG = 0 IN tablename;

Disclaimer: Since your question was kinda theoretical, my answer is
theoretical as well, since it combines the syntax of a couple of different SQL
implementations I use. The concepts transfer fine, but the syntax and keywords
change a little.

Keep well,       -dave
-- 
Dave Berman
436 Perth Av #U-907   daveb@comspec.UUCP   Computer at work
Toronto Ontario       uunet!mnetor!becker!comspec!daveb
Canada M6P 3Y7        416-785-3668         Fax at work

jeffl@sybase.Sybase.COM (Jeff Lichtman) (07/14/90)

> 	There is a table containing fields X Y and FLAG.  FLAG
> 	takes values zero or one.  One wishes to create the
> 	summation over the table of a variable Z where Z is defined
> 	as follows:
> 
> 		If (flag = 0) then Z := X - Y
> 			else Z := X + Y

	select (select sum(X - Y) from TAB where FLAG = 0) +
		(select sum(X + Y) from TAB where FLAG = 1)

A potential problem here is that the FLAG column isn't very selective.
Even if you have an index on that column, the query optimizer may
decide not to use the index for at least one of the subqueries.  So,
unless you're careful, you could end up with two full scans of the data.

Suppose you redefined your data, so that the FLAG column contained either
1 or -1 instead of 1 or 0.  That would allow you to write the query like this:

	select sum(X + Y * FLAG) from TAB

This way, you scan the table exactly once.

If you can't use -1 instead of 0 in the FLAG column, you could write the
query like this:

	select sum(X + 2.0 * (Y * (FLAG - 0.5))) from TAB

You may have to massage this query to make it work, since different database
systems have different rules for arithmetic conversion.

I hope this helps!
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."

jkrueger@dgis.dtic.dla.mil (Jon) (07/20/90)

jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:

>	select (select sum(X - Y) from TAB where FLAG = 0) +
>		(select sum(X + Y) from TAB where FLAG = 1)

An excellent solution, except it isn't what the question asked for, SQL.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Drop in next time you're in the tri-planet area!