[comp.databases] joining 3 tables in SQL

kmeek@cti1.UUCP (Kevin Meek) (02/16/91)

I never did get SQL to do what I wanted.  This time I'll
try and be a little clearer and see if anyone can set me straight.

We have a table with account numbers and account information.

We have a table with planning data that is linked to the
account table via the account number.

We have a table with spending data that is linked to the account
table via the account number.

We want an output record for each account with the total 
planned amount and total expended amount.  It is easy to get
either one using sum and group by but getting both in one query
is more difficult.

A sketchy description of the tables I'm working with follows

Accounts table:
Account# [ string 10 ] ( primary key ),
name   [ string 10 ],
number [ string 10 ]
etc...


planning table:  ( NO PRIMARY KEY ) 
Account# [ string 10 ],
plan_amount [ huge amount ]
etc.  


spending table: ( no primary key )
Account# [ string 10 ],
exp_amount [ huge amount ]



there is a link index on plan account# references accounts(accounts#)
and a link index on exec account# references accounts(accounts#)


if there are 3 planning records and 2 spending records for a given
account say account #100

We would have

Accounts data as follows

100|widgets

Planning records like this

100|100.00|....
100|200.00|....
100|300.00|....

Spending records like this:

100|150.00|...
100|250.00|...

I want a SQL select statement that will give me the following output

100|widgets|600.00|400.00

It is easy to get a query to give
100|widgets|600.00
OR
100|widgets|450.00

But I can't get both answers with one query.

I tried creating a view for each separately and then joining my views
but you can't use a where when your view has a group by clause. 

Anyone suggestions would be appreciated.

Thanks

Kevin
-- 
Kevin Meek 
kmeek@cti.com

butenko@bob.srcc.msu.su (Vladimir A. Butenko) (02/18/91)

In <447@cti1.UUCP> cti1!kmeek (Kevin Meek) writes:
>
>I never did get SQL to do what I wanted.  This time I'll
>try and be a little clearer and see if anyone can set me straight.
>
>We have a table with account numbers and account information.
>
>We have a table with planning data that is linked to the
>account table via the account number.
>
>We have a table with spending data that is linked to the account
>table via the account number.
>
>We want an output record for each account with the total 
>planned amount and total expended amount.  It is easy to get
>either one using sum and group by but getting both in one query
>is more difficult.
>
>A sketchy description of the tables I'm working with follows
>
>Accounts table:
>Account# [ string 10 ] ( primary key ),
>name   [ string 10 ],
>number [ string 10 ]
>etc...
>
>
>planning table:  ( NO PRIMARY KEY ) 
>Account# [ string 10 ],
>plan_amount [ huge amount ]
>etc.  
>
>
>spending table: ( no primary key )
>Account# [ string 10 ],
>exp_amount [ huge amount ]
>
It's a *very* common problem... I think that the only solution is:
SELECT Account#,name,
 SUM(SELECT plan_amount FROM planning_table WHERE Account# = Accounts.Account#),
 SUM(SELECT exp_amount  FROM spending_table WHERE Account# = Accounts.Account#),
FROM Accounts
 
 But:
 1) I'm not sure, that your SQL server can eat this expression
 2) It's rather slow, because tables plan_amount and exp_amount are scanned for
    each row in Accounts (of course, indexes are very important here).

 Maybe it's interesting for you how to write such a query in RQL:
 JOIN Accounts,
      plan :(GROUP planning_table BY Account# FOR Account#,S:SUM(plan_amount)),
      spend:(GROUP spending_table BY Account# FOR Account#,S:SUM(exp_amount )),
 WHERE Accounts.Account# = plan.Account# AND Accounts.Account# = spend.Account#
 FOR   Account#: Accounts.Account#, Name: Accounts.name,
       Plan:     plan.S           , Spend:spend.S

 I.e. you grouping both planning and spending tables by Account#, and then join
      them together and with Accounts table (I think that you tried to do it
      with SQL).

 Again, this query can be executed ineffectively by some stupid RQL server, so
 you can get your data by the single grouping operation:

  JOIN Accounts,
       plan_and_spend: 
         (GROUP (UNION
           (SELECT planning_table FOR Account#,plan:plan_amount,exp:0),
           (SELECT spending_table FOR Account#,plan:0,          exp:exp_amount)
                ) By Account# for plan:SUM(plan),exp:SUM(exp))
  WHERE Accounts.Account# = plan_and_select.Account#
  FOR   Account#: Accounts.Account#,     Name: Accounts.name,
        Plan:     plan_and_select.plan, Spend:plan_and_select:exp

Comments: we add an 'exp'  column to planning_table (with zero values),
             add an 'plan' column to spending_table (with zero values)
 (so these tables now have the same structure), glue them together with UNION 
 operation, then GROUP this table by Account#, calculating SUM(plan) and
 SUM(exp) and, finally, JOIN the result with the Accounts table to get "Names"


>I tried creating a view for each separately and then joining my views
>but you can't use a where when your view has a group by clause.
 
 I think that many SQL servers use smth like "macroprocessing" for views, i.e.
 they substitute the text of your view definition into your query and then try
 to interpret it. So, if you can't do something without views, creating some
 views can't help (just an opinion, maybe it's not right for some SQL servers).

>Anyone suggestions would be appreciated.
>Thanks
>Kevin

I hope this helps


-- 
Vladimir A. Butenko             Internet:       <Butenko@bob.srcc.msu.su>
Gamma Software,                 Phone:          7(095)939-2618
Moscow State University         Fax:            7(095)938-2136
* ===  To Gorby-lovers: communists can't reform, but they can pretend! === *

dberg@informix.com (David I. Berg) (02/19/91)

In article <447@cti1.UUCP> kmeek@cti1.UUCP (Kevin Meek) writes:
> We have a table with account numbers and account information.
> .....
> A sketchy description of the tables I'm working with follows
> 
> Accounts table:
> Account# [ string 10 ] ( primary key ),
> name   [ string 10 ],
> number [ string 10 ]
> 
> planning table:  ( NO PRIMARY KEY ) 
> Account# [ string 10 ],
> plan_amount [ huge amount ]
> 
> spending table: ( no primary key )
> Account# [ string 10 ],
> exp_amount [ huge amount ]
> 
> .....
> if there are 3 planning records and 2 spending records for a given
> account say account #100
> 
> We would have
> 
> Accounts data as follows
> 
> 100|widgets
> 
> Planning records like this
> 
> 100|100.00|....
> 100|200.00|....
> 100|300.00|....
> 
> Spending records like this:
> 
> 100|150.00|...
> 100|250.00|...
> 
> I want a SQL select statement that will give me the following output
> 
> 100|widgets|600.00|400.00

How about the following:

select account.account#,sum(planning.plan_amount) plan_amount
  from account,planning
 where planning.account# = account.account#
 group by account#
  into temp t1;
select account.account#,sum(spending.exp_amount) exp_amount
  from account,spending
 where spending.account# = account.account#
 group by account#
  into temp t2;
select account#,plan_amount,exp_amount 
  from t1 outer t2;

(The outer join would be necessary only if not all accounts with planning
amounts have spending amounts.)

OR

create temp table t1 (
    account#,
    plan_amt,
    exp_amt);
insert into t1 (account#,plan_amt)
select account.account#,sum(planning.plan_amount)
  from account,planning
 where planning.account# = account.account#
 group by account#;
update t1 set (account#,exp_amount) =
(select account.account#,sum(spending.exp_amount)
 where spending.account# = account.account#
   and account.account# = t1.account#
 group by account#);
select * from t1;
  ___                   ___              dberg@cougar.informix.com
  /  ) __      . __/    /_ ) _       __  Informix Software Inc.  (303) 850-0210
_/__/ (_(_ (/ / (_(_  _/__> (-' -/~ (_-  5299 DTC Blvd #740; Englewood CO 80111
{uunet|pyramid}!infmx!dberg        The opinions expressed herein are mine alone.

mwang@oracle.com (Michael Wang) (02/20/91)

In article <447@cti1.UUCP> kmeek@cti1.UUCP (Kevin Meek) writes:
[...]
>We have a table with account numbers and account information.
>
>We have a table with planning data that is linked to the
>account table via the account number.
>
>We have a table with spending data that is linked to the account
>table via the account number.
>
>We want an output record for each account with the total
>planned amount and total expended amount.  It is easy to get
>either one using sum and group by but getting both in one query
>is more difficult.
[...]
>We would have
>
>Accounts data as follows
>
>100|widgets
>
>Planning records like this
>
>100|100.00|....
>100|200.00|....
>100|300.00|....
>
>Spending records like this:
>
>100|150.00|...
>100|250.00|...
>
>I want a SQL select statement that will give me the following output
>
>100|widgets|600.00|400.00
[...]
>I tried creating a view for each separately and then joining my views
>but you can't use a where when your view has a group by clause.
[...]

One way to do this query is to create two views. One view would be

     SELECT account#, sum(plan_amount) sum_plan_amount
     FROM   planning
     GROUP BY account#

The other would be:

     SELECT account#, sum(exp_amount) sum_exp_amount
     FROM   spending
     GROUP BY account#

Your query would then be:

     SELECT accounts.account#, name, sum_plan_amount, sum_exp_amount
     FROM   accounts, planning_view, spending_view
     WHERE  accounts.account# = planning_view.account#
       AND  accounts.account# = spending_view.account#

It sounds like from your note that this is not going to work because your
version of SQL does not allow WHERE clauses with views that contain
GROUP BYs (which version are you using anyway?).

The other way to do this requires that you have a unique key column in both
of the tables, or the database you are working with has some column that
uniquely identifies every row in the database and is accessible to the user.
In ORACLE there is a ROWID column that does this. Using ROWID, your query
could be written as:

     SELECT accounts.account#, name,
            sum(planning.plan_amount)/count(distinct spending.rowid),
            sum(spending.exp_amount)/count(distinct planning.rowid)
     FROM   accounts, planning, spending
     WHERE  accounts.account# = planning.account#
       AND  accounts.account# = spending.account#
     GROUP BY accounts.account#, accounts.name

The query uses the fact that when you join tables, you are doing a cross
product, which means, given two tables A and B, every row in table A is
duplicated by the number of rows in table B. If you are doing a sum of a
column in table A while joining these two tables, then the result will
always be the result you would get without doing the join multiplied by the
number of rows in table B.

Again, this query depends on having some sort of unique key column for 
the tables you are joining. If the table doesn't have a unique key, then
you have to have some mechanism like ROWID to do this.


Hope this help,


Michael Wang
mwang@oracle.com

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (02/21/91)

As quoted from <447@cti1.UUCP> by kmeek@cti1.UUCP (Kevin Meek):
+---------------
| Accounts data as follows
| 100|widgets
| 
| Planning records like this
| 100|100.00|....
| 100|200.00|....
| 100|300.00|....
| 
| Spending records like this:
| 100|150.00|...
| 100|250.00|...
| 
| I want a SQL select statement that will give me the following output
| 100|widgets|600.00|400.00
+---------------

In general, you're out of luck.  I don't know if any of the SQLs out there
will do this, even with an intervening view.

In practice, this can be accomplished (I infer UNIFY 2000, that's what all
this sounds like (link indexes, etc.) using some external tools:

+---------------
| It is easy to get a query to give
| 100|widgets|600.00
| OR
| 100|widgets|450.00
| But I can't get both answers with one query.
+---------------

So:  do query 1 into a temp file, query 2 into another temp file, then combine
them to get the composite (note that both should be sorted):

# the sed's will need to be changed if the first field is a string
# (this lets join consider fields 1 and 2 together to be the join field,
# since join rather stupidly only joins on one field)
SQL query1 | sed 's/|/,/' > /tmp/q1.$$
SQL query2 | sed 's/^|/,/' > /tmp/q2.$$
join -j 1 -o 1.1 1.2 2.2 -t'|' /tmp/q1.$$ /tmp/q2.$$ | sed 's/,/|/' |
    RPT reportscript -

I find this trick rather useful for dealing with things that SQL (both Unify's
SQL and SQL in general) can't handle.

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

bengsig@dk.oracle.com (Bjorn Engsig) (02/21/91)

In <447@cti1.UUCP> cti1!kmeek (Kevin Meek) writes:
|
|We want an output record for each account with the total 
|planned amount and total expended amount.  [i.e. a join of two aggregates
|and a table]
|
|I tried creating a view for each separately and then joining my views
|but you can't use a where when your view has a group by clause.
| 
Article <AE_V-ldatL@bob.srcc.msu.su> by butenko@bob.srcc.msu.su says:
| I think that many SQL servers use smth like "macroprocessing" for views, i.e.
| they substitute the text of your view definition into your query and then try
| to interpret it. So, if you can't do something without views, creating some
| views can't help (just an opinion, maybe it's not right for some SQL servers).
Oracle will actually allow you to do that, e.g.

create view plansum(accno,ps) as select accno,sum(amount)
from planned group by accno

create view expsum (accno,ex) as select accno,sum(amount)
from expenses group by accno

select acc.accno, acc.accname, ... , expsum.ex, plansum.ps
from acc, plansum, expsum
where acc.accno = expsum.accno
and   acc.accno = plansum.accno

In C.J.Date "Selected Writing", there is a very good description of how
deficient SQL actually is.  It would for example be very nice to have a
true concept of 'table expressions' so that I could write

select ...
from ( select ... )
-- 
Bjorn Engsig, ORACLE Corporation, E-mail: bengsig@oracle.com, bengsig@oracle.nl

            "Stepping in others footsteps, doesn't bring you ahead"

cortesi@informix.com (David Cortesi) (02/23/91)

In article <1262@dkunix9.dk.oracle.com> bengsig@dk.oracle.com (Bjorn Engsig) writes:
] In <447@cti1.UUCP> cti1!kmeek (Kevin Meek) writes:
]
] |We want an output record for each account with the total 
] |planned amount and total expended amount.  [i.e. a join of two aggregates
] |and a table]
] |I tried creating a view for each separately and then joining my views
] |but you can't use a where when your view has a group by clause.
] 
] Oracle will actually allow you to do that, e.g.
] 
] create view plansum(accno,ps) as select accno,sum(amount)
] from planned group by accno
] 
] create view expsum (accno,ex) as select accno,sum(amount)
] from expenses group by accno
] 
] select acc.accno, acc.accname, ... , expsum.ex, plansum.ps
] from acc, plansum, expsum
] where acc.accno = expsum.accno
] and   acc.accno = plansum.accno

Rather to to my astonishment, Informix SQL also permits this.  Curious
to know how the engine went about it, I used SET EXPLAIN ON, a feature
that makes the optimizer write its query plan to a file for reference.
It did about what you'd expect: executed each of the views in turn
with output to a temporary table, then joined the three tables.  It
picked one of the temp tables as the master table for the join; then
it built a temporary index on the other temp table so it could use
indexed lookups on both subordinate tables.  Which is just about the
sequence of operations one would enter manually to accomplish the same
result if the engine would not.