[comp.databases] Querying for Percentages

SYSPMZT@gecrdvm1.crd.ge.com (04/26/91)

Can anyone help an SQL idiot with what appears to be a simple query?

I have performance data that looks something like this:

CONNECTION    ELAPSED_TIME   OCCURRENCES
----------    ------------   -----------
CICS                   1.5          100
TSO                    2.0           50
CICS                   0.5          100
CICS                   3.0           50
TSO                    2.5          100

What I want to retrieve (using DB2/QMF) is the percentage of CICS occurrences
that had and elapsed time greater than 1.0 seconds.  That, in more english
terms, translates to me to be:

select 'percentage is ', (sum(occurrences)/sum(all occurrences)) * 100
from table
where connection = 'cics'
and elapsed > 1.0

which would be 150/400 or 38% .  The problem is the sum(all occurrences)
function: how can I get this value, in either one query, a subselect, or
by saving a value in another table and somehow joining it to my performance
table?

This sounds so simple to me, but everything I've tried has returned 0,
or complained about syntax/grouping.  And *WHY CAN'T I GET THE HANG OF
THIS LANGUAGE ?????* cuss cuss

Many thanks,
            Phil Zampino

lung@venice.SEDD.TRW.COM (Diane M. Lung) (04/29/91)

In article <91115.152836SYSPMZT@GECRDVM1.BITNET> SYSPMZT@gecrdvm1.crd.ge.com writes:
>Can anyone help an SQL idiot with what appears to be a simple query?
>
>I have performance data that looks something like this:
>
>CONNECTION    ELAPSED_TIME   OCCURRENCES
>----------    ------------   -----------
>CICS                   1.5          100
>TSO                    2.0           50
>CICS                   0.5          100
>CICS                   3.0           50
>TSO                    2.5          100
>
>What I want to retrieve (using DB2/QMF) is the percentage of CICS occurrences
>that had and elapsed time greater than 1.0 seconds.  That, in more english
>terms, translates to me to be:
>
>select 'percentage is ', (sum(occurrences)/sum(all occurrences)) * 100
>from table
>where connection = 'cics'
>and elapsed > 1.0
>
>which would be 150/400 or 38% .  The problem is the sum(all occurrences)
>function: how can I get this value, in either one query, a subselect, or
>by saving a value in another table and somehow joining it to my performance
>table?
>
>This sounds so simple to me, but everything I've tried has returned 0,
>or complained about syntax/grouping.  And *WHY CAN'T I GET THE HANG OF
>THIS LANGUAGE ?????* cuss cuss
>
>Many thanks,
>            Phil Zampino


Hang in there, Phil.  There IS a solution, but it assumes that you can express
outer-joins in your DBMS.

It's actually simple once you visualize what you're trying to do.  You table
is:

	PERFORMANCE_DATA
	c1	c2	c3
	------- ------- -------
	CICS	1.5	100
	TSO	2.0	50
	CICS	0.5	100
	CICS	3.0	50
	TSO	2.5	100
	

Your problem is basically solved if you can produce the following self-join:

	t1.c1	t1.c2	t1.c3	t2.c1	t2.c2	t2.c3
	------- ------- ------- ------- ------- -------
	CICS	1.5	100	CICS	1.5	100
	CICS	3.0	50	CICS	3.0	50
	NULL	NULL	NULL	TSO	2.0	50
	NULL	NULL	NULL	CICS	0.5	100
	NULL	NULL	NULL	TSO	2.5	100

This join has your specified constraints on table t1, but there is a copy of
every row from t2 represented.  If the constraint fails for rows in t1, you
want NULL to appear.  In essence, you want to display all rows in t2, but
only rows in t1 that "made the cut".  Hopefully your DBMS has the outer-join 
(=*) operator.  Now, all you need now is to calculate sum(t1.c3)/sum(t2.c3).  
In Sybase, I would write:

	select 
	  sum(t1.c3) / sum(t2.c3)
	from 
	  PERFORMANCE_DATA t1, 
	  PERFORMANCE_DATA t2
	where
	  /* your constraint */
	  t1.c1 = 'CICS' and t1.c2 > 1.0
	  /* outer-join constraint */
	  and t1.c1 =* t2.c1 and t1.c2 =* t2.c2 and t1.c3 =* t2.c3

I have tested this code and it works.  Hope this helps you "get going" again.

Happy SQLing,

Tom Swartz
TRW Systems Integration Group

(I am borrowing this account)

chap@art-sy.detroit.mi.us (j chapman flack) (04/30/91)

In article <91115.152836SYSPMZT@GECRDVM1.BITNET> Phil Zampino writes:
>
>I have performance data that looks something like this:
>
 [perf_data table]
>
>What I want to retrieve (using DB2/QMF) is the percentage of CICS occurrences
>that had and elapsed time greater than 1.0 seconds.  That, in more english
>terms, translates to me to be:
>
>select 'percentage is ', (sum(occurrences)/sum(all occurrences)) * 100
>from table where connection = 'cics' and elapsed > 1.0
>
>which would be 150/400 or 38% .  

Here's a way I got it to work in Ingres.  I'd love to hear of a better one;
this strikes me as pretty ugly:

___ Mon Apr 29 15:38:40 1991 ___________________________________________________

  1> select * from perf_data

+------+-----------+-------------+
|connec|elapsed_tim|occurrences  |
+------+-----------+-------------+
|CICS  |      0.500|          100|
|CICS  |      1.500|          100|
|CICS  |      3.000|           50|
|TSO   |      2.000|           50|
|TSO   |      2.500|          100|
+------+-----------+-------------+
(5 rows)


  2> create view perf_total
  3> as select sum(occurrences) as all_occurrences
  4> from perf_data



  2> create view perf_grouped
  3> as select connection, sum(occurrences) as group_occurrences
  4> from perf_data
  5> where elapsed_time > 1.0
  6> group by connection



  2> select connection, 100.0 * group_occurrences / all_occurrences
  3> as percentage
  4> from perf_grouped, perf_total
  5> where connection = 'CICS'

+------+-----------+
|connec|percentage |
+------+-----------+
|CICS  |     37.500|
+------+-----------+
(1 row)
End of Request
________________________________________________________________________________

...and if it works in Ingres, you should have *no trouble at all*  ;-)

I'm still getting the hang of the language myself, and I don't like having to
create the second view, perf_grouped.  I don't have the ANSI book in front of
me, and I'm not sure why I can't just say:

select connection, 100.0 * sum(occurrences) / all_occurrences as percentage
  from perf_data, perf_total
  where elapsed_time > 1.0
  group by connection, all_occurrences
  having connection = 'CICS'

But in Ingres, that gets me:
E_OP0893 A repeat query
       number that is not valid is being added to a QEN_BASE.

Since repeat query numbers and QEN_BASEs have nothing to do with the SQL
standard, I'm assuming this is a feature of the Ingres implementation.
(Listening, Ingres?)  If somebody could try the same query on (an)other
platform(s) and see what happens, or find the language in the SQL standard
that says this is invalid, I'd appreciate the feedback.

>And *WHY CAN'T I GET THE HANG OF THIS LANGUAGE ?????* cuss cuss

Well, I imagine it might be tricky if all you have to experiment with is a
not-too-close-to-conformant SQL implementation.... (Listening, Ingres?)  :-)
-- 
Chap Flack                         Their tanks will rust.  Our songs will last.
chap@art-sy.detroit.mi.us                                   -Mikos Theodorakis

Nothing I say represents Appropriate Roles for Technology unless I say it does.