[comp.databases] DB Question of the week

flak@slovax.UUCP (Dan Flak) (08/17/87)

In article 923 Prentiss Riddle presents:

>I've inherited responsibility for a medium-large and growing database (say

>Currently the data is stored in a single large Informix file in which each
>record consists of a key and 52 floating point lab values.  The key is a
>composite of a study group number, a subject number and a time. 

>+-----------+-------------+------+-------+-------+-   -+--------+
>| group no. | subject no. | time | val 1 | val 2 | ... | val 52 |
>+-----------+-------------+------+-------+-------+-   -+--------+

(not all values are measured every time, and some of the value
fields have different meanings to different groups).

>PLAN A:  Instead of using very wide records with 52 lab values, why not use
>short records each containing a key and a single lab value?  Each record
>might look like this:
>
>+-----------+-------------+------+-------+-------+
>| group no. | subject no. | time | label | value |
>+-----------+-------------+------+-------+-------+
>
>The "label" would be a short (3-4 byte) string explaining the meaning of the
>"value" field.  Obviously most of our storage here goes for keys (and

I like your plan A, but have additional questions.

Are subject numbers unique to groups? That is, does an individual
group study several subjects, but no subject is studied by more than
one group?

If that's the case, you could always maintain a group no. - subject no.
file "on the side". You can join the group name back in at report run
time.

This would make the file look like (with sample data):

+-------------------------+
|        FILE ONE         |
+-----------+-------------+
| group no. | subject no. |
|-----------+-------------|
|     1     |   870001    |
|     1     |   870002    |
|     2     |   870003    |
|     2     |   870004    |
|     2     |   870005    |
|     3     |   870006    |
|     .     |     .       |
|     .     |     .       |
|     .     |     .       |
+-----------+-------------+


+-----------------------------------+
|            FILE 2                 |
|------------+------+-------+-------|
| subject no | time | label | value |
|------------+------+-------+-------|
| 870001     | 1000 | TEMP  |  98.6 |
| 870001     | 1000 | SYS   |  120  |
| 870001     | 1000 | DIA   |  80   |
| 870001     | 1100 | TEMP  |  98.7 |
|  .         |  .   |  .    |   .   | 
|  .         |  .   |  .    |   .   |
|  .         |  .   |  .    |   .   |
| 870006     | 1000 | PH    |  7.05 |
+------------+------+-------+-------+

If not, no big deal, just more typing.

Now, if there's only going to be one or two readings per subject
number, this scheme may not be worth while. However, if there's going
to be many readings per subject number, then this is the way to go.

The main problem with this set up is that you have to retype the
subject number (and possibly the group number) for each observation.
It's far easier to "fill in the blanks" in the "flat" file system you
have set up now. (I.e. type the group and subject once, and then
associate a string of numbers). It may be easier on input, but it's
more difficult to extract the data especially since corresponding
fields don't always mean the same thing. I don't see how you can
distinguish between legitimate zero values and "no data (reading)".

Fortunately, Informix offers a nice compromise known as the
Master-Detail screen Perform. 4GL is also supposed to have some
"whiz-bang" entry method for doing something similar.

BTW: When (if?) you decide to go with plan A. Don't overlook using
unload ascii in dbstaus followed by some awk scirpts to parse out
your data for reload into the new schema.

Of course, there's always the possibility of a compromise. That is,
have one (or several) "standard files". I'm judging by the organization
you work for, that some of these readings are medical tests. I know
that one standard would be basic vital signs (pulse, blood pressure,
temperature, respiration) . You can maintain these in a file system
such as you have now (only much reduced), and join them with "specific"
tests or measurements for fluid intake/output, blood work results
(gasses, electrolytes, etc. - possibly another "standard" group), etc.
Without knowing the specifics, it's hard for me to generalize.

I won't bore the rest of the net with specifics. If I am anywhere
near the target, and if you can put up with more of my rambling send
me E-mail. 

-- 
{psivax,ism780}!logico!slovax!flak  :  {hplsla,uw-beaver}!tikal!slovax!flak
Dan Flak-R & D Associates,3625 Perkins Lane SW,Tacoma,Wa 98499,206-581-1322