davidm@uunet.UU.NET (David S. Masterson) (01/11/90)
I've been having a problem designing a few databases on a couple of projects that have an inherent problem that I can't seem to normalize: Assume that you have some entities that each have a list of properties related to it. For instance, a given entity X might have properties (a, b, c) whereas another entity Y might have properties (a,b, x, y, z). There is nothing in the definition of properties that separates them into different types of entities/attributes -- they are all just "properties". Note also that each entity can have a different list of properties both in number and value (X has 3, Y has 5). Finally, do not assume that either the entity identifier or the property values are representable by one attribute (in the basic problem definition -- I don't know about contrived attributes). My inclination would be to model this in a relational database as two tables (one for entities and one for properties). The two tables would relate to one another via the EntityID (in this case, X or Y). Given this, how would someone write an SQL query to answer a request like "Find the entities having all of the following list of properties (...)"? If that turns out to be "not too tough", then how about the request "Find the entities having all of these properties (...) or all of these properties (...)?" Finally, how about the request "Find the entities having any of these properties (...) and any of these properties (...)"? Inquiring minds want to know... ;-) -- =================================================================== David Masterson Consilium, Inc. uunet!cimshop!davidm Mt. View, CA 94043 =================================================================== "If someone thinks they know what I said, then I didn't say it!"
may@28182.dec.com (Patrick May) (01/11/90)
In article <CIMSHOP!DAVIDM.90Jan10105028@uunet.UU.NET>, cimshop!davidm@uunet.UU.NET (David S. Masterson) writes... >Assume that you have some entities that each have a list of properties related >to it. For instance, a given entity X might have properties (a, b, c) whereas >another entity Y might have properties (a,b, x, y, z). There is nothing in >the definition of properties that separates them into different types of >entities/attributes -- they are all just "properties". Note also that each >entity can have a different list of properties both in number and value (X has >3, Y has 5). Finally, do not assume that either the entity identifier or the >property values are representable by one attribute (in the basic problem >definition -- I don't know about contrived attributes). What you need is an associative table between an entity table and a property table. Rows in the associative table would be composed of the primary key from the entity table and the primary key of one of the associated (hence the name) property table rows. This approach allows you to store non-primary key information about the properties once and once only. >Given this, how would someone write an SQL query to answer a request like >"Find the entities having all of the following list of properties (...)"? If Queries of this sort are then trivial accesses of the associative table.
davidm@uunet.UU.NET (David S. Masterson) (01/12/90)
In article <49@arkham.enet.dec.com> may@28182.dec.com (Patrick May) writes:
What you need is an associative table between an entity table and a
property table. Rows in the associative table would be composed of the
primary key from the entity table and the primary key of one of the
associated (hence the name) property table rows. This approach allows you
to store non-primary key information about the properties once and once
only.
I'm beginning to see that I left out a key ingredient to my question. Seeing
the replies has shown me that it is a key ingredient -- I didn't realize it
before. What I was trying to do was to arrange ("normalize"?) my database in
order to answer all possible queries in a static fashion. That is, I didn't
want to use dynamic SQL to answer a query like "where property in <X,Y> or
property in <M,N> and ..." which I beginning to see is not possible (the query
gets even worse when property becomes multivalued). I don't suppose there is
an SQL construct like "where <type, value> is in <<M,x>,<N,y>>" (the two where
clauses are not really related)? Perhaps such a construct would have value?
--
===================================================================
David Masterson Consilium, Inc.
uunet!cimshop!davidm Mt. View, CA 94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
davidm@uunet.UU.NET (David S. Masterson) (01/12/90)
In article <CIMSHOP!DAVIDM.90Jan11125238@uunet.UU.NET> I write:
I'm beginning to see that I left out a key ingredient to my question.
Forgive me, but I've been looking at this some more and maybe my question
wasn't as wrong as I thought. Let me see if I can make the problem more
concrete.
I have a database consisting of observations. Each observation consists of a
<DataType, DataValue> pair (the observation might be water height or machine
temperature or any number of other things). Associated with each pair is a
list of <LocationType, LocationValue> pairs that help identify where the
observation occurred. This can be defined simply (and rather redundantly) in
one table like (with some example data):
Observation
DataType DataValue LocType LocValue
WaterHeight 10 Shore South
" " Time 12:30
" " Observer Bill
WaterHeight 15 Time 12:30
" " Observer Fred
WaterTemp 60 Time 1:30
" " Observer Tom
WaterTemp 50 Time 2:30
" " Observer Gerry
With a table like this (or more normalized), can a query be written like to
find <DataType, DataValue> that are constrained by some <LocType, LocValue>
pairs. For instance:
1. Find the WaterHeight when Time was 12:30 and Observer was Bill.
2. Find the WaterHeight for Observer Fred and Shore is South.
3. Find the WaterTemp for Observer (Tom or Fred) and Time was 1:30.
Since the qualification clause is used to qualify a row in a table (either in
the database or joined in memory), none of the above rows qualify as direct
answers to the query. Yet, this is obviously not what is wanted. There seems
to be nothing in the definition of <LocType, LocValue> that would suggest
normalizing it into more attributes. The application treats all the right
hand side information as just location information (lots of strings). Any
ideas?
--
===================================================================
David Masterson Consilium, Inc.
uunet!cimshop!davidm Mt. View, CA 94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"
may@28182.dec.com (Patrick May) (01/13/90)
>I have a database consisting of observations. Each observation consists of a ><DataType, DataValue> pair (the observation might be water height or machine >temperature or any number of other things). Associated with each pair is a >list of <LocationType, LocationValue> pairs that help identify where the >observation occurred. This can be defined simply (and rather redundantly) in >one table like (with some example data): > > Observation > DataType DataValue LocType LocValue > WaterHeight 10 Shore South > " " Time 12:30 > " " Observer Bill > WaterHeight 15 Time 12:30 > " " Observer Fred > WaterTemp 60 Time 1:30 > " " Observer Tom > WaterTemp 50 Time 2:30 > " " Observer Gerry >With a table like this (or more normalized), can a query be written like to >find <DataType, DataValue> that are constrained by some <LocType, LocValue> >pairs. Maybe I'm misunderstanding your explanation, but from the example I see no way to relate a time to a measurement. Your apparent goal is a database of data collection records composed of the following: Observer Time Location Observation Type Observed Value Each Observer may be related to many Time-Location pairs and may make many observations at that Time-Location. I would therefore model this system using the following tables: Time.Location Observer_ID Time Location_ID Observation_Data <foreign key from Time.Location> Observation_Type Observation_Value Validation Tables for Observer, Location, and Observation_Type. There is a one-to-many relationship between Time.Location and Observation_Data. The primary key fields in Time.Location (all three listed) may be supplemented with a system generated surrogate key to minimize the number of key fields in Observation_Data (although I try to avoid such constructs). Hope this helps, Patrick
dberg@cod.NOSC.MIL (David I. Berg) (01/14/90)
In article <CIMSHOP!DAVIDM.90Jan11152210@uunet.UU.NET>, cimshop!davidm@uunet.UU.NET (David S. Masterson) writes: > I have a database consisting of observations. Each observation consists of a > <DataType, DataValue> pair (the observation might be water height or machine > temperature or any number of other things). Associated with each pair is a > list of <LocationType, LocationValue> pairs that help identify where the > observation occurred. This can be defined simply (and rather redundantly) in > one table like (with some example data): > > Observation > DataType DataValue LocType LocValue > WaterHeight 10 Shore South > " " Time 12:30 > " " Observer Bill > WaterHeight 15 Time 12:30 > " " Observer Fred > WaterTemp 60 Time 1:30 > " " Observer Tom > WaterTemp 50 Time 2:30 > " " Observer Gerry > > With a table like this (or more normalized), can a query be written like to > find <DataType, DataValue> that are constrained by some <LocType, LocValue> > pairs. -- To take the Information Modelling appoach, what you have is an object type "Observation". Each observation can have a number of attributes, each with its associated value. From the schema you presented in your posting, I conclude that you wish to keep your data base as generic as possible, i.e. allow for a variable number of attributes for each observation, including those which are yet to be defined. If this isn't so, then your schema should look like: Observation(Time, WaterTemp, WaterHeight, Shore, Obersver), end of discussion.. If this is your goal, however, (and it is not an unreasonable one) then you've started down the right track. When you normalize the schema you have presented, you will be left with those attributes which are ALWAYS a transitive function of Observation, and those atrtributes which are a function of the particular data items recorded during the observation. For each observation, you need an identifier. Date/Time of the observation would probably be your identifier. The only other attribute which is a transitive function of Observation is Observer (i.e. there will always be one and only one observer for each observation). Now you create another object type called Observation_Data. The attributes of this object type are <Data_Type, Data_Value>. There is a one to many relationship between the object types Observation and Observation_Data. To establish the logical access path between them, you must attribute Observation_Data with the identifier of Observation. I recommend that you create an Obervation Number for each observation, particularly to answer the queries you have posed. The implementation of the information model, then, results in the following data base schema: Observation(Date/Time, Observer, Observation_No) Observation_Data(Observation_No, Data_Type, Data_Value). Your data base now looks like: Observation(12:30,Bill,1) Observation(12:30,Fred,2) Observation(1:30,Tom,3) Observation(2:30,Gerry,4) Observation_Data(1,WaterHeight,10) Observation_Data(1,Shore,South) Observation_Data(2,WaterHeight,15) Observation_Data(3,WaterTemp,60) Observation_Data(4,WaterTemp,50) Now, to answer the following questions: 1. Find the WaterHeight when Time was 12:30 and Observer was Bill. 2. Find the WaterHeight for Observer Fred and Shore is South. 3. Find the WaterTemp for Observer (Tom or Fred) and Time was 1:30. you would: 1. SELECT Data_Value FROM Observation_Data, Observation WHERE Data_Type = "WaterHeight" AND Date/Time = "12:30" AND Observer = "Bill" AND Observation_Data.Observation_No = Observation.Observation_No 2. SELECT Data_Value FROM Observation_Data WHERE Data_type = Water_Height AND Observation_Data.Observation_No = (SELECT Observation.Observation_No from Observation, Observation_Data WHERE Observer = "Fred" AND Data_Type = "Shore" AND Data_Value = "South", AND Observation_Data.Observation_No = Observation.Observation_No) 3. SELECT Data_Value FROM Observation_Data, Observation WHERE Data_Type = "WaterTemp" AND Date/Time = "1:30" AND Observer IN ("Tom","Fred") AND Observation_Data.Observation_No = Observation.Observation_No -- David I. Berg (dberg@nosc.mil) GENISYS Information Systems MILNET: dberg@nosc.mil 4250 Pacific Hwy #118 UUCP: {akgua decvax dcdwest ucbvax}! San Diego, CA 92110 sdcsvax!noscvax!dberg (619) 226-1122