[comp.databases] Support for imprecise data: survey

ami@kodkod.usc.edu (Ami Motro) (07/30/90)

Hello database experts,

I am interested in finding the level of support (if any) in present commercial
database systems for IMPRECISE DATA.

I define imprecise data as any relevant information concerning a data value,
which is available in the absence of the actual data value itself.  
For example,
  1. The data value is unavailable, but is known to belong to a specified set
     (disjunctive data).
  2. The data value is unavailable, but is known to be within a particular 
     range (essentially, same as 1).
  3. The data value is unavailable, but is known to exist (null value).  
     This is the same as 1, except that the "set" is the entire domain.
  4. The data value is unavailable, and it is not even known whether one
     exists (the attribute may not apply).
  5. The data value is unavailable, but an approximation is available.

As a simple example, assume an attribute TEL-NO.  In a particular case you may
have 123-4567 (precise data); in another case you may only know that it's
either 123-4567 or 765-4321 (disjunctive); or you may be certain that there is
a telephone number, but not know it (null); or you may not be sure whether a
number exists; or you may only have 345-???? (approximation); and so on.
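
One way to picture these cases is as a small tagged value plus a test for
whether the stored information rules out a given constant.  The following is
a minimal Python sketch, not a description of any commercial system.  The
class name Imprecise and the methods may_equal and surely_equals are
hypothetical, and the approximation case assumes that "?" stands for exactly
one unknown character.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Imprecise:
    """Hypothetical model of one TEL-NO entry (illustration only)."""
    candidates: Optional[FrozenSet[str]] = None  # precise (one member) or disjunctive
    pattern: Optional[str] = None                # approximation; '?' = one unknown char
    must_exist: bool = True                      # False: the value may not even apply

    def may_equal(self, constant: str) -> bool:
        """True if the stored information does not rule out `constant`."""
        if self.candidates is not None:
            return constant in self.candidates
        if self.pattern is not None:
            return fnmatchcase(constant, self.pattern)
        return True  # null: any value in the domain is still possible

    def surely_equals(self, constant: str) -> bool:
        """True only if the value is known precisely and equals `constant`."""
        return self.candidates == frozenset({constant})


precise = Imprecise(candidates=frozenset({"123-4567"}))
disjunctive = Imprecise(candidates=frozenset({"123-4567", "765-4321"}))
null = Imprecise()                        # exists, but unknown
maybe_absent = Imprecise(must_exist=False)
approximate = Imprecise(pattern="345-????")
```

Under these assumptions a retrieval could return two answer sets: the rows
that surely match, and the rows that merely may match.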

Things I would like to know include: How does the user describe the imprecise
data to the system?  How does the system retrieve in the presence of imprecise
data?  Can the user specify imprecision in queries?  And so on.

If you are thoroughly familiar with a database system, and could comment on its
support for imprecise data (and could afford the time to commit it to e-mail),
I would appreciate hearing from you (even if it's simply "nothing at all is
available in system X").

If in your previous experience you encountered situations (applications) where
you had wished that some such support were available, please describe the
particular application, and the particular "missing feature", and how you
worked around it.

Note that many systems offer some support for null values.  Also, constants
with "wildcards" may be considered imprecise retrieval specifications.

Thanks in advance!  If you are interested, I'll send you a paper on the topic,
when it is completed.

Ami Motro

ami@usc.edu

dhepner@hpcuhc.HP.COM (Dan Hepner) (07/31/90)

From: ami@kodkod.usc.edu (Ami Motro)

>I am interested in finding the level of support (if any) in present commercial
>database systems for IMPRECISE DATA.

>  5. The data value is unavailable, but an approximation is available.

This topic was discussed here a while back under the subject heading 
"Fault Tolerant information recall", or something like that.

Your title for this field of investigation is superior.

Dan Hepner

endrizzi@sctc.com (Michael Endrizzi) (08/01/90)

And the war starts.....

dhepner@hpcuhc.HP.COM (Dan Hepner) writes:

>This topic was discussed here a while back under the subject heading 
>"Fault Tolerant information recall", or something like that.

>Your title for this field of investigation is superior.

Fault Tolerance: A list of faults that will be tolerated and the 
post-fault level of service for each fault. The process of tolerating
a fault consists of 3 steps: fault detection, isolation, and recovery.

Fault Tolerant Information Recall:

	1) Tolerates syntactic faults in databases
	2) Post-fault level of service: guarantees data that is
	   syntactically "closest" to the query.
	3) Fault-Detection: implicit and assumed
	   Fault-Isolation: ???
	   Fault-Recovery:  Query always returns with "best" answer 
		            available.
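
Item 2, returning whatever is syntactically "closest" to the query, can be
illustrated with Python's standard difflib module.  This is only a sketch of
the idea: the helper name best_answer and the sample numbers are assumptions,
and difflib's similarity ratio is just one possible measure of closeness.

```python
from difflib import get_close_matches
from typing import List, Optional


def best_answer(query: str, stored: List[str]) -> Optional[str]:
    """Return the stored value most similar to `query`, or None if `stored` is empty."""
    # cutoff=0.0 means some answer is always returned, however poor the match.
    matches = get_close_matches(query, stored, n=1, cutoff=0.0)
    return matches[0] if matches else None


numbers = ["123-4567", "765-4321"]
exact = best_answer("123-4567", numbers)   # an exact match is its own closest value
near = best_answer("123-4568", numbers)    # one digit off: the nearest stored value wins
```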

Can I see your definition of fault tolerance, Mr. Hepner?
Then we can talk.

				Dreez

ghm@ccadfa.adfa.oz.au (Geoff Miller) (08/02/90)

ami@kodkod.usc.edu (Ami Motro) writes:

>Hello database experts,

>I am interested in finding the level of support (if any) in present commercial
>database systems for IMPRECISE DATA....

>Things I would like to know include, how does the user describe the imprecise
>data to the system? How does the system retrieve in the presence of imprecise
>data?  Can the user specify imprecision in queries?  And so on....

I'm currently working with Prime "Information", which is a Pick variant with
some extra goodies, but I think my comments would be quite valid for generic
Pick.

I'm not quite sure whether you are referring to recording imprecise data (to
take your example, entering some character in a database to indicate that a 
person has a phone number although it is not known) or whether you are talking
about imprecise queries on precise data.  The first would appear to be largely
a matter of how you define your database and subsequent queries  -  the second
can get a bit more interesting.

One application on which we work is a military history database which records
data on individual servicemen.  We have no control over the raw data, which
are scanned from the original records, so along with scanning errors (which 
we mostly detect) we have problems arising from the inconsistency of the 
original records.  You might be surprised at how many ways the rank of 
Private can be recorded, let alone the number of equivalent ranks in 
specialist units (bombardier, fusilier, ...).  What we have had to do in 
a number of cases is to select by exclusion, so that we exclude the records
which obviously do not fit a particular criterion and then look at what we
have left and at how the criteria can be refined.  This can take many 
iterations, and sometimes we have to make the final selections by hand 
from a displayed list.

In general we have found this approach to work, although admittedly it can get
a bit tedious.  We have also found it much better to use a series of SELECT
statements rather than building up one enormous query (each SELECT works only
on the records returned by the previous one).  "Information" does of course
support selections on the basis of 'NE ""' (not equal to null) and pattern
matching and partial matching, so I don't think we have had any insoluble
problems in this area.
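
For readers outside the Pick world, the select-by-exclusion routine described
above can be sketched as a chain of filters, each one working only on the
records the previous one kept.  This is a Python illustration rather than
"Information" syntax; the record layout, the sample rank spellings, and the
helper select are assumptions made up for the example.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]


def select(records: Iterable[Record], keep: Callable[[Record], bool]) -> List[Record]:
    """One SELECT step: return only the records that satisfy `keep`."""
    return [r for r in records if keep(r)]


# Made-up sample data with inconsistent rank spellings and a missing value.
records = [
    {"name": "Smith", "rank": "Pte"},
    {"name": "Jones", "rank": "PRIVATE"},
    {"name": "Brown", "rank": "Fusilier"},
    {"name": "White", "rank": "Sgt"},
    {"name": "Green", "rank": ""},
    {"name": "Black", "rank": "Capt."},
]

# Step 1: drop records with no rank recorded (the 'NE ""' test).
step1 = select(records, lambda r: r["rank"] != "")

# Step 2: exclude ranks that obviously do not fit the criterion.
obviously_not = {"SGT", "CAPT", "LT", "MAJ"}
step2 = select(step1, lambda r: r["rank"].upper().rstrip(".") not in obviously_not)

# Step 3: accept the spellings recognised so far; the rest go to a hand check.
recognised = {"PTE", "PRIVATE"}
accepted = select(step2, lambda r: r["rank"].upper() in recognised)
by_hand = select(step2, lambda r: r["rank"].upper() not in recognised)
```

Each pass is small enough to inspect, which is the practical advantage over
one enormous query: the leftovers in by_hand show how the criteria should be
refined on the next iteration.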

Geoff Miller  (ghm@cc.adfa.oz.au)
Computer Centre, Australian Defence Force Academy

gordon@mead.UUCP (Gordon Edwards) (08/03/90)

In article <1990Aug1.152432.7861@sctc.com>, endrizzi@sctc.com (Michael
Endrizzi ) writes:
|> And the war starts.....
|> 
|> dhepner@hpcuhc.HP.COM (Dan Hepner) writes:
|> 
|> 
|> >This topic was discussed here a while back under the subject heading 
|> >"Fault Tolerant information recall", or something like that.
|> 
|> >Your title for this field of investigation is superior.
|> 
|> 
|> Fault Tolerance: A list of faults that will be tolerated and the 
|> post-fault level of service for each fault. The process of tolerating
|> a fault consists of 3 steps: fault detection, isolation, and recovery.
|> 
|> Fault Tolerant Information Recall:
|> 
|> 	1) Tolerates syntactic faults in databases
|> 	2) Post-fault level of service: guarantees data that is
|> 	   syntactically "closest" to the query.
|> 	3) Fault-Detection: implicit and assumed
|> 	   Fault-Isolation: ???
|> 	   Fault-Recovery:  Query always returns with "best" answer 
|> 		            available.
|> 

I normally don't get involved in word games, but in this case I have to agree
with Dan.  Before coming to MDC, I was a member of an operating system group
supporting embedded airborne systems for the Navy.  On this particular project,
an ELINT bird, three AN/AYK-14(V) airborne computers were used to support
various mission and flight navigation devices.  We had to design the operating
system to detect failures among the three computers.  For example, AYK1 carried
the software of AYK2 and AYK3. In the event that AYK2 became incapacitated, 
AYK1 would assume AYK2's responsibility.  Okay, no problem with detection
and isolation, but recovery!  If you can't recover something to a known
state then you shut it down and notify the crew, not make a guess and continue
running! 
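
The recovery rule described here (take over only when a known state is
available, otherwise shut down and report) can be written as a small decision
function.  This is a hypothetical Python sketch and not the AN/AYK-14
software; the names recover, Action, and backup_images are invented for
illustration, and only the AYK1 arrangement mentioned above is encoded.

```python
from enum import Enum
from typing import Dict, Set


class Action(Enum):
    TAKE_OVER = "take over the failed unit's software"
    SHUT_DOWN = "shut down and notify the crew"


def recover(failed: str, survivor: str, backup_images: Dict[str, Set[str]]) -> Action:
    """Decide what `survivor` does once `failed` has been detected and isolated.

    `backup_images` maps each unit to the set of units whose software it carries.
    """
    if failed in backup_images.get(survivor, set()):
        return Action.TAKE_OVER   # a known state exists: resume from it
    return Action.SHUT_DOWN       # no known state: never guess and continue


# Illustrative data only: AYK1 carries the software of AYK2 and AYK3.
backup_images = {"AYK1": {"AYK2", "AYK3"}}
```

Note that there is no third branch that returns a "best available" state.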

In summary, best guesses have no place in fault tolerance.  Call it value
approximation, imprecise information recall, etc...


--
                                Gordon Edwards
                         Mead Data Central, Dayton OH
                           mead!gordon@uccba.uc.edu
                              uccba!mead!gordon