[comp.databases] errors and 4 valued logic

holliday@csgrad.cs.vt.edu (Glenn Holliday) (11/06/90)

Chuck Phillips  writes:
>Clarification: A statement requiring the existence of the non-existent to
>be true, is simply false, not an error.  However, a domain violation _is_
>an error.  (e.g. "I am green years old.")

while aaron@grad2.cis.upenn.edu (Aaron Watters) argues

>`the king of North America is bald' is error.
>if ``The king of North America is bald'' is false then we infer
>that ``The king of North America has hair'' must be true, right?
>... My favorite
>version of 4-valued logic resolves this difficulty by calling
>both statements `overdefined' or `erroneous.'

>PS: I also claim that `I am green years old' can reasonably be
>treated as false.

	I think this is another example of Neat vs Scruffy, otherwise known
as "What does logic _mean_" vs "How does the database work?".  The examples
can be logically represented to model any domain you like.  Whether you
want them to be false or errors depends completely on what domain you
define.  If any object can be in the domain, you can infer true statements
or fail to infer them, which makes them false (_if_ you define your logical
system to give that meaning to false!).
	On the other hand, practical database systems want to work with
simple domains and decide efficiently whether to allow a statement or
query.  Calling either example an error is legitimate as long as your
database system behaves consistently and tells the user why it rejected the
construct.
	To call both of these false, your database wants rules that say
1. The domain for every object/entity must be completely defined.
2. Any purported value which does not fall within the appropriate domain is
   reported as an error.
3. Any reference to a value which does not exist (but would fall within an
   appropriate domain if it did exist) is reported as an error.

	Of course, much of the hue and cry over null vs empty, etc. is about
the best way to implement these rules.

Glenn Holliday  holliday@csgrad.cs.vt.edu

aaron@grad2.cis.upenn.edu (Aaron Watters) (11/06/90)

The question at hand is:  What are the appropriate truth values
for the following statements
	B: `the king of north america is Bald.'
	H: `the king of north america has Hair.'
	G: `I am Green years old.'
About which...

In article <694@creatures.cs.vt.edu> holliday@csgrad.cs.vt.edu (Glenn Holliday) writes:
>Chuck Phillips  writes:
>	I think this is another example of Neat vs Scruffy, otherwise known
>as "What does logic _mean_" vs "How does the database work?".  

If you say so.

>The examples
>can be logically represented to model any domain you like.  Whether you
>want them to be false or errors depends completely on what domain you
>define.  

I don't agree at all.  B=not H.  If one is true then the other is
false and that's that.  Since at an intuitive level I can admit
neither B nor H as a true statement, I am forced to consider an additional
truth value called `error' (or `overdefined') in order to deal with
such statements.  (I've had personal correspondence with a
constructivist who argues that B must be true since we cannot produce
a hair off the king's head -- I maintain this does violence to ordinary
common sense, even if it is an internally consistent position.)

>If any object can be in the domain, you can infer true statements
>or fail to infer them, which makes them false (_if_ you define your logical
>system to give that meaning to false!).

This is Reiter's closed world assumption, which I've always regarded
as a convenient technical assumption that is not really justifiable
in general.  Something is false if it is incorrect, not just because
I cannot prove it to be true.  To call mechanisms that do not make
this distinction `logical' is a perversion of language.

>	To call both of these [B and G?]
>false, your database wants rules that say
>1. The domain for every object/entity must be completely defined.
>2. Any purported value which does not fall within the appropriate domain is
>   reported as an error.
>3. Any reference to a value which does not exist (but would fall within an
>   appropriate domain if it did exist) is reported as an error.
>Glenn Holliday  holliday@csgrad.cs.vt.edu

I'm afraid I don't understand this.		-aaron

PS: recap of my position B=H=error. G=false.

morrison@cs.ubc.ca (Rick Morrison) (11/07/90)

In article <694@creatures.cs.vt.edu> holliday@csgrad.cs.vt.edu (Glenn Holliday) writes:
>Chuck Phillips  writes:
>>Clarification: A statement requiring the existence of the non-existent to
>>be true, is simply false, not an error.  However, a domain violation _is_
>>an error.  (e.g. "I am green years old.")
>

The bald king of France has interested philosophers of language for a very
long time. For those who are interested, there is a very nice paper by
Russell that discusses this and related problems:

@incollection{russell4,
   author =        {Bertrand Russell},
   title =         {Descriptions},
   booktitle =     {Semantics and the Philosophy of Language},
   publisher =     {University of Illinois Press},
   address =       {Urbana},
   year =          {1952},
   editor =        {Leonard Linsky},
   chapter =       {6},
   pages =         {95-108},
   note =          {Reprinted from Chap. XVI, {\it Introduction to
                    Mathematical Philosophy}, 2nd ed. London:
                    Allen and Unwin, 1920.},
   keywords =      {philosophy language semantics indefinite descriptions
                    definite descriptions}
}

As I recall, Russell took the view that 
       "the present king of France is {bald, hirsute}"
be understood as 
       (Ex) (PKOF(x) & Bald(x)), 
respectively
       (Ex) (PKOF(x) & Hirsute(x)), 
both of which are false.
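
For reference, Russell's analysis as it is usually stated also carries a
uniqueness clause, which the shorthand above leaves out.  Writing K for
PKOF and B for Bald (symbol names chosen here only for brevity), the first
sentence reads:

```latex
\exists x \, \bigl( K(x) \land \forall y \, ( K(y) \rightarrow y = x ) \land B(x) \bigr)
```

With nothing satisfying K, the first conjunct fails and the whole sentence
is false; the same holds with Hirsute in place of Bald.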

The collection includes papers by Quine and others on the same topic.
Great reading.
----------------------------------
Rick Morrison		 | {alberta,uw-beaver,uunet}!ubc-cs!morrison
Dept. of Computer Science| morrison@cs.ubc.ca
Univ. of British Columbia| morrison%ubc.csnet@csnet-relay.arpa
Vancouver, B.C. V6T 1W5  | morrison@ubc.csnet (ubc-csgrads=137.82.8.20)
(604) 228-5010

Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (11/11/90)

>>>>> On 6 Nov 90 14:22:10 GMT, aaron@grad2.cis.upenn.edu (Aaron Watters) said:

Aaron> The question at hand is:  What are the appropriate truth values
Aaron> for the following statements
Aaron> 	B: `the king of north america is Bald.'
Aaron> 	H: `the king of north america has Hair.'
Aaron> 	G: `I am Green years old.'

Aaron> I don't agree at all.  B=not H.

To steal a line, "I don't agree with your disagreement".  _Both_ B and H
assume the existence of a unique (follows from the use of "the king" as
opposed to "a king") king of North America.  To make explicit the implicit:

B1: There exists exactly one king of North America.
-and-
B2: The king of North America is bald.

H1: There exists exactly one king of North America.
-and-
H2: The king of North America is not bald.

It is true that B2 and H2 cannot both be true.  Further, since B1=H1=FALSE,
it follows (B1 and B2)=(H1 and H2)=B=H=FALSE.  (If two people claim to be
Napoleon, they can't both be correct, but they can both be wrong.)

Even if you negate both statements:

not B: There is no bald king of North America.
not H: There is no non-bald king of North America.

As one would expect, both of these statements are true.  Consistency
reigns, even if the king of North America does not.  :-)
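
The decomposition above can be sketched in Python.  This is only an
illustration under stated assumptions: the list `kings_of_north_america`,
the `bald` field and the function names are hypothetical, and the empty
list stands in for B1=H1=FALSE.

```python
# Hypothetical model: each king is a dict with a boolean "bald" field.
# The empty list encodes "there is no king of North America".
kings_of_north_america = []

def exactly_one(kings):
    # B1 / H1: there exists exactly one king.
    return len(kings) == 1

def B(kings):
    # B = B1 and B2: a unique king exists, and he is bald.
    return exactly_one(kings) and kings[0]["bald"]

def H(kings):
    # H = H1 and H2: a unique king exists, and he is not bald.
    return exactly_one(kings) and not kings[0]["bald"]

print(B(kings_of_north_america))      # False
print(H(kings_of_north_america))      # False
print(not B(kings_of_north_america))  # True
print(not H(kings_of_north_america))  # True
```

Under this reading B and H are both false without contradiction, because
neither is the plain negation of the other.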

Aaron> Since at an intuitive level I can admit neither B nor H as a true
Aaron> statement, I am forced to consider an additional truth value called
Aaron> `error' (or `overdefined') in order to deal with such statements.

Although I don't see it as _necessary_ (an N-valued logic can be modelled
by a 2-value logic for any finite N), perhaps there is some common ground
here.

With DBMSs, the problem with using a two valued logic, IMHO, is
that it is _useful_ to allow both domain violations and uncertainties to be
stated _explicitly_ instead of always deriving them by implication.  You
_can_ model properties as abstract as existence and temporal relations
using only a 2-valued logic, but it gets to be awkward, requires storing
information about meaningless impossibilities (e.g.  modelling existence)
and/or an explosion in the number of required variables (e.g. temporal
relations).

Allowing the explicit "unknown-but-in-domain" semantic can help reduce
(if not eliminate) the special-casing currently required to implement
outer joins.  Because it is _also_ useful to allow an explicit
"out-of-domain" semantic, we're left with a 4-valued logic to maximize
robustness, clarity, conciseness of expression and (I believe) performance
for certain types of queries.

Aaron> (I've had personal correspondence with a constructivist who argues
Aaron> that B must be true since we cannot produce a hair off the king's
Aaron> head -- I maintain this does violence to ordinary common sense, even
Aaron> if it is an internally consistent position.)

Ha!  I used similar "reasoning" to argue in a Biology class that sterility
_was_ inherited and that it was a dominant trait.  After all, if your
parents don't have children, you won't either.  :-)  But seriously, neither
of these positions is even "internally consistent".  Both contradict
observable reality, and are therefore false.

>	To call both of these [B and G?]
>false, your database wants rules that say
>1. The domain for every object/entity must be completely defined.
>2. Any purported value which does not fall within the appropriate domain is
>   reported as an error.
>3. Any reference to a value which does not exist (but would fall within an
>   appropriate domain if it did exist) is reported as an error.
>Glenn Holliday  holliday@csgrad.cs.vt.edu

In a 2-valued logic, G either would be disallowed altogether, being out of
domain, or else evaluate to false.  (Implicit assumption: Green is a
measure of time.)  In the 4-valued logic discussed, G would be error.
Regarding your three assertions: For the 4-valued logic discussed to be
particularly useful, 2 and 3 are needed.  Though 1 is desirable, it is not
essential.

A. You can explicitly assign the value of error.
B. A partial domain definition is still often useful. Example: Defining the
   domain as an integer will yield an error value when an attempt is made
   to assign a string to the cell, even though the _real_, but unenforced,
   domain is the set of integers between 1 and 10.  Not ideal, but still
   useful.
C. Operations on "unknown" values still preserve more information than the
   current system.  (e.g. "(true OR unknown)=true", vs. "(true OR null)=null".)
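
A minimal sketch of such a 4-valued OR, in Python.  Only the rule
"(true OR unknown)=true" comes from the text above; the names `TV` and
`or4`, and the choice that an error operand makes the whole expression an
error, are assumptions made for the sake of the example.

```python
from enum import Enum

class TV(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"   # value is in the domain, but not known
    ERROR = "error"       # value is out of the domain

def or4(a, b):
    # Assumption: an out-of-domain operand poisons the whole expression.
    if TV.ERROR in (a, b):
        return TV.ERROR
    # One true operand settles the result even if the other is unknown.
    if TV.TRUE in (a, b):
        return TV.TRUE
    # No operand is true; an unknown operand leaves the result unknown.
    if TV.UNKNOWN in (a, b):
        return TV.UNKNOWN
    return TV.FALSE

print(or4(TV.TRUE, TV.UNKNOWN).name)    # TRUE
print(or4(TV.FALSE, TV.UNKNOWN).name)   # UNKNOWN
print(or4(TV.TRUE, TV.ERROR).name)      # ERROR
```

Other treatments of error are possible (for instance, letting a true
operand win over it); the table above is one consistent choice, not the
only one.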


Aaron> PS: recap of my position B=H=error. G=false.

And mine: B=H=false.  G=error.

Recursive conclusion: We can't both be right, but we can both be wrong.  :-)

	Cheers,
--
Chuck Phillips  MS440
NCR Microelectronics 			chuck.phillips%ftcollins.ncr.com
2001 Danfield Ct.
Ft. Collins, CO.  80525   		...uunet!ncrlnk!ncr-mpd!bach!chuckp

aaron@grad2.cis.upenn.edu (Aaron Watters) (11/13/90)

 Once again we are concerned with the truth values
 for the following statements
 	B: `the king of north america is Bald.'
 	H: `the king of north america has Hair.'
 	G: `I am Green years old.'

I claimed B = not H, B=H=error.

Chuck Phillips writes...

=_Both_ B and H
=assume the existence of a unique (follows from the use of "the king" as
=opposed to "a king") king of North America.  To make explicit the implicit:
=
=B1: There exists exactly one king of North America.
=-and-
=B2: The king of North America is bald.
=
=H1: There exists exactly one king of North America.
=-and-
=H2: The king of North America is not bald.
=
=It is true that B2 and H2 cannot both be true.  Further, since B1=H1=FALSE,
=it follows (B1 and B2)=(H1 and H2)=B=H=FALSE.

I would dispute this restatement.  In logic you cannot in general
replace a constant with an existential statement in this manner.  To
my mind the noun `the king of north america' does not correspond to
any existential statement; it corresponds
	in logic		to a constant
	in pascal (ml)		to a pointer (reference)
	in O-O dbs's		to an object.

With reference to pascal if I have a null reference p to a record
which contains a field `bald' the expression `p^.bald' cannot
be correctly thought to be true or false (or even unknown).
It can only be treated as an error.
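
The same behaviour can be seen in Python, used here as a stand-in for the
Pascal above.  This is a sketch only: the `Person` class and the variable
names are hypothetical, and `None` plays the role of the null reference.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    bald: bool

# Hypothetical: no such person exists, so the reference is null (None).
p: Optional[Person] = None

try:
    answer = p.bald          # analogue of Pascal's p^.bald
except AttributeError as exc:
    answer = "error"         # neither True, False, nor unknown
    print("error:", exc)

print(answer)   # error
```

The language itself declines to give `p.bald` a truth value; the only
outcome is an exception that the caller must handle.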

In logic, if I am studying
a semigroup S and ask `is the left identity equal to the right
identity' when S has no right identity the value of the statement
cannot be correctly treated to be true or false while maintaining
consistency.  In fact the structure (S, operation, left-id, right-id)
together with axioms cannot be consistently encoded in a binary
truth valued first order model (directly).

=Even if you negate both statements:
=
=not B: There is no bald king of North America.
=not H: There is no non-bald king of North America.

I don't think these are reasonable restatements either.  As I
said not B = H, not H = B.

Beyond this, I guess I'd agree with the bulk of the rest of your
entry.  What you seem to be suggesting is that we should disallow
constants (references, objects) with no referent.  The question
becomes whether users will tolerate such a restriction.  I think
in the context of object oriented databases they would find such
a restriction irritating to say the least.

Consider a query about a circular list L of integers
	Query: Is the first entry of the list less than 3?
	Answer: No.
Don't you think it would be better to get `error' signifying
that there is no such first entry?
		-aaron.

holliday@csgrad.cs.vt.edu (Glenn Holliday) (11/18/90)

aaron@grad2.cis.upenn.edu (Aaron Watters) writes:

> Once again we are concerned with the truth values
> for the following statements
> 	B: `the king of north america is Bald.'
> 	H: `the king of north america has Hair.'
> 	G: `I am Green years old.'

Indeed!  And I think I finally understand why we are disagreeing.  I think
you want B, H and G to be different kinds of objects than I do.

The most important point is that we are interested in how to get real life
databases to make useful decisions when faced with unusual values.

>I claimed B = not H, B=H=error.

You assign TRUE or FALSE to each of B, H and G.  I don't think that's
possible.  Chuck Phillips (Chuck.Phillips@FtCollins.NCR.COM) also took my
path when he broke these down into

>B1: There exists exactly one king of North America.
>-and-
>B2: The king of North America is bald.

>H1: There exists exactly one king of North America.
>-and-
>H2: The king of North America is not bald.

>It is true that B2 and H2 cannot both be true.  Further, since B1=H1=FALSE,
>it follows (B1 and B2)=(H1 and H2)=B=H=FALSE.

Aaron Waters argues

>In logic you cannot in general
>replace a constant with an existential statement in this manner.  To
>my mind the noun `the king of north america' does not correspond to
>any existential statement; it corresponds
>	in logic		to a constant
>	in pascal (ml)		to a pointer (reference)
>	in O-O dbs's		to an object.

Exactly!  Each of your statements H B and G are not constants.  They are
intended to model more than one chunk of information about the objects
"king", "north america", "hair", etc.  If you _do_ treat them as logical
constants, then your argument is correct.  But I believe they have lost
all meaning if you do that.  Assigning truth values to H B and G says
nothing about the real-world domain we're trying to model and reason about
in our database.

Your programming-language restatement is good.  But pointers to data
structures do not have truth values.  We need to look inside the structures
and reason about the meanings of the data values and relationships between
them.

>>If any object can be in the domain, you can infer true statements
>>or fail to infer them, which makes them false (_if_ you define your logical
>>system to give that meaning to false!).

>this is Reiter's closed world assumption, which I've always regarded
>as a convenient technical assumption that is not really justifiable
>in general.

>>your database wants rules that say
>>1. The domain for every object/entity must be completely defined.
>>2. Any purported value which does not fall within the appropriate domain is
>>   reported as an error.
>>3. Any reference to a value which does not exist (but would fall within an
>>   appropriate domain if it did exist) is reported as an error.

>I'm afraid I don't understand this.		-aaron

My basic point is, before we can use formal logic in databases, we need to
decide how logic should work in a real database system, what are the
meanings of the possible data in the database, and what are sensible/useful
behaviors when we try to reason on values that are "funny" in one way or
another.  You don't have to use the closed world assumption -- but you have
to decide how to recognize an error and what to do about it.

>Consider a query about a circular list L of integers
>	Query: Is the first entry of the list less than 3?
>	Answer: No.
>Don't you think it would be better to get `error' signifying
>that there is no such first entry?

I think it would be better to get
	Answer: No first entry is defined on this domain.
--

Glenn       | holliday@csgrad.cs.vt.edu            OR
Holliday    | glenn%bayberry@chado.fidonet.org     OR
            | GHOLLID@access.nswc.navy.mil          (Internet)

aaron@grad2.cis.upenn.edu (Aaron Watters) (11/20/90)

 Once again we are concerned with the truth values
 for the following statements
 	B: `the king of north america is Bald.'
 	H: `the king of north america has Hair.'
 	G: `I am Green years old.'
I claimed B = not H, B=H=error, G=false.

In article <720@creatures.cs.vt.edu> holliday@csgrad.cs.vt.edu (Glenn Holliday) writes:
...
>  Chuck Phillips (Chuck.Phillips@FtCollins.NCR.COM) also took my
>path when he broke [B and H] down into
>
>>B1: There exists exactly one king of North America.
>>-and-
>>B2: The king of North America is bald.
>
>>H1: There exists exactly one king of North America.
>>-and-
>>H2: The king of North America is not bald.
>
>Aaron Waters argues [Watters]
>>... the noun `the king of north america' does not correspond to
>>any existential statement; it corresponds
>>	in logic		to a constant
>>	in pascal (ml)		to a pointer (reference)
>>	in O-O dbs's		to an object.
>
>Exactly!  Each of your statements H B and G are not constants.  They are
>intended to model more than one chunk of information about the objects
>"king", "north america", "hair", etc.  If you _do_ treat them as logical
>constants, then your argument is correct.  But I believe they have lost
>all meaning if you do that.

Well, I didn't say that, I said that `the king of north america'
behaved like a logical constant -- NOT like an existential assertion
 `the king is bald' != `there is a king who is bald'
the two statements simply have differing intuitive behaviors.  The
only way you can enforce the restatement you suggest on a database
user is to disallow the first sort of statement...

Imagine, if you will, a `fourth generation object oriented database
language' where the user writes:
	let K be any king of north america: (K is an object)
           if K is not bald print `the king has hair'
           if K does not have hair print `the king is bald'
        endlet.
Now I admit that the program is awkwardly written, but I think
you would agree that the output
	`the king has hair the king is bald'
defies the ordinary interpretation of the above pseudocode.  I claim
that since the code assumes the existence of such a king the let
statement should result in a `null' object (no reason to raise an
error if it is never used) and the evaluation of `K is not bald'
should result in an error, such as
>	Answer: No [king] is defined on this domain.
As you suggest.
		-aaron
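
The behaviour asked for here might look like the following in Python.  It
is a sketch under assumptions, not a description of any real database
language: `Null`, `NoReferent` and `any_king` are hypothetical names, and
the empty list models a domain with no king.

```python
class NoReferent(Exception):
    """Raised when a property of a nonexistent object is evaluated."""

class Null:
    # Hypothetical null object: binding it is harmless, using it is not.
    def __init__(self, description):
        self._description = description

    def __getattr__(self, name):
        # Called only for attributes not found normally, e.g. `bald`.
        raise NoReferent(f"No {self._description} is defined on this domain.")

def any_king(kings):
    # Return some king if one exists, otherwise a Null placeholder.
    return kings[0] if kings else Null("king")

K = any_king([])          # the `let` succeeds; no error is raised yet
message = None
try:
    if not K.bald:        # evaluating `K is not bald` raises
        print("the king has hair")
except NoReferent as exc:
    message = str(exc)
    print("Answer:", message)
```

Neither branch of the original pseudocode prints anything; the user gets
the explanatory answer instead of two contradictory lines.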