[comp.databases] Stoopid questions - One down, one to go

Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (09/26/90)

Thanks to gordon@meaddata.com, mao@postgres.berkeley.edu and
weigele@rz.informatik.uni-hamburg.dbp.de for emailing replies.  I now
understand that forbidding primary keys from containing NULLs makes the
implementation _much_ easier.  However, my second question remains.

from gordon@meaddata.com's reply:

> |> 2. Relations with a NULL operand return NULL, e.g.
> |> 
> |> TRUE OR NULL = NULL
> |> 
> |> This stoopid beginner would expect a value of TRUE instead of NULL.
> 
> The value NULL stands for "value unknown", or (as Himawan Gunadhi so
> appropriately corrected me) "value inappropriate."  Can two "unknown" or
> "inappropriate" values be equal?  There has been some criticism of NULLs, 
> including Date himself in Volume II.  A quick glance through 
> ANSI X3.135-1989 doesn't provide any additional help.  My own point-of-
> view is to treat NULLs like INFINITY in calculus.  Can INFINITY really equal
> INFINITY?  Its not very satisfactory, but it works for me.

In most writings on three-valued logics I've encountered (until
relational!) the third value did, in fact, mean "unknown".  This _does_
have different semantics than "value inappropriate".  "value unknown"
implies that the value _is_ within the domain, but not known.  "value
inappropriate" appears, to me, to indicate a datum _not_ within the domain,
implying any tuple containing NULL in _any_ field is invalid and should not
even be allowed.  However, since tuples containing NULL _are_ allowed, I
_assume_ the intended semantics are, as you stated, "value unknown".

Continuing the "value unknown" thread...

TRUE OR TRUE = TRUE
TRUE OR FALSE = TRUE

ergo, in three-value boolean algebra:

TRUE OR UNKNOWN = TRUE

...but in relational algebra

TRUE OR UNKNOWN = UNKNOWN  (!)

This, to me, involves an unnecessary loss of information.  FYI, other truth
tables from three-value, U-means-"unknown" boolean algebra:

p | NOT p
T     F
U     U
F     T

p | q | p AND q | p OR q | p IMPLIES q  -- a.k.a (NOT p) OR q
F   F      F         F          T
F   U      F         U          T
F   T      F         T          T
U   F      F         U          U
U   U      U         U          U
U   T      U         T          T
T   F      F         T          F
T   U      U         T          U
T   T      T         T          T


For binary relations "p @ q" such that @ is replaced with any one of the
operators from {=, !=, >, >=, <, <=} and p, q, or both p and q are unknown,
the value of the relation is unknown.  This is no different than the
current relational handling of these operators.  (The "=" and "!=" truth
tables are directly derived from the respective identities "(p=q)" is
equivalent to "((p AND q) OR ((NOT p) AND (NOT q))", and "(p!=q)" is
equivalent to "((p AND NOT q) OR ((NOT p) AND q))".  For "<", "<=", ">" and
">=" I've extrapolated from 3-valued logic using the U = "value in domain,
but unknown" semantic.)

The current definition is not entirely consistent with either the "NULL
means unknown" semantic or the "NULL means inappropriate" semantic.
Consistency with the former implies NULL OR TRUE = TRUE, while consistency
with the latter implies no tuple should contain a NULL in _any_ field
because of a domain violation.

The implementation of the "value unknown" semantic, as far as I can tell,
should be no more difficult to implement than the current semantic; I've
modelled three (and more) valued logics with finite domain polynomials,
although a yacc generated expression parser could handle this much more
efficiently.  Giving NULL the "unknown-but-in-domain" semantic would get
rid of the "out-of-domain-but-we'll-allow-it-anyway-just-for-this-case"
baggage.  It would also be a bit more consistent with set theory where
"NULL" is a member of every set (i.e. within every domain).  The only
problem I can think of is upward compatibility.  Upward compatibility _is_
significant, but I'm addressing tuple calculus theory, _not_ the ANSI SQL
specification.

Am I still missing something?

Aside: If the logical value "UNKNOWN" were introduced with the appropriate
semantics, but without removing NULL, we could add a useful, IMHO, semantic
without breaking tons of old code.  I would guess "NULL @ UNKNOWN" for any
binary operator "@" would have to return a value of "NULL", since "out-of-
domain" values propagate by their nature.  Of course, until "out-of-domain"
NULLs are forbidden from valid tuples (but not necessarily from expression
or query evaluation), the baggage would remain.

[My, my.  Aren't we getting hypothetical today? -- mailer-daemon:-]

Disclaimer: I only represent myself.  No other warantees expressed or
implied.  See your doctor before taking internally.
--
Chuck Phillips  MS440
NCR Microelectronics 			chuck.phillips%ftcollins.ncr.com
2001 Danfield Ct.
Ft. Collins, CO.  80525   		...uunet!ncrlnk!ncr-mpd!bach!chuckp

davidm@uunet.UU.NET (David S. Masterson) (09/30/90)

In article <CHUCK.PHILLIPS.90Sep26111518@halley.FtCollins.NCR.COM> 
Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:

   Continuing the "value unknown" thread...

   TRUE OR TRUE = TRUE
   TRUE OR FALSE = TRUE

   ergo, in three-value boolean algebra:

   TRUE OR UNKNOWN = TRUE

   ...but in relational algebra

   TRUE OR UNKNOWN = UNKNOWN  (!)

This is not what I remember seeing in Codd's book.  Certain conclusions have
to occur in order to maintain transitive closure.  I think relational algebra
goes along with boolean algebra.  Unless this is some mix of three and four
valued logic?!?
--
====================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"

mst@vexpert.dbai.tuwien.ac.at (Markus Stumptner) (10/01/90)

From article <CIMSHOP!DAVIDM.90Sep30022748@uunet.UU.NET>, by cimshop!davidm@uunet.UU.NET (David S. Masterson):
> In article <CHUCK.PHILLIPS.90Sep26111518@halley.FtCollins.NCR.COM> 
> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes:
> 
>    Continuing the "value unknown" thread...
>    [...]
>    ergo, in three-value boolean algebra:
> 
>    TRUE OR UNKNOWN = TRUE
> 
>    ...but in relational algebra
> 
>    TRUE OR UNKNOWN = UNKNOWN  (!)
> 
> This is not what I remember seeing in Codd's book.  Certain conclusions have
> to occur in order to maintain transitive closure.  I think relational algebra
> goes along with boolean algebra.  Unless this is some mix of three and four
> valued logic?!?

Codd definitely uses the standard TRUE OR NULL = TRUE.  The real "loss
of information" problems with his approach arise from straightforward
use of this rule in tautologies:

(birthdate < 1-1-66) OR (birthdate >= 1-1-66)

results in the two halves of the disjunction being evaluated on their
own (both to NULL), and the result of the whole expression is then
NULL, even though it is clear that for any known value (which NULL
stands for), at least one half and thus the whole expression would
evaluate to NULL.  There is an article in the ACM SIGMOD RECORD (Vol. 15:4,
Dec. 86) where Codd writes on this and related topics.  He concludes
(not wholly without justification) that this is not a very pressing
problem, since building tautologies into your conditions is a rather
stupid thing to do, anyway.  Still, it's annoying.

In that paper, Codd also writes about four-valued logic based on
"unknown" and "not applicable" values (though he doesn't call them
that).  I have not read his book, so I don't know whether this
discussion is mentioned there.

Markus Stumptner                                mst@vexpert.dbai.tuwien.ac.at
Technical University of Vienna                  vexpert!mst@uunet.uu.net
Paniglg. 16, A-1040 Vienna, Austria             ...mcsun!vexpert!mst