Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (09/26/90)
Thanks to gordon@meaddata.com, mao@postgres.berkeley.edu and weigele@rz.informatik.uni-hamburg.dbp.de for emailing replies. I now understand that forbidding primary keys from containing NULLs makes the implementation _much_ easier. However, my second question remains. from gordon@meaddata.com's reply: > |> 2. Relations with a NULL operand return NULL, e.g. > |> > |> TRUE OR NULL = NULL > |> > |> This stoopid beginner would expect a value of TRUE instead of NULL. > > The value NULL stands for "value unknown", or (as Himawan Gunadhi so > appropriately corrected me) "value inappropriate." Can two "unknown" or > "inappropriate" values be equal? There has been some criticism of NULLs, > including Date himself in Volume II. A quick glance through > ANSI X3.135-1989 doesn't provide any additional help. My own point-of- > view is to treat NULLs like INFINITY in calculus. Can INFINITY really equal > INFINITY? Its not very satisfactory, but it works for me. In most writings on three-valued logics I've encountered (until relational!) the third value did, in fact, mean "unknown". This _does_ have different semantics than "value inappropriate". "value unknown" implies that the value _is_ within the domain, but not known. "value inappropriate" appears, to me, to indicate a datum _not_ within the domain, implying any tuple containing NULL in _any_ field is invalid and should not even be allowed. However, since tuples containing NULL _are_ allowed, I _assume_ the intended semantics are, as you stated, "value unknown". Continuing the "value unknown" thread... TRUE OR TRUE = TRUE TRUE OR FALSE = TRUE ergo, in three-value boolean algebra: TRUE OR UNKNOWN = TRUE ...but in relational algebra TRUE OR UNKNOWN = UNKNOWN (!) This, to me, involves an unnecessary loss of information. FYI, other truth tables from three-value, U-means-"unknown" boolean algebra: p | NOT p T F U U F T p | q | p AND q | p OR q | p IMPLIES q -- a.k.a (NOT p) OR q F F F F T F U F U T F T F T T U F F U U U U U U U U T U T T T F F T F T U U T U T T T T T For binary relations "p @ q" such that @ is replaced with any one of the operators from {=, !=, >, >=, <, <=} and p, q, or both p and q are unknown, the value of the relation is unknown. This is no different than the current relational handling of these operators. (The "=" and "!=" truth tables are directly derived from the respective identities "(p=q)" is equivalent to "((p AND q) OR ((NOT p) AND (NOT q))", and "(p!=q)" is equivalent to "((p AND NOT q) OR ((NOT p) AND q))". For "<", "<=", ">" and ">=" I've extrapolated from 3-valued logic using the U = "value in domain, but unknown" semantic.) The current definition is not entirely consistent with either the "NULL means unknown" semantic or the "NULL means inappropriate" semantic. Consistency with the former implies NULL OR TRUE = TRUE, while consistency with the latter implies no tuple should contain a NULL in _any_ field because of a domain violation. The implementation of the "value unknown" semantic, as far as I can tell, should be no more difficult to implement than the current semantic; I've modelled three (and more) valued logics with finite domain polynomials, although a yacc generated expression parser could handle this much more efficiently. Giving NULL the "unknown-but-in-domain" semantic would get rid of the "out-of-domain-but-we'll-allow-it-anyway-just-for-this-case" baggage. It would also be a bit more consistent with set theory where "NULL" is a member of every set (i.e. within every domain). The only problem I can think of is upward compatibility. Upward compatibility _is_ significant, but I'm addressing tuple calculus theory, _not_ the ANSI SQL specification. Am I still missing something? Aside: If the logical value "UNKNOWN" were introduced with the appropriate semantics, but without removing NULL, we could add a useful, IMHO, semantic without breaking tons of old code. I would guess "NULL @ UNKNOWN" for any binary operator "@" would have to return a value of "NULL", since "out-of- domain" values propagate by their nature. Of course, until "out-of-domain" NULLs are forbidden from valid tuples (but not necessarily from expression or query evaluation), the baggage would remain. [My, my. Aren't we getting hypothetical today? -- mailer-daemon:-] Disclaimer: I only represent myself. No other warantees expressed or implied. See your doctor before taking internally. -- Chuck Phillips MS440 NCR Microelectronics chuck.phillips%ftcollins.ncr.com 2001 Danfield Ct. Ft. Collins, CO. 80525 ...uunet!ncrlnk!ncr-mpd!bach!chuckp
davidm@uunet.UU.NET (David S. Masterson) (09/30/90)
In article <CHUCK.PHILLIPS.90Sep26111518@halley.FtCollins.NCR.COM> Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes: Continuing the "value unknown" thread... TRUE OR TRUE = TRUE TRUE OR FALSE = TRUE ergo, in three-value boolean algebra: TRUE OR UNKNOWN = TRUE ...but in relational algebra TRUE OR UNKNOWN = UNKNOWN (!) This is not what I remember seeing in Codd's book. Certain conclusions have to occur in order to maintain transitive closure. I think relational algebra goes along with boolean algebra. Unless this is some mix of three and four valued logic?!? -- ==================================================================== David Masterson Consilium, Inc. uunet!cimshop!davidm Mtn. View, CA 94043 ==================================================================== "If someone thinks they know what I said, then I didn't say it!"
mst@vexpert.dbai.tuwien.ac.at (Markus Stumptner) (10/01/90)
From article <CIMSHOP!DAVIDM.90Sep30022748@uunet.UU.NET>, by cimshop!davidm@uunet.UU.NET (David S. Masterson): > In article <CHUCK.PHILLIPS.90Sep26111518@halley.FtCollins.NCR.COM> > Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) writes: > > Continuing the "value unknown" thread... > [...] > ergo, in three-value boolean algebra: > > TRUE OR UNKNOWN = TRUE > > ...but in relational algebra > > TRUE OR UNKNOWN = UNKNOWN (!) > > This is not what I remember seeing in Codd's book. Certain conclusions have > to occur in order to maintain transitive closure. I think relational algebra > goes along with boolean algebra. Unless this is some mix of three and four > valued logic?!? Codd definitely uses the standard TRUE OR NULL = TRUE. The real "loss of information" problems with his approach arise from straightforward use of this rule in tautologies: (birthdate < 1-1-66) OR (birthdate >= 1-1-66) results in the two halves of the disjunction being evaluated on their own (both to NULL), and the result of the whole expression is then NULL, even though it is clear that for any known value (which NULL stands for), at least one half and thus the whole expression would evaluate to NULL. There is an article in the ACM SIGMOD RECORD (Vol. 15:4, Dec. 86) where Codd writes on this and related topics. He concludes (not wholly without justification) that this is not a very pressing problem, since building tautologies into your conditions is a rather stupid thing to do, anyway. Still, it's annoying. In that paper, Codd also writes about four-valued logic based on "unknown" and "not applicable" values (though he doesn't call them that). I have not read his book, so I don't know whether this discussion is mentioned there. Markus Stumptner mst@vexpert.dbai.tuwien.ac.at Technical University of Vienna vexpert!mst@uunet.uu.net Paniglg. 16, A-1040 Vienna, Austria ...mcsun!vexpert!mst