gus@plains.NoDak.edu (jim gustafson) (09/10/90)
To my knowledge, outer joins (left/right) are typically implemented using either a simple loop join or else a sort-merge join. I think Oracle uses a variation on sort-merge join. Can someone provide references to outer join implementations? In particular, I'm looking for outer join variations on conventional hash join techniques such as Shapiro's hybrid hash join [TODS 1986].

Thanks, Jim.
--
Jim Gustafson			gus@plains.nodak.edu
uunet!plains!gus (UUCP)		gus@plains (Bitnet)
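For readers unfamiliar with the technique being asked about, here is a minimal sketch (in Python, with made-up relations; this is the textbook build/probe idea, not Shapiro's hybrid-hash algorithm) of how a classic hash join extends to a left outer join: build a hash table on the inner relation, probe with each outer row, and emit a NULL-padded row when a probe finds no match.

```python
def hash_left_outer_join(left, right, key_left, key_right, right_width):
    # Build phase: hash the right (inner) relation on its join key.
    table = {}
    for row in right:
        table.setdefault(row[key_right], []).append(row)
    # Probe phase: each left row either joins its matches or is NULL-padded.
    result = []
    for row in left:
        matches = table.get(row[key_left])
        if matches:
            for m in matches:
                result.append(row + m)
        else:
            result.append(row + (None,) * right_width)  # None stands in for NULL
    return result

# Hypothetical relations echoing the parts/orders example used later in the thread.
parts = [("p1", "widget"), ("p2", "gadget")]            # (partno, description)
orders = [("1990-09-10", "p1", 5, "c1")]                # (date, partno, qty, custno)
rows = hash_left_outer_join(parts, orders, key_left=0, key_right=1, right_width=4)
# p1 joins its order; p2 still appears, padded with NULLs
```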
pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/19/90)
On 10 Sep 90 05:32:22 GMT, gus@plains.NoDak.edu (jim gustafson) said:

gus> To my knowledge, outer joins (left/right) are typically implemented
gus> using either a simple loop join or else a sort-merge join. I think
gus> Oracle uses a variation on sort-merge join. Can someone provide
gus> references to outer join implementations?

Frankly, I believe the options here are exactly the same as for ordinary joins. I will actually argue that outer joins are a bogosity, as are null values.

It may be preferable to avoid the difficult issues posed by null values, and the definition of an outer join operation, by attaching instead to each relation (or to the underlying domains) a default value chosen by the schema designer, and/or by making an outer join just a join where a suitable matching default value is automatically supplied.

Take the classic outer join example of a parts(partno*,description*) relation and an orders(date*,partno*,quantity,customerno) relation, where we want a report showing *all* parts (not just those with orders outstanding) and the quantity on order for each, with those not on order marked as such. Generalize to taste :-). The implementation would then really look the same as for ordinary joins.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
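One possible reading of the proposal above, sketched in Python (the default tuple and its values are my own illustration, not Grandi's): instead of an outer join producing NULLs, every part with no outstanding orders is automatically matched against a schema-chosen default order row, after which a plain inner join suffices.

```python
# Hypothetical default chosen by the schema designer: quantity 0 means
# "not on order"; empty strings avoid reintroducing NULL-like markers.
DEFAULT_ORDER_FIELDS = lambda partno: ("", partno, 0, "")

def report_all_parts(parts, orders):
    # Index orders by partno, as an inner join would.
    by_part = {}
    for order in orders:
        by_part.setdefault(order[1], []).append(order)
    report = []
    for partno, desc in parts:
        # A matching default row is supplied when no real order exists,
        # so every part appears -- no outer join machinery needed.
        for date, pn, qty, cust in by_part.get(partno, [DEFAULT_ORDER_FIELDS(partno)]):
            report.append((partno, desc, qty))
    return report

parts = [("p1", "widget"), ("p2", "gadget")]
orders = [("1990-09-10", "p1", 5, "c1")]
report = report_all_parts(parts, orders)
# p2 shows up with the default quantity 0, marking it "not on order"
```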
davidm@uunet.UU.NET (David S. Masterson) (09/20/90)
In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

   It may be preferable to avoid the difficult issues posed by null
   values, and the definition of an outer join operation, by attaching
   instead to each relation (or to the underlying domains) a default
   value chosen by the schema designer, and/or to make an outerjoin just
   a join where a suitable matching default value is automatically
   supplied.

This method *may* obviate the need for the null records that occur in outer joins, but it doesn't obviate the need for NULLs as attribute values. In fact, this is the method documented by Ingres for doing outer joins in Quel. Using default values, though, has always had the problem of finding a default value that will *never* be used as a live value.
--
====================================================================
David Masterson		Consilium, Inc.
uunet!cimshop!davidm	Mtn. View, CA 94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"
clh@tfic.bc.ca (Chris Hermansen) (09/20/90)
In article <CIMSHOP!DAVIDM.90Sep19100915@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk
>(Piercarlo Grandi) writes:
>
> It may be preferable to avoid the difficult issues posed by null values,

Yay! Someone else who hates nulls!!

> and the definition of an outer join operation, by attaching instead to
> each relation (or to the underlying domains) a default value chosen by
> the schema designer, and/or to make an outerjoin just a join where a
> suitable matching default value is automatically supplied.
>
>This method *may* obviate the need for null records that occur in outer joins,
>but it doesn't obviate the need for NULLs as attribute values. In fact, this
>is the method documented by Ingres for doing outer joins in Quel. Using
>default values, though, has always had the problem of finding a default value
>that will *never* be used as a live value.

NULLs almost guarantee strange `gotchas', don't you think? For instance, instead of writing a statement like

	update foo set bar = 37 where qqsv != 0

the SQLer has to decide if

	update foo set bar = 37 where qqsv != 0 or qqsv is null

is what s/he REALLY means. Put another way,

	update foo set bar = 37 where qqsv != 0 and qqsv isnt null

provides no extra degree of refinement over the first update statement, even though it appears to (for the casual observer, at least).

It's also really handy that

	select qqsv,count(*) from foo
	group by qqsv

produces a separate line for every row in foo that has a null in qqsv, don't you think?

Note that I use Informix SQL, so your lack of mileage may vary on the above.

Chris Hermansen                         Timberline Forest Inventory Consultants
Voice: 1 604 733 0731                   302 - 958 West 8th Avenue
FAX:   1 604 733 0634                   Vancouver B.C. CANADA
clh@tfic.bc.ca                          V5Z 1E5

C'est ma facon de parler.
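Both gotchas above are easy to reproduce on an engine that follows the standard's three-valued logic; here is a sketch using Python's bundled sqlite3 (table and values invented to match the example). Note that, unlike the Informix behaviour reported in this thread, sqlite puts all the NULL rows into a single group.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table foo (bar integer, qqsv integer)")
db.executemany("insert into foo values (?, ?)",
               [(1, 0), (1, 2), (1, None), (1, None)])

# qqsv != 0 evaluates to *unknown* for the NULL rows, so they are skipped.
db.execute("update foo set bar = 37 where qqsv != 0")
touched = db.execute("select count(*) from foo where bar = 37").fetchone()[0]
# touched counts only the qqsv = 2 row; the NULL rows silently slipped through.

# Spelling the intent out catches the NULL rows as well.
db.execute("update foo set bar = 37 where qqsv != 0 or qqsv is null")
touched_all = db.execute("select count(*) from foo where bar = 37").fetchone()[0]

# And GROUP BY: here all NULLs land in ONE group of size 2,
# not one line per NULL row.
groups = db.execute("select qqsv, count(*) from foo group by qqsv").fetchall()
```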
gordon@meaddata.com (Gordon Edwards) (09/21/90)
In article <1990Sep20.161204.13014@tfic.bc.ca>, clh@tfic.bc.ca (Chris Hermansen) writes:
|> In article <CIMSHOP!DAVIDM.90Sep19100915@uunet.UU.NET>
|> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
|> >In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk
|> >(Piercarlo Grandi) writes:
|> >
|> > It may be preferable to avoid the difficult issues posed by null values,
|>
|> Yay! Someone else who hates nulls!!
|>
|> > and the definition of an outer join operation, by attaching instead to
|> > each relation (or to the underlying domains) a default value chosen by
|> > the schema designer, and/or to make an outerjoin just a join where a
|> > suitable matching default value is automatically supplied.

IF default values are appropriate, THEN they should be used. If no default exists, or a default is inappropriate, then you are compromising the accuracy of your model.

|> >This method *may* obviate the need for null records that occur in outer joins,
|> >but it doesn't obviate the need for NULLs as attribute values. In fact, this
|> >is the method documented by Ingres for doing outer joins in Quel. Using
|> >default values, though, has always had the problem of finding a default value
|> >that will *never* be used as a live value.

Absolutely; as a matter of fact, I think the NULL is completely necessary for the accurate representation of data. If, for example, you have an integer attribute called status, and you don't currently know the value, what do you assign to status? I guess you could look through the possible domain of status and then pick a value outside that domain to represent unknown, but then you have to document this heavily so maintenance people know what you did, don't you think?

|> NULLs almost guarantee strange `gotchas', don't you think? For instance,
|> instead of writing a statement like
|>
|>	update foo set bar = 37 where qqsv != 0
|>
|> the SQLer has to decide if
|>
|>	update foo set bar = 37 where qqsv != 0 or qqsv is null
|>
|> is what s/he REALLY means. Put another way,
|>
|>	update foo set bar = 37 where qqsv != 0 and qqsv isnt null
|>
|> provides no extra degree of refinement over the first update
|> statement, even though it appears to (for the casual observer, at least).

Well, the above is really placing two distinct constraints on the update.

Constraint 1: qqsv cannot have a known value of 0.
Constraint 2: qqsv cannot have an unknown value.

If qqsv were guaranteed to be known, you could always declare it as *not null*. Then Constraint 2 would become unnecessary. If you suggest an arbitrary value for unknowns, what would the statement be?

	update foo set bar = 37 where qqsv != 0 and qqsv != -99999999

This is easier than using null? I don't think so.

|> [text deleted...]
|>
|> Note that I use Informix SQL, so your lack of mileage may vary on the above

????? The above statement makes no sense. ????? Maybe you have difficulty understanding the proper use of null. It means, "I don't know what this value is." I'll use an example from Date: "... a shipment record might contain a null quantity value (we know that the shipment exists, but we do not know the quantity shipped)..."

If you find that you have lots of nulls, then maybe your schema is not properly normalized. Try checking for transitive dependencies; I have found that they can contribute to an excessive number of nulls.

Null is a very important semantic concept. There have been many times I would have loved to have null when writing 3GL applications (I had to rely on -999). As for additional constraints, anytime you deal with an unknown you have to take extra measures to account for it.

What do you think?
--
Gordon S. Edwards                       gordon@meaddata.com
Mead Data Central
lugnut@sequent.UUCP (Don Bolton) (09/21/90)
In article <1990Sep20.161204.13014@tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes:
>In article <CIMSHOP!DAVIDM.90Sep19100915@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>>In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk
>>(Piercarlo Grandi) writes:
>>
>> It may be preferable to avoid the difficult issues posed by null values,
>
>Yay! Someone else who hates nulls!!
>
>> and the definition of an outer join operation, by attaching instead to
>> each relation (or to the underlying domains) a default value chosen by
>> the schema designer, and/or to make an outerjoin just a join where a
>> suitable matching default value is automatically supplied.
>>
>>This method *may* obviate the need for null records that occur in outer joins,
>>but it doesn't obviate the need for NULLs as attribute values. In fact, this
>>is the method documented by Ingres for doing outer joins in Quel. Using
>>default values, though, has always had the problem of finding a default value
>>that will *never* be used as a live value.
>
>NULLs almost guarantee strange `gotchas', don't you think? For instance,
>instead of writing a statement like
>
>	update foo set bar = 37 where qqsv != 0
>
>the SQLer has to decide if
>
>	update foo set bar = 37 where qqsv != 0 or qqsv is null
>
>is what s/he REALLY means. Put another way,
>
>	update foo set bar = 37 where qqsv != 0 and qqsv isnt null
>
>provides no extra degree of refinement over the first update statement,
>even though it appears to (for the casual observer, at least).
>
>It's also really handy that
>
>	select qqsv,count(*) from foo
>	group by qqsv
>
>produces a separate line for every row in foo that has a null in qqsv,
>don't you think?

ARRGGH, I hate it when that happens; you have to update the nulls with the string value "NULL", run the select, and then "unupdate" -- or run several selects and cut and paste. Either way it's messy.

>Note that I use Informix SQL, so your lack of mileage may vary on the above

Also with Informix, a

	select count(*) from foo where qqsv is null

will produce a SINGLE row with the count. Can you say inconsistent? I knew you could :-)

"null is an unknown value, and as such each one is unique"

Always thought I had an odd sense of humor... :-)

The "O" company's database groups the count in either instance. Although they profess the same quote above, at least consistency is present.
davidm@uunet.UU.NET (David S. Masterson) (09/22/90)
In article <1990Sep20.161204.13014@tfic.bc.ca> clh@tfic.bc.ca (Chris Hermansen) writes:

   NULLs almost guarantee strange `gotchas', don't you think?

Only (IMHO) if you don't understand the concept.

   For instance, instead of writing a statement like

	update foo set bar = 37 where qqsv != 0

   the SQLer has to decide if

	update foo set bar = 37 where qqsv != 0 or qqsv is null

   is what s/he REALLY means. Put another way,

	update foo set bar = 37 where qqsv != 0 and qqsv isnt null

   provides no extra degree of refinement over the first update
   statement, even though it appears to (for the casual observer, at
   least).

Use of these statements depends on the definition of the data. I would hope that the "casual user" would understand the data he is dealing with. For instance, in the above statements, it would seem that qqsv might be the primary key of foo, in which case it can never be NULL, so all 3 statements degenerate into the first. If it's not the primary key, then there must be a reason for qqsv to be NULL and (most likely) a completely different reason why qqsv might be 0, which would influence the use of "IS NULL".

   It's also really handy that

	select qqsv,count(*) from foo group by qqsv

   produces a separate line for every row in foo that has a null in
   qqsv, don't you think?

I don't have a database handy, but I think it also produces a separate line for every row in foo that has a 1 in qqsv, so what's your point? There's no substitute for an intelligent query.
--
====================================================================
David Masterson		Consilium, Inc.
uunet!cimshop!davidm	Mtn. View, CA 94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"
jeffl@sybase.Sybase.COM (Jeff Lichtman) (09/24/90)
> select qqsv,count(*) from foo
> group by qqsv
> produces a separate line for every row in foo that has a null in qqsv...

It's not supposed to. This is one of those cases where nulls are supposed to act like they're equal to each other. Does Informix SQL treat each null as a separate group?

BTW, I agree that nulls in SQL leave a lot to be desired. I'm not convinced, though, that default values are adequate in place of nulls. There's the problem of having to choose a default that's guaranteed never to be represented by known data values. There's the problem of accidentally using a default value in a calculation as if it were a true value (calculations with null always yield null). There's the problem of storage space -- many database systems don't use any space on disk for null values, which can save a lot in sparsely populated tables. There's the problem of naive users incorrectly interpreting default values as true data. And on and on....
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or-  jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."
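The calculation pitfall mentioned above is easy to demonstrate: SQL aggregates skip NULLs, but an in-band default such as -999 is happily averaged in. A sketch with Python's bundled sqlite3 (the shipments table and its values are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table shipments (qty_null integer, qty_default integer)")
# Three shipments; the quantity of the third is unknown,
# recorded once as NULL and once as the in-band default -999.
db.executemany("insert into shipments values (?, ?)",
               [(10, 10), (20, 20), (None, -999)])

# avg() ignores the NULL row entirely...
avg_null = db.execute("select avg(qty_null) from shipments").fetchone()[0]
# ...but treats the -999 sentinel as a perfectly good quantity.
avg_default = db.execute("select avg(qty_default) from shipments").fetchone()[0]
# avg_null is 15.0; avg_default is (10 + 20 - 999) / 3 = -323.0
```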
pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/24/90)
On 19 Sep 90 17:09:15 GMT, cimshop!davidm@uunet.UU.NET (David S.
Masterson) said:
davidm> In article <PCG.90Sep18192004@odin.cs.aber.ac.uk>
davidm> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
pcg> It may be preferable to avoid the difficult issues posed by null
pcg> values, and the definition of an outer join operation, by attaching
pcg> instead to each relation (or to the underlying domains) a default
pcg> value chosen by the schema designer, and/or to make an outerjoin
pcg> just a join where a suitable matching default value is
pcg> automatically supplied.
davidm> This method *may* obviate the need for null records that occur
davidm> in outer joins, but it doesn't obviate the need for NULLs as
davidm> attribute values.
davidm> Using default values, though, has always had the problem of
davidm> finding a default value that will *never* be used as a live
davidm> value.
Only if you use in-band default values (I mean, default values that are
not specifically tagged as being default). Even using in-band default
values is not too bad, because in virtually all the applications where
they would be used, in practice you need to define an in-band value as
default, otherwise you get the equivalent of "dangling pointers", i.e.
referential integrity is violated.
One of the reasons why NULLs are a bogosity is that they are in
essence unspecified value and type defaults -- this creates a lot of
special casing, because it violates domaining.
They thus introduce a flavour of pointerdom into a flat model, a model
that only manipulates values, not their absence. NULLs introduce all the
problems of references found in some OO languages, while the relational
model was not designed for references.
davidm> In fact, this is the method documented by Ingres for doing outer
davidm> joins in Quel.
There are those who like Quel, and those who like trouble and use SQL :-).
--
Piercarlo "Peter" Grandi | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
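The "special casing" complained about above is visible in any engine that follows the standard's three-valued logic: a comparison involving NULL is neither true nor false but unknown, so NULL needs its own operator (IS NULL) rather than ordinary equality. A quick sketch with Python's bundled sqlite3:

```python
import sqlite3

db = sqlite3.connect(":memory:")

# NULL = NULL is not true -- it is unknown, which sqlite3 surfaces as None.
eq = db.execute("select null = null").fetchone()[0]
# NULL != NULL is unknown too; negating unknown yields unknown.
ne = db.execute("select null != null").fetchone()[0]
# Only the special IS NULL predicate gives a definite answer (true, i.e. 1).
is_null = db.execute("select null is null").fetchone()[0]
```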
lugnut@sequent.UUCP (Don Bolton) (09/25/90)
In article <11034@sybase.sybase.com> jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
>> select qqsv,count(*) from foo
>> group by qqsv
>> produces a separate line for every row in foo that has a null in qqsv...
>
>It's not supposed to. This is one of those cases where nulls are supposed
>to act like they're equal to each other. Does Informix SQL treat each null
>as a separate group?

YEP, in the above example they sure do. Then

	select count(*) from foo where qqsv is null

produces one row with the total of nulls.

>BTW, I agree that nulls in SQL leave a lot to be desired. I'm not convinced,
>though, that default values are adequate in place of nulls. There's the
>problem of having to choose a default that's guaranteed never to be represented
>by known data values. There's the problem of accidentally using a default
>value in a calculation as if it were a true value (calculations with null
>always yield null).
>There's the problem of storage space - many database
>systems don't use any space on disk for null values, which can save a lot
>in sparsely populated tables.

This is one that gets me about Informix (aka Datachow); I may be mistaken, but I believe it pads nulls with spaces, at least that's what comes out on the selects. Also, the columns are all filled to their defined size regardless of the length of the data string. select max(length column) ain't in the vocabulary. Handy to have and use when you've inherited a used DB that had all the design forethought of, well, the idea is there :-)

>There's the problem of naive users incorrectly
>interpreting default values as true data. And on and on....

Like the mail that's returned addressed to Bozo Clown @ Large Feet Inc. :-) OR TEST RECORD NOT FOR MAILING TEST RECORD DO NOT MAIL :-)

Oh, *naive* users.
davidm@uunet.UU.NET (David S. Masterson) (09/26/90)
In article <PCG.90Sep24152920@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:

   davidm> Using default values, though, has always had the problem of
   davidm> finding a default value that will *never* be used as a live
   davidm> value.

   Only if you use in-band default values (I mean, default values that
   are not specifically tagged as being default).

Isn't an "out-band" default (at least part of) the definition of a NULL?

   Even using in-band default values is not too bad, because in
   virtually all the applications where they would be used, in practice
   you need to define an in-band value as default, otherwise you get the
   equivalent of "dangling pointers", i.e. referential integrity is
   violated.

How would "in-band" defaults be recognized by constraints for the enforcement of referential integrity?

   One of the reasons for which NULLs are a bogosity is that they are in
   essence unspecified value and type defaults -- this creates a lot of
   special casing, because it violates domaining. They thus introduce a
   flavour of pointerdom into a flat model, a model that only
   manipulates values, not their absence. NULLs introduce all the
   problems of references found in some OO languages, while the
   relational model has not been designed for references.

You'll have to explain this more; I don't catch the point yet. Three- and four-valued logic seem to be a part of the relational model according to Codd (see "The Relational Model for Database Management Version 2").

   davidm> In fact, this is the method documented by Ingres for doing
   davidm> outer joins in Quel.

   There are those who like Quel, and those who like trouble and use SQL :-).

No doubt about SQL, but there's always been a bit of trouble in the "outerjoin implementation" in Quel, to the point that Ingres had to document the workaround in a special appendix. ;-)
--
====================================================================
David Masterson		Consilium, Inc.
uunet!cimshop!davidm	Mtn. View, CA 94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"
aaron@grad2.cis.upenn.edu (Aaron Watters) (09/27/90)
In article <PCG.90Sep24152920@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>
>One of the reasons for which NULLs are a bogosity is that they are in
>essence unspecified value and type defaults -- this creates a lot of
>special casing, because it violates domaining.
>

What on earth does this mean? Nulls are certainly NOT a bogosity. As pointed out by many, they certainly DO cause havoc in the relational paradigm -- but this indicates that the relational paradigm is inadequate, NOT that nulls are ``bad.'' [Aside: there is an amazing tendency in the database community to disallow anything that doesn't fit into one's favorite theoretical structure -- e.g., no unions in views, nulls are bad, etcetera.]

Enough abstraction. Let's consider an example relation PEOPLE:

	[NAME, AGE]
	[john,  24]
	[jane,  ??]
	[mary,  13]

Then what should we get if we select the people with age 24? TWO ANSWERS:

	 `outer'       `inner'
	[john, 24]    [john, 24]
	[jane, 24]

Of course, two answers to a single query doesn't fit into the relational model, so this must be a bad idea, right? How about if we `group by age'? AGAIN TWO ANSWERS:

	 `outer'       `inner'
	[john, 24]    [john, 24]
	[jane, 24]    [jane, ??]
	[mary, 13]
	[jane, 13]    [mary, 13]
	              [jane, ??]

Roughly, `outer' answers give `all possibilities' and `inner' answers give `only that which is certain.' If you find this intriguing at all, let me know.

-aaron.
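The `inner'/`outer' answers above can be computed mechanically: treat ?? as "could be any age", return what is certainly in the answer and what is possibly in it. A small sketch (the UNKNOWN marker and function names are my own, not Aaron's notation):

```python
# Sentinel standing in for Aaron's "??" -- an unknown age.
UNKNOWN = object()

PEOPLE = [("john", 24), ("jane", UNKNOWN), ("mary", 13)]

def select_age(rows, age):
    # `inner' answer: only rows whose age is known to equal `age'.
    inner = [(n, a) for n, a in rows if a is not UNKNOWN and a == age]
    # `outer' answer: also admit rows whose unknown age *might* equal `age'.
    outer = inner + [(n, age) for n, a in rows if a is UNKNOWN]
    return inner, outer

inner, outer = select_age(PEOPLE, 24)
# inner: [("john", 24)]           -- only that which is certain
# outer: [("john", 24), ("jane", 24)]  -- all possibilities
```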
asylvain@felix.UUCP (Alvin E. Sylvain) (09/29/90)
In article <1990Sep20.161204.13014@tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes:
>In article <CIMSHOP!DAVIDM.90Sep19100915@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>>In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk
>>(Piercarlo Grandi) writes:
>> It may be preferable to avoid the difficult issues posed by null values,
[ ... ]
>Yay! Someone else who hates nulls!!
[ ... ]
>NULLs almost guarantee strange `gotchas', don't you think? For instance,
>instead of writing a statement like
>
>	update foo set bar = 37 where qqsv != 0
>
>the SQLer has to decide if
>
>	update foo set bar = 37 where qqsv != 0 or qqsv is null
>
>is what s/he REALLY means. Put another way,
>
>	update foo set bar = 37 where qqsv != 0 and qqsv isnt null
[ ... ]

In Oracle, you can say:

	update foo set bar = 37 where nvl(qqsv, 0) != 0

which uses the "nvl" function to produce a NULL-value substitution. It basically says (in this example): if the value is NULL, use 0 instead, so the NULL rows are excluded, as in your third example. (To catch the NULL rows as well, as in your second example, supply a default that passes the test, e.g. nvl(qqsv, 1) != 0.) I'm sure that if you check the Informix manual, you'll find something similar. I can't say if this is "standard" SQL or merely Oracle's little implementation goodie, but it's probably the easiest, cleanest way of handling that particular potential snafu.

As to strange 'gotchas', any tool in existence has them, some more than others. Take C pointers, for example (Yowch!). It's just another thing the tool-user must be aware and careful of. ("Hey, Charlie, don't leave your hand there! You'll cut yer silly finger off!" ;-)
------------------------------------------------------------------------
"I got protection for my        | Alvin "the Chipmunk" Sylvain
affections, so swing your       | Natch, nobody'd be *fool* enough to have
bootie in my direction!"        | *my* opinions, 'ceptin' *me*, of course!
-=--=--=--"BANDWIDTH?? WE DON'T NEED NO STINKING BANDWIDTH!!"--=--=--=-
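Oracle's nvl corresponds to the COALESCE function found in later SQL dialects; a sketch with Python's bundled sqlite3 showing which of the earlier update predicates each substitution matches (table and values invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table foo (bar integer, qqsv integer)")
db.executemany("insert into foo values (?, ?)", [(1, 0), (1, 2), (1, None)])

# coalesce(qqsv, 0) != 0: NULL becomes 0, so NULL rows are excluded...
with_coalesce = db.execute(
    "select count(*) from foo where coalesce(qqsv, 0) != 0").fetchone()[0]
# ...exactly like spelling out both conditions.
spelled_out = db.execute(
    "select count(*) from foo where qqsv != 0 and qqsv is not null").fetchone()[0]

# With a non-zero default, NULL rows pass the test instead,
# matching `qqsv != 0 or qqsv is null'.
with_nonzero_default = db.execute(
    "select count(*) from foo where coalesce(qqsv, 1) != 0").fetchone()[0]
```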
clh@tfic.bc.ca (Chris Hermansen) (09/29/90)
In article <CIMSHOP!DAVIDM.90Sep21104753@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>In article <1990Sep20.161204.13014@tfic.bc.ca> clh@tfic.bc.ca
>(Chris Hermansen) writes:
>
[my examples deleted...]
>
>Use of these statements depend on the definition of the data. I would hope
>that the "casual user" would understand the data he is dealing with. For
>instance, in the above statements, it would seem that qqsv might be the
>primary key of foo, in which case, it can never be NULL, so all 3 statements
>degenerate into the first. If its not the primary key, then there must be a
>reason for qqsv to be NULL and (most likely) a completely different reason why
>qqsv might be 0 which would influence the use of "IS NULL".

Sorry, I guess I didn't make myself clear enough. It's not the fact that NULL is different from 0 that bothers me (I think this is neat, as I'm sure most people do). It's the (IMHO) weird three-state logical semantic nausea that hangs around NULLs. What I mean by that is that NULL values occurring in arithmetic or logical expressions cause those expressions to behave differently than `normal' values do. One gets into `extra' testing and such that doesn't happen if a non-NULL value is used to represent the absence of a realistic value.

> It's also really handy that
>
> select qqsv,count(*) from foo group by qqsv
>
> produces a separate line for every row in foo that has a null in qqsv,
> don't you think?
>
>I don't have a database handy, but I think it also produces a separate line
>for every row in foo that has a 1 in qqsv, so what's your point? There's no
>substitute for an intelligent query.

No, no, no! The whole point of `group by qqsv' is that you get ONE row for each value of qqsv; the exception here is that rows containing NULL in qqsv do not group together, because neither NULL = NULL nor NULL <> NULL is true.

My rather trite example could normally be used to get a count of the number of rows with different values of qqsv, but as you say, a more intelligent solution is required when there are NULLs lurking about:

	select qqsv,count(*) from foo where qqsv isnt null group by qqsv;
	select count(*) from foo where qqsv is null

Just to rant and rave for a moment: suppose that a careful programmer sets up a system of forms and reports, the forms carefully ensuring that the user cannot enter NULL values into fields. Suppose that two years later a not-so-careful programmer comes along and modifies one of the forms, inadvertently allowing NULLs to be entered. Come year end, the report programs, formerly shielded from lurking NULLs, now produce a monumental pile of garbage, and the original programmer gets dumped all over because s/he "didn't allow for the occurrence of NULLs in the database". You be the judge.

Chris Hermansen                         Timberline Forest Inventory Consultants
Voice: 1 604 733 0731                   302 - 958 West 8th Avenue
FAX:   1 604 733 0634                   Vancouver B.C. CANADA
clh@tfic.bc.ca                          V5Z 1E5

C'est ma facon de parler.
clh@tfic.bc.ca (Chris Hermansen) (09/29/90)
In article <1422@meaddata.meaddata.com> meaddata!gordon@uunet.uu.net writes: > >In article <1990Sep20.161204.13014@tfic.bc.ca>, clh@tfic.bc.ca (Chris >Hermansen) writes: >|> In article <CIMSHOP!DAVIDM.90Sep19100915@uunet.UU.NET> >cimshop!davidm@uunet.UU.NET (David S. Masterson) writes: >|> >In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk >|> >(Piercarlo Grandi) writes: [stuff deleted regarding use of nulls, particularly in outer joins] >Absolutely, as a matter of fact, I think the NULL is completely necessary for >the accurate representation of data. If, for example, you have a integer >attribute called status, and you don't currently know the value, what >do you assign to status? I guess you could look through the possible domain >of status and then pick a value outside that domain to represent unknown, >but then you have to heavily document this so maintenance people know what >you did, don't you think? OK, I'll bite... suppose I have a variable called status. Why is NULL better than a value of "unknown" (assuming it's a character value)? I would claim that in fact "unknown" is superior, as it is at least somewhat self documenting (NULL, by definition, is NOT); furthermore, I can easily spot the "unknown" rows in any printout or report; finally, I can easily count the rows of particular status in the table with one select statement (whereas I need two with NULL values allowed, as rows with NULL values do not group together). [more stuff deleted] >Well the above is really placing two distinct constraints on the update. >Constraint 1: qqsv cannot have a known value of 0. >Constraint 2: qqsv cannot have an unknown value. > >If qqsv was guaranteed to be known, you can always declare it as *not null*. This may relate to the Informix SQL comment below. >Then, Constraint 2 would become unecessary. If you suggest an arbitrary value >for unknowns, what would the statement be? 
> > ubdate foo set bar = 37 where qqsv != 0 and qqsv != -99999999 > >This is easier than using null? I don't think so. I don't think it's any more difficult! Maybe this isn't the best example, but I guess it depends on why the NULLs are in qqsv in the first place. If they got there because of a low level of quality control in the data input phase, I might prefer to see the data input stuff (screen forms or whatever) cleaned up; if they got there because someone genuinely wanted to show that qqsv didn't have a value entered into it, I guess it's really a tossup as to whether one uses NULL or some other value (that could potentially have more `local' meaning to the application). As you say above, outer joins make a strong case for the existence of a "well defined" undefined value. > >|> >|> [text deleted...] >|> >|> Note that I use Informix SQL, so your lack of mileage may vary on the above > > >????? The above statement makes no sense. ????? Aww, c'mon! All I'm saying is that I'm basing my comments on the RDBMS I use, not on the ANSI standard, nor on Quel/Ingres which was mentioned earlier on. >Maybe you have difficulty undersanding the proper use of null. It means, "I >don't know what this value is." I'll use an example from Date, "... a >shipment record might contain a null quantity value (we know that the >shipment exists, but we do not know the quantity shipped)..." I'd be the first to admit that I may have difficulty understanding the use of NULL. But think about your example; why should the input screen have let someone enter the fact that a shipment was made without knowing the number of items shipped? How would you propose to send the customer a bill, if you don't know whether you sent her/him 3 SPARCStations or 3000? 
Now suppose that you're the boss of the shipping department, and you find out that your programmer decided to allow NULLs in the amount_shipped field by casually noting that your warehouse contained no more SPARCStations because someone on the loading dock didn't bother to fill in the waybill properly and the fast-knuckled keypuncher blithely skipped over the amount_shipped field, so not only are you going to have to call 200 customers to find out how many you sent them, but you're going to have to call head office and tell them that your order for more stock should have had more than zero items on it. And in any case, I would claim that a zero would do as well as a NULL here, since most shipping departments would hardly bother to ship zero units (let alone generate a shipping report with zero units on it). If you're doing an outer join, a better(?) example of NULLs cropping up might be orders entered but not yet shipped... > >If you find that you have lots of nulls, then maybe your schema is not >properly >normalized. Try checking for transitive dependencies, I have found that they >can contribute to an excessive amount of nulls. Naw, all my schemae are perfectly normalized :-) :-) :-) >Null is a very important semantic concept. There have been many times I would >have loved to have null when writing 3GL applications (I had to rely on -999). >As for additional constraints, anytime you deal with an uknown you have take >extra measures to account for it. > >What do you think? Well, back to the status = "unknown" vs status is null; I don't feel that there is any innate superiority of the null value here, as long as one has control over the value of the field (ie, it's not the result of an outer join). Seems to me that one is semantically using two fields in a lot of cases: qqsv and qqsv_status. 
Finally, if I have a table with 6,000,000 records in it, with status NULL in 5,999,999 and status = "complete" in the other, and I use ISQL to execute the following:

	output to printer
	select status, count(*) from foo
	group by status
	order by status;

I will end up with 100,000 PAGES of output on my laser printer.  On the other hand, if those 5,999,999 records have status = "unknown", I'll get one page.

Again, I (feel I) should emphasize that this particular behaviour may not occur in ALL SQLs, or even be predicted by the ANSI standard.

Chris Hermansen                         Timberline Forest Inventory Consultants
Voice: 1 604 733 0731                   302 - 958 West 8th Avenue
FAX:   1 604 733 0634                   Vancouver B.C. CANADA
clh@tfic.bc.ca                          V5Z 1E5

C'est ma facon de parler.
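Whether that 100,000-page report happens turns entirely on how the engine groups NULLs.  A minimal sketch of the ANSI-style behaviour (all NULLs in one group) using Python's built-in sqlite3 module — a present-day stand-in, not one of the engines discussed here; the table and column names just follow Chris's example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (status TEXT)")
# A miniature version of the 6,000,000-row table: many NULLs, one "complete".
con.executemany("INSERT INTO foo (status) VALUES (?)",
                [(None,), (None,), (None,), ("complete",)])

groups = con.execute(
    "SELECT status, COUNT(*) FROM foo GROUP BY status ORDER BY status"
).fetchall()
print(groups)  # [(None, 3), ('complete', 1)] -- all NULLs land in ONE group
```

Under the pre-4.0 Informix behaviour Chris describes, each NULL row would instead have produced its own group — hence the 100,000 pages.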
clh@tfic.bc.ca (Chris Hermansen) (09/29/90)
In article <11034@sybase.sybase.com> jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
>>	select qqsv,count(*) from foo
>>	group by qqsv
>>
>> produces a separate line for every row in foo that has a null in qqsv...
>
>It's not supposed to.  This is one of those cases where nulls are supposed
>to act like they're equal to each other.  Does Informix SQL treat each null
>as a separate group?

Yeah, it does.  That's why I attached the disclaimer - I had no idea what the standard said, let alone what Oracle, Ingres, you guys, etc etc did.

>BTW, I agree that nulls in SQL leave a lot to be desired.  I'm not convinced,
>though, that default values are adequate in place of nulls.  There's the
>problem of having to choose a default that's guaranteed never to be
>represented by known data values.  There's the problem of accidentally using
>a default value in a calculation as if it were a true value (calculations
>with null always yield null).  There's the problem of storage space - many
>database systems don't use any space on disk for null values, which can save
>a lot in sparsely populated tables.  There's the problem of naive users
>incorrectly interpreting default values as true data.  And on and on....

Well, at the risk of belaboring the point, I think that whenever the implied semantics of a field suggest a default value, then it's a GOOD THING to use one.  For example, a field named widget_count is never (realistically) going to have a value less than zero, so if you need an "undefined" value for it, use -1, or -99999999, or some other nice number.  Similarly, a widget_status value of "unknown" tells me more than NULL - with NULL, I can't decide whether the status is genuinely unknown, or whether the system just doesn't know it.

So I don't think that defaults are necessarily adequate (in place of NULLs), but I don't necessarily agree that NULLs tell you any more than default values either (except in some cases like outer join).
Chris Hermansen Timberline Forest Inventory Consultants Voice: 1 604 733 0731 302 - 958 West 8th Avenue FAX: 1 604 733 0634 Vancouver B.C. CANADA clh@tfic.bc.ca V5Z 1E5 C'est ma facon de parler.
davidm@uunet.UU.NET (David S. Masterson) (09/30/90)
In article <30203@netnews.upenn.edu> aaron@grad2.cis.upenn.edu (Aaron Watters) writes:
>Nulls are certainly NOT a bogosity.  As pointed out by many, they certainly
>DO cause havoc in the relational paradigm -- but this indicates that
>the relational paradigm is inadequate, NOT that nulls are ``bad.''

No, it indicates that current relational implementations are not complete, not that NULLs don't fit the paradigm.

>PEOPLE	[NAME, AGE]
>	[john, 24]
>	[jane, ??]
>	[mary, 13]
>
>Then what should we get if we select the people with age 24?
>
>TWO ANSWERS:
>the `outer' answer		and the `inner' answer
>	[john, 24]			[john, 24]
>	[jane, 24]
>
>[similar grouping problem deleted]
>
>Roughly `outer' answers give `all possibilities' and `inner'
>answers give `only that which is certain.'  If you find this
>intriguing at all, let me know.	-aaron.

This is better known as 3-valued logic, which is a part of the relational model (as is 4-valued logic).  Both answers (to both problems) are syntactically correct; the true answer is determined by the semantics of the problem (which you touched on at the end).  To paraphrase Codd's terminology, is the first question:

	all people who definitely have an age = 24
or
	all people who MAYBE have an age = 24 ?

BTW, do current OODBs treat 3 or 4 valued logic?  I imagine they could, but do current implementations even address this?
--
====================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"
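Aaron's `inner' and `outer' answers map directly onto two WHERE clauses.  A sketch of his PEOPLE example using Python's built-in sqlite3 module (NULL standing in for the unknown age; the engine choice is mine, not from the thread):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
con.executemany("INSERT INTO people (name, age) VALUES (?, ?)",
                [("john", 24), ("jane", None), ("mary", 13)])

# The `inner' (definite) answer: jane's NULL age makes "age = 24" unknown,
# so the WHERE clause rejects her row.
definite = [r[0] for r in con.execute(
    "SELECT name FROM people WHERE age = 24 ORDER BY rowid")]

# The `outer' (maybe) answer: anyone whose age COULD be 24.
maybe = [r[0] for r in con.execute(
    "SELECT name FROM people WHERE age = 24 OR age IS NULL ORDER BY rowid")]

print(definite)  # ['john']
print(maybe)     # ['john', 'jane']
```

Note the maybe answer contains the definite answer as a subset, which is exactly the point David makes below about MAYBE qualification being a superset.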
davidm@uunet.UU.NET (David S. Masterson) (09/30/90)
In article <1990Sep28.225442.4995@tfic.bc.ca> clh@tfic.bc.ca (Chris Hermansen) writes: In article <1422@meaddata.meaddata.com> meaddata!gordon@uunet.uu.net writes: >Absolutely, as a matter of fact, I think the NULL is completely necessary >for the accurate representation of data. If, for example, you have a >integer attribute called status, and you don't currently know the value, >what do you assign to status? I guess you could look through the possible >domain of status and then pick a value outside that domain to represent >unknown, but then you have to heavily document this so maintenance people >know what you did, don't you think? OK, I'll bite... suppose I have a variable called status. Why is NULL better than a value of "unknown" (assuming it's a character value)? I would claim that in fact "unknown" is superior, as it is at least somewhat self documenting (NULL, by definition, is NOT); furthermore, I can easily spot the "unknown" rows in any printout or report; finally, I can easily count the rows of particular status in the table with one select statement (whereas I need two with NULL values allowed, as rows with NULL values do not group together). Wait a sec. Think about that a little. NULL, by definition, is an "unknown" value, so it is self-documenting. Going back to the example of an unknown integer (where 3-valued logic plays an even more important role) in the role of quantity shipped (just hasn't been entered yet), what would you do with a question like "What is the sum of the known amounts shipped?". With the database understanding what a NULL is, the answer can easily be arrived at with just "SELECT SUM(qty) FROM table" with no special checking for -999. I'd be the first to admit that I may have difficulty understanding the use of NULL. But think about your example; why should the input screen have let someone enter the fact that a shipment was made without knowing the number of items shipped? 
How would you propose to send the customer a bill, if you don't know whether you sent her/him 3 SPARCStations or 3000?  Now suppose that you're the boss of the shipping department, and you find out that your programmer decided to allow NULLs in the amount_shipped field by casually noting that your warehouse contained no more SPARCStations because someone on the loading dock didn't bother to fill in the waybill properly and the fast-knuckled keypuncher blithely skipped over the amount_shipped field, so not only are you going to have to call 200 customers to find out how many you sent them, but you're going to have to call head office and tell them that your order for more stock should have had more than zero items on it.

In some places, this can (and is expected to) happen on a temporary basis.  Usually, though, there are physical (ie. non-computer) checks to make sure that this is only a temporary phenomenon.  In such organizations, it's better to allow them to enter "No Value" than to force them to reinvent the way they do business.

And in any case, I would claim that a zero would do as well as a NULL here, since most shipping departments would hardly bother to ship zero units (let alone generate a shipping report with zero units on it).

Placing a zero here would confuse Accounting, though, as it would look like a fulfilled shipment.  Point being, zero (or any "in-phase" data) has other connotations that programs outside the shipping department might treat differently.  Everyone, though, would view a NULL as a NULL.

Also, don't forget to factor in the idea of constraints into the use of NULLs.  With the single value, all referential constraints remain in balance ("I am NULL, therefore I don't refer to anything"), so there is no special processing for semantics ("I'm a status value, therefore if I am 'unknown', I don't refer to the list of known, possible statuses").  BTW, constraints are the way you should ensure that the wrong value is never entered (as in "NOT NULL").
-- ==================================================================== David Masterson Consilium, Inc. uunet!cimshop!davidm Mtn. View, CA 94043 ==================================================================== "If someone thinks they know what I said, then I didn't say it!"
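David's "SELECT SUM(qty) with no special checking for -999" point is easy to verify.  A sketch with Python's built-in sqlite3 module (the qty name follows his example; the engine choice is mine):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE shipments (qty INTEGER)")
# Two known quantities and one not-yet-entered (NULL) quantity.
con.executemany("INSERT INTO shipments (qty) VALUES (?)",
                [(3,), (None,), (5,)])

# Aggregates silently skip NULLs -- no "WHERE qty != -999" needed.
total = con.execute("SELECT SUM(qty) FROM shipments").fetchone()[0]

# COUNT(*) counts rows; COUNT(qty) counts only known values.
all_rows = con.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]
known_rows = con.execute("SELECT COUNT(qty) FROM shipments").fetchone()[0]

print(total, all_rows, known_rows)  # 8 3 2
```

With a -999 sentinel instead of NULL, that same SUM would come back as -991 unless every aggregate query remembered to filter the sentinel out.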
aland@informix.com (alan denney) (10/01/90)
In article <42743@sequent.UUCP> lugnut@sequent.UUCP (Don Bolton) writes:
|In article <1990Sep20.161204.13014@tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes:
|>In article <CIMSHOP!DAVIDM.90Sep19100915@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
|>>In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk
|>>(Piercarlo Grandi) writes:
|>
|>NULLs almost guarantee strange `gotchas', don't you think?  For instance,
|>instead of writing a statement like
|>	update foo set bar = 37 where qqsv != 0
|>the SQLer has to decide if
|>	update foo set bar = 37 where qqsv != 0 or qqsv is null
|>is what s/he REALLY means.  Put another way,
|>	update foo set bar = 37 where qqsv != 0 and qqsv is not null
|>provides no extra degree of refinement over the first update statement,
|>even though it appears to (for the casual observer, at least).

This has to do with the very definition of NULL -- it requires that NULL values fail any "not equal" filter.  Since a NULL is by definition an "unknown" value, then you don't know that it's not 0, right?

In any case, you can avoid the whole issue by disallowing NULLs in the column(s) in question (by creating the column with NOT NULL).

|>It's also really handy that
|>
|>	select qqsv,count(*) from foo
|>	group by qqsv
|>
|>produces a separate line for every row in foo that has a null in qqsv,
|>don't you think?
|>
|ARRGGH, I hate it when that happens, have to update the nulls with the
|string value "NULL", run the select, and then "unupdate"; or run several
|selects and cut and paste.  Either way it's messy.

There was quite a bit of theoretical debate about which method was "proper".  Now that ANSI has spoken, Informix engines have been changed to group NULL values together (the way you apparently prefer).  This change is effective in the current release - 4.0.

|Also with informix, a select count(*) from foo where qqsv is null
|will produce a SINGLE row with the count.  Can you say inconsistent?
|I knew you could :-)

Uh, can you say "pilot error"?  [-:

How can you call that inconsistent?  The response you complain about is exactly what you asked for: how many rows have NULL for qqsv.  No GROUPing is involved.
--
Alan S. Denney  #  Informix  #  aland@informix.com  #  {pyramid|uunet}!infmx!aland
 "The driver says, 'One more cup of coffee and I'll be all right...'
  'Pop a Bennie, another Bennie'..."           - The Bobs, "Bus Plunge"
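The "NULL fails any not-equal filter" rule Alan describes is the heart of Chris's update gotcha, and it reproduces directly.  A sketch with Python's built-in sqlite3 module, using the foo/bar/qqsv names from the quoted statements (the engine choice is mine):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE foo (bar INTEGER, qqsv INTEGER)")
# One known non-zero, one zero, and one NULL ("unknown") qqsv.
con.executemany("INSERT INTO foo (bar, qqsv) VALUES (?, ?)",
                [(1, 5), (2, 0), (3, None)])

# "NULL != 0" evaluates to unknown, so the NULL row is NOT updated.
con.execute("UPDATE foo SET bar = 37 WHERE qqsv != 0")
after_first = [r[0] for r in con.execute("SELECT bar FROM foo ORDER BY rowid")]
print(after_first)   # [37, 2, 3] -- the NULL row was skipped

# To touch the NULL row as well, the condition must say so explicitly.
con.execute("UPDATE foo SET bar = 37 WHERE qqsv != 0 OR qqsv IS NULL")
after_second = [r[0] for r in con.execute("SELECT bar FROM foo ORDER BY rowid")]
print(after_second)  # [37, 2, 37]
```

Which also shows why Chris's third variant ("!= 0 and is not null") adds nothing: the plain "!= 0" already excludes the NULL rows.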
aaron@grad2.cis.upenn.edu (Aaron Watters) (10/01/90)
[Chris Hermanson argues that nulls are bad...]

In article <1990Sep28.231042.5117@tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes:
>For example, a field named widget_count is never (realistically) going to
>have a value less than zero, so if you need an "undefined" value for it, use
>-1, or -99999999, or some other nice number.  Similarly, a widget_status
>value of "unknown" tells me more than NULL - with NULL, I can't decide whether
>the status is genuinely unknown, or whether the system just doesn't know it.

While admitting that this last point may be too deep for me, let's once again consider an example.  What if we have two stockrooms, each of which contains a number of widgets.  Room A has 10 widgets and no one knows how many widgets are in room B.  The total number of widgets in both rooms is therefore either 9 or -99999989, of course.

That was a joke, I presume you realize, but how does one get around this summation anomaly?  It seems you need to have something analogous to the following:

	If all widgetcounts are positive then X=sum(widgetcount)
	otherwise X=-1.

Furthermore, if you have a large number of conventions like this, more complex queries (especially if you have complex views) can become quite baroque.  If you use actual null values, the queries are still somewhat baroque, but easier to understand.

A personal interest of mine is the question of whether the programmer can ignore the possibility of nulls entirely and have the system handle them in a logically uniform way -- which, I argue, is not currently done automatically.	-aaron.
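Aaron's joke totals fall straight out of a sentinel scheme.  A sketch reproducing his two stockrooms with Python's built-in sqlite3 module (widget_count from Chris's example, -99999999 as the sentinel; the table names are mine):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rooms_sentinel (widget_count INTEGER)")
con.execute("CREATE TABLE rooms_null (widget_count INTEGER)")

# Room A has 10 widgets; room B's count is unknown.
con.executemany("INSERT INTO rooms_sentinel VALUES (?)", [(10,), (-99999999,)])
con.executemany("INSERT INTO rooms_null VALUES (?)", [(10,), (None,)])

# With a sentinel, SUM happily adds it in: the `total' Aaron is joking about.
bogus = con.execute("SELECT SUM(widget_count) FROM rooms_sentinel").fetchone()[0]

# With NULL, SUM just skips the unknown row -- which Chris (below) finds
# equally misleading, since the answer looks like a genuine total of 10.
skipped = con.execute("SELECT SUM(widget_count) FROM rooms_null").fetchone()[0]

print(bogus)    # -99999989
print(skipped)  # 10
```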
aaron@grad2.cis.upenn.edu (Aaron Watters) (10/01/90)
In response to my claim that difficulties with nulls show that... >>the relational paradigm is inadequate, NOT that nulls are ``bad.'' In article <CIMSHOP!DAVIDM.90Sep30013925@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes: > >No, it indicates that current relational implementations are not complete, not >that NULLs don't fit the paradigm. ... >This is better known as 3-valued logic which is a part of the relational model >(as is 4-valued logic). Both answers (to both problems) are syntactically >correct, the true answer is determined by the semantics of the problem (which >you touched on at the end). To paraphrase Codd's terminology, is the first >question: > > all people who definitely have an age = 24 > or > all people who MAYBE have an age = 24 ? > First, why do you say these are not different aspects to the correct answer to a single query? Imagine, if you will, that I pose a very complex query built up from complex views whose definitions I do not know. Suppose I want to know those tuples known to satisfy the query and those not-known to not-satisfy the query. Do I have to pose two queries? Do each of the views I use have to be defined twice, once for definiteness and once for maybeness? Why? Roughly, I would argue that it's okay if the system allows the user to ask for maybes and definites, but if s/he doesn't specify s/he should get both. Second, note that according to the `semantic completeness' metric no reasonable database system will fill the bill -- because the problem is NP-complete for a fixed query and varying data. (Teaser.) I disagree that the `relational paradigm' includes `three valued logic' any more sensibly than Pascal `includes Prolog.' (IE, you can certainly implement one in the other, but that's beside the point.) -aaron.
gordon@meaddata.com (Gordon Edwards) (10/01/90)
In article <1990Sep28.225442.4995@tfic.bc.ca>, clh@tfic.bc.ca (Chris Hermansen) writes:
|> In article <1422@meaddata.meaddata.com> meaddata!gordon@uunet.uu.net writes:
|> [lots of stuff deleted.  -gordon]
|> in the table with one select statement (whereas I need two with NULL values
|> allowed, as rows with NULL values do not group together).

Really?  This must be a quirk of Informix.  I have used Informix a little, but most of my time has been spent in Sybase and Ingres, both of which group NULLs.

|> [more stuff deleted]
|>
|> >Well the above is really placing two distinct constraints on the update.
|> >Constraint 1: qqsv cannot have a known value of 0.
|> >Constraint 2: qqsv cannot have an unknown value.
|> >
|> >If qqsv was guaranteed to be known, you can always declare it as *not null*.
|>
|> This may relate to the Informix SQL comment below.
|>
|> >Then, Constraint 2 would become unnecessary.  If you suggest an arbitrary
|> >value for unknowns, what would the statement be?
|> >
|> >	update foo set bar = 37 where qqsv != 0 and qqsv != -99999999
|> >
|> >This is easier than using null?  I don't think so.
|>
|> I don't think it's any more difficult!  Maybe this isn't the best example,
|> but I guess it depends on why the NULLs are in qqsv in the first place.  If
|> they got there because of a low level of quality control in the data input
|> phase, I might prefer to see the data input stuff (screen forms or whatever)
|> cleaned up; if they got there because someone genuinely wanted to show that
|> qqsv didn't have a value entered into it, I guess it's really a tossup as to
|> whether one uses NULL or some other value (that could potentially have more
|> `local' meaning to the application).

As the DBA, it might not be possible to control applications.  After the system is running for two months, a manager might want to interface a spreadsheet to the DBMS.  At that point, unless you have constraints in your DBMS, you have no control.
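Gordon's two constraints can both be pushed into the schema itself, which is the DBA-level control he's after: NOT NULL rejects unknowns at entry time, and a designer-chosen DEFAULT (the Grandi/Hermansen position) covers rows that omit the column.  A sketch with Python's built-in sqlite3 module (column name from the example; the engine choice is mine):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Constraint 2 ("qqsv cannot have an unknown value") lives in the DDL,
# so no application or spreadsheet front-end can sneak a NULL past it.
con.execute("CREATE TABLE foo (qqsv INTEGER NOT NULL DEFAULT 0)")

rejected = False
try:
    con.execute("INSERT INTO foo (qqsv) VALUES (NULL)")
except sqlite3.IntegrityError:
    rejected = True  # the engine, not the front end, enforces the rule

# A row that omits the column silently gets the schema designer's default.
con.execute("INSERT INTO foo DEFAULT VALUES")
value = con.execute("SELECT qqsv FROM foo").fetchone()[0]
print(rejected, value)  # True 0
```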
|>
|> As you say above, outer joins make a strong case for the existence of a
|> "well defined" undefined value.
|>
|> >|> [text deleted...]
|> >|>
|> >|> Note that I use Informix SQL, so your lack of mileage may vary on the above
|> >
|> >????? The above statement makes no sense. ?????
|>
|> Aww, c'mon!  All I'm saying is that I'm basing my comments on the RDBMS I
|> use, not on the ANSI standard, nor on Quel/Ingres which was mentioned
|> earlier on.

OK.  It sort of sounded like, "Informix is great and you other losers..."  I just wanted to double check.

|> >Maybe you have difficulty understanding the proper use of null.  It means,
|> >"I don't know what this value is."  I'll use an example from Date, "... a
|> >shipment record might contain a null quantity value (we know that the
|> >shipment exists, but we do not know the quantity shipped)..."

Previous comment by me was a bit unfair, I apologize.

|> I'd be the first to admit that I may have difficulty understanding the use
|> of NULL.  But think about your example; why should the input screen have
|> let someone enter the fact that a shipment was made without knowing the
|> number of items shipped?  How would you propose to send the customer a
|> bill, if you don't know whether you sent her/him 3 SPARCStations or 3000?
|> Now suppose that you're the boss of the shipping department, and you find
|> out that your programmer decided to allow NULLs in the amount_shipped
|> field by casually noting that your warehouse contained no more
|> SPARCStations because someone on the loading dock didn't bother to fill in
|> the waybill properly and the fast-knuckled keypuncher blithely skipped
|> over the amount_shipped field, so not only are you going to have to call
|> 200 customers to find out how many you sent them, but you're going to have
|> to call head office and tell them that your order for more stock should
|> have had more than zero items on it.

OK, so Date's example sucks.
:-)  Once again, as the DBA, you might not have control over front-end constraint enforcement.  In your example above, I would have a number of checks to catch such situations (eg. end of day routines to catch NULL shipment entries, dock verification of shipment quantities, etc).

|> And in any case, I would claim that a zero would do as well as a NULL
|> here, since most shipping departments would hardly bother to ship zero
|> units (let alone generate a shipping report with zero units on it).
|>
|> If you're doing an outer join, a better(?) example of NULLs cropping up
|> might be orders entered but not yet shipped...
|>
|> >If you find that you have lots of nulls, then maybe your schema is not
|> >properly normalized.  Try checking for transitive dependencies, I have
|> >found that they can contribute to an excessive amount of nulls.
|>
|> Naw, all my schemae are perfectly normalized :-) :-) :-)

Isn't everyone's?  8-0

|> >Null is a very important semantic concept.  There have been many times I
|> >would have loved to have null when writing 3GL applications (I had to
|> >rely on -999).  As for additional constraints, anytime you deal with an
|> >unknown you have to take extra measures to account for it.
|> >
|> >What do you think?
|>
|> Well, back to the status = "unknown" vs status is null; I don't feel that
|> there is any innate superiority of the null value here, as long as one has
|> control over the value of the field (ie, it's not the result of an outer
|> join).  Seems to me that one is semantically using two fields in a lot of
|> cases: qqsv and qqsv_status.
|>
|> Finally, if I have a table with 6,000,000 records in it, with status NULL
|> in 5,999,999 and status = "complete" in the other, and I use ISQL to
|> execute the following:
|>
|>	output to printer
|>	select status, count(*) from foo
|>	group by status
|>	order by status;
|>
|> I will end up with 100,000 PAGES of output on my laser printer.
|> On the other hand, if those 5,999,999 records have status = "unknown",
|> I'll get one page.
|>
|> Again, I (feel I) should emphasize that this particular behaviour may not
|> occur in ALL SQLs, or even be predicted by the ANSI standard.

Your previous example may indeed be specific to Informix, and if this were true globally, I would put unknown in also. :-)

I still maintain NULLs are important; however, in light of past conversations, I think I should add some remarks.

1. As I said before, if defaults are appropriate, use them.
2. I agree with those who say NULLs are not perfect.
3. If anything, specific UNKNOWN and INAPPROPRIATE values should be added.

I thought your previous post recommended that the concept of NULL was wrong.  If you are arguing that NULL needs to be more specific, then I probably agree with you.  Sorry for any misunderstandings.
--
Gordon  (gordon@meaddata.com)
clh@tfic.bc.ca (Chris Hermansen) (10/03/90)
In article <30354@netnews.upenn.edu> aaron@grad2.cis.upenn.edu.UUCP (Aaron Watters) writes:
>[Chris Hermanson argues that nulls are bad...]
        ^^^^^^^^^
Geeze, it's HermansEn, ok?  :-) :-) :-)

>In article <1990Sep28.231042.5117@tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes:
>>For example, a field named widget_count is never (realistically) going to
>>have a value less than zero, so if you need an "undefined" value for it, use
>>-1, or -99999999, or some other nice number.  Similarly, a widget_status
>>value of "unknown" tells me more than NULL - with NULL, I can't decide whether
>>the status is genuinely unknown, or whether the system just doesn't know it.
>
>While admitting that this last point may be too deep for me, let's
>once again consider an example.  What if we have two stockrooms
>each of which contains a number of widgets.  Room A has 10 widgets
>and no one knows how many widgets are in room B.  The total number of
>widgets in both rooms is therefore either 9 or -99999989, of course.

That's definitely a problem.  One solution is to make the widget count NULL in room B, but then you have either 10 (I think this is the `standard') or NULL (ie undefined; I'm pretty sure this is not the `standard') for the sum.  Putting myself in the shoes of the person writing the query, I would say that none of the above answers is very enlightening, unless all I really want to know is if we have five or six (but not eleven!) widgets.

My argument was not really that NULLs are pro forma bad, just that I personally don't like the three-state logic that goes with them (and other out-of-band data values) in most cases.  I guess I also started to flame about undefined values in general.
At the risk of overstating my case, think of the manager who discovers that only ten widgets are available (at $50,000 each) when s/he needs twenty; s/he orders ten more from his/her supplier, ships the ten s/he has, indicates a back-order for his/her customer, and the next time s/he checks the database, finds out there were thirty more in room B.  Meanwhile, his/her customer is p*ssed off at the back order and re-orders the other ten from the competition.

OK, so that's a bit of a joke, too; my point is that MANY (not ALL) applications shouldn't admit a default value for fields, because the implied semantics of a field (eg widget_count) tend to suggest to the user that there will always be a `reasonable' value there.

David Masterson(?sp) suggested that it might be better to allow defaults (eg NULLs) at data entry time (since, presumably, there are many cases where the data entry person may be unable to supply the missing value, and may even do something weird like enter 5 just to complete the form).  I would claim that the best thing to do is bounce the form back to the originator for fixing.

>That was a joke, I presume you realize, but how does one get around
>this summation anomaly?  It seems you need to have something analogous
>to the following:
>	If all widgetcounts are positive then X=sum(widgetcount)
>	otherwise X=-1.
>Furthermore, if you have a large number of conventions like this
>more complex queries (especially if you have complex views) can
>become quite baroque.  If you use actual null values, however,
>the queries are still somewhat baroque, but easier to understand.
>
>A personal interest of mine is the question of whether the
>programmer can ignore the possibility of nulls entirely and have
>the system handle them in a logically uniform way -- which, I
>argue is not currently done automatically.	-aaron.

I would agree; I think your If...
is the best solution (ie, if there are NULLs or other out-of-band values in the set, then the sum, average, etc should probably be out-of-band as well). I don't think this is the way the standard reads; I'm also fairly sure that select sum(widget_count) from fooey and select avg(widget_count) from fooey will behave as though the rows with NULLs in widget count were not really in the database. Certainly this is logical; whether you can therefore ignore the presence of NULLs is really up to you :-) Chris Hermansen Timberline Forest Inventory Consultants Voice: 1 604 733 0731 302 - 958 West 8th Avenue FAX: 1 604 733 0634 Vancouver B.C. CANADA clh@tfic.bc.ca V5Z 1E5 C'est ma facon de parler.
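Chris is right that SUM and AVG skip NULL rows, but his preferred behaviour (the whole aggregate going out-of-band when any input is out-of-band) can be spelled out by hand using the COUNT(*) vs COUNT(column) difference.  A sketch with Python's built-in sqlite3 module, using the fooey/widget_count names from his statements (the CASE trick is my illustration, not from the thread):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fooey (widget_count INTEGER)")
con.executemany("INSERT INTO fooey VALUES (?)", [(10,), (None,), (30,)])

# Standard behaviour: the NULL row acts as though it weren't in the table.
standard = con.execute("SELECT SUM(widget_count) FROM fooey").fetchone()[0]

# Chris's behaviour: if any widget_count is NULL, the sum itself is NULL.
# COUNT(*) counts all rows; COUNT(widget_count) counts only known ones.
strict = con.execute("""
    SELECT CASE WHEN COUNT(*) = COUNT(widget_count)
                THEN SUM(widget_count) END
    FROM fooey
""").fetchone()[0]

print(standard)  # 40
print(strict)    # None, i.e. SQL NULL -- one of the inputs was unknown
```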
davidm@uunet.UU.NET (David S. Masterson) (10/03/90)
In article <30357@netnews.upenn.edu> aaron@grad2.cis.upenn.edu (Aaron Watters) writes:

	In article <CIMSHOP!DAVIDM.90Sep30013925@uunet.UU.NET> (I) write:
	>	all people who definitely have an age = 24
	>	or
	>	all people who MAYBE have an age = 24 ?

	First, why do you say these are not different aspects to the correct
	answer to a single query?

First, I'm not sure I understand your statement.  Second, the answer to the first query is a subset of the answer to the second query.  These two queries have different meanings where it just so happens that their results are in the same domain.  If the second query had been "all people who definitely have an age = 36", then they would have been recognized as two different queries.  The same is true of the above queries.

	Imagine, if you will, that I pose a very complex query built up from
	complex views whose definitions I do not know.

Then, as far as you (or the model) are concerned, the views can (and should) be treated as basic relations.

	Suppose I want to know those tuples known to satisfy the query and
	those not-known to not-satisfy the query.

The double negative here is throwing me.

	Do I have to pose two queries?

Perhaps, depending on what the double negative means.  On the other hand, MAYBE qualification results in a superset of the definite answer.

	Do each of the views I use have to be defined twice, once for
	definiteness and once for maybeness?  Why?

The VIEW construction is done with primitive operations (like those at the beginning of the message).  If you want values in the view that are not precise, then use the MAYBE qualifier, but that will still include the definite subset.

	Roughly, I would argue that it's okay if the system allows the user
	to ask for maybes and definites, but if s/he doesn't specify s/he
	should get both.

Precisely.

	Second, note that according to the `semantic completeness' metric no
	reasonable database system will fill the bill -- because the problem
	is NP-complete for a fixed query and varying data.
	(Teaser.)

You seem to have been making the assumption that definiteness and maybeness are distinct sets.  I contend that maybeness is a superset of definiteness.  What does that do to the NP-complete problem?  (I never really understood NP terminology.)

	I disagree that the `relational paradigm' includes `three valued
	logic' any more sensibly than Pascal `includes Prolog.'  (IE, you can
	certainly implement one in the other, but that's beside the point.)

Set theory, by definition, must contend with NULL (especially where domains are concerned) and the NULL-set.  If operations are defined upon sets (as the relational model does), then those operations must be consistent where NULL is concerned.  Three-valued logic is the expression of how to specify the results of standard operators with "unknown" (NULL) operands.  Therefore, three-valued logic should be a part of the definition of operations against sets.  So, to be consistent, the relational 'paradigm' should deal with NULLs completely and consistently.

Having said that, I must say that I am speaking only of the model and not any one particular implementation.  No implementation (to my knowledge) is yet complete where NULLs are concerned (can they, therefore, be called "completely relational"?).
--
====================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"
asylvain@felix.UUCP (Alvin E. Sylvain) (10/05/90)
In article <CIMSHOP!DAVIDM.90Sep30013925@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes: [ ... ] >This is better known as 3-valued logic which is a part of the relational model >(as is 4-valued logic). Both answers (to both problems) are syntactically >correct, the true answer is determined by the semantics of the problem (which >you touched on at the end). [ ... ] >BTW, do current OODBs treat 3 or 4 valued logic? I imagine they could, but do >current implementations even address this? Whoa, wait a minute! Can you explain to this meager humble dum-dum what is 4 valued logic? If this is explained much deeply in some periodic list of FAQ that I failed to locate, please to forgive! -- =======================Standard Disclaimers Apply======================= "We're sorry, but the reality you have dialed is no | Alvin longer in service. Please check the value of pi, | "the Chipmunk" or pray to your local diety for assistance." | Sylvain
lugnut@sequent.UUCP (Don Bolton) (10/06/90)
In article <1990Oct1.070456.17087@informix.com> aland@informix.com (alan denney) writes:
>In article <42743@sequent.UUCP> lugnut@sequent.UUCP (Don Bolton) writes:
>|In article <1990Sep20.161204.13014@tfic.bc.ca> clh@tacitus.UUCP (Chris Hermansen) writes:
>|>In article <CIMSHOP!DAVIDM.90Sep19100915@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>|>>In article <PCG.90Sep18192004@odin.cs.aber.ac.uk> pcg@cs.aber.ac.uk
>|>>(Piercarlo Grandi) writes:
>|>
>|>NULLs almost guarantee strange `gotchas', don't you think?  For instance,

A whole lot of stuff deleted.

>This has to do with the very definition of NULL -- it requires that
>NULL values fail any "not equal" filter.  Since a NULL is by definition
>an "unknown" value, then you don't know that it's not 0, right?
>
>In any case, you can avoid the whole issue by disallowing NULLs in the
>column(s) in question (by creating the column with NOT NULL).

This works great in many applications; however, a leads tracking system, or anything else with many possibilities for empty columns like that, is still a problem.  NOT NULL is sometimes NOT FEASIBLE :-)

>|>It's also really handy that
>|>
>|>	select qqsv,count(*) from foo
>|>	group by qqsv
>|>
>|>produces a separate line for every row in foo that has a null in qqsv,
>|>don't you think?
>|>
>|ARRGGH, I hate it when that happens, have to update the nulls with the
>|string value "NULL", run the select, and then "unupdate"; or run several
>|selects and cut and paste.  Either way it's messy.
>
>There was quite a bit of theoretical debate about which method was
>"proper".  Now that ANSI has spoken, Informix engines have been changed
>to group NULL values together (the way you apparently prefer).  This
>change is effective in the current release - 4.0.

And there was *much* rejoicing.

>|Also with informix, a select count(*) from foo where qqsv is null
>|will produce a SINGLE row with the count.  Can you say inconsistent?
>|I knew you could :-) > >Uh, can you say "pilot error"? [-: > >How can you call that inconsistent? The response you complain about >is exactly what you asked for: how many rows have NULL for qqsv. >No GROUPing is involved. > Neither was any grouping involved in the other example where nulls were concerned. :-) But you see the result from the select count(*) from where null example was what I had been raised to expect on the group by result. To my primal training they BOTH should have returned a SINGLE row with the count. anyways its moot since its now fixed in the curent release.
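[The NULL behaviors debated above can be reproduced with any SQL-92-style engine. A minimal sketch using Python's sqlite3, which follows the ANSI rules the thread describes; table and column names echo the thread's `foo`/`qqsv` example:]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (qqsv INTEGER)")
conn.executemany("INSERT INTO foo VALUES (?)",
                 [(1,), (1,), (None,), (None,), (2,)])

# NULL rows pass neither "=" nor "!=": a NULL is an unknown value,
# so you don't know that it's not 0 -- both filters evaluate to UNKNOWN.
eq_zero = conn.execute("SELECT COUNT(*) FROM foo WHERE qqsv = 0").fetchone()[0]
ne_zero = conn.execute("SELECT COUNT(*) FROM foo WHERE qqsv != 0").fetchone()[0]

# Post-ANSI GROUP BY behavior: all NULLs collapse into ONE group,
# not one output line per NULL row.
groups = conn.execute(
    "SELECT qqsv, COUNT(*) FROM foo GROUP BY qqsv ORDER BY qqsv"
).fetchall()

# IS NULL, by contrast, is an ordinary two-valued test, so this
# unconditionally returns a single row with the count.
null_count = conn.execute(
    "SELECT COUNT(*) FROM foo WHERE qqsv IS NULL"
).fetchone()[0]
```

[With the five rows above, `eq_zero` is 0 and `ne_zero` is 3 (the two NULL rows match neither filter), `groups` has one row for the NULL group, and `null_count` is 2.]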
aaron@grad2.cis.upenn.edu (Aaron Watters) (10/08/90)
One variety of 4-valued logic is unknown/true/false/error, e.g.

 `bush is president' is true.
 `dukakis is president' is false.
 `all space aliens are green' is unknown.
 `the king of North America is bald' is error.

unknown means `this may be true or false or error, but we don't know
which.'

error means `this can't have any truth value' -- in the example above
because any truth value requires the existence of someone who (last I
heard) doesn't exist.
	-aaron
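[Aaron's four values can be given truth tables in more than one way; the sketch below is one plausible formalization, assuming an "error is contagious" rule (Bochvar-style) for error and the usual Kleene-style dominance of false over unknown. The names are mine, not a standard:]

```python
# One plausible 4-valued AND/NOT over unknown/true/false/error.
# Assumption: an errored operand poisons the whole conjunction
# (Bochvar-style); unknown is handled Kleene-style.
TRUE, FALSE, UNKNOWN, ERROR = "true", "false", "unknown", "error"

def and4(p, q):
    if ERROR in (p, q):      # error has no truth value, so neither does p AND q
        return ERROR
    if FALSE in (p, q):      # otherwise false dominates, as in Kleene logic
        return FALSE
    if UNKNOWN in (p, q):
        return UNKNOWN
    return TRUE

def not4(p):
    # unknown and error are fixed points: negating "we don't know"
    # still gives "we don't know".
    return {TRUE: FALSE, FALSE: TRUE}.get(p, p)
```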
davidm@uunet.UU.NET (David S. Masterson) (10/08/90)
In article <152008@felix.UUCP> asylvain@felix.UUCP (Alvin E. Sylvain) writes:

>Whoa, wait a minute! Can you explain to this meager humble dum-dum
>what is 4 valued logic? If this is explained much deeply in some
>periodic list of FAQ that I failed to locate, please to forgive!
Hehe, yeah, it threw me for a while when I first saw it in Codd's new book
(it's still a bit confusing, but I really haven't studied it). Basically, as
I understand it, it breaks down like this:
2-valued logic: defines the results of applying relational operators to TRUE
or FALSE operands.
3-valued logic: same as 2-valued logic except operands might be TRUE, FALSE,
or UNKNOWN.
4-valued logic: same as 3-valued logic except operands might be TRUE, FALSE,
NOT-APPLICABLE, or UNKNOWN-BUT-APPLICABLE (I forget if that's the right
terminology).
Check Codd's book, "The Relational Model for Database Management Version 2"
for more in-depth reasoning on 4-valued logic's applicability to relational
databases.
--
====================================================================
David Masterson Consilium, Inc.
uunet!cimshop!davidm Mtn. View, CA 94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"
Chuck.Phillips@FtCollins.NCR.COM (Chuck.Phillips) (10/08/90)
Just a nit:
>>>>> On 7 Oct 90 20:23:37 GMT, aaron@grad2.cis.upenn.edu (Aaron Watters) said:
Aaron> unknown means `this may be true or false or error, but we don't
Aaron> know which.'
Aaron> error means `this can't have any truth value'
So far so good.
Aaron> -- in the example above ["the king of North America is bald"]
Aaron> because any truth value requires the existence of someone who
Aaron> (last I heard) doesn't exist.
Clarification: A statement requiring the existence of the non-existent to
be true is simply false, not an error. However, a domain violation _is_
an error. (e.g. "I am green years old.")
If "king" is an out-of-band attribute, then the statement is an error; if
"king" is within the defined domain, then the statement is simply false.
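[Chuck's distinction can be sketched in code. Everything below is hypothetical illustration (the data, domain table, and `holds` function are my inventions): a predicate about a non-existent individual evaluates to plain FALSE, while a value outside an attribute's declared domain yields ERROR before any truth evaluation happens:]

```python
TRUE, FALSE, ERROR = "true", "false", "error"

# Toy database and attribute domains (both hypothetical).
people = {"aaron": {"bald": False}}       # no "king of North America" here
domains = {"age": int, "bald": bool}

def holds(name, attr, value):
    # Domain violation first: "I am green years old" is neither true
    # nor false -- it is an error.
    if attr not in domains or not isinstance(value, domains[attr]):
        return ERROR
    # Non-existent referent: per Chuck's reading, simply false.
    person = people.get(name)
    if person is None:
        return FALSE
    return TRUE if person.get(attr) == value else FALSE
```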
Hope this helps,
--
Chuck Phillips MS440
NCR Microelectronics chuck.phillips%ftcollins.ncr.com
2001 Danfield Ct.
Ft. Collins, CO. 80525 ...uunet!ncrlnk!ncr-mpd!bach!chuckp