in

The ORM Foundation

Get the facts!

To Null or not to Null. That is the question!

Last post 05-25-2011 13:48 by hughdarwen. 20 replies.
Page 1 of 2 (21 items) 1 2 Next >
Sort Posts: Previous Next
  • 05-21-2011 1:44

    To Null or not to Null. That is the question!

    Dear Ken,

    You have my deepest apologies for using language that might be construed, if mistakenly, as a put-down of Chris Date. I myself am a big fan of Mr. Date, whom I regard in precisely the same way as your Amazon reviewer does, and I meant no disrespect in calling him a "religious zealot" -- which, for me, is a term lacking any negative connotation. (I will, myself, admit quite readily to being a religious zealot, who if anything, could probably use a bit more zeal.)

       My terminology was not about Mr. Date so much, as about this "nulls" issue -- and all I meant by it was, that people seem to come at it with some deeply-held presuppositions -- or perhaps a mere blind spot(s) -- that seem to determine or colour their thinking on the issue. For example, I hope you'll allow me to critique Hugh Darwen's argument, the one that you quoted.

       Why does Mr. Darwen think that if you allow a NULL column in a database, you're ipso facto allowing propositions to be recorded that have an indeterminate truth value? A very little bit of analysis will allow us to see that this is far from being true.

       As Clifford has just implied, the necessity of "missing values" in a relational database comes from, and only from, the mapping of more thant one fact type to the same table, at least one of which is non-mandatory for that type of object about which the table records facts. (Which is something that ORM's Rmap procedure prescribes.) That means that the NULL "value" is not representing a proposition, or part of one, at all, but rather, precisely the LACK of any proposition, of that type, about the object being discussed! This is in fact THE BASIC ISSUE, which Mr. Darwen overlooks: what do you do about that missing proposition?? (Put a NULL there? Put a Default Value there?) So, as much as I may respect Hugh Darwen, his argument is based on complete lack of comprehension of why you'd have the NULL there in the first instance.

       Hope this helps. Now, I'm enjoying this discussion (I must admit), but I'm just wondering whether, if Ken feels it should be continued, it might not deserve its own thread. Just a thought,

    Cheers,

    Andy

  • 05-21-2011 8:42 In reply to

    Re: To Null or not to Null. That is the question!

     Hi Andy,

    Yes I agree with you about splitting the thread so here it is.
    I hope you like the title. I was tempted to use "Nulls pour les Nuls" but then I felt that the Shakespearian option would be more informative.

    Nulls are indeed a very important question which has hitherto not had the attention it deserves in this forum. So lets get at it!
    I feel its best to take this matter in small chunks so this post does not address all of the points that I want to make. It is just a start. 

    Let me take your arguments one by one. I propose that we try use "proper reasoning" to get at the truth of this matter. ( I say "try" because whilst I'm convinced that "proper reasoning" is very useful, I don't claim to be the world's expert on the subject - just an avid student.)

    Reference: I have found a good "easy introduction" to reasoning to be Anthony Weston's 2009 book "A Rulebook for Arguments - Fourth Edition". "This is the ultimate "how-to" book for anyone who wants to use reasons and evidence in support of conclusions" Quote from Debra Nails. 

    So let me try analyse your arguments from this perspective:

    Your argument seems to be:
    P1: People perceive the concept of nulls using some deeply held presuppositions.
    P2: People mis-perceive the concept of nulls because they have blind spots.
    C1: The presuppositions and blind spots determine their thinking about nulls.

    Analysis:
    P1:  For P1 to be false, it would have to be the case that the "people" concerned would not hold any presuppositions.
    Since (by inference) you seem to be talking about "The set of People who already know about nulls", it seems to me that P1 is always true.
    Therefore it is not a valid premise.

    P2: Here you seem to be inferring that you can see something that "they" (the religeous zealot's?) cannot.
    I think it would be helpful if you were more specific about the nature of this "blind spot". What is it that "they" cannot see?  

    ------------------------

    Andy Carver:
    Why does Mr. Darwen think that if you allow a NULL column in a database, you're ipso facto allowing propositions to be recorded that have an indeterminate truth value? A very little bit of analysis will allow us to see that this is far from being true.

    Firstly this argument is not about a null column it is about allowing null values in a tuple.

    So here is my argument:
    P1: To be valid, a proposition must evaluate to true or false.
    P2: A relational database comprises a collection of propositions (as tuples) that are asserted to be true.
    C1: Tuples that contain nulls are not valid tuples because their meaning is indeterminate. (neither true nor false.)

    One reason that nulls are bad is because they may affect the results of queries without the reader of the query output being aware of the problem.
    For example, imagine a database which contains the proposition "Employee(.nr) has Salary(.amount)"
    Then imagine that 25% of the salary values were nulls.
    This would distort the calculation of "average salary" which I think would be the kind of "bad thing" for which a DBA or a CIO could loose their jobs.

     

    So I'm looking forward to seeing your argument with the conclusion "Thus, nulls are good for you."

    Ken

    Filed under:
  • 05-21-2011 10:52 In reply to

    Re: To Null or not to Null. That is the question!

    Dear Ken,

    Thanks for promoting this discussion to its own thread. I think that will give it the space it needs to breathe. Also, though, I hope other people will feel free to join in the discussion.

        I also like your proposal to take things in small chunks. This will allow us to get to the truth more easily. And btw, the Shakespearean title meets with my full approval :-)

       Now, I want to take exception a bit, to your first analysis of my "argument". That exception is, that I was not intending any (positive) argument at all: I was not intending to offer any defense for my position. On the contrary, I was only trying to find fault with Mr. Darwen's argument. So if it's all the same with you, I will now bypass your critique of my supposed "argument", and get to your own, explicitly advertised "argument" (where I insert my responses in-line, like

    >> this):

       You wrote:

    Firstly this argument is not about a null column it is about allowing null values in a tuple.

    >> Ken, I entirely agree.

    So here is my argument:
    P1: To be valid, a proposition must evaluate to true or false.

    >> Again, entirely agreed.

    P2: A relational database comprises a collection of propositions (as tuples) that are asserted to be true.

    >> One must demur at this point -- particularly because it is material to the point at issue: Tuples are NOT propositions, not by a long stretch. What they are, is extremely abbreviated sentences. And sentences are not propositions, they are expressions of meanings that, if the sentences be declarative, are propositions. And certainly, the sentences recorded in a database (as tuples) are declarative ones, so yes, they express propositions. But the distinction between the tuples and the propositions is absolutely crucial at this point -- and to ignore this distinction would amount, in this case, to a begging of the question in favour of the anti-nulls position. So I'm afraid I cannot allow the distinction to go unnoted. For to allow it to be accepted, would be to accept that any null included in the tuple is (automagically) included in the proposition. And that latter suggestion is absolutely false: nulls do NOT represent either complete or partial propositions; on the contrary, they represent the absence of a proposition!

    C1: Tuples that contain nulls are not valid tuples because their meaning is indeterminate. (neither true nor false.)

    >> Since P2 is a false premise, this conclusion is not one that is compelled upon us.

    One reason that nulls are bad is because they may affect the results of queries without the reader of the query output being aware of the problem.
    For example, imagine a database which contains the proposition "Employee(.nr) has Salary(.amount)"
    Then imagine that 25% of the salary values were nulls.
    This would distort the calculation of "average salary" which I think would be the kind of "bad thing" for which a DBA or a CIO could loose their jobs.

    >> Ken, you have given me a perfect example to illustrate why "nulls are good" (and thus you should really take your recommended daily dose of them :-). The reason why nulls are GOOD is precisely that the bag functions (e.g. AVG) which the system provides, are TRAINED to ignore rows containing nulls! This is exactly why the DBA or CIO will KEEP their jobs: they chose to accept nulls, and therefore didn't have to worry about or take special measures for those 25% of rows where there was missing information. Instead, the AVG() function knew to ignore those rows that had NULLs, and thus automatically calculated the right average in spite of these rows having missing information. Had they chosen not to allow NULLs, they would have had to use some other, ad hoc default value instead -- which the bag function would not have been trained to deal with. So it's precisely in the situation where they DON'T allow nulls, that their jobs would be placed in jeopardy!! Thank goodness for nulls!!!

    So I'm looking forward to seeing your argument with the conclusion "Thus, nulls are good for you."

     >> As I said, there it is, for you gave me a perfect example to present it with. Thanks,

    Regards,

    Andy

  • 05-21-2011 11:43 In reply to

    Re: To Null or not to Null. That is the question!

     Hi Andy (and all),

    I have to say I intensely dislike the use of the term "religious zealotry" in contexts such as the present.  I appreciate your apology (to Ken) and apparent retraction--of the term's usual connotation if not the term itself.  Chris Date and I would respond indignantly, claiming that we have put forward what we feel to be strong technical arguments, at great--possibly even nauseous--length and in great detail, to justify our position.  Those who fall back on the "religious zealot" counterpunch rarely if ever back themselves up with similar kinds of argument.  Religious zealots are people who feel very strongly about something for which they can offer no scientific evidence at all (as Richard Dawkins would put it), so we tend to feel that such zealotry is in the camps that oppose our view rather than in ourselves.  All that said, we do of course hold very strongly to the position we have expressed, and we have sometimes expressed it in very strong terms, thus exposing us to accusations of zealotry (perhaps justified, without the "religious" epithet).

    Now, my own writings on the subject of NULL were mostly written around the turn of the 1990s and appeared in a couple of books in Date's "Relational Database Writings" series (namely, the 1985-89 and 1990-91 editions).  There's also my university lecture "How To Handle Missing Information Without Using NULL", available at www.thethirdmanifesto.com.  Did I really once write something to the effect of "if you allow a NULL column in a database, you're ipso facto allowing propositions to be recorded that have an indeterminate truth value"?  (I hope I didn't write "NULL column" when I would of course have meant either "NULL column value" or "column that permits appearances of NULL", but I might have done so in the early days before Chris Date inculcated in me the importance of precise writing!)  I agree that Clifford's observation ("more than one fact type [in] the same table") is more to the point.  SQL's addition of a third truth value has some very peculiar effects but it doesn't of itself introduce indeterminacy, though one might say it finesses a certain problem of indeterminacy by that very introduction.

    Anyway, let me make just two points in response, one technical, the other one that readers here might find of historical interest:

    Technical:  Of course NULL does not of itself represent a proposition.  Nothing appearing as or in place of an attribute value in a tuple represents a proposition.  It is the tuple itself that does that, by virtue of its appearance in some relation.  And our foundations in classical logic dictate that the proposition in question is formed by substituting each of its attribute values for one of the parameters (free variables) of the first-order predicate that the relation is deemed to represent.  Logic further dictates that each such attribute value unambiguously designates something.  Furthermore, the same logic dictates that every proposition is either true or false--there is no middle ground.  Furthermore again, we have one rather special predicate, known as "identity", capturing the intuitive notion that everything is the same thing as ("equal to") itself and nothing is the same thing as something other than itself.  NULL refers to a certain construct in the computer language SQL that doesn't designate anything and, thanks to SQL's introduction of an additional truth value, "unknown", is neither the same thing as itself nor not the same thing as itself (except sometimes, when actually it does for a while become the same thing as itself and not the same thing as anything else).  That very observation about SQL means that our logical foundation is completely undermined by that language, and yet it is not replaced by an alternative sound and agreeable foundation (we maintain that would be impossible in any case).  (And anybody who claims that our logical foundation is therefore not sound and agreeable after all had better speak to Aristotle about that!)

    Historical: You wrote, with reference to me, "his argument is based on complete lack of comprehension of why you'd have the NULL there in the first instance."  In 1978 I was engaged, along with a couple of colleagues at IBM, in the technical planning for the relational DBMS known as Business System 12 (BS12) that was subsequently developed and made available to customers of IBM's so-called "bureau service".  Initially I was firmly proposing that BS12 would support something I called "null" (though mine wouldn't have had the weird properties of SQL's NULL).  Moreover, none of my colleagues at the time thought this idea was at all controversial, though there was quite some discussion around questions like, so what's the result of null+1, then? Or null-null? (Bearing in mind, of course, that null=null would have been TRUE.)  However, I was eventually turned off the idea after discussion with various experts, including Chris Date and also including a certain illustrious member of the System R team that gave us SQL, including NULL, in the first place!  (It was the notion of introducing that third truth value, and the rationale for it, that finally convinced me to abandon nulls for BS12.  It scared me to hell!)  I hope you no longer feel that I completely lack any comprehension of why some people feel a need for incomplete tuples (not understanding why completeness is so essential).

    If I may be permitted a relevant plug while I'm here, our latest book, "Database Explorations", contains a whole section on the subject of the so-called problem of missing information.  It includes descriptions and evaluations of four different approaches to the problem, none of them involving NULL, plus some further discussion of SQL's "solution".

    Hugh Darwen

  • 05-21-2011 12:49 In reply to

    Re: To Null or not to Null. That is the question!

    Hi again, Andy,

    I wrote my previous response to your first contributuon in this thread, before I noticed your second.  Here I just want to respond further to this, from you:

    "The reason why nulls are GOOD is precisely that the bag functions (e.g. AVG) which the system provides, are TRAINED to ignore rows containing nulls!"

    You seem to be arguing that NULL is good because AVG (etc.) treat it this way.  Surely you see that if NULL didn't exist, AVG (etc.) would still work just fine--I would say even finer--especially when used in conjunction with the SUMMARIZE ... PER ... relational operator proposed by myself and Date as an improvement on SQL's GROUP BY.  Of course I agree that if NULL exists, then SQL's treatment of it in aggregate functions is sensible.  But then I would add, in that case (a) why isn't the sum of no values at all zero, as it should be (zero being the identity value under addition) , and (b) why isn't 1+NULL equal to 1, considering that SUM(x) = 1 when the bag represented by x here is {1,NULL}?

    Not supporting NULL means the book doesn't have to say anything about its treatment (in every operation).  William of Occam would have approved.

    Regards,

    Hugh

  • 05-21-2011 13:42 In reply to

    Re: To Null or not to Null. That is the question!

    Hi Mr. Darwen,

       It is a great privelege to have you joining in our conversation. Thank you for this. Let me say that I'd much rather sit and listen to you and Mr. Date lecture on this (or any other) topic for a week, than to have to offer up my relatively meager insights on topics that you more educated ones have treated at length.

      However, we are sometimes called to enter the fray where someone has asked us to comment on an issue. But I knew, as soon as I'd sent it, that I'd get into trouble for that "argument based one complete lack of any comprehension" phrase. For although it really meant, "argument presented by Ken by citing Hugh Darwen", it came out like, "argument by Hugh Darwen". And of course, any idiot would know that you would have plenty of understanding of the problem of missing information. So I must again apologize for my unfortunately loose use of language.

      Now, let me say I do appreciate that NULLs do put us in quite an awkward situation, with regard to logical foundations. And again, I'd prefer to listen to you guys tell us all about these issues, than keep blathering on myself. But just at the moment, I don't seem to have any option but to try, more for others than for you, to explain better where I'm coming from (and where I think Clifford is coming from).

       We all acknowledge that we have a problem of "missing information" that we have to deal with. But I found a problem with Mr. Darwen's argument (as presented by Ken, that is), which problem I've yet to make perfectly clear: the problem, I think, is the assumption of a static predicate for a particular relation, one that applies both to rows where there is a null in a certain column, and to rows where that column has a non-null value. That is, when Hugh writes of "the first-order predicate that the relation is deemed to represent", I would argue that it cannot represent the same predicate, for tuples having a null for attribute X, as it represents for tuples having a non-null value for attribute X. Let me give an example to show why it's apparently incorrect to assume a static, unchanging predicate (i.e. one that applies to both the null and the non-null rows).

      Let's imagine a Student table. The table has a PK column called "studentId", a mandatory column called "name", a nullable (i.e. non-mandatory) column called "gender", and a mandatory column called "birthdate": in the horizontal notation used in Halpin and Morgan's book on ORM, that would be

    Student ( studentId, name, [gender], birthdate )

       Now, what is the predicate, for a row where the gender is provided? It is, using mixfix notation and with attributes shown in square brackets, none other than "[student with Id] has [name] and is of [gender] and was born on [birthdate]".  Notice that this proposition type is a conjunction of three smaller, atomic fact types. But is this the same predicate, with the same meaning, as is intended for a row where the gender is not provided? I would say that it is not the same predicate. Rather, the predicate intended on such a row is none other than, "[student with Id] has [name] and was born on [birthdate]". For the absence of the (non-null) gender value, indicates precisely that the conjunct, atomic fact type "[student with Id] is of [gender]" does not have a populating instance asserted on this row.

       However, from what you say above, Mr. Darwen, I feel now that you understand all this perfectly. But of course I was responding to Ken's claims, not yours; and he was claiming (what you've now denied,) that introducing nulls introduces indeterminacy of the proposition's truth value. And my pro-nulls argument was, simply stated, that I've yet to see any proposal for dealing with missing information, that is not more problematical, in practice, than the "nulls" solution. Having clarified all this, I feel I've shot my wad, and now would to well to sit back and listen to the real experts solve the problem of missing information. (Hugh? Terry? Chris?) Please fill me in, I'd love to be educated on the topic. And thanks again for having contributed thus far,

    Best regards,

    Andy

  • 05-21-2011 22:03 In reply to

    Re: To Null or not to Null. That is the question!

    Well, I don't feel like an expert, but...

    The demand to exclude NULLs is for exactly the same reason that fact-based modelling demands elementary form. Whenever you express things in terms of predicates containing composite facts, you have problem when one of those facts is missing; so you need a different kind of expression. When elementary form is used, no fact exists without all roles being fulfilled, so you have no need for NULLs. In this respect, I think relational purists and fact oriented modellers are entirely in agreement, though we use different language to express it.

    The problem with SQL is that for the implementations to get reasonable performance, facts should be clustered into tables that represent composite facts wherever that's possible. This clustering makes the best use of the physical media's access characteristics. If the DBMS implementations had instead focussed on correct modeling in highly-normalised form (isomorphic with elementary form), and solved the problems of physical clustering without that affecting the conceptual models, there would be no need for NULLs... except for the NULLs hidden inside the physical model, of course.

    As technology advances and allows data to move progressively into Flash storage, and the rotational latency of the disk drive disappears, there's less need for physical clustering, so the conceptual and physical models will converge. There's still some premium on physical clustering due to hardware data transfer protocols, but I predict this will progressively become irrelevant.

    Until that happens, we need to deal with the explicit generation of composite physical models, probably using both SQL and NULL values, and I don't see that to be a problem... although it could have been hidden by better DBMS design. The great pity isn't that physical models must be explicitly generated, but that we haven't built query languages that operate purely at the conceptual level, and hide the mapping to the physical level. Although it's possible to use SQL views to project a pure conceptual schema from a composite physical schema, it's a lot more work to construct. Querying it also requires many more joins, and because the SQL join syntax is so awkward and ugly, people avoid that. The result is that physical schemata are hard-wired into database query code in almost every application in existence.

    As it is, we need to solve the conceptual/physical mapping problem outside the SQL implementations. The goal is to never have to interact with the physical model directly. A number of people besides myself have proposed such solutions to this problem, including the estimable Messrs Darwen and Date. My offering is the Constellation Query Language, which is similar to Terry's FORML and Conquer. I honestly think that such solutions represent the future of data.

  • 05-22-2011 2:02 In reply to

    Re: To Null or not to Null. That is the question!

    Clifford

       That's extremely well stated, and yes, you're getting down to at least one of the hubs of this whole issue. I agree that the awkwardness, which we have no choice to deal with, comes from the source that you indicate, and thus (I would add) not from any logical sloppiness or incompetence of the "nulls are OK" crowd. I wanted to bring up the issue of conceptual queries earlier, but I'm glad I waited because you've now covered it much more eloquently than I could.

    Cheers

    Andy

  • 05-22-2011 7:15 In reply to

    Re: To Null or not to Null. That is the question!

    Clifford,

    Well, for a "non-expert" you have certainly made a very good articulation of the "null problem"!

    One small quibble is your use of the term "purist" as you used it in "relational purists".

    As I understand it, the term "purist" may be interpreted by many people to have a pejorative connotation.
    I suspect that this was not your intention.

    Would you agree that "relational conformist"  would be better?

    Here, I draw upon Philip Crosby's "Quality is Free" book in which he advocates defining quality as "conformance to requirements".
    I feel that the term "relational conformist" eliminates the risk of a pejorative interpretation and also leads to useful questions such as:

    Does this data model conform to the relational model?
    Why should I bother to conform to the relational model?
    What are the benefits of conforming to the relational model?

    and so on.

    Regarding nulls, we all seem to be agreed that in RDBMS implementations, nulls are a fact of life that have to be handled.
    However, that is not the same thing as saying that nulls are OK.

    My original interjection (http://www.ormfoundation.org/forums/t/839.aspx) was intended to bring out the point that it is good practice to design a data model in a way that minimises the appearance of nulls in the physical implementation. My aim was to show that Marc's problem could be solved by using a conceptual model that eliminated the need for nulls - in his particular scenario.

    Ken

  • 05-22-2011 10:53 In reply to

    Re: To Null or not to Null. That is the question!

    Clifford wrote: "The problem with SQL is that for the implementations to get reasonable performance, facts should be clustered into tables that represent composite facts wherever that's possible."

    That point about performance is frequently made.  I always want to remark that performance is a characteristic of the implementation, not the language.  I also always want to ask, "performance of what?", to which the answer tends to be "queries that would require extra joins if we decomposed vertically to avoid nulls".  But some queries of course go faster after decomposition. Also, in recent years there has been much interest in the performance benefits claimed for "column-wise" stores, in some varieties of which vertical decomposition is taken to extremes, under the covers.  If those claims are justified, then vertical decomposition at the logical level doesn't seem such a bad idea, performance-wise, after all

    Regards,

    Hugh

     

  • 05-22-2011 22:35 In reply to

    Re: To Null or not to Null. That is the question!

    hughdarwen:
    performance is a characteristic of the implementation, not the language

    While it's all very well to claim that, real applications must be written with real DBMS.

    hughdarwen:
    some queries of course go faster after decomposition

    Of course, and the column store proponents have shown some interesting results. I'm wary though of their promotional focus.

    What does concern me is the attitude that the vast numbers of database designs containing NULLs are that way because the creators are somehow ignorant or lazy. That would be an extremely disrespectful attitude to take to the integrity and professionalism of the million DBAs whose daily work is to scrutinise schemas, transaction logs and system performance in order to create acceptable performance. When a schema can be manually manipulated to implement column-store behaviour or to remove NULLs through better normalization, these people will do it... if it helps instead of hindering. The simple fact is, as I justified in my previous message, in the vast majority of cases, a physical model with NULLs performs better then one without.

    I'm not justifying the use of NULLs in a conceptual model. Far from it. but unless and until we have DBMS that reliably produce the most effective physical schema from a pure conceptual one, we aren't going to make progress.

    The other requirement for using more highly-normalised schemas is a query language where joins don't make your eyes bleed. Recently I helped a New Zealander design a model for a trading (swapping) service. The schema was simple: Owneditem is where Person has Item, WantedItem is where Person wants/is wanted by Item, WantedItem is valued in/contains ValueRange (and OwnedItem-ValueRange too). Five tables; Person, Item, WantedItem, OwnedItem, and a static table for ValueCategory. The SQL view for possible trades, using the (less recommended, but more succinct implicit joins) follows:

    CREATE VIEW PossibleTrade AS
    SELECT  p1.person_id AS person1_id,
            o1.item_id AS item1_id
            p2.person_id AS person2_id,
            o2.item_id AS item2_id
    FROM    Person AS p1,
            OwnedItem AS o1
            Item AS i1,
            WantedItem AS w1,
            Person AS p2,
            OwnedItem AS o2,
            Item AS i2,
            WantedItem AS w2,
            ValueCategory AS vc
    WHERE   p1.person_id = o1.person_id             -- Person1 owns
      AND   o1.item_id = i1.item_id                 -- an Item1
      AND   o1.value_category = vc.id               -- valued in the range VC1
      AND   o1.item_id = w2.item_id                 -- which is wanted
      AND   w2.person_id = p2.person_id             -- by Person2
      AND   p2.person_id = o2.person_id             -- who owns
      AND   o2.item_id = i2.item_id                 -- an Item2
      AND   o2.value_category = vc.id               -- valued in the same range VC1
      AND   o2.item_id = w1.item_id                 -- which is wanted
      AND   w1.person_id = p1.person_id             -- by Person1
    

    Using explicit joins, it's shorter but harder to read:

    CREATE VIEW PossibleTrade AS
    SELECT  p1.person_id AS person1_id,
            o1.item_id AS item1_id
            p2.person_id AS person2_id,
            o2.item_id AS item2_id
    FROM    Person AS p1
            JOIN OwnedItem AS o1 ON p1.person_id = o1.person_id AND w1.person_id = p1.person_id
            JOIN Item AS i1 ON o1.item_id = i1.item_id
            JOIN ValueCategory AS vc ON o1.value_category = vc.id AND o2.value_category = vc.id
            JOIN WantedItem AS w2 on o1.item_id = w2.item_id
            JOIN Person AS p2 ON w2.person_id = p2.person_id
            JOIN OwnedItem AS o2 ON p2.person_id = o2.person_id
            JOIN Item AS i2 ON o2.item_id = i2.item_id
            JOIN WantedItem AS w1 ON o2.item_id = w1.item_id
    

    In CQL, this ugliness vanishes. Here's the entire schema, and the PossibleTrades view (derived fact type):

    Value Category is written as Integer restricted to {5, 15, 50, 150, 500, 1500};
    
    Item is identified by its Id;
    Item has one Name, Name is of Item;
    
    Person is identified by its Id;
    Person has one Name, Name is of Person;
    
    WantedItem is where
        Person wants Item, Item is wanted by Person;
    WantedItem is in one Value Category, Value Category contains WantedItem;
    
    OwnedItem is where
        Person has Item, Item is property of Person;
    OwnedItem is in one Value Category, Value Category contains OwnedItem;
    
    /*
     *  Derived Fact Types (views)
     */
    Person (as Owner) has Item wanted by Person where
        Owner has Item that is wanted by Person;
    
    Possible Trade is where
        Person(1) can trade Item(2) to Person(3) receiving Item(4) in return where
            Person(1) has Item(2) that is wanted by Person(3) that has Item(4) that is wanted by Person(1)
            and Item(2) is in Value Category that contains Item(4);
    

    The last 2 statements contain the same logic as the SQL views above. This schema translates to the same SQL, but is comprehensible to ordinary mortals. I realise that this example doesn't illustrate the use or avoidance of NULLs, but it does illustrate why the extra normalisation (required to eliminate NULLs) is a problem.

    Filed under:
  • 05-23-2011 2:27 In reply to

    Re: To Null or not to Null. That is the question!

    Hi Mr. Darwen,

    Although my wad is shot, I do notice I omitted to respond to a few questions and comments from you. Please allow me to do so below, in-line like

    >> this. 

    hughdarwen:

    <snip> 

    Here I just want to respond further to this, from you:

    "The reason why nulls are GOOD is precisely that the bag functions (e.g. AVG) which the system provides, are TRAINED to ignore rows containing nulls!"

    You seem to be arguing that NULL is good because AVG (etc.) treat it this way.  Surely you see that if NULL didn't exist, AVG (etc.) would still work just fine--I would say even finer--especially when used in conjunction with the SUMMARIZE ... PER ... relational operator proposed by myself and Date as an improvement on SQL's GROUP BY. 

    >> Well, if NULLS don't exist, then it's either because you're using some other, user-chosen default values to represent that "info is missing here", or else you don't have to worry about potential missing information because you've chose to put each elementary fact type into its own table (as in your "6NF"). But this latter alternative, I think Clifford has persuasively shown to be, at present anyway, too inefficient. Which leaves us with the former alternative, default values chosen by the user but which the system doesn't know are to represent "info is missing here". And in that scenario, I'd say No, AVG (etc.) do NOT "work just fine", in fact they become a royal pain to use-- which is my whole point here. Now, it may be that you and Mr. Date have come up with a substitute for GROUP BY that, to some extent anyway, relieves the painfulness involved. I'd certainly be interested in reading up on your "SUMMARIZE ... PER ..." Where would I find the best introductory discussion, if I may ask?

    Of course I agree that if NULL exists, then SQL's treatment of it in aggregate functions is sensible.  But then I would add, in that case (a) why isn't the sum of no values at all zero, as it should be (zero being the identity value under addition) , and (b) why isn't 1+NULL equal to 1, considering that SUM(x) = 1 when the bag represented by x here is {1,NULL}?

    >> These examples can only make one chuckle, and indeed if there is any part of these SQL functions' behaviour that is NOT "sensible", then SUM's treatment of nulls, and addition's treatment of them, are prime candidates for that dishonour.

    Not supporting NULL means the book doesn't have to say anything about its treatment (in every operation).  William of Occam would have approved.

    >> Well, I like Occam's Razor as much as the next philosopher. But honestly, what scares me more than teaching students about 3VL (which I last did about two weeks ago), is the thought of teaching them how to, and that they must, do all the aggregation operations without benefit of NULLs.

    Regards,

    Hugh

  • 05-23-2011 3:39 In reply to

    Re: To Null or not to Null. That is the question!

    Hi Ken,

    You complained about Clifford's use of the term "relational purist", and preferred us to use the term "relational conformist". I demur. I think Clifford's term is somewhat more accurate.

       The reason I say this is that I cannot get my head around the idea that E.F. Codd was not a "relational conformist" (seeing as how he came up with the whole approach in the first place:-). But if he was indeed a "conformist" to the relational model, then we can't use this term to refer to people who abhor the use of nulls. For he himself argued strongly against their position, insisting that nulls were an important part of the relational model! You might wish to have a read of the following 1986 paper by him:

        http://148.247.202.2/~jtj/courses/dbs/relatedpapers/Missing%20information%20(applicable%20and%20inapplicable)%20in%20relational%20databases.pdf

    Cheers,

    Andy

     

  • 05-23-2011 8:41 In reply to

    Re: To Null or not to Null. That is the question!

    Ken,

    P.S. So I don't appear too biased for some people's tastes, I offer up an additional link, which at least gives both sides equal time -- in the persons of Messers Date and Codd:

    http://www.dbdebunk.com/page/page/1706814.htm

  • 05-24-2011 11:09 In reply to

    Re: To Null or not to Null. That is the question!

    Andy Carver:
    You complained about Clifford's use of the term "relational purist", and preferred us to use the term "relational conformist". I demur. I think Clifford's term is somewhat more accurate.
     

    Hi Andy,

    Again I confess to a lack of rigour in my use of language.
    For clarification, I think that it is better to discuss the truth or falsity of one or more conjectures than to resort to ad hominem reasoning.
    This (I now realise) is what was behind my objection to the term "purist".

    So it seems to me better to discuss the topic of "relational conformance" in terms of whether a particular solution "conforms" rather than talk about the properties of any particular person who might propose the solution.

    Ken

Page 1 of 2 (21 items) 1 2 Next >
© 2008-2014 The ORM Foundation: A UK not-for-profit organisation -------------- Terms of Service