Modeling based an existence chain

Last post Wed, May 25 2011 18:47 by Ken Evans. 44 replies.

Page 2 of 3 (45 items) < Previous 1 2 3 Next >
	Sort Posts: Previous Next

Fri, May 20 2011 4:47 In reply to

Andy Carver
Joined on Fri, Apr 25 2008
Colorado, U.S.A.
Posts 107

Re: Modeling based an existence chain

Reply Contact

Hi Marc,

I think maybe you're saying you'd prefer to have an ELSE clause carry the ball wherever a check for null finds a null -- something like,

"WHERE blah-blah is null, THEN return such-and-such, ELSE WHERE blip-blip is null, THEN...." <return so-and-so...>

I sympathize, one does intuitively feel the need for that... But Clifford has actually given the correct syntax, for a "searched CASE" statement in SQL Server. Any subsequent WHERE clauses in the CASE statement are checked only where none of the previous ones evaluates to true.

I also agree with Clifford that the way to do this in an ORM, conceptual schema is with a derived fact type (which I gave you in my ORM diagram). That remains true no matter what your target system is.

Whether Clifford's use of CASE, or your use of the "join" you mention, is more efficient or more correct (for implementation purposes), would depend on what this "join" looks like... which I'm sure we'd be willing to comment upon, if you're willing to show it to us....:-) In any case, with such flexibility in SQL, I don't see any reason why you would need a "matrix" table to implement this business logic.

Cheers,

Andy

Fri, May 20 2011 4:51 In reply to

Andy Carver
Joined on Fri, Apr 25 2008
Colorado, U.S.A.
Posts 107

Re: Modeling based an existence chain

Reply Contact

P.S. Sorry for the typo in my hypothetical example, it should have read,

"WHERE blah-blah is NOT null, THEN return such-and-such, ELSE WHERE blip-blip is NOT null, THEN...." <return so-and-so...>

Cheers,

Andy

Fri, May 20 2011 9:58 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

Re: Modeling based an existence chain

Reply Contact

There seems to be an implied "Elephant in the Room" in this thread called "Nulls are OK"
Isn't it the case that relational theory breaks down when missing values (nulls) are encountered in a table?

Of course it seems that there are many RDBMS's out there that contain database instances that include nulls.
And if Marc's environment is forcing him to try to reason with a database that includes nulls then yes, that's big problem and he has our sympathy.

However, nulls are a bigger problem than they may seem seem to some folks because of the inherent indeterminability of 3VL (As explained in places such as Ch5 of Chris Date's book "An Introduction to Database Systems Volume II". ) The question is "What does a null mean?": Value unavailable?; Value exists but not entered?; We'll put it in later?; Value not relevant?.... It is this indeterminability that creates the need for convoluted logic in an attempt to solve a problem that is probably unsolvable.

So, whilst there may be a (very) few good reasons for designing a data model that the modeler knows will unavoidably include nulls, isn't it the case that one of the benefits of ORM is that it allows you to design databases that minimise the use of nulls?

Ken

Filed under: ORM & Nulls

Fri, May 20 2011 12:00 In reply to

Andy Carver
Joined on Fri, Apr 25 2008
Colorado, U.S.A.
Posts 107

Re: Modeling based an existence chain

Reply Contact

Hi Ken,

Good to hear from you. Certainly, there are many religious zealots out there (of whom the generally sagacious Chris Date is one) on the issue of nulls and their advantages or disadvantages. Not to make fun of either side, but I'd be interested in finding anyone who could prove that having nulls causes more problems than does forbidding nulls. There will always be "missing data"; the question is only, how will we represent it in the database? And I think you'll find, if you look into it deeply, that it's rather more problematic, in dealing with that issue, to forbid nulls, than it is to allow them.

Be that as it may, I'm much more interested in answering your question about ORM and nulls. I think you've conflated two issues here: Because a correct ORM model contains only elementary fact types, there are, in fact, no nulls in the population of an ORM, conceptual database. But that's an entirely different question, from that of whether (or how many) there are nulls in an RDBMS implementation of a conceptual database. On the latter question, ORM is entirely indifferent, and lets the chips fall where Rmap may cause them to do. So, to your last question, I would say the answer is an unqualified No.

Best regards,

Andy

Fri, May 20 2011 16:49 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

Re: Modeling based an existence chain

Reply Contact

Hi Andy,

Well, I respect free speech but your characterisation of Chris Date as a "Religeous Zealot" rather contrasts with what Amazon.com says about him which is: "Chris Date is the computer industry's most respected expert and thinker on database technology..."

Andy Carver:
I'd be interested in finding anyone who could prove that having nulls causes more problems than does forbidding nulls.

Well, there is lots of evidence out there such as that on Hugh Darwen's page at Warwick University. (Look at his lecture handout "How to Handle Missing Information Without Using NULL." Hugh's lectures include the following passage which I quote:

Relational theory is founded on 2VL.
A relation r is is interpreted as the extension of some predicate p.
Let t be a tuple with the same heading as r.
If tuple t is a member of r, then the proposition P(t) is taken to be TRUE.
Otherwise, (t is not a member of r), P(t) is taken to be FALSE.
There is no middle ground. The law of the excluded middle applies.
There is no way of representing that the truth of P(t), is unknown or inappliccable or otherwise concealed from us.

I'm wondering if one of "the problems" is that there seems to be difference of understanding between those folks who are well-meaning SQL experts, and those folks who think that a database is a system of deductive logic that must conform to a set of rules.

Lastly, as I see it the issue of nulls is related to first order predicate logic rather than any arbitrary religeous beliefs.

Ken

Fri, May 20 2011 18:19 In reply to

Re: Modeling based an existence chain

Reply Contact

Ken, it sounds like you're proposing that every time we have a fact type that is many-to-one but not mandatory on either side, we should have a new table for it? Because frankly, that does sound like religious nonsense. It's not what Rmap does in this very common case, and that's a good thing too. The original non-mandatory fact type isn't at all illogical, and despite the barmy rules that SQL has about NULLs, and the problems you can get into with them, they're still a sane way to handle this situation.

Sun, May 22 2011 9:02 In reply to

Andy Carver
Joined on Fri, Apr 25 2008
Colorado, U.S.A.
Posts 107

Re: Modeling based an existence chain

Reply Contact

Hi Ken,

To pick up this thread again: OK: at the close of your last message (to date) on the "To null or not to null" thread, you finished with:

"My original interjection (http://www.ormfoundation.org/forums/t/839.aspx) was intended to bring out the point that it is good practice to design a data model in a way that minimises the appearance of nulls in the physical implementation. My aim was to show that Marc's problem could be solved by using a conceptual model that eliminated the need for nulls - in his particular scenario."

Great! I look forward to your showing us this, as I'm sure Marc (and Clifford) do as well. The solution that Clifford and I offered, as the correct way to model (this part of) Marc's domain in ORM, certainly involved an implementation (based on Rmap) that could have nulls in some columns of the Claimant table. So perhaps you would be so kind, now, as to go ahead and show us your non-nulls solution?

Cheers,

Andy

Sun, May 22 2011 11:23 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

Re: Modeling based an existence chain

Reply Contact

Hi Andy,

May I suggest that we begin with you commenting on my earlier proposal - which you can see if you scoll back up the thread a few posts.

Ken

Sun, May 22 2011 21:16 In reply to

Re: Modeling based an existence chain

COLClaimantNulls.png

Reply Contact

Ken,

I thought that Andy's request was quite reasonable. To justify that, I will return to your claims:

Isn't it the case that relational theory breaks down when missing values (nulls) are encountered in a table?

In Marc's case, (see my amended version of Andy's model below), the Claimant's FederalAgency and PublicOrganization are non-mandatory fact types. In the NORMA mapping, we would have a Claimant table with FederalAgencyId and PublicOrganizationId. in Terry's horizontal notation, it looks like this:

Claimant(claimantId, countryCode, hasUnlistedWorkHistory, [federalAgencyId], [publicOrganizationId])
Country(countryCode, costOfLiving)
FederalAgency(federalAgencyId, countryCode)
PublicOrganization(publicOrganizationId, countryCode)

Instead, if we insist on not using NULLs, we can objectify and make independent the two "works for" fact types. That transforms the schema to this:

Claimant(claimantId, countryCode, hasUnlistedWorkHistory)
Country(countryCode, costOfLiving)
FederalAgency(federalAgencyId, countryCode)
PublicOrganization(publicOrganizationId, countryCode)
ClaimantWorksForFederalAgency(claimantId, federalAgencyId)
ClaimantWorksForPublicOrganization(claimantId, publicOrganizationId)

Now, consider a populated database of a million claimants, a hundred FederalAgencies, and a hundred PublicOrganization, where a quarter of the Claimants work for a FederalAgency and another quarter work for a PublicOrganization. To discover the cost of living for all claimants requires fetching all million records. Because the number of Countries, FederalAgencies and PublicOrganizations is small, we may assume that they require a single disk access each (at most, because they'll stay in cache). The Claimants table in the schema with NULLs is at most 2 megabytes bigger than the version without NULLs - that's assuming the worst case where a NULL is stored as a two-byte indicator rather than as a single bit vector. But the second schema has two additional tables, of a quarter of a million records each. Instead of scanning one table (and joining with cached data), we now must join three tables (and join with the cached tables too). If the results are fetched incrementally, the disk is seeking between three tables, instead of doing a linear scan across a single table - this incurs an order of magnitude performance loss.

If we insist on making the relational view available, but still need the performance advantages of storage using NULLs, the derive the two relations (ClaimantWorksForFederalAgency and ClaimantWorksForPublicOrganization) as views, and ignore the nullable columns (or hide them behind other views). This hiding is what the DBMS tools should be doing for us!

In addition, every time we wish to know where a Claimant works, we must write a 3-way join. It still doesn't get rid of the CASE or 3-way conditional statement, it just makes everything more complicated. The operators relational model doesn't provide a simple solution to avoiding this conditional. The simplest way I can see of doing it is to derive three new relations "ClaimantCostPerTheirHomeCountry", "ClaimantCostPerTheirFederalEmployer" and "ClaimantCostPerTheirPublicOrganization", with a fixed precedence column in each, then union those three into one relation, finally choosing the tuple for each claimant which has the highest precedence.

While that's a nice little theoretical exercise, it's a massively useless solution to a problem that simply doesn't exist. It is the insistence on using this kind of logic that evokes claims of zealotry.

A foolish consistency is the hobgoblin of little minds, adored by little statesmen and philosophers and divines - Ralph Waldo Emerson

Sun, May 22 2011 23:28 In reply to

Andy Carver
Joined on Fri, Apr 25 2008
Colorado, U.S.A.
Posts 107

Re: Modeling based an existence chain

Reply Contact

Hi Ken,

Excellent proposal. I would be glad to comment on your model.

From what I can tell, the domain (or to be precise, Marc's "watered-down version" of it) involves the idea that a person works for a federal agency or a Public Organization (or has no documented work), but does not do both. In other words, there is an exclusive-OR constraint between these roles or activities. So I included that constraint in my ORM diagram. I then asked Marc about my diagram as a whole, and he said it seemed to reflect the domain as he understood it.

Now, if the "exclusion" part of that constraint is indeed accurate, and if (as I also surmised and had in my model) indeed a person works for ONLY one Federal Agency or ONLY one Public Organization, then one problem with your model is that the ternary fact type is non-elementary, and thus needs splitting (into two binaries). This can be seen from the one-role uniqueness constraint that would apply to the Claimant role (which violates the "n-1 rule").

And I would think that that might puncture a tire on our transportation to a nulls-free implementation of the model for this domain... for to have an accurate and correct ORM model of the schema, it seems you have to go to binaries instead of this ternary, and some of those binaries will be non-mandatory.... leading to optional (i.e nullable) columns in any Rmap-produced relational schema. In any case, there's my feedback on this model, hope it helps...

Cheers,

Andy

Tue, May 24 2011 5:11 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

Re: Modeling based an existence chain

colR.png

Reply Contact

Hi Andy,

Andy Carver:
From what I can tell, the domain (or to be precise, Marc's "watered-down version" of it) involves the idea that a person works for a federal agency or a Public Organization (or has no documented work), but does not do both. In other words, there is an exclusive-OR constraint between these roles or activities.

If you look closely at my diagram, you will see that I have specified three permissible options on "Organization(.type)".
Thus, the "exclusive or" is here because only one option can be chosen.

Andy Carver:
I then asked Marc about my diagram as a whole, and he said it seemed to reflect the domain as he understood it.

Well, I follow the practice of checking the verbalization with the user rather than the diagram.
My ternary verbalises as:

Claimant works for Organization in Country.
Each Claimant works for some Organization in some Country.
For each Claimant and Organization,
that Claimant works for that Organization in at most one Country.
This association with Claimant, Organization provides the preferred identification scheme for Employment.

And this creates two simple tables as shown below.

Ken

Filed under: Nulls

Tue, May 24 2011 7:26 In reply to

Andy Carver
Joined on Fri, Apr 25 2008
Colorado, U.S.A.
Posts 107

Re: Modeling based an existence chain

Reply Contact

Hi Ken,

So, let me ask you a question about that object type you call "Organization(.type)". Did you mean it to represent a set of (possible) organization types (e.g. {'Federal','Public','NotListed'} )? If so, the name of the object type should be "OrganizationType", not "Organization" (and the ref mode should be "(.name)"). On the other hand, did you really mean it to represent a set of (possible) organizations? Then the ref mode should by no means be "(.type)" -- in fact, that should never be a ref mode in any model... and by the same token, your value constraint {'Federal','Public','NotListed'} should, in this case, be deleted (as it is a set of organization types, not organization names or id's or whatever you want to identify organizations with).

In the former case, which I'm guessing you intended, then another problem with the model is that you needed to have an object type that represents a set of organizations -- the database needs to record the actual, specific organization where the claimant works (so it can find out which country that organization is in). So the value constraint needs to be deleted, also, because it is a set of organization-type-names, not organization names (or id's). But if you actually intended the latter sense (organizations, not organization types), then again, by the same token, the value constraint needs deleting, and the ref. mode needs changing (as outlined above). So in any case, your value constraint is not part of an accurate model of the domain.

All this is true, assuming my interpretation of the domain is correct. I'm pretty sure it is; but if you think you have any evidence to the contrary, I'd be more than happy to look at it -- and see if it outweighs my evidence.

Regards,

Andy

Tue, May 24 2011 8:35 In reply to

Re: Modeling based an existence chain

Reply Contact

Ken Evans:
And this creates two simple tables as shown below

Note however, that Ken's model is different from the models shown above. In particular, it involves the ternary "Claimant works for Organization in Country", whereas the others involve "Claimant works for Organization which is in Country"; i.e. two joined binaries. If the latter model is correct, then Ken's ER model is not normalized, because of the FD of Country on Organization.

Tue, May 24 2011 9:21 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

Re: Modeling based an existence chain

Reply Contact

Hi Andy,
It would appear that we base our respective uses of ORM on different philosophical foundations.

My approach is based on the following principles (inter alia)
1: Words are just symbols(collections of one or more characters) without any intrinsic meaning.
From this perspective, your implied assertion that I have "used the wrong words" really means "you prefer a different group of letters" .That's fine by me but it does not change the logic.
Same point about .type which I see as a convenient but arbitrary way of giving a name (again arbitrary) to the characteristics of the members of the set. If you prefer ".name" that's fine by me but it does not change the logic.

In the wider context of easy to interpret natural language usage, I'll agree with your suggestion about OrganizationType(.name).
But it does not change the logic!

So as I se it, the logic of the first object type is: The set named "X" has three members "a,b & c"
Or as I have put it: The set named "Organization" has three members: "Federal, Public & NotListed"

In passing it is worth noting that "NotListed" is the way I chose to avoid the use of nulls.
However, I did extract these terms from Marc's description of his domain.

Andy Carver:
...needed to have an object type that represents a set of organizations -- the database needs to record the actual, specific organization where the claimant works (so it can find out which country that organization is in).

Well, I did not infer that from Marc's description but you may be right on this point.However, my purpose was not to present an accurate model of Marc's domain (I agree with Clifford that we don't have enough information for that).

My purpose was to illustrate the principle of how Marc's domain could be modeled in a way that does not require the use of nulls at the conceptual level.

Ken

Filed under: Nulls

Tue, May 24 2011 10:14 In reply to

Andy Carver
Joined on Fri, Apr 25 2008
Colorado, U.S.A.
Posts 107

Re: Modeling based an existence chain

Reply Contact

Hi Ken

You write that "My purpose was to illustrate the principle of how Marc's domain could be modeled in a way that does not require the use of nulls at the conceptual level." Ah... that's certainly different than how you characterized your purpose before.

But given your new characterization of your purpose, you've been expending effort to illustrate something that needs no illustrating. It's a truism, after all, that "at the conceptual level" we work with elementary fact types; and it's another truism, that as long as all your fact types are elementary, there are no nulls in the fact populations. This is true not of this or that domain (e.g. Marc's), but of every domain.

As such, I guess you made your point. But it came for free, no modeling required.

Cheers

Andy

Page 2 of 3 (45 items) < Previous 1 2 3 Next >

The ORM Foundation