in

The ORM Foundation

Get the facts!

Exclusion constraints

Last post 03-19-2014 12:29 by Ken Evans. 41 replies.
Page 2 of 3 (42 items) < Previous 1 2 3 Next >
Sort Posts: Previous Next
  • 03-16-2014 21:35 In reply to

    Re: Exclusion constraints

     Hi Niels

     We are considering support of structured datatypes, that would be modeled explicitly as coreferenced types to facilitate easy conceptual access to their components. An example would be ComplexNumber defined using two relationships to one-dimensional numbers (one for the real part and one for the imaginary part). Conceptually, we still regard object types such as Point, Line, Circle, Duration etc to be entity types, which may be given a compound reference scheme if needed.

    Different mapping options for such composite identified types could be invoked depending on the target system used for implementation. For example, if a system has native support for temporal range datatypes (including temporal operators such as overlaps etc) this option could be chosen at the mapping phase rather than the conceptual modeling phase. Choice of implementation target is for us not a purely conceptual issue. However if the modeler specifies a specific target system for the mapping, direct support for the special datatypes in that system could be provided. Again, we do not currently provide this support.

    Cheers

    Terry 

     

  • 03-17-2014 11:26 In reply to

    Re: Exclusion constraints

    Hi Terry,

    Thank you for your explanation, although I don't fully understand how to use compound value types as entity types.

    Suppose we have a model with two durations, for example by having a relation that contains a validity duration and a transaction duration, as is a scenario found in some temporal databases.  

    I understand we can create two entity types, each with a start date and end date, each having some attribute indicating whether the duration is left-inclusive or left-exclusive and right-inclusive or right-exclusive. With a compound reference scheme, we may also be able to (at least in principle) map this to a the right underlying values so we can determine equality of two durations, but that seems repetitive.

    Apart from having validity duration and transaction duration isn't there also a more abstract notion of duration that validity duration and transaction duration are instances of? 

    With that, shouldn't the general notion of duration be put in the meta model, instead of being part of the concrete model?

    Niels

  • 03-17-2014 12:51 In reply to

    Hi Niels,

    Just to keep the ball rolling until Terry gets a chance to reply, here is another straw man model that relates to your last post.
    Please feel free to comment on or shoot at my model.
    It will help others to see what can be done and either how to do it or how not to do it.

    Ken



  • 03-17-2014 13:50 In reply to

    Re: Exclusion constraints

    Hi Clifford,

    I by an large agree with what you say, although I would tend to see your final statement in a different light.

    First of all, the range type, which has been so important to the discussion so far, is not only a compound data type, it is also an abstract data type. A range type can be defined as having a start, an end, and some attribute denoting the inclusivity/exclusivity of the end points of the range. It is also polymorphic with respect to start and end. We can plug in any datatype that allows for comparison into a range. As such duration is only one example of a range, we could also have the range of all natural numbers from 1 to 100 (both end points included).

    That said, there is another fundamental difference between a compound type and an entity type, as I already put forward in my response to Terry.

    When it comes to values, we have three different levels, the instance level, the type level, and the meta type level.

    The instance level is a concrete value, the type level denotes the meaning of that value, while the meta type level defines the structural and algebraic properties of a value type. This is, at least implicitly, already the case for the value types in ORM2 as it is, as shown by a phrase given by Terry earlier in the thread: Reservation1.end <= Reservation2.start or Reservation1.start >= Reservation2.end.

    The value types "start" and "end", since they are dates have the comparison operators "<=" and ">=". An analogous phrase could, or at least should not be possible if we replaced these value types with boolean value types, since boolean values have no order. We can however use an analogous phrase in every situation where we have a pair of dates. The validity of that phrase is somehow defined at the meta type level, whether it is named like that or not.

    In my opinion, range should belong at that same meta type level, because the same three levels as seen in value types apply to ranges, on top of that range is an abstract data type, it needs to be defined before we can actually plug in the type parameter to determine what sort of range we actually talking about.

    Niels 

  • 03-17-2014 14:29 In reply to

    Re: Exclusion constraints

    Hello Ken,

    Thanks for the model.

    I am afraid I didn't use the correct terminology in my response to Terry. I used the term duration, where I should have used the term period. 

    Let me give an example of the usage of those two periods.

    Suppose we want to record the employment of persons, and we don't want to lose any information we ever entered in the database.

    To do so we create a quaternary relation with: person, organization, validity period, transaction period.

    Let's say we recorded that person 1, has worked for organization A since 1-1-2004, and we stored this fact on 28-3-2006, when the new application to store this information went into production.

    {person: 1, organization: A, validity: [1-1-2004, ...], transaction: [28-3-2006, ...]}

    Now on 29-3-2006 we discovered a clerical error. Person 1 started working for organization A on 1-2-2004.

    We close the transaction period of the original fact.

    {person: 1, organization: A, validity: [1-1-2004, ...], transaction: [28-3-2006, 29-3-2006]}

    and we create a new fact reflecting the correct validity 

    {person: 1, organization: A, validity: [1-2-2004, ...], transaction: [29-3-2006, ...]}

    Now on 1-1-2014 Person 1 leave company A to start working for company B, and we register that the following day.

    We close the transaction period of the current employment fact. 

    {person: 1, organization: A, validity: [1-2-2004, ...], transaction: [29-3-2006, 1-2-2014]}

    we create a new fact that reflects the ending of the validity period. 

    {person: 1, organization: A, validity: [1-2-2004, 1-1-2014], transaction: [1-2-2014, ...]} 

    and add a new fact related to the employment with organization B 

    {person: 1, organization: B, validity: [1-1-2014, ...], transaction: [1-2-2014, ...]}

    In short:

    The transaction period tells us during which time frame a fact is stored in the logical database (it remains present in the physical database, we never delete or update facts).

    The validity period tells us for which time frame a fact holds to be true.

    I hope this better explains what I said in too few words in my response to Terry.

    Niels 

  • 03-17-2014 16:17 In reply to

    Re: Exclusion constraints

    I realized I made a wrong assumption that in order to define validity period (previously erroneously called validity duration) and transaction period, I would have to define period twice. This of course is not necessary. Period can be defined only once and used twice in a relation, once in the role of validity and once in the role of transaction. 


  • 03-17-2014 16:21 In reply to

    Re: Exclusion constraints

    Hi Niels

    I use "period" to mean an anchored duration of time. In the example I gave, periods were closed (anchored at both start and end), so had start and end times (includes date and time of day). If we modify the example to open periods (e.g. each period has a start but in not necessarily an end) the modeling (including constraints) gets more complex, and it's typically better conceptually to use start end end fact types directly rather than introduce an open period type. 

    ORM distinguishes between entities (e.g. countries), domain values (e.g. country codes), and data values (e.g. strings). So Date as an entity type has instances that are anchored, 24 hour long slices of time, rather than having instances that are date strings. In my FORML constraint example, Reservation1.start is an attribute-style formulation where "start" names the role hosted by the entity type DateTime in the fact type Reservation started on DateTime. 

    Valid time and transaction time are modeled using different fact types. Your quaternary example involving person, organization, validity period and transaction period may be OK as a relation scheme in a relational database, but it corresponds to a compound fact type so would not be modeled that way in ORM, where instead multiple fact types would be used.

    I'm currently very busy with other work, so don't have time right now to provide a more detailed response. However, you can get some idea of how we typically do temporal modeling in ORM by reading section 10.3 of "the Big Brown Book" (Halpin, T. & Morgan T. 2008, Information Modeling and Relational Databases, Second edition, Morgan Kaufmann).

    Cheers

    Terry 

     

  • 03-17-2014 16:47 In reply to

    Re: Exclusion constraints

    Thank you Terry,

    I see how indeed the valid time, transaction time example I gave should indeed in ORM be modelled using elementary fact types, using external unicity constraints. I guess I am still a little bit too much trying to derive a conceptual model from a desired implementation instead of the other way around.

    Niels 

  • 03-17-2014 17:26 In reply to

    Re: Exclusion constraints

    Niels,

    I agree with your point about abstract data types. It's not clear however that we necessarily need to model the meta-type layer. Everything else in ORM is purely first-order, with only a single meta-level jump (from instance to type). I know that Terry has tried hard to keep it that way. We could handle type abstractions such as "range" by means of allowing tools to generate range types as normal composite data-types (structured, as Terry calls them). For avoidance of confusion like a range over booleans, I would flag each elementary data type as "ordered" or not. Booleans, GUIDs, etc, are not ordered, so cannot be used to construct a range.

    Obviously each tool that can construct such types needs a meta-type model, but that model doesn't necessarily have to be explicitly standardised.

    One other concern with composite data types is how to handle alternate possible representations. For example, a complex number can be represented as (real, imag) or as (amplitude, phase) - polar coordinates. The same abstract operations exist on the complex number regardless. Likewise, a date may be represented using a Gregorian calendar, a Julian one, or an Islamic calendar (considering of course that the boundary of a day is different in an Islamic calendar!) or as picoseconds some some epoch. Although the same abstract operations might apply to different representations, the exact definition and meaning may differ, and that would make a meta-type model extremely difficult to construct and to interpret.

  • 03-17-2014 17:59 In reply to

    Re: Exclusion constraints

    Niels,

    I really like your loss-less model using validity periods and transaction periods. Essentially you're saying "during transaction-Period it was believed that <fact> was the case", where <fact> is of the form "<fact1> was the case during validity-Period". I hadn't seen that approach used before; I'd always just used a validity Period and revised it (if necessary) to correct errors.

    One point though: With the elementary approach using Start and End times for the periods, I used to allow the End to be unspecified (or NULL, in the relational form). I now think this was an error. What I should have done is to use a far-future value, essentially an infinite time value (or a finite sentinel), to avoid the occurrence of NULLs. This approach makes the relational expressions much easier to write. Essentially the validity fact is now saying "From start-Time until (the far future) it is believed that <fact> is/will-be true". When end of the validity period for the fact becomes known, it's necessary to update the end-Time with that time before inserting a new record.

    Also, time is effectively continuous (Heisenberg notwithstanding). Only our representations are quantised, and it's this quantisation that makes range inclusion/exclusion difficult. What's worse is that although a day is 24 hours, each date (without timezone specified) lasts 48 hours, and overlaps with the date before and the date after. Awkward! That's why I always try to store UTC.

  • 03-17-2014 18:42 In reply to

    Re: Exclusion constraints

    Exactly because of the points you mention, the range types as present in PostgreSQL, are defined as having a lower and an upper bound. These bounds can be either exclusive or inclusive, and can also be infinite. 

    To store periods, a built in datatype exists called tstzrange, which stands for "time stamp with time zone range". Although it doesn't solve the problem of various calendars, it does solve the time zone issue, since that is part of the time stamp data type, all with a grain of microseconds, which for most practical purposes is highly usable.

    You can find more information about range types by following this link

    I am still curious how to exactly decompose the example I gave with person, organization, valid period, transaction period into elementary fact types. Would that require using person, organization as a objectified relation, with valid period and transaction period as associated facts, and external unicity constraints, or would that be the wrong approach?

    Niels 

     

  • 03-17-2014 19:07 In reply to

    Re: Exclusion constraints

     Hi Niels

     If a person can work only once for a given organization, then you can objectify "Person worked for Organization" as Employment and attach functional fact types to it for valid and transaction start and end dates. 

      If a person can work multiple periods for the same organization (e.g. resign, then rehire later) then you can objectify "Person joined Organization on Date" (all-key) as Employment and attach functional fact types to it for valid end dates and transaction start and end dates.

      Alternatively, you can introduce Employment as a base entity type (preferably with a simple identifier, otherwise coreferenced) and attach to that.

    Cheers

    Terry 

  • 03-17-2014 20:21 In reply to

    Re: Exclusion constraints

    Terry,

    Your proposed solution doesn't handle corrections to a person's employment timeline, without losing the fact that a different timeline was previously believed.

    Niels' problem is more complicated than the solutions you propose. He needs to maintain that at any time, a consistent employment timeline is believed for each Person. By "consistent" we mean that that Person is has at most one record of them working at a given company at any given time (they might however be working for more than one company at a time).

    So there are two constraints here: the consistent-timeline constraint, and the consistent-belief constraint. The consistent-belief constraint builds on the consistent-timeline constraint, so it's quite hard to express. Does anyone want to have an attempt at this? It's bigger than I have time for today...

    [edit] However, the same approach Terry took to allowing multiple periods of employment of the same person at the same company applies here too. We need to objectify the quaternary "on transaction-Date it was recorded that Person joined Organization on Date" (all-key). Then the two end dates can be added, and someone can start scratching their head about how to write the constraints Big Smile. You probably also want to derive the currently believed set of employment facts as a derived ternary of the same shape as Terry's ternary.

  • 03-18-2014 3:22 In reply to

    Re: Exclusion constraints

    Clifford,

    Firstly, "Belief"  does not exist as an independent thing. For a belief to exist, there must be at least one person (who is alive today or is not alive today) who holds (or used to hold) that belief. (e.g. Geocentricity)

    Secondly, what people believe or do not believe can be expressed as an elementary fact.  

    Thus, you are implying that there should be a fact type along the lines of "Person(.name) believes that Fact(.nr) is true between Date() and Date()."
    If this is what is required, then such fact types can be expressed in ORM in the normal way.

    However, it seems to me that persons who are not aware of a fact cannot have a belief about it.
    So, if you go down the "belief" route then you have to say who it is that believes the fact to be either true or false.

    Talking about the domain of "employment", many organizations have an HR department that is the custodian of "beliefs about employment within the organization" and such "beliefs" are stored as records of facts within the HR department's files (whether computerized or not).

    It may also be the case that a Tax Authority also has "beliefs about employment" that are recorded as facts within the files of the Tax Authority.

    And it would not be surprising to find that, at a particular moment in time, the facts about the employment status of an individual held by the Tax Authority are not in agreement with the facts held by the HR Department.

    So as I see it, using the term "belief" in the abstract leads to confusion.
    You have to ask "Who holds this belief?" and then express the belief as a fact that is believed to be true or false by the person or persons involved. 

    Ken 

     

     



    Filed under:
  • 03-18-2014 4:21 In reply to

    Re: Exclusion constraints

    Ken, yours may be a fair comment in many cases - but the need to maintain such a history often arises, even if only through blind rules or legislation, rather than anything reasonable. There are many types of facts other than employment about which such histories are required to be kept, for example to justify (after the fact) a decision that might have been made on a fallacious basis.

    As a real example from one of my recent consults in the employment industry, a candidate's employment history is recorded from sources that are not always trustworthy. On the basis of such facts, a recommendation to a third party might be made that could afterwards be questioned, or even potentially be the subject of a legal challenge. No matter the reason why the facts were incorrect, and even after the correct facts are known and have been recorded, it's still necessary to produce a forensic (i.e. strong under legal scrutiny) audit trail that can defend that recommendation.

    I've encountered the same thing in almost every record in the insurance industry, which always must be able to produce a forensic trail to support any legal battle. It might sound arcane, but requirements for this kind of forensic history is endemic in many normal business processes, and such data rules and the requirements for models to support them are a significant pain. PostgreSQL's support for time ranges is a very great benefit, and ORM tool providers would be wise to take note.

Page 2 of 3 (42 items) < Previous 1 2 3 Next >
© 2008-2014 The ORM Foundation: A UK not-for-profit organisation -------------- Terms of Service