in

The ORM Foundation

Get the facts!

ORM for Data Warehouses and Data Marts - Useful?

Last post 07-03-2008 9:50 by Ken Evans. 9 replies.
Page 1 of 1 (10 items)
Sort Posts: Previous Next
  • 06-30-2008 12:48

    • Tom Kregel
    • Top 150 Contributor
      Male
    • Joined on 06-30-2008
    • Rochester Minnesota
    • Posts 3

    ORM for Data Warehouses and Data Marts - Useful?

    I am an experienced data modeler in both the OLTP and OLAP worlds. I have found ORM very useful in OLTP. I am wondering what people feel is the usefulness of ORM to design a data warehouse or a data mart. I am thinking that ORM is more likely to be useful in the warehousing portions since these models are often 3rd normal form or close to it. Marts with dimensions, facts, stars and snowflakes I have doubts about. I see some usefulness in the strict business rule statements in ORM style “Fact” definitions. Could ORM add certain type of objects to make it more useful in this domain?

     

  • 06-30-2008 15:42 In reply to

    Re: ORM for Data Warehouses and Data Marts - Useful?

    Hi Thomas,

    Yes - ORM can be used for data warehouse work.
    Pages 836 to 843 of Terry's new book (The Big Brown Book) cover this topic. (Chapter 16.2)

    These pages cover issues such as data consistency, temporal constructs, denormalisation for performance, terminology, the three main approaches, cubes with more than three dimensions, MDX and English Query.   

    The rest of Chapter 16 covers concepual query languages, schema abstraction mechanisms, other design aspects, ontologies and the semantic web, postrelational databases, and metamodeling. The chapter also includes numerous examples in ORM, ER and UML notation. 

    Right now, Terry is "somewhere over the Atlantic Ocean" so he can't respond immediately. However, I'm sure that he will give us some helpful comments when he gets back home.

    Ken 

  • 06-30-2008 19:13 In reply to

    Re: ORM for Data Warehouses and Data Marts - Useful?

     Hello Tom,

    That's an interesting post.  My initial take on it is to look "up stream."  If you've found that the data for OLAP comes from OLTP (Line of Business transaction Databases), then ORM is useful at least in the sense of providing better quality raw material for these derivative data structures.  You probably know better than I, that many would be fully normalized OLTP DBs are denormalized to optimize query performance.  I think that's a mistake for the "Database of Record" OLTP DBs - the denormalization ought to be done at the OLAP level DBs.  The trend to denormalize at the OLTP level (in my opinion), goes back to the idea of one database for all purposes - where a compromise between data integrity, write  and read optimization seemed reasonable.  A better solution is to keep the OLTP DBs normalized; and to use the high quality data from them to create read optimized structures for query performance and analysis.

    An important consideration is that where denormaliztion of an OLTP DB occurs, it falls on the DBA to insure data integrity - where the fully normalized structures that follow an ORM model have that quality built in.  The little kludge that a good DBA might use to improve performance, yet limit anomalies, can turn into a mine field as the reasoning is lost or forgotten over time.  If the DB is tied to the model, the model keeps the DB on the straight and narrow, and is self documenting.  

     So my opinion is that ORM is very useful to OLAP, if as a provider of better quality source material - if nothing else.

    I hope you'll share your experience with ORM and other modeling approaches in these forums.

    BRN.. 

     

  • 07-01-2008 11:52 In reply to

    • Tom Kregel
    • Top 150 Contributor
      Male
    • Joined on 06-30-2008
    • Rochester Minnesota
    • Posts 3

    Re: ORM for Data Warehouses and Data Marts - Useful?

    Thanks Brian and Ken,

    I had just started reading that section in Terry's book when Ken's response came through.  Great synergy!  Terry talks about denormalizing by identifying key objects.  "An object type is a key object if and only if it's preferred identification scheme is used as the primary key of a table."  Then denormalize following functional chains.  I guess the follow up question would be can NORMA do this?  That may be more appropriate for that other discussion forum.  On the other hand doing this more complex form of Rmap may be a great mind training exercise. I completely agree that a good clean normalized source system is ideal.  We rarely have that option.  Some of these systems are old, poorly designed and inconsistent. Many are off the shelf products which are poorly designed as well.  Cases in point are Microsoft Commerce Server which looks like it was written by two different committees and Oracle Financials.  In the medical domains things are perhaps even worse.  Multiple local, national and international standards in multiple versions for diagnosis or medical procedures.  Complex activities such as appointments, procedures, research studies and patient privacy.  Even rats, germs, tissue samples that may be subdivided down to the gene level.  We are attempting to merge all this into one data warehouse that may even be used for epidemiological studies. My hope is that ORM may help bridge the gap between Specialists in all these areas and translate them to a consistent enterprise data model.

     

  • 07-01-2008 13:34 In reply to

    Re: ORM for Data Warehouses and Data Marts - Useful?

    Hi Tom,

    Well ORM certainly has the capability to do what you want, however, making it happen is a different story.
    In my experience, the "how to make it happen" problem has nothing to do with ORM's capabilities and everything to do with overcoming things like vested interests, hubris, fear, the psychology of sunk cost plus a large dollop of ignorance. 

    If I understand you correctly, you are involved with a project that is attempting to "make sense" out of some datasets that are stored in a mis-mash of poorly defined and incompatible data structures. If this is so, I just don't see how ORM (or anything else short of going back to the drawing board) can help because I don't see how you can avoid GIGO.

    Which reminds me of the Irishman who when asked for directions to a distant place said "Well, I wouldn't start from here!"

    Of course, this is only my opinion and others may have found some magic formula that neutralizes the GIGO problem.

    Ken  

     

    Filed under:
  • 07-01-2008 14:26 In reply to

    Re: ORM for Data Warehouses and Data Marts - Useful?

     Hi Ken,

    I was just about to say much the same thing!  With Tom, I think we may be preaching to the choir - about the real value of sound data.  I think we can all empathize with him about the pressure to make silk purses from sow's ears.  What we can do is articulate the reasoning and rationale to counter the deeply entrenched motivations you mentioned.  

    At the end of the day, what stakeholders want is something that provides the information they can use to make better decisions.  What is less than useless, is the kind of misinformation that is often the product of very good analysis of very bad data.  One major problem is that one value looks very much like another.  If you place a value in a slot marked "Answer" people often accept that it is.  They are happy to find an answer, until they realize it's not the answer. 

    Don't know about you, but I'm uneasy with the thought that medical practitioner's will make choices based on inferences drawn from unreliable data.  "Sorry Mr. Jones; what we thought was a tissue anomaly, turned out to be an update anomaly in a legacy database - pity we didn't catch this before the operation." 

    What can they do?  To start, they can put data integrity as the top priority.

    BRN..

  • 07-01-2008 17:38 In reply to

    • Tom Kregel
    • Top 150 Contributor
      Male
    • Joined on 06-30-2008
    • Rochester Minnesota
    • Posts 3

    Re: ORM for Data Warehouses and Data Marts - Useful?

    Gentlemen,

    Don’t get me wrong.  The data is not incorrect within each application area; it is inconsistently defined across the organization's applications and so it difficult to get an organization level merge.   If the cardiac unit is using one brand of software and the cancer center another, the applications are generating correct data in themselves but the data structure around the "Treatment" concept for example may have different cardinalities and also reflect different clinical concerns.  Just because the data structure is not 3NF does not make data itself incorrect. Practitioner decisions are not a enterprise data warehouse problem since they use a single application for each type of activity.  They mostly use their own eyes, ears, and hands evaluate, diagnose and treat.  Organizational decisions such as funding, overall demographics etc. might suffer due to the problems I mentioned above, but this will not directly affect patient care.  Medical clinic applications are heavily tested and have multiple backup systems and cross checks.  It is due to the specificity of these applications that organizational problems occur.

    Tom

  • 07-02-2008 8:59 In reply to

    Re: ORM for Data Warehouses and Data Marts - Useful?

     Hi Tom,

    The misdiagnosis quip I included in my last reply was just to highlight the concern about trusting results of analytical services, where those results rely on bad or misused data.  It was not meant as a swipe at medical professionals, in your organization or elsewhere.  Though you point out that the quality of analytical results has no direct bearing on patient care,  they must be important - or why go through the process?

    "Just because the data structure is not 3NF does not make data itself incorrect."  Surely; nor does being in any normal form make any data correct - it just protects data integrity against certain anomalies.  As you're talking about derivative data structures used for analysis, the concerns of insert/update/delete aren't the same for an OLTP DB.

    Assuming the data is correct, I'd say the challenge is to make sure the data is not misinterpreted in the process.  I'd make it a point to see that the connection to the original data is not lost.  If you have two data sources, one about "Apples" and one about "Oranges", be wary about settling on "Fruit" for the OLAP structure.  ORM lets you model a hierarchy of subtypes.  While preserving the distinctions will make the OLAP structure larger and less wieldy, where the subtleties are important, they can still be taken into account.  Closer to your question, would be the case where you have "Oranges" and "Oranges."  If you have two data sources that track, say, white cell counts (but different testing methods are used).  I think it's important to keep the distinction; as you can't predict the implications the methods may have.  Even more subtly, you (as the data architect), may not be aware of the two testing methods.  That's why I think you should preserve the simple fact that the data came from distinct sources - again, sub typing.  I'd leave the semantic inferences to the OLAP application.  If, for the purposes of a given query, there is no distinction between "Navel Oranges" and "Valencia Oranges" the difference can be factored out - but where it may have an impact, it remains available.

    Does that sound reasonable to you; and come closer to addressing your original point?

    BRN..

  • 07-03-2008 3:40 In reply to

    Re: ORM for Data Warehouses and Data Marts - Useful?

     Hi Tom,

    I'm doing some research in this space at the moment, and I believe the key words (at least from what I relate to) in your post are "consistent enterprise data model".

    Given your experience, I have no doubt that you know the difficulty in achieving this within large organisations. And it seems that you feel this doesn't disuade anyone from at least 'encouraging' a large enterprise to adopt some semblance of 'consistency' in their dialog. I feel the same. It may also be true that, in the production of a data-warehouse, that it is your client's mandate to the data-warehouse developer....'please bring us closer to consistency'.

    With the work that I'm doing with ORM, I've come to a few conclusions, which may be similar to ones that you already harbour:

    1. 100% consistency across the entire Enterprise may not be achievable, but there is definitely a hierarchy of Facts (and models) which is not only achievable...but is manifest by 'nature/logic' and by standards that are universally adopted (more on this below).

    2. To the extent that 1) (above) is true, perhaps it isn't so much ORM that provides the leverage, as either the tool/s with which you work to generate the data-warehouse and/or to develop your ORM models by.

    To explain...the research I am doing focuses on those conceptual models that can be covered off in 1) (above) first, and attributes them to the highest level of the hierarchy (i.e. the 'Enterprise'). These include

           a. Things that are true in all worlds (and we can be a little bit lenient here and say 'almost always true')
                       e.g. 'Human blood is of color, 'red''. (just as an example).

           b. Things that are universally accepted. (e.g. 'In the language English, blood contains a substance called 'Haemoglobin')  

    I'd stamp this type of data (when manifest as records in the data-warehouse), with the EnterpriseId to indicate that they are true across the entire enterprise.
    (NB Not teaching anyone how to suck eggs here...merely that "that's one ORM model in its own right"....)

    But if part of the Enterprise does 'blood research', and they've decided that 'haemoglobin' is too general a term, and have come up with 2 descriptions (say 'FactorX' and 'FactorY')...then I'd stamp that data with the Unique Identifier for the 'blood research' group (...another ORM model...).

    So in essence, I'm investigating ORM models that map the 'Enterprise' along with (as an inclusion of) the UoD that you are modeling.

    That's where my research is at, and that's where we are pushing development. What I personally feel is the benefit of this approach is that the ORM models can be done at any level/branch of the Enterprise, and still be consistent. i.e. They are consistent with the 'language' spoken by the practitioners at each level/branch of the Enterprise.

    Then, as the Enterprise matures...and the 'blood research' group decides that they were wrong after all (for example), lower level/branch 'Facts' may be either escalated 'up' the hierarchy (to more general widespread use), or stay static, or be dropped (as e.g. 'defunct'). Which again, kinda matches the way that language/terminology/facts grows/morphs/evolves within the large Enterprise. i.e. the data-warehouse/database wouldn't be lying, just reflecting a reality.

    I'd be interested to hear your interpretation on this and whether you share similar sentiments.
    Ultimately, on the front line, you'll know exactly what your client 'wants'. It would be interesting to hear a perspective on that. Do clients want a data-warehouse/EnterpriseModel that 'changes the Enterprise', or one that reflects and (may rapidly) 'change with the Enterprise'?

    Best regards

    Victor Morgante

    Viev 

     

  • 07-03-2008 9:50 In reply to

    Re: ORM for Data Warehouses and Data Marts - Useful?

    Hi Tom, 

    I'm curious as to your opinion on what it is that is the root cause of your problem.

    "Just because the data structure is not 3NF does not make data itself incorrect."
    When data structures are not properly normalized, then there is a risk of anomalies creeping into the database.
    If the application code does not take care of these anomalies then it seems to me that it is highly probable that queries will give invaid results.
    (Note: VisioModeler, VEA and NORMA all generate relational structures to 5NF which helps to minimise the risk of anomalies in an OLTP DB)

    The topic of "domain consistency across applications" also comes to mind.
    In ORM an "Object" is a domain with a name, a data type and a defined set of members.
    Unless two applications have the same definitions for the objects then you are going to get problems when you try to merge the databases.

    "..but the data structure around the "Treatment" concept for example may have different cardinalities and also reflect different clinical concerns. "
    I decode this statement to mean "the applications use different facts and may also use different domains"

    So, with these points in mind, could you please be more specific about what you see as the cause of your problems in merging the application data from the OLTP databases?

    Ken  

Page 1 of 1 (10 items)
© 2008-2014 The ORM Foundation: A UK not-for-profit organisation -------------- Terms of Service