
The ORM Foundation

Get the facts!

Validation of a physical model against a conceptual model

Last post Tue, Jun 3 2008 10:06 by JParrish. 13 replies.
Page 1 of 1 (14 items)
  • Fri, May 30 2008 11:47

    • JParrish
    • Top 25 Contributor
      Male
    • Joined on Fri, May 30 2008
    • Florida, USA
    • Posts 25

    Validation of a physical model against a conceptual model

    I've been working a little on the idea of an application called SchemaCop, which is to be for physical database models what the .NET application FxCop is for the .NET Framework. I think anyone who has worked with an existing data model will, in the majority of cases, face models that lack normalization, referential integrity, and often the type of constraints necessary to enforce conceptual rules, i.e. value constraints.

    My initial approach is to define the rules as SQL queries, where each record returned is counted as one violation of the rule. For instance:

    "SELECT * FROM PATIENTS WHERE PTYPE NOT IN ('In-Patient', 'Out-Patient')"

    This keeps the task simple, albeit limited to the accuracy of the SQL statement and, for now, to data sources that allow execution of SQL commands against them. In the end it is not much more than a glorified SQL script, but with some meta-information about each rule and some work on the GUI, I felt it was compelling enough to have one application that could load a set of rules, run them against multiple data sources, and report back the violations for each.
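    A rule runner of this kind can be sketched in a few lines. The following Python sketch is purely illustrative (SchemaCop does not exist yet): it assumes a hypothetical rule format of (label, SQL) pairs and uses an in-memory SQLite database as the demo data source, with each returned row counted as one violation.

    ```python
    import sqlite3

    # Hypothetical rule format: (label, SQL query); each row the query
    # returns is counted as one violation of the rule.
    RULES = [
        ("Each Patient has an allowed patient type",
         "SELECT * FROM PATIENTS WHERE PTYPE NOT IN ('In-Patient', 'Out-Patient')"),
    ]

    def run_rules(conn, rules):
        """Run every rule query and return (label, violation_count) pairs."""
        report = []
        for label, sql in rules:
            violations = conn.execute(sql).fetchall()
            report.append((label, len(violations)))
        return report

    # Demo against an in-memory database containing one bad row.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE PATIENTS (ID INTEGER, PTYPE TEXT)")
    conn.executemany("INSERT INTO PATIENTS VALUES (?, ?)",
                     [(1, 'In-Patient'), (2, 'Out-Patient'), (3, 'Unknown')])
    print(run_rules(conn, RULES))  # patient 3 is the one violation
    ```

    A real tool would load the rules from a manifest and attach reporting metadata to each one; the loop itself stays this simple.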

    I've been following ORM as much as I can since VEA was integrated into the VS tools suite, and I've been very excited about the progress of NORMA. So, naturally, I thought that it would be great if NORMA could output a file that could be consumed by a program like the one I describe, as at least a launching point for defining the rules to execute against a physical data model.

    I don't really have any idea how to go about that, and it is probably likely that there would be too much manual translation/mapping work to be effective, but I thought I'd post this up for any suggestions anyway.

    P.S. My sincere thanks to the developers of NORMA for their hard work. I hope to become more active with the project, and offer design work and at least some real world usage to provide feedback very soon.

  • Fri, May 30 2008 13:58 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Validation of a physical model against a conceptual model

    You mention VEA.
    Did you ever get around to using the reverse engineering feature in VEA?

    It would be nice to validate against SQL queries but, as I see it, the rubber meets the road when you allow non-technical domain experts to see the "real meaning" of the data model from which their physical database is constructed.
    It seems to me that the only practical way to do this is to reverse engineer the physical database into natural language using an ORM verbalizer.

    Not that I have anything against SQL; it's just that most "domain experts" are not fluent in SQL (at least the ones that I have met weren't).

    But maybe I have not properly understood your intent.

    Ken

     



     

  • Fri, May 30 2008 14:52 In reply to

    • JParrish
    • Top 25 Contributor
      Male
    • Joined on Fri, May 30 2008
    • Florida, USA
    • Posts 25

    Re: Validation of a physical model against a conceptual model

    Ken, thanks for your reply. I did manage a couple of reverse-engineering runs on physical DBs using VEA, and it seemed to do an OK job of it. The issue is that the physical model is often far removed from the original conceptual intent. This usually happens because there is no conceptual model, because of poor data modeling skills, because the conceptual model changed without proper physical refactoring, or, in a lot of cases, a combination of all of these. That said, the conceptual schema you get from reverse engineering only shows the sad state of affairs in the physical model. So my point of view is that of the information modeler who has to come in behind a poor design and attempt to refactor the database to a new, correct conceptual model. For the purposes of migrating data, or even introducing controls to help enforce the conceptual model in the interim, I feel a simple tool like "SchemaCop" would be useful.

    The intent is to validate a physical model against a conceptual model in cases where the physical model does not currently adhere to the conceptual model. When I say validations, I mean looking for records in the population of the physical model that violate some fact expressed in the conceptual model, rather than, say, an issue with a table definition. Because of the potentially drastic differences between the two schemas, I can't envision an automated way to perform the validation, so falling back to SQL was one way of demonstrating how you might express a rule. But, as you point out, this is something that someone familiar with the current physical model, the conceptual model, and SQL grammar would have to perform. This is why I cautioned that what I am asking about might not have a good answer short of heavy manual mapping work.

    I'm not thrilled about the idea of using SQL to express the rules, but since my knowledge is limited, I don't know of any way to perform translations between a conceptual model and a non-matching physical model (a physical model that was not derived from the conceptual model). If there were such a thing, you could reverse engineer the physical model and then perform "translations", for lack of a better word, that would allow validations against the new "correct" conceptual model. Sounds like pie in the sky to me, but that's why I ask, hoping I am wrong!

  • Fri, May 30 2008 16:52 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Validation of a physical model against a conceptual model

    Hi,

    Sorry, I'm having difficulty understanding the functions of what you call "SchemaCop".

    As I see it, a physical model is "merely" an encoding of a set of facts. The facts may be wrong (not in conformance with the reality of the domain they are supposed to represent) or meaningless (making statements such as "his wife is not married"), but these are separate points.

     

    I begin by asking the question: "What is a database?" I like the way Fabian Pascal answers this question:

    "A database is a deductive logic system. It derives new facts from a set of asserted facts. The derived facts are true if and only if:

    1: the initial assertions are true

    2: the derivation rules are logically sound and consistent"

    (Pascal 2000: xxvi)

    To Fabian's comments I would add the condition "if and only if the initial assertions are not meaningless".

    As you have implied, many information modelers are faced with what you call "a poor design" and seek to refactor the database to a "correct" conceptual model. But it seems to me that the core question is knowing what is meant by a "correct conceptual model". The only answer I have is that a correct model is one whose facts conform to the "reality" they are supposed to be representing. You can choose to express this "correct model" in several ways, for example in SQL or with elementary facts in natural language. For me this is just the mode of expression, and it does not affect the "correctness" of the model. As I mentioned, it is much easier for folks to validate a model that is expressed as a set of elementary facts.

    Regarding your phrase "perform translations between a conceptual model and non-matching physical model": as I see it, the property of non-matching has everything to do with comparing elementary facts and nothing to do with the way in which the facts are encoded (SQL, physical DB code or natural language).

    So the issue with "performing translations between two different models" is that it can only be done by "someone who knows" going through the details and saying things like: Data model 1 uses the term "X" to mean "A", whereas Data model 2 uses the term "Y" to mean "A".

    Of course this is at a very simple level. The real problem is that when you get down to the detailed analysis, you quickly realise that many data models are talking about different concepts and some are full of "meaningless facts".

    Thus, the notion of "translation" is not relevant, because it is usually an "apples and pears" problem rather than a problem of "I call an apple an 'X' and you call an apple a 'Y'".

    However, ORM-based reverse engineering enables what you call "pie in the sky". But it does require people to do the validating.

    Anyway, that's the way I see it.

    Ken

    Reference

    Pascal, F. (2000), Practical Issues in Database Management: A Reference for the Thinking Practitioner. Addison Wesley, Boston

  • Fri, May 30 2008 17:28 In reply to

    • JParrish
    • Top 25 Contributor
      Male
    • Joined on Fri, May 30 2008
    • Florida, USA
    • Posts 25

    Re: Validation of a physical model against a conceptual model

    Ken, I appreciate the thoughtful reply. I don't think the core question is knowing what is meant by a correct conceptual model; I apologize if somehow I called that into question. Let me back up and try to describe just the function of what I imagine SchemaCop will do. It helps if you have used FxCop, as that is a parallel concept as far as the application experience is concerned.

    Imagine that we have the following simple fact regarding a laboratory requisition and a specimen:

    Specimen was collected for Requisition

    Each Specimen was collected for some Requisition

    and for our physical model we have two tables

    [Requisitions]
    ID,
    SpecimenID,
    DateCreated

    [Specimens]
    ID,
    DateCollected

    What SchemaCop should be capable of doing is inspecting the Specimens table population for any specimen whose ID does not have a corresponding "SpecimenID" in the Requisitions table.

    So in this case our database does not enforce any form of RI, and over time some specimens were orphaned, which is not allowed in our UoD.
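    As an illustration of the check just described, here is a hedged Python/SQLite sketch. The table and column names come from the example above; the anti-join formulation of the rule "Each Specimen was collected for some Requisition" is my own, not an agreed SchemaCop design.

    ```python
    import sqlite3

    # Build the example schema from the post (no FK constraint, so RI
    # is not enforced and orphans can accumulate).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Requisitions (ID INTEGER, SpecimenID INTEGER, DateCreated TEXT)")
    conn.execute("CREATE TABLE Specimens (ID INTEGER, DateCollected TEXT)")
    conn.executemany("INSERT INTO Specimens VALUES (?, ?)",
                     [(10, '2008-05-01'), (11, '2008-05-02'), (12, '2008-05-03')])
    conn.executemany("INSERT INTO Requisitions VALUES (?, ?, ?)",
                     [(1, 10, '2008-05-01'), (2, 11, '2008-05-02')])

    # "Each Specimen was collected for some Requisition" as an anti-join:
    # any specimen with no matching requisition row violates the fact.
    orphans = conn.execute("""
        SELECT s.ID
        FROM Specimens s
        LEFT JOIN Requisitions r ON r.SpecimenID = s.ID
        WHERE r.ID IS NULL
    """).fetchall()
    print(orphans)  # specimen 12 is the orphan
    ```

    Reporting the offending rows, rather than just a count, is what makes the violation actionable for later data correction.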

    I won't complicate this with anything more at the moment, but tell me if what I describe above is different from what you understood previously. Regards!

  • Sat, May 31 2008 6:42 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Validation of a physical model against a conceptual model

    Well, I have never used FxCop, but I just looked it up and it seems to me that the function of FxCop is to check that a particular instance of object code conforms to a set of "standards".
    Using this as a metaphor, the analogous questions regarding a relational database are:

    1: Does the physical schema conform to relational rules?   (e.g. 0NF, 1NF... 5NF)
    2: Does the physical schema conform to the "reality" of the UoD?
    3: Does the data in the physical database conform to relational rules?
    4: Does the data in the physical database conform to the reality of the UoD?

    Now it seems to me that:
    if the answer to 1 is "NO" then the answer to 3 is also "NO"
    if the answer to 2 is "NO" then the answer to 4 is also "NO"

    So it is my view that any "problems" highlighted by these tests must be fixed first.
    The next problem will be to migrate the data from the "bad" database to the "good" database.
    I don't see how this can be done without the assistance of a domain expert who "knows the UoD".
    There may of course be some "automated assistance" from the RDBMS you use when you try to import "bad data", in that it will probably generate "error lists".
    But this only does the test for question 3.

    I looked at your example of an "orphaned specimen" and I used NORMA to create a small object-role model and accompanying relational model in 5NF.
    The attached file shows the facts, their verbalization and the relational model that the facts represent. 

    "over time some specimens were orphaned..."
    The problem that I see here is that orphaned specimens don't have valid semantics.
    You can't tell from the data whether the "orphaned specimen" was just a "phantom record" created by a data entry error (a specimen record added by mistake), or whether someone inadvertently deleted the entry in the "Request" table, so that the entry in the Specimen table is valid but has "just" lost its matching record in the other table.

    Hope I have made a better stab at understanding what SchemaCop is supposed to do.
    Is SchemaCop intended to test for question 1, 2, 3, 4 or some of them or all of them or some other question that I have not yet grasped?

    Ken

  • Sat, May 31 2008 6:46 In reply to

    Re: Validation of a physical model against a conceptual model

    Hi John,

    I just read your initial post, and I believe that I can relate to what you envisage. My background is telecommunications billing. As you can imagine, when you have a database with millions of customers, and (what amounts to) data that will be used to bill them... it has to be right.

    We often used a tool that we made, called a 'SanityChecker', and it was exactly like what you describe: a set of SQL statements looking for dependencies or constraint violations, where those dependencies/constraints couldn't otherwise be checked by the RDBMSs we were using (ORACLE, Informix).

    I worked on two such billing systems, and the 'tool' proved invaluable.

    If I interpret you correctly, what you are saying is that it would be great if an ORM tool could generate a script containing a set of 'SanityCheck' SQL statements.

    Or alternatively, it would be great to have 'a file that could be consumed by a program ' (e.g. a DLL) that may be used to check data 'before' it hits the database.

    I can't speak for nORMa. My company is building Richmond, a competitor... but I'd say that either tool could do the first, and we (my company) are definitely working on the second. From my perspective, it would also be great if RDBMS tools had ORM (instead of, or as well as, ER) as a diagramming tool. I think it would be great if SQL Server had ORM as a documentation/schema-development tool; then more semantically rich constraint checking could be performed before the data is stored in the database.

    Is that a fair interpretation of what you are looking for?

    As for a 'SanityChecker'-type SQL script, there's nothing wrong with that. The first billing system that I worked with cost in the millions of dollars, and it came with such a SQL script as standard (i.e. it is well known that RDBMSs can't fulfil all the checking we want).

    Anyway, I hope this is a welcome adjunct to your post. I find your thoughts most interesting and inspiring.

    Best regds

    Victor

     

  • Sat, May 31 2008 6:53 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Validation of a physical model against a conceptual model

    The .pdf file in my other post would not take the relational schema, so here it is as a separate post.
    Ken


  • Mon, Jun 2 2008 16:45 In reply to

    • JParrish
    • Top 25 Contributor
      Male
    • Joined on Fri, May 30 2008
    • Florida, USA
    • Posts 25

    Re: Validation of a physical model against a conceptual model

    Ken and Victor, thank you both for your thoughts. There is certainly potential here for the scope of the discussion to quickly explode into numerous topics, each comprehensive in its own right, so I will attempt to respond to both of you with a few more steps explaining my thoughts on where "SchemaCop" may arrive as a first build. That is not to devalue the input both of you have provided, as I see the discussion naturally evolving to those points as a future concept.

    Victor, I think the term "SanityChecker" is perfectly synonymous with what I am attempting to put together. Falling back on my original description, I envision something of a glorified SQL script.

    SchemaCop should have an interface that allows for the creation of a rule manifest. In its simplest form, imagine just a collection of "Rule Context" and "Rule" groups displayed in a tree view. Graphically it would look something like this:

    [sample manifest display]

    In this example you can see that I've created rules with labels that are similar to an ORM verbalization of a fact. This is where I was headed with my thoughts of incorporating output from a tool like nORMa. In my own practice, the "Page" title in nORMa for an individual sub-model would be a good analogy for a "Rule Context"; not a steadfast rule, but that is how I approach it.

    Each individual rule would have other properties, like the SQL script that performs the rule validation. Other attributes might include a tokenized validation message for use in reporting, which, by the way, is the true product of SchemaCop as of now.

    Ken, you make some good observations, some of which I would like to revisit in the future. Regarding the concern that orphaned specimens don't have valid semantics: I would agree it could be worded differently, but the end result is that there exists a specimen instance that violates the fact "Each Specimen was collected for some Requisition". It is not currently the goal of SchemaCop to attempt to rectify the how and why. With a report of the violation, it may be possible for the data to be corrected from other data present, or perhaps via offline paper records. Regardless, the data would be violating a fact and could not be migrated into the new model without first rectifying that. Aiding that process is a major component of what I hope to achieve.

    Again, I appreciate your input. Victor, as a side note regarding your Richmond product, I may look into creating a simple XML schema for the rule manifest, which would keep SchemaCop agnostic with regard to where a rule manifest might be imported from.
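    Such a manifest might look something like the sketch below. To be clear, the element and attribute names here are purely hypothetical (no such schema has been agreed); this is only a minimal Python loader, using the standard library, to show how "Rule Context" groups, labels, tokenized messages, and SQL could fit together in one file.

    ```python
    import xml.etree.ElementTree as ET

    # Hypothetical manifest layout: contexts group rules; each rule carries
    # a fact-style label, a tokenized report message, and the SQL to run.
    MANIFEST = """
    <ruleManifest>
      <ruleContext name="Specimen Collection">
        <rule label="Each Specimen was collected for some Requisition"
              message="Specimen {ID} has no matching Requisition">
          <sql>SELECT s.ID FROM Specimens s
               LEFT JOIN Requisitions r ON r.SpecimenID = s.ID
               WHERE r.ID IS NULL</sql>
        </rule>
      </ruleContext>
    </ruleManifest>
    """

    def load_rules(xml_text):
        """Flatten the manifest into (context, label, message, sql) tuples."""
        root = ET.fromstring(xml_text)
        rules = []
        for ctx in root.findall("ruleContext"):
            for rule in ctx.findall("rule"):
                rules.append((ctx.get("name"), rule.get("label"),
                              rule.get("message"), rule.findtext("sql").strip()))
        return rules

    for context, label, message, sql in load_rules(MANIFEST):
        print(context, "/", label)
    ```

    Keeping the manifest as plain XML would indeed leave SchemaCop agnostic about whether the rules were hand-written or exported from an ORM tool.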

  • Mon, Jun 2 2008 19:18 In reply to

    Re: Validation of a physical model against a conceptual model

    Hi John,

    I believe I see what you are saying. I believe that's a great idea and a natural extension of an ORM tool.
    I mention Richmond not to solicit, but so that I can't be accused of solicitation (i.e. I'm being open about it too).

    To be honest, I'm very sold on what has turned out to be something incredible about the ORM community, and that is the resolution to have a common meta-model. It may not be to each vendor's, developer's, or researcher's personal taste (in passing), but we'll have a common meta-model. And to be fair, ORM (in its formality) more or less dictates its own meta-model.

    Which is to say, if nORMa, Richmond, etc. can export to it, and SchemaCop can read from it, then it only benefits all the tools, the way I see it.

    I apologise for interjecting on the thread. I was beside myself to actually hear of another SanityChecker. We had much joy in its construction and the cuteness of its name. lol

    I can exit this thread here.

    Best regds

    Victor

     

  • Tue, Jun 3 2008 1:22 In reply to

    • JParrish
    • Top 25 Contributor
      Male
    • Joined on Fri, May 30 2008
    • Florida, USA
    • Posts 25

    Re: Validation of a physical model against a conceptual model

    Ken, I tried to edit my last post to add this; for some reason it would not take the update.

    I wanted to answer your question: "Is SchemaCop intended to test for question 1, 2, 3, 4 or some of them or all of them...". At the moment I am focused on a tool to assist in answering question 4. I think the other items are, for the most part, best answered through other tools and methods.

     

  • Tue, Jun 3 2008 4:15 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Validation of a physical model against a conceptual model

    When you edit, you will see a "reason for edit" textbox at the bottom of the page.
    If you don't put anything in this box then the system will not accept the edit.

    Ken

  • Tue, Jun 3 2008 5:52 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Validation of a physical model against a conceptual model

    OK, thanks for the clarification about the focus of SchemaCop.
    So your intention is to compare the physical database with the UoD.

    Within this context, what do you regard as the UoD?

    As I see it, there are two options: (see the attached diagram)
    1: UoD = The application domain as a collection of informal statements by domain experts
    or
    2: UoD = Formal conceptual model = a formal model agreed by domain experts.

    It seems to me that if you try to use definition 1, then you still have to validate the UoD with multiple domain experts.
    In other words, you have to find a way to create an "agreed" UoD.

    And then there is the formal-informal issue.
    If you go with an informal solution, then you will still be beset by ambiguities and disagreements.

    At least that's how it seems to me.

    Ken


  • Tue, Jun 3 2008 10:06 In reply to

    • JParrish
    • Top 25 Contributor
      Male
    • Joined on Fri, May 30 2008
    • Florida, USA
    • Posts 25

    Re: Validation of a physical model against a conceptual model

     Ken,

    I think ideally option 2 would be the route taken. In my own experience, when I am tasked with helping an organization improve its data model, the first step for me is to work with the domain experts to develop a solid conceptual model, following the CSDP as much as possible. As for there being ambiguities between the formal conceptual model and what we can consider a product of an informal model (I use the term informal model loosely): yes, I agree there are instances where that introduces issues. You can't go blindly into working with the two models; it takes some knowledge of both. In my opinion, gathering information about the mismatch and making decisions on how to treat the data in those cases is the best that can be hoped for, is it not?

    You mentioned in a previous post the problem of comparing "apples and pears". Nothing short of working through the differences with a domain expert can rectify such a mismatch. That is acceptable to me, as is the notion that in some cases data simply will not carry over from one model to another.

    Hopefully I am following your thoughts correctly. Thanks!

    John
