In which step should artificial identities (IDs) be introduced if not mentioned by domain experts?

Last post Sat, Sep 18 2010 21:01 by Terry Halpin. 11 replies.

Page 1 of 1 (12 items)

Thu, Sep 16 2010 2:41

icmd.org
Joined on Wed, Sep 1 2010
Posts 6

I wonder at which step artificial identifiers (IDs) should be introduced if they are not mentioned by domain experts. I haven't finished reading yet, but the question has come up, and I hope to get a definite answer before I read on. Or should introducing artificial identifiers be strictly avoided in the conceptual modeling process, because they are logical, or even physical, constructs? Thanks, everybody!

Thu, Sep 16 2010 12:09 In reply to

Tyler Young
Joined on Thu, Aug 27 2009
South Jordan, Utah, USA
Posts 49

I can't answer the question for everyone, but in my personal work I tend to introduce artificial identifiers almost immediately. When I'm modeling I like to make everything "work" as much as possible as it's being created. This behavior is HEAVILY influenced by NORMA, since it gives red error messages if your entity types don't have identifiers. You can ignore the errors and fix them all with another pass at the end, but I find it easier to add the identifier at the time that I'm thinking about the entity type in context. On the other hand, when I'm just doing a quick sketch-up on the whiteboard, my entity types almost never have identifiers. I simply don't bother writing them out unless I'm going to implement something from the model. Before anyone brings it up: yes, I am that lazy.

Thu, Sep 16 2010 22:27 In reply to

icmd.org
Joined on Wed, Sep 1 2010
Posts 6

Thank you, Tyler!

Tyler Young:
When I'm modeling I like to make everything "work" as much as possible as it's being created. This behavior is HEAVILY influenced by NORMA, since it gives red error messages if your entity types don't have identifiers.

Do you really mean "artificial identifiers" here, or "reference modes"? Does NORMA warn if an entity type has a "reference mode"?

What worries me most is this: if I (as the data modeler) introduce artificial identifiers (e.g. StudentID) at the conceptual modeling stage, then every familiar example stated by domain experts (e.g. "The student named 'Anne Fischer' is female.") becomes the JOIN OF TWO ELEMENTARY FACT TYPES modeled in ORM ("Student with StudentID 1 has StudentName 'Anne Fischer'." and "Student with StudentID 1 is female.").

This in itself is not a problem; it just adds a kind of indirection at the conceptual modeling stage. And I also believe that concepts which do not exist in the DoD of the domain experts (e.g. artificial identifiers) should not be present in the conceptual model. Is this a philosophy we hold in common?

Any suggestions would be welcome. Thank you.
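
The point about joins can be sketched in a few lines of Python. This is a hypothetical illustration, not NORMA output: the two elementary fact types from the example above are held as plain in-memory structures keyed by StudentID (the sample population, including a second student, is made up), and the familiar sentence is recovered by joining them on that identifier.

```python
# Hypothetical population for two elementary fact types, both using
# StudentID as the reference mode for Student.
student_name: dict[int, str] = {1: "Anne Fischer", 2: "Li Wei"}  # Student has StudentName
female_students: set[int] = {1}                                   # Student is female (unary)


def join_facts(names: dict[int, str], females: set[int]) -> list[str]:
    """Join 'Student has StudentName' with 'Student is female' on StudentID
    and verbalize each result row as the familiar example sentence."""
    return [
        f"The student named '{names[sid]}' is female."
        for sid in sorted(females)
        if sid in names  # inner join: the student must also have a recorded name
    ]


for sentence in join_facts(student_name, female_students):
    print(sentence)  # The student named 'Anne Fischer' is female.
```

Going the other way, from the familiar sentence back to the two facts, only works if the name picks out exactly one student.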

Fri, Sep 17 2010 3:08 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

Part of this puzzle lies in the phrase "if not mentioned by domain experts".
All utterances are made within a context which is often assumed and unstated.

Thus, one must ask: "How does a domain expert uniquely identify something (e.g. the Student Anne Fischer)?"

For example: It is not unusual to have two people with the same name.
So the modeler needs a way to uniquely identify each person.
If the domain expert was in a room with two Annes, then the domain expert might point at the relevant Anne with a finger.

Ahh, you might say, we don't have any name duplications amongst our student population!
OK says the modeler, can you guarantee that this will be true for the life of the database?

So when I'm modeling, I try to model the general case and create a model that handles the implicit but unstated assumptions of domain experts.
Note that unique identification is a semantic problem and has nothing to do with computer technology.
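
Ken's point can be made concrete with a small check. The sketch below is hypothetical (the population and the helper `duplicates` are invented for illustration): it tests whether a candidate identifier is unique in a sample population. A check like this can only refute a candidate; it cannot guarantee uniqueness "for the life of the database", which remains a question for the domain expert.

```python
from collections import Counter

# Hypothetical student population: (StudentID, StudentName) pairs.
students = [(1, "Anne Fischer"), (2, "Anne Fischer"), (3, "Li Wei")]


def duplicates(values):
    """Return the values that occur more than once, i.e. that fail as identifiers."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


names = [name for _, name in students]
ids = [sid for sid, _ in students]

print("Duplicate names:", duplicates(names))  # Duplicate names: ['Anne Fischer']
print("Duplicate IDs:", duplicates(ids))      # Duplicate IDs: []
```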

Ken

Fri, Sep 17 2010 4:01 In reply to

icmd.org
Joined on Wed, Sep 1 2010
Posts 6

Thank you, Ken!

Ken Evans:
Ahh, you might say, we don't have any name duplications amongst our student population!
OK says the modeler, can you guarantee that this will be true for the life of the database?

No, I won't argue with that at all. I understand it deeply :)

Ken Evans:
Note that unique identification is a semantic problem and has nothing to do with computer technology.

Yes, I understand it is a semantic problem.

Since semantic problems should be resolved by domain experts, I think it's OK to push the domain experts harder to state their examples as:

Student with StudentID 1 has StudentName 'Anne Fischer'.
Student with StudentID 1 is female.

Rather than:

The student named 'Anne Fischer' is female.

But this sometimes requires them to revolutionize their thinking... Do you agree with me?

So, do you think it is acceptable for the modeler to introduce the unique identifiers between CSDP Step 1 and CSDP Step 2? Or do you believe it's only a practice (frankly, I don't think so)?

Thanks very much, again.

Fri, Sep 17 2010 5:46 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

icmd.org:
So, I think introducing the unique identifiers between CSDP step 1 and CSDP step 2, by the modeler, is acceptable? Or, do you believe it's only a practice (I don't think so, frankly)?

Well, it seems to me that what we are discussing is much more than "just" a problem of "introducing unique identifiers".

In talking about "entities", Terry says "...the description must specify the type of entity being referred to: the entity type. A type is the set of all possible instances." (page 68 of the BBB)

So with this in mind, let me give you an example of just one of the "problems with domain experts".
In the 1990s, a bank hired me to advise them on problems they were having with a rather large systems development project.
The bankers (the domain experts) wanted the developers to create "ten new transactions" that were to be available across 11 countries.

Example of a transaction: a service that guaranteed that funds could be transferred from an account in any one of the 11 countries to an account in any other of the 11 countries within one hour. Other transactions were "within 2 hours", "within one day" and so on. There were also other criteria.

Each week, we had a project review attended by about 20 people. This included the development managers from each of the 11 countries.
The project was late and over budget. Each week it became later and even more over budget. There were many heated exchanges at the project review. The bankers "just" wanted ten new transactions. But the developers were on an Ed Yourdon-style "death march".

So I interviewed a few people to "get the facts". Then I used the data from the interviews to create an object-role model (in VisioModeler, in those days).
The process of building the object-role model exposed the complexity hidden behind the domain experts' requirement for "just ten new transactions". Each central bank had different rules, there was more than one time zone, and so on.
The details showed that there were really more than 1500 new transactions and this is what was causing the death march problem.
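
A back-of-the-envelope sketch of how "ten" can multiply. This is only an illustration, not Ken's actual analysis (the post does not say how the 1500 figure was reached): it assumes each of the ten services applies to every ordered pair of distinct countries among the 11, and uses placeholder names for both.

```python
from itertools import permutations

# Placeholders: 11 countries (from the story) and ten services (the "ten new transactions").
countries = [f"Country{i}" for i in range(1, 12)]
services = [f"Service{i}" for i in range(1, 11)]

# A cross-border transfer goes from one country to a *different* one,
# so the routes are ordered pairs of distinct countries.
routes = list(permutations(countries, 2))
variants = len(routes) * len(services)

print(len(routes))  # 110 (= 11 * 10)
print(variants)     # 1100, before any per-bank rules or time-zone variations
```

Different central-bank rules and time zones would split these variants further, which is the kind of hidden detail that building the fact-based model brought to the surface.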

I then gave a presentation to one of the senior domain experts.
I used the "Fact Report" to get him to validate my model.
He made a few minor changes that showed that my first model had been more than 95% accurate.

So at the end of the presentation I felt triumphant, because I believed I had successfully exposed the problem as one of poorly specified requirements.

And you know what he said?
"Well Ken, that's all very interesting, but all we want is ten new transactions..."

This is just one of many examples of my experiences with "domain experts" who seem to be completely unaware of the limitations of ambiguous natural language. I characterise this as the "Fish who don't know about water" problem.

So as I see it, the problem is more related to the challenge of helping "domain experts" to "think in terms of set theory and predicate logic".

So it seems to me that this is more a problem of applied psychology than a debate about unique identifiers.

Do you agree?

Ken

Fri, Sep 17 2010 7:09 In reply to

icmd.org
Joined on Wed, Sep 1 2010
Posts 6

Thank you for sharing the story, that's so meaningful!

So, I guess the 10 -> 1500 growth was caused little by little by minutiae like introducing unique identifiers (I'm sure it must have been very detailed and complex)? I ask because the domain expert working with me (myself, I confess...) does not have the problem of underestimating the workload.

Back to the topic: if the unique identifiers are introduced at CSDP Step 1, then the elementary facts after Step 1 should look like this:

Student with StudentID 1 has StudentName 'Anne Fischer'.
Student with StudentID 1 is female.

Although that's not intuitive, it is semantically correct and elementary. I wonder why the BBB doesn't use examples like this. Unique identifiers are important for keeping the model semantically correct, and are also useful in relational databases, so why doesn't the BBB take this approach?

Thanks!

Fri, Sep 17 2010 8:15 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

icmd.org:
So, I guess the 10 -> 1500 growth is caused little-by-little from minutiae like introducing unique identifiers

Well, that's not quite what I wanted to convey. I don't see the problem as "underestimating the workload". I see the "problem" as something much deeper and more difficult to handle...

In the realm of interpersonal communication, we are all limited by our linguistic skills, which are related to our mental models and our perceptive processes.

In other words, the language that we use for thinking and for expressing our thoughts in utterances or in writing is where many communication problems originate. (For starters, Google "the Sapir-Whorf hypothesis")

I see the issue of unique identifiers as being inextricably linked to the way we perceive and think about the world about us.
For example, I see a problem with your example:

icmd.org:
Student with StudentID 1 has StudentName 'Anne Fischer'.

Let's examine the meaning of the term "Student" in the expression "Student with StudentID 1".
In logic you would say something like "Each Student has a Name" (yes, I know that it's more like "For each Student there is a Name such that...").
Now, in this "Fact" you are referring to two "entity types": Student and Name.
And, as Terry explains "A type is the set of all possible instances."
So, in an ORM tool, we need a mechanism for expressing the notion of "For Each" at the conceptual/semantic level.
And that is what I see as the purpose of the "reference ID".
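
One way to read "For each Student there is a Name" is as a pair of constraints on the fact type Student has StudentName: a mandatory role (every Student has a name) and a uniqueness constraint (at most one name per Student). Ken's sentence only says "has a Name", so the "exactly one" reading is an assumption here, as are the population and the helper name below. The point is that the check can only be stated once each Student is picked out by a reference value, here a hypothetical StudentID.

```python
# Hypothetical population. The Student type is represented by its StudentID values;
# the fact type "Student has StudentName" is a set of (StudentID, StudentName) pairs.
students: set[int] = {1, 2, 3}
has_name: set[tuple[int, str]] = {(1, "Anne Fischer"), (2, "Anne Fischer"), (3, "Li Wei")}


def each_has_exactly_one(entities: set[int], facts: set[tuple[int, str]]) -> bool:
    """True if every entity plays the role exactly once (mandatory + unique)."""
    return all(
        sum(1 for sid, _ in facts if sid == e) == 1
        for e in entities
    )


print(each_has_exactly_one(students, has_name))        # True: the two Annes differ by StudentID
print(each_has_exactly_one(students | {4}, has_name))  # False: Student 4 has no name
```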

Furthermore, I don't subscribe to the notion that a reference ID is an "artificial identifier".
The underlying point here is that in normal conversation, it is the unstated context within which the conversation takes place that serves to "mostly" disambiguate the meaning of a noun.
However, in my earlier banking story, the differences in meaning of the term "transaction" were one of the root causes of the 10 > 1500 problem. The context did not disambiguate the term, because "everybody knew" what was meant by a "transaction".

Whilst training as a flying instructor, I studied the principle of "Primacy".
In general terms, Primacy means that once a person has learned (= associated) a term with a meaning, it is really very, very hard to get the person to "unglue" his or her perception (it seems to be hard-coded into our perceptive firmware). This is just one of the many things that a competent ORM consultant needs to be able to detect and to handle.

Detecting is the easy part; the difficulty comes in finding a diplomatic way to get the "domain expert" to see things in a different way.

Regarding the BBB, I'll ask Terry if he wants to comment on:

icmd.org:
why the BBB do not use such kinds of examples?

Ken

Fri, Sep 17 2010 10:41 In reply to

icmd.org
Joined on Wed, Sep 1 2010
Posts 6

Ken Evans:

I see the "problem" as something much deeper and more difficult to handle...

Yes, I understand the point.

Ken Evans:
So, in an ORM tool, we need a mechanism for expressing the notion of "For Each" at the conceptual/semantic level.
And that is what I see as the purpose of the "reference ID".

That's a very interesting and rational point of view. I appreciate your explanation very much! But although "a reference scheme" is not "artificial", a "unique identifier" (e.g. a long GUID) is "artificial" anyway. I think we're on the same track.

Ken Evans:

Regarding the BBB, I'll ask Terry, if he wants to comment on.

I would appreciate it if Terry could share his insights on the BBB's examples with regard to identifiers.

Thanks.

Fri, Sep 17 2010 18:16 In reply to

Terry Halpin
Joined on Fri, Nov 23 2007
Maleny, Australia
Posts 154

In the BBB, some of my introductory examples use very restricted domains in which simple names identify some kinds of objects (e.g. persons). In industrial modeling, the domains are typically larger, and these simple reference schemes don't work, so "artificial identifiers" such as employee numbers or product codes are used instead.

Why then did I use such simplistic examples to begin with? For two main pedagogic reasons: (1) to introduce modeling concepts in a friendly, easy way using very small concrete examples that readers can easily relate to; (2) to avoid boring the reader with lots of artificial identifiers. However, even my very first example (Table 1.2) uses movie numbers to identify movies, and lists different movies with the same title. As you proceed through the book, you'll find that the vast majority of later examples do use more realistic identifiers, and Section 5.3 has a detailed discussion to help you choose appropriate reference schemes.

When working with clients, I use realistic identifiers at Step 1. In most cases, these are already available, even if they are "artificial". In rare cases, these are not available, and initially the only practical way to identify some kinds of objects is by ostension (i.e. by pointing at them). In this case, I get the clients to agree to introduce an artificial identification scheme, which they agree to use in communicating about the objects. These artificial identifiers are not hidden, but become part of human communication, so are then conceptual.

As a different issue entirely, when mapping to an implementation, you may choose to introduce internal identifiers that are not part of human communication (e.g. give all relational tables surrogate keys, or give all class instances object ids). This is an implementation issue, not a conceptual one.
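
A minimal sketch of that distinction, using Python's built-in `sqlite3` module. The table and column names and the sample rows are hypothetical; the pattern assumed is that the conceptual identifier (here an employee number, in the spirit of an Employee(.nr) reference scheme) stays visible and is enforced with a UNIQUE constraint, while an internal surrogate key is added purely as an implementation choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,   -- internal surrogate key, not used in human communication
        employee_nr TEXT NOT NULL UNIQUE,  -- conceptual identifier agreed with the client
        name        TEXT NOT NULL
    )
    """
)
conn.execute("INSERT INTO employee (employee_nr, name) VALUES (?, ?)", ("E101", "Anne Fischer"))
conn.execute("INSERT INTO employee (employee_nr, name) VALUES (?, ?)", ("E102", "Anne Fischer"))

# People still refer to employees by the conceptual identifier, not the surrogate key.
row = conn.execute("SELECT name FROM employee WHERE employee_nr = ?", ("E102",)).fetchone()
print(row[0])  # Anne Fischer

# The UNIQUE constraint keeps the conceptual reference scheme valid.
try:
    conn.execute("INSERT INTO employee (employee_nr, name) VALUES (?, ?)", ("E101", "Li Wei"))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

Because nobody outside the database refers to `employee_id`, it could later be changed or dropped without touching the conceptual model.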

Please also note that there may be differences in the order in which you learn a procedure (like the CSDP), and the way in which you apply it in practice. For example, I apply the complete CSDP to each domain fragment as I discuss it with the client. I do not specify all the fact types for the whole model before thinking about what constraints to add.

Cheers

Terry

Sat, Sep 18 2010 1:05 In reply to

icmd.org
Joined on Wed, Sep 1 2010
Posts 6

Thanks, Terry!
I'll rethink the issue after I finish reading all the steps of the CSDP.
So, according to my current understanding, Terry, do you prefer to have the clients state "familiar examples" using the introduced artificial identification scheme?
Or do you prefer to let them state the familiar examples in their own language, then transform these into elementary facts using the introduced artificial identification scheme, and have them check and agree with the transformed elementary facts?
Sorry for my persistent questioning. But I do think that, before starting a modeling project, I should clear up everything I'm unsure about.

Sat, Sep 18 2010 21:01 In reply to

Terry Halpin
Joined on Fri, Nov 23 2007
Maleny, Australia
Posts 154

I use realistic identification schemes with clients. If that requires using an "artificial identifier" that clients agree to use in communication, then I use that. In the vast majority of cases, such "artificial identifiers" are already in use by the client. For example, for the object types Employee and Country, the reference schemes are typically Employee(.nr) and Country(.code).

Cheers

Terry

The ORM Foundation