We are now engaged in thread drift so if you want to continue discussion on any aspect, please start a new thread for each aspect that you want to discuss. In the meantime, here is my response to some of your points.
Although, philosophically, you could argue that the "personnel number" is a surrogate key -- it's just an externally allocated surrogate key. But it serves the same purpose and was probably introduced for the same reason by the "original" modelers
Well that's not the way I understand and use the term "surrogate key".What is a surrogate key?
Within the relational database paradigm, the term "surrogate key" refers to the use of a set of system generated unique characters (usually numbers) in a column in a database table with a view to providing each row in the table (tuple) with an identifier that is unique and so can be used as a reference for a row instead of any candidate natural key that may also be used to identify the row.Is a key surrogate or natural?
One way to establish whether a key is surrogate or natural is to ask: "Did this value exist before the database was designed?" For example, the HR departments of many companies issue unique "personnel numbers" to each new employee with a rule that numbers are never re-used. The personnel number is known to the employee and is often printed on employee's company identification badge. If a database is subsequently developed, then the personnel number qualifies as a "natural key" because database or no database, the personnel number always uniquely identifies an individual person.
On the other hand, a surrogate key is an artefact that exists only within the database and is only known to a few people such as a database administrator.Problems with surrogate keys such as GUID's
I think you're right though that a lot of OO approaches use surrogate keys as a simplifying assumption and maybe aren't as rigorous in their modeling. For this case, however, with a table like User/Person, I think the main reason is the stability of the key.
you start with a good conceptual model (i.e. one where the domain experts agree on a set of facts that accurately represent their domain) and then use a tool such a NORMA to generate a fully normalised logical model,Then
you can approach the need for surrogate keys by finding out where the natural keys cause problems and what the causes of the problems really are. In that case, the use of a surrogate key might be one of several alternative solutions.
However, I have seen many OO based approaches to "database design" that don't follow this route. Making arbitrary decisions on what an "Object" or "Entity" is in the absence of a formal definition of the universe of discourse seems to be inherently risky to me. Example
Consider this table. Each row is unique but each row refers to the same person.Surrogate Key Personnel Number Last Name
1000 12345 Jones
1001 12345 Jones
1002 12345 Jones
Now imagine that the surrogate key is used to enforce referential integrity with foreign keys in other tables. Suddenly you have a proliferation of what I can only think of calling "Unique duplication"
From what I have seen and read, some development paradigms make it quite easy to develop database schemas like this.A good natural key?
In a web service environment, for example, what would be a good natural key?
Well, a natural key relates to the universe of discourse, not to an implementation so my perhaps unsatisfactory answer is "it depends on the facts in the domain."
it seems to make sense to use information patterns for modeling. If I could pick up an ORM model -- with the imprimatur of world-class analysts -- and adopt it into my own project, why wouldn't I? Maybe more importantly, why wouldn't my boss require me to?
In the ruby-on-rails world, for example, there's a bunch of plugins for common use-cases. A plugin for users, roles, attachments, blogs, etc. Each framework builds on the other, reuses certain design ideas, etc. Wouldn't it be cool to have them all using the same fundamental information model regardless of framework or database technology?
This is not a new idea. For example companies such as SAP, have made a lot of money by getting organisations to "tailor" their organisational structure to fit the SAP technical architecture.
However, this approach takes a lot of investment to develop, manage and market. In addition, you have to handle the specifics of each domain: Either by adapting your model to their organisation or by adapting their organisation to your model - both of which lead to very costly projects.Where to start?
It just strikes me that perhaps focusing on ground-up influence of ORM may be as- or more effective than top-down approaches -- similar to how the strength of ORM lies with starting from the facts.
I agree with you. That's why I set up this website in the first place.
There are several organisations that are working to develop "standard data models". They use either UML Class models or an ER development method to define the models. Unfortunately, very few domain experts can "read" these models so validating the models is a really big challenge.