Hi Mitchell,
Ken Evans has forwarded your discussion about the FCO-IM Lexicalizing question to me. I’m happy to reply.
Please see my points below (in more or less random order, responding to your texts).
For the sake of simplicity let’s assume there is only one identifier (one candidate key, which is thus the primary key).
1. I think the issue you raise is best addressed by distinguishing between the conceptual level (that of an FCO-IM or ORM information model), the logical level (that of a relational database schema), and the physical level (that of an implementation in a concrete DBMS).
2. On the conceptual level, in FCO-IM and also in ORM, every object type must have an identifier (unique name). This identifier can consist of several components. On the logical level, this translates into a primary key with more than one column. There is absolutely no problem with that on either of these levels, and I agree completely with Ken’s comments (04-07-2010).
3. The physical level is where performance issues and other technical matters ‘under the hood’ are addressed. At present, these performance issues are a concern only for very large database systems (in terms of records per table; data warehousing with large ‘fact tables’, for example). In such situations, it may be advisable to replace a compound key with a single dummy key. That has drawbacks too, however: (a) the original key is still there as an alternate key, and must be checked and maintained by the system, and (b) it changes the facts for the users (suddenly they have to use a number they never needed before). Messing with the way users communicate about their UoD should not be done lightly.
4. Perhaps you do not want to change the user communication, but just intend to implement the dummy key and hide it from all users by replacing it with the ‘real’ one in all interfaces and reports. That would require a lot of programming work, but it can be done. Then you would not need to change the conceptual level or the logical level, but only the physical level (and have a separate physical database schema, augmented with the full set of routines for maintaining the alternate keys and hiding the dummy key). Note that I would not change the logical level for this purpose (though others may disagree).
5. In practice, the benefits of introducing dummy keys have disappeared for all but the largest systems: even in the large administrative system of HAN University, performance may be worse after introducing them (yes, we measured that). I strongly discourage students from rashly succumbing to this ‘number disease’.
6. Fortunately, most RDBMSs nowadays handle compound keys, and FK references to compound PKs, automatically.
7. Is your feeling of redundancy (“storing the same information twice”) caused by the fact that, on the logical level, a child table has foreign key columns which are copies of the primary key columns of the parent table? If so, I agree, and I think it is a shortcoming of the relational model. (An ER model can be seen as a relational database schema without this redundant copying of attributes.) However, replacing the compound key with a single-column key doesn’t help here: both tables will still contain the same column.
8. But this is a form of redundancy on the metadata level, not on the data level: no complete fact is stored more than once. Both FCO-IM and ORM deal with complete elementary facts, whereas the relational model deals with attributes, which are just parts of complete facts.
9. A direct answer to your question of 04-06-2010 “Why isn’t that [introducing StudentID] done/proposed?” is then: It isn’t necessary and it would be harmful to the communication of the users.
10. Now for the passage in chapter 2.9 (p. 56). The book was written in 1995, when the performance issue was more important than it is now. So we decided to include a section showing the conceptual changes involved if we were to introduce a dummy key and actually force all the users to change their communication from ‘student Peter Johnson’ to ‘student S1’. Since ‘student S1’ has then become the standard identifier, the IGD must change accordingly. (Of course, if you aim to do what I described in point 4 above, no change in the IGD is necessary.)
11. Finally, for Ken: I’m sorry if our English was unclear, so let me rephrase the start of the first sentence quoted from our book: “It might be that, although there actually is a proper identifier, the analyst and/or the domain experts consider this key to be very impractical …”. I agree completely with your remarks.
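To make points 2, 3, 6 and 7 a bit more concrete, here is a rough sketch in Python with SQLite (my own choice for illustration, not something from the book). The tables `student` and `enrollment`, and the assumption that a student is identified by first name plus last name, are hypothetical. The sketch contrasts a compound key with a dummy key: in both designs the child table still carries a copy of the parent's key, and with the dummy key the real identifier must still be enforced as an alternate key.

```python
import sqlite3

# Hypothetical schema (illustration only): a student is identified by the
# compound key (first_name, last_name), as in "student Peter Johnson".
COMPOUND_KEY_SCHEMA = """
CREATE TABLE student (
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    PRIMARY KEY (first_name, last_name)
);
CREATE TABLE enrollment (
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    course     TEXT NOT NULL,
    PRIMARY KEY (first_name, last_name, course),
    -- The child table copies the parent's key columns (point 7).
    FOREIGN KEY (first_name, last_name) REFERENCES student (first_name, last_name)
);
"""

# The same hypothetical UoD with a dummy key: the real identifier survives as
# an alternate key (UNIQUE) that the DBMS must still check and maintain (point 3).
DUMMY_KEY_SCHEMA = """
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    UNIQUE (first_name, last_name)
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student (student_id),
    course     TEXT NOT NULL,
    PRIMARY KEY (student_id, course)
);
"""


def open_db(schema: str) -> sqlite3.Connection:
    """Create an in-memory database with the given schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(schema)
    return conn


def fk_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return sorted (child_column, parent_column) pairs of the table's foreign keys."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})")
    return sorted((row[3], row[4]) for row in rows)  # row[3] = from, row[4] = to


def rejects(conn: sqlite3.Connection, sql: str) -> bool:
    """True if the DBMS refuses the statement with an integrity violation."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    compound = open_db(COMPOUND_KEY_SCHEMA)
    compound.execute("INSERT INTO student VALUES ('Peter', 'Johnson')")
    compound.execute("INSERT INTO enrollment VALUES ('Peter', 'Johnson', 'Databases')")
    # The composite FK refuses an enrollment for an unknown student (point 6).
    print(rejects(compound, "INSERT INTO enrollment VALUES ('Mary', 'Smith', 'Databases')"))
    print(fk_columns(compound, "enrollment"))

    dummy = open_db(DUMMY_KEY_SCHEMA)
    dummy.execute("INSERT INTO student (first_name, last_name) VALUES ('Peter', 'Johnson')")
    # The alternate key must still be enforced: a second Peter Johnson is refused.
    print(rejects(dummy, "INSERT INTO student (first_name, last_name) VALUES ('Peter', 'Johnson')"))
    # The child table still carries a copy of the parent's key, now a single column.
    print(fk_columns(dummy, "enrollment"))
```

Note that the dummy key does not remove the copied column from the child table; it only shrinks it from two columns to one, while adding a UNIQUE constraint that has to be maintained alongside it.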
Kind regards, Jan Pieter Zwart