Hi Robert,
This isn't a direct response to either Ken's or Clifford's post, so I'll start a third branch on this thread.
NORMA currently allows a subtyping relationship between two value types, but does not do anything with this relationship apart from blocking the sub/super types from switching from ValueType to EntityType. This doesn't mean we haven't thought about the problem, just that we haven't done anything about it yet.
Saying anything coherent about the ValueType subtyping question first requires a discussion of the difference between a data type (string, number, etc) and other object types in your model. There are several breakdowns that I use to consider the differences:
Set approach:
- A data type corresponds to the definition of a set of values. A full population of the set is generally not required to define the set, and indeed is often not possible because the population can be countably or uncountably infinite. Not all data types are infinite (bool, enums, restricted ranges, etc.), and some data types can be defined extensionally (bool, enum).
- An object type always has an asserted population, or is derived from other object types with an asserted population [without loss of generality, I'll classify these derived populations as asserted for the rest of this post]. The population of an object type is always finite, is always stored, and is never implied.
Note that from a computational standpoint the finite/infinite breakdown is extremely important. A query against a potentially infinite set is not guaranteed to terminate. This is why arithmetic operations (operators over infinite data types) are always treated specially in any treatment of logical or relational computation.
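To make the set-based distinction concrete, here is a minimal Python sketch. The class names (`DataType`, `ObjectType`) and the `Age` example are my own illustrations and not anything from NORMA: a data type is modeled as a membership test with no stored population, and an object type as a finite, asserted, stored set.

```python
from dataclasses import dataclass, field
from itertools import count, islice
from typing import Callable, Iterator, Set


@dataclass(frozen=True)
class DataType:
    """Hypothetical: a set defined by a membership test, with no stored population."""
    name: str
    contains: Callable[[object], bool]


@dataclass
class ObjectType:
    """Hypothetical: a type whose population is finite, asserted, and stored."""
    name: str
    population: Set[object] = field(default_factory=set)

    def assert_instance(self, value: object) -> None:
        self.population.add(value)


# The integers are defined intensionally: we can test membership
# without ever listing the members.
INTEGER = DataType("Integer", lambda v: isinstance(v, int) and not isinstance(v, bool))


def all_integers() -> Iterator[int]:
    """Enumerate 0, 1, -1, 2, -2, ... This generator never finishes."""
    yield 0
    for n in count(1):
        yield n
        yield -n


age = ObjectType("Age")
for v in (30, 41, 30):
    age.assert_instance(v)

# A query over the asserted population always terminates.
ages_over_35 = [v for v in age.population if v > 35]

# A query over the data type itself has to be bounded explicitly;
# list(all_integers()) would never return.
first_five = list(islice(all_integers(), 5))
```

The unbounded generator is the computational side of the finite/infinite breakdown: anything that scans `all_integers()` without a bound does not terminate, while a scan of `age.population` always does.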
Conceptual approach:
- Values drawn from a data type (the number 10, the string "virus") have no conceptual meaning outside the definition of that data type and its associated operations. Attempting to use a data type in a fact statement involving object types that are unrelated to the data type definition creates a conceptually void statement: "Person has String" tells me absolutely nothing about Person. Is the string a name? An address? A description? There is no way to know because String has no meaning in any model describing a real business domain. Similarly, the string "virus" is also meaningless without associating it with a ValueType in a specific domain (software or biology).
- An object type always has conceptual meaning in a domain, even if the data type is not known. For example, "Person has Name" has a meaning to a domain expert without even knowing how Person is identified (auto counter, SSN, name, etc.) or whether the name is stored as Unicode or ANSI.
Applied to Subtyping:
In addition to these perspectives, we also need to reconcile the subset relationship between a value type and its data type. Clearly the (finite, asserted) set of all person names is a subset of the (infinite, implicitly populated) set of all strings. So, how does this compare to a subtype between entity types? Entity type subtypes have both a conceptual component (the set of all MalePerson entities is a subset of all Person entities) and an extensional (population) component: each instance of MalePerson asserted to exist in populated ModelX must also be asserted to exist as a Person in ModelX.
For a populated model there is a subtle but important difference between the EntityType->EntityType subtype and the ValueType->DataType subtype. In the entity type case, the existence of a subtype instance means that the supertype instance must also exist (existence meaning that it is recorded in the model). In the value type case, the existence of a value instance does not mean that a corresponding instance must be recorded in the model for the supertype. In fact, when applied to any physical mapping, recording the data type instances sounds ridiculous: there's a good reason I've never seen a database with an explicit string table that contains every string used across all columns in the database, or an integer table telling me all of the integers that are used somewhere in the model.
The conclusion here is that there are different subtyping semantics at the instance level. For entity type subtyping, there is a subtyping instance fact type (1-1 with the subtype mandatory) that is either explicitly populated (for separated subtypes) or implicitly populated (for collapsed subtypes) at the physical level. For a valuetype/datatype subtype, this subtyping instance fact type is not populated.
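A rough Python sketch of the two instance-level semantics follows. `PopulatedModel` and its methods are hypothetical names used only for illustration, and entity instances are represented by plain strings for brevity. Asserting an entity subtype instance also records it in every supertype population, while asserting a value instance only checks membership in the data type and records nothing for it.

```python
from typing import Callable, Dict, Optional, Set


class PopulatedModel:
    """Hypothetical populated model; not NORMA's object model."""

    def __init__(self) -> None:
        self.population: Dict[str, Set[object]] = {}
        self.supertype: Dict[str, str] = {}  # entity subtype -> supertype
        self.data_type: Dict[str, Callable[[object], bool]] = {}  # value type -> membership test

    def add_entity_type(self, name: str, supertype: Optional[str] = None) -> None:
        self.population[name] = set()
        if supertype is not None:
            self.supertype[name] = supertype

    def add_value_type(self, name: str, data_type: Callable[[object], bool]) -> None:
        self.population[name] = set()
        self.data_type[name] = data_type

    def assert_instance(self, type_name: str, instance: object) -> None:
        if type_name in self.data_type:
            # ValueType -> DataType: the value must belong to the data type,
            # but the data type itself has no population to update.
            if not self.data_type[type_name](instance):
                raise ValueError(f"{instance!r} is not drawn from the data type of {type_name}")
            self.population[type_name].add(instance)
            return
        # EntityType -> EntityType: the instance must also be recorded
        # in every supertype population.
        current: Optional[str] = type_name
        while current is not None:
            self.population[current].add(instance)
            current = self.supertype.get(current)


model = PopulatedModel()
model.add_entity_type("Person")
model.add_entity_type("MalePerson", supertype="Person")
model.add_value_type("PersonName", lambda v: isinstance(v, str))

model.assert_instance("MalePerson", "person#1")
model.assert_instance("PersonName", "Matt")
```

After these two assertions the model holds a Person population containing `person#1`, but there is no "String" population anywhere, which is exactly the missing string table described above.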
To bring this full circle and apply it to ValueType subtyping, the decision first needs to be made whether we group a ValueType more closely with an EntityType or a DataType. Given that a ValueType has a finite asserted population and has conceptual meaning in a domain model, it is most natural to group it with EntityType instead of DataType. In fact, with this classification the primary difference between an EntityType and a ValueType is that the ValueType is self-identifying whereas the EntityType requires a reference scheme.
Given this perspective, the NORMA plan moving forward is:
- Introduce a formal notion of DataType to allow user-defined data types. A 'DataType is a subtype of DataType' relationship represents a restriction (equivalent to a value constraint) of the set in the supertype DataType.
- Improve the modeling of ValueType has DataType to indicate 'ValueType draws values from DataType'. This will be functionally the same as the 'DataType is a subtype of DataType' relationship, namely a subtyping relationship with no instance-level fact type requiring a population of the supertype.
- Apply the same population-bearing semantics to ValueType/ValueType subtyping as we do to EntityType/EntityType subtyping.
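The three plan items can be sketched in Python as follows. This is only an illustration under my own assumptions: the class names, the 50-character limit for a "medium" string, and the `Nickname` value type are all hypothetical, and none of this reflects how NORMA implements or will implement the feature.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass(frozen=True)
class DataType:
    """Hypothetical: a subtype DataType restricts the set of its supertype."""
    name: str
    restriction: Callable[[object], bool]
    supertype: Optional["DataType"] = None

    def contains(self, value: object) -> bool:
        # Check the supertype first so the restriction only sees valid values.
        if self.supertype is not None and not self.supertype.contains(value):
            return False
        return self.restriction(value)


@dataclass
class ValueType:
    """Hypothetical: draws values from a DataType and bears a population."""
    name: str
    data_type: Optional[DataType] = None
    supertype: Optional["ValueType"] = None
    population: Set[object] = field(default_factory=set)

    def assert_value(self, value: object) -> None:
        # Collect this value type and all of its value type supertypes.
        chain: List["ValueType"] = []
        current: Optional["ValueType"] = self
        while current is not None:
            chain.append(current)
            current = current.supertype
        # 'Draws values from': membership check only, nothing is recorded
        # against the data type.
        for vt in chain:
            if vt.data_type is not None and not vt.data_type.contains(value):
                raise ValueError(f"{value!r} is not in {vt.data_type.name}")
        # ValueType/ValueType subtyping: population-bearing, as with entity types.
        for vt in chain:
            vt.population.add(value)


STRING = DataType("String", lambda v: isinstance(v, str))
# Assumed limit of 50 characters for a 'medium length string'.
MEDIUM_STRING = DataType("MediumString", lambda v: len(v) <= 50, supertype=STRING)

person_name = ValueType("PersonName", data_type=MEDIUM_STRING)
nickname = ValueType("Nickname", supertype=person_name)

nickname.assert_value("Matt")
```

Here `MEDIUM_STRING` restricts `STRING` without either one storing anything, while asserting a `Nickname` value also populates `PersonName`.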
Honestly, I don't anticipate the ValueType/ValueType subtype being all that commonly used, but we need to conceptually distinguish between the two types of subtyping. The definition of an explicit data type (likely displayed with a dotted line instead of dashed) will allow user-defined types and type restrictions (letting you define a standard 'medium length string' in your model) and will also allow a value type to be displayed graphically as a subtype of a data type. This will remove any ambiguity about what is meant by value type subtyping, and allow modelers to see their data types without declaring a ValueType as a conceptually meaningless data type.
-Matt