Semantics of subtyping value types

Last post Thu, Jun 2 2011 20:12 by Terry Halpin. 7 replies.


Wed, Jun 1 2011 6:05

Robert Schmaal
Joined on Thu, Feb 26 2009
Posts 1

Semantics of subtyping value types


ORM2 allows subtyping of value types, but what are the semantics of this: subset semantics or coercion semantics? That is, if I specify that a value type A is a subtype of value type B, does this mean that the values of A are a "subset" of the values of B, or that there exists a unique function to convert values of A into values of B?
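To make the two readings concrete, here is a minimal Python sketch. The types A and B, the predicates, and the conversion function are all hypothetical illustrations and are not part of ORM2 or any ORM tool.

```python
# Subset semantics: every value of A is itself a value of B.
def is_b(value: object) -> bool:
    """Hypothetical supertype B: any non-empty string."""
    return isinstance(value, str) and len(value) > 0


def is_a(value: object) -> bool:
    """Hypothetical subtype A: a B value that is exactly two characters long."""
    return is_b(value) and len(value) == 2


# Coercion semantics: values of A are NOT values of B, but a unique
# function converts each A value into a B value.
def coerce_int_to_str(value: int) -> str:
    """Hypothetical coercion from an integer type into a string type."""
    return str(value)
```

Under the first reading, `is_a(v)` implies `is_b(v)` for every `v`; under the second, `42` is not a string at all, but `coerce_int_to_str(42)` is.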

Wed, Jun 1 2011 15:22 In reply to

Ken Evans
Joined on Sun, Nov 18 2007
Stickford, UK
Posts 805

Re: Semantics of subtyping value types


If by "coercion semantics" you are referring to programming techniques such as forcing a number in a text data type to become a number in an integer data type, then no - ORM2 subtypes are not designed to do that.

ORM2 uses semantic subtyping, which modelers usually use for one of two reasons:

1: To declare that one or more specific roles are played only by a given subtype.
e.g. You could design a model with a "Person" object type that had two subtypes: Male & Female.
With the Person object type you could say "Person was born on Date". (common to both subtypes)
However, only the Female subtype would be used in the fact "Female became pregnant on Date".

2: To encourage the re-use of model components
e.g. If your personnel database recorded facts about all employees (e.g. date of birth and home address), you could make "Manager" a subtype of Employee and assign all facts that are unique to managers to the Manager subtype.
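The first reason can be loosely pictured with class inheritance in Python. This is only an analogy under my own assumptions: the class and attribute names are invented for illustration, and programming-language inheritance is not the same thing as ORM subtyping.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Person:
    name: str
    born_on: date  # "Person was born on Date" (common to both subtypes)


@dataclass
class Male(Person):
    pass


@dataclass
class Female(Person):
    # Role played only by the subtype: "Female became pregnant on Date".
    pregnancy_dates: List[date] = field(default_factory=list)

    def became_pregnant_on(self, when: date) -> None:
        self.pregnancy_dates.append(when)


ann = Female("Ann", date(1980, 1, 1))
ann.became_pregnant_on(date(2010, 5, 1))
bob = Male("Bob", date(1975, 6, 30))
```

Both `ann` and `bob` can play the `born_on` role, but only `ann` can play the pregnancy role.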

If you want to know (a lot) more, Terry's Big Brown Book has 22 pages on subtyping in section 6.5.

Hope this helps

Ken

Wed, Jun 1 2011 20:49 In reply to

Re: Semantics of subtyping value types


Robert,

Ken has given you a useful reply that focusses on Entity Type subtyping. Value Type subtyping is a bit different, as I'm sure you know, because of the implications over the range of values that are represented. The simple rule for any logical subtype is that all instances of a subtype must be able to play all the roles of the supertype. This implies subset semantics.

When you speak of coercion, you're speaking of the representation used by the underlying data type, not the conceptual value type. ORM diagrams are only concerned with the conceptual aspect, so coercion isn't relevant. However, ORM tools will provide some data type system, and may allow a subtype A to have a different data type than the supertype B. As long as the subset semantics apply, the type coercion between these two data types is feasible. But note that coercion is a representation (data type) issue, not a conceptual (value type) one. It's not at all clear to me that ORM has any single definitive position on data types, and it hasn't yet been discussed by the FBM working group in regard to standardisation.

What I do in the ActiveFacts metamodel, which underlies the Constellation Query Language, is to allow abstract families of data types (such as Integer) to have concrete subtypes (e.g. 32-bit twos-complement Integer). Indeed, an abstract type such as String might also have abstract subtypes such as Unicode String, which in turn have concrete subtypes (UTF-8 string, UTF-16 string, etc). The only time the use of a concrete subtype is enforced is when mapping, since that requires physicality. So although there appears to be no discontinuity between data types and value types in ActiveFacts (in that a value type gets its representation by inheriting from a data type, rather than by being associated with a data type independently of subtyping, as in NORMA), in fact the discontinuity is there. It's in the mapping rules: if a mapper cannot choose a physical representation for a value type, that value type is treated as an abstract data type and the mapping will fail. The mapping might also introduce additional constraints, such as choosing to use a fixed-size representation for an Integer having no range constraint specified. The modeler must be aware of this behaviour to be able to make use of the mapping, but that's consistent with industry practice.
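The abstract/concrete distinction and the mapping failure described above could be sketched roughly as follows. This is not the ActiveFacts API: `DataType`, `MappingError` and `map_to_physical` are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataType:
    name: str
    supertype: Optional["DataType"] = None
    physical: Optional[str] = None  # None marks an abstract data type


class MappingError(Exception):
    """Raised when no physical representation can be chosen."""


def map_to_physical(data_type: DataType) -> str:
    # Concreteness is only enforced here, at mapping time.
    if data_type.physical is None:
        raise MappingError(f"{data_type.name} is abstract and cannot be mapped")
    return data_type.physical


integer = DataType("Integer")  # abstract family
int32 = DataType("Int32", supertype=integer, physical="32-bit two's-complement")
string = DataType("String")  # abstract
unicode_string = DataType("Unicode String", supertype=string)  # still abstract
utf8_string = DataType("UTF-8 String", supertype=unicode_string, physical="UTF-8 bytes")
```

Here `map_to_physical(int32)` succeeds, while `map_to_physical(unicode_string)` raises `MappingError` because no concrete subtype was chosen.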

I hope that helps clear things up for you. It's not always obvious which things are data types and which are value types, and it's taken me a while to get my head around it. I'd appreciate Terry's comments on my approach as well. I know it has flaws, but I think the alternative approaches also have flaws.

Thu, Jun 2 2011 12:03 In reply to

Matthew Curland
Joined on Sat, Mar 8 2008
Posts 450

Re: Semantics of subtyping value types


Hi Robert,

This isn't a direct response to either Ken's or Clifford's post, so I'll start a third branch on this thread.

NORMA currently allows a subtyping relationship between two value types, but does not do anything with this relationship apart from blocking the sub/super types from switching from ValueType to EntityType. This doesn't mean we haven't thought about the problem, just that we haven't done anything about it yet.

Saying anything coherent about the ValueType subtyping question first requires a discussion of the difference between a data type (string, number, etc) and other object types in your model. There are several breakdowns that I use to consider the differences:

Set approach:

A data type corresponds to the definition of a set of values. A full population of the set is generally not required to define the set, and indeed is often not possible because the population can be countably or uncountably infinite. Not all data types are infinite (bool, enums, restricted ranges, etc), and some data types can be defined extensionally (bool, enum).
An object type always has an asserted population, or is derived from other object types with an asserted population [without loss of generality, I'll classify these derived populations as asserted for the rest of this post]. The population of an object type is always finite, is always stored, and is never implied.

Note that from a computational standpoint the finite/infinite breakdown is extremely important. A query against a potentially infinite set is not guaranteed to terminate. This is why arithmetic operations (operators over infinite data types) are always treated specially in any treatment of logical or relational computation.
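A small Python sketch of this breakdown, using names I have made up for illustration: the data type is defined by a membership rule (no stored population), while the object type is a finite, asserted population.

```python
from itertools import count, islice


def is_natural(value: object) -> bool:
    """Intensional definition of a data type: a rule, not a stored population."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


# An object type has a finite, asserted (stored) population.
person_names = {"Ann", "Bob"}


def naturals():
    """Lazily enumerates the (countably infinite) natural numbers."""
    return count(0)


# A bounded query over the infinite set terminates...
first_five = list(islice(naturals(), 5))

# ...but an unbounded one such as
#     any(n < 0 for n in naturals())
# would never terminate, because no natural number satisfies the test.
```

A query over `person_names` always terminates because the population is finite and stored.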

Conceptual approach:

Values drawn from a data type (the number 10, the string "virus") have no conceptual meaning outside the definition of that data type and its associated operations. Attempting to use a data type in a fact statement involving object types that are unrelated to the data type definition creates a conceptually void statement: "Person has String" tells me absolutely nothing about Person. Is the string a name? an address? a description? There is no way to know because String has no meaning in any model describing a real business domain. Similarly, the string "virus" is also meaningless without associating it with a ValueType in a specific domain (software or biology).
An object type always has conceptual meaning in a domain, even if the data type is not known. For example, "Person has Name" has a meaning to a domain expert without even knowing how Person is identified (auto counter, SSN, name, etc) or whether the name is stored as Unicode or ANSI.

Applied to Subtyping:

In addition to these perspectives, we also need to reconcile the subset relationship between a value type and its data type. Clearly the (finite, asserted) set of all person names is a subset of the (infinite, implicitly populated) set of all strings. So, how does this compare to a subtype between entity types? Entity type subtypes have both a conceptual component (the set of all MalePerson entities is a subset of all Person entities) and an extensional (population) component: each instance of MalePerson asserted to exist in populated ModelX must also be asserted to exist as a Person in ModelX.

For a populated model there is a subtle but important difference between the EntityType->EntityType subtype and the ValueType->DataType subtype. In the entity type case, the existence of a subtype instance means that the supertype instance must also exist (existence meaning that it is recorded in the model). In the value type case, the existence of a value instance does not mean that a corresponding instance must be recorded in the model for the supertype instance. In fact, when applied to any physical mapping, recording the data type instances sounds ridiculous: there's a good reason I've never seen a database with an explicit string table that contains every string used across all columns in the database, or an integer table telling me all of the integers that are used somewhere in the model.

The conclusion here is that there are different subtyping semantics at the instance level. For entity type subtyping, there is a subtyping instance fact type (1-1 with the subtype mandatory) that is either explicitly populated (for separated subtypes) or implicitly populated (for collapsed subtypes) at the physical level. For a valuetype/datatype subtype, this subtyping instance fact type is not populated.
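The instance-level difference can be sketched in Python as two different checks. The function names and sample populations below are hypothetical and are not how NORMA implements anything.

```python
from typing import Callable, Hashable, Iterable, Set


def missing_supertype_instances(
    subtype_population: Iterable[Hashable],
    supertype_population: Iterable[Hashable],
) -> Set[Hashable]:
    """Entity subtyping: every subtype instance must also be recorded
    in the supertype population. Returns the instances that are not."""
    return set(subtype_population) - set(supertype_population)


def drawn_from(value: object, data_type: Callable[[object], bool]) -> bool:
    """Value type / data type: the value only has to satisfy the data type's
    membership rule. No population of the data type is stored or consulted."""
    return data_type(value)


persons = {"p1", "p2"}
male_persons = {"p1", "p3"}  # "p3" was never asserted to exist as a Person
violations = missing_supertype_instances(male_persons, persons)

name_is_string = drawn_from("Ann", lambda v: isinstance(v, str))
```

The first check compares two stored populations; the second never looks at a "table of all strings", which matches the point that such a table is not populated.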

To bring this full circle and apply it to ValueType subtyping, the decision first needs to be made whether we group a ValueType more closely with an EntityType or a DataType. Given that a ValueType has a finite asserted population and has conceptual meaning in a domain model, it is most natural to group it with EntityType instead of DataType. In fact, with this classification the primary difference between an EntityType and a ValueType is that the ValueType is self-identifying whereas the EntityType requires a reference scheme.

Given this perspective, the NORMA plan moving forward is:

Introduce a formal notion of DataType to allow user-defined data types. A 'DataType is a subtype of DataType' relationship represents a restriction (equivalent to a value constraint) of the set in the supertype DataType.
Improve the modeling of ValueType has DataType to indicate 'ValueType draws values from DataType'. This will be functionally the same as the 'DataType is a subtype of DataType' relationship, namely a subtyping relationship with no instance-level fact type requiring a population of the supertype.
Apply the same population-bearing semantics to ValueType/ValueType subtyping as we do to EntityType/EntityType subtyping.

Honestly, I don't anticipate the ValueType/ValueType subtype to be that commonly used, but we need to conceptually distinguish between the two types of subtyping. The definition of an explicit data type (likely displayed with a dotted line instead of dashed) will allow user-defined types and type restrictions (letting you define a standard 'medium length string' in your model) and will also allow a value type to be displayed graphically as a subtype of a data type. This will remove any ambiguity about what is meant by value type subtyping, and allow modelers to see their data types without declaring a ValueType as a conceptually meaningless data type.
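As a rough sketch of a 'DataType is a subtype of DataType' restriction, the following Python models a data type as a membership rule and a subtype as that rule plus an extra constraint. The `DataType` class, the `restrict` method and the 255-character limit are assumptions made up for this example; they do not describe NORMA's design.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataType:
    name: str
    contains: Callable[[object], bool]  # membership rule; no stored population

    def restrict(self, name: str, constraint: Callable[[object], bool]) -> "DataType":
        """Derive a sub-data-type whose values satisfy this type AND the constraint."""
        parent_rule = self.contains
        return DataType(name, lambda v: parent_rule(v) and constraint(v))


string = DataType("String", lambda v: isinstance(v, str))

# Hypothetical 'medium length string': the 255 limit is an arbitrary assumption.
medium_string = string.restrict("MediumLengthString", lambda v: len(v) <= 255)
```

Every value accepted by `medium_string` is also accepted by `string`, so the restriction behaves like a value constraint on the supertype's set.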

-Matt

Thu, Jun 2 2011 18:55 In reply to

Terry Halpin
Joined on Fri, Nov 23 2007
Maleny, Australia
Posts 154

Re: Semantics of subtyping value types


Here's a simple, concrete example that will hopefully clarify the distinction between value types and data types in ORM, at least from my perspective.

Suppose we declare two value types CountryCode and Pronoun to store instances of country codes and pronouns respectively, and assign each of these a character string data type. Now populate CountryCode with an instance that uses the string "us", and populate Pronoun with an instance that uses the string "us". Now declare the unary fact type "Pronoun is plural", and populate it with the fact that the pronoun "us" is plural.

Does it make sense to now infer that the country code "us" is plural? No, I don't think so. In logic there is a principle called Substitutivity of Identicals (SI) that basically says that if x = y then anything you can say about x can also be said about y. Since the pronoun "us" is plural and the country code "us" is not, it follows that the pronoun "us" is not strictly identical to the country code "us". Nor is the character string 'u" plural. This implies that it is incorrect to treat Pronoun and CountryCode as subtypes of String, if we use substitution semantics for subtyping (as we do for entity types).

Instead, we may assign a weaker semantics when comparing instances of value types that basically compares their corresponding data instances. In ORM, when you name a value type you are informally adding semantics on top of the pure data type from which instances may be drawn for the purposes of comparison. A value in ORM is a semantically typed constant, not just a constant.
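The idea of a semantically typed constant can be sketched in Python with two wrapper types around the same string. The class layout and the `same_data` helper are illustrative assumptions only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryCode:
    data: str  # underlying data instance (a character string)


@dataclass(frozen=True)
class Pronoun:
    data: str


# Population of the unary fact type "Pronoun is plural".
plural_pronouns = {Pronoun("us")}


def same_data(x, y) -> bool:
    """The weaker comparison: compare only the underlying data instances."""
    return x.data == y.data


# Strict identity fails across value types, so "is plural" does not carry over.
country_us_is_plural = CountryCode("us") in plural_pronouns
```

The pronoun "us" and the country code "us" are not equal as typed values, yet `same_data` still finds that they use the same string.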

Additionally, value types in ORM are restricted to finite sets, unlike most data types (at least conceptually) such as Integer or String.

Hope this helps

Terry

Thu, Jun 2 2011 19:02 In reply to

Terry Halpin
Joined on Fri, Nov 23 2007
Maleny, Australia
Posts 154

Re: Semantics of subtyping value types


I just spotted a typo in my previous post. Please replace

the character string 'u"

with

the character string "us"

Cheers

Terry

Thu, Jun 2 2011 19:21 In reply to

Re: Semantics of subtyping value types


Terry,

Thanks for the helpful reply, clarifying substitutability and comparability. It looks to me like that clears up the confusion we had over whether a join path can traverse a data type :]

To be fair, CQL doesn't use the word "subtype" except for entity type inheritance. For value types, I simply say "CountryCode is written as String(2)". Though I use the same expression for subtyping value types, I probably should change that to the same as entity types, namely "Age is a kind of TimeInterval" for example. Or at least allow both...

Have you ever come across a situation where it was fair and reasonable for a subtype of a value type to have a different data type? I don't think that should be allowed, and don't plan to support it.

Thu, Jun 2 2011 20:12 In reply to

Terry Halpin
Joined on Fri, Nov 23 2007
Maleny, Australia
Posts 154

Re: Semantics of subtyping value types


Hi Clifford

Though rarely encountered, I think it's OK for a value type that is a subtype of another value type to have a "more restrictive data type", in the sense used in the following example, but not an incompatible data type.

Consider ISO2CountryCode as a subtype of ISOCountryCode, where the former permits only 2-character ISO country codes (e.g. "US") while the latter includes both ISO-2 country codes and ISO-3 country codes (e.g. "USA"). If you include the length facet as part of the meaning of "data type", then the subtype's data type is "char2" while the supertype's data type is "varchar3" with a minlength2 restriction. If you don't include the facet as part of the meaning of "data type", then they have the same data type anyway (e.g. string).
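A minimal Python sketch of the length facets in this example follows. It checks only string length, as described above, and does not validate against the actual ISO code lists; the function names are invented for illustration.

```python
def is_iso_country_code(value: object) -> bool:
    """Supertype facet: 'varchar3' with a minimum length of 2."""
    return isinstance(value, str) and 2 <= len(value) <= 3


def is_iso2_country_code(value: object) -> bool:
    """Subtype facet: 'char2', a more restrictive version of the supertype's."""
    return is_iso_country_code(value) and len(value) == 2
```

Every value accepted by `is_iso2_country_code` is also accepted by `is_iso_country_code`, which is what makes the subtype's data type "more restrictive" rather than incompatible.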

Cheers

Terry


The ORM Foundation