Reference Modes explained

Reference modes are not described in much depth in Halpin and Morgan (2008): the different kinds of reference mode (popular, general, and unit-based) are introduced on pages 85 and 98, but the ORM syntax cheat sheet does not distinguish between them. I tried to unwrap the semantics of reference modes in a picture (source):

I also added a non-standard extension to implicitly specify a supertype, similar to the syntax that implicitly defines a dimension supertype (unit-based reference modes). Please correct me if I am wrong or if the picture contains errors. As I understand reference modes, they just abbreviate binary 1-to-1 relationships that are used to uniquely identify an entity by a value.
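To make that reading concrete, here is a minimal sketch of a reference mode as a binary 1-to-1 fact type. The class name `ReferenceScheme` and its methods are hypothetical and only illustrate the idea; they are not taken from any ORM tool.

```python
class ReferenceScheme:
    """Binary 1-to-1 fact type 'Entity has Value' used to identify entities.

    Hypothetical illustration: each entity has exactly one value
    (uniqueness on the entity role) and each value refers to at most
    one entity (uniqueness on the value role).
    """

    def __init__(self, entity_type, value_type):
        self.entity_type = entity_type
        self.value_type = value_type
        self._value_of = {}   # entity -> identifying value
        self._entity_of = {}  # identifying value -> entity

    def add(self, entity, value):
        # Uniqueness on the entity role: one identifying value per entity.
        if entity in self._value_of:
            raise ValueError(f"{entity!r} already has an identifier")
        # Uniqueness on the value role: one entity per identifying value.
        if value in self._entity_of:
            raise ValueError(f"{value!r} already identifies an entity")
        self._value_of[entity] = value
        self._entity_of[value] = entity

    def lookup(self, value):
        """Return the entity identified by the given value."""
        return self._entity_of[value]


# Person(.name) read as 'Person has PersonName'
scheme = ReferenceScheme("Person", "PersonName")
person = object()  # stand-in for a Person instance
scheme.add(person, "Ada")
```

With both uniqueness checks in place, `scheme.lookup("Ada")` returns the one person registered under that name, and a second `add` with the same name is rejected.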

Re: Reference Modes explained

Simple reference scheme patterns are shorthand notations for what you show here in the first, second, and fourth graphic lines, with two exceptions:

The PersonName role is not explicitly marked as mandatory (it is implicitly mandatory because PersonName plays no mandatory non-existential roles).

The right-hand uniqueness constraints are the preferred ones (drawn as double lines).

(Tool-specific comment) NORMA records reference mode patterns in the object model, but the reference modes themselves are implied by the constraint patterns and value type names. The notion of 'reference mode' is not used in any mapping algorithms, verbalization, etc. If you manually enter the correct pattern, you will automatically have a reference mode value. The notion of reference mode collapsing belongs to the shape meta model, not the ORM core object model. Popular reference mode value type names always include the entity type name, whereas unit-based and general modes never include the name. Eventually we will add formal units to the meta model, which will further distinguish the unit-based and general reference mode patterns. (End of tool-specific comment)
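The naming rule in the tool-specific comment can be sketched roughly as follows. The function `value_type_name` and its exact format strings are assumptions made for illustration, chosen to reproduce the names used in this thread (PersonName, $Value, URL); they are not NORMA's actual settings or API.

```python
def value_type_name(entity_type, mode, kind):
    """Derive a value type name from a reference mode (hypothetical rule).

    kind is one of "popular", "unit", "general". Only the popular
    pattern includes the entity type name in the result.
    """
    if kind == "popular":
        # Assumed format: entity type name followed by the capitalised mode.
        return f"{entity_type}{mode[:1].upper()}{mode[1:]}"
    if kind == "unit":
        # Assumed format: unit followed by 'Value'; entity name not used.
        return f"{mode}Value"
    if kind == "general":
        # Assumed format: the mode name as is; entity name not used.
        return mode
    raise ValueError(f"unknown reference mode kind: {kind!r}")


names = [
    value_type_name("Person", "name", "popular"),
    value_type_name("Price", "$", "unit"),
    value_type_name("Page", "URL", "general"),
]
```

Because the unit-based and general names do not depend on the entity type, Page(URL) and Site(URL) end up sharing a single URL value type under this rule, while Person(.name) and Company(.name) would get distinct value types.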

I am not comfortable with the subtyping extensions you represent here. At the instance level, your subtype relationships are indicating subset relationships between identifiers. This makes sense if you strip the conceptual meaning from the identifiers, but ORM is a conceptual modeling language, and subtyping represents an 'is a' relationship. It makes conceptual sense to say 'a Dog is an Animal' or 'a MalePerson is a Person', but it does not make sense to say 'a Price is a Money'. If the 'is a' verbalization does not make sense, then you do not have a good candidate for an ORM subtype. We do not consider an entity to be a subtype of its identifier because, for example, 'a PersonName is a Person' is conceptual nonsense.

Unit-based reference schemes are admittedly confusing because the value has a double meaning (the value and the dimension). The idea behind units is that they can be automatically transformed based on either static (mass, distance, temperature, etc.) or dynamic (monetary exchange rate) dimension conversion information. So, theoretically, you can automatically convert values that share a dimension. For example, a model could record height in inches and weight in pounds, and still automatically produce an accurate BMI value given these inputs (BMI is kg/m^2). Of course, this is non-trivial in practice due to round-off errors and other concerns (recording inches in integers makes sense for the BMI problem, but inferring from this that meters should be rounded to integer units would be disastrous). I think the issue with your model is that you have the dimension (Money) and the unit ($) turned around. Price is recorded in dollars, which is a unit of money. If you want to do this subtyping (which is not a strong enough relationship to indicate the dimension conversions), you get 'Price is specified in DollarAmount, DollarAmount is a subtype of MoneyAmount'. This is a different meaning than is indicated by your diagram.
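The BMI example can be sketched in a few lines. The function names are hypothetical; the two conversion factors are the exact definitions of the international inch and the avoirdupois pound.

```python
INCH_TO_METRE = 0.0254          # exact by definition
POUND_TO_KILOGRAM = 0.45359237  # exact by definition


def bmi(height_in, weight_lb):
    """BMI in kg/m^2 from a height in inches and a weight in pounds."""
    height_m = height_in * INCH_TO_METRE
    weight_kg = weight_lb * POUND_TO_KILOGRAM
    return weight_kg / height_m ** 2


def bmi_with_rounded_metres(height_in, weight_lb):
    """Same calculation, but wrongly carrying integer precision over to metres."""
    height_m = round(height_in * INCH_TO_METRE)  # 70 in is 1.778 m, rounded to 2 m
    weight_kg = weight_lb * POUND_TO_KILOGRAM
    return weight_kg / height_m ** 2


accurate = bmi(70, 154)                       # roughly 22.1
distorted = bmi_with_rounded_metres(70, 154)  # roughly 17.5
```

Integer inches are a sensible recording precision for the input, but applying the same precision to the converted value shifts the result by more than four BMI points in this case.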

As for the PageURL->URL subtype, this breakdown makes more sense to me than Price->Money, but I'm not sure what this breakdown adds. A general reference scheme does not prevent the value type from being used as an identifier for more than one item. So, you can have an instance of Page(URL) and Site(URL) that share a URL identifier. Certainly you can consider PageURL and SiteURL as subtypes of URL with a disjunctive mandatory constraint across the subtypes. However, if you expect an exclusion across the subtypes as well, then you've moved beyond the meaning of the general reference mode pattern.
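As a rough instance-level illustration of the difference between the two constraints, the sketch below treats PageURL and SiteURL as sets of URL values. The example.org URLs and the helper names are made up for the example.

```python
# Hypothetical populations: the same URL identifies both a Page and a Site.
page_urls = {"http://example.org/", "http://example.org/about"}
site_urls = {"http://example.org/"}
all_urls = page_urls | site_urls


def disjunctive_mandatory(supertype, *subtypes):
    """True if every supertype instance belongs to at least one subtype."""
    return all(any(x in s for s in subtypes) for x in supertype)


def exclusive(*subtypes):
    """True if no instance belongs to more than one subtype."""
    seen = set()
    for s in subtypes:
        if seen & s:
            return False
        seen |= s
    return True


covers = disjunctive_mandatory(all_urls, page_urls, site_urls)
disjoint = exclusive(page_urls, site_urls)
```

This population satisfies the disjunctive mandatory constraint but violates exclusion, which is exactly the situation a general reference mode permits.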

Re: Reference Modes explained

Hi Matthew!
Thanks for the detailed feedback! I changed the picture to show the preferred uniqueness constraints and the implied parts of the model. I think there is no standard syntax for implied model parts so I implemented them in grey -- better suggestions are welcome. I thought about using italics for implied types but there can also be implied model parts without text, so color seems better.

The statement "a Price is a Money" sounds strange, but it's ok to say "a Price is a MoneyValue" (or MoneyAmount, but then it is also kgValue). I tried to capture your "Price is specified in DollarAmount, DollarAmount is a subtype of MoneyAmount" as the implied meaning of the unit-based reference mode with the "Money" dimension in the new version: each MoneyValue can have exactly one $Value, and it may have other values in other currencies.

The PageURL->URL subtype does not add much, but it lets you talk about a "URL that references a Page" as a "PageURL". I think that in natural language we regularly mix identifiers and entities, so it should be easier to talk about identifiers when we mean identifiers.

Re: Reference Modes explained

The preferred identifiers are placed opposite the identified entity, so they go on the right role here.

I'm still not comfortable with the Price/MoneyValue/$Value relationships. First, the subtype between Price and MoneyValue seems backwards. Second, I don't think this should be a direct subtype at all. In order for price to be meaningful, you need to know the unit ($) and the associated dimension (Money, in this case). So, while it does make sense to say Price is a DollarValue/DollarValue is a MoneyValue, I don't think that putting price directly to the MoneyValue is the best thing to do because you only have partial information (the price is meaningless if you know the dimension but not the unit). If you take this to its logical conclusion, then you have 1-1 relationships between MoneyValue and all supported currencies. The identifier for MoneyValue gets very messy at that point as well. Try adding a parallel EUValue and specifying an identifier for MoneyValue to see this.

For Article/TextValue, what you're implying here is that an Article is identified by the contents of the article. Identifiers are always unique, so you're also implying existential uniqueness, meaning that no two articles could have the same text. Existential uniqueness is not unheard of (defining a team by the players on that team, for example), but it is rarely convenient (it is much easier to name the team, or to identify it by a single leader such as a captain or coach). The simplified identification process is natural, whereas constantly referring to a team by the player names is not.
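The team example can be sketched to show what existential uniqueness implies in practice. The registries and function names below are hypothetical; the roster is stored as a `frozenset` so that the set of players itself acts as the identifier.

```python
teams_by_roster = {}  # identifier is the full set of players
teams_by_name = {}    # identifier is a simple name


def register_by_roster(players, details):
    """Identify a team by its players: no two teams may share a roster."""
    key = frozenset(players)
    if key in teams_by_roster:
        raise ValueError("a team with exactly these players already exists")
    teams_by_roster[key] = details


def register_by_name(name, players, details):
    """Identify a team by name: rosters may repeat, names may not."""
    if name in teams_by_name:
        raise ValueError(f"team name {name!r} is already taken")
    teams_by_name[name] = (frozenset(players), details)


register_by_roster(["Ann", "Bob", "Cy"], "Tuesday league")
register_by_name("Owls", ["Ann", "Bob", "Cy"], "Tuesday league")
register_by_name("Larks", ["Ann", "Bob", "Cy"], "Thursday league")
```

Under the roster-based scheme, every reference to the team has to list all of its players, and a second team with the same players cannot exist; the name-based scheme avoids both restrictions.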