Using data-type for rule modeling

First, I'm putting this in the nORMa forums, as we do not yet have a forum for general ORM questions or comments (and nORMa was used to create the examples).

The question: Is it legitimate to use an assignment of a data-type to implement a rule, while modeling an application domain in Object Role Modeling?

This issue came out of contemplating a mention of "current" limitations of ORM rule modeling in a paper or article by Dr. Halpin. I've lost track of the source, but his comment used the example rule: An Employee's Salary can never be less than their starting Salary - or words to that effect. As I can't recall the source, I don't know when it was written, and don't know to what extent his comment is in line with current ORM capabilities. I believe the topic comes under Dynamic Constraints, and State Transition.

The first figure below shows a typical way to model starting salary and salary adjustments. You should find it matches the terms and general semantic implication of a typical (if incomplete) business UofD.

To model the no-salary-decrease rule, an alternative conception of the actual workings in the application domain is applied. The term "Starting Salary" is commonly used and commonly understood; so a Starting Salary of $50,000 seems an appropriate value (if it matches the pay level of the job in that market). However, an alternative view would have a starting salary of zero that receives an initial increase of $50,000 on hiring. At least in some ways, the latter is a more accurate assessment of what actually happens. Be that as it may, the second form allows the modeling of the "salary cannot decrease" rule - as is shown in the second figure below.

The key feature of the second model fragment is that the Value Object Type "Amount" is given an unsigned integer data-type assignment. Therefore, all salary adjustments are non-negative; and so salary can never decrease.
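As a rough illustration of the idea only (not NORMA output, and not the figure itself), the alternative model can be sketched in Python. The class name `SalaryAdjustment` follows the model; the function `salary_history` and the sample amounts are assumptions made for this sketch. Each adjustment rejects negative amounts, imitating an unsigned integer data type, so the running salary can never go down.

```python
from dataclasses import dataclass
from itertools import accumulate


@dataclass(frozen=True)
class SalaryAdjustment:
    """Hypothetical stand-in for the SalaryAdjustment object type."""
    amount: int  # plays the role of Amount with an unsigned integer data type

    def __post_init__(self):
        # Imitate the unsigned data type: a negative amount cannot be recorded.
        if self.amount < 0:
            raise ValueError("Amount must be non-negative")


def salary_history(adjustments):
    """Running salary after each adjustment, starting from zero."""
    return list(accumulate(a.amount for a in adjustments))


# Initial adjustment of 50,000 on hiring, followed by two later adjustments.
history = salary_history(
    [SalaryAdjustment(50_000), SalaryAdjustment(2_500), SalaryAdjustment(0)]
)
```

Because every term in the running sum is zero or more, each entry in `history` is at least as large as the one before it. Note that the rule is carried entirely by the check inside `SalaryAdjustment`; nothing else in the sketch states it.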

There are other aspects of a comparison of the two model fragments that are worth considering; but for now, the main question: Is the use of data-type assignment a legitimate method to use in modeling application domain rules in ORM?

Re: Using data-type for rule modeling

That's an interesting way of going about it. I think an important factor in this would be the subtype definition - how do you determine what makes a SalaryAdjustment an InitialSalaryAdjustment, and what makes it a SubsequentSalaryAdjustment? Would you want to add a constraint showing that it must be one or the other, and not both?

If you choose the definition "A SalaryAdjustment with a Date earlier than any other recorded Date is an InitialAdjustment", then that could have strange effects should you ever add a new SalaryAdjustment that has a date earlier than any recorded so far, or were to remove data in an instance of the 'Employee has InitialAdjustment' fact. (I hope that made sense.)

Re: Using data-type for rule modeling

Thanks for the input. When I modeled the two approaches, a whole slew of issues came to mind. Primarily, I was interested in what others thought of the general idea of using data-type assignment as a component in modeling a business rule. In an immediate sense, the use depicted does get the job done. However, it's not a general purpose technique, as are internal and external constraints, and primary reference modes, etc....

For one thing, data-type assignment is not obvious on the diagram (not covered by the ORM 2 symbol palette). Adding a Model Note "The use of unsigned integer here is used to..." would provide documentation - but may be more like a sticker on a kludge. Also, once mapped and implemented as an RDBMS database, how secure is a data-type assignment in that process and environment? Bad enough that rule enforcement can get clobbered by de-normalization; would many DBAs have qualms about tweaking a data-type?

I don't know if controlling data-type assignment might become a part of ORM rule modeling; but I think other things would need to be added or changed before that can happen.

I could have made the context clearer, if I showed only the elements near the top of the "Alternative Salary" image (Employee(.id) receives SalaryAdjustment of Amount()). That way, you could see it was all about the data-type for Amount. The rest of the elements help frame the rule, and make it easier to compare to the "Typical" depiction.

Regarding your comments, there is an external constraint (XOR) between the two forms of SalaryAdjustment. The InitialSalaryAdjustment (for Employee(.id)) only occurs on "Hiring", while the other form can occur on any date (other than that of the hiring of that Employee(.id)). There should be no occasion where an employee's salary is adjusted before the date of their being hired.

In the Alternative Salary model, all functions in the UofD that use "Starting Salary" are taken over by InitialSalaryAdjustment. A way to indicate the one-to-one correspondence of the terms would be helpful. A model note would do; but an extension to ORM may be a better answer.

There are other issues (where the alternate approach shows deficiencies in the Typical approach); but I wanted to address the data-type issue first. Thanks again for your comments - it always helps to see it through another person's eyes.

Re: Using data-type for rule modeling

I'm surprised there hasn't been more discussion on your approach. I'll give my tuppence worth since you asked for others' views...

The rule that Amount must be a positive value is not read from the diagram model, and to me this is a shortcoming in the model - I too do not like the idea of leaving the integrity of a rule exposed to a database designer who may be remote from the modeling. Where did the rule that Amount must be positive come from? For argument's sake, let's say it is the Employer's policy. Therefore, the diagram requires an Employer that has a Policy that, as of Date, SalaryAdjustment must be greater than or equal to a minimum Amount. (After all, the Employer may want to alter the Policy at a future date.) And what happens if in the future the Employer decides that negative Amounts are allowed (say, to stay solvent rather than going bankrupt)? Suddenly, that unsigned integer is a problem!
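To make the "Policy as of Date" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the names `AdjustmentPolicy`, `policy_in_force`, and `is_allowed`, and the dates and amounts, are invented for illustration and do not come from the model under discussion. The point is only that the minimum Amount becomes explicit, dated data rather than a side effect of a data type.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AdjustmentPolicy:
    """Hypothetical policy: from effective_from onward,
    a SalaryAdjustment must be >= minimum_amount."""
    effective_from: date
    minimum_amount: int


def policy_in_force(policies, on_date):
    """Return the latest policy whose effective date is not after on_date."""
    applicable = [p for p in policies if p.effective_from <= on_date]
    if not applicable:
        raise LookupError("no policy in force on that date")
    return max(applicable, key=lambda p: p.effective_from)


def is_allowed(policies, on_date, amount):
    """Check an adjustment amount against the policy in force on on_date."""
    return amount >= policy_in_force(policies, on_date).minimum_amount


# Illustrative data only: a "no decreases" policy, later relaxed.
policies = [
    AdjustmentPolicy(date(2009, 1, 1), 0),
    AdjustmentPolicy(date(2010, 1, 1), -5_000),
]
```

With this data, a cut of 1,000 is rejected in mid-2009 and accepted in mid-2010, and no data type has to change when the Employer changes the Policy.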

Re: Using data-type for rule modeling

Good observation. You are right that the role of data types needs more exposure. I agree with you that data integrity should not be left to a technical person - be it a database designer, an OO programmer or any of the other folks who seem to be able to dabble with data without any management oversight. However it seems to me that your observation exposes the "tip of an iceberg" of a widespread failure to manage data at the organizational level.

Regarding data types, I don't see how we can escape from the notion of using data types to constrain data values - because that is what they are for. As I see it, the data type puts a fundamental constraint on the "permissible values" that may be stored in a data element. In this context, the function of an ORM constraint is to put an additional constraint on the range of permissible values that can be stored in a data element. For example: data type = "Text - 5 characters" with permissible values of Peter, Brian, Terry, and Ken.

Using the English alphabet, a text data type with 5 characters means "This data element can have up to 11,881,376 permissible values" (26*26*26*26*26). By adding the ORM value constraint, you are saying that only four of the possible 11,881,376 values are allowed.
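The arithmetic can be checked with a few lines of Python. This is a sketch under stated assumptions: letters only, case ignored, and the variable names are invented here. It also shows a second count, because "Ken" has only three letters; if the data type is read as "up to five characters", strings of one to four letters are permissible as well.

```python
from string import ascii_lowercase

ALPHABET_SIZE = len(ascii_lowercase)  # 26 letters, ignoring case

# Strings of exactly five letters: 26 * 26 * 26 * 26 * 26
exactly_five = ALPHABET_SIZE ** 5

# Strings of one to five letters, if "5 characters" is read as a maximum length
up_to_five = sum(ALPHABET_SIZE ** length for length in range(1, 6))

# The ORM value constraint narrows the permissible values to these four
allowed_names = {"Peter", "Brian", "Terry", "Ken"}
```

Either way, the value constraint admits four values out of roughly twelve million that the data type alone would accept.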

For a domain expert, the overt ORM constraint seems sufficient because it logically includes the data type restriction. However, this guide does not apply across data types as the following examples show.

My point is that there is an urgent need for data management at the organizational level, and ORM offers the best way to meet this need.

Ken

Here are two examples of the consequence of failure to manage data at the organizational level: Mars Climate Orbiter and Ariane 5.

The Mars Climate Orbiter was lost because it missed orbit by 60 km and is thought to have burned up in the Martian atmosphere. A lack of data management was the cause of the loss. According to CNN, the Lockheed Martin team who built the spacecraft used pounds (pound-force) as the unit of thrust, whereas NASA used the metric newton (1 pound-force is about 4.45 newtons). Thus, the spacecraft careered across millions of miles of space with the guidance team scratching their heads and wondering why the spacecraft was not responding correctly to their course adjustment commands. More at

Some of the reports cite a "software error" . I don't agree with this characterisation. In my opinion the problem was a management failure at NASA in that they failed to manage data at the organizational level. The "organizational data management problem" is widespread throughout industry and the public sector.
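One way to picture the kind of mix-up described above is a value that carries its unit with it. The Python sketch below is purely illustrative and is not a description of any NASA or Lockheed Martin software; the class `Force` and its constructors are hypothetical. The conversion factor is the standard definition of the pound-force in newtons.

```python
from dataclasses import dataclass

# One pound-force expressed in newtons (standard definition).
NEWTONS_PER_POUND_FORCE = 4.4482216152605


@dataclass(frozen=True)
class Force:
    """Hypothetical force value that is always stored in newtons."""
    newtons: float

    @classmethod
    def from_newtons(cls, value):
        return cls(float(value))

    @classmethod
    def from_pound_force(cls, value):
        # The unit is declared at the boundary, so it cannot be misread later.
        return cls(value * NEWTONS_PER_POUND_FORCE)


# The same bare number, interpreted under two different unit assumptions.
reported = 100.0
as_intended = Force.from_pound_force(reported)
as_misread = Force.from_newtons(reported)
```

A bare `100.0` says nothing about its unit; the two interpretations differ by a factor of about 4.45, which is the sort of discrepancy an agreed, organization-wide definition of the data element would expose.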

A Google search reveals many discussions showing that the primary cause of the Ariane 5 crash was the unexpected conversion of a 64-bit floating-point number into a 16-bit integer. This triggered a chain of events that led to a self-destruct command about 39 seconds after launch. The ESA accident report talks a lot about "detailed actions" but, from reading the report, ESA managers don't seem to have any grasp of the fact that they, like NASA, have an "organizational data management problem".
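The hazard of squeezing a 64-bit floating-point value into a 16-bit signed integer can be sketched in Python. This is a generic illustration and not a reconstruction of the Ariane 5 flight software; both function names are invented here. The checked version refuses values outside the 16-bit range, while the wrapped version shows one way an unchecked conversion can silently produce a meaningless number.

```python
INT16_MIN = -2 ** 15      # -32768
INT16_MAX = 2 ** 15 - 1   # 32767


def to_int16_checked(value):
    """Convert a float to a 16-bit signed integer, refusing out-of-range values."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError(f"{value!r} does not fit in a 16-bit signed integer")
    return result


def to_int16_wrapped(value):
    """Illustrative unchecked conversion: keep only the low 16 bits
    and reinterpret them as a two's-complement signed value."""
    raw = int(value) & 0xFFFF
    return raw - 0x10000 if raw >= 0x8000 else raw
```

For example, `to_int16_wrapped(40000.0)` gives `-25536`, whereas `to_int16_checked(40000.0)` raises `OverflowError`. The data type of the target element is a real constraint on permissible values, whether or not anyone has written it down.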

ORM can be used to help to manage data at an organizational level and thus help to avoid the huge but hidden costs of dysfunctional data management.

Re: Using data-type for rule modeling

In your post, you made a statement that got me thinking... The following reflects some of my thoughts and understanding of ORM and the issue raised in this thread.

In your post, you said: "Regarding data types, I don't see how we can escape from the notion of using data types to constrain data values - because that is what they are for. As I see it, the data type puts a fundamental constraint on the "permissible values" that may be stored in a data element. In this context, the function of an ORM constraint is to put an additional constraint on the range of permissible values that can be stored in a data element. For example: Data type = "Text - 5 characters" with permissible values of Peter, Brian, Terry, and Ken."

The way I view ORM, the initial ORM model is a Conceptual Model, and is pretty much independent of data types.

Example 1: if I model a ValueType as having permissible values “seven”, “pi”, and “e”, is it a number or a string data type? It’s when I design the implementation that I need to make that decision.

Example 2: if I have a ValueType to represent a byte count, is just using an unsigned integer in my model adequate? No, because a technician on a 32 bit system may go for a 32 bit solution, which could fail when a 5GB file is loaded.
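The byte-count example can be checked with simple arithmetic. The Python sketch below assumes "5GB" means 5 * 10^9 bytes and uses a hypothetical helper, `fits_in_uint32`; it is an illustration, not a statement about any particular system.

```python
UINT32_MAX = 2 ** 32 - 1  # largest value of a 32-bit unsigned integer: 4,294,967,295


def fits_in_uint32(byte_count):
    """True if the byte count can be stored in a 32-bit unsigned integer."""
    return 0 <= byte_count <= UINT32_MAX


five_gb = 5 * 10 ** 9  # assumed meaning of "5GB" in the example

# What a 32-bit counter would hold if the count silently wrapped around.
wrapped = five_gb % 2 ** 32
```

Here `fits_in_uint32(five_gb)` is false, and a wrapped counter would report 705,032,704 bytes. "Unsigned integer" in the model says nothing about the size the implementation must support.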

So, in your example ORM constraint, “values of Peter, Brian, Terry, and Ken” is a statement of THE Requirement (Business Rule) at the conceptual level (the “problem space”), and choosing the data type, "Text - 5 characters", is a design decision in the “solution space”. It’s only when we design the implementation (the “solution space”) that it becomes a constraint, because the “solution space” data type is an imperfect approximation (allowing invalid values) to the Requirement. Even if we had a data type (say, “My4Names”) that could only hold the permissible values listed in the ORM constraint, then we would select “My4Names” and not “Text - 5 characters”, but the model should still display the requirement (because ideally the conceptual model is independent of the implementation).
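For readers who think in code, the difference between the two choices can be sketched in Python. `My4Names` is the hypothetical type named above, rendered here as an enumeration; `text5_accepts` is an invented stand-in for a "Text - 5 characters" column. Neither is anything NORMA generates.

```python
from enum import Enum


class My4Names(Enum):
    """Hypothetical data type that can hold only the four permitted values."""
    PETER = "Peter"
    BRIAN = "Brian"
    TERRY = "Terry"
    KEN = "Ken"


def text5_accepts(value):
    """Stand-in for "Text - 5 characters": any string of at most five characters."""
    return isinstance(value, str) and len(value) <= 5


def my4names_accepts(value):
    """True only for the values listed in the value constraint."""
    try:
        My4Names(value)
    except ValueError:
        return False
    return True
```

The string "Alice" passes `text5_accepts` but fails `my4names_accepts`, which is the "imperfect approximation" in miniature. In both cases the requirement itself still has to be stated somewhere other than the type choice.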

That is, items in the “solution space” (implementation) should be mappable to items in the “problem space” (concept model).

If we look at Brian’s original question: “Is the use of data-type assignment a legitimate method to use in modeling application domain rules in ORM?”, then I would say No, because choosing the data type is an implementation design decision for an unstated requirement. It appears that not only is Brian’s example implying the business rule “SalaryAdjustment must be always positive”, it also appears to be implying the compound business rule “SalaryAdjustment must be always positive AND SalaryAdjustment must be in whole units”. Using a Model Note to display such rules at least gets them stated in black and white, even if it may not be the best way to model them.

[And now my head hurts so I’ll post this before I do too much more damage to my grey matter.]

Re: Using data-type for rule modeling

You have highlighted what I feel to be a very important issue about the role of data types.

PeterC:

The way I view ORM, the initial ORM model is a Conceptual Model, and is pretty much independent of data types.

Well as you have already commented - I don't see it that way.

On page 219 of the Big Brown Book (BBB), Terry says "Although choosing syntactic data types is not the most conceptual or exciting aspect of modeling, it's something you need to do before you implement the model. Otherwise, you'll get whatever default data type the system provides, and this won't always be what you want."

So now, you may be asking yourself "What is a syntactic data type?"

PeterC:

if I model a ValueType as having permissible values “seven”, “pi”, and “e”, is it a number or a string data type?

Here is some text that uses your terms:

One day, Seven recorded in the ship's log the fact that she had seen the captain eating pi. The next day she realised her mistake in that she had left the "e" off the word pi and so corrected the log to record the fact that she had seen the captain eating pie.

This will make perfect sense to people who are familiar with the Star Trek Voyager series.

There are several points that I want to bring out. The first is that on their own, words do not have any meaning. Words only have meaning within a context.

On page 29 of the BBB, Terry uses the term "conceptual database" and in the surrounding text mentions that for effective communication between two people, there is a need for a common language and that each person must assign the same meaning to the terms being used. Later, starting on page 212, Terry discusses set theory and how ORM makes use of it.

An elementary fact in ORM is made up of objects playing roles. In ORM, the object names are labels for sets and as you know, sets are defined by their members.

Thus, the term "seven" could mean "The integer that is between six and eight." or it could mean "The abbreviated name for the character Seven of Nine as depicted in the TV series Star Trek Voyager." The only way that you can know which of these meanings to use is to look at the semantic context of the utterance.

So, when you define an elementary fact, you need to be specific about the set of things to which each object refers.

This is why I see the data type as being an important part of the semantics of a conceptual object-role model.

Re: Using data-type for rule modeling

May I jump in? Since you're discussing such an important issue, I hope you won't mind if I critique your latest argument just a little.

Pretty much everything you said is true; the problem is that it doesn't support (logically) the point you're trying to prove. You make a persuasive case that we need to be careful in naming our object types -- and in particular, the values (character sequences) we use to identify things, might not identify them very readily (to the reader of our model or of our conceptual database) unless we name helpfully the value types we're using.

All well and good. But how does that imply we need syntactic data types in our conceptual modeling (rather than just in our implementing)? If I'm missing some logical connection here, please help me by pointing it out. Otherwise, I'll consider your (apparent) argument non-compelling.

Re: Using data-type for rule modeling

Thanks for your comments. The first point I want to make is that I'm trying to build a bridge between the ideas in the dozen or so books on language in my library and ORM. Clearly I need to do a better job!

The book I reviewed today before responding to Peter is: Language in Thought and Action Fifth Edition, Hayakawa, S.I. and Hayakawa, A.R.

The authors make some very powerful (and for me compelling) observations about the way humans make unconscious assumptions about language and its relationship with reality. I don't agree with everything they say but this 196 page book contains lots of stuff that seems to me to be very valuable in understanding semantics. The authors expose what they call "linguistic chaos" and (for example) discuss the widespread confusion between "the symbol" and "the symbolised".

They use a character called "Mits" (the Man In The Street) to make general observations such as:

Page 10: "Like most people, he takes words as much for granted as the air that he breathes, and he gives them about as much thought."
Page 11: "One reason for Mits's failure to get any further in thinking about language is the belief that words are not really important: what is important is the "ideas" they stand for. But what is an idea if it is not a verbalization of a cerebral itch."

One of the "big points" they make is from semiotics: "the signifier is not the signified". Or in their more accessible assertion: "The map is not the territory." Or as René Magritte put it, "Ceci n'est pas une pipe".

Andy Carver:

You make a persuasive case that we need to be careful in naming our object types -- and in particular, the values (character sequences) we use to identify things, might not identify them very readily (to the reader of our model or of our conceptual database) unless we name helpfully the value types we're using.

Well that's not quite what I have in mind. I don't see "naming" as the big issue (not to say that it's not important - but naming is important for reasons other than the general point I want to make about "object types" in ORM.)

As I see it :

An ORM object type is a container for a set of values.
The name of the object type is arbitrary, i.e. the "symbol set" that is used as the name of an object type is not related to the members of the set for which it acts as a name.
The members that form a set do so because they have one or more properties in common.

Now let's take Peter's example of "Seven", "pi" and "e". Peter asks "if I model a ValueType as having permissible values “seven”, “pi”, and “e”, is it a number or a string data type?"

As I see it, the only logical answer I can give to this question is that these three symbol sets will only fit into a text datatype of 5 characters. Because:

1: The "numbers" represented by the three symbol sets are not the same as the symbol sets themselves (the map is not the territory).
2: From a mathematical perspective "the number seven" is an integer. The "number" pi is an irrational number and the mathematical concept "e" is a unique real number.

From these two points the only conclusion I can draw is that if "seven" "pi" and "e" are members of a set, then they can only be members of the set of text strings with five characters or less.

Now, I have been able to draw this conclusion by looking at the three values in terms of "semantic data types".

I rest my case.

NOTE: Of course this is a "narrow" interpretation based only on the evidence available in Peter's question. Since we don't know the context in which these symbol sets are to be used, it remains the case that my "Seven of Nine" explanation could also be true.

Re: Using data-type for rule modeling

You said: “… on their own, words do not have any meaning. Words only have meaning within a context.” I could not agree with you more.

What I wanted to achieve in the first example (permissible values “seven”, “pi”, and “e”) was to demonstrate that <context in a conceptual model> and <data types as available in NORMA> do not necessarily match. In this example, string values (as symbols) could be, and sometimes should be, used to represent numeric values. I deliberately did not qualify the question further as I wanted to stimulate thought. In fact I had a numeric context in mind as I did not see much point in arguing over a string context. Given that the intended context is numeric, in my conceptual model I would only need to model the ValueType as “numeric”; but when it came to implementation I would need to consider the scale and precision I required (just as I needed to in the second example: ValueType to represent a byte count).

When I stated that I considered Conceptual Models as being pretty much independent of data types, I wasn’t saying that they were totally independent, but that the data type at the concept level should be independent of the target system (and thus usually much simpler - examples are string, numeric, integer), even if there is a one to one mapping between the two. When I used to use awk/nawk/tawk, my variables either contained strings (including the empty string) or numbers - no precision, types, or lengths were specified, except during input/output.

And yes, when I am making a conceptual model, I do need to give guidance or instructions on how ValueTypes can or need to be implemented, but I consider this information as implementation design decisions, and not part of the concept proper. For example, if my concept model says that Object X is identified by ID, and that ID’s sole purpose is to uniquely identify each instance of X, then conceptually I’m not interested in what ID is, it just has to be a unique value; but I am very interested in how it’s implemented.

Expanding further on Brian’s original question: “Is the use of data-type assignment a legitimate method to use in modeling application domain rules in ORM?”, the example rule that Brian was trying to model is: An Employee's Salary can never be less than their starting Salary. In my mind, if I were to model this rule as a data type, I would select “positive number”, not “unsigned integer”, because the latter also enforces a second rule that was not stated (namely, SalaryAdjustment must be in whole units). Only problem is, “positive number” is not an option in NORMA. So it would have to be “number” with a separate rule/constraint that said that this number must be greater than zero. And if we are going to do that, we may as well consider that zero is a special case that may change with time.
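The difference between the two candidate data types can be made concrete with a small Python sketch. The two predicates are hypothetical and stand for the rules each choice would enforce; they are not NORMA data types. `Decimal` is used so that amounts with cents are represented exactly.

```python
from decimal import Decimal


def is_unsigned_integer(value):
    """Rule implied by "unsigned integer": not negative AND in whole units."""
    return value >= 0 and value == value.to_integral_value()


def is_positive_number(value):
    """Rule implied by "number" plus a separate constraint: greater than zero."""
    return value > 0


whole_raise = Decimal("2500")
raise_with_cents = Decimal("1250.50")
no_change = Decimal("0")
pay_cut = Decimal("-10")
```

An adjustment of 1250.50 passes `is_positive_number` but fails `is_unsigned_integer`, which is the unstated "whole units" rule at work. Zero goes the other way: an unsigned integer accepts it, while "greater than zero" does not, so the treatment of zero also has to be decided explicitly.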

There appears to be a (thankfully very small) difference of opinion between the current ORM/NORMA philosophy and my views/understanding on data types at the conceptual model level, and this may change over time as I become older and wiser. For now, I am constrained by what tools are available to me, so I’ll just have to “get over it” :-( (Mind you, I’m not complaining, I think ORM/NORMA is brilliant.) If you feel that there is more discussion on data types then perhaps it should be in a new thread as this one is starting to go off-topic?

Re: Using data-type for rule modeling

the data type at the concept level should be independent of the target system

This is a nice idea. It is implemented in VEA in the form of "Portable" data types. But I think that there may be hidden problems with this approach...

PeterC:

even if there is a one to one mapping between the two.

Now this is one of the problems. I don't see how there can be one-to-one mapping between something that is potentially a one-to-many mapping. For example, in the case of DDL, at some stage you have to choose a driver for a specific database. Different databases have different data type sets (e.g. DB2 / Oracle / SQL Server 2000 / SQL Server 2005 / SQL Server 2008).

PeterC:

For example, if my concept model says that Object X is identified by ID, and that ID’s sole purpose is to uniquely identify each instance of X, then conceptually I’m not interested in what ID is, it just has to be a unique value; but I am very interested in how it’s implemented.

Very interesting observation.

To paraphrase Shakespeare, "To OID or Not to OID? That is the question." In the Relational Model paradigm, data is identified by its content. The concept of the OID is nowhere to be found. You identify a "Unique instance of X" by saying that "this row in this table is unique". This maps back to the "fact-oriented" nature of ORM. In ORM, instances of fact types are represented by rows in tables. No OIDs required!

In his informative blog, Jeff Atwood makes the following observation: (Note in this text the term "ORM" means "Object-Relational Mapping")

"Personally, I think the only workable solution to the ORM problem is to pick one or the other: either abandon relational databases, or abandon objects. If you take the O or the R out of the equation, you no longer have a mapping problem."
http://www.codinghorror.com/blog/archives/000621.html

My own view is a bit different. I think that OO is very useful, however like all things it has its place. For example, it seems to me that if you are going to develop a transaction based system, then it's best to proceed as follows:
1: Define an object-role model.
2: Generate a relational schema & database.
3: Write code to interact with the database.

Now to return to data types and conceptual models: The way you have expressed your perception of your hypothetical application domain (permissible values “seven”, “pi”, and “e”) is to talk as though the value type comes first. This is not the case with ORM because the basis of an object-role model is the elementary fact. Thus, the objects that have data types only exist within the context of an elementary fact. This is in contrast to what I call the "lexicographer's approach" that sees some kind of "dictionary of terms" as an independent artefact.

To be sure, when you create an object-role model, you end up with a "dictionary of ORM objects" and very useful it is too. However, this dictionary is derived from the set of facts, not the other way around.

So, in your seven, pi and e example, these "things" would exist in the object-role model because they participate in an elementary fact. And therefore, "Seven wrote in the ShipsLog on StarDate 5401.2" is just as correct as "On StarDate 5401.2, the ShipsLog contained seven entries." The difference in the meaning of the term "seven" becomes clear by examining each fact instance.

And I don't understand your claim that a positive number is not an option in NORMA. You just define the object as numeric and define its value range in a way that says that it cannot be less than zero.

Re: Using data-type for rule modeling

Hi Ken,

Sorry if this is now off-topic, but I’ll try to clarify statements that you’ve found confusing because if you have trouble with them, then there may be many other readers also scratching their heads.

I said: “even if there is a one to one mapping between the two.” and you said: “I don't see how there can be one-to-one mapping between something that is potentially a one-to-many mapping.” I don’t know whether there is, or even could be, a one-to-one mapping. What I was saying is that even IF there was a conceptual data type that had the same name and value constraints as a logical or physical data type, they are still two different things and not to be confused.

I’m afraid I got lost in what you are saying in all but the last paragraph, starting from “Now to return to data types and conceptual models”. I think that what you are saying is that:
a) by asking a question on what certain words mean (“seven”, “pi”, etc) without providing the context (elementary fact), I’m putting the cart before the horse - and this was a furphy because that is not how ORM works; and/or
b) an elementary fact sets the context for a ValueType, and therefore a conceptual data type is not required?

And lastly, you said: “And I don't understand your claim that a positive number is not an option in NORMA. You just define the object as numeric and define its value range in a way that says that it cannot be less than zero.” I was talking about a conceptual data type (come to think of it there are no logical or physical ones either?) that says “I am a positive number” in the same way that there is an “unsigned integer” that says “I am a positive integer”. Your second sentence is a paraphrase of my statement “So it would have to be “number” with a separate rule/constraint that said that this number must be greater than zero.” I am not sure what caused the confusion, except perhaps using the word “separate” instead of the word “additional” as you have used in an earlier post.

At this point I'll only continue if it's on topic, but I'm open to being enlightened in an appropriate thread.