The Seven Information Smells of Domain Modelling

Domain modelling (see sidebar) is a powerful technique that many IT professionals have in their toolkit. Unfortunately, a couple of issues with domain modelling have caused it to fall out of favour over the past few years, especially in Agile circles. Two real problems with the approach are that it takes too long, and that it is prone to “analysis paralysis”, which results in “spinning wheels”. We present an approach to domain modelling that addresses these issues.

We discuss signals in your domain model that tell you there are more questions to ask. We call these signals “information smells”, and they tell us we may not have a complete understanding of the information our domain cares about. A smell could mean that we are missing information from our domain model or that we have included incorrect information in it. Focusing on the information smells leads us directly to the questions we need to ask, which makes the process very fast. When all of the information smells are gone, or we decide the remaining ones are acceptable, we stop, which avoids analysis paralysis.

The process starts with the output from the system that delivers value to the user. We do not explain how to find value in this article. (We will cover this in a later InfoQ article called “An Introduction to Domain Modeling”.) We then address each of the information smells in the model based on the output.

To demonstrate the information smells in this article, we use a fictitious example inspired by several real life cases. We have an HR director who is looking to understand what various developers are being paid so that they can avoid lawsuits resulting from pay inequities between different demographic groups.

During the discussion that ensued when the team was seeking to understand the director’s request, the following sketch emerged:

The best tools to use for domain modelling are pen and paper, sharpie and index card, or dry erase marker and whiteboard, because they place the emphasis on the information being exchanged rather than on making the examples or models “look pretty”. That being said, we created the models using a diagramming tool so that you do not get lost in our poor handwriting. As a result, here is a cleaner version of the example:

There are some important things to remember about information smells and what they tell us about our domain model.

Information smells do not prove that there is an issue.

Information smells are a strong indicator that there might be an issue.

Information smells are not as strong as rules, which are always correct.

The questions that the information smells reveal should always be expressed in the form “Please give me an example of...” rather than “Tell me how...”. We are looking for examples to explore the details of the domain rather than generalisations about the domain that hide the details.

So without further ado, here are the main information smells. If you find that you can pick up another scent, let us know so we can pass it along.

An item is in the output that is not in the model.

An item is in the model that is not in the output.

Two pieces of information in one place.

An entity is not related to any other entity.

One to one relationships.

Many to many relationships.

Undefined functions.

Following is a more detailed description of each information smell.

Information smell #1 – an item is on the output that is not in the domain model

All the items on the output need to be in the domain model. The output is simply a display of the data in the model. Every piece of information displayed on the output needs to be an attribute or method on the model. In the example above, department, average salary, role, salary, sex and race are missing from the domain model. To mitigate this smell, add them as attributes or methods. If the appropriate entity to put them on is missing, add the entity.
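The check behind this smell is a comparison of two sets, so it can be sketched mechanically. The following is a minimal illustration in Python, not part of the method itself. The names `output_items`, `model` and `missing_from_model` are hypothetical, and we assume for the sake of the example that the starting model holds only an Employee with a name.

```python
# Hypothetical sketch: items on the report versus items on the model.
# Assumption: the initial model contains only Employee.name.
output_items = {"name", "department", "average salary", "role", "salary", "sex", "race"}

# Each entity maps to the attributes and methods it currently holds.
model = {"Employee": {"name"}}


def missing_from_model(output_items, model):
    """Return the output items that no entity in the model provides."""
    modelled = set().union(*model.values())
    return sorted(output_items - modelled)


print(missing_from_model(output_items, model))
```

Each item returned is a candidate attribute or method, and possibly a hint that an entity is missing.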

Information smell #2 – items on the model that are not on the output

Items on the model but not on the output are an example of “push” in the analysis process. The analyst thinks they need a value even though they do not. They are pushing values into the model. This is dangerous, as you may end up doing extra development to add and maintain this value. It is only a smell, though, and the item was probably added because the analyst thought it was useful. To resolve the smell, the analyst should ask the user whether they need the information. Note that this is a breakdown in process on the part of the analyst, as they should record questions for the users in an issue log and not in the model. The smell may also occur because another project has requested some information and it is being “slipped” into this requirement. That additional requirement should be treated as exactly that: a separate piece of work.

In Structured Systems Analysis and Design Method (SSADM), this was referred to as an “Information Sink”.
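This smell is the reverse comparison: items on the model that the output never shows. The sketch below is illustrative only. The names are hypothetical, and the `birthday` attribute is borrowed from the worked example in the sidebar.

```python
# Hypothetical sketch: find modelled items that never appear on the output.
output_items = {"name", "salary"}
model = {"Employee": {"name", "salary", "birthday"}}


def not_on_output(output_items, model):
    """Return (entity, item) pairs that are modelled but never displayed."""
    return sorted(
        (entity, item)
        for entity, items in model.items()
        for item in items
        if item not in output_items
    )


print(not_on_output(output_items, model))
```

Each pair is a question for the user rather than an automatic deletion. An item may legitimately stay on the model if a calculated item on the output depends on it.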

Information smell #3 – two bits of information in one place (1NF)

Keeping two bits of information in one place is messy. The name “John Smith” could be stored as a first name and a second name. It is only a smell, though. In some cases storing a name in one place is appropriate, whereas in other cases it is not. The key question is whether you want to analyse or process the two data items independently. In normalisation, this is known as a violation of “First Normal Form” or 1NF. Although 1NF is really a design rule, it can lead to the discovery that two diverse pieces of data are being referred to as one. Violations of first normal form are identified by looking at real data, because the data is normally referred to by a single name. Another example is Race, which contains “Jedi”, a belief or religion, and “IC1”, an ethnic classification used by police in the UK.
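To see why independent processing is the key question, consider grouping people by family name when the name is held as a single value. This is a hedged sketch with invented sample data and a hypothetical function name. It guesses that the last word of the name is the family name.

```python
from collections import defaultdict

# Invented sample data; only "John Smith" appears in the article.
full_names = ["John Smith", "Jane Smith", "Ann Jones"]


def group_by_family_name(full_names):
    """Group names by their last word, a naive guess at the family name."""
    groups = defaultdict(list)
    for full_name in full_names:
        family_name = full_name.split()[-1]
        groups[family_name].append(full_name)
    return dict(groups)


print(group_by_family_name(full_names))
```

The guess breaks down for family names of more than one word or for cultures that put the family name first. If the user needs this kind of analysis, that is the signal to hold the two pieces of information separately.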

Information smell #4 – no relationship

All business objects in the system should be connected. When you cannot identify a relationship between two business objects, you have a great question to ask the user. “What is the relationship between these two things? Is it a direct relationship? Or is it via something else?”

This is a very powerful smell. From experience it often leads to the most missing knowledge. In enterprise systems it is often the organisational structure that is missing.
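Spotting an entity with no relationships can be sketched as another set comparison. The entity names and the single relationship below are assumptions made for illustration, as is the function name.

```python
# Hypothetical sketch: entities that take part in no relationship at all.
entities = {"Employee", "Department", "Role"}

# Each relationship is a pair of entity names; direction is ignored here.
relationships = [("Employee", "Role")]


def unrelated_entities(entities, relationships):
    """Return the entities that do not appear in any relationship."""
    connected = {name for pair in relationships for name in pair}
    return sorted(entities - connected)


print(unrelated_entities(entities, relationships))
```

This only finds completely isolated entities. It does not check that every entity can be reached from every other one, which would need a graph traversal.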

Information smell #5 – one to one relationship (are they the same thing?)

Whenever you encounter a one to one relationship, there are normally two possible explanations. First, the business person has used more than one term for the same thing, and the two business objects should actually be one object. Second, the one to one relationship should actually be a one to many relationship, but you do not yet know why. For example, a car and a chauffeur may appear to be one to one, but when you dig in you discover that only one chauffeur may drive a given car at a time, while many chauffeurs may drive the same car at different times. The missing information was the temporal nature of the relationship.
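The car and chauffeur example can be sketched by making the time period part of the relationship. This is an illustration under assumptions: the `Assignment` class, its field names and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assignment:
    """Hypothetical record of one chauffeur driving one car for a period."""
    chauffeur: str
    car: str
    start: date
    end: date  # inclusive


def chauffeurs_for(car, assignments):
    """Over time, a car may have many chauffeurs."""
    return sorted({a.chauffeur for a in assignments if a.car == car})


def chauffeur_on(car, day, assignments):
    """On a given day, a car has at most one chauffeur."""
    drivers = [a.chauffeur for a in assignments
               if a.car == car and a.start <= day <= a.end]
    if len(drivers) > 1:
        raise ValueError(f"{car} has more than one chauffeur on {day}")
    return drivers[0] if drivers else None


assignments = [
    Assignment("Alice", "car-1", date(2024, 1, 1), date(2024, 1, 15)),
    Assignment("Bob", "car-1", date(2024, 1, 16), date(2024, 1, 31)),
]
print(chauffeurs_for("car-1", assignments))
print(chauffeur_on("car-1", date(2024, 1, 20), assignments))
```

The relationship is one to one at any instant and one to many across time. The dates are the information the original one to one line was hiding.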

Information smell #6 – many to many (missing information)

Many to many relationships occasionally represent a valid relationship. Most of the time they indicate a missing “link” business object. In relational software design, many to many relationships are replaced with a link entity. This link entity is often a business object that has information about the relationship itself. In the example above, an employee may have different job titles at different times (temporal) or they may spend a proportion of their time in more than one role. Once again, the smell helps us track down potentially missing information.
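A link business object can be sketched as a small record that sits between the two entities and carries the information about the relationship. The `Allocation` class, its `percent` field and the sample data are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """Hypothetical link object between an employee and a role."""
    employee: str
    role: str
    percent: int  # share of the employee's time spent in this role


allocations = [
    Allocation("Ann", "Developer", 60),
    Allocation("Ann", "Tester", 40),
    Allocation("Bob", "Developer", 100),
]


def roles_of(employee, allocations):
    """An employee may hold several roles through several allocations."""
    return sorted(a.role for a in allocations if a.employee == employee)


def total_percent(employee, allocations):
    """Total share of time allocated; a full-time employee sums to 100."""
    return sum(a.percent for a in allocations if a.employee == employee)


print(roles_of("Ann", allocations))
print(total_percent("Ann", allocations))
```

The many to many line between employee and role becomes two one to many lines through the link object, and the proportion of time now has a place to live. A start and end date could be added in the same way to capture the temporal case.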

Information smell #7 – undefined functions (missing information)

Every method in your model should be defined. Everything referred to in the method should be on the business model. Take the example getAge.

getAge calculates the age in years (approximately, since dividing by 365 ignores leap years).

getAge = (today() - Employee.dateOfBirth) / 365

The date of birth and a function to give the current date were missing. These need to be added to the business model.
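Writing the function out makes its dependencies explicit. The sketch below is an assumption about how getAge might be defined, not a definition taken from the model. It takes the current date as a parameter, and it compares calendar dates rather than dividing by 365, which avoids drift across leap years.

```python
from datetime import date


def get_age(date_of_birth, today):
    """Return the number of completed years between date_of_birth and today."""
    # Has this year's birthday already happened (or is it today)?
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


# Illustrative dates only.
print(get_age(date(1990, 6, 15), date(2024, 6, 14)))
print(get_age(date(1990, 6, 15), date(2024, 6, 15)))
```

Both parameters are things the function refers to, so both need a home: the date of birth on the Employee, and the current date from whatever the system uses as its clock.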

To summarise, the process is:

Identify the output that will deliver value to the user.

If you do not already have a domain model that will support the output, create one.

Check for all of the smells until there are none in the model.

Stop.

This approach should be orders of magnitude faster and more focused than traditional domain modelling. You will amaze your friends by applying domain modelling and never getting caught “spinning your wheels”.

Sidebar - Domain Model

A domain model is a simplification of some aspect of an organisation, such as its products, operations or market. The domain model is specific to an organisation and the way it works. Although a model may use industry standard terms, the model creates a precise vocabulary for the particular organisation and its context. A domain model would normally describe the information that people engaged in that business care about. Since most business systems are concerned primarily with collecting, processing, and providing data, it stands to reason that knowing what that information is, and having a clear way to categorise it, is helpful. Typically a domain model consists of business entities with associated data and behaviour. The interactions between business entities are typically specified as well. Examples of domain modelling include traditional entity-relationship models and object models.

Sidebar - Worked Example

We start with the report that the HR Director wants.

Smell 1 – An item is in the output that is not in the model.

We have items in our output that do not exist in a model so we add the items to the model.

Note that average salary is calculated based on other data, so we call it getAverageSalary to remind us that it is a calculated attribute.

Smell 2 – An item is in the model that is not in the output

Someone gets carried away and adds Birthday to Employee.

We ask the HR Director whether they want Birthday on the report. They say "No" but they would like "Age" on the report.

Smell 3 – Two pieces of information in one place

The name consists of the first and the second name. We ask the HR Director if they want to separate them out so that they can see all the people with the same family name (family name can be used to identify cultural groups). They do not want to.

Smell 4 – An entity is not related to any other entity

All entities in the model must be linked. We ask the business subject matter experts whether the entities are linked directly to each other or through some other entity. For example, is the employee linked directly to a department, or are they on a team that is linked to the department?

Smell 5 – One to one relationships

One to one relationships are a bad smell, so we ask the subject matter expert about the relationships. We do NOT ask "What is the relationship between an employee and a department?" as this will only reveal the general case. Instead, we ask "Can you give us an example of an employee in more than one department?" and "Can you give us an example of a department with more than one employee?"

Applying this technique rigorously can result in some seemingly silly questions that are actually very valuable: "Can you give me an example of an employee with more than one sex, or more than one race?" The individuals identified by these questions would be the ones who would sadly be most subject to prejudice and discrimination.

We keep these questions open until we find an example. They act as risk indicators that the system may change.

Smell 6 – Many to many relationships

Many to many relationships may indicate missing information. We ask the subject matter expert what the missing information might be. In this case the employee's time is allocated between two or more departments.

For the issue of role, sex and race, the subject matter expert cannot think of any additional information of value. So the smells remain as open questions that indicate a risk of change.

One specialist website resolves the many to many sex relationship by having seven classifications (male, female, male pre-op female, female pre-op male, male post-op female, female post-op male, and inter-sex). The missing information could be the date that the individual decided to have an operation, or when they had the operation, and the impact it had on subsequent salary increases and bonuses.

Smell 7 – Undefined Functions

We ask the subject matter expert how to calculate getAverageSalary and getAge.

This results in the values allocation.cost and employee.dateOfBirth being added to the model.

There are no smells we have not resolved or chosen to defer. We can now stop our domain modelling.

About the Authors

Chris Matts is an Agile Practitioner who does whatever is needed. As a result he takes on the role of project manager, business analyst, tester and even developer (though he is rather bad at that). Although most of his experience over the past twenty years has been building trading and risk management systems in investment banks, he is currently coaching the Skype Division of Microsoft in their adoption of Agile and Real Options at the Enterprise level. Chris's first master's degree is in Microelectronics and Software Engineering and his second is in Mathematical Trading and Finance. Along with Olav Maassen and Chris Geary, he is about to publish a business graphic novel called "Commitment". Commitment is a story about managing project risk.

Kent J. McDonald helps organizations improve the effectiveness of their projects. His more than 15 years of experience include work in business analysis, strategic planning, project management, and product development in a variety of industries including financial services, health insurance, performance marketing, human services, nonprofit, and automotive. He is co-author of Stand Back and Deliver: Accelerating Business Agility, currently delivers business analysis training for B2T Training, and shares his thoughts on project effectiveness at BeyondRequirements.com.