Data Modeling Fundamentals for Database Management Systems

Anything you want to create needs to start with a plan. Whether you want to build a home, a newsletter, a report, an industrial center, you need to start the endeavor by creating a plan – an outline or a blueprint of what you want to do and how you expect it to look. One way to look at this is to say that you want to create a model of finished product to guide your steps along the way. The same thing applies to your data plan. If you want to create a database for your organization, the place to start is by creating a data model – a model that represents the information requirements of your organization.

A model gives you a framework to guide your later work. It can help you identify what data you have, what you want to do with it, what information needs to attach to it, how data relates, and other necessary qualities of your data. Not only can data modeling help build a database, it can help build a better data governance policy by clarifying your data, what it’s used for, and who uses it. But the primary purpose of a data model is to drive the creation of a database that reflects the real data owned by your company, and provides the data that is needed in the context in which it is used.

Creating a database is a complex business, one that requires interacting with many people in many parts of the organization, all of whom have different assumptions and often different vocabularies. A model can pave the way to clearer communication about how people interact with data. By creating a model, groups can view proposed database structures in a way that they can relate to their own experience. A model gives users and developers the same vocabulary, or at the least, a translation medium, as well as functioning as a blueprint for the database developers.

How to Look at Data

Ultimately, the goal of data modeling is to create entity types – category types with which to classify data that will be turned into database fields, tables, and so on. Looking at the data has other values, though – helping to provide business intelligence, for instance, assist people in doing their jobs, streamline workflow processes, and so on. Building a map of an organization’s data structures can provide enormous insight into many aspects of the organization.

A data model must model the content of data, its size, source, provenance, age, and business, and the security rules of each data element used in every part of an organization. The data model defines the relationship between elements, if any exists. A data model considers what data the organization has, where it is stored, and how it should be organized, virtually and physically. While it doesn’t necessarily take into account the business goals of the data, the rules for its use are a valuable piece of knowledge to help build the database structure.

Some conventions are there that help to fit data into a framework. Models can exist that describe data in specific ways: External models, Conceptual models, Logical models, and Internal or Physical models.

Classification of Information Models

Conceptual Model

This is a high-level, shared view, in which data is described in the most accessible terms. This is usually a very human-readable version of data and requirements, probably organized in a way that reflects that of the organization, more or less. This is the level at which both developers and users can engage. Vocabulary is familiar, or at least understandable, to the average user. Users from different groups are familiar with a portion of the overall model.

This is a stable model. In it, the model attempts to describe basic business objects, including constraints, attributes, and dependencies. In this model, the data is described in user terms, and is independent of considerations about the current or planned database structure. The conceptual model is a productive way to communicate with users and domain experts.

External Model

To create a conceptual model of an entire large organization’s data structure is – rightfully – an intimidating undertaking. How does one even begin? The external model is a tool that can be very useful in making the conceptual model. To create an external model, fragments of the conceptual model are created as models that can stand alone, later to be aggregated into a comprehensive structure. An external data model is a representation of an isolated group of requirements that pertain to only one group of users. By creating an external data model for each the functional group of users you can aggregate those models and arrive at a complete conceptual model that represents the data structure of the entire organization. The external data model enables the database designers to examine data use and requirements group by group, in manageable pieces.

Logical Model

In this model, the conceptual idea of data domains and their relationships are further expanded into the architectural domain. Details are included about the content of the data and other conceptual attributes. Storage, location and similar considerations are still not part of this model – even considerations of a specific database management system are exempted from this model.

General database decisions are made at this point, however, because the logical model does include conceptualizing how the data relate to each other. Consequently, deciding at this point whether to implement a relational or non-relational database and to make some general decisions about the network is important.

An important part of this model is detailing how data is perceived by the user, so it’s important to know at this point if the data will be perceived in a relational or hierarchical way. Relational databases work in two-dimensional tables. Consequently, a logical model of data used to build a relational database will define those tables and their relationships. Data within tables is represented as rows and columns, and that can be modeled at this point.

The logical model will be the blueprint, the construction model, in the actual architecting of the database. This is the first modeling level that is architectural rather than conceptual, and is the first step in building a database. This model is not built to communicate with users, although user input could influence some decisions. It is a good idea to take any ambiguities back to stakeholders.

Internal or Physical Model

This model represents the physical implementation of the database. It is time at last to think about the database server, storage, access, and performance issues. This is a technical blueprint of the database as it will be implemented, including the DBMS services and tools. Scaling issues are addressed at this time. The physical model is not used to communicate with the users – that work should have been done in conceptual and logical models. The physical model addresses architectural needs, from server decisions to file locations, and so on.

Approaches

A set of three basic approaches has evolved for data modeling, each with its own sets of symbols and methods of representing and expressing concepts.

Entity-Relationship Modeling

This approach is often felt to be the most intuitive. It was introduced in 1976 and remains the most widely used technique. The E-R approach describes the organization’s data as entities with attributes, participating in relationships. It is independent of the technical considerations of the database or hardware. This method lends itself to conceptual data modeling, and is good for communicating with users, though it falls short on technical detail for database architects.

Fact-Oriented Modeling

This approach views the organization’s data in terms of objects playing roles. Roles require objects to play them. Defining roles inherently defines relationships between the roles and thus the objects. Unlike E-R models, attributes are not addressed in fact-oriented models. Object-role modeling (ORM) is an example of a fact-oriented data modeling approach. Object-role modeling allows for relationships with multiple roles.

Object-Oriented Modeling

This approach was primarily intended for designing code of object-oriented programs. Some find this modeling approach useful in an adapted form for conceptual modeling for designing databases. The most widely used object-oriented approach is Unified Modeling Language (UML). UML has a broad set of diagram types, including class, which can represent data structures in a way that can make it function as an extension of the E-R technique.

Process

Implementing data modeling starts with the creation of external models. An external model should be built for every area of the business. In some businesses, this can be quite extensive. One approach is to follow the stages that data goes through in an organization. This will probably include several data paths, like client information, employee information, and product information.

When working with the different areas of the business, it is important to determine exactly what data is needed, and, for later work, how it is used. This is the opportunity users have to give you their most basic requirements in their own language, independent either of other organizational dependencies or of physical and programmatic considerations.

Each external model will represent an independent view of the data from the perspective of one particular part of the business. Each business area will have stakeholders to represent their requirements through the development process.

The next stage is the creation of a conceptual model. In this, the many external models must be aggregated to create a single model of the organization’s data structure. In this model, it is important to include data dependencies and competing requirements. Relationships between external models are noted, and can be leveraged in the final models to help reduce redundancy. This is still a part of the planning and requirements stage, and even though the goal is a complete and proper database system, the conceptual model is still about making sure that the organizational requirements are fully understood.

During this stage, designers will keep communicating with the business stakeholders to ensure that data elements, structures, and relationships continue to represent a conceptual model of reality and that they are in support of business objectives. The collection, access, and archive requirements should be well articulated. Users should understand clearly what the model is for, and affirm its correctness.

The next task is building a logical model. This is where database architects take the business requirements and makes a detailed model that includes the type of database and specifically how it will be implemented. This may include deciding on tables and building models that show your entities and their relationships.

Since it’s critical that this be a true representation of the conceptual model, business stakeholders should continue to be involved to clarify relationships and dependencies. This will be a model of the database as it will be implemented, and can become highly detailed, introducing questions that were not asked before.

Finally, the physical model completes the data modeling phase. Now that the data architects have identified the needed data and how it is used from the perspective of each business area, and built a conceptual model that was approved by the stakeholders, and now that the architecting of the database is complete, it’s time to choose the database management software, determine the server setups, including whether or not to be resident in the cloud, and to determine all physical technical aspects of setting up a database.

Who Performs Data Modeling?

Database architects are good candidates for this. Data modeling professionals exist, possessing the appropriate training to do successful data modeling. However, the better the database architect or data modeling professionals understand a particular organization and its environment, the better they can be.

Look for good communication skills because eliciting information from users requires someone who can ask the right questions and also document responses effectively. Because you need a variety of skills to complete a good data model, the data modeling is probably best done by a small team representing diverse specialties: database, data architecture, and business analysts.

Conclusion

Data modeling is the foundation of architecting an effective and useful database. Communication is the foundation of creating a good and true representation of the data of your organization. Using a model allows people to visualize together, to use a common vocabulary, and to better communicate their needs. They can be active participants in developing models and helping them better represent their understanding of the data around them.

This same tool is then of immense value to the data architect, ensuring the database they build is a good and true representation of the data model – because it is a good and true representation of the organization’s data structure.