Is Your Database Schema Too Complex?

Introduction

In the majority of today’s commercial software applications that require data persistence, a significant portion of time is spent designing and integrating the database with the application. This task typically involves:

Designing a large, normalized database schema in a Relational Database Management System (RDBMS) using a tool such as an Entity Relationship Diagram (ERD).

Implementing a mapping layer between the database tables and the application class model (either manual code or an ORM such as Hibernate or Entity Framework).

Performing iterations over the previous items as changes are made to the schema during development.

These tasks are very complex and require a significant investment of development resources to perform; they typically require senior-level application and database developers.

In contrast, Object Database Management Systems (ODBMSs) provide much more efficient development using standard application developers while providing superior performance and requiring a smaller hardware footprint (in most cases commodity hardware can be used).

The following sections provide an in-depth comparison of developing these Applications using an ODBMS vs. an RDBMS, highlighting the development resource savings and higher performance achieved using an ODBMS.

All of the ODBMS technical descriptions presented will be based on Objectivity/DB2. This is a high-performance, scalable database that utilizes a simple API in many languages to persist and query data from any Windows or Linux standalone, distributed, or web application.

RDBMS Schema

The relational database schema consists of normalized tables, which seek to minimize duplication of data and ensure referential integrity. Much of the relationship information is designed as foreign keys from one table into another table.

Figure 1 shows a simple example of an RDBMS schema described in an ERD. It normalizes the City field out of the Employee table and models the Salaried_Employee and Hourly_Employee tables as sub-types of the Employee table. The City table has a one-to-many relationship with the Employee table and the Salaried_Employee and Hourly_Employee tables have a one-to-one relationship with the Employee table.

Figure 1 - RDBMS Schema

ODBMS Schema

In an ODBMS schema, the database understands object-oriented concepts such as inheritance and polymorphism. In addition, pointers (references) from one object to another are represented directly in the database (in a storage-independent manner). Also, referential integrity is built-in. Primary keys are not necessary – they can be maintained by the ODBMS in a transparent manner. Note that foreign keys are also not necessary; the reason is described in the ODBMS Query section.

In short, in an ODBMS, the application data model is the database data model. The data modeling task is a single step that requires no translation (or mapping) between separate technologies.

Using the same schema example from the RDBMS Schema section, it is modeled in an ODBMS as a simple UML Class Model, as shown in Figure 2.

Figure 2 - ODBMS Schema

RDBMS Query

Because of the normalized tables in an RDBMS and the fact that it does not understand object-oriented concepts, most queries contain joins (and in commercial systems many joins). Because RDBMSs rely on dynamically executed joins, RDBMS queries can be complicated and perform poorly compared to ODBMS queries (described in the ODBMS Query section).

Figure 3 shows an example of an RDBMS query using the schema from the RDBMS Schema section.

Figure 3 - RDBMS Query

ODBMS Query

Queries in an ODBMS use an “SQL like” query language, because they have a different logical and physical design. An ODBMS stores references and data structures in the database and understands object-oriented concepts, which results in static evaluation of relationships in an ODBMS query. It also means there are no joins in an ODBMS. Because of these factors, an ODBMS query is simpler and provides much higher performance than an RDBMS query.

Figure 4 shows an example of an ODBMS query using the schema from the ODBMS Schema section.

Figure 4 - ODBMS Query

Notice that:

The query can use object-oriented polymorphism to process only SalariedEmployee objects stored in the database. HourlyEmployee objects are ignored, and because Employee is an abstract base type, there are no Employee objects. Thus, there are no joins on Employee objects. Queries involving inheritance are simpler, faster, and use less storage in an ODBMS than in an RDBMS.

Because the ODBMS stores references in the database, there are no joins on City objects. The syntax for following a reference is to use a “.”. Thus the “SalariedEmployee.CityPtr.City” syntax is used in the query to access the static pointer from the CityPtr field to the City object, rather performing an expensive join. The query is very similar for data structure fields (e.g. arrays). Again, queries involving references or data structures are simpler, faster, and use less storage in an ODBMS than in an RDBMS.

Conclusion

Because an ODBMS understands object-oriented concepts and stores references directly rather than using costly table joins, the design of the schema is performed once using the application class model. This makes the schema design process dramatically simpler and requires fewer development resources than for an RDBMS. It also is easier to query and executes faster than an RDBMS. Consider that:

An RDBMS requires separate application schema and database schema. For commercial systems, an RDBMS requires database developers and a great deal of coordination between the database developers and the application developers.

An ODBMS schema requires only an application class model. The application data model is the database data model! No database developer resources are required – the design and implementation, can be entirely performed by the application development team. In addition, the views, stored procedures, triggers, constraints, transaction logs are either eliminated or part of the application code. Thus, total development time is much smaller than with an RDBMS.