Database

Rapid Data Access Tier Implementation

Source Code Accompanies This Article. Download It Now.

DBAG is a tool for automating the creation of domain-specific classes.

December, 2005: Rapid Data Access Tier Implementation

John is a senior software engineer and Abdul is a technical manager at Sabre Airline Solutions. Hong is an independent consultant. They can be contacted at johnxjcheng@comcast.net, abdul_ akbari@yahoo.com, and hongrong1@comcast.net, respectively.

In today's enterprise software applications, the database plays an important role. Hence, the database access tier becomes a vital component. Luckily there are many libraries, such as ODBC, JDBC, ADO, and the like, that simplify implementation. Nonetheless, finding suitable design patterns and implementing them involves the tedious task of writing domain-specific access classes. This is still the responsibility of developers. Moreover, a considerable amount of time needs to be spent maintaining this layer when the schema is not static, but continues to evolve, especially when the database schema contains a large number of tables. In this article, we present techniques for automating this process. In doing so, we introduce a lightweight tool that you can use to generate the access layer from a template that can be customized to the various access libraries.

Data Entity and Table Relational Data Mapping

A database access tier needs to manage two tasks:

Handle the encapsulation of the relational data stored in a database.

Implement the four basic operations: insert, select, update, and delete. (These operations are sometimes also known as "CRUD," short for Create, Read, Update, and Delete.)

It is generally preferable to implement these two tasks with different objectsone to encapsulate data, and the other to handle the access logic. The one that encapsulates data is usually called "Data Entity" (DE) or "Business Entity." The one that implements access logic is normally called "Data-Access Logic Component" (DALC) or "Data-Access Object." Another reason for separating them is when the DE needs to be moved from one tier to another in multitiered systems. Separate tiers can directly use the DE, thus avoiding conversion from one format to another. Typically, the DALC is directly dependent on the language, library, and database connection management and does not lend itself well when transferred from one tier to another.

For single-tier apps, combining the two components has advantages. The biggest advantage of doing this is that it is simpler. Within the same component, it is easier to understand and quick to implement. It cuts the number of objects by half, and this number may be large for a typical medium- to large-size database. Actually, many applications choose to do so.

Data entities can also be categorized according to how they are mapped to a table or tables. A data entity can map to a single table with data members corresponding to the columns, or it can map to a join of tables with data members corresponding to columns across the tables. Note that here you care more about how the columns are mapped to the data members, not how instances of each data entity map to the rows of a table. For the entities that directly map to a table, all CRUD operations can be done easily with the help of data-access logic components. For the entities that map to joints, in many cases it's not possible or desired to perform the operations other than the select (read). When considering the member-to-column mapping, the database metadata that need to be considered include table name, column name, column ordinal, column type, and column constraints (key, size, nullable, and so on). The code side information that needs to be considered includes: programming language, data type, and member access permission.

Mapping the relationships between tables is another important point that must be considered. On the database side there are one-to-many, many-to-one, and many-to-many relationships. These relationships are established through foreign keys and many-to-many tables. On the code side, these relationships can be represented as parent-to-child, child-to-parent, and peer-to-peer. The business logic tier can use these relationships for navigation from one entity to another.

Problems and Benefits of Implementing the Data-Access Tier

Implementation of the whole data-access tier, including data entities and data-access logic components, is tedious and time consuming. Each substantial table may require one or more classes and interfaces. For a database that contains hundreds or more tables, the task of coding is huge. Furthermore, if the database schema continues to evolve, the maintenance on the code side is also considerable.

To simplify the work, developers have attempted to get help from code generators that allow tables and columns in the schema to be mapped to classes and members of the generated code (see Listing One). Code generation is a big leap in reducing manual work. It helps make the whole data-access tier code more consistent and predicable. Any improvements worked out on one database-access class can quickly and easily be applied to the other classes by means of code generation. Any mismatch between a database schema and the application that uses this schema can be detected by a compiler. For example, if a column is modified or removed, the generated code will reflect this change on a class member or function. Thus, any client code that uses this class member or function will be forced to make the necessary changes. It leaves no unpleasant surprises at runtime. This ease of propagating changes to the code, along with the error detection, forms a seamless link between the database schema and the application code. Several code generators and persistence frameworks are available, both commercially and through open source.

Some advanced code-generation tools provide another important featurethey let SQL statements be constructed with the generated information at runtime, which helps avoid schema mismatch (Listing Two). For instance, assume you collected the table and column information, such as type, name, size, and ordinal, into some static class members. Then, if you have some assistant objects that can utilize this collected information to build the SQL queries for data-access logic, you can say that your SQL queries are also "bonded" to the database schema.

When writing code to generate the data-access tier, it requires that you first know what you want; that is, the look and behavior of the code. Then, it requires that you know how to get what you wantthat is, how to write the generator. There are some problems with this strategy. For one thing, the generator is specific to one project. For another, the logic of the target code and the logic of the generator itself are often mixed together. It requires you to devote more time to coding and testing, and adhere to a more disciplined practice.

Here we introduce a tool called "Database Access Generator" (DBAG), developed by Hong Rong. (This code generator is copyrighted by Hong Rong with the Open-Source Software Alliance License [OSSAL]. The full license content is captured in the license.txt file.) The difference between DBAG and similar tools is that it makes extensive use of code-generation templates, which can be easily expanded. They also adapt easily to the various flavors of an access driver within a particular access technology (for instance, the ADO.NET driver from Microsoft versus that from Oracle). DBAG predefines the common database schema information as keywords. These keywords can be used in a customer template, which is then read by the generator to produce code.

The primary goal for DBAG is to create a lightweight persistence framework through code generation. Some of the features of this tool are:

Code generation is fully templatized.

Available under open source.

Can handle complex situations such as joins and relationships.

Strong type checking.

Schema matching for class, class members, and SQL queries.

Null handling.

Tested with Oracle and SQLServer, but not limited to any particular RDBMS.

Through the use of code-generation templates, multiple databases (Oracle, SQLServer, and the like), libraries (ADO, ODBC, and so on), and languages (such as C# and C++) can be the target. The data entities and data-access logic components are pre-compiled for higher performance, and the database schema can be validated against the application for greater type safety. The overall framework is simple and lightweight, and designed for ease of regeneration when the schema is updated.

The tool itself is implemented in C#. It only runs on a Windows platform. It requires a suitable OLE DB driver that can provide the most comprehensive metadata. But this doesn't mean the template and code are subjected to any of these conditions. Once the code generation is complete, any generic driver may be used by the application during runtime. The template can be in any programming language. It can use any data-access library, coding style, and software-design pattern. The tool may even be used to generate database scripts; for example, the scripts to create routine sequence and triggers. The bottom line is that, as long as you can do something once and know how to abstract the task into a template, the generator can fill in the rest.

Code Generation of Classes for Single Tables

Data entities that map to each single table are the most often used components in a data-access tier. These components are normally implemented as classes in a procedural language, such as C++, Java, and C#. The class members, properties, and methods are often mapped to the columns of the corresponding table. These kinds of classes may or may not implement the DALC functionality as just mentioned. When using the DBAG tool to generate these kind of classes, you need to:

Let the tool connect with the database with the selected driver and read the schema information.

Figure 1 shows the main interface of DBAG. This interface shows the collected table information based on Oracle's sample scott/tiger schema. The name, type mapping, and access control convention are stored in the DBAG.ini configuration file. Listing Three presents code template segments and the generated C# code fragments from this generator.

DBAG is just a tool. It depends on you telling how to perform the task. The dictation is done through templates. DBAG provide a list of control commands and keywords to be used in your templates. These commands and key words are listed in Listing Four.

A well-designed database requires that each table have an identifier (a natural key). In most cases, an identifier is also a key, but that's not always the case. There are many tables that have an identifier but no key. Both the identifier and the key are important concepts in a database. Without them, it will be hard or inefficient for a database access tier to perform update and delete operations.

Compared to an identifier, the key is a more widely recognized concept across relational databases. DBAG can get the key information from tables easily and pass it to the template. If you have control of the database schema, you may want to make sure that every table has at least the primary key defined. If a table does not have a natural key, you may want to create an artificial one. For example, create an Object ID (OID) column that gets its values from the database sequence, autonumber, or identity. In our experience, the artificial key is easier to use in a template because the names and types are more consistent.

Data entity instances hold the data from/to table rows. Data-access logic components decide what rows are affected through the where clause of a SQL command. The data entity structures can be defined during code generation. During runtime, the dynamic where clause can be passed to the data-access logic as an argument.

Generation of Classes for Table Joins

Data entities that map to some joins are also often used in the data-access tier. Depending on your business process, sometimes these entities are preferred over the data entities that map to single tables. To generate classes that represent this kind of entity, you need to define the SQL statement to specify the join. This statement includes a selection part that gives each selected column a unique name and a condition part that specifies the joint criteria. The selection part is crucial because it determines the structure of the generated code. The condition part is not as important. As long as it can compose a syntactically correct SQL statement, it's okay. At runtime you may provide the real joint relationship and parameters.

The DBAG tool accepts the joint definition statements through the Sql2Class.xml file. On the GUI, DBAG lists the join-to-class maps together with the single-table-to-class maps. The only difference on the join-to-class map is that these maps have no table name values. You may need to consider this in your template.

Generation of Classes for Stored Procedures

A stored procedure may take none, one, or more parameters. It may return some selections, a single value, or nothing. This information can also be collected and made available for generating data entity wrappers. DBAG does not handle the stored procedure wrapper yet because it requires a totally different display interface and quite a different keyword set. However, based on the provided code, it should be easy to add such support.

Sample Application

The example application we provide with this article (available electronically; see "Resource Center," page 4) implements a data-access layer for the famous Oracle sample database scott/tiger. To demonstrate some of the recommended concepts, we modified it to add an artificial primary key column OID to each table. We also rebuilt the relationships between the tables using the newly added columns. When generating the code, we used Microsoft's OLE DB driver for Oracle and a template that combines database entity and database access logic. Besides demonstrating the normal database operations, it also demonstrates how to handle joints and how to bind SQL statements to the database schema.

Conclusion

Database access tier is an important component in enterprise applications. It helps make the business process code simple and independent. Although repetitive, it requires a considerable amount of time and attention from developers who otherwise can spend more time on their business logic. Computers are excellent at handling repetitive tasks. They are more powerful in repetitive tasks than humans, and it is wise to use them for generative programming tasks.

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task.
However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

Video

This month's Dr. Dobb's Journal

This month,
Dr. Dobb's Journal is devoted to mobile programming. We introduce you to Apple's new Swift programming language, discuss the perils of being the third-most-popular mobile platform, revisit SQLite on Android
, and much more!