Database Elaborations

Database Elaborations explores the human side of building and managing databases. The discussion revolves around data modeling as a semantic process, and how normalization only functions within a business rule context. Communication is hard; the meaning of everything is more fluid than we might wish to admit. Large gaps between logical versus physical or theory versus implementation are always worth reflection.

Far too often we all talk past one another. This cross talking, while not always drastic, remains perceived as an understood fuzziness. Much of the time we ignore these minor miscommunications because precision and clarity are not necessarily critical in all situations. If the general gist is effectively understood between those communicating, that generality may be all that is necessary. Those involved in the communication may feel comfortable that assumptions made to "fill in the gaps" will fall within an acceptable range. Although the lack of clarity in the message communicated may be acceptable, in other circumstances it may not be acceptable.

Primary keys come from candidate keys. Each candidate key consists of the attribute or attributes used to label a distinct row in a table. Every candidate key should contain the fewest number of attributes possible to identify rows individually and uniquely. Every entity within a design requires at least one candidate key.

Designing a data model that supports the reporting and analytical functions is no different, initially, than any other modeling effort. Understanding the data is crucial. The data architect or modeler needs to feel comfortable with dimensional modeling techniques and needs to obtain a working knowledge of the universe of discourse for the subject-at-hand. A good start in gathering this knowledge begins by reviewing the operational data structures containing the identified source elements. The challenge in designing analytical solutions is found in applying best practices for analytics simply and effectively.

The process for designing a database that supports the operational functions of an organization begins with simple understanding. The data architect or modeler needs to obtain a working knowledge of the language comprising the universe of discourse for the solution. This awareness is gathered through many activities, such as talking with the people currently doing the work, sitting with them and watching how they do their tasks, reading over existing training manuals or standard operation procedures. The designer is best served when figuratively walking a mile in the shoes of the future application users. The more that the designer knows about the user needs and goals, the better able the designer is to definitively craft a data model supporting user tasks.

In composing a data model, the structures are put together thoughtfully and with intention. Data structures emerge from the application of semantics germane to the universe-of-discourse and filtered through the rules of normalization. Each table expresses meaning, with columns that are self-evident. The best models reduce the data items within the covered subject area into obvious arrangements. However, this simplicity often confuses observers, persuading many that modeling itself must therefore be a simple task. DBMS tools often incorporate wizards that facilitate the quick definition of tables which are then immediately created within the chosen database. These tools enable the developer to build tables in the blink of an eye. At times some prototypes are approached in this fashion, and while this provides for placeholders, such slapped-up table structures are insufficient for an industrial strength solution. Under these instant-table setups, developers often have no problem reworking large sections of code for minor changes, or misusing data elements to mean many things while making the determination of meanings at a specific point-in-time less clear. Unaware of these less-than-stellar consequences, users become confused; they often wonder why modeling tasks should ever need to be done because the proof of concept worked, didn't it?

The pervasive nature of data continues unabated, leaving organizations more awash in data than ever before. Technology has enabled the access and leveraging of information to heights undreamed of a generation ago. Between corporate dashboards and internet Googling, vast quantities of information are truly at one's finger tips. Data-driven, domain-driven, model-driven … the data itself is a force to be met and managed. When managed well, users never explicitly think about the databases that persist all that data.

The art of building software solutions is comprised of many moving parts. There are project managers coordinating tasks; business analysts gathering requirements; architects establishing standards and designs; developers assembling processes; DBAs building database structures; quality assurance staff testing components for compliance to requirements; and an array of supporting roles providing for functional environments, infrastructure, security, etc. A common task that everyone must perform is estimating the effort necessary to deliver results. Certainly for simple and/or repetitive tasks there is no need for recurrent estimating, since applicable values are based on past known metrics. For example, creating the third or fourth version of the same database within the same environment should allow a DBA to incorporate costs experienced previously as a guide. And unless something unusual occurs, such estimates should be on target. However for creative tasks, such as designing new structures, or building new kinds of processes, there will be no previous documented events to refer to. Faced with these circumstances, individuals usually are not allowed to shrug and say, "it will take as long as it takes," and be left alone.

Composite keys are an implementation of business rules within the database. As an example, a table named INVOICE has a composite primary key consisting of Account_Number and Invoice_Date. In this example several possible rules are being expressed. The Account_Number which partially identifies an INVOICE instance implies that an account must exist before an invoice can exist. In addition to this INVOICE table, one expects to see a table named ACCOUNT with only the Account_Number as its primary key. Likewise, a referential integrity constraint would be defined between the INVOICE and the ACCOUNT tables based on that Account_Number value. In this manner, the DBMS would prevent Account_Number values from insertion into the INVOICE table unless they already existed in the ACCOUNT table.

New data modelers often see things as black and white. But rather than being concrete flooring beneath our feet, knowledge is more like a gossamer web that builds up layer upon layer to provide the effect of a solid foundation. We may think we know facts, such as one plus one equals two, or Columbus discovered America in 1492. What we have come to know about our world comprises our own internal knowledge base. We gain much of this knowledge because it has been passed onto us by others, people with experience, parents, educators, clever friends, verbally or in print. But how do we know such items are real, actual, absolute bona fide beyond-a-shadow-of-a-doubt truths?

While good database design is always necessary, the value of good design is reinforced by such endeavors as service-oriented architecture. The semantics-laden engineering of service-oriented tactics used in creating solutions melds seamlessly into the semantics-laden world of data modeling. Although one aspect does not necessarily build on top of the other, each works as part of a team in braiding solutions that grow in usefulness as they emerge. The processes surface as a method to further define the meaning of an object, and the object serves as a harbinger of the processes that must exist. Business rules much more advanced than today's simple constraints may one day reside within the database itself as a built-in function of a some future DBMS. Therefore, data architects should never try to wall themselves behind a veneer of thinking only in terms of data without a thought regarding the processes that touch the data. As a database designer, working within projects that use service-oriented architecture approaches can be exhilarating.

There comes a time at the start of a new engagement when the data architect must acquaint himself with the system for the first time. When first learning about a new application, the relevant data, and its foundational concepts, many questions are reviewed.

An inherent awkwardness exists in every many-to-many relationship between entities. Ambiguity causes this persistent awkwardness, primarily because a many-to-many relationship is such a fuzzy thing. In data discovery, encountering many-to-many relationships may actually expose a level of disregard about details by the subject matter experts.

In a relational database approach, regardless of whether one is considering persisted data or transitory query results, almost everything is viewed as a logical table. Associations between these envisioned tables are based on data values, versus the alternative to the pre-relational idea of linking multiple data structures via "hidden" pointers maintained by the system. Relationships among objects are ultimately derived from the semantics of a situation.

In formulating the tenets of relational theory, issues anent to order were explicitly addressed. These relational theory tenets included defining a relational database so that it need have no concern with the order of columns in a row, or with the order of rows in a table. And yet, such a stance seems counter-intuitive since the database brings structure and organization to content. Chaos is the primordial soup from which all things originated. Thus it seems only reasonable that a relational database, being the best and brightest of its kind, should abhor such chaos and bring ever more order instead, right?

Occasionally the database under design becomes more than a one-off project. For example, the data mart or data warehouse might act as a profit center for the organization, and as a solution sold to many customers. The possibility exists to capture a basic truth for a given industry in a simple universal design exactly suiting everyone. But unless the situation really qualifies as a valid "one-size-fits-all" circumstance, one needs a level of customization within the architecture.

Tight budgets and limited talent pools do not make a winning combination for staffing a database management group. Despite the importance of saving money, problems can arise when the lines blur regarding what gets wasted versus what gets saved.