In the last chapter, you were introduced to basic database design concepts.
In this chapter, you will build on those skills. If you recall one of the basic
mantras of this book, it takes several iterations of work to reach an optimal
database design. After your initial complement of tables has been created, the
next step involves outfitting the tables with columns. At this point, the only
columns that exist are the primary and foreign keys necessary to support the
relations between the tables. The question at this point is, "What columns
do you include in a table?" Some of the columns to include can be determined
through the application of common sense. Other columns to include are not as
intuitive. As you will see, whether to include a column can determine whether
new tables have to be created. If you think all the tables for the Time Entry
and Billing (TEB) Database have been created, you are in for a surprise! The
process of determining which columns go in a table is called normalization.
The normalization process is the focus of this chapter, and upon completing
this chapter, the TEB Database design will be complete.

What Does It Mean to Normalize a Database?

Database normalization can best be characterized as the process of organizing
a database. With the question of what out of the way, let's turn to the
question of why. The goal of normalization is to reduce problems with data consistency
by reducing redundancy. Sound confusing? The concept of normalization is probably
easier understood by way of a simple example. To illustrate, let's again
turn to the Northwind Traders Database that ships with Access.

Figure 4.1 shows the ERD (Entity Relationship
Diagram) for the Northwind Traders Database that was introduced in Chapter 2,
"The Anatomy of a Real Database." Notice the relationship between
the Customers and Orders tables. As a quick review of how relationships work
in a relational database, the primary key of the parent table is carried in
the child table as the foreign key. In this case, the CustomerID field is carried
in the Orders table. This is how order records for a specific customer can be
associated with that customer.

Why then not carry other fields from the Customers table? As you will see later,
sometimes you might elect to carry other fields from the parent table to the
child table. As a general rule, however, you will not want to do this. For example,
what if you decide you are going to carry the CompanyName field in the Orders
table and the company name changes? Not only would you have to update the Customers
table, you would also have to update the Orders table and any other table in
which CompanyName exists. Continuing with this example, what if the various
locations of CompanyName were not consistently updated? Which version would
reflect the current, most accurate version of CompanyName? If you were asked
to prepare a report that needed to include CompanyName, which table should you
use as the source of CompanyName?

Figure
4.1 The Entity Relationship Diagram for the Northwind Traders Database shows
a relationship between customers and orders.

It seems clear that if you have only one instance of a given data element in
a database, it leaves nothing to interpretation and guesswork. When you update
CompanyName in the Customers table, you can be sure that every report that relies
on CompanyName will be accurate and up to date. Why? Because CompanyName is
not carried redundantly in the database, and as a result, consistency is ensured.
The same concept can be seen throughout the data model in Figure
4.1. Greater consistency through the elimination of redundancythis
is the goal of normalization!