This chapter is from the book

In this chapter we will review the basic principles of database design
and normalization. A well-designed database minimizes redundancy without losing
any data. That is, we aim to use the least amount of storage space for our database
while still maintaining all links between data.

We will cover the following:

Database concepts and terminology

Database design principles

Normalization and the normal forms

Database design exercises

Database Concepts and Terminology

To understand the principles we will look at in this chapter, we need to
establish some basic concepts and terminology.

Entities and Relationships

The very basics of what we are trying to model are entities and
relationships. Entities are the things in the real world that we will store
information about in the database. For example, we might choose to store
information about employees and the departments they work for. In this case, an
employee would be one entity and a department would be another. Relationships
are the links between these entities. For example, an employee works for a
department. Works-for is the relationship between the employee and department
entities.

Relationships come in different degrees. They can be one-to-one, one-to-many
(or many-to-one depending on the direction you are looking at it from), or many-to-many.
A one-to-one relationship links one instance of an entity to exactly one instance of another. If each employee in this
organization had a cubicle of their own, this would be a one-to-one relationship. The
works-for relationship is usually a many-to-one relationship in this example.
That is, many employees work for a single department, but each employee works
for only one department. These two relationships are shown in Figure 3.1.

Figure 3.1 The is-located-in relationship is one-to-one. The works-for relationship
is many-to-one.
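
To make the two degrees concrete, here is a minimal sketch of how they might be expressed as table constraints. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The cubicle table, the column names, and the sample rows are our own assumptions for illustration and do not come from Figure 3.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when asked
conn.executescript("""
CREATE TABLE department (
    departmentID INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE cubicle (
    cubicleID INTEGER PRIMARY KEY
);
CREATE TABLE employee (
    employeeID   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    -- works-for, many-to-one: many rows may hold the same departmentID
    departmentID INTEGER NOT NULL REFERENCES department(departmentID),
    -- is-located-in, one-to-one: UNIQUE stops two employees sharing a cubicle
    cubicleID    INTEGER NOT NULL UNIQUE REFERENCES cubicle(cubicleID)
);
""")

# Hypothetical sample data
conn.execute("INSERT INTO department VALUES (10, 'Research')")
conn.executemany("INSERT INTO cubicle VALUES (?)", [(101,), (102,)])
conn.execute("INSERT INTO employee VALUES (1, 'Ann Lee', 10, 101)")
conn.execute("INSERT INTO employee VALUES (2, 'Raj Patel', 10, 102)")  # same department: allowed


def can_share_cubicle():
    """Try to seat a third employee in cubicle 101, which is already taken."""
    try:
        conn.execute("INSERT INTO employee VALUES (3, 'Mia Chen', 10, 101)")
    except sqlite3.IntegrityError:
        return False
    return True


employees_in_research = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE departmentID = 10"
).fetchone()[0]
```

Two employees can point at the same department, but the UNIQUE constraint rejects a second employee in an occupied cubicle, which is what keeps is-located-in one-to-one in this sketch.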

Note that the entities, the relationships, and the degree of the
relationships depend on your environment and the business rules you are trying
to model. For example, in some companies, employees may work for more than one
department. In that case, the works-for relationship would be many-to-many. If
anybody shares a cubicle or anybody has an office instead, the is-located-in
relationship is not one-to-one.
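
If the business rules did allow an employee to work for several departments, one common way to record a many-to-many relationship is a separate linking table with one row per pairing. The sketch below assumes this approach; the works_for table name and the sample rows are hypothetical, and sqlite3 is again used only as a convenient stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employeeID INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE department (
    departmentID INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
-- Hypothetical linking table: one row per (employee, department) pairing
CREATE TABLE works_for (
    employeeID   INTEGER NOT NULL REFERENCES employee(employeeID),
    departmentID INTEGER NOT NULL REFERENCES department(departmentID),
    PRIMARY KEY (employeeID, departmentID)
);
""")

# Hypothetical sample data
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ann Lee"), (2, "Raj Patel")])
conn.executemany("INSERT INTO department VALUES (?, ?)", [(10, "Research"), (20, "Finance")])
conn.executemany("INSERT INTO works_for VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])


def departments_of(employee_id):
    """Return the departmentIDs that one employee works for."""
    rows = conn.execute(
        "SELECT departmentID FROM works_for WHERE employeeID = ? ORDER BY departmentID",
        (employee_id,),
    )
    return [r[0] for r in rows]


def employees_in(department_id):
    """Return the employeeIDs that work for one department."""
    rows = conn.execute(
        "SELECT employeeID FROM works_for WHERE departmentID = ? ORDER BY employeeID",
        (department_id,),
    )
    return [r[0] for r in rows]
```

Here employee 1 works for two departments and department 10 has two employees, so the relationship is many in both directions.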

When you are coming up with a database design, you must take these rules into
account for the system you are modeling. No two systems will be exactly the
same.

Relations or Tables

MySQL is a relational database management system (RDBMS); that is, it supports
databases that consist of a set of relations. A relation in this sense is not
your auntie, but a table of data. Note that the terms table and relation
mean the same thing. In this book, we will use the more common term table.
If you have ever used a spreadsheet, each sheet is typically a table of data.
A sample table is shown in Figure 3.2.

As you can see, this particular table holds data about employees at a
particular company. (We have not shown data for all the employees, just some
examples.)

Columns or Attributes

In database tables, each column or attribute describes some piece of data that
each record in the table has. The terms column and attribute are
used fairly interchangeably, but a column is really part of a table, whereas
an attribute relates to the real-world entity that the table is modeling. In
Figure 3.2 you can see that each employee
has an employeeID, a name, a job, and a departmentID.
These are the columns of the employee table, sometimes also called the attributes
of the employee table.

Rows, Records, Tuples

Look again at the employee table. Each row in the table represents a single
employee record. You may hear these called rows, records, or tuples. Each row in
the table consists of a value for each column in the table.
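
The following sketch shows tables, columns, and rows together. It builds an employee table with the four columns named above and reads one row back. The sample rows are hypothetical (they are not the contents of Figure 3.2), and sqlite3 is used as a stand-in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read a row's values by column name
conn.execute(
    "CREATE TABLE employee ("
    "employeeID INTEGER, name TEXT, job TEXT, departmentID INTEGER)"
)

# Hypothetical sample rows: one tuple per employee, one value per column
sample_rows = [
    (1, "Ann Lee", "Programmer", 10),
    (2, "Raj Patel", "Programmer", 10),
    (3, "Mia Chen", "DBA", 20),
]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", sample_rows)

first = conn.execute("SELECT * FROM employee ORDER BY employeeID").fetchone()
column_names = list(first.keys())  # the columns (attributes) of the table
first_values = tuple(first)        # the row (record, tuple) itself
```

Each row supplies exactly one value for each column, so the tuple of values has the same length as the list of column names.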

Keys

A superkey is a column (or set of columns) that can be used to
identify a row in a table. A key is a minimal superkey. For example, look
at the employee table. We could use the employeeID and the name together to
identify any row in the table. We could also use the set of all the columns
(employeeID, name, job, departmentID). These are both superkeys.

However, we don't need all those columns to identify a row. We need only
(for example) the employeeID. This is a minimal superkey; that is, a
set of columns that identifies a single row and from which no column can be
removed. So, employeeID is a key.
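
The difference between a superkey and a key can be checked mechanically against sample data. The sketch below is one way to do that in plain Python; the function names and the sample rows are our own assumptions. Note that a check like this only shows that the data on hand is consistent with a key. Whether a set of columns really is a key depends on the business rules, not on a few sample rows.

```python
from itertools import combinations

# Hypothetical rows standing in for the employee table
employees = [
    {"employeeID": 1, "name": "Ann Lee", "job": "Programmer", "departmentID": 10},
    {"employeeID": 2, "name": "Raj Patel", "job": "Programmer", "departmentID": 10},
    {"employeeID": 3, "name": "Mia Chen", "job": "DBA", "departmentID": 20},
]


def is_superkey(rows, columns):
    """True if no two rows share the same values for `columns`."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)


def is_key(rows, columns):
    """True if `columns` is a superkey and no smaller non-empty subset of it is."""
    if not is_superkey(rows, columns):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(columns))
        for subset in combinations(columns, size)
    )
```

With these rows, the pair (employeeID, name) is a superkey but not a key, because employeeID alone already identifies each row.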

Look at the employee table again. Within the rows shown, we could identify an
employee by name or by employeeID, so both qualify as keys. We call these
candidate keys because they are candidates from which we will choose the
primary key. The primary key is the column or set of columns that we will use
to identify a single row from within a table. In this case we will make
employeeID the primary key. This will make a better key than name because it
is common to have two people with the same name.
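
Once a primary key is declared, the database itself refuses rows that would duplicate it. The sketch below illustrates this with sqlite3 as a stand-in for MySQL; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "employeeID INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, job TEXT, departmentID INTEGER)"
)

# Hypothetical rows: two different people who happen to share a name
conn.execute("INSERT INTO employee VALUES (1, 'Ann Lee', 'Programmer', 10)")
conn.execute("INSERT INTO employee VALUES (2, 'Ann Lee', 'DBA', 20)")


def duplicate_id_rejected():
    """Try to insert a second row with employeeID 1."""
    try:
        conn.execute("INSERT INTO employee VALUES (1, 'Raj Patel', 'Programmer', 10)")
    except sqlite3.IntegrityError:
        return True
    return False


def count_named(name):
    """Count the rows that carry a given name."""
    return conn.execute(
        "SELECT COUNT(*) FROM employee WHERE name = ?", (name,)
    ).fetchone()[0]
```

Two rows with the same name are accepted, which is exactly why name would be a poor primary key, while a repeated employeeID is rejected.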

Foreign keys represent the links between tables. For example, if you
look back at Figure 3.2, you can see
that the departmentID column holds a department number. This is a foreign key:
The full set of information about each department will be held in a separate
table, with the departmentID as the primary key in that table.
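
As a rough illustration of how a foreign key links two tables, the sketch below follows departmentID from an employee row to its department row, and shows the database rejecting a departmentID that has no matching department. It uses sqlite3 as a stand-in for MySQL; the department table's name column and all sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE department (
    departmentID INTEGER PRIMARY KEY,
    name         TEXT NOT NULL
);
CREATE TABLE employee (
    employeeID   INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    job          TEXT,
    departmentID INTEGER REFERENCES department(departmentID)  -- foreign key
);
""")

# Hypothetical sample data
conn.execute("INSERT INTO department VALUES (10, 'Research')")
conn.execute("INSERT INTO employee VALUES (1, 'Ann Lee', 'Programmer', 10)")


def department_of(employee_id):
    """Follow the foreign key from an employee to the department's name."""
    row = conn.execute(
        "SELECT d.name FROM employee e "
        "JOIN department d ON d.departmentID = e.departmentID "
        "WHERE e.employeeID = ?",
        (employee_id,),
    ).fetchone()
    return row[0] if row else None


def dangling_reference_rejected():
    """Try to insert an employee whose departmentID (99) does not exist."""
    try:
        conn.execute("INSERT INTO employee VALUES (2, 'Raj Patel', 'Programmer', 99)")
    except sqlite3.IntegrityError:
        return True
    return False
```

The employee row stores only the department number; everything else about the department is looked up through the link.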

Functional Dependencies

The term functional dependency comes up less often than the ones
previously mentioned, but we will need to understand it to understand the
normalization process that we will discuss in a minute.

If there is a functional dependency between column A and column B in a given
table, which may be written A → B, then the value of column A determines
the value of column B. For example, in the employee table, the employeeID
functionally determines the name (and all the other attributes in this
particular example).
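
A functional dependency can be tested against sample rows by checking that equal values on the left-hand side never appear with different values on the right-hand side. The sketch below does this in plain Python; the function name `holds` and the sample rows are hypothetical. As with keys, passing the check only means the sample data does not contradict the dependency.

```python
# Hypothetical rows standing in for the employee table
employees = [
    {"employeeID": 1, "name": "Ann Lee", "job": "Programmer", "departmentID": 10},
    {"employeeID": 2, "name": "Raj Patel", "job": "Programmer", "departmentID": 10},
    {"employeeID": 3, "name": "Mia Chen", "job": "DBA", "departmentID": 20},
]


def holds(rows, determinant, dependent):
    """Check whether determinant -> dependent is consistent with `rows`."""
    seen = {}
    for row in rows:
        left = tuple(row[c] for c in determinant)
        right = tuple(row[c] for c in dependent)
        # Remember the first right-hand side seen for each left-hand side;
        # a different one later on contradicts the dependency.
        if seen.setdefault(left, right) != right:
            return False
    return True
```

For these rows, employeeID determines every other column, whereas departmentID does not determine name, because department 10 contains two different employees.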

Schemas

The term schema or database schema simply means the structure
or design of the database; that is, the form of the database without any
data in it. If you like, the schema is a blueprint for the data in the
database.

We can describe the schema for a single table in the following way:

employee(employeeID, name, job, departmentID)

In this book, we will follow the convention of using a solid underline for
the attributes that represent the primary key and a broken underline for any
attributes that represent foreign keys. Primary keys that are also foreign keys
will have both a solid and a broken underline.
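
The same schema can be written as table definitions and then read back from the database before any data exists. The sketch below assumes employeeID is the primary key and departmentID is a foreign key, as described earlier in this chapter. It uses sqlite3 as a stand-in for MySQL; the column types and the department table's name column are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    departmentID INTEGER PRIMARY KEY,
    name         TEXT
);
CREATE TABLE employee (
    employeeID   INTEGER PRIMARY KEY,
    name         TEXT,
    job          TEXT,
    departmentID INTEGER REFERENCES department(departmentID)
);
""")

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
table_info = conn.execute("PRAGMA table_info(employee)").fetchall()
column_names = [row[1] for row in table_info]
primary_key = [row[1] for row in table_info if row[5]]

# PRAGMA foreign_key_list yields (id, seq, table, from, to, ...) per foreign key column
foreign_keys = [
    (row[3], row[2]) for row in conn.execute("PRAGMA foreign_key_list(employee)")
]

row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The structure is fully described even though the table holds no rows, which is the sense in which the schema is a blueprint.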