Free Relational Databases, Part II

This site may earn affiliate commissions from the links on this page. Terms of use.

In part I of this story, we reviewed basic RDBMS concepts and created a simple table to hold some information about DVDs. In this section we’ll expand on that foundation, tying together multiple database tables, taking a detailed look at the all-important SELECT statement, and looking at ways to make loading data much easier.

Linking Multiple Tables

A database that could only contain one table wouldn’t be terribly interesting; after all, you could accomplish nearly as much with a flat-file format. The real power of relational databases comes from having multiple tables and linking them together in interesting ways. Here’s one way we could extend our DVD database, adding basic information about the director:

Now we have the ability to track not just who directed a movie, but when that director was born and died. We’ve also broken up the director’s name into its constituent first and last names. And most importantly, we’ve consolidated all this information about directors in its own table, separating it from information about the DVDs themselves — but keeping it logically linked by having a “director_id” field in the discs table that lets us look up the corresponding information in the directors table.

This relationship shown here between directors and discs is known as a one-to-many (sometimes 1:N) relationship, because a given director may be associated with many movies, but a given movie is only associated with a single director. This is actually an oversimplification that we could address using a more complex many-to-many relationship, but as it suffices for many movies we’ll adopt it for now.

One benefit to this segmentation is that it more closely mirrors how we think of the real world — DVDs are one kind of entity with a particular set of attributes, and directors are a different kind with their own attributes, so they really belong in separate tables. But there are other reasons why this is a good move. For example, if there were two directors named “Stanley Kubrick”, we would have no way of differentiating between them by name alone. Since each person in the directors table has a unique director_id, though, that’s no problem.

Similarly, storing all the director data in its own table makes it easier to keep that data consistent. There’s no need to worry that one DVD’s record might refer to the director “Kubrick, Stanley” while another calls him “Stanley Kubrick”. And when Kubrick died in 1999, this information only needed to be updated in one place–the record for him as a director.

This process of breaking apart information into its components and eliminating redundancy is, broadly speaking, known as database normalization. And even though information we need to answer queries such as “in what year was the director of ‘2001’ born?” is now split across multiple tables, we’ll see in just a moment how SQL makes it easy to draw on these multiple sources.

As an aside, it’s a common misperception that the “relational” in “relational database management system” is derived from this ability to link information in multiple tables. The term “relation” is actually a mathematical term that, in the context of database theory, refers to the collection of records (technically, “n-tuples”) that we normally think of as a “table”. If you’re curious about the underpinnings of database theory, you can check out the seminal paper “A Relational Model of Data for Large Shared Data Banks“, published in 1970 by E. F. Codd, a researcher at IBM.

Now let’s see how we can actually implement the one-to-many relationship.