Database Schema Design- Yii Framework Tutorial Sess. 2

In Session 2 of our Yii Framework Tutorial we’ll move on to the database schema. We’ll examine the functional requirements, then draft and refine a list of tables from that. We’ll also draft a list of attributes and use that to create a complete SQL statement to generate the database.

Overview

In session 1 we looked at some of the feature requirements for our app. We will no doubt want to change the feature list and that is absolutely fine; the schema can change as the app evolves. This approach is based on my own experience designing database schemas for web apps. It has worked for me and it will help us get a good database schema design to get started with. I hope it’s useful to you.

Database Schema Planning

A good way to flesh out the database schema is to read through the requirements list and make a note of any non-trivial nouns. These are all potential candidates for objects in the PHP app and/or the database schema. To do this, first just make a list of all the nouns in our feature list. Skip any that are not concrete or for for which we don’t want to store anything (for example output, this, that, WordPress). Also just leave out any that are duplicates (e.g.: admin is a duplicate of administrator). Here’s a list I made based on the features we identified in session 1:

directory, admin, site, install, site tag, site category, site rank, module, template, theme, user tracking.
let’s eliminate everything that is obviously not an object:directory, admin, site, install, site tag, site category, site rank, module, template, theme, user tracking.
…directory is not really a useful object, since it describes the entire app. We already know we will need some kind of options table and directory is essentially a duplicate of that. Install is pretty much the same as directory. Module, template and theme will all be tracked in our options table too, so we’ll get rid of those too. That leaves us with:

Site

Site Category

User

Admin User

User Tracking

Site Tag

Site Rank

Then go through your list again and examine each one. There are 5 tests or questions I ask myself for each object candidate:

Is this really a noun? If you can add a tense ending to any word in it without changing the meaning, it might be a process, for example: make pdf describes a process, and so is not an object. By way of contrast: pdf is clearly an object.

Is it unique / not a duplicate? If you have a user object and a person object you probably have duplicates.

Is it complex? If it has multiple attributes of it’s own it might be an object. For example: an email address is probably going to be stored as a single text field, so it is not complex.

What kind of relationships does it have? If it has a simple relationship with another object, it might just be an attribute of that object. For example: an email address has a simple relationship with a user: each user has an email address, and as we noticed in test 3, the email address has no other attributes to store. So an email address is going to work well as an attribute or column in our user table. On the other hand, a Site Category has several possible attributes (name, description, priority), so it is complex. In addition, each category name will likely be in use more than once. So categories don’t make good attributes of a site table, and therefore we’ll need to make categories objects.

Does it stand alone? If it doesn’t have a relationship with any other objects, it might not be an object. For database storage of an object to be meaningful, it will need to relate to other objects. The exception to this is the case of ‘options’ tables, which are often implemented as standalone tables. ‘Options’ tables stand alone because they are a convenient place to store application configuration, and are operated on directly by the PHP app. Think of this kind of table as a slightly hack-ish replacement for a config file for php. The advantage of doing this in the database is that the options can be changed by the app.

Here’s our revised list again, with some notes:

Site – the fundamental unit of the directory will be the listing, and each listing is a site. Or is it? What if we want to allow not just site listings, but also page listings? It might be better if our terminology could be abstracted away from sites to a more general term. And come to think of it, what if in the future there are other resource types that get listed using our directory app? For example, RSS. Maybe a directory admin will want to include some RSS feeds in their listings. For these reasons, I’m going with the much more general term entry. Entries pass all the tests we defined earlier.

Site Category – we’ve already covered why categories are an object.

User – we will need login accounts for our app. But should they be users or admins, or should we have both? I’ve decided that it’s best to stick with user. Initially all users will be admins; we won’t allow non-admin accounts. Later, though we will want to allow user account creation. When that day comes, we will probably prefer a single login mechanism (DRY!), so storing all user types together makes sense. We can add an attribute indicating the user role type now, which will ensure our system is pretty much ready when the time comes to add non-admin account features.

Admin User – see User.

User Tracking – Let’s rename this to the clearer log. Log is a noun and it doesn’t duplicate any other objects. It is also likely to be complex: off the top of my head every log entry will need some kind of level indicator, a timestamp, a description, and so on. So the log is an object.

Site Tag – Tags have to be an object: each Site could have several tags, each tag could appear on multiple sites. And tags will have multiple attributes: Name, Description, slug. But do tags duplicate the features found in categories? WordPress treats them as individual instances of a taxonomy object. Some taxonomies have tag-like properties (for example, they are non-hierachical), others are like categories. Using Custom Post Types, you can add as many taxonomies as you like. This kind of abstraction works well for WordPress (which is used as an all-purpose CMS), but I don’t think we need in raddir. We’re really not building a CMS here. We don’t need a flexible system for adding multiple taxonomies: we just want Categories and Tags. While we’re at it, let’s rename the taxonomy tables tag and category.

Site Rank – This is an interesting one. We want an initial simple number-based ranking system. But it also needs to be flexible, with support for ranking plugins. These might use many different forms of data. How do we prepare our database schema with this in mind? Certainly site rank seems to pass our tests for object-ness. For now, let’s call it an object and rename it rank.

Database Schema – Rough Whiteboard design

Here’s our whiteboard db schema. It should cover what we need. I’ve gone through and listed attributes for all these tables. We’ll look at the attributes next. For now, just take a look at the schema. Especially note the relationships:

One to Many relationship between users and log rows – each user might appear on many log row, but each log row only concerns a single user. Thus the foreign key in log to the primary key in user.

One to Many relationship between users and entries (ie sites) – each user can add multiple sites, and sites always have a single owner

One to Many relationship between categories and entries – each category can be used in multiple entries, and each entry only has a single category

Many to Many relationship between tags and entries – each tag can be used in multiple entries, and each entry can have many tags. We will need to add another table to allow the expression of this relationship. With typical ingenuity we’ll call the new table entry_tag.

Database schema – attributes

Our database schema needs attributes in order to be meaningful. There are no secrets to knowing what attributes we need. Some of it is based on prior knowledge: I know we’ll need an email address and a salted, hashed password for our login mechanism. Other items on this list are based on habit: I like to add datetime stamps for most significant objects. But let’s save the commentary; here is a list of attributes that I think we’ll need, along with notes where necessary:

User

email – we’ll filter and validate this in PHP to make sure this unique.

IP Address – for all the good it will do, someone will ask for this anyway.

severity – think Syslog severity levels.

event name

details

category

name

parent category – categories are hierarchical.

slug – by slug I mean the way the name is represented in the URL. We want our app to use pretty URLs and not ids.

description

sort order – categories might need sort order / weight at some point.

count

status

tag – tags are much like categories, but they are not hierarchical and they are not subject to sort order / weighting.

name

slug

description

count

status

entry

user id

category id

name

url

slug

description

sort order

priority

status

options – We want to be able to distribute this app. The app will have some configuration that is probably best stored in the database. We could implement fixed column names, but then we’d have a single row with many columns. So implementing a property/value pair system is a good solution. It will also allow us to add features as needed without unnecessary database schema changes, and allow plugins to store their own configuration.

id

property

value

rank – this schema should work fine for the default ranking system, and I’m hoping it will work or our modular ranking too.

id

entry id

property

value

entry tag – this is our many-many table.

id

entry id

tag id

Complete Database Schema

Disclaimer: Ok, so there is no way this database schema is actually complete. For a start no site is ever complete: first you launch, then you tweak. Endlessly. Also our product is likely to change in scope even as we build it: we are essentially prototyping as we go. With this kind of model of development we are likely to make database schema changes along the way. But this is enough for us to get started with our app.