Login

Plan a Good Database

Organize a collection of data somehow for later usage and you will get a database. In the last couple of years, with the spread of the Internet, databases have become something you will come across everywhere on the World Wide Web. Most websites have a strong database behind them. With this phase of the web’s evolution, if you work in software development, eventually you will need to create and plan a database from scratch. This article is here to teach you and show you what traits a solid plan should have.

To start, we need to separate the concept of data and information. While in the everyday usage these two merge, in the world of databases these are two very separate things. Data is the representation of facts/concepts of stuff that is adequate for human or automatic (by a machine) processing. The information is the meaning that we allocate to the data. For example, a ten percent salary decrease is a piece of data. For the employee, this is a bad thing (information), while for the boss it can be a good thing (also information).

The significance of a given piece of data depends on the current user. Therefore, we can work up and organize a structure for the data. You should leave the determination of the information out of the database, because this it is not always possible at this level. The application that uses the database should handle this (like the Windows/Mac/Linux application or the website).

A database management system is software that organizes and keeps your data together. This should work fast if you have an enormous quantity of data. There are two products from which you can choose currently on the market. There is the free version, MySQL; and the professional and expensive one, available from Oracle.

Actually, in 2009 Oracle bought MySQL, which has therefore become a slimmer version of Oracle’s database. You can achieve most of the same tasks in both interfaces. However, you will have to work a lot harder (in the sense of extra code to write) if you take the MySQL path. MySQL is currently up to the fifth main version.

My advice is to travel the Oracle path if you do it for a large company and can afford the extra cash. Otherwise, go for MySQL if you do it for your personal use or the $200+ price tag creates an impediment.

{mospagebreak title=Data Model}

Once you have the database management software (DBMS) up and running on your computer, your next step is to plan the database. The data model is a general theoretical structure of the database, and it is independent from the DBMS in which you implement it. To present how to construct this is the main objective of this article. The data model consists of a marking system that describes the data and an operation set that we use to organize the data.

We use the entity-relationship (ERM) data model to construct a conceptual structure for our database. The building blocks of the ERM are entity, attributes and relations. An entity is anything that can exist on its own and that we can differentiate an instance of it from another. For example, in planning the database of a company, entities are the employees, managers, branch shops (if they exist) and so forth.

While an entity may have multiple traits, one of its traits (or a combination of them) should identify uniquely any instance of the entity. To go back to the database of the company, this is the personal identification number of the employee. This is the key of the entity. In a company spread around the world, this could also be the employee’s country of residence and passport number.

By no means should this include only a single trait. Furthermore, if a few traits of the entity cannot identify it uniquely, we can create an internal ID number that we assign to it upon entering it into the database. For example, there is the account or membership number you receive from the gym, which is generated when you sign up. It can even be text, as in the case of your e-mail address. You can create whatever you want. The idea is to immediately and uniquely identify an instance of the entity.

In the ERM, an entity is a rectangle with the name of the entity in the middle:

The attributes of an entity (traits of the instance) we model with an ellipse with the name of it inside it. We link this with a line to the entity to which it corresponds, and if it is a key, you underline the attribute word. For example:

The final piece is the relationship. We use this to link together two or more entities. The object used is a diamond in the middle containing the name of the connection, and two lines from and to the rhombus to connect it with the entity. A relationship can be, for example, between the shop and the employee. The employee works in the shop.

The relationship itself can have traits if it symbolizes the work, and may differ depending on to whom it is connected. A good example for this is the connection between the supplier and the shop. The price at which the supplier sells depends on the relationship, as next time he may charge extra or make a better offer to the shop. In this case the relationship will have a price attribute, and for this we use the same ellipsoid notation paired with the line connection to the diamond.

A full ERM diagram will look like this:

– Image Courtesy of Wikipedia –

{mospagebreak title=Relationship Types}

We can structure relationships into three categories. A one to one connection is between entity one (E1) and entity two (E2) if E1 can have a relationship with at most one entity from the set of E2. This type of relationship is rare. This is the case in a database where we represent separately husbands and wives in a separate entity. In a monogamist society, a man can have at most one wife (nevertheless, it is possible that he has none).

The second type is the "one to many" relationship. In this case, you can associate E1 to zero or more entities from E2. However, going the other way, an entity from E2 may correspond to only a single E1 entity or none. For example, a company may have multiple branches, and a single branch may have multiple employees. Nevertheless, an employee may work in only one branch of our company at a time (please exclude the special cases of someone going to work in our branch in New York from Monday to Wednesday and in Washington on the other days).

The final relationship type is "many to many." This relationship type is troublesome and needs to be eliminated in the planning phase. For instance, an artist may sing one or more songs, and more than one artist may sing a song. In the ERM, to visualize this we put an arrow on all the relationship’s connecting lines on the side where we connect to "one" entity. This means on both ends for the one to one relationship and on the one side of the "one to many" relationship.

Here I should introduce the "is a" relationship. Some entities may have multiple purposes at the same time, and one of them is in fact just a specialization of another one. For example, the manager of a shop is also an employee. In this case, the manager will keep all the traits of an employee; the key of the employee will also be his key, and it may have additional attributes. In an ERM diagram, this looks like the screen below (imagine the attributes are there also):

{mospagebreak title=Relational Model}

As good as the ERM diagram may be at visualizing our plan, for optimization reasons we need to transform it into another model: the relational model. This is the form in which we will find it in the database: the tables.

For this representation there are a couple of rules. Two arbitrary rows cannot be the same. The set of combined attributes that forms the key cannot repeat itself in more than one instance (so the key is indeed unique). The attributes forming the key may be primary or foreign key (identifying a relationship, not the entity itself). A primary key attribute cannot be null or empty.

The order of the instances of the entity (the relation) in the table does not matter. Two attributes within the same entity cannot have the same name. These few constraints will allow us to create a database that we can manage with ease. The collection of relations (tables) will form the relational database.

How do we transform the ERM to the relational model? Here are the steps to follow. First, write down the entities, and list after them their attributes. We underline the keys. Second, write up the relationship names. For those in the parentheses, we will get the attributes that belong to that relationship. Furthermore, both key sets from the connecting entities will get in.

What will be the new primary keys for these relations? It depends on the relationship type. If it is "one on one," either one of the E1 or E2 sets can be the key set for this. If it is "one to many" then the key set at the many side will be the key. And finally, when you have an "many to many" relationship, the new key will be a composed key, and both key sets from E1 and E2 will form the new key set.

In the case of an "is a" connection, the specialization will have the key of the general entity, and will have its attributes as well. The third and final step in the transformation is to determine the foreign keys. If two relationships at this stage have the same key, we unify those two relationships and substitute a single relationship. The relationships that have a composed key (multiple attributes as keys) cannot be unified.

Before we create the database, first we need to further improve it by bringing it to the third normal form. I will leave this to my next article, where, besides the normalization of the databases, I will teach you how to visualize your plan in MySQL Workbench.

Additionally, we will write and generate the database creation code. I will use the MySQL syntax, however the Oracle one is pretty similar. If you understand the concept, with just a few modifications it should work for Oracle too. Therefore, by the end of that article we should have a fully operational and well-planned database.

Thanks for reading my article; I do hope that you learned a lot today. I invite you to share your thoughts about this subject or your comments on the article in the blog following the article. If you have any questions, I will answer them as soon as possible. If you would like to ask a community to help you with your issue you can join the friendly DevHardware forum. Rate my article if you liked it or not and remember to Live With Passion!