Pages

May 24, 2014

All about Normalization!

Normalization is one of the favorite topics of interviewee, here are the basics about what Normalization is all about.

What is Normalization?
A normalization is the process of organizing the columns, tables of a database to minimize the redundancy of data and ensure data dependencies make sense( means storing related data in a table).

It involves in dividing large tables into smaller tables and defining relationships between them. What are different types of Normalization Levels or Normalization Forms?
The technique of normalization includes some rules, which when applied, changes the unnormalized data to a normalized-structured state. The process of nomalizing is implemented in 3 levels which as known as normal forms.

First Normal Form:
1). Identify the primary key.
2). Remove all duplicate columns from the same table.
3). Create separate tables for each group of related data and identify each row with a unique column or set of columns (Primary Key).
4). Add a primary key column to the new table. It should be the same column as the foreign key in the base table.

Second Normal Form:
First it should meet the requirement of first normal form.

All the non-primary key column should be dependent on the whole primary key. When a non primary key column is partially dependent on primary key then it is known as partial dependency. In 2nd normal form this partial dependency is removed.Frequently asked SQL queries in interview..

How to remove Partial Dependency?
a). First you need to identify the columns which are partially dependent on primary key.
b). Remove those column from the base table and create another table, in which all the non-primary column are dependent on primary key of newly created table.

Third Normal Form:
First it should meet the requirements of second normal form. in 3rd NF all the non-primary key columns in the tables are directly dependent on the primary key and are not dependent on any other non-primary key.

1). First step is to identify the non-primary key column that depend on other non-primary key columns.
2). Then you need to remove these columns from the base table.
3). Create another table with this removed columns and include the non-primary key column that they are dependent on making it the primary key.
4). Create a foreign key in the base table and link it to primary key of newly created table.

What is Transitive Dependency?
It's a situation where a non-key column of a table is dependent on another non-key column.

What is De-normalization?
The de-normalization is the process of optimizing the read performance of a database by adding redundant data or by grouping data.

Let us take an example:

Student Details

Course Details

Result Details

Roll No

Student Name

DOB

Course ID

Course Name

Duration in Month

Date Of Exams

Marks Obtained

Grade

1001

Ram

11/9/1986

M4

Basic Maths

7

11/11/2004

89

A

1002

Shyam

12/8/1987

M4

Basic Maths

7

11/12/2004

78

B

1001

Ram

23/06/1987

H6

4

11/13/2004

87

A

1003

Sita

16/07/1985

C3

Basic Chemistry

11

11/14/2004

90

A

1004

Gita

24/09/1988

B3

8

11/15/2004

78

B

1002

Shyam

23/06/1988

P3

Basic Physics

13

11/16/2004

67

C

1005

Sunita

14/09/1987

P3

Basic Physics

13

11/17/2004

78

B

1003

Sita

23/10/1987

B4

5

11/18/2004

67

C

1005

Sunita

13/03/1990

H6

4

11/19/2004

56

D

1004

Gita

21/08/1987

M4

Basic Maths

7

11/20/2004

78

E

The above table contains the student and result details. Above table has below issues:

1). Firstly, we cannot insert prospective course which does not have any registered student or we cannot insert student details that is yet to register for any course.
2). Secondly, if we want to update the course M4’s name we need to do this operation three times. Similarly we may have to update student 1003’s name twice if it changes.
3). Also, if we want to delete a course M4 , in addition to M4 occurs details , other critical details of student also will be deleted. This kind of deletion is harmful to business. Moreover, M4 appears thrice in above table and needs to be deleted thrice.
4). Course M4’s data is stored thrice and student 1002’s data stored twice .This redundancy will increase as the number of course offerings increases.