What is Database Normalization in SQL Server – MS SQL Server – Zero to Hero Query Master – Part 4

What is Database Normalization

Database normalization is the step by step process of organizing data to minimize data redundancy i.e. Data duplication which in turn ensures data consistency

  • Normalization is a database design technique that reduces data redundancy and eliminates undesirable characteristics like Insertion, Update and Deletion Anomalies.
  • Normalization rules divide larger tables into smaller tables and link them using relationships.
  • The purpose of Normalization in SQL is to eliminate redundant (repetitive) data and ensure data is stored logically.
  • The inventor of the relational model Edgar Codd proposed the theory of normalization of data with the introduction of the First Normal Form, and he continued to extend theory with Second and Third Normal Form. Later he joined Raymond F. Boyce to develop the theory of Boyce-Codd Normal Form.

Problems of Data Redundancy

Let’s have a look at the example with our Student details as real time scenario.

  • Disk Space Wastage
  • Data Inconsistency
  • DML queries can become slow
Fig 1. Data repeated Student Details Table
  • So, to reduce the data redundancy, we can divide this large badly organized table into two (Student Details and Departments), as shown below.
Fig.2 Move Department Details to one table (Department Details Table)
Fig.3 Student Details Table
  • Now, we have reduced redundant department data. So, if we have to update the department head name, we only have one row to update, even if there are 10 million employees in that department.

Normalization Forms

In Database, there are certain rules to process Database Normalization. There are 6 Normalization Forms,

  • 1 NF
  • 2 NF
  • 3 NF
  • 4 NF
  • 5 NF
  • 6 NF

1 NF (First Normalization Form)

  • The data in each column should NOT be separated by Commas.
  • The data in each column should NOT be separated by Commas.
  • Identify each record uniquely using primary key

2NF (Second Normal Form) Rules

A table is said to be in 2NF, if

  1. The table meets all the conditions of 1NF
  2. Move redundant data to a separate table
  3. Create relationship between these tables using foreign keys.
Fig.4 Another Example

There is a lot of redundant data in the table. Let’s say, in my organization, there are 100,000 employees and only 2 departments (IT & HR). Since we are storing DeptName, DeptHead, and DeptLocation columns also in the same table, all these columns should also be repeated 100,000 times, which results in unnecessary duplication of data.

So this table is clearly violating the rules of the second normal form

, and the redundant data can cause the following issues.

  1. Disk space wastage
  2. Data inconsistency
  3. DML queries (Insert, Update, Delete) can become slow

Now, to put this table in the second normal form, we need to break the table into 2, and move the redundant department data (DeptName, DeptHead and DeptLocation) into it’s own table. To link the tables with each other, we use the DeptId foreign key. The tables below are in 2NF.

Fig. 5 Applied 2NF

Third Normal Form (3NF)

A table is said to be in 3NF, if the table

  1. Meets all the conditions of 1NF and 2NF
  2. Does not contain columns (attributes) that are not fully dependent upon the primary key

The table below, violates third normal form, because AnnualSalary column is not fully dependent on the primary key EmpId. The AnnualSalary is also dependent on the Salary column. In fact, to compute the AnnualSalary, we multiply the Salary by 12. Since AnnualSalary is not fully dependent on the primary key, and it can be computed, we can remove this column from the table, which then, will adhere to 3NF.

Fig. 6 Table Violating 3 NF

In the table below, DeptHead column is not fully dependent on EmpId column. DeptHead is also dependent on DeptName. So, this table is not in 3NF.

Fig.7 Table Violating 3 NF

To put this table in 3NF, we break this down into 2, and then move all the columns that are not fully dependent on the primary key to a separate table as shown below. This design is now in 3NF

Fig.8 Table with 3 NF

BCNF (Boyce-Codd Normal Form)

Even when a database is in 3rd Normal Form, still there would be anomalies resulted if it has more than one Candidate Key.

Sometimes is BCNF is also referred as 3.5 Normal Form.

4NF (Fourth Normal Form) Rules

If no database table instance contains two or more, independent and multivalued data describing the relevant entity, then it is in 4th Normal Form.

5NF (Fifth Normal Form) Rules

A table is in 5th Normal Form only if it is in 4NF and it cannot be decomposed into any number of smaller tables without loss of data.

6NF (Sixth Normal Form) Proposed

6th Normal Form is not standardized, yet however, it is being discussed by database experts for some time. Hopefully, we would have a clear & standardized definition for 6th Normal Form in the near future.

Summary

  • Database designing is critical to the successful implementation of a database management system that meets the data requirements of an enterprise system.
  • Normalization Process in DBMS helps produce database systems that are cost-effective and have better security models.
  • Functional dependencies are a very important component of the normalize data process
  • Most database systems are normalized database up to the third normal forms.

Other Articles in this Series

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Website Built with WordPress.com.

Up ↑

<span>%d</span> bloggers like this: