We present a practical example of how to normalize a table that is in the first normal form (1NF) into a set of tables in the third normal form (3NF).
Database normalization is a very important concept in database design. Every database designer should know the rules for different normal forms. In practice, the most important normal form is the third normal form (3NF). If you want to be a database designer, you should know how to check if a table is in 3NF. You should also know how to normalize a table that’s not in 3NF into one in 3NF.
As you recall, a table is in 3NF if:
- It is in second normal form (2NF).
- All non-prime attributes are directly (non-transitively) dependent on the entire candidate key.
Typically, you normalize a table from 1NF to 3NF in two steps: first you normalize it into 2NF, then you normalize that into 3NF. In this article, we’ll show you an example of normalization from 1NF through 2NF into 3NF. We’ll also discuss problems with 1NF tables that are solved when you normalize them into 2NF.
Normalization to 2NF
We’ll work with a table named Courses1
, which contains information about courses offered at a certain university: the titles of the courses, the lecturers, the departments, and the recommended textbooks for each course. The table is in the first normal form, since all column values are atomic. Here’s the table:
Courses1
course | department | lecturer | textbook |
---|---|---|---|
Relational Databases | Computer Science | Jeremy Brown | Database Systems |
Relational Databases | Computer Science | Jeremy Brown | Introduction to Databases |
Cloud Databases | Computer Science | Jeremy Brown | Database Systems |
Cloud Databases | Computer Science | Jeremy Brown | BigQuery Manual |
Algorithms | Computer Science | James Cormen | Intro to Algorithms |
Data Science | Mathematics | Jane Downing | Database Systems |
Data Science | Mathematics | Jane Downing | Statistics |
Calculus | Mathematics | John Smith | Intro to Calculus |
There are several problems with various operations on data in this table:
INSERT
: You can't add a course with no textbooks.UPDATE
: To change the lecturer for the courseRelational Databases
, you have to change two rows.DELETE
: If you remove theAlgorithms
course, you also remove the lecturerJames Cormen
.
These problems come from the fact that this table is not in 2NF. Let’s see why. There are three non-trivial functional dependencies in this table:
course, textbook -> lecturer, department
: Thecourse
andtextbook
determine thelecturer
and thedepartment
.course -> lecturer, department
: Thecourse
determines thelecturer
and thedepartment
.lecturer -> department
: Thelecturer
determines the department.
The candidate key in this table is the set {course
, textbook
}.
It is not in 2NF, because we have functional dependencies with only a part of the candidate key on the left hand-side of a dependency. The functional dependency course
-> lecturer
, department
violates the rules for 2NF.
To normalize this, we need to get rid of the functional dependency that violates 2NF: course
-> department
, lecturer
.
The normalization procedure usually involves decomposing a table into two or more tables that contain the same information. In this case, we’ll extract the columns of the functional dependency violating 2NF, that is, course
, department
, and lecturer
, into a separate table, Courses2
, with the candidate key {course
}. The remaining column in Courses1
is the textbook
column. Of course, we can’t just put it into a new table, because we’ll lose the course-textbook relationship. So, we will put both columns, course
and textbook
, into another table, CourseTextbooks
.
To decompose a table into a set of columns, you identify these columns, take the data from those columns only, and remove the duplicates. For example, we create a table named Courses2
by first taking the columns course
, department
, and lecturer
from Courses1
. We then take the data from these columns and remove the duplicates. There are two rows with all of the values Relational
Databases
, Computer Science
, and Jeremy Brown
in Courses1
, but there is only one row with all of these values in Courses2
.
Here’s the final result of the split:
Courses2
course | department | lecturer |
---|---|---|
Relational Databases | Computer Science | Jeremy Brown |
Cloud Databases | Computer Science | Jeremy Brown |
Algorithms | Computer Science | James Cormen |
Data Science | Mathematics | Jane Downing |
Calculus | Mathematics | John Smith |
CourseTextbooks
course | textbook |
---|---|
Relational Databases | Database Systems |
Relational Databases | Introduction to Databases |
Cloud Databases | Database Systems |
Cloud Databases | BigQuery Manual |
Algorithms | Intro to Algorithms |
Data Science | Database Systems |
Data Science | Statistics |
Calculus | Intro to Calculus |
You can reconstruct the same information in Courses1
: simply join the data from Courses2
and CourseTextbooks
.
Both of these tables are in 2NF. There are no functional dependencies within CourseTextbooks
, and the candidate key is {course
, textbook
}. The functional dependencies in Courses1
are:
course
->department
,lecturer
.lecturer
->department
.
The candidate key is {course
}, and there are no functional dependencies on only a part of a key, so it is in 2NF.
The following problems have been removed in the new database:
INSERT
: You can now add a course with no textbooks. Simply add the course toCourses2
without adding rows toCourseTextbooks
.UPDATE
: You can change the lecturer for the courseRelational Databases
by changing just one row inCourses2
.
This problem still persists:
DELETE
: If we remove theAlgorithms
course, we also remove the lecturerJames Cormen
.
We’ll address this problem in a little.
The Algorithm for Normalizing to 2NF
Here’s the general algorithm for normalizing a table from 1NF to 2NF.
Suppose you have a table R
with scheme S
which is in 1NF but not in 2NF. Let A -> B be a functional dependency that violates the rules for 2NF, and suppose that the sets A and B are distinct (A ∩ B = ∅).
- Let C = S – (A U B). In other words:
- A = attributes on the left-hand side of the functional dependency.
- B = attributes on the right-hand side of the functional dependency.
- C = all other attributes.
- We can split
R
into two parts:R1
, with scheme C U A.R2
, with scheme A U B.
- The original relation can be recovered as the natural join of
R1
andR2
: R =R1 NATURAL JOIN R2
.
Normalization to 3NF
Let’s look again at Courses2
. It is in 2NF but not in 3NF.
Courses2
course | department | lecturer |
---|---|---|
Relational Databases | Computer Science | Jeremy Brown |
Cloud Databases | Computer Science | Jeremy Brown |
Algorithms | Computer Science | James Cormen |
Data Science | Mathematics | Jane Downing |
Calculus | Mathematics | John Smith |
We have the following functional dependencies in this table:
course
->lecturer
,department
.lecturer
->department
.
The candidate key is {course
}.
The table is not in 3NF, because there is a transitive functional dependency from the candidate key course
to department
(via course
-> lecturer
and lecturer
-> department
). The functional dependency lecturer
-> department
violates the rules for 3NF.
To convert Course2
into 3NF, we have to remove this functional dependency. The procedure is similar to the one we performed for 2NF: we extract the columns lecturer
and department
into a separate table, Lecturers
. The remaining column in Courses2
is course
. We need to keep the course-lecturer connection, so the other table is Courses3
with columns course
and lecturer
.
Here is the result:
Lecturers
lecturer | department |
---|---|
Jeremy Brown | Computer Science |
James Cormen | Computer Science |
Jane Downing | Mathematics |
John Smith | Mathematics |
Courses3
course | lecturer |
---|---|
Relational Databases | Jeremy Brown |
Cloud Databases | Jeremy Brown |
Algorithms | James Cormen |
Data Science | Jane Downing |
Calculus | John Smith |
The problems we had in 2NF are now resolved:
INSERT
: You can now add a course with no textbooks. Simply add the course toCourses3
without adding rows toCourseTextbooks
.UPDATE
: You can change the lecturer forRelational Databases
by changing just one row inCourses3
.DELETE
: Removing theAlgorithms
course does not remove the lecturerJames Cormen
.
Get Started With Database Normalization
To normalize a table from 1NF to 3NF, you need to normalize it to 2NF first then to 3NF. In the normalization process, you decompose a table into multiple tables that contain the same information as the original table. The normalization process usually removes many problems related to data modification.
If you liked this article, check out other normalization articles on our blog.
If you’re a student taking database classes, make sure to create a free academic account in Vertabelo, our online ER diagram drawing tool. It allows you to draw logical and physical ER diagrams directly in your browser.
Vertabelo supports PostgreSQL, SQL Server, Oracle, MySQL, Google BigQuery, Amazon Redshift, and other relational databases. Try it out – see how easy it is to get started!