Normalization in Relational Databases: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF)
What is database normalization? What are the different normal forms, and what do they do? Find out in this article.
Normalization in relational databases is a design process that minimizes data redundancy and avoids update anomalies. Basically, you want each piece of information to be stored exactly once; if the information changes, you only have to update it in one place.
The theory of normal forms gives rigorous meaning to these informal concepts. There are many normal forms. In this article, we’ll review the most basic:
- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
There are normal forms higher than 3NF, but in practice you usually normalize your database to the third normal form or to the Boyce-Codd normal form, which we won’t cover here.
So, what is this theory of normal forms? It deals with the mathematical construct of relations (which are a little bit different from relational database tables). The normalization process consists of modifying the design through different stages, going from an unnormalized set of relations (tables), to the first normal form, then to the second normal form, and then to the third normal form.
Don’t worry if this sounds complicated; I promise it will get clearer as we go through each step. Let’s start with 1NF – the first step.
First Normal Form (1NF)
A relation is in first normal form (1NF) if (and only if):
- Each attribute contains only one value.
- All attribute values are atomic, which means they can’t be broken down into anything smaller.
In practice, 1NF means that you should not have lists or other composite structures as attribute values. Below is an example of a relation that does not satisfy 1NF criteria:
Student courses
student | courses |
---|---|
Jane Smith | Databases, Mathematics |
John Lipinsky | English Literature, Databases |
Dave Beyer | English Literature, Mathematics |
This relation is not in 1NF because the courses attribute has multiple values. Jane Smith is assigned to two courses (Databases and Mathematics), and they are stored in one field as a comma-separated list. This list can be broken down into smaller elements (i.e. course subjects: databases as one element, mathematics as another), so it’s not an atomic value.
To transform this relation to the first normal form, we should store each course subject as a single value, so that each student-course assignment is a separate tuple:
Student courses
student | course |
---|---|
Jane Smith | Databases |
Jane Smith | Mathematics |
John Lipinsky | English Literature |
John Lipinsky | Databases |
Dave Beyer | English Literature |
Dave Beyer | Mathematics |
If you’re interested in reading more about the first normal form, I recommend the article What Is the Actual Definition of First Normal Form? by my colleague Konrad Zdanowski.
Second Normal Form (2NF)
A relation is in second normal form (2NF) if and only if:
- It is in 1NF.
- No non-prime attributes are functionally dependent on a subset of the candidate key(s). In other words, any column that’s not a key column is dependent on the whole information in the candidate key.
What does this mean? If the value of attribute A is determined by the value of attribute S, then A is functionally dependent on S. For example, your age is functionally dependent on your date of birth. For more on functional dependencies, see this article.
Let’s go back to the idea of candidate keys and non-prime attributes. What are they?
- A candidate key is a minimal set of attributes that determines the other attributes included in the relation. It’s minimal in that if you removed one attribute, the remaining attributes do not form a candidate key. For a more detailed exploration of keys, see this article.
- A non-prime attribute is an attribute that is not part of the candidate key. However, for a relation to be 2NF, the information stored by non-prime attributes must be related to the whole information in the candidate key.
Informally, the second normal form states that all attributes must depend on the entire candidate key.
Let’s see an example of a relation that does not satisfy 2NF. The underlined attributes are the candidate key.
Bike parts warehouse
part | supplier | quantity | supplier country |
---|---|---|---|
Saddle | Bikeraft | 10 | USA |
Brake lever | Tripebike | 5 | Italy |
Top tube | UpBike | 3 | Canada |
Saddle | Tripebike | 8 | Italy |
- The candidate key is the part and supplier set, which is expressed like this {part, supplier}.
- The non-prime attributes (which are not part of the candidate key) are quantity and supplier country.
- There are functional dependencies between part, supplier, and quantity (expressed as part, supplier → quantity ) and between supplier and supplier country (expressed as supplier → supplier country).
Why doesn’t this satisfy 2NF? The set {part, supplier} is the only candidate key of this relation. The value of supplier country is functionally dependent on supplier. Supplier country is not part of the candidate key, so it is a non-prime attribute and it is functionally dependent on part of the candidate key, not the entire candidate key {part, supplier}.
To transform this relation into 2NF, we need to split it into two relations: Bike parts (with the attributes part, supplier, and quantity) and Suppliers (with the attributes supplier and supplier country). This would look like as follows:
Bike parts
part | supplier | quantity |
---|---|---|
Saddle | Bikeraft | 10 |
Brake lever | Tripebike | 5 |
Top tube | UpBike | 3 |
Saddle | Tripebike | 8 |
The relation Bike parts is in 2NF because, as before, the quantity attribute depends on the pair supplier and part.
Suppliers
supplier | supplier country |
---|---|
Bikeraft | USA |
Tripebike | Italy |
UpBike | Canada |
The Suppliers relation is in 2NF because supplier country is functionally dependent on supplier, which is the candidate key of this relation.
Let’s see one more example of a non-2NF relation.
Student course fees
student | course | grade | course fee |
---|---|---|---|
Alison Brown | Databases | A | $100 |
Jason Liu | Mathematics | B | $150 |
Mariah Hill | Databases | B+ | $100 |
- Candidate key: {student, course}
- Non-prime attributes: grade, course fee
- Functional dependencies: student, course → grade; course → course fee
The following relation does not satisfy 2NF. The set {student, course} is the relation’s candidate key, but the value of course fee is functionally dependent on course alone. Course fee is a non-prime attribute, which is functionally dependent on only part of the candidate key.
To transform this into 2NF, we again split it into two relations: Student courses (with the attributes student, course, and grade) and Courses (with the attributes course and course fee). Thus, we avoid the partial dependency in the non-2NF relation above.
Student course
student | course | grade |
---|---|---|
Alison Brown | Databases | A |
Jason Liu | Mathematics | B |
Mariah Hill | Databases | B+ |
Courses
course | course fee |
---|---|
Databases | $100 |
Mathematics | $150 |
Why not try verifying for yourself that these relations are indeed 2NF?
Note that the 2NF partial dependency rule only kicks in if your relation has a composite candidate key (i.e. one that consists of multiple attributes). All relations that have a single-attribute key are by definition in 2NF.
Third Normal Form (3NF)
A relation is in third normal form (3NF) if and only if:
- It is in second normal form (2NF).
- All non-prime attributes are directly (non-transitively) dependent on the entire candidate key.
In other words, non-prime attributes must be functionally dependent on the key(s), but they must not depend on another non-prime attribute. 3NF non-prime attributes depend on “nothing but the key”.
Let’s see a non-3NF relation:
Order information
order_id | date | customer | customer email |
---|---|---|---|
1/2020 | 2020-01-15 | Jason White | white@example.com |
2/2020 | 2020-01-16 | Mary Smith | msmith@mailinator.com |
3/3030 | 2020-01-17 | Jacob Albertson | jasobal@example.com |
4/2020 | 2020-01-18 | Bob Dickinson | bob@fakemail.com |
- Candidate key: order_id
- Non-prime attributes: date, customer, customer email
- Functional dependencies: date depends on order_id (order_id → date); customer depends on order_id (order_id → customer), and customer email depends on customer (customer → customer email).
This relation does not satisfy 3NF. The only candidate key in this relation is order_id. The value of customer email is functionally dependent on the customer attribute, which is a non-prime attribute. Thus, the relation violates 3NF.
Once again, we split this into two relations: Orders (with the attributes order_id, date, and customer) and Customers (with the attributes customer and customer email):
Orders
order_id | date | customer |
---|---|---|
1/2020 | 2020-01-15 | Jason White |
2/2020 | 2020-01-16 | Mary Smith |
3/3030 | 2020-01-17 | Jacob Albertson |
4/2020 | 2020-01-18 | Bob Dickinson |
Customers
customer | customer email |
---|---|
Jason White | white@example.com |
Mary Smith | msmith@mailinator.com |
Jacob Albertson | jasobal@example.com |
Bob Dickinson | bob@fakemail.com |
Orders is in 3NF because the date and customer attributes do not violate the rule of 3NF; their values depend on the order_id number. Customers is in 3NF because customer email is functionally dependent on customer, which is the candidate key of this relation. In both cases, all non-prime attributes depend on the candidate key.
Let’s see one more non-3NF example.
Courses
course | year | teacher | teacher date of birth |
---|---|---|---|
Databases | 2019 | Chris Cape | 1974-10-12 |
Mathematics | 2019 | Daniel Parr | 1985-05-17 |
Databases | 2020 | Jennifer Clock | 1990-06-09 |
- Candidate key: {course, year}
- Non-prime attributes: teacher, teacher date of birth
- Functional dependencies: teacher depends on course and year (course, year → teacher); teacher date of birth depends on teacher (teacher → teacher date of birth)
This relation does not satisfy 3NF. The only candidate key in this relation is {course, year}, but the value of teacher date of birth is functionally dependent on teacher – a non-prime attribute. This violates 3NF.
Guess how we’ll transform this into 3NF? That’s right; we split the relation. Courses gets the attributes course, year, and teacher; Teachers gets the attributes teacher and teacher date of birth:
Courses
course | year | teacher |
---|---|---|
Databases | 2019 | Chris Cape |
Mathematics | 2019 | Daniel Parr |
Databases | 2020 | Jennifer Clock |
Teachers
teacher | teacher date of birth |
---|---|
Chris Cape | 1974-10-12 |
Daniel Parr | 1985-05-17 |
Jennifer Clock | 1990-06-09 |
Try verifying that these relations are indeed in 3NF for yourself. How would you explain the changes made?
Database Normalization: Summary
First, second, and third normal forms are the basic normal forms in database normalization:
- The first normal form (1NF) states that each attribute in the relation is atomic.
- The second normal form (2NF) states that non-prime attributes must be functionally dependent on the entire candidate key.
- The third normal form (3NF) states that non-prime attributes must be directly (non-transitively) dependent on candidate keys.
Stay tuned to our blog for more articles on database normalization!