Why do you need all of this normalization stuff? The main goal is to avoid redundancy in your data. Redundancy can lead to various anomalies when you modify your data. Every fact should be stored only once and you should know where to look for each fact. The normalization process brings order to your filing cabinet. You decide to conform to certain rules where each fact is stored.
Nowadays the go-to normal forms are either the Boyce-Codd normal form (BCNF), which we will cover here today, or the third normal form (3NF), which will be covered later. (Yes, there are also the first, second, fourth, fifth normal form. We’ll talk about them later.)
The Boyce-Codd Normal Form
A relational schema R is considered to be in Boyce–Codd normal form (BCNF) if, for every one of its dependencies X → Y, one of the following conditions holds true:
- X → Y is a trivial functional dependency (i.e., Y is a subset of X)
- X is a superkey for schema R
Informally the Boyce-Codd normal form is expressed as “Each attribute must represent a fact about the key, the whole key, and nothing but the key.”
Example
Let’s take a look at this table, with some typical data. The table is not in BCNF.
Author | Nationality | Book title | Genre | Number of pages |
---|---|---|---|---|
William Shakespeare | English | The Comedy of Errors | Comedy | 100 |
Markus Winand | Austrian | SQL Performance Explained | Textbook | 200 |
Jeffrey Ullman | American | A First Course in Database Systems | Textbook | 500 |
Jennifer Widom | American | A First Course in Database Systems | Textbook | 500 |
The nontrivial functional dependencies in the table are:
author → nationality book title → genre, number of pages
We can easily see that the only key is the set {author, book title}.
The same data can be stored in a BCNF schema. However, this time we would need three tables.
Author | Nationality |
---|---|
William Shakespeare | English |
Markus Winand | Austrian |
Jeffrey Ullman | American |
Jennifer Widom | American |
Book title | Genre | Number of pages |
---|---|---|
The Comedy of Errors | Comedy | 100 |
SQL Performance Explained | Textbook | 200 |
A First Course in Database Systems | Textbook | 500 |
Author | Book title |
---|---|
William Shakespeare | The Comedy of Errors |
Markus Winand | SQL Performance Explained |
Jeffrey Ullman | A First Course in Database Systems |
Jennifer Widom | A First Course in Database Systems |
The functional dependencies for this schema are the same as before:
author → nationality book title → genre, number of pages
The key of the first table is {author}. The key of the second table is {book title}. The key of the third table is {author, book title}. There are no functional dependencies violating the BCNF rules, so the schema is in Boyce-Codd normal form.
How Do You Decompose Your Schema into Boyce-Codd Normal Form?
To go from non-BCNF normal form to BCNF, you must decompose your table using these two steps.
- Find a nontrivial functional dependency X → Y which violates the BCNF condition (where the X is not a superkey)
- Split your table in two tables:
- one with attributes XY (all attributes from the dependency),[1]
- one with X attributes together with the remaining attributes from the original relation
Then you keep repeating the decomposition process until all of your tables are in BCNF. After sufficient iterations you have a set of tables, each in BCNF, such that the original relation can be reconstructed.
Example
Let’s get back to the example. The functional dependency:
book title → genre, number of pages
is one FD violating the BCNF rules. We split our relation into two relations:
- the ones in the functional dependency (book title, genre, number of pages)
- the rest: (book title, author, nationality). Note that the left hand-side of the FD (book title) stays in the relation!
The example data look like this. We select the values of columns from the original relation and we eliminate the duplicate rows.
Book title | Genre | Number of pages |
---|---|---|
The Comedy of Errors | Comedy | 100 |
SQL Performance Explained | Textbook | 200 |
A First Course in Database Systems | Textbook | 500 |
Author | Nationality | Book title |
---|---|---|
William Shakespeare | English | The Comedy of Errors |
Markus Winand | Austrian | SQL Performance Explained |
Jeffrey Ullman | American | A First Course in Database Systems |
Jennifer Widom | American | A First Course in Database Systems |
Are we done? Nope. The (book title, genre, number of pages) table is in BCNF. But (book title, author, nationality) isn’t. We have the dependency:
author → nationality
Together with the trivial dependency
book title → book title,
the pair (book title, author) is the key of the relation.
We have to decompose the table one more time. This time we decompose into:
- columns forming the functional dependency: (author, nationality)
- the remaining columns: (author, book title)
This time every table is in BCNF.
¹ This step is not entirely accurate. It will work in typical cases though, so for now we’ll keep it simple.