Back to articles list

# How (And How Not) to Decompose Relations

When you read about normalization you usually get the set of conditions that a database in the nᵗʰ normal form should satisfy and the set of rules, a sort of a cook-book, for obtaining that normal form. But why these rules are safe to apply to your denormalized relations is a skip material. Here, I would like to present some elementary concepts on how we decompose relations and what can go wrong.

The concepts I want to present may seem a bit complicated but, on the other hand, they are so basic that you rarely see it written. Still, they are essential for understanding how the normalization process works. One may prepare a perfect meal just by following a recipe but no one can master the art of cooking without understanding what's going on behind-the-scenes. The same holds true with databases.

I will talk about relations with functional dependencies only. Some time ago, Agnieszka did an excellent exposition on functional dependencies. Here, I only briefly review a basic fact. In a relation R, an attribute B is functionally dependent on attributes A1, …, An.(denoted as A1, …, An → B), if for every possible scenario there cannot be two tuples in R that are equal on all A's but differ on B.

I will also need a concept of a natural join. Well, I bet you know what a natural join is but let me say something, just in case. If you have two relations, say Dogs (dog_id, dog_name, owner_id) and Owners(owner_id, owner_name, address) then a natural join of Dogs and Owners consists of all tuples from Dogs merged with tuples from Owners that have the same value on the columns with the same name. In our case this column is owner_id and we get a relation with all information about a dog and its owner in one row. A notation for a natural join is Dogs⋈Owners and it has columns dog_id, dog_name, owner_id, owner_name, address (a common column is not repeated). But be careful, if instead of dog_name and owner_name we had in both tables a column name, then a natural join would give us only dogs and their owners that have the same name! Usually this is not what we want.

### How Do You Decompose a Relation?

Now, let’s go back to the subject at hand. Our working example will be the relation LazyAuthors.

Author Title Year
John Doe The story of my life 2011
John Doe The story of my uncle's life 2012
Mary Major Quantum gravity for dummies 2012

Let us assume that no author publishes two books in the same year (they are indeed lazy authors) and that no two authors publish books under the same title. We allow that there are two editions of the same book in different years. We can write these dependencies as:

author, yeartitle and titleauthor.

What happens when you decompose a relation? Let’s decompose LazyAuthors into two relations, AuthorYear and TitleYear. We choose for the first relation the attributes author and year and perform a projection of LazyAuthors onto these columns. Thus, AuthorYear may be understood as the result of the following query: `select author, year from LazyAuthors`. Similarly, TitleYear may be understood as the result of: `select title, year from LazyAuthors`. And here we have our new relations!

Author Year   Title AuthorYear TitleYear John Doe 2011 The story of my life 2011 John Doe 2012 The story of my uncle's life 2012 Mary Major 2012 Quantum gravity for dummies 2012

You may often meet a notation for projection with π and the set of attributes on which we project. Then, we could write AuthorYear as π{author, year}(LazyAuthors) and TitleYear could be written as
π{title, year}(LazyAuthors). In what follows I will use this notation.

### Why can’t I just split my relation into two?

Splitting is just one side of a normalization process. Once you split an immediate question arises: how could I retrieve my original relation from the two split ones? After all, you do not want to lose some information by decomposing, do you? A usual answer for this is: use natural join! It is a good answer but it works only if we split our original relation neatly. Do you remember our two relations AuthorYear and TitleYear? If we perform a natural join AuthorYearTitleYear we get something like

Author Title Year
John Doe The story of my life 2011
John Doe The story of my uncle's life 2012
John Doe Quantum gravity for dummies 2012
Mary Major Quantum gravity for dummies 2012
Mary Major The story of my uncle's life 2012

Look, we got some new tuples! Oops! Since two authors wrote a book in the year 2012 they were both joined with two titles by the equality of the year column. What’s worse, if we only have relations AuthorYear and TitleYear there is no way to reconstruct the original LazyAuthors. We will never know who wrote Quantum gravity for dummies.

Thus, if I decompose a relation R as πX(R) and πY(R), how could I secure the equality R = πX(R)⋈πY(R)? It would be enough that the intersection A of X and Y, A = X ∩ Y, implies all attributes in Y (or that A implies values of all attributes in X, for that matter). If it holds then any tuple from R generates exactly one tuple in πY(R) and any tuple in πX(R) will be joined with exactly one tuple from πY(R). This suffices for the equality to hold.

What does it mean for our example? Let's decompose LazyAuthors into TitleYear and AuthorTitle (remember TitleAuthor!). Now, after joining we get LazyAuthors again. Hurray! So, is such a decomposition correct? Not so easy.

### Lazy Authors Shall Never Be Busy!

After we decomposed a relation we work only with the relations we obtained, the original one is lost and may be reconstructed only with a join. Problems may arise when AuthorYear and TitleYear change over time. Assume that we add to TitleYear a tuple (The story of my aunt’s life, 2011). This is legal since there are no constraints on TitleYear. After the join we get something like this.

Author Title Year
John Doe The story of my life 2011
John Doe The story of my aunt's life 2011
John Doe The story of my uncle's life 2012
Mary Major Quantum gravity for dummies 2012

Our LazyAuthors are no longer lazy! But why did this happen? After decomposing we lost one functional dependency. The attributes of author, year → title were split into two tables and we lost this constraint in our database. A lesson to remember is that when we decompose a relation, we should think about all possible data our database may acquire. Therefore, we distribute functional dependencies of our original table as well. If we lost a functional dependency then it is possible that sometime in the future we insert data to decomposed tables that will not satisfy our original constraints after a join.

### Two Conclusions

We ended with two rules. When decomposing a relation R into πx(R) and πy(R), with A= X ∩Y, we should have:

1. A implies all attributes in X or in Y
2. All functional dependencies of R may be reconstructed from functional dependencies of πX(R) and πY(R)

Now, if all constraints of R are functional dependencies and if you meet the above requirements then you will have R=πX(R)⋈πY(R) (point 1) and you will always get a legal relation R after a join (point 2).

Of course, there are more constraints than functional ones and, of course, there is a whole theory behind the second point but this is already a long post. At last, let me mention that the functional dependencies of our LazyAuthors are based on an example by Beeri and Bernstein from 1979[1]. With these examples they showed that you cannot always get a Boyce-Codd normal form. But this is a topic for another story.

1 Beeri, Catriel and Bernstein, Philip A. Computational problems related to the design of normal form relational schemas. ACM Transactions on Database Systems 4(1979), p. 30-59.