The First Normal Form (1NF) is exceptional. The other normal forms (2NF, 3NF, BCNF) are defined in terms of functional dependencies, whereas 1NF has nothing to do with them. Moreover, the other normal forms have precise definitions, while there is no generally accepted definition of 1NF.
Does 1NF Equate to “Atomic Attributes”?
When you look at various descriptions of 1NF, the word you see most often is atomic. It is common to say that a relation is in 1NF if all its attributes are atomic. A good, theory-oriented book by C.J. Date (Database Design and Relational Theory: Normal Forms and All That Jazz) presents four eminent examples of such definitions in Exercise 4.16. Let’s take a look!
- First normal form (1NF) … states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute … 1NF disallows having a set of values, a tuple of values, or a combination of both as an attribute value for a single tuple … 1NF disallows “relations within relations” or “relations as attribute values within tuples” … the only attribute values permitted by 1NF are single atomic (or indivisible) values (Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems, 4th edition, Addison-Wesley, 2004)
- A relation is in first normal form if every field contains only atomic values, that is, no lists or sets (Raghu Ramakrishnan and Johannes Gehrke, Database Management Systems, 3rd edition, McGraw-Hill, 2003)
- First normal form is simply the condition that every component of every tuple is an atomic value (Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom, Database Systems: The Complete Book, Prentice Hall, 2002)
- A domain is atomic if elements of the domain are considered to be indivisible units … we say that a relation schema R is in first normal form (1NF) if the domains of all attributes of R are atomic (Abraham Silberschatz, Henry F. Korth, and S. Sudarshan, Database System Concepts, 4th edition, McGraw-Hill, 2002)
“Atomic” means indivisible, but in computer science only a single bit is truly indivisible. Certainly, we do not want to split the attributes in our relations until they contain only 0s and 1s. So what do people have in mind when they demand atomicity? In my opinion, the intuition behind these definitions is that you should rarely need to extract information from within the value of an attribute. This intuition is consistent with Codd’s own writing, which also uses the a-word. It also explains why one cannot decide on theoretical grounds alone whether a relation is in 1NF.
Let us look at an example. If you need a phone number only to call your clients, then the whole number is an atomic attribute for you. But if you run a database for a phone company in North America, you also care about splitting the phone number into a three-digit area code, a three-digit central office code and a four-digit station number. So what is really atomic depends on how you plan to use your data. And, of course, there are cases in which you want to decompose a value that you usually consider atomic. Imagine that you want to anonymize the persons in your database and present only their initials. That is certainly a legitimate decomposition of the otherwise atomic first_name and last_name attributes.
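To make the point concrete, here is a minimal Python sketch of both decompositions. The function and class names (split_nanp, NanpNumber, initials) are my own illustrative choices, and the sketch assumes a ten-digit North American number stored without a country code.

```python
import re
from typing import NamedTuple


class NanpNumber(NamedTuple):
    """Hypothetical container for the three parts of a ten-digit number."""
    area_code: str       # three digits
    central_office: str  # three digits
    station: str         # four digits


def split_nanp(phone: str) -> NanpNumber:
    """Split a ten-digit North American number into its three parts."""
    digits = re.sub(r"\D", "", phone)  # drop spaces, dashes, parentheses
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return NanpNumber(digits[:3], digits[3:6], digits[6:])


def initials(first_name: str, last_name: str) -> str:
    """Anonymize a person by keeping only the first letter of each name."""
    return f"{first_name[0]}.{last_name[0]}."
```

For a caller who only dials the number, the stored string is used as a whole and split_nanp is never needed. For the phone company, the same string is routinely taken apart, so it is no longer atomic in any practical sense.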
Thus, in my opinion, it is the habitual use of data that makes attributes atomic, not theory. No wonder there is so much confusion in the theory about what 1NF should be.
Codd’s Original Definition: No Relational Attributes
If we go back to 1970, E. F. Codd’s original definition may shed some light on the problem with 1NF. Codd writes:
So, Codd himself uses the a-word, but by nonatomic values he means just relations. Later, he states that relational domains can be eliminated, and this is where he decomposes relations into what he calls normal form. This is our first normal form; at that time there were no other normal forms.
Then, Codd presents an example of a relation that is not in normal form. Simplifying Codd’s example, I describe two relations: employee(man#, name, birthdate, children) and children(childname, birthdate). The primary keys are man# for employee and childname for children. The relation employee has an attribute whose domain is another relation, children. This is not a simple domain for Codd (and certainly it is not atomic). Then, Codd proceeds with a normalization procedure. He creates a separate relation children(man#, childname, birthdate) and removes the attribute children from employee. As you can see, children now has a new component in its primary key. In his description Codd doesn’t mention the term “foreign key”, but this is exactly what he uses.
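The normalization step can be sketched in a few lines of Python. This is only an illustration of the idea, not Codd’s own notation: relations are modeled as lists of dictionaries, and the sample rows and the function name normalize are made up for the example.

```python
# Unnormalized: each employee tuple carries a nested "children" relation.
# All rows below are invented sample data.
employee_nested = [
    {"man#": 1, "name": "Ann", "birthdate": "1960-02-11",
     "children": [
         {"childname": "Bob", "birthdate": "1990-05-01"},
         {"childname": "Eve", "birthdate": "1993-09-17"},
     ]},
    {"man#": 2, "name": "Dan", "birthdate": "1971-07-30", "children": []},
]


def normalize(nested):
    """Return (employee, children) with the nested relation pulled out."""
    employee, children = [], []
    for emp in nested:
        # employee keeps only its simple-domain attributes
        employee.append({k: v for k, v in emp.items() if k != "children"})
        for child in emp["children"]:
            # the parent's key man# is copied into every child tuple
            children.append({"man#": emp["man#"], **child})
    return employee, children


employee, children = normalize(employee_nested)
```

After the call, employee has no relation-valued attribute left, and every tuple in children is identified by the pair (man#, childname), which is exactly the enlarged primary key described above.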
What Could 1NF Be?
In my opinion, Codd’s original intuition is simple and sufficiently precise. A relation is in 1NF if none of its attributes has a relational domain. Do we want our relations to obey this definition? Well, imagine that you need to perform a join over an attribute whose values are relations. Such a query would be simple to write but hard to compute. So, there is a good reason to keep relations normalized. Of course, we can write a corresponding SQL query over relations in 1NF, but in that case the complexity of the query text better matches the cost of evaluating it.
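As a rough illustration of the normalized case, the following sketch uses Python’s built-in sqlite3 module with an in-memory database. The column name man_no stands in for Codd’s man#, which is not a convenient SQL identifier, and the rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only on request
conn.executescript("""
    CREATE TABLE employee (
        man_no    INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        birthdate TEXT
    );
    CREATE TABLE children (
        man_no    INTEGER NOT NULL REFERENCES employee(man_no),
        childname TEXT NOT NULL,
        birthdate TEXT,
        PRIMARY KEY (man_no, childname)
    );
    INSERT INTO employee VALUES (1, 'Ann', '1960-02-11'), (2, 'Dan', '1971-07-30');
    INSERT INTO children VALUES (1, 'Bob', '1990-05-01'), (1, 'Eve', '1993-09-17');
""")

# The join compares plain scalar keys; no value has to be unpacked first.
rows = conn.execute("""
    SELECT e.name, c.childname
    FROM employee AS e
    JOIN children AS c ON c.man_no = e.man_no
    ORDER BY e.name, c.childname
""").fetchall()
```

Here the join condition is an equality between two integers, so what the query says and what the engine has to do stay close together. With a relation-valued children attribute, the same question would require comparing or unnesting whole relations for every employee tuple.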
My bottom line is as follows. We would be better off if we stuck with Codd’s original intuition and never tried to explain a vague notion of atomicity. All data types, such as NUMERIC, TEXT, DATE or even BLOB, can be taken as atomic. Only sets and relations should not be.