Today we continue our series of posts on data normalization. In the previous post on data normalization I explained what functional dependency is. Today we will talk about candidate keys in a table.
A candidate key is a set of columns such that all other columns in the table are dependent on it, and the set is minimal, that is if you remove a column, then the resulting set is not a candidate key.
Example: Table Person
Remember the table
person seen in the previous post and in our videos. We’ve already discovered that all columns depend on the column SSN
. The set {SSN} has only one element. The set can’t be minimized – no columns depend on the empty set.
Thus SSN
is the candidate key in the table person
.
Example: Table Students
Another example we have already seen. The nontrivial functional dependencies in the table are:
student → semester student, lecture → teaching_assistant
All columns depend on the set {student, lecture}. The semester
column depends on the column student
, so it depends also on the set {student, lecture}. The teaching_assistant
column depends on the set {student, lecture} because of the second dependency.
The set {student, lecture} is a candidate key: it is a minimal set of columns on which all columns depend. You can’t remove the column student
from the candidate key: the semester
column does not depend on the lecture
column. Likewise, you can’t remove the lecture
column from the candidate key: the teaching_assistant
column does not depend on the student
column only.
Thus the pair {student, lecture} is the candidate key in the table students
.
Primary Keys
The might be more than one candidate key in a table. If that’s the case you choose one of the candidate keys as primary. The database management systems use primary keys to store and manage data more effectively.
Example: Person With Passport
Let’s take a look at the table person2
. We’ve added the passport
column to the table person
. The passport number determines personal data of a person in the same way as the SSN
column. The table person2
has two candidate keys: SSN
and passport
. You can choose either column as primary key.
The set {SSN, passport} is NOT a candidate key. You can remove either column and you still have all other columns functionally dependent on a candidate key.
Example: Student With Identity Card
We’ve modified the table students
by adding student identity card number. Now the semester
column depends both on the student
column and on the student_id_number
column.
student → semester student_id_number → semester
The table students2
has two candidate keys: {student, lecture} and {student_id_number, lecture}. Any of them might be chosen as primary key.
Superkeys
A set which is a superset of a candidate key (contains all columns in a key and more) is called a superkey. We’ll need to know what a superkey is when we talk about normalization.
Keys in Databases
In real-world databases you have to explicitly mark a set of columns as a key, either primary or alternate. The database uses the knowledge of primary keys to efficiently manage data. The set of columns might be a key (in the theoretical sense explained in this article) even if you don’t mark it as such. But then your database has no way of knowing that it is a key.