Back to articles list
- 3 minutes read

On Keys

Today we continue our series of posts on data normalization. In the previous post on data normalization I explained what functional dependency is. Today we will talk about candidate keys in a table.

A candidate key is a set of columns such that all other columns in the table are dependent on it, and the set is minimal, that is if you remove a column, then the resulting set is not a candidate key.

Example: Table Person




Remember the table person seen in the previous post and in our videos. We’ve already discovered that all columns depend on the column SSN. The set {SSN} has only one element. The set can’t be minimized – no columns depend on the empty set.

Thus SSN is the candidate key in the table person.

Example: Table Students




Another example we have already seen. The nontrivial functional dependencies in the table are:

student → semester
student, lecture → teaching_assistant

All columns depend on the set {student, lecture}. The semester column depends on the column student, so it depends also on the set {student, lecture}. The teaching_assistant column depends on the set {student, lecture} because of the second dependency.

The set {student, lecture} is a candidate key: it is a minimal set of columns on which all columns depend. You can’t remove the column student from the candidate key: the semester column does not depend on the lecture column. Likewise, you can’t remove the lecture column from the candidate key: the teaching_assistant column does not depend on the student column only.

Thus the pair {student, lecture} is the candidate key in the table students.

Primary Keys

The might be more than one candidate key in a table. If that’s the case you choose one of the candidate keys as primary. The database management systems use primary keys to store and manage data more effectively.

Example: Person With Passport




Let’s take a look at the table person2. We’ve added the passport column to the table person. The passport number determines personal data of a person in the same way as the SSN column. The table person2 has two candidate keys: SSN and passport. You can choose either column as primary key.

The set {SSN, passport} is NOT a candidate key. You can remove either column and you still have all other columns functionally dependent on a candidate key.

Example: Student With Identity Card

We’ve modified the table students by adding student identity card number. Now the semester column depends both on the student column and on the student_id_number column.

student → semester
student_id_number → semester



The table students2 has two candidate keys: {student, lecture} and {student_id_number, lecture}. Any of them might be chosen as primary key.

Superkeys

A set which is a superset of a candidate key (contains all columns in a key and more) is called a superkey. We’ll need to know what a superkey is when we talk about normalization.

Keys in Databases

In real-world databases you have to explicitly mark a set of columns as a key, either primary or alternate. The database uses the knowledge of primary keys to efficiently manage data. The set of columns might be a key (in the theoretical sense explained in this article) even if you don’t mark it as such. But then your database has no way of knowing that it is a key.

go to top