Today we continue our series of posts on data normalization. In the previous post on data normalization I explained what functional dependency is. Today we will talk about **candidate keys** in a table.

A candidate key is a set of columns such that **all other columns in the table are dependent** on it, and the set is **minimal**, that is if you remove a column, then the resulting set is not a candidate key.

### Example: Table Person

Remember the

person seen in the previous post and in our videos. We’ve already discovered that all columns depend on the column **table**`SSN`

. The set {SSN} has only one element. The set can’t be minimized – no columns depend on the empty set.

Thus `SSN`

is the candidate key in the table

.**person**

### Example: Table Students

Another example we have already seen. The nontrivial functional dependencies in the table are:

student → semester student, lecture → teaching_assistant

All columns depend on the set {student, lecture}. The `semester`

column depends on the column `student`

, so it depends also on the set {student, lecture}. The `teaching_assistant`

column depends on the set {student, lecture} because of the second dependency.

The set {student, lecture} is a candidate key: it is a minimal set of columns on which all columns depend. You can’t remove the column `student`

from the candidate key: the `semester`

column does not depend on the `lecture`

column. Likewise, you can’t remove the `lecture`

column from the candidate key: the `teaching_assistant`

column does not depend on the `student`

column only.

Thus the pair {student, lecture} is the candidate key in the table

.**students**

### Primary Keys

The might be **more than one candidate key** in a table. If that’s the case you choose one of the candidate keys as primary. The database management systems use primary keys to store and manage data more effectively.

### Example: Person With Passport

Let’s take a look at the table

. We’ve added the **person2**`passport`

column to the table `person`

. The passport number determines personal data of a person in the same way as the `SSN`

column. The table

has two candidate keys: **person2**`SSN`

and `passport`

. You can choose either column as primary key.

The set {SSN, passport} is NOT a candidate key. You can remove either column and you still have all other columns functionally dependent on a candidate key.

### Example: Student With Identity Card

We’ve modified the table

by adding student identity card number. Now the **students**`semester`

column depends both on the `student`

column and on the `student_id_number`

column.

student → semester student_id_number → semester

The table

has two candidate keys: {student, lecture} and {student_id_number, lecture}. Any of them might be chosen as primary key.**students2**

### Superkeys

A set which is a superset of a candidate key (contains all columns in a key and more) is called a superkey. We’ll need to know what a superkey is when we talk about normalization.

### Keys in Databases

In real-world databases you have to explicitly mark a set of columns as a key, either primary or alternate. The database uses the knowledge of primary keys to efficiently manage data. The set of columns might be a key (in the theoretical sense explained in this article) even if you don’t mark it as such. But then your database has no way of knowing that it is a key.