April 15, 2021 - 6 minutes read

What Is a Candidate Key in Database Design?

Agnieszka is a Chief Content Officer at Vertabelo. Before coming to Vertabelo, she worked as a Java programmer. She has a PhD in mathematics and over 10 years of experience in teaching mathematics and computer science at the University of Warsaw. In her free time, she enjoys reading a good book, going mountain hiking and practicing yoga.

Tags:

A candidate key is an important concept in database normalization. Read on to find out what a candidate key is and how to check if a set of attributes is a candidate key.

The candidate key, also simply called a key, is an important part of database design. It is the theoretical foundation for technical concepts like primary and alternate (unique) keys. Every database designer should be aware of how to identify candidate keys and how to choose the right one for their table.

The concept of the candidate key is taught in all university database courses as part of database normalization theory. The common problems you’ll face when learning about candidate keys are verifying if a given set of attributes is a candidate key and finding all candidate keys for a relation.

Understanding candidate keys is important for understanding the normal forms in database tables. This knowledge will help you remember the rules for the most common normal forms.

In this article, we’ll explain the concept of candidate keys in simple terms. Plus, we’ll show you how to verify if a set of attributes is a candidate key.

Basic database normalization terminology

Before you read about candidate keys, make sure you are familiar with the basic normalization terminology. Let’s briefly review the most important terms.

A relation is the theoretical name for a database table. A relation (table) has a name and consists of named attributes (columns).

A functional dependency in a relation (A -> B) tells you that whenever two rows have the same values for all attributes in set A, they will also have the same values for all attributes in set B.

The closure of a set of attributes is the set of those attributes that can be functionally determined from this set. You can review the algorithm to compute the closure of attributes here.

Superkeys

Informally, a candidate key is a set of attributes that uniquely identify a row.

By definition, a candidate key is a minimal superkey. So, what does this mean? A superkey is an attribute or a set of attributes such that its closure is all attributes in the relation.

Let’s see some examples. Here, we have the CourseEditions table. It stores information about course editions.

Each year, a given course can be taught by a different teacher, with a different price and different limit on spots. We thus have the following functional dependencies:

id -> course, year, teacher, price, spots – the ID determines all other attributes
course, year -> id, teacher, price, spots – the course and year determine the ID, teacher, price, and spots.

CourseEditions

id	course	year	teacher	price	spots
1	Databases	2019	Chris Cape	100	45
2	Mathematics	2019	Daniel Parr	80	34
3	Databases	2020	Jennifer Clock	110	30

What are the superkeys in this table? First, all of the attributes form a superkey, so the set {id, course, year, teacher, price, spots} is a superkey. Remember that the set of all attributes is a superkey in all tables.

Are there any smaller superkeys in this table? Yes, there are. The set {id} is a superkey. We have the functional dependency id -> course, year, teacher, price, spots, and of course, we have the trivial dependency id -> id. Once we have the id, we can determine all of the other attributes from the functional dependencies.

The set {course, year} is also a superkey. We have the functional dependency course, year -> id, teacher, price, spots, and we have the trivial functional dependencies course -> course and year -> year. Once we have course and year, we can determine all of the other attributes from the functional dependencies.

The set {id, course, year, teacher} is also a superkey. We have id, course, and year. So, we can determine all of the other attributes in the table with these three attributes.

On the other hand, the set {teacher} is not a superkey. If we know the teacher, we can’t determine any other attribute other than the teacher. The set {teacher, price} is also not a superkey. Once we have teacher and price, we can’t determine any more attributes.

Minimal Superkeys

Not all superkeys are candidate keys. To be a candidate key, a superkey must be minimal, which means that if you take any attributes out of it, it won't be a superkey anymore. Let’s look at some examples.

The set {id} is a superkey, and it’s minimal. You can’t take attributes out of it, because you’ll then have an empty set, and an empty set is not a superkey. Thus, the set {id} is a candidate key.

The set {course, year} is also a superkey and a candidate key. If you take any of the attributes out of it, the remaining set is no longer a superkey. You need both course and year to determine the other attributes in the set.

However, the set {id, course, year, teacher} is a superkey but not a candidate key. For example, if you remove the attribute teacher, the remaining set is still a superkey. In fact, in this case, you can remove any attribute from {id, course, year, teacher}, and the remaining set will still be a superkey.

Note that a minimal superkey doesn’t mean the superkey with the smallest number of elements. Both {id} and {course, year} are candidate keys even though they have a different number of elements.

Algorithm: Verifying That a Set of Attributes Is a Candidate Key

This is the common database design problem: how do you verify if a set of attributes is a candidate key?

Here’s the algorithm to verify it:

Step 1: Check if the given set is a superkey. Compute the closure of attributes in the set. If the closure is the set of all attributes, the set is a superkey.
Step 2: Check if the superkey is minimal. Remove each attribute, one at a time. If the remaining set is a superkey, the superkey is not minimal, and the set is not a candidate key. If you can’t remove any of the attributes and keep the superkey property, the set is a candidate key.

For example, let’s check if the set {course, year} is indeed a candidate key.

Step 1: Let’s compute the closure of {course, year}. Using the closure algorithm, we conclude that the closure is indeed {id, course, year, teacher, price, spots}. Thus, the set {course, year} is indeed a superkey.
Step 2. Let’s try to remove course from the set. We’re left with the set {year}. There is no functional dependency with just year as the left-hand-side. Thus, the closure of this set is {year}. Similarly, when we remove the attribute year, the closure of the remaining set is {course}. Neither {year} nor {course} are superkeys, so the set {course, year} is a minimal superkey and thus, a candidate key.

If you liked this article, check out other normalization articles on our blog.

If you’re a student taking database classes, make sure to create a free Academic Account in Vertabelo, our online ER diagram drawing tool. It allows you to draw logical and physical ER diagrams directly in your browser.

Vertabelo supports PostgreSQL, SQL Server, Oracle, MySQL, Google BigQuery, Amazon Redshift, and other relational databases. Try it out, and see how easy it is to get started!

Tags:

Superkeys

Minimal Superkeys

Algorithm: Verifying That a Set of Attributes Is a Candidate Key

You may also like

How To Remember the Rules for Database Normal Forms

On Functional Dependencies

How To Find the Closure of a Set of Attributes (With Examples)

Normalization in Relational Databases: First Normal Form (1NF), Second Normal Form (2NF), and Third Normal Form (3NF)

Denormalization: When, Why, and How