September 9, 2021 - 9 minutes read

What Is a Primary Key?

Martyna is a software developer with a passion for programming, automation, and innovation. Currently working for the global corporate IT consulting sector, she is a new-generation techie who hails from Warsaw University of Technology, Poland. She has certifications in both database administration and Java development. Her interests include creating small automation devices using embedded systems and various electronic components. In her free time, she practices yoga, which helps her to center her thoughts and come up with new ideas.

Keys are one of the most vital features in a relational database. Among the many types of keys, the primary key is widely used and appreciated. Let’s find out what kinds of primary keys are available, what a primary key does, and how to use it in a table.

In this article, you’ll learn all about primary keys in relational databases. The topics we’ll cover include:

An overview of relational database keys.
Primary key types: composite, natural, and surrogate keys.
Why primary keys are so important.
Primary key syntax and usage in SQL.
Typical primary key use cases.
How to set primary keys in Vertabelo.

We’ll include a lot of examples so you can see for yourself how to use this information. Let’s get started!

Keys in a Relational Database

Keys play a crucial role in relational databases; in different ways, the different types of keys allow for unique row identification in a table.

What Do Database Keys Do?

In a database, keys:

Identify a particular row in a table, based on the data in one or more columns.
Establish the relationship between two given tables using the foreign key concept (i.e. the values in one column match the values of a column in a different table, which allows us to make connections between the data in both tables).
Enforce the identity and integrity of the database.

Keys do other things as well; for example, unique row identification accelerates data searching and sorting.
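To make the point about faster searching a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The Book table and its columns are hypothetical, chosen only for illustration, and other database engines report their query plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this illustration.
conn.execute("CREATE TABLE Book (BookId INT PRIMARY KEY, Title VARCHAR(50))")

def query_plan(sql):
    """Return the plan descriptions SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]

# Filtering on the key lets SQLite use the index that backs the primary key.
plan_by_key = query_plan("SELECT Title FROM Book WHERE BookId = 1")

# Filtering on a non-key column forces a scan of the whole table.
plan_by_title = query_plan("SELECT BookId FROM Book WHERE Title = 'SQL'")

print(plan_by_key)
print(plan_by_title)
```

The first plan should mention a search using an index, while the second should report a full scan. The exact wording depends on the SQLite version.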

Types of Database Keys

Let’s look at some common keys available in relational databases:

A primary key is a column that uniquely identifies each row in a table.
A composite primary key is a set of columns that together uniquely identify each row in a table. The column values themselves may not be unique (i.e. they can appear more than once in the same table), but the combination of their values must always be unique.
A super key is any column or set of columns that uniquely identifies each row in a table. A super key can include columns that are not needed for uniqueness, so the same columns can appear in several different super keys, as shown in the chart below. (See this article to learn the differences between super keys and primary keys.)
A candidate key is a minimal super key: a column or set of columns that uniquely identifies each row and from which no column can be removed without losing that uniqueness. The primary key is selected from the candidate keys.
An alternate key is a candidate key that was not chosen as the primary key. It can be used as an alternative to the primary key for identifying rows.
A surrogate key uniquely identifies each row in a table; this column is created specifically for the purpose of unique row identification.
A natural key is a column or a set of columns that uniquely identifies each row in a table. Unlike a surrogate key, it is part of the data set.
A foreign key creates a link between two tables and helps to maintain data integrity. You can learn more about foreign keys here.

You’ll find more information about database keys in this article.

Visual aids are always helpful. Let’s see the summary of the key types below. For that, we’ll use a Person table that consists of six columns. The Id, SSN, PhoneNo, and Email columns of this table are assumed to store only unique values.

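Key type	Id	FirstName	LastName	SSN	PhoneNo	Email
primary key	x	-	-	-	-	-
candidate keys	x	-	-	x	x	x
alternate keys	-	-	-	x	x	x
surrogate key	x	-	-	-	-	-
natural key	-	-	-	x	-	-
composite key	x	x	x	-	-	-
super key	-	x	x	-	-	x
super key	-	x	x	-	x	-

(An x marks the columns that belong to each key.)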

We assume that the Id column is the primary key of the Person table. The Id, SSN, PhoneNo, and Email columns could each play the role of the primary key. Hence, these columns are the candidate keys. Since Id was chosen as the primary key, the SSN, PhoneNo, and Email columns are the alternate keys.

The Id column is a surrogate key. This column does not store any real data. It is present in the Person table only to uniquely identify each row. On the other hand, the SSN column is a natural key. It stores data and uniquely identifies each row at the same time.

A composite key is a key made up of two or more columns whose combined values uniquely identify each row. Here, the Id, FirstName, and LastName columns could be the table’s composite key. The columns that comprise the composite key can store duplicate values, but the combination of these columns must be unique for each row.

A super key is any combination of columns that allows for unique row identification. In the example above, the combination of the FirstName, LastName, and Email columns and the combination of the FirstName, LastName, and PhoneNo columns are both super keys. Unlike candidate keys, super keys may contain columns that are not needed for uniqueness (here, FirstName and LastName), so the same columns can appear in several super keys. Check out this article to get a better understanding of candidate keys.
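If you'd like to experiment with these definitions, the sketch below checks column sets against a handful of rows in plain Python. Only the column names come from the Person table; the sample rows and the helper names (is_super_key, candidate_keys) are made up for illustration. Keep in mind that sample data can only show that a column set is not a key. Whether it really is one depends on the rules of the data, not on a few rows.

```python
from itertools import combinations

COLUMNS = ["Id", "FirstName", "LastName", "SSN", "PhoneNo", "Email"]

# Made-up sample rows; only the column names come from the Person table.
ROWS = [
    (1, "Ann", "Lee", "111-11-1111", "555-0001", "ann@example.com"),
    (2, "Ann", "Ray", "222-22-2222", "555-0002", "ann.ray@example.com"),
    (3, "Tom", "Lee", "333-33-3333", "555-0003", "tom@example.com"),
    (4, "Tom", "Lee", "444-44-4444", "555-0004", "tom.lee@example.com"),
]

def is_super_key(cols):
    """True if no two sample rows share the same values in the given columns."""
    idx = [COLUMNS.index(c) for c in cols]
    projected = [tuple(row[i] for i in idx) for row in ROWS]
    return len(set(projected)) == len(projected)

def candidate_keys():
    """Super keys from which no column can be removed (minimal super keys)."""
    found = []
    for size in range(1, len(COLUMNS) + 1):
        for cols in combinations(COLUMNS, size):
            # Skip sets that contain an already-found, smaller key.
            if any(set(key) <= set(cols) for key in found):
                continue
            if is_super_key(cols):
                found.append(cols)
    return found

print(is_super_key(["FirstName", "LastName"]))           # False
print(is_super_key(["FirstName", "LastName", "Email"]))  # True
print(candidate_keys())
```

With these rows, the candidate keys found are the single columns Id, SSN, PhoneNo, and Email, which matches the chart above.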

Primary Key Basics

The primary key is the column or set of columns that uniquely identifies each row in a table. It’s also the one used to establish foreign keys and thus create relationships between tables.

Primary Key Rules

There is a set of rules for defining the primary key:

Each primary key column value must be unique; it cannot appear more than once in the column.
NULLs are not permitted in a primary key column (i.e. it is non-nullable).
Updating a value in a primary key column needs care, because that value may be referenced by a foreign key in another table; the foreign key’s referential action (for example, ON UPDATE CASCADE) determines what happens to the referencing rows.

Primary Key Syntax

A primary key is usually defined during table creation. Let’s look at the basic syntax for defining a primary key in SQL.

CREATE TABLE Product (
     ProductId INT PRIMARY KEY,
     ProductName VARCHAR(50),
     QuantityInStock INT
);

The ProductId column is the primary key of the Product table. It is also a surrogate key because it doesn’t store any data from the data set. Its only role is the unique identification of each row.
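To see the uniqueness and non-null rules in action against this table, here is a small sketch that runs the definition through Python's built-in sqlite3 module. The choice of SQLite, the try_insert helper, and the sample product names are assumptions made for illustration. Note that SQLite, unlike standard SQL, does not imply NOT NULL for a primary key column declared as INT, so the sketch spells it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is written explicitly because SQLite, unlike standard SQL,
# does not imply it for a non-INTEGER PRIMARY KEY column.
conn.execute("""
    CREATE TABLE Product (
        ProductId INT NOT NULL PRIMARY KEY,
        ProductName VARCHAR(50),
        QuantityInStock INT
    )
""")

def try_insert(product_id, name, quantity):
    """Insert one row; return True on success, False if a constraint fails."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Product VALUES (?, ?, ?)",
                (product_id, name, quantity),
            )
        return True
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
        return False

first = try_insert(1, "Keyboard", 10)     # new key value
duplicate = try_insert(1, "Mouse", 5)     # same ProductId again
missing = try_insert(None, "Monitor", 3)  # NULL key value
print(first, duplicate, missing)
```

Only the first insert should succeed. The second breaks the uniqueness rule and the third breaks the non-null rule.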

The primary key can also be defined after the table is created. In this case, we use the ALTER TABLE statement with ADD CONSTRAINT:

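CREATE TABLE Product (
     -- NOT NULL is declared up front; some engines (e.g. SQL Server)
     -- refuse to add a primary key on a nullable column
     ProductId INT NOT NULL,
     ProductName VARCHAR(50),
     QuantityInStock INT
);
-- Add the primary key as a named constraint
ALTER TABLE Product
     ADD CONSTRAINT Product_PK
     PRIMARY KEY (ProductId);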

We first created the Product table without defining its primary key. After that, we used the ALTER TABLE statement to add the primary key constraint.

Composite Primary Keys

A composite primary key consists of more than one column. Its syntax is:

CREATE TABLE OrderProduct (
     OrderId INT,
     ProductId INT,
     Quantity INT,
     TotalPrice MONEY,
     PRIMARY KEY(OrderId, ProductId)
);

The OrderId and ProductId columns together make up the composite primary key for the OrderProduct table.

A composite primary key can also be defined after the table is created. It too uses the ALTER TABLE statement:

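CREATE TABLE OrderProduct (
     -- Key columns are declared NOT NULL up front; some engines
     -- (e.g. SQL Server) require this before adding a primary key
     OrderId INT NOT NULL,
     ProductId INT NOT NULL,
     Quantity INT,
     TotalPrice MONEY
);
-- Add the composite primary key as a named constraint
ALTER TABLE OrderProduct
     ADD CONSTRAINT OrderProduct_PK
     PRIMARY KEY (OrderId, ProductId);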

We first created the OrderProduct table without defining its primary key. Then we used the ALTER TABLE statement to add the composite primary key constraint.
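Here is a rough sketch of how a composite primary key behaves once data arrives, again using Python's built-in sqlite3 module as an assumed test bed. The inserted rows are made up, and SQLite accepts the MONEY type name without treating it specially.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderProduct (
        OrderId INT NOT NULL,
        ProductId INT NOT NULL,
        Quantity INT,
        TotalPrice MONEY,
        PRIMARY KEY (OrderId, ProductId)
    )
""")

# Made-up rows: OrderId 1 and ProductId 10 each repeat, but every pair is new.
rows = [(1, 10, 2, 19.98), (1, 11, 1, 5.00), (2, 10, 1, 9.99)]
conn.executemany("INSERT INTO OrderProduct VALUES (?, ?, ?, ?)", rows)

try:
    # The pair (1, 10) already exists, so this insert violates the key.
    conn.execute("INSERT INTO OrderProduct VALUES (1, 10, 5, 49.95)")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM OrderProduct").fetchone()[0]
print(pair_rejected, row_count)
```

Repeating an OrderId or a ProductId on its own is accepted. Only a repeated pair is rejected, so the table should still hold three rows.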

Natural and Surrogate Primary Keys

A natural primary key stores relevant data; it’s part of the data set. At the same time, it fulfills the conditions for being the primary key column of that table.

Let’s look at an example using the Customer table:

FirstName	LastName	Email
Grace	Anderson	grace.a@email.com
John	Tyler	j.tyler@email.com
Bob	Black	bob.black@email.com

Assuming that each customer has a unique email address, the Email column is the natural primary key for the Customer table.

The surrogate primary key is quite the opposite; it doesn’t store any data that belongs to the data set. It is added only for the purpose of unique row identification.

Let’s recreate the Customer table with a surrogate key:

Id	FirstName	LastName
1	Grace	Anderson
2	John	Tyler
3	Bob	Black

The Id column is the surrogate primary key of the Customer table. You can find more insights on surrogate keys in this article. If you have trouble deciding between using a natural and a surrogate primary key, check out this article.
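The two versions of the Customer table can be compared side by side with a short sketch. It uses Python's built-in sqlite3 module. The table names CustomerNatural and CustomerSurrogate and the column sizes are hypothetical, while the customer data comes from the tables above. In SQLite, a column declared INTEGER PRIMARY KEY is filled in automatically when it is omitted from an insert; other engines use their own mechanisms, such as identity columns or sequences.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Natural key: the Email column, which is part of the data, is the key.
conn.execute("""
    CREATE TABLE CustomerNatural (
        FirstName VARCHAR(50),
        LastName VARCHAR(50),
        Email VARCHAR(100) NOT NULL PRIMARY KEY
    )
""")

# Surrogate key: Id exists only to identify rows. SQLite assigns a value
# to an INTEGER PRIMARY KEY column when the insert leaves it out.
conn.execute("""
    CREATE TABLE CustomerSurrogate (
        Id INTEGER PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName VARCHAR(50)
    )
""")

customers = [
    ("Grace", "Anderson", "grace.a@email.com"),
    ("John", "Tyler", "j.tyler@email.com"),
    ("Bob", "Black", "bob.black@email.com"),
]
for first, last, email in customers:
    conn.execute(
        "INSERT INTO CustomerNatural VALUES (?, ?, ?)", (first, last, email)
    )
    conn.execute(
        "INSERT INTO CustomerSurrogate (FirstName, LastName) VALUES (?, ?)",
        (first, last),
    )

surrogate_rows = conn.execute(
    "SELECT Id, FirstName, LastName FROM CustomerSurrogate ORDER BY Id"
).fetchall()
print(surrogate_rows)
```

In the natural-key table, a customer is looked up by email address. In the surrogate-key table, the generated Id values identify the rows, and they stay valid even if a customer later changes their email address.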

Primary Key Examples

It is always good to practice with examples, so let’s examine some using the Vertabelo database modeler.

How to Add a Primary Key in Vertabelo

There are two ways you can add a primary key in Vertabelo:

1. Check the PK checkbox next to the column definition.

2. Or add a column (or a set of columns) in the Primary key section.

When looking at an ER diagram, you’ll notice the PK sign next to each primary key column.

Primary Key Example

Let’s look at another example where we’ll use multiple primary keys and foreign keys:

Let’s discuss the ER diagram presented above. The CustomerId column is the primary key of the Customer table and a foreign key in the Order table, where it references Customer. The OrderId column is the primary key of the Order table. It is also a foreign key and part of a composite primary key in the OrderProduct table. The other half of this composite primary key is the ProductId column, which is a foreign key referencing the Product table. The ProductId column is the primary key of the Product table.
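The relationships described here could be written out roughly as follows. This is a sketch under a few assumptions: it uses Python's built-in sqlite3 module, it includes only the key columns named in the text (the diagram's other columns are left out), and the inserted values are made up. SQLite enforces foreign keys only after PRAGMA foreign_keys is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
    CREATE TABLE Customer (
        CustomerId INT NOT NULL PRIMARY KEY
    );
    CREATE TABLE Product (
        ProductId INT NOT NULL PRIMARY KEY
    );
    -- "Order" is quoted because ORDER is a reserved word in SQL.
    CREATE TABLE "Order" (
        OrderId INT NOT NULL PRIMARY KEY,
        CustomerId INT NOT NULL REFERENCES Customer (CustomerId)
    );
    CREATE TABLE OrderProduct (
        OrderId INT NOT NULL REFERENCES "Order" (OrderId),
        ProductId INT NOT NULL REFERENCES Product (ProductId),
        PRIMARY KEY (OrderId, ProductId)
    );
    INSERT INTO Customer VALUES (1);
    INSERT INTO Product VALUES (10);
    INSERT INTO "Order" VALUES (100, 1);
    INSERT INTO OrderProduct VALUES (100, 10);
""")

try:
    # Customer 2 does not exist, so the foreign key rejects this order.
    conn.execute('INSERT INTO "Order" VALUES (101, 2)')
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```

Each foreign key points at the primary key of the table it references, which is why an order for a customer that does not exist is refused.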

Composite Primary Key Example

Let’s take a closer look at a composite primary key.

The CourseEnrollment table has a composite primary key that consists of the CourseId and StudentId columns. This means that the combination of the CourseId and StudentId values must be unique for each row. However, the CourseId and StudentId columns can each contain duplicate values on their own.

CourseId	StudentId
123	111111
123	222222
123	333333
456	111111
456	222222
678	111111

If you think about it, you’ll realize that each student can take many different courses. Hence, the values of the StudentId column can repeat. Also, each course can be attended by many students, so the values of the CourseId column can repeat as well. However, one student can attend a given course only once. Thus, the combination of the CourseId and StudentId columns must be unique. And that is shown in the table above.
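You can verify this reasoning against the rows shown above with a couple of queries. The sketch below loads the same six rows into an in-memory SQLite database through Python's built-in sqlite3 module. The INT column types are an assumption, since the article does not show this table's definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CourseEnrollment (
        CourseId INT NOT NULL,
        StudentId INT NOT NULL,
        PRIMARY KEY (CourseId, StudentId)
    )
""")
# The six rows from the table above.
enrollments = [
    (123, 111111), (123, 222222), (123, 333333),
    (456, 111111), (456, 222222), (678, 111111),
]
conn.executemany("INSERT INTO CourseEnrollment VALUES (?, ?)", enrollments)

# Total number of rows and number of distinct values in each column.
total, courses, students = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT CourseId), COUNT(DISTINCT StudentId)
    FROM CourseEnrollment
""").fetchone()

# Pairs that occur more than once; the primary key guarantees there are none.
repeated_pairs = conn.execute("""
    SELECT CourseId, StudentId
    FROM CourseEnrollment
    GROUP BY CourseId, StudentId
    HAVING COUNT(*) > 1
""").fetchall()

print(total, courses, students, repeated_pairs)
```

Six rows hold only three distinct courses and three distinct students, so both columns repeat, yet no (CourseId, StudentId) pair appears twice.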

Learn More About Primary Keys

Now you know how crucial the primary key is. It improves the database’s search, sort, and query processes. And all of that is based on the simple idea of unique row identification.

When choosing a table's primary key column, keep in mind that all its values must be unique – unless you choose to use a composite primary key, in which case the combination of values must be unique.

Another important point to keep in mind when deciding on a primary key is that its values cannot be null. The primary key column must be unique and must contain non-null values. Check out our article on How to Choose a Good Primary Key to learn more.

Go ahead and do some more practice on your own. Good luck!