Keys are one of the most vital features in a relational database. Among the many types of keys, the primary key is widely used and appreciated. Let’s find out what kinds of primary keys are available, what a primary key does, and how to use it in a table.
In this article, you’ll learn all about primary keys in relational databases. The topics we’ll cover include:
- An overview of relational database keys.
- Primary key types: composite, natural, and surrogate keys.
- Why primary keys are so important.
- Primary key syntax and usage in SQL.
- Typical primary key use cases.
- How to set primary keys in Vertabelo.
We’ll include a lot of examples so you can see for yourself how to use this information. Let’s get started!
Keys in a Relational Database
Keys play a crucial role in relational databases; in different ways, the different types of keys allow for unique row identification in a table.
What Do Database Keys Do?
In a database, keys:
- Identify a particular row in a table, based on the data in one or more columns.
- Establish the relationship between two given tables using the foreign key concept (i.e. the values in one column match the values of a column in a different table, which allows us to make connections between the data in both tables).
- Enforce the identity and integrity of the database.
Keys do other things as well; for example, most databases back a key with an index, which speeds up searching and sorting on the key columns.
Types of Database Keys
Let’s look at some common keys available in relational databases:
- A primary key is a column that uniquely identifies each row in a table.
- A composite primary key is a set of columns that together uniquely identify each row in a table. The column values themselves may not be unique (i.e. they can appear more than once in the same table), but the combination of their values must always be unique.
- A super key is any column or set of columns that uniquely identifies each row in a table. It may include columns that are not needed for uniqueness, so the same column can appear in several different super keys, as shown in the chart below. (See this article to learn the differences between super keys and primary keys.)
- A candidate key is a minimal super key: a column or set of columns that uniquely identifies each row and from which no column can be removed without losing uniqueness. The primary key is selected from the candidate keys.
- An alternate key is a candidate key that was not chosen as the primary key. It can be used as an alternative to the primary key.
- A surrogate key uniquely identifies each row in a table; this column is created specifically for the purpose of unique row identification.
- A natural key is a column or a set of columns that uniquely identifies each row in a table. Unlike a surrogate key, it is part of the data set.
- A foreign key creates a link between two tables and helps to maintain data integrity. You can learn more about foreign keys here.
You’ll find more information about database keys in this article.
Visual aids are always helpful. Let’s see the summary of the key types below. For that, we’ll use a Person table that consists of six columns. The Id, SSN, PhoneNo, and Email columns of this table are assumed to store only unique values.
Key type | Id | FirstName | LastName | SSN | PhoneNo | Email |
---|---|---|---|---|---|---|
primary key | x | | | | | |
candidate keys | x | | | x | x | x |
alternate keys | | | | x | x | x |
surrogate key | x | | | | | |
natural key | | | | x | | |
composite key | x | x | x | | | |
super key | | x | x | | | x |
super key | | x | x | | x | |
We assume that the Id column is the primary key of the Person table. The Id, SSN, PhoneNo, and Email columns could each play the role of the primary key. Hence, these columns are the candidate keys. The SSN, PhoneNo, and Email columns, which were not chosen as the primary key, are the alternate keys.
The Id column is a surrogate key. This column does not store any real data. It is present in the Person table only to uniquely identify each row. On the other hand, the SSN column is a natural key. It stores data and uniquely identifies each row at the same time.
A composite key is any combination of two or more columns that together uniquely identify each row. Here, the Id, FirstName, and LastName columns could be the table’s composite key. The columns that comprise the composite key can store duplicate values, but the combination of these columns must be unique for each row.
A super key is any combination of columns that allows for unique row identification. In the example above, the combination of the FirstName, LastName, and Email columns and the combination of the FirstName, LastName, and PhoneNo columns are both super keys. Unlike candidate keys, super keys may contain columns that are not needed for uniqueness, so the same columns (here, FirstName and LastName) can appear in several super keys. Check out this article to get a better understanding of candidate keys.
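To make the difference between super keys and candidate keys concrete, here is a minimal Python sketch. The sample rows and the helper names is_super_key and is_candidate_key are made up for illustration; they are not part of any library. Keep in mind that a key is a rule about all the data a table may ever hold, so a check on sample rows like this one can only disprove a key, never prove it.

```python
from itertools import combinations

# Hypothetical sample rows for the Person table; all values are made up.
COLUMNS = ("Id", "FirstName", "LastName", "SSN", "PhoneNo", "Email")
ROWS = [
    (1, "Grace", "Anderson", "111-11-1111", "555-0101", "grace.a@email.com"),
    (2, "John", "Tyler", "222-22-2222", "555-0102", "j.tyler@email.com"),
    (3, "John", "Black", "333-33-3333", "555-0103", "john.black@email.com"),
    (4, "Bob", "Black", "444-44-4444", "555-0104", "bob.black@email.com"),
]


def is_super_key(columns):
    """True if no two sample rows share the same values in the given columns."""
    positions = [COLUMNS.index(name) for name in columns]
    projected = [tuple(row[i] for i in positions) for row in ROWS]
    return len(set(projected)) == len(projected)


def is_candidate_key(columns):
    """True if the columns form a super key and no smaller subset of them does."""
    if not is_super_key(columns):
        return False
    smaller_subsets = (
        subset
        for size in range(1, len(columns))
        for subset in combinations(columns, size)
    )
    return not any(is_super_key(subset) for subset in smaller_subsets)


print(is_super_key(["FirstName"]))                           # False: "John" repeats
print(is_super_key(["FirstName", "LastName", "Email"]))      # True
print(is_candidate_key(["FirstName", "LastName", "Email"]))  # False: Email alone is enough
print(is_candidate_key(["SSN"]))                             # True
```

The third call shows why a super key is not automatically a candidate key: the combination identifies every row, but it is not minimal because Email already does the job on its own.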
Primary Key Basics
The primary key is the column or set of columns that uniquely identifies each row in a table. It’s also the one used to establish foreign keys and thus create relationships between tables.
Primary Key Rules
There is a set of rules for defining the primary key:
- Each primary key column value must be unique; it cannot appear more than once in the column.
- NULLs are not permitted in a primary key column (i.e. it is non-nullable).
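- Updating a value in a primary key column requires care, because foreign keys in other tables may reference it. Depending on how the foreign key is defined, the database either rejects such an update or propagates it to the referencing rows (e.g. with ON UPDATE CASCADE).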
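The uniqueness and non-null rules above can be tried out with a small sketch. It uses Python’s built-in sqlite3 module and an in-memory database, which is an assumption made only so the example runs anywhere; the table and the helper try_insert are hypothetical. One SQLite quirk matters here: for historical reasons SQLite accepts NULLs in a primary key unless the column is also declared NOT NULL, so the sketch spells that out. Most other databases imply it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """
    CREATE TABLE Product (
        ProductId   INT NOT NULL PRIMARY KEY,  -- NOT NULL spelled out for SQLite
        ProductName VARCHAR(50)
    )
    """
)
conn.execute("INSERT INTO Product VALUES (1, 'Keyboard')")


def try_insert(product_id, name):
    """Attempt an insert; return 'ok' or the rule that blocked it."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?)", (product_id, name))
        return "ok"
    except sqlite3.IntegrityError as error:
        return "not null" if "NOT NULL" in str(error) else "unique"


duplicate_result = try_insert(1, "Mouse")   # ProductId 1 already exists
null_result = try_insert(None, "Monitor")   # NULL is not allowed in the key
fresh_result = try_insert(2, "Mouse")       # a new, non-null value is fine
print(duplicate_result, null_result, fresh_result)
```

Only the last insert succeeds, so the table ends up with two rows.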
Primary Key Syntax
A primary key is usually defined during table creation. Let’s look at the basic syntax for defining a primary key in SQL.
CREATE TABLE Product (
    ProductId INT PRIMARY KEY,
    ProductName VARCHAR(50),
    QuantityInStock INT
);
The ProductId column is the primary key of the Product table. It is also a surrogate key because it doesn’t store any data from the data set. Its only role is the unique identification of each row.
The primary key can also be defined after the table is created. In this case, we use the ALTER TABLE statement with ADD CONSTRAINT:
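CREATE TABLE Product (
    -- NOT NULL: some databases (e.g. SQL Server) require it before a primary key can be added
    ProductId INT NOT NULL,
    ProductName VARCHAR(50),
    QuantityInStock INT
);

ALTER TABLE Product
ADD CONSTRAINT Product_PK PRIMARY KEY (ProductId);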
We first created the Product table without defining its primary key column. After that, the ALTER TABLE statement is used to add the primary key constraint.
Composite Primary Keys
A composite primary key consists of more than one column. Its syntax is:
CREATE TABLE OrderProduct (
    OrderId INT,
    ProductId INT,
    Quantity INT,
    TotalPrice MONEY,
    PRIMARY KEY (OrderId, ProductId)
);
The OrderId and ProductId columns together make up the composite primary key for the OrderProduct table.
A composite primary key can also be defined after the table is created. It too uses the ALTER TABLE statement:
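CREATE TABLE OrderProduct (
    -- NOT NULL: some databases (e.g. SQL Server) require it before a primary key can be added
    OrderId INT NOT NULL,
    ProductId INT NOT NULL,
    Quantity INT,
    TotalPrice MONEY
);

ALTER TABLE OrderProduct
ADD CONSTRAINT OrderProduct_PK PRIMARY KEY (OrderId, ProductId);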
We first created the OrderProduct table without defining its primary key. Then we used the ALTER TABLE statement to add the composite primary key constraint.
Natural and Surrogate Primary Keys
A natural primary key stores relevant data; it’s part of the data set. At the same time, it fulfills the conditions for being the primary key column of that table.
Let’s look at an example using the Customer table:
FirstName | LastName | Email |
---|---|---|
Grace | Anderson | grace.a@email.com |
John | Tyler | j.tyler@email.com |
Bob | Black | bob.black@email.com |
Assuming that each customer has a unique email address, the Email column is the natural primary key for the Customer table.
The surrogate primary key is quite the opposite; it doesn’t store any data that belongs to the data set. It is added only for the purpose of unique row identification.
Let’s recreate the Customer table with a surrogate key:
Id | FirstName | LastName |
---|---|---|
1 | Grace | Anderson |
2 | John | Tyler |
3 | Bob | Black |
The Id column is the surrogate primary key of the Customer table. You can find more insights on surrogate keys in this article. If you have trouble deciding between using a natural and a surrogate primary key, check out this article.
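Here is a small sketch of how the two approaches can live side by side. It assumes SQLite (through Python’s built-in sqlite3 module), where a column declared INTEGER PRIMARY KEY is filled in by the database when no value is supplied; other databases use their own mechanisms, such as identity columns or sequences. The column sizes and the choice to keep Email as a NOT NULL UNIQUE column are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Customer (
        Id        INTEGER PRIMARY KEY,          -- surrogate key, generated by SQLite
        FirstName VARCHAR(50),
        LastName  VARCHAR(50),
        Email     VARCHAR(100) NOT NULL UNIQUE  -- natural key kept as an alternate key
    )
    """
)
customers = [
    ("Grace", "Anderson", "grace.a@email.com"),
    ("John", "Tyler", "j.tyler@email.com"),
    ("Bob", "Black", "bob.black@email.com"),
]
# No Id is supplied; the database assigns 1, 2, 3 in insertion order.
conn.executemany(
    "INSERT INTO Customer (FirstName, LastName, Email) VALUES (?, ?, ?)",
    customers,
)
rows = conn.execute(
    "SELECT Id, FirstName, Email FROM Customer ORDER BY Id"
).fetchall()
for row in rows:
    print(row)
```

With this design, other tables can reference the compact Id column, while the UNIQUE constraint still prevents two customers from sharing an email address.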
Primary Key Examples
It is always good to practice with examples, so let’s examine some using the Vertabelo database modeler.
How to Add a Primary Key in Vertabelo
There are two ways you can add a primary key in Vertabelo:
- Check the PK checkbox next to the column definition, as shown below:
- Or you could add a column (or a set of columns) in the Primary key section:
When looking at an ER diagram, you’ll notice the PK sign next to each primary key column.
Primary Key Example
Let’s look at another example where we’ll use multiple primary keys and foreign keys:
Let’s discuss the ER diagram presented above. The CustomerId column is the primary key for the Customer table and a foreign key in the Order table. The OrderId column is the primary key for the Order table. It is also a foreign key and part of a composite primary key in the OrderProduct table. The other half of this composite primary key is the ProductId column, which is a foreign key referencing the Product table. The ProductId column is the primary key of the Product table.
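The relationships described above could be written out roughly as follows. This is only a sketch: it includes just the key columns named in the text, the column types are assumptions, and it runs on SQLite through Python’s built-in sqlite3 module. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and that Order is a reserved word, so the table name is quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY
    );
    CREATE TABLE Product (
        ProductId INTEGER PRIMARY KEY
    );
    -- Order is a reserved word, so the table name is quoted.
    CREATE TABLE "Order" (
        OrderId    INTEGER PRIMARY KEY,
        CustomerId INTEGER NOT NULL REFERENCES Customer (CustomerId)
    );
    CREATE TABLE OrderProduct (
        OrderId   INTEGER NOT NULL REFERENCES "Order" (OrderId),
        ProductId INTEGER NOT NULL REFERENCES Product (ProductId),
        PRIMARY KEY (OrderId, ProductId)
    );
    INSERT INTO Customer VALUES (1);
    INSERT INTO Product VALUES (10);
    INSERT INTO "Order" VALUES (100, 1);
    INSERT INTO OrderProduct VALUES (100, 10);
    """
)

try:
    # Customer 99 does not exist, so the foreign key rejects this order.
    conn.execute('INSERT INTO "Order" VALUES (101, 99)')
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Because every foreign key points at a primary key, the database can guarantee that each order belongs to an existing customer and each order line to an existing order and product.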
Composite Primary Key Example
Let’s take a closer look at a composite primary key.
The CourseEnrollment table has a composite primary key. Its primary key consists of the CourseId and StudentId columns. It means that the combination of the CourseId and StudentId columns must be unique for each row. However, both the CourseId and the StudentId columns can contain duplicate values.
CourseId | StudentId |
---|---|
123 | 111111 |
123 | 222222 |
123 | 333333 |
456 | 111111 |
456 | 222222 |
678 | 111111 |
If you think about it, you’ll realize that each student can take many different courses. Hence, the values of the StudentId column can repeat. Also, each course can be attended by many students, so the values of the CourseId column can repeat as well. However, one student can attend a given course only once. Thus, the combination of the CourseId and StudentId columns must be unique. And that is shown in the table above.
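You can check this behavior yourself with a short sketch. It loads the rows from the table above into an in-memory SQLite database (an assumption made so the example is self-contained; the INT column types are also assumed) and then tries to enroll a student in a course they already attend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE CourseEnrollment (
        CourseId  INT NOT NULL,
        StudentId INT NOT NULL,
        PRIMARY KEY (CourseId, StudentId)
    )
    """
)
enrollments = [
    (123, 111111), (123, 222222), (123, 333333),
    (456, 111111), (456, 222222),
    (678, 111111),
]
# Repeated CourseId and StudentId values are fine as long as each pair is new.
conn.executemany("INSERT INTO CourseEnrollment VALUES (?, ?)", enrollments)

try:
    # Student 111111 is already enrolled in course 123.
    conn.execute("INSERT INTO CourseEnrollment VALUES (123, 111111)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM CourseEnrollment").fetchone()[0]
print(duplicate_rejected, row_count)
```

All six original rows are accepted, and only the repeated pair is refused.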
Learn More About Primary Keys
Now you know how crucial the primary key is. It improves the database’s search, sort, and query processes. And all of that is based on the simple idea of unique row identification.
When choosing a table's primary key column, keep in mind that all its values must be unique – unless you choose to use a composite primary key, in which case the combination of values must be unique.
Another important point to keep in mind when deciding on a primary key is that its values cannot be null. Check out our article on How to Choose a Good Primary Key to learn more.
Go ahead and do some more practice on your own. Good luck!