A natural key is used to provide simple, easy-to-remember values (or set of values) that are meaningful to the business as an identifier for each row, rather than using business-agnostic, system-generated values as primary keys for database tables.
Before getting into detail about what a natural key is, you might want to read the article “On Keys” to fully understand the concept of keys in a database model and their different types.
Let’s Start With the Basics: What Is a Primary Key?
One of the first things you learn about designing a database is the purpose and the uses of primary keys. They are a special type of unique keys, with a couple of additional considerations:
- The column or columns that are members of a primary key cannot accept NULL values.
- A table can have only one primary key defined (but it may have additional unique keys, as we will discuss later).
You should consider the following criteria in deciding which column or columns to use for the primary key:
- Uniqueness. This is the best known and almost obvious characteristic of primary key The values used should be unique; no other row in the table should have the same value(s).
- Irreducibility. This means that no subset of the columns chosen as the key uniquely identifies a row in the table. If the passport number is unique, we should not use the passport number and gender together as the primary key.
- Simplicity. It is preferable to use as few columns as possible and to have values that are simple to read and remember.
- Stability. This is a critical aspect to be considered. Neither the definition nor the values of the column should change. For example, a cell phone number is something that could uniquely identify a person. However, if a person changes his/her phone number, it will require changes not only in the table storing the person’s information, but on all person-related tables that use the cell phone number as a foreign key.
- If the values used as keys are familiar to the user, it will be easier for him/her to interact with the system.
What Is the Difference Between a Natural (or Business) Key and a Surrogate Key?
When you need to create a primary key, the first thing to consider is whether to use existing, business-related columns (known as natural or business key) or to add an additional column to the table that will store system-generated values as primary key (known as surrogate key).
Surrogate keys have been discussed in another blog, “What Is a Surrogate Key?”. In case you have not yet read it, we can briefly mention that surrogate keys are columns that do not have any business value, used only to ensure the primary key is simple and has unique values. If you are designing a table to store customer information, you might just add a numeric column CustomerID
and use the features available in most RDBMS engines (like identity or sequences) to assign values to it automatically. Every time someone inserts a new row in the Customer
table, a new, unique value for CustomerID
will be automatically generated.
The other alternative, using existing business-related columns as the primary key, will require us to find a set of columns that uniquely identify each row in the table. Sometimes, picking the appropriate column (or columns) is easy. But this is tricky in most cases, and it is possible that the criteria mentioned above may not all be fulfilled.
Before continuing, please take a look at the article “Designing a Database: Should a Primary Key Be Natural or Surrogate?” if you have not read it, and learn more about natural and surrogate keys.
Choosing the Right Columns for a Natural Key
Imagine we are developing the database model for a car rental company and we need to start with the Vehicle
table. We have created a logical diagram in the Vertabelo Data Modeler with the following attributes defined:
At first sight, there are a couple of columns that may be usable as the natural primary key in this model: LicensePlate
and VIN
(also known as the chassis number or the frame number; it is a global unique code to identify individual vehicles as explained in Wikipedia). Now, let’s evaluate them as well as a generic surrogate key against the criteria for a primary key:
Criteria | Surrogate KeyExample: 45734 | License PlateExample: US-NY-KLM4567 | VINExample: 4Y1SL65848Z411439 |
---|---|---|---|
Uniqueness | Always unique. | Usually unique at the state or country level. May need to ensure that state/country are included as a part of the key. | Always unique. |
Irreducibility | Irreducible. | Using country + state + number format guarantees global uniqueness, but simpler format (state + number or just number) may be enough. | Irreducible. |
Simplicity | Simple. | Simple to moderate. | Complex. |
Stability | Never changes. | License plates format may change. | Never changes. |
Familiarity | No business meaning. Normally hidden from end users. | Very easy to remember specific values. Can be recognized at first sight on a vehicle. Has business meaning. | Difficult to remember. |
Reduced business meaning. | Very easy to remember specific values. | Simple to moderate. | Complex. |
Comparing these alternatives, we can see that using the VIN guarantees uniqueness, irreducibility, and stability, but it is much more complex and lacks familiarity (only a few car owners know the VIN
of their cars, and nobody would know the VIN of a rented one!).
Evaluating LicensePlate
as a possible primary key is not as simple as VIN
, since we may need to decide whether to include country or country plus state in them or not. Having them as part of the license plate information ensures global uniqueness at the expense of simplicity and irreducibility (if our company is local and uses only local cars, the number alone would be enough). Although stability is not 100% guaranteed, license plates do not change frequently, so we can consider this a neutral to positive aspect, while its familiarity is an obvious advantage for LicensePlate
compared with VIN
.
Choosing which of the two existing options to use is a matter of understanding the system we are designing and how it will be used. There is no exact rule to apply. If I had to create this model using a natural key, I would use LicensePlate
(with the state and country if needed). For me, familiarity is the most useful advantage of natural keys over surrogate ones, so I would use the natural key that makes the best use of that aspect.
Other Things to Consider
When choosing an existing column or set of columns as a natural primary key, there are additional considerations that may impact the simplicity of the data model, database size, and performance. Here are a few of them:
- Number of columns: The fewer, the better. Since the columns used in the primary key will be added to all the child tables as foreign keys, use as few columns as possible. Having a single
LicensePlate
column instead ofLicensePlateCountry
,LicensePlateState
andLicensePlateNumber
will make it simpler to add the foreign keys in all related tables. - Size: the full license plate format will be between 6 and 14-15 characters (depending on if you use country and state and if you use “-” as separators between them), while the
VIN
is always 17 characters. You can save 2 to 11 characters on every row in both parent and child tables, if you useLicensePlate
instead ofVIN
as the primary key. - Performance: Smaller size also means smaller indexes and less memory and time required to perform JOINs between the parent and child tables.
- Consistency: Be consistent when you design your data model. Whether you prefer surrogate keys or natural keys, use them consistently. That way, it will be easier to understand and navigate for those working on the data model in the future.
Extra Tip: Numeric values use less storage space than character values, so usually there is an advantage in choosing the numeric data type over the character data type. The risk, however, is that you may need to store character values in the future. Business rules and definitions may look immutable at the time you design a system, but business requirements, laws, and regulations evolve all the time, which may force you to change the data type of some columns. Using natural primary keys always risks a change in data type or size if business requirements change, which can have a major impact in the data model. This is one reason why using numeric surrogate keys is so common, since they usually enhance performance and reduce storage requirements.
I Have Created the Primary Key. What Should I Do Next?
Defining and creating the primary key is a critical step in designing each table of our data model. But all the work you have done in selecting the best primary key should not be thrown away even after you select the primary key. All candidate keys should be enforced on the data model to ensure uniqueness is guaranteed and no business rules are violated.
Using Vertabelo Data Modeler to create a physical model, our Vehicle
table would look like this, with LicensePlate
as the natural primary key and VIN
as an additional unique key:
The same applies if you choose a surrogate primary key. We should then create two additional unique keys in the Vehicle
table to implement the unique constraint for both candidate keys, as shown in the image below:
Regardless of if we are using natural or surrogate keys, our system should not accept two vehicles with the same license plate or the same VIN.
What’s Next?
We have reviewed the importance of primary keys in a database model. We have explained the advantages and disadvantages of using natural keys versus using surrogate keys and what to consider when choosing between several candidate keys. We have also seen some examples using the Vertabelo Database Modeler as our database modeling tool.
Now, it is your turn! Use the comments section to tell us about your reaction to the article, ask any additional questions, or share your database modeling questions regarding natural keys.