With the database design techniques explained here, designing a user authentication data model to support a rock-solid user authentication system is a breeze.
The design of an application should specify mechanisms to prevent unauthorized users from entering it. Such mechanisms are commonly designed in an authentication module. This is like how the design of a house ideally defines the security measures that prevent intruders from entering the building, in addition to defining the functionality of the house.
A blueprint of a house establishes the operating guidelines and constraints on what can and cannot be done for each room of the house (e.g., a car cannot be parked in the dining room). Similarly, a data model defines the constraints and operating guidelines for each table, its data, and the applications that operate on them. In this sense, a data model is a blueprint that describes the internal structure and relationships of a database.
The purpose of a data model is to demonstrate how the objects in the database interact with each other to meet the prescribed business logic. Data modeling is the creation of artifacts that abstractly represent data objects and their interactions in a model. Its benefits include facilitating database migration and properly documenting data models, among many others.
User authentication is a limitation imposed by applications on users attempting to access them. Through authentication, an application can verify that each user attempting to log in is who he/she claims to be and has permission to access and use the application.
To design an authentication module for an application, you need a database schema that supports it. We review the best practices for designing a database schema to support the requirements of a versatile and robust user authentication module. Let’s start by stating those requirements.
Requirements of Any User Authentication Module
To provide effective protection against unauthorized login attempts, a user authentication module must provide the following capabilities as the functional requirements:
- Register new users.
- Send confirmation emails.
- Provide secure options for recovering a forgotten password.
- Protect authentication data from unauthorized access.
- Support authentication via third-party services.
- Define roles and permission sets per role.
Read on and follow the steps to go from a minimalistic authentication module to a full-blown model that includes all the essential features. You can find a detailed description of the various techniques for building models that meet these requirements in these articles: “How to Store Authentication Data in a Database, Part 1,” “Email Confirmation and Password Recovery,” and “Logging in With External Services.”
Building a Database Model for User Authentication
The design of an authentication module needs to include a table that stores the information for validating each user’s login. In its most basic and minimalist form, this table contains the username and the password hash.
Storing the password hash instead of the password itself is the first of the best practices to take into account when designing a user authentication module. Let’s see why.
Storing Password Hashes Instead of Passwords
The database model for user authentication is the last line of defense to protect the stored user access data when all other protection measures fail. You must design the authentication schema for the possibility, however remote, of the user information falling into the hands of malicious people. You cannot afford the risk of a hacker gaining access to your user table and obtaining their passwords.
You may think storing encrypted passwords is enough as a security measure. But encryption functions must be bijective functions; that is, every encryption function must have an inverse function for decrypting the data. This means there is always a risk that an encrypted key can be decrypted. For this reason, passwords should not be stored in the database, not even in encrypted form.
Hashing functions, unlike encryption functions, are not bijective. It is practically impossible to obtain a password from its hash. However, hashing can be used to validate a password: if the password entered by a user renders the same hash as the one stored in the table, the hashing algorithm guarantees the password is the same as the one set by the user. The flowchart below helps illustrate this idea for a standard validation procedure:
The password validation algorithm compares the hash of the password entered by the user with the hash stored in the user table for the given username.
There are numerous hashing functions with varying degrees of reliability. Among the most reliable are SHA-2 and SHA-3. Design the database model for user authentication to support different hashing algorithms. This allows for incorporating new algorithms should a vulnerability be detected in the algorithm you use. When designing a user authentication module, add a field to the authentication data table to indicate the algorithm used to hash the password.
Adding Salt to Passwords
In addition to hashing passwords, it is a good practice to add salt to the hash. A salt is a random string generated when a password is set; it must be stored in the authentication data table along with the password hash. As a random factor but fixed for each user, it intervenes in the hashing function so that the password hash is unique even if a user chooses the same password as another user.
Giving the Table an Appropriate Name
You already know what fields your basic authentication data table should have. Now, you need to give it a name. There is a best practice for this, also: do not use a common name such as “User” or “Users.” Using common names makes your model an easy target for SQL injection attacks if application programmers do not do a good job of preventing such threats.
Use a more specific name such as
usrloginfo, or obfuscate the table name with numbers or other characters to make it impossible for a hacker to guess. This is also applicable to any table with sensitive information that should not fall into the hands of malicious people.
Choosing the Right Primary Key
By now, you know which fields to include in your minimalist user login data table. Now, we choose the primary key.
The login name is unique and cannot be repeated between different users, so it is theoretically possible to use it as the primary key of the table. However (as we see in detail below), while it is convenient to give the user the option to modify his/her login name at will, it is not convenient to modify the values of the primary key of a table.
Ideally, the primary key of the user login data table should be an abstract value internal to the application. The obvious choice, in this case, is a surrogate key.
We are now able to create our minimalistic authentication system database design, which contains a table of users with their respective login names, the password hash, and the salts for the hashing algorithm. This table also includes the ID of the hashing algorithm used, which refers to a lookup table of hashing algorithms.
A minimalistic schema for user authentication via login name and password.
Commonly, user authentication modules request a valid e-mail address as a second authentication factor in addition to the password. For this, you need to design an authentication module with a mechanism to verify the validity of this email address.
The way to do this verification is through a random and unique token, long enough so that it cannot be guessed by brute force. It sends this token to that email address, included in the URL pointing to the verification page of the application. The user must access that URL so that the application can verify that the verification email was received at the declared email address.
Upon receiving the confirmation request from the user, the application extracts the token from the URL and looks for it in the user table. If it finds it in the data of the user account, it can guarantee that the email address associated with that account is valid.
How do we define the data model for this mechanism to work? We can start from the minimalist model defined above and add the necessary elements to it.
To begin, we need to add to the table a field to store the email address, another to store the confirmation token, and another to store the timestamp for the moment the token was generated since it expires after a while. In addition, we need a field that indicates the validation status of the email, which should be a foreign key of a status table. With these additions, our authentication system database design looks like this:
To add email confirmation to our schema, we need to add some fields to the login data table and a table containing the possible validation statuses. The confirmation token field must allow null values since it is cleared once the email address is confirmed.
Any effective user authentication system provides an option to recover a forgotten password. The best practice is to provide the user with a link on the login page to manage password recovery.
This is done by requesting a user ID (usually a username or email address), validating that the user ID corresponds to an active user account, and sending the user an email with a randomly generated token to the previously validated email address.
The recovery token must be short-lived since its mere existence implies a security risk. So, to implement a password recovery mechanism, we have to add two more fields to our authentication data table: one for the recovery token and another to capture the timestamp with the date and time the token was generated so that it can be deleted after a set period.
If we build on the schema for email confirmation, we already have a field to store the user’s email address. We can use the same address to send the password recovery email.
With these fields added, the authentication system database design looks like this:
With a couple more fields, we can add support for password recovery to our schema.
Authentication Through External Providers
Using external services for authentication is becoming increasingly common among web applications. In fact, it makes sense to include it in our list of best practices.
There are several advantages to using services such as Google, Twitter, or Facebook to authenticate users. They include a better experience for the users from not having to choose and remember a new password and delegating the management of the user’s authentication data to an external service.
A typical login window with the option to use Google as an external authentication provider.
If a user chooses to authenticate through an external service, we don’t need to store the information we saw above for password authentication. This means the corresponding fields must allow null values. You may eliminate them if your only authentication option is through external services. However, if these services are only an alternative to password authentication, consider both options.
External authentication services employ a protocol called OAuth (Open Authorization). It provides a unique token that identifies the account on the authentication service as the only thing needed to validate the user. In a nutshell, the mechanism is as follows:
- The user clicks on the login link in the application corresponding to a given authentication service provider.
- The application redirects the user to the credentials entry page of the chosen authentication service.
- The authentication service validates the user’s credentials and, if valid, returns a token that identifies the user on the authentication service.
- The application looks for the identification token in the corresponding table. If it finds it, it obtains the user’s account ID and enables the options allowed by the associated profile (see below for how the available options are determined for each user). If it doesn’t find it, it treats it as new user creation, allowing the user’s personal information to be entered.
The proposed model to support the authentication method using an external service extends the model defined above, adding a couple of fields to the user accounts table and a table to catalog the available authentication services.
By including the fields for referencing an external provider, password management is no longer required. In turn, all the fields we previously defined (except the primary key) are no longer mandatory.
ExternalProviderId field refers to one of the items in the
external_providers table, while the
ExternalProviderToken field stores the identifier assigned to the user by the authentication service.
Additional fields referring to the service may be added to the authentication services table. They may include things like the logo (to be displayed on the application login page) and possibly the URL of the authentication provider’s web service.
User Identity Versus User Account
Unique usernames, phone numbers, and email addresses are considered credentials or authentication factors for user identity. However, we do not want any of these items to be used as the unique identifier of the user’s account. In a well-designed authentication module, credentials are allowed to mutate without changing the personally identifiable information (PII) of the user’s account.
PII is a set of data that, when used alone or with other relevant data, can identify an individual. PII may contain direct identifiers (a passport number, for example) that uniquely identify an individual, and/or quasi-identifiers (e.g., nationality, gender, date of birth) that can be combined to recognize an individual.
The best practice in designing an authentication module keeps the concepts of user account and user identities separate. This separation allows users to link multiple authentication factors to a single user account and change those factors at will.
When applying these concepts to a data model, we are faced with having a table with PII and other tables with the data for authenticating the user. Building on the above model, we have the following:
The separation of the user ID and user account entities makes it possible to better specify the different login options.
Separating the concepts of the user identity and the user account is a necessary step in implementing multi-factor authentication (MFA) systems.
MFA is another best practice for strengthening a user authentication system. It is based on the parallel use of validation elements (factors) of different types. For example, it may be a piece of information the user knows (a password or a set of personal data, such as nationality, year of birth, etc.) used in parallel with a physical element the user possesses (a phone or other device).
Each authentication factor requires the data stored in user tables to validate the information provided by the user at login.
Assignment of Roles and Permissions
The final aspect in designing a user authentication module that follows best practices for a robust and versatile data model is to allow assigning roles and permissions to each user.
To do this, you add three tables to the schema: a roles table, a permissions table, and a table that stores the association between the roles and the permissions. Examples of roles may include:
While examples of permissions may include:
The permissions and roles tables are related by a third table that specifies a set of permissions defined for each role.
After adding these three tables to the schema, you need a way to assign roles to each user. Assuming each user has only one role, adding a field that indicates the assigned role to the user table is sufficient. By querying these tables, the application knows what it should allow each user to do.
With these last additions, our model looks like this:
The complete schema, including tables for assigning roles and permissions.
One Last Best Practice in Designing a User Authentication Module
We have learned how to design an authentication module by having the model assembled with everything we have seen throughout this article. Only one detail remains to be highlighted.
Ideally, the user authentication subschema should be loosely coupled with the application data model so that it can be copied to different applications without making major changes to it. This means the rest of the database tables should have no foreign keys pointing to the authentication schema tables besides the user ID.
For instance, these include an Orders table that relates each order to a user. Do not store data such as users’ email addresses, home addresses, or any other user data in tables other than the ones included in the authentication schema.
This can be achieved by normalization, which, in addition, solves many of the most common database design errors. It is worth bearing this in mind since you may create vulnerabilities that jeopardize user privacy.