How difficult is it to program a user login function for an application? Novice developers think it’s very easy. Experienced developers know better: it is the most sensitive process in your application. Errors in login screens can lead to serious security issues. In this article we take a look at how to store authentication data in your database.
The most common way to authenticate users nowadays is with user name and password. How do you store this information in a database? Obviously, you should store login info in one table. The name of the table is up to you. My advice is to avoid the name user for this table. It’s a keyword in many databases, which makes querying the table a pain. The name I have chosen in my very simple bookshop database model is user_account
.
The primary concern with database design in this case is security. Login information is very sensitive. Even if your application is some obscure niche forum, users often reuse passwords in other applications. A password to your forum can also be a password to a user’s bank account. You should design your database so that even if your data ever gets stolen, user passwords cannot be retrieved from your database. (Of course, the design does not free you from securing your database access properly.)
Storing Passwords in a Database
Rule No. 1 is: never store passwords in plaintext in a production database. Rule no. 2: don’t encrypt passwords. If it can be encrypted, it can be decrypted, for example by a laid-off, malicious administrator. There is absolutely no need for an administrator to know the user’s password.
What should you do instead? You should store a hash of the password. A hash function takes an input string and returns a new string (a hash) so that it’s difficult to find the original input just from the hash. Simple hash functions are used in the implementation of hash maps. For hashing passwords, you should use good hash functions: such as the ones
- for which no two inputs with the same hash values have been found,
- which are difficult to reverse: once you have a hash value, it’s difficult to find the original input.
Popular algorithms for hashing passwords are: SHA-2 (this is a family of hash functions: SHA-256, SHA-384, SHA-512, and more) or SHA-3 (again a family of hash functions). Not so long ago MD5, SHA-1 were also popular for hashing passwords. However, MD5 has been broken, and a theoretical attack on SHA-1 has been reported, so it’s best not to use them for hashing passwords.
To store the password you do something like this:
hash = hash_function(password)
and you store the computed hash value.
To check the password you compute the hash for the password provided by the user. If the computed hash matches the hash stored in the database, you assume that the user entered a valid password. Since no two values with the same hash are known for a good hashing function, you’re safe to assume that the password is correct. The pseudocode for validating a password:
entered_hash = hash_function(entered_password) if (entered_hash == stored_hash) { VALID_PASSWORD; } else { INVALID_PASSWORD; }
Salting Passwords
A little amendment: you should hash a password together with salt. Salt is a random-generated string which you add to the password when you compute the hash.
hash = hash_function(salt + password)
What’s the point of salting passwords? Many users choose very easy passwords like 123456, or qwerty. If you just hash the password, then you can tell that two users have the same password, because the password hash is the same. You don’t want to be able to do it. Another reason is that user passwords are usually short. If you just hash the password, then you can use the brute-force technique to guess passwords. Generate all short strings, compute their hash and see if the hash is in your database. For popular hash algorithms, like MD5, tables with precomputed hash values (so called rainbow tables) are available on the internet.
Using the salt prevents you from doing either of the two things. With salting you can’t tell if two users have the same passwords. Their salt will be different and their hashed password will be different, too. It’s not that easy to use brute-force to break passwords: you’d have to attack each user separately, not all users at once.
The salt should be long and random-generated. Do not use user names as salt. User names are – from the point of view of a hash function – quite similar and quite short.
Database Design for Storing Passwords
We end up with three columns to store password info:
password
– this is the column for the password hash. The hash values are rather lengthy, even for short password. You should make sure that your password field is long enough to store the hash.password_salt
– again, the column should be long enough. The column is nullable because you may skip salting passwords in the development process.password_hash_algorithm
– the algorithm used for hashing passwords. Why do you need this column? First, in the development process you may want to skip hashing passwords. Second, if you have to migrate users from a different system, you can migrate them with their old passwords. On first login to the new system, you check their password using the old method. Then you force them to change the password and store it in your new format. Third, hashing functions become outdated. With the algorithm column, you may seamlessly upgrade to a different hashing function when the need arises.
User Columns
There are also a number of columns which store user details.
id
– a primary key column. Of course, you can use auto_increment, identity property or sequence to automatically manage the incrementing of id. Even though the user name is unique, it’s not the best idea to use it as primary key. The primary key column in a user table is a column you will very often make joins with. The user name is a varchar column and joins on varchar column are not very efficient. A primary key should be unmodifiable. If you used the user name as primary key, and later decided to make the user name modifiable, you’d have to update all references to the table. With a surrogate primary key, the primary remains unchanged no matter how much the user data change.user_name
– anything you want to use as login criteria. The user_name should be unique. Add a unique key to the table.email
– obviously, you may not need this field in your system. On the other hand, email does double duty as user name in many systems. Remember that the allowed email length is 254, as per this RFC errata. Always consider setting email to unique.
In my user_account
table there is also a bunch of other columns which I omitted. What other info do you store in your user_account
table?