Database design is one of the most important factors contributing to the performance of an application. Consequently, how well the database is designed is of utmost importance. Database design is all about efficiently organizing data based on product workflows, future roadmap, and expected usage patterns.
The output of a database design exercise is a data model. A data model represents all the objects, entities, attributes, relationships, and constraints in the system. Broadly speaking, data models can be of two types: logical or physical. The representation of the data model is done by creating an ER diagram, also known as an entity relationship diagram, an ERD diagram, or a database diagram.
The physical data model relates to the actual implementation details in the database. The logical data model, on the other hand, abstracts away the implementation technicalities. This makes the logical data model consumable for the business. One key difference between the two models is that the logical model is database-agnostic while the physical model has to be specific to the database in use.
Proper database design is often understated and neglected during application development. The cost of this neglect is realized usually much later when new application features come in or when old features require change. This is when the database design ceases to make sense. While it is not possible to future-proof the design of a database, it is very much possible to make the effort to best understand the business use cases and design the database accordingly. Read more about tips on better database design here.
With that in mind, let’s go through the steps in database design.
Step 1: Gather Business Requirements
The first step is to talk to the business about their requirements. If the conversation is effective, it should result in enough information to start working on a conceptual data model, which is an abstraction of the logical model. Talking to the business, first of all, provides a complete picture of the business processes, which, in turn, provides information about the various data points that are important for the business to capture and track. For instance, in a taxicab booking model, it is worth asking the following questions:
- Does the business want the vehicle tracking data in the database irrespective of whether there's an active trip or not? If yes, then the field
vehicle_trip_id
in the tablevehicle_trips
would be nullable. Otherwise, it won't be nullable. - Does the business want the history of changes to
trip_status
stored in the database? If yes, then every time thetrip_status
changes, there'll be another record in thetrips
table. Otherwise, every time thetrip_status
changes,trip_status
will be updated in place.
As shown in this example, based on inputs from the business, you would end up choosing one option over the other. It would result in changing the concerned entities and their relationships.
Requirement gathering also generally involves initiating a conversation about data security, such as which data to be masked and encrypted. The requirement gathering exercise results in a requirement document often supported by a working draft of the conceptual data model.
Step 2: Understand Business Roadmap
Businesses change their processes all the time; their ability to adapt makes them less likely to fail. Changing business processes means changing workflows and data models. Although it is not possible to know these changes way ahead of time, it is certainly possible to be up to date with the business roadmap.
For instance, if a company has plans to target a new geographical region, the model would have to cater to language support, currencies, time zones, and so on. The benefits of understanding the long-term business roadmap often show up in a smoother transition to new business processes.
Let me share one more example, which is more about continuously evolving business priorities. The taxi business was impacted badly at the beginning of COVID-19. As a cab company, you want to act preemptively to assure people that you're doing everything to make sure that your travel in the cab is as safe as possible, that the vehicle is disinfected every day, that the driver wears a mask at all times, and that there's hand sanitizer available in the cab. Now, to capture all this information, changes to two entities, drivers
and vehicles
, would be required. Several Boolean flag fields need to be added to these entities to cater to this business use case.
Step 3: Identify Entities and Attributes
Once the business requirements are gathered, the information can be used to identify entities along with the essential set of attributes. One or more entities generally map directly to business processes, and the relationship between those entities also mimics the business process workflow.
This step is also used to identify which attributes will act as identifiers in the entities. Identifiers translate to primary keys in the physical model. In addition, it is also common to specify data types for all the attributes in this step.
For instance, in the taxicab booking model, you would have to identify the attributes which will act as the mandatory fields for the registration of users and drivers from the booking app. User registration would be done using user_phone
and driver registration would be done using driver_phone
.
Similarly, other entities and attributes are identified during this step, after having been mapped to the business process workflows.
Step 4: Identify Relationships
After identifying the entities and their attributes, the next step is to define the relationships between entities based on the business workflows that were documented in the requirement gathering phase. In addition to establishing that there's a relationship between two entities, it is also important to identify which of the following four types of relationship exists between them. Consider two arbitrary entities, A and B:
- One-to-one → One record in A corresponds to at most one record in B.
- One-to-many → One record in A corresponds to many records in B.
- Many-to-one → Many records in A correspond to at most one record in B.
- Many-to-many → Many records in A correspond to many records in B.
In the taxicab booking model, only one type of relationship has been used, i.e., one-to-many. Take the relationship between users
and trips
as an example. In the model, there's an assumption that only one user can be related to a trip, which implies that there are no shared or pooled cabs. But if there were shared or pooled cabs, there would possibly have been a many-to-many relationship between users
and trips
, if many users shared the same trip_id
.
Step 5: Create a Logical ER Diagram
With entities, attributes, and entity relationships defined, the immediate next step is to draw the ER diagram. All of the steps listed above can be done within Vertabelo. There are no hard and fast rules for the way logical modeling is supposed to be done, with the possible exception of the reference notation.
For instance, take a look at the following example of a logical ER diagram. It captures a simple business workflow of a cab company, where a user can book a ride with the ability to track the vehicle.
Step 6: Validate the Logical ER Diagram
Logical modeling is a process in which a lot of business information needs to be translated into a database design. Without thorough checks, this phase of database development is prone to errors that can prove to be quite costly at a later stage.
To take care of this, Vertabelo has a thorough list of checks that can be performed on a logical model. Checks can be performed at all granularities, from the model as a whole to individual attributes, and everything in between. Some of the simple checks are:
- Names of entities, attributes, relationships, etc., cannot be null and have to be unique.
- An entity must have at least 1 attribute.
- Identifiers (PKs) must be defined for every entity.
- The model must use one of the listed data types for attributes.
All of these checks are optional and can be configured to be skipped, if there's another validation framework in place. Proper validation from Vertabelo helps you move to the next step with the minimum amount of friction possible.
Step 7: Create a Physical ER diagram
Once the logical ER diagram is created, the next step is to create a physical data model. The physical data model will be specific to the database where you want to deploy the data model. All databases have their unique implementation of nomenclature rules, data types, and constraints. Due to this, the Data Definition Language (DDL) often differs from one database to another.
To create a physical data model, follow these steps:
- Find the closest data type in the target database to replace the generic data type selected in the logical data model.
- Follow the nomenclature rules for tables, columns, and constraints as prescribed by the target database.
- Modify the model to align with predefined query workflows. This generally results in increasing redundancy to save joins.
- Finally, you can create indexes, partitions, distribution keys, and sort keys. This is when you can create any performance-enhancing modifications to the model.
These steps can be performed using any data modeling tool you can use to create a data model from scratch. However, Vertabelo has an option to convert a logical data model to a full-fledged physical data model for all the major database systems like MySQL, PostgreSQL, Oracle, Microsoft SQL Server, Amazon Redshift, Google BigQuery, and more. Once the logical data model is converted to a physical data model, you can carry on with the four steps we discussed.
Vertabelo also has an option to add pre- and post-deployment scripts at the table level along with any comments at the very granular level of the model. The comments turn out handy when the documentation generation feature offered by Vertabelo is used. The database document can be exported in any of the following three formats: HTML, PDF, or DOCX.
Continuing with the cab booking example, let's take a look at the physical data model generated by Vertabelo.
Step 8: Validate the Physical ER Diagram
Just like the logical ER diagram was validated, Vertabelo has a tool to validate physical ER diagrams with several additional checks, like whether or not FKs exist and whether the length of a table name or a column name exceeds the limit based on the database selected.
The validation doesn't need to be run explicitly. It happens as the diagram is modified. The issues with the model fall in one of three categories: errors, warnings, and hints, in order of decreasing severity. There's a useful, well-written article which talks about the common mistakes made during the database design process.
Step 9: Fix Issues With the Physical ER Diagram
The results of the validation may identify issues which need to be fixed. Some of the most common issues are:
- Missing foreign keys where entity relationships have been defined.
- Missing primary keys from tables.
- Unsupported data types for the selected database.
Once these and other similar issues are resolved, the model is ready to be exported to a deployable SQL script for the selected database management system.
Step 10: Generate the DDL Scripts for Deploying the Model
Database design is not just about creating ER diagrams. A data modeling exercise using ER diagrams is successful only if it results in something deployable. Vertabelo has a convenient option to export the physical model to a ready-to-deploy SQL script. Once it is generated, any pending issues can be resolved directly in the SQL script.
However, changing the generated SQL script is not recommended. It causes a drift between the physical data model and the SQL script deployed on the database, which can also mean a drift between the actual tables and the database documentation.
Now that we've reached the end of the database design process, let's have a look at the SQL code generated by Vertabelo.
Share Your Thoughts
Database design is a high-impact activity in software development. The field of database design has evolved over the years with new ways to represent the design for the business, for the engineers, and for the data analysts. This has often resulted in new types of diagrams, modeling standards and notations. Much of the evolution has been covered in the design fundamentals section.
We'll be happy to see what your experiences have been in designing databases. Write to us at contact@vertabelo.com.