A data model determines the logical structure of a database. It helps you find possible design issues before implementing and deploying the database. However, there are different types of data models which are used for different things. That’s what we’ll discuss in this article.
The first step you should do when creating a new database is to model it. There are some basic principles that should be followed in this case. By first taking the time to model the database you will have a clearer picture of the problem. You can also exchange ideas with your client and use the data model to help represent complex business concepts in your discussions. You might also find and clarify potential edge cases (unlikely but possible scenarios) that could change the data model, like having to split a table into multiple tables to represent different sub-categories.
By modelling your data, first you can avoid having to make major changes after the database is deployed.
What Is a Data Model?
A data model is a graphical way to design your database, allowing you to outline the main concepts first. It gives you a standardized framework for representing real-world entities and their properties. These entities are stripped down to their basic components and are usually related to each other in some way. Everything – the entities, their characteristics, and the relationships between entities – is represented in a very simple way. Data models also help you describe how data about the entities is organized, stored, and queried.
Here are some terms you need to know about data models:
- Entities represent real-world objects involved in a business. In an online store, customers, the orders they place, and the items offered for sale are all entities. These are very similar to a table in a database. Obviously, stores have more than one customer and sell more than one item; thus, each row in a table will eventually represent one customer, one order, or one item. In a data model, entities are represented by shapes (usually boxes).
- Attributes are the characteristics that describe individual entities. For example, a customer might be described by their first and last name, account number, and email address. An item might be described by its product number, color, and size. And an order will likely be identified by its unique number and the date it is placed. In a database, attributes are analogous to columns; a table can have many columns, and each column can have one value per row (although it can be the same value for multiple rows). In a data model, attributes are listed inside each entity
- Relationships show which entities are connected. For example, an order entity would be related to an item entity because the order is made up of various items. Likewise, customers and orders would be connected because customers place orders. In a data model, relationships are represented by lines connecting entities.
In the end, all of these concepts should come together in a diagram like the one below. But don’t worry, we’ll take you through all of the necessary steps to understand and build a data model.
Why Create a Data Model?
There are many benefits of creating and maintaining a data model for your database. I will outline and explain a few of these below.
From the business view, the benefits of having a data model are:
- A data model provides a common layer of communication that facilitates discussions between the data architect and the business people. As a visual model with a common vocabulary, it ensures all the specifics can be discussed in detail.
- Having high-quality documentation ensures that the implemented code is also of good quality. The code builds on the decisions made in the previous documentation phase, which reduces the number of errors.
- Cost control. Because there is less code that needs to be rewritten, developers can spend time mostly on feature development. This reduces time spent coding and thus eliminates some costs.
- Clarify project scope. Thanks to the data model’s visual representation, complexity is reduced and difficult concepts can be clearly understood.
From a technical view, the benefits of having a data model are:
- A data model has a technical layer associated with it. This layer provides all of the technical details (which are defined by the data architect), allowing the developers to focus more on implementation and less on interpretation.
- Fewer errors. Thanks to the clarity and specificity of the design in the data model, fewer mistakes are made on the data and application side. Developers can focus on feature implementation and not database design.
- The database structure can be optimized from the very beginning – before data is inserted. This reduces the need to move data (i.e. to improve performance after the database is in production).
- Reduced data risks. Having a clearer picture of the scale of the data, data architects and database administrators can plan their backup and restore strategies. Having plans and safeguards in place reduces risks in a disaster recovery scenario.
There are more benefits of using a data model in your process. You can read about them in the article Why Do You Need Data Modeling?.
Types of Data Models
Several types of stakeholders are involved in working with data models. Thus, there are three different types of data models to suit the different needs of each stakeholder. These are the:
- Conceptual data model.
- Logical data model.
- Physical data model.
Each data model builds on the preceding one to finally generate the database structure.
Conceptual Data Model
This is the simplest data model and often the first to be created. It’s defined in an ER diagram ( a simple chart showing entities, relationships, and sometimes attributes) and shows the main entities and their relationships to each other. During this phase, business stakeholders and the data architect are involved. Although this is the first data model, it is not the first step in building a database. See this article to read about all the steps required to build a database.
The role of the conceptual data model is to help define the initial scope of the problem. In addition, it helps organize the main business requirements, concepts, and rules.
The result of building the conceptual data model is a diagram showing the main entities, their attributes, and their relationships. If you’re not familiar with relationships between database entities, I can recommend this article on the essentials of data relationships. This knowledge will be useful going forward in our discussion, so take some time to read about entity relationships before moving on.
Here’s an example of a conceptual model. We’ll explain it in more detail later in the article:
Logical Data Model
The logical data model expands on the conceptual data model. We start by applying normalization techniques with the goal of reaching 3NF (third normal form). Third normal form is a way of structuring data storage so that it avoids duplications, promotes referential integrity, and reduces the risk of incorrect or missing data.
After applying normalization, we make sure that all relationships are outlined. This is the key step that differentiates the conceptual data model from the logical data model.
Normalization will break our entities into multiple tables (if necessary) and set the relationships between them. At this point, we no longer talk about entities; they’re tables now. After a table is normalized, its attributes are clearly defined and the data types are set. We also need to choose tables’ primary keys at this stage.
This is what a logical model looks like. Notice that there are more details provided for the attributes and the relationships (i.e. the types of lines that connect the tables):
We’ll discuss this in detail in an upcoming section.
Physical Data Model
The physical data model is the final and most detailed data model. After we’ve finished this model, we have all the database and table details defined and ready to be created.
During this phase, we finalize the primary keys and create indexes to improve performance. Foreign keys are also explicitly created to emphasize the relationship between two tables. Views, users, and user access are also defined at this step in order to create the data-access and security layer of the database.
Other important elements that are determined during this phase are nullable and mandatory columns, default values, column constraints, and collations. As you can see, the physical model has even more detail:
After this step is finished, you can write the SQL DDL scripts to create the database. Or, if you’re using a tool like Vertabelo, you can automatically generate your DDL scripts.
Example: Going from Conceptual to Physical Data Model, Step by Step
Now that we’ve gone through the theoretical part, let’s look at a practical example so we can really understand the concepts.
Let’s first define the scenario we want to model. Suppose we want to model the database for an online store. The store needs to keep track of their products, inventory, and orders. Additionally, customer information, such as delivery address and login data, needs to be saved.
Building the Conceptual Data Model
As we know, the first step in building a database is creating the conceptual data model. For this example, I will be using the Vertabelo Database Modeler because it will provide some big benefits down the line.
Because the difference between the conceptual and logical data models is very small, I will be creating a logical diagram in Vertabelo. However, since I’m starting with simple entities that will later be expanded, we can safely call this a conceptual model.
For our scenario, we’re going to have a diagram with only three entities: one entity to store customer information, one for order information, and one for product information. We also want to indicate the basic relationships between these entities. We’d have something like the data model shown below at the end of this first phase.
Building the Logical Data Model
In the logical model, we will apply normalization to the entities in the conceptual model. After that, we’ll ensure that all the relationships between tables are set and each table’s attributes are defined. In the end, we should have a logical data model like the one shown below:
You can see the difference between the conceptual and logical diagrams. Note that I have:
- Named attributes for each table.
- Assigned data types to all attributes.
- Selected the primary key for each table (denoted by
PI
, which stands for primary identifier.) - Created the appropriate relationship (one-to-one, one-to-many, or many-to-many) between tables. These describe how records are related to each other, e.g. one record can be related to many records in another table.
- Created a new table called
Address
based on normalization rules. I included this table because a customer can have multiple addresses where they might want to ship their orders.
This looks like a pretty detailed diagram for our use case. Now it’s time to create the physical data model of our database.
Building the Physical Data Model
If you’ve used the Vertabelo Data Modeler like me, you’re in luck. Vertabelo has a built-in feature that allows you to generate a physical data model from your logical model. There is a standalone tutorial on how to generate a physical diagram from a logical model that explains this feature in detail. In my case, all I need is a quick export from the logical data model. This is the result:
After exporting, a new table called Order_Product appears. This is because the Vertabelo modeler automatically creates linking tables if they are needed.
The generated physical model has also created foreign key columns in the Order and Address tables; it automatically creates the required foreign keys between tables.
It looks like our physical data model is complete! We now have a good foundation on which to develop our database. This is where I will stop, but if you need to go one step further, you can export the physical model to a SQL script with DDL commands.
Keep Learning About Data Models
I hope you enjoyed this article on the different types of data models and why they are useful. Building data models will help you as a software developer or a data architect. Knowing when to use the right data model and how to involve business stakeholders in the decision process is immensely useful. So, keep learning about data modeling!