Curious about why we model data with an entity-relationship diagram (ERD)? You've come to the right place.
An entity-relationship diagram, or ER diagram, is essential for modeling the data stored in a database. It is the basic design upon which a database is built. ER diagrams specify what data we will store: the entities and their attributes. They also show how entities relate to other entities.
Another advantage of ERDs is that they represent the data in a graphical manner. This makes it easier for business folks to understand.
Here is a simple example of an ERD. It’s for an online forum; we have a few tables with attributes and lines that represent the relationships between tables. I'll go into detail about the notation later, so if you're not familiar with it, don't worry.
Why Would You Need an Entity-Relationship Diagram?
This is a very good question. Another way of saying it is, what good is an ER diagram anyway? Why should I care?
To answer this question, let's take a step back. Why do we need a database? This may seem an obvious question, but it's worth addressing.
An Example from Everyday Life
Put simply, databases store (or “persist”) information.
If you haven't been living under a rock, I suspect that you know Amazon. So let's think more about Amazon: It's essentially a huge database of items that are being sold. What items are sold on Amazon? Which items are currently available? At what price? In what color? New or used? Plus many, many more details.
Jeff Bezos must have thought about how to store information about available products, starting with a list of items and creating a line with the name, price, color, etc. of each item.
Of course, that information needed to be updated regularly and by many people. This meant the company needed to store that information and make it available to many, many different systems so that people could buy things while others were adding more items to be sold. Obviously, Amazon relies on powerful databases to manage their online sales.
And that, in a nutshell, is why databases are so important: they are far more efficient at storing and retrieving large amounts of information than humans could ever be.
Let’s go back to Amazon’s products for a moment. To document the requirements for this level of information storage, developers need to speak with business people about what data would be associated with each item: its characteristics, price, any special deals (i.e. promotions or sales), etc.
Of course, when you start gathering requirements, they tend to multiply. Typically, everyone has their own opinion of what needs to be done. It’s also important to communicate effectively with the people who are providing the requirements so that they can be sure you’ve properly captured their needs. This is where the ER diagram comes in; it provides easy-to-read documentation of the information being stored and how it is related.
As the saying goes, "A picture is worth a thousand words". By having a diagram that clearly shows the entities and relationships you’re representing, it is much more obvious what areas are related. It’s also easier to visualize the impact of a change in one part of your model on other parts of the model.
Being able to graphically view and manipulate your entities and relationships opens up the design phase to non-technical stakeholders. They do not need to learn database development to understand the information being presented. An ERD is intuitive and lets them contribute their business knowledge to the design. Thus, both business and technical people should be able to understand a logical data model, at least at a high level. The ERD provides a common language they can use to communicate about database design.
Another advantage of a modern, online graphical ERD design tool is how it promotes remote collaboration. In the past, we might have gathered into a meeting room to "whiteboard" the requirements. In our increasingly global and remote interactions, we need a collaboration tool where teams can work on an entity-relationship diagram from different locations. Each member of the team can track progress and contribute to the gathering of requirements.
But remember, database design is more than simply creating an ER diagram. For more about this topic, read this article.
Once you have an ER diagram, tools can automate the process of creating the actual database. You don't need to re-model everything when you want to bring your ERD to life as an actual physical database; ER modeling tools like Vertabelo generate the SQL DDL scripts for you.
Maintaining and Modifying a Database
Like anything else, the life of a database has its ups and downs. In the future, we might need to make updates to the database. If we have properly documented the database with an ER diagram, it is much easier (and I am telling you this from experience) to make updates to the database.
Sometimes we find ourselves faced with a database that we did not create but need to understand. An ER diagram is an invaluable tool for piercing the secrets of the data being stored. It will not solve all our problems, but it does provide a roadmap to grasp what information is expected and possibly even how it will be used.
Another challenge that I have lived through far too often is encountering a database with well-documented SQL scripts for creation but no ER model of the database. This leads to one of my favorite features of good ER modeling tools: reverse engineering. You provide the SQL scripts of the database, and the modeling tool creates a physical data model. (We won’t go into the differences between conceptual, logical, and physical data models, as this blog has already covered it elsewhere.)
As is the nature of things, some databases get into trouble. With a well-documented database design based on an up-to-date ERD, it is much easier to debug any issues. Too often, I’ve seen developers trying to solve a problem without consulting the ER model – as if the solution is always to change code rather than to consider if the database needs to be updated to deal with an unexpected need.
Elements in an ER Diagram
So, now you know the value of an ERD. Let's move on to the elements within it, starting with the notation (i.e. the symbols that represent entities, relationships, etc.).
IE (Information Engineering) notation is very widely used in ERDs. This notation was originally created by Gordon Everest in 1976 and is often referred to as "crow’s foot notation". It was documented in the 1989 book An Introduction to Information Engineering: From Strategic Planning to Information Systems by Clive Finkelstein. I will be focusing on this notation. If you're curious about other options, you can find information about different ERD notations here.
IE notation is straightforward and doesn't need any special knowledge. Here are the basics:
- The items to be stored in the database are referred to as entities and are represented by tables. Logical entities (or physical tables) are represented by rectangles, as you can see in this simple model of an address.
- Within each rectangle, we document the characteristics for that entity, which we refer to as attributes. We also document the particular type of data that is stored in each attribute: Is it a number? Is it a string of characters (e.g. a name)? Is it yes/no or true/false?
- Then we have the relationships between the entities. We link entities/tables together with a connecting line. Below we see a simple example of a table of users that’s linked (or related) to their role.
- When properly documented, each line/relationship should provide more information about the meaning of the link. Is it mandatory or optional? Is one item linked to one other item or to many items? Rather than explaining every possible variation, I’ll simply point you to this description of crow's foot notation for the details. Suffice it to say that the lines, circles, and branching lines you see in the example below all have meaning:
Even without extensive knowledge of online forums, I can see that we will store users and that users have a relationship to threads and posts. Also, I can see that threads are linked to posts. If you're interested, I have a 3-part series on creating a progressively more complex ERD for the database design of an online forum.
But I suspect that you didn't come here to learn to design an online forum database. So, let’s move on to ...
How to Design Your Own ERD
You can design your own ER diagram. A simple option is to use a dedicated database modeling tool like Vertabelo. Vertabelo allows you to create conceptual/logical and physical data models. Once you complete your model, Vertabelo will generate database setup scripts, also referred to as SQL DDL (Data Definition Language) scripts. These scripts allow you to create the database in the particular database management system (DBMS) that you plan to work with.
- Online musical equipment store.
- Wine store.
- Peer-to-peer lending platform.
- Automobile repair shop.
- Smart home.
And, besides the online forum, there’s also a model for an online survey:
I also particularly like Emil Drkušić's example of a data model for map-and turn-based games.
This model looks a little daunting, but Emil breaks down each subject area into understandable chunks. Also, you'll find links to several of his database models for other MMO games and classic games like Dominoes or Star Trek's 3D Chess (which Spock played so much).
What’s Next in Your ER Diagram Journey?
If you're looking for database design best practices or tips for better database design, this blog is a great free resource. And if you want to get started using Vertabelo to create your own ERD, I've got just the article for you.
As you build more and more complex ER diagrams, you’ll be building your professional skills in a very in-demand area. Happy learning!