A step-by-step guide on how to design an ER diagram for a movie database and model the relationships between actors, movies, and studios.
An entity-relationship diagram (ER diagram or ERD) is a visual representation of a database that displays the relationships between entities. In this article, we’ll guide you through the process of designing an ER diagram for a movie database, using a case study to illustrate the concepts. We’ll cover the basics of data modeling and ER diagrams and show you how to model the relationships between actors, movies, and studios in a clear and efficient way.
By the end of this tutorial, you will have a better understanding of how to use ER diagrams to design a movie database and other types of databases.
Creating the ER Diagram for a Movie Database
When creating an entity-relationship diagram for any kind of database, you’ll go through the three main phases of data modeling: drafting the conceptual model, logical model, and physical model. What are each of these models?
- Conceptual model: In this phase, you will identify and define the entities, attributes, and relationships that are relevant to your database. You will also establish the rules and constraints that will structure the data. The goal of this phase is to talk to the business stakeholders and create a high-level understanding of the data and its structure.
- Logical model: In this phase, you will map the conceptual model to a more detailed logical model. This will involve expanding the entities found in the conceptual model and defining the attributes that will be used to store the data. Also you will determine the relationships between the entities (which will turn into database tables), and the integrity constraints that will ensure the accuracy and consistency of the data.
- Physical model: In this third and final phase, you will design the actual database schema for a specific database engine. This will include specifying the data types, indexes, and other physical characteristics of the tables and columns, as well as optimizing the schema for performance and other factors. The physical model represents the final design of the database and is used to create the actual database instance with DDL statements.
Who Creates the ER Diagram?
The responsibility for creating the ER diagram falls to the database designer. The database designer works to design and implement the database schema and maintain the integrity and performance of the database. Database performance can be improved throughout time. However, if the database designer can predict some areas where best practices can be applied to improve the overall quality of the database, then those should be made.
Because some of these performance aspects include when to apply denormalization (which influences the physical model), this can also fall to the database designer. Also, the designer needs to ensure data integrity by applying CHECK constraints on some columns and making sure foreign keys are used where necessary. Sometimes data quality and database integrity might also involve making sure to apply CASCADING TRIGGERS ON DELETE
and/or UPDATE
.
The database designer may also consult with other stakeholders and IT professionals to ensure the database meets the needs of the business and is compatible with the application or system that will be accessing it.
An ER diagram is an important tool that defines the structure and relationships of the data. It is the responsibility of the database designer to create a clear and accurate representation.
Designing a Movie Database
We start our data modeling by defining the main entities in our movie database. This is the initial step of creating the conceptual data model. The main entities for our ERD are movies, genres, actors, and directors. We’re going to use Vertabelo to design our data model so we can take advantage of the end-to-end integration of the logical and physical data model. This will make things easier later in the process.
In addition to the main entities, we’ve also added a few more that are relevant to cataloging movies. Our example database model is quite simple, but if you are working with a more complex concept, you may need to leave a few comments in the model. It’s a good idea to use text notes in Vertabelo for these clarifications.
Ok, now let’s add some detail to our model:
All of the tables have basic attributes, like first names and last names for entities that represent people and simple names for other objects like movie genre, languages, cinemas, and awards. We’ve also added relationships between entities. This is the logical model.
Most of the entities and attributes are self-explanatory, so we won’t go through each one in detail. There are a few cases where we need more information, which we’ll cover next.
For the Movie
table, we have these two interesting attributes:
MovieLength
– The running time of the movie, typically stored in minutes. Hence, we use an integer datatype.Rating
– This is a computed value of all the movie review ratings. (Note: We use rating as in a review of “ 3 out of 5 stars”. This doesn’t refer to the content rating, like PG, 6, or 12+). This is stored at the movie level, which is important so we don’t recalculate all of the ratings every time we need this information.
For the Award
table:
Apart
from theAwardID
and AwardName, we are actually missing an important concept. Awards are generally given each year during film festivals; this is missing in our current table diagram. Read on and you’ll see we haven’t forgotten this idea.
For the Ratings
table:
Source
– It’s important to know who gave the rating for a movie, since some sources are typically more critical of certain movie genres than others.
Notice that most of the relationships are coming from the Movie
entity towards the other entities. These relationships are all many-to-many relationships. The reason why all of these relationships are many-to-many is illustrated in three examples below; it holds true for all the many-to-many relationships in this ERD:
- Movie -> Actor: A movie can have one or more actors playing in it, and the same actor can play in multiple movies.
- Movie -> Genre: A movie can be classified under multiple genres, like Comedy and Romance, but every genre will usually have multiple movies.
Movie -> Director
: A movie can have one or more directors, but the same director can direct many movies.
Another important aspect is the cardinality of the relationships. If you look closely, the cardinality in Movie
-> Cinema
, Movie
-> Ratings
, and Actor
-> Award
entities is a 0 to many cardinality. This means that a movie does not necessarily need to have ratings or be shown in cinemas, since some cinemas refuse to show movies from certain genres, directors, etc. Also, an actor does not necessarily need to have an award to be in a movie.
We also need to make sure that we’ve marked which attributes are primary identifiers (these will become primary keys) and which attributes are mandatory.
Generating the Physical Data Model for the Movie Database
If we’ve built our logical data model in Vertabelo, then we have a big advantage at this point. Vertabelo allows you to easily convert a logical data model into a physical data model.
Now that we’ve generated our physical model, we can inspect and see if the output is as desired and meets all the criteria of a high-quality data model. For example, we need to make sure that each table has a primary key so that we can uniquely identify each row in that table.
Additionally, we can see that many of the relationships now have an extra table added between them. These linking tables allow us to manage the many-to-many relationships between these tables. If you’ve defined the right type of relationship in the logical model, Vertabelo will automatically create these linking tables for you.
We can create a linking table ourselves by simply including foreign keys that point to both tables. Below, you can see that Movie_Genre
(which links each movie with its genre(s)) includes only two fields: a foreign key that points to GenreID
in the Genre
table and another foreign key that points to MovieID
in the Movie
table.
At this stage, double-check that every table has a primary key. For all relationships between tables, make sure there are the right foreign keys.
Apart from these points, there are also a few more things related to table and column naming conventions, applying data types, etc. that you should follow. I recommend having a look at our article 8 Things to Consider When Creating a Physical Data Model for more details.
Generating a SQL DDL from the Physical Data Model
Once we are happy with the design of our physical data model and fine-tuned it by adding any needed linking tables, are we done? Not quite. We also need to make sure the right data types are applied to all columns and the foreign keys point to the right columns. Sometimes you might also want to create constraints on some columns (to avoid bad data coming in); that is explained in more detail in the article I linked above.
Once we’ve checked and double-checked the model, we can turn it into a database. We do this by creating a DDL (Data Definition Language) script. This is an SQL script that contains the instructions to create the database. In Vertabelo, you can generate the DDL script, by pressing the SQL button on the top menu of your physical model. This will make the Generate SQL Script pop-up appear, where you can select the settings you want for generating your DDL script:
Once you’ve double-checked the settings, you’ll press Generate and then set a few more things like the file’s name and where to save or download it. In a few seconds you’ll have your SQL DDL file. For example, I created this DDL script for the movies database schema example:
Beyond Movie Database ER Diagrams
And this is how we would create a minimum viable data model of an ER diagram for a movie database. For other use cases that require ER diagrams – like a data model for a wine store or a SaaS subscription data model – I recommend these example ERDs from the Vertabelo blog.