Back to articles list

September 6, 2017 - 13 minutes read

Example er diagrams

A Song of Ice and Databases: A Game of Thrones Data Model

Emil Drkušić

Database designer and developer, financial analyst.

Tags:

No plot spoilers – just a data model based on the Game of Thrones TV series.

“You know nothing, Jon Snow.”

Starting from episode one, the storyline was intense, dynamic, and full of twists. George R.R. Martin did a great job of writing A Song of Ice and Fire, the multi-book series on which Game of Thrones is based. Only five of the projected seven books in the series are currently completed, and the TV series’ storyline is now ahead of the published books.

We can’t find out what will happen in the next season. Until then, let’s try something completely different. Let’s create the Game of Thrones database model.

A GoT Data Model? What? Why? And How?

This is not the sort of data model we’ve been featuring on the Vertabelo blog. Most of those are about running various types of enterprises or solving real-world business problems. This data model will focus on storing and modifying relationships and events in the story. So, let’s start with describing what this database will cover.

What should this data model cover?

In each series (movie, book, etc.) there are many plot changes. Characters are born, they change alliances, they travel, they fight other characters, they die, etc. To track all this action, we’ll need to store:
- Locations
- Events
- Characters
- Timeline
- All relationships between all these categories
Of course, we can’t store the whole series. But that is not the idea. I want to be able to store the backbone of the story in a structured format. That backbone should contain all we need to recreate the storyline from the start.
Can this model be used for another TV series? What about movies?

Yes. Most other series, regardless of genre, follow the same concept. I can’t think of any that wouldn’t fit into this model: all have characters, locations, events, and plots. Maybe some experimental movie wouldn’t fit, but most will.
Are we limited to movies and TV series?

No. We could apply this to A Song of Ice and Fire or any other book. The same rules that are valid for movies and TV series also apply to books.
How is the data model limited?

Obviously, we won’t use this model for running a business or for project or process management. On the other hand, we could use something like it when storing relations between events, persons, and locations. After all, some elements of GoT are based on real-life events. And if we can store its plot, we should be able to store the “plot” for re-creating the sequence of real-life situations as well.

Marketers could use something like this to track what people do and buy with the idea of sending them personalized offers for products and services.
Why was Game of Thrones chosen as an example?

When people ask me “Why?” (and now I’m asking myself “Why?” ☺ ), I usually answer “Why not?” Game of Thrones is popular, so its fans should easily understand the concepts involved in storing the data. After all, they already understand the relationships between all the elements in the series.
And also, it’s cool. ☺

Let’s move on to the model.

The Data Model

The data model consists of three subject areas:

Locations & timeline
Groups & characters
Events

We’ll describe each subject area in the order it’s listed.

Section 1: Locations and Timeline

The first subject area we’ll describe is the Locations & timeline subject area. The tables here are prerequisites for the tables in other subject areas.

The location table stores all locations in the series. For each location, we’ll store its details and the first location that is “superior” to it. For a city (King’s Landing), we will also store its region (The Crownlands) and its continent (Westeros). We’ll have one record for each of these three and define the relation to the “superior” location. The columns in this table are:

location_name – The location’s name, such as “King’s Landing”. Please note that two different locations could have the same name, e.g. two towns in different regions have the same name. Therefore, location_name alone can’t be used the alternate (unique) key.
location_type_id – A reference to the location_type dictionary. This dictionary is used to differentiate villages, rivers, and mountains; we’ll discuss it more in a moment. For “King’s Landing”, we could expect that the location type is “city” or “capital city”.
superior_location_id – The ID of the larger location that the current location is part of. For King’s Landing, the superior location is its region, the Crownlands. For the Crownlands, the superior location is the Seven Kingdoms, and the Seven Kingdoms are located on the continent of Westeros. Westeros and the other two known continents (Essos and Sothoryos) form the Known World. So obviously, the Known World is the top level in our location hierarchy. That is also the only location that won’t have a superior location, which is the reason this attribute could contain a NULL value. All other locations must have a superior location defined.
level – The location’s position in the location hierarchy. The first level is the Known World, the second is the continents, etc. This value will tell us how many relations (joins) we have to make to get to a given location’s level. If we want to know the continent where a city is located, we’ll need to make two joins: one to get the region and another to get the continent.
existed_from and “existed_to” – Define when a certain location existed within the story (e.g. when the continent was discovered, when the city was destroyed, etc). Both attributes can be NULL because the location could have existed “from the beginning” or it may never be destroyed.

The dates in this field are DATE types, which should work for most cases. If we only have the year when something became known, we could go with 1 January of that year. If a location exists for several epochs, we don’t need to store a relation for every single epoch. (We’ll talk more about epochs and timelines later in this post.)

This structure – a record from one table is related to the foreign key of another record from the same table – is needed because we can’t know in advance how many levels we’ll have. For GoT, our levels could be the Known World (1), continents (2), kingdoms (3), regions (4), and cities, villages, or forts (5). We could go with five tables to store this hierarchy, but that would work only for this model. What if we need six levels in the hierarchy? And if you use five tables to store hierarchy, you’ll have five foreign keys each time you reference a location – you won’t know until you enter it which level of hierarchy you’ll point to. Obviously, that’s not the best choice. We’ll go with references to the same table, as I did here.

We have already mentioned the location_type dictionary. Here, we’ll store a list of UNIQUE type_name values that describe each kind of location. We could expect values like “continent”, “city”, “village”, and “mountain”.

The third and final table in this section is the timeline table. The idea is to store each epoch and age that are relevant in the series. Although the complete story of the series is placed in “modern times”, events that took place in previous epochs and ages are extremely important to the plot. That is the reason why I also want to store the events, locations, and characters in relation to various timelines.

The structure is similar to the location table. We’ll have ages and their “superior” or preceding ages. For example, the oldest epoch we know is the Dawn Age. It’s composed of three periods: Prehistory, the age when the first men invaded Westeros (12,000 BAL) and the age when the Pact was signed (10,000 BAL). All timeline events in Game of Thrones use the Conquest of Aegon the Conqueror as the reference point, or age 0. Everything before that is BAL (Before Aegon's Landing) and after that is AL (Aegon's Landing). For each of these timeline periods, we’ll store:

timeline_name – The name uniquely denoting that timeline.
start_time and end_time – Are dates, if available, when this period started and ended. Both could contain NULL values. The first period ever won’t have a start_time value, and the current period won’t have an end_time value.
superior_timeline_id – References the superior period, if any. Main epochs (i.e. The Age of Heroes, The Andal Invasion, The Age of the Hundred Kingdoms, etc.) don’t have superior timelines. Therefore, this attribute can contain NULL values.
level – Works as in the “location” table: the lowest number means the highest level. Epochs will have a “1”, their component ages get a “2”, and so on. This can be extended downwards as needed.

Section 2: Groups and Characters

The Groups & characters subject area stores all groups and organizations as well as the characters that belong to them. Characters could belong to several groups at once. For example, Daenerys Targaryen belongs to the House Targaryen and at the same time is part of the armies she leads.

We’ll start with the character table. We’ll store details about the characters that appear in the series and/or are important to the plot. We could store every single character, but there are many “nameless” characters that are not important to the story. For Game of Thrones, the most important characters include Tyrion Lannister, Cersei Lannister, Jaime Lannister, Daenerys Targaryen, Bran Stark, Arya Stark, Sansa Stark, Ned Stark, Petyr Baelish, Samwell Tarly, Robert Baratheon, and – of course – Jon Snow. Is Jon Snow the ice and fire? The peace bringer? Or maybe Ygritte is not really dead at all? (Hint, hint, wink wink.)

Ahem. Back to the database. ☺

For each character, we’ll store the following:

The first_name and the last_name of the character.
title – Any title the character has. If the character has no title, then this would be the place to store their position, e.g. peasant, cook.
full_name – The title, first name, and last name, combined into one attribute.
background – Describes, in text format, the character’s background.
profile_picture – Link to a picture of the character.

Groups will almost always evolve as the story develops. Characters will change groups and switch alliances along the way. So, we need to define each group that took part in the story. We’ll keep a list of all such groups in the group table. These are its columns:

group_name – A name that UNIQUELY identifies this group.
existed_from and existed_to – Dates denoting when the group was active. Both of these values could contain NULL values because 1) we don’t know these dates, or 2) the group is still active.

Now we have everything we need to relate characters to groups. We’ll do so using the in_group table. For each character_id – group_id pair, we’ll store the period when the character was part of this group and any additional details. One character can belong to more than one group at one time.

The last thing we’ll do in this subject area is to record the dynamic between characters themselves and between characters and groups. A character can be friendly to a group (or fellow character) or allied with it; they can also be opposed to a group or neutral towards it. And all of that could change in the course of the series. A list of all possible relationships is stored in the relation_type dictionary. Besides the primary key, we’ll store three more values:

relation_name – A UNIQUE name that distinguishes this relationship type.
is_group_relation – If this attribute is set to “True”, this type of relationship is assigned to a pair of groups. Examples are “ally”, “enemy”, etc.
is_character_relation – If this attribute is set to “True”, this type of relationship is assigned to characters. Examples are “father”, “mother”, etc.

The remaining two tables in this subject area (the group_relation table and the character_relation table) are used to assign relationships to pairs of groups or characters. They have similar structures, so I’ll describe them together. They contain the following values:

group_1/”character _1” and group_2/”character_2 – Denotes the pair of groups or characters involved.
relation_type_id – The type of relationship.
date_from and date_to – The period when the relationship was valid. Both of these attributes could contain NULL values, e.g. in cases when the relationship does not end.
description – An additional text explanation of the relationship, if needed.

Section 3: Events

We’re finally at the most exciting part of the model! There is no story if nothing changes. Who writes stories where nothing happens? OK, exclude dramas. ☺

Events are the most important part of any story. They keep us reading or watching. The whole plot can change, either slightly or dramatically, based on its events.

We’ll use five more tables to store event details. The remaining two tables – the group table and the timeline table – are copies, used here to avoid overlapping relations.

We’ll start with the central table in this subject area, the event table. This is where we’ll store events like battles, meetings, etc. An example of a GoT event is The Red Wedding. For each event, we’ll store:

event_name – A name that UNIQUELY denotes that event.
timeline_id – References the timeline catalog and stores when that event took time.
details – All details related to the event, in free textual format.

The first thing we’ll do is to relate events with locations. Normally, we’d expect each event to occur in one location, but it could also take place at several locations. Let’s put a many-to-many relation between events and locations to cover that contingency. For each location-event pair, the is_located table will hold:

event_id and location_id – References the event and its related location.
start_time and end_time – When the event happened, if we have this information. Note that these are NULLable.

The group_is_part and character_is_part tables define the groups and individual characters that took part in the event. Both tables have almost exactly the same structure, so I’ll describe them together. The attributes in these tables are:

event_id – The ID of the related event.
group_id / character_id – The ID of the related group/character.
start_time and end_time – When the event happened. Both attributes could contain NULL values because we may not have that information.
details – A textual description, if needed, about the group/character role in that event.
event_outcome_catalog_id – The outcome for that group/character in that event.

The last table in this model is the event_outcome_catalog dictionary. Values from this table are used to denote outcomes for groups/characters in events. We can expect values like “victory”, “defeat”, “death”, etc.

What Do You Think?

Today we talked about a data model that could be used to translate George R.R. Martin’s Game of Thrones into a database. It can’t be done completely, at least not with this model, but I think we’ve captured the most important facts.

You could also use this model to store the storyline of other TV series, such as Midsomer Murders or Fawlty Towers. What would you add to this model? Or maybe remove? Feel free to share your ideas, even if it’s just a comment on the existing model!