What Do the Olympic Games, UEFA Euro 2016 Football Matches, and Databases Have In Common?
On hearing what I do, people tend to ask me the same question: Can you develop a system that predicts football match results? Or Olympic medal outcomes? Personally, I don’t put much faith in predictions. Still, if we had a large amount of historical data and relevant indicators, we could certainly design a system to help us come up with more accurate assumptions. In this article, we’ll consider a model that can store the results of matches and tournaments.
This model is primarily focused on European football (soccer) matches, statistics, and results, but it could easily be tweaked to accommodate many other sports. My main motivation for this article was this year’s two big football events: the UEFA Euro 2016 Championship that just happened, and the 2016 Summer Olympic Games that are happening right now.
What Do We Know Before the Tournament Starts?
Before the tournament starts, we know almost everything about it — except the most important thing: who will win. Let’s briefly state exactly what we already know:
- The dates the tournament starts and ends
- The locations where the matches will take place
- The exact times the matches will start
- Which teams have qualified for the tournament
- The players on each of these teams
- Each player’s past performance and their current form
What Match Details Do We Want to Store?
Tournaments consist of multiple matches. Before we store any match details we need to:
- Relate each match with the tournament
- Record the tournament stage when the match was played (e.g. group stage, semi-finals)
We also need to store details for single matches, including:
- The teams involved in the match
- Starting lineups and substitutions
- Match events (in football these are: goal, penalty, foul, yellow card, etc.)
- Final score
- Players’ actions during the match
We’ll use this data to capture all important match events. Comparing a player’s performance before and during the match could lead to certain conclusions. Maybe we wouldn’t be able to predict the final results of their performance (i.e. a win or a loss), but statistics certainly could help us make assumptions with a degree of reliability.
Introducing the Model
The model is divided into four main areas:
Tournament details
Match details
Events
Indicators and Performance
The tables outside these areas are dictionaries (sport
, phase
, position
), catalogues (sport_event
, team
, player
) and a single many-to-many relation (plays
).
We’ll describe the uncategorized tables first, and then take a close look at each area.
The Uncategorized Tables
These tables are important because tables from all four areas use them as dictionaries or catalogues.
The sport
table lists all the sports we’ll store in our database. We’ll probably have only one sport here, men’s football, but this table gives us the flexibility to add similar sports (e.g. women’s football) if needed.
In the sport_event
table, we’ll store the events connected with our sport(s). One example would be the “2016 Olympic Games”.
The phase
table is a dictionary that holds all possible tournament stages. It contains values like “group stage”, “round of 16”, “quarter-finals”, “semi-finals”, “final”.
The team
table is, as you’d guess, a simple list of all teams. Possible values are “Croatia”, “Poland”, “USA” etc. If we use the database to store information about club or league competition, we would also have values like “Barcelona”, “Real Madrid”, “Bayern”, “Manchester United” etc.
In the player
table, we’ll store records for all players belonging to the relevant teams.
The plays
table is our sole many-to-many relation, and it relates players and teams. A player can belong to more than one team at the same time ( e.g. the national team and a club), but during a tournament they will obviously play for only one team.
Finally, we have the position
table. This simple dictionary will store a list of all the required positions. In football, these include goalkeeper, center-half, striker, etc.
Tournament Details
Note: If you just want to store the results of single matches, you don’t need to use this section.
A tournament consists of more than one match; both UEFA Euro 2016 and the football events in the 2016 Summer Olympics are tournaments. As we said before, we can store a single match in our database, but we can also relate matches to their relevant tournaments. The tables in the Tournament section are:
tournament
– This contains all the basic tournament data: the sport, start date, end date, etc. We also need to store the tournament name and a description of where it’s taking place. Thesport_event_id
attribute is optional because a tournament doesn’t have to be associated with a larger event (such as the Olympics).group
– This lists all the groups in that tournament. UEFA Euro 2016 had six groups, A to F.participant
– These are the teams playing in the tournament; each participant can be assigned to a group. Most tournaments begin with a group stage and then continue to a knockout stage (e.g. UEFA Euro, UEFA World Cup, Olympic football). Some tournaments will have only a group stage (e.g. national leagues), while others will have only a knockout stage (e.g. national cups).in_team
– This table provides a many-to-many relation that stores information about the players registered for that tournament and their expected positions.tournament_schedule
– In my opinion, this is the most interesting table in this section. The list of all games played during this tournament is stored here. Thetournament_id
attribute denotes which tournament each match belongs to, and thephase_id
attribute defines the phase during which the match will take place. We’ll also store the match location and the time when it begins. Both participants will be described by text fields. When the group stage finishes, we’ll know all the matchups for the elimination round. For example, at the beginning of UEFA Euro 2016, we knew that winner of Group E (1E) will play against the Group D runner-up (2D). After all three rounds in the group phase were played, this pair was Italy vs. Spain.
Match Details
The Match details
area is used to store data for single matches. We’ll use two tables:
match
– This contains all the details about a single match; this match can be related to a tournament, but it could also be a single game. So thetournament_schedule_id
attribute is optional, and we’ll store thesport_id
,start_time
andlocation
attributes again here. If the match is part of a tournament, thentournament_schedule_id
will be assigned a value. Theteam_1_id
andteam_2_id
attributes are references to the teams involved in the match. Thegoals_team_1
andgoals_team_2
attributes contain the result of the match. They are mandatory and should have “0” as the default value for both.in_match
– This table is a list of all the players who are registered for that match; players who don’t participate will have a NULL in thestarted_at
attribute, while players that came in as substitutions will havestarted_at
> 0. If a player was replaced, they’ll have anended_at
attribute that matches thestarted_at
attribute of the player who replaced them. If the player stayed in for the entire match, theirended_at
attribute will have the same value as theend_time
attribute.
Match Events
This section is intended to store all the details or events that happened during the game. And the tables are:
event
– This is a dictionary that lists all the events we want to store. In football, these are values like “foul committed”, “foul suffered”, “yellow card”, “red card”, “free kick”, “penalty”, “goal”, “offside”, “substitution”, “player ejected from match”.match_event
– This relates events with the match. We’ll store theevent_time
as well as player information related to that event (in_match_id
).related_event
– This is what brings event information together. To explain, let’s look at an example when Player A fouls Player B. We’ll insert a record in thematch_event
table that indicates that Player A committed a foul and another one that indicates that Player B suffered a foul. We’ll also add a record to therelated_event
table, where the ‘committed foul’ will be the parent and the ‘suffered foul’ will be the child. We’ll also record the results of the foul: a yellow card, a free kick or a penalty kick, and maybe a goal.
Indicators and Performance
This section should help us analyze players and teams before and after the match.
The indicator
table is a dictionary with a predefined set of indicators for each player before each match. These indicators should describe the player’s current form. This list could contain values like: “number of goals in last 10 matches”, “average distance covered in last 10 matches”, “number of saves for GK in last 10 matches”.
The performance
dictionary is very similar to indicator
, but we’ll use it to store only values that are related to the single match: “distance covered”, “accurate passes”, etc.
The player_indicator
and performance_indicator
tables share an almost identical structure:
in_match_id
– refers to the player taking part in a certain matchindicator_id
/performance_id
– references theindicator
or ”performance dictionariesvalue
– stores the value for that indicator (e.g. a player covered 10.72 km distance)description
– holds an additional description, if neededWhat Happened During the Match?
With all this data entered in, we could easily get match details, events, and statistics for every match in our database.
This simple query would return basic details for an upcoming match:
SELECT team_1.`team_name`, team_2.`team_name`, `match`.`start_time`, `match`.`location` FROM `match`, `team` AS team_1, `team` AS team_2 WHERE `match`.`team_1_id` = team_1.`id` AND `match`.`team_2_id` = team_2.`id`
To get a list of all the in-play events during a certain match, we would use the query below:
SELECT `event`.`event_name`, `match_event`.`event_time`, `player`.`first_name`, `player`.`last_name` FROM `match`, `match_event`, `event`, `in_match`, `player` WHERE `match_event`.`match_id` = `match`.`id` AND `event`.`id` = `match_event`.`event_id` AND `in_match`.`id` = `match_event`.`in_match_id` AND `player`.`id` = `in_match`.`player_id` AND `match`.`id` = @match ORDER BY `match_event`.`event_time` ASC
There are numerous additional queries that I can think of; it’s easy to do an analysis when you have the data. If you’ve measured and stored a large number of indicators and player performance data, you might be able to relate these parameters with a final result. I personally don’t believe in such predictions; there’s the luck factor during matches, plus numerous other factors you can’t know until the game starts. Still, if you have a large dataset and a lot of parameters, your chance of making more accurate predictions increases.
The model presented in this article allows us to store matches, match details, and a history of each player’s performance. We can also set form indicators for each player before the match. Storing enough details should provide us with more parameters on which to base our assumptions. I’m not saying that we could predict the result of the game, but we could have some fun with it.
We could also easily tweak this model to store data for other sports. These changes shouldn’t be too complex. Adding a
sport_id
attribute to the dictionaries should do the trick. Still, I think it would be wise to have a new instance for each different sport.