If you ever checked out FiveThirtyEight’s predictions you probably came across their usage of an Elo rating system. While the wikipedia page is really the best starting point to learn about the background, in short, its a rating system for teams/players (originally designed for chess) based on head-to-head match-ups.
We’re going to walkthrough the Elo rating calculation using data from the current 2019 NFL season. First, we’ll read in the data available on the workshop website that was accessed using nflscrapR
:
> nfl_games_19 <- read_csv("http://www.stat.cmu.edu/cmsac/football/data/nfl_games_2019.csv")
Parsed with column specification:
cols(
game_id = col_double(),
home_team = col_character(),
away_team = col_character(),
week = col_double(),
season = col_double(),
home_score = col_double(),
away_score = col_double()
)
Let’s take a look at the format of this data:
> head(nfl_games_19)
# A tibble: 6 x 7
game_id home_team away_team week season home_score away_score
<dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 2019090500 CHI GB 1 2019 3 10
2 2019090800 CAR LA 1 2019 27 30
3 2019090806 PHI WAS 1 2019 32 27
4 2019090805 NYJ BUF 1 2019 16 17
5 2019090804 MIN ATL 1 2019 28 12
6 2019090803 MIA BAL 1 2019 10 59
We’ll easily be able to use this data for generating Elo ratings over the course of the NFL season. The first step we need to take is create a column denoting whether the home team won (1), tied (0.5), or lost (0) with mutate
and case_when
:
Since we want to update ratings throughout the entire course of the season, we’re going to need to keep track of each team’s rating in a separate table. Plus, we need initial ratings for each team! We could proceed to use the same value for every team (typically 1500) to start. But instead we’re going to use the initial ratings from [FiveThirtyEight that are publicly available] and already saved on the workshop website:
> nfl_elo_ratings <- read_csv("http://www.stat.cmu.edu/cmsac/football/data/nfl_538_init_elo_ratings.csv")
Parsed with column specification:
cols(
team = col_character(),
elo_rating = col_double(),
week = col_double()
)
> nfl_elo_ratings
# A tibble: 32 x 3
team elo_rating week
<chr> <dbl> <dbl>
1 CHI 1589. 0
2 PHI 1582. 0
3 CAR 1519. 0
4 NYJ 1385. 0
5 MIN 1538. 0
6 CLE 1456. 0
7 MIA 1415. 0
8 JAX 1455. 0
9 LAC 1586. 0
10 SEA 1565. 0
# … with 22 more rows
We have a single rating for each team, along with a column for the week. We’re going to be updating this table incrementally for each match-up in nfl_games_19
.
We’re going to use the most basic version of Elo ratings covered in wikipedia. Let the rating for the home team be \(R_{home}\), and the away team rating be \(R_{away}\). Then the expected score for the home team is: \[ E_{home} = \frac{1}{1 + 10^{(R_{away} - R_{home}) / 400}} \] and similarly for the away team it is: \[ E_{away} = \frac{1}{1 + 10^{(R_{home} - R_{away}) / 400}} \] The 400 and 10 basically determine the scaling of the ratings and can be modified. These expected scores represent the probability of winning plus half the probability of drawing - but for our purposes, basically the probability of winning.
We then update the ratings for the home team if they scored \(S_{home}\) points: \[ R^{new}_{home} = R_{home} + K \cdot (S_{home} - E_{home}) \] where \(K\) is the update factor. For now we we’ll set this to 20, but this is the maximum number of points a team gains from winning a single game.
To simplify this process, we’re going to create functions to calculate both the expected score and new rating for a team:
> calc_expected_score <- function(team_rating, opp_team_rating) {
+ return(1 / (1 + 10^((opp_team_rating - team_rating) / 400)))
+ }
>
> calc_new_rating <- function(team_rating, observed_score, expected_score,
+ k_factor = 20) {
+ return(team_rating + k_factor * (observed_score - expected_score))
+ }
As an example calculation, in week one the Steelers lost to the Patriots 33-3. The Steelers initial rating was:
> init_pit_elo <- nfl_elo_ratings %>%
+ filter(team == "PIT") %>%
+ pull(elo_rating)
> init_pit_elo
[1] 1572.193
and the Patriots were:
> init_ne_elo <- nfl_elo_ratings %>%
+ filter(team == "NE") %>%
+ pull(elo_rating)
> init_ne_elo
[1] 1640.856
Given these ratings, the Steelers expected score was:
And their updated rating following the loss?
Now with the basics, let’s move on to perform these calculations over the entire season, updating our table to include each team’s Elo rating following every game. Basically, you can imagine a for
loop to go through each game in nfl_games_19
, looking up each team’s previous ratings and performing the above calculations.
> for (game_i in 1:nrow(nfl_games_19)) {
+
+ # Which teams are we looking at?
+ home_team <- nfl_games_19$home_team[game_i]
+ away_team <- nfl_games_19$away_team[game_i]
+ # What was the observed score by the home team?
+ home_score <- nfl_games_19$game_outcome[game_i]
+ # Week number?
+ game_week <- nfl_games_19$week[game_i]
+
+ # What was each team's rating from their latest game?
+ home_rating <- nfl_elo_ratings %>%
+ filter(team == home_team) %>%
+ arrange(desc(week)) %>%
+ slice(1) %>%
+ pull(elo_rating)
+ away_rating <- nfl_elo_ratings %>%
+ filter(team == away_team) %>%
+ arrange(desc(week)) %>%
+ slice(1) %>%
+ pull(elo_rating)
+
+ # Now get their new ratings:
+ new_home_rating <- calc_new_rating(home_rating, home_score,
+ calc_expected_score(home_rating, away_rating))
+ # Opposite for away team:
+ new_away_rating <- calc_new_rating(away_rating, 1 - home_score,
+ calc_expected_score(away_rating, home_rating))
+
+ # Finally - join to the nfl_elo_ratings table each team's new ratings for the week:
+ updated_ratings <- tibble(team = c(home_team, away_team),
+ elo_rating = c(new_home_rating, new_away_rating),
+ week = rep(game_week, 2))
+ nfl_elo_ratings <- nfl_elo_ratings %>%
+ bind_rows(updated_ratings)
+
+ }
It worked! What do our final ratings look like?
> nfl_elo_ratings %>%
+ filter(week == 8) %>%
+ arrange(desc(elo_rating))
# A tibble: 30 x 3
team elo_rating week
<chr> <dbl> <dbl>
1 NE 1677. 8
2 NO 1649. 8
3 KC 1597. 8
4 LA 1594. 8
5 SEA 1591. 8
6 IND 1583. 8
7 MIN 1566. 8
8 PHI 1562. 8
9 CHI 1562. 8
10 PIT 1554. 8
# … with 20 more rows
Let’s plot the ratings over the season:
> nfl_elo_ratings %>%
+ ggplot(aes(x = week, y = elo_rating, color = team)) +
+ geom_line() +
+ theme_bw() +
+ labs(x = "Week", y = "Elo rating",
+ title = "NFL Elo ratings in 2019 season")
There are way too many colors displayed here! Instead one could take advantage of the teamcolors
package by Ben Baumer and Gregory Matthews to highlight individual teams. This is a little more involved, while we won’t walk through this code in the workshop, here is how one could highlight each division:
> # First read in the team colors data from the website:
> nfl_team_colors <- read_csv("http://www.stat.cmu.edu/cmsac/football/data/nfl_team_colors.csv")
> nfl_team_colors <- nfl_team_colors %>%
+ filter(abbr %in% unique(nfl_elo_ratings$team)) %>%
+ mutate(primary = ifelse(abbr %in% c("OAK", "PIT", "SEA", "TEN",
+ "JAX", "NE", "ATL"),
+ secondary, primary))
>
> # Create a dataset that has each team's initial Elo rating
> nfl_team_start <- nfl_elo_ratings %>%
+ filter(week == 0) %>%
+ inner_join(nfl_team_colors, by = c("team" = "abbr")) %>%
+ arrange(desc(elo_rating))
>
> # Need ggrepel:
> library(ggrepel)
>
> division_plots <- lapply(sort(unique(nfl_team_start$division)),
+ function(nfl_division) {
+
+ # Pull out the teams in the division
+ division_teams <- nfl_team_start %>%
+ filter(division == nfl_division) %>%
+ mutate(team = fct_reorder(team, desc(elo_rating)))
+
+ # Get the Elo ratings data just for these teams:
+ division_data <- nfl_elo_ratings %>%
+ filter(team %in% division_teams$team) %>%
+ mutate(team = factor(team,
+ levels = levels(division_teams$team))) %>%
+ # Make text labels for them:
+ mutate(team_label = if_else(week == min(week),
+ as.character(team),
+ NA_character_))
+
+ # Now make the full plot
+ nfl_elo_ratings %>%
+ # Plot all of the other teams as gray lines:
+ filter(!(team %in% division_teams$team)) %>%
+ ggplot(aes(x = week, y = elo_rating, group = team)) +
+ geom_line(color = "gray", alpha = 0.5) +
+ # But display the division teams with their colors:
+ geom_line(data = division_data,
+ aes(x = week, y = elo_rating, group = team,
+ color = team)) +
+ geom_label_repel(data = division_data,
+ aes(label = team_label,
+ color = team), nudge_x = 1, na.rm = TRUE,
+ direction = "y") +
+ scale_color_manual(values = division_teams$primary, guide = FALSE) +
+ scale_x_continuous(limits = c(0, 8),
+ breaks = c(1:8)) +
+ theme_bw() +
+ labs(x = "Week", y = "Elo rating",
+ title = paste0("Division: ", nfl_division))
+ })
> # Display the grid of plots with cowplot!
> library(cowplot)
> plot_grid(plotlist = division_plots, ncol = 4, align = "hv")
Naturally there’s an R
package: elo
Complete history of FiveThirtyEight NFL Elo ratings: https://projects.fivethirtyeight.com/complete-history-of-the-nfl/
More math: https://blog.mackie.io/the-elo-algorithm
Soccer example: https://edomt.github.io/Elo-R-WorldCup/
Can incorporate margin of victory (eg https://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/)
More generally see Bradley-Terry models: https://cran.r-project.org/web/packages/BradleyTerryScalable/vignettes/BradleyTerryScalable.html
Also see the Glicko rating system by the GOAT Mark Glickman
Tennis Elo ratings from the Stephanie Kovalchik’s awesome website: http://on-the-t.com/2017/11/18/elo-serve-return/ and http://on-the-t.com/2019/07/19/updating-player-ratings/ (and more on the site!)