Brief background

If you ever checked out FiveThirtyEight’s predictions you probably came across their usage of an Elo rating system. While the wikipedia page is really the best starting point to learn about the background, in short, its a rating system for teams/players (originally designed for chess) based on head-to-head match-ups.

NFL 2019 data

We’re going to walkthrough the Elo rating calculation using data from the current 2019 NFL season. First, we’ll read in the data available on the workshop website that was accessed using nflscrapR:

> nfl_games_19 <- read_csv("http://www.stat.cmu.edu/cmsac/football/data/nfl_games_2019.csv") 
Parsed with column specification:
cols(
  game_id = col_double(),
  home_team = col_character(),
  away_team = col_character(),
  week = col_double(),
  season = col_double(),
  home_score = col_double(),
  away_score = col_double()
)

Let’s take a look at the format of this data:

> head(nfl_games_19)
# A tibble: 6 x 7
     game_id home_team away_team  week season home_score away_score
       <dbl> <chr>     <chr>     <dbl>  <dbl>      <dbl>      <dbl>
1 2019090500 CHI       GB            1   2019          3         10
2 2019090800 CAR       LA            1   2019         27         30
3 2019090806 PHI       WAS           1   2019         32         27
4 2019090805 NYJ       BUF           1   2019         16         17
5 2019090804 MIN       ATL           1   2019         28         12
6 2019090803 MIA       BAL           1   2019         10         59

We’ll easily be able to use this data for generating Elo ratings over the course of the NFL season. The first step we need to take is create a column denoting whether the home team won (1), tied (0.5), or lost (0) with mutate and case_when:

> nfl_games_19 <- nfl_games_19 %>%
+   mutate(game_outcome = case_when(
+     home_score > away_score ~ 1,
+     home_score == away_score ~ 0.5,
+     TRUE ~ 0
+   ))

Elo rating basics

Since we want to update ratings throughout the entire course of the season, we’re going to need to keep track of each team’s rating in a separate table. Plus, we need initial ratings for each team! We could proceed to use the same value for every team (typically 1500) to start. But instead we’re going to use the initial ratings from [FiveThirtyEight that are publicly available] and already saved on the workshop website:

> nfl_elo_ratings <-  read_csv("http://www.stat.cmu.edu/cmsac/football/data/nfl_538_init_elo_ratings.csv") 
Parsed with column specification:
cols(
  team = col_character(),
  elo_rating = col_double(),
  week = col_double()
)
> nfl_elo_ratings
# A tibble: 32 x 3
   team  elo_rating  week
   <chr>      <dbl> <dbl>
 1 CHI        1589.     0
 2 PHI        1582.     0
 3 CAR        1519.     0
 4 NYJ        1385.     0
 5 MIN        1538.     0
 6 CLE        1456.     0
 7 MIA        1415.     0
 8 JAX        1455.     0
 9 LAC        1586.     0
10 SEA        1565.     0
# … with 22 more rows

We have a single rating for each team, along with a column for the week. We’re going to be updating this table incrementally for each match-up in nfl_games_19.

We’re going to use the most basic version of Elo ratings covered in wikipedia. Let the rating for the home team be \(R_{home}\), and the away team rating be \(R_{away}\). Then the expected score for the home team is: \[ E_{home} = \frac{1}{1 + 10^{(R_{away} - R_{home}) / 400}} \] and similarly for the away team it is: \[ E_{away} = \frac{1}{1 + 10^{(R_{home} - R_{away}) / 400}} \] The 400 and 10 basically determine the scaling of the ratings and can be modified. These expected scores represent the probability of winning plus half the probability of drawing - but for our purposes, basically the probability of winning.

We then update the ratings for the home team if they scored \(S_{home}\) points: \[ R^{new}_{home} = R_{home} + K \cdot (S_{home} - E_{home}) \] where \(K\) is the update factor. For now we we’ll set this to 20, but this is the maximum number of points a team gains from winning a single game.

To simplify this process, we’re going to create functions to calculate both the expected score and new rating for a team:

> calc_expected_score <- function(team_rating, opp_team_rating) {
+   return(1 / (1 + 10^((opp_team_rating - team_rating) / 400)))
+ }
> 
> calc_new_rating <- function(team_rating, observed_score, expected_score,
+                             k_factor = 20) {
+   return(team_rating + k_factor * (observed_score - expected_score))
+ }

As an example calculation, in week one the Steelers lost to the Patriots 33-3. The Steelers initial rating was:

> init_pit_elo <- nfl_elo_ratings %>%
+   filter(team == "PIT") %>%
+   pull(elo_rating)
> init_pit_elo
[1] 1572.193

and the Patriots were:

> init_ne_elo <- nfl_elo_ratings %>%
+   filter(team == "NE") %>%
+   pull(elo_rating)
> init_ne_elo
[1] 1640.856

Given these ratings, the Steelers expected score was:

> calc_expected_score(init_pit_elo, init_ne_elo)
[1] 0.4024523

And their updated rating following the loss?

> calc_new_rating(init_pit_elo, 0, 
+                 calc_expected_score(init_pit_elo, init_ne_elo))
[1] 1564.144

Elo ratings for 2019 season

Now with the basics, let’s move on to perform these calculations over the entire season, updating our table to include each team’s Elo rating following every game. Basically, you can imagine a for loop to go through each game in nfl_games_19, looking up each team’s previous ratings and performing the above calculations.

> for (game_i in 1:nrow(nfl_games_19)) {
+   
+   # Which teams are we looking at?
+   home_team <- nfl_games_19$home_team[game_i]
+   away_team <- nfl_games_19$away_team[game_i]
+   # What was the observed score by the home team?
+   home_score <- nfl_games_19$game_outcome[game_i]
+   # Week number?
+   game_week <- nfl_games_19$week[game_i]
+   
+   # What was each team's rating from their latest game?
+   home_rating <- nfl_elo_ratings %>%
+     filter(team == home_team) %>%
+     arrange(desc(week)) %>%
+     slice(1) %>%
+     pull(elo_rating)
+   away_rating <- nfl_elo_ratings %>%
+     filter(team == away_team) %>%
+     arrange(desc(week)) %>%
+     slice(1) %>%
+     pull(elo_rating)
+   
+   # Now get their new ratings:
+   new_home_rating <- calc_new_rating(home_rating, home_score, 
+                                      calc_expected_score(home_rating, away_rating))
+   # Opposite for away team:
+   new_away_rating <- calc_new_rating(away_rating, 1 - home_score, 
+                                      calc_expected_score(away_rating, home_rating))
+   
+   # Finally - join to the nfl_elo_ratings table each team's new ratings for the week:
+   updated_ratings <- tibble(team = c(home_team, away_team),
+                             elo_rating = c(new_home_rating, new_away_rating),
+                             week = rep(game_week, 2))
+   nfl_elo_ratings <- nfl_elo_ratings %>%
+     bind_rows(updated_ratings)
+   
+ }

It worked! What do our final ratings look like?

> nfl_elo_ratings %>% 
+   filter(week == 8) %>%
+   arrange(desc(elo_rating))
# A tibble: 30 x 3
   team  elo_rating  week
   <chr>      <dbl> <dbl>
 1 NE         1677.     8
 2 NO         1649.     8
 3 KC         1597.     8
 4 LA         1594.     8
 5 SEA        1591.     8
 6 IND        1583.     8
 7 MIN        1566.     8
 8 PHI        1562.     8
 9 CHI        1562.     8
10 PIT        1554.     8
# … with 20 more rows

Let’s plot the ratings over the season:

> nfl_elo_ratings %>%
+   ggplot(aes(x = week, y = elo_rating, color = team)) +
+   geom_line() +
+   theme_bw() +
+   labs(x = "Week", y = "Elo rating",
+        title = "NFL Elo ratings in 2019 season")

There are way too many colors displayed here! Instead one could take advantage of the teamcolors package by Ben Baumer and Gregory Matthews to highlight individual teams. This is a little more involved, while we won’t walk through this code in the workshop, here is how one could highlight each division:

> # First read in the team colors data from the website:
> nfl_team_colors <- read_csv("http://www.stat.cmu.edu/cmsac/football/data/nfl_team_colors.csv")
> nfl_team_colors <- nfl_team_colors %>%
+   filter(abbr %in% unique(nfl_elo_ratings$team)) %>%
+   mutate(primary = ifelse(abbr %in% c("OAK", "PIT", "SEA", "TEN",
+                                          "JAX", "NE", "ATL"), 
+                           secondary, primary))
> 
> # Create a dataset that has each team's initial Elo rating
> nfl_team_start <- nfl_elo_ratings %>%
+   filter(week == 0) %>%
+   inner_join(nfl_team_colors, by = c("team" = "abbr")) %>%
+   arrange(desc(elo_rating))
> 
> # Need ggrepel:
> library(ggrepel)
> 
> division_plots <- lapply(sort(unique(nfl_team_start$division)),
+                          function(nfl_division) {
+                            
+                            # Pull out the teams in the division
+                            division_teams <- nfl_team_start %>%
+                              filter(division == nfl_division) %>%
+                              mutate(team = fct_reorder(team, desc(elo_rating)))
+                            
+                            # Get the Elo ratings data just for these teams:
+                            division_data <- nfl_elo_ratings %>%
+                              filter(team %in% division_teams$team) %>%
+                              mutate(team = factor(team,
+                                                   levels = levels(division_teams$team))) %>%
+                              # Make text labels for them:
+                              mutate(team_label = if_else(week == min(week),
+                                                          as.character(team), 
+                                                          NA_character_))
+ 
+                            # Now make the full plot
+                            nfl_elo_ratings %>%
+                              # Plot all of the other teams as gray lines:
+                              filter(!(team %in% division_teams$team)) %>%
+                              ggplot(aes(x = week, y = elo_rating, group = team)) +
+                              geom_line(color = "gray", alpha = 0.5) +
+                              # But display the division teams with their colors:
+                              geom_line(data = division_data,
+                                        aes(x = week, y = elo_rating, group = team,
+                                            color = team)) +
+                              geom_label_repel(data = division_data,
+                                               aes(label = team_label,
+                                                   color = team), nudge_x = 1, na.rm = TRUE,
+                                               direction = "y") +
+                              scale_color_manual(values = division_teams$primary, guide = FALSE) +
+                              scale_x_continuous(limits = c(0, 8),
+                                                 breaks = c(1:8)) +
+                              theme_bw() +
+                              labs(x = "Week", y = "Elo rating",
+                                   title = paste0("Division: ", nfl_division)) 
+                          })
> # Display the grid of plots with cowplot!
> library(cowplot)
> plot_grid(plotlist = division_plots, ncol = 4, align = "hv")

NFL 2019 Elo ratings by division

More resources

Naturally there’s an R package: elo
Complete history of FiveThirtyEight NFL Elo ratings: https://projects.fivethirtyeight.com/complete-history-of-the-nfl/
More math: https://blog.mackie.io/the-elo-algorithm
Soccer example: https://edomt.github.io/Elo-R-WorldCup/
Can incorporate margin of victory (eg https://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/)
More generally see Bradley-Terry models: https://cran.r-project.org/web/packages/BradleyTerryScalable/vignettes/BradleyTerryScalable.html
Also see the Glicko rating system by the GOAT Mark Glickman
Tennis Elo ratings from the Stephanie Kovalchik’s awesome website: http://on-the-t.com/2017/11/18/elo-serve-return/ and http://on-the-t.com/2019/07/19/updating-player-ratings/ (and more on the site!)

Introduction to Elo Ratings

Brief background

NFL 2019 data

Elo rating basics

Elo ratings for 2019 season

More resources