class: center, middle, inverse, title-slide

# Advanced topics
## Multinomial logistic regression and multilevel models
### July 20th, 2022

---

## Example: NFL Expected Points

What does football __play-by-play__ data look like? Each row is a play with contextual information:

- __Possession team:__ team with the ball, on offense (opposing team is on defense)

- __Down:__ 4 downs to advance the ball 10 (or more) yards

  - new set of downs if successful, else the ball turns over to the defense

- __Yards to go:__ distance in yards needed to advance

- __Yard line:__ distance in yards away from opponent's endzone (100 to 0) - the field position

- __Time remaining:__ seconds remaining in the game, each game is 3600 seconds long

  - 4 quarters, halftime in between, followed by a potential overtime (900 seconds)

---

## Example: NFL Expected Points

__Drive:__ a series of plays, ending with a change in possession or one of the following scoring events:

- __No Score:__ 0 points - turn the ball over or the half/game ends

- __Field Goal:__ 3 points - kick through opponent's goal post

- __Touchdown:__ 7 points - enter opponent's end zone

- __Safety:__ 2 points for opponent - tackled in own endzone

--

__Next Score:__ type of next score (current drive or future drives) with respect to the possession team

- For: Touchdown (7), Field Goal (3), Safety (2)

- Against: -Touchdown (-7), -Field Goal (-3), -Safety (-2)

- No Score

_Note: treating point-after-touchdown attempts (PATs) separately_

---

## Example: NFL Expected Points

__Expected Points:__ measure the value of a play in terms of `\(\mathbb{E}[\text{points of next scoring play}]\)`

- i.e., historically, how many points have teams scored when in similar situations?

--

__Response__: `\(Y \in\)` {Touchdown (7), Field Goal (3), Safety (2), No Score (0), -Safety (-2), -Field Goal (-3), -Touchdown (-7)}

__Explanatory variables__: `\(\mathbf{X} =\)` {down, yards to go, yard line, ...}

--

Want to __estimate the probabilities__ of each scoring event to compute expected points:

- Outcome probabilities: `\(P(Y = y | \mathbf{X})\)`

- Expected Points `\(= E(Y| \mathbf{X}) = \sum_{y \in Y} y \cdot P(Y=y|\mathbf{X})\)`

--

_How do we model more than two categories???_

---

## Review: logistic regression

Response variable `\(Y\)` has two possible values, 1 or 0; we estimate the probability

$$
p(x) = P(Y = 1 | X = x)
$$

Since we are dealing with two classes, the possible observed values for `\(Y\)` are 0 and 1, so

$$
Y \vert x \sim {\rm Binomial}(n=1,p=\mathbb{E}[Y\vert x]) = \text{Bernoulli}(p = \mathbb{E}[Y\vert x])
$$

--

To constrain the regression output to `\([0, 1]\)`: use the __logit__ function, aka the __log-odds__

$$
\text{logit}(p(x)) = \log \left[ \frac{p(x)}{1 - p(x)} \right] = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p
$$

--

meaning

`$$p(x) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}$$`

---

## [Multinomial logistic regression](https://en.wikipedia.org/wiki/Multinomial_logistic_regression)

We can extend this to `\(K\)` classes (via the [softmax function](https://en.wikipedia.org/wiki/Softmax_function)):

`$$P(Y=k^* \mid X=x)=\frac{e^{\beta_{0 k^*}+\beta_{1 k^*} x_{1}+\cdots+\beta_{p k^*} x_{p}}}{\sum_{k=1}^{K} e^{\beta_{0 k}+\beta_{1 k} x_{1}+\cdots+\beta_{p k} x_{p}}}$$`
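--

A quick side note: the softmax mapping from `\(K\)` linear predictors to class probabilities can be computed directly. A minimal sketch (the function name and input values below are made up for illustration):

```r
# Minimal sketch: convert K linear predictors into class probabilities via softmax
softmax <- function(linear_preds) {
  exp_preds <- exp(linear_preds - max(linear_preds)) # subtract max for numerical stability
  exp_preds / sum(exp_preds)
}

# e.g., hypothetical linear predictors for three classes:
softmax(c(1.2, -0.4, 0.3))
```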
--

We only estimate coefficients for `\(K - 1\)` classes __relative to a reference class__

For example, let `\(K\)` be the reference class; then we use `\(K - 1\)` logit transformations

- Use `\(\boldsymbol{\beta}\)` for the vector of coefficients and `\(\mathbf{X}\)` for the matrix of predictors

`$$\begin{array}{c} \log \Big( \frac{P(Y =1 \mid \mathbf{X})}{P(Y=K \mid \mathbf{X})} \Big) = \boldsymbol{\beta}_{1} \cdot \mathbf{X} \\ \log \Big( \frac{P(Y=2 \mid \mathbf{X})}{P(Y=K \mid \mathbf{X})} \Big) = \boldsymbol{\beta}_{2} \cdot \mathbf{X} \\ \vdots \\ \log \Big( \frac{P(Y=K-1 \mid \mathbf{X})}{P(Y=K \mid \mathbf{X})} \Big) = \boldsymbol{\beta}_{K-1} \cdot \mathbf{X} \end{array}$$`

---

## [Multinomial logistic regression](https://en.wikipedia.org/wiki/Multinomial_logistic_regression) for next score

`\(Y \in\)` {Touchdown (7), Field Goal (3), Safety (2), No Score (0), -Safety (-2), -Field Goal (-3), -Touchdown (-7)}

`\(\mathbf{X} =\)` {down, yards to go, yard line, ...}

--

Model is specified with __six logit transformations__ relative to __No Score__:

`$$\begin{array}{c} \log \left(\frac{P(Y=\text { Touchdown } \mid \mathbf{X})}{P(Y=\text { No Score } \mid \mathbf{X})}\right)=\mathbf{X} \cdot \boldsymbol{\beta}_{\text {Touchdown }}, \\ \log \left(\frac{P(Y=\text { Field Goal } \mid \mathbf{X})}{P(Y=\text { No Score } \mid \mathbf{X})}\right)=\mathbf{X} \cdot \boldsymbol{\beta}_{\text {Field Goal }}, \\ \vdots \\ \log \left(\frac{P(Y=-\text { Touchdown } \mid \mathbf{X})}{P(Y=\text { No Score } \mid \mathbf{X})}\right)=\mathbf{X} \cdot \boldsymbol{\beta}_{-\text {Touchdown }} \end{array}$$`

--

- The model generates probabilities, agnostic of the point value associated with each next score type

--

- Fit the multinomial logistic regression model in `R` with the `nnet` package

---

### NFL play-by-play data (2010 to 2020)

Initialized the NFL play-by-play dataset with the next score within each half for every play

- Followed steps in [script by Ben Baldwin](https://github.com/nflverse/nflfastR-data/blob/master/models/model_data.R) (which copies my steps [here](https://github.com/ryurko/nflscrapR-models/blob/master/R/init_models/init_ep_fg_models.R))

```r
library(tidyverse)
nfl_ep_model_data <- read_rds(url("http://www.stat.cmu.edu/cmsac/sure/2021/materials/data/model_pbp_data.rds"))

nfl_ep_model_data <- nfl_ep_model_data %>%
  mutate(Next_Score_Half = fct_relevel(Next_Score_Half, "No_Score"),
         # log transform of yards to go:
         log_ydstogo = log(ydstogo),
         # Changing down into a factor variable:
         down = factor(down))
```
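--

Before fitting, it can help to check how often each next score category occurs - a minimal sketch using the `Next_Score_Half` column created above (output not shown):

```r
# How often does each next score category occur? (sanity check before modeling)
nfl_ep_model_data %>%
  count(Next_Score_Half) %>%
  mutate(prop = n / sum(n))
```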
--

How to fit the model?

```r
library(nnet)
init_ep_model <- multinom(Next_Score_Half ~ half_seconds_remaining + yardline_100 +
                            down + log_ydstogo + log_ydstogo*down + yardline_100*down,
                          data = nfl_ep_model_data, maxit = 300)
```

_What does the `summary()` function return?_

---

### Leave-one-season-out cross-validation

```r
library(nnet)
init_loso_cv_preds <- 
  map_dfr(unique(nfl_ep_model_data$season), 
          function(x) {
            # Separate test and training data:
            test_data <- nfl_ep_model_data %>% filter(season == x)
            train_data <- nfl_ep_model_data %>% filter(season != x)
            
            # Fit multinomial logistic regression model:
            ep_model <- multinom(Next_Score_Half ~ half_seconds_remaining + yardline_100 +
                                   down + log_ydstogo + log_ydstogo*down + yardline_100*down,
                                 data = train_data, maxit = 300)
            
            # Return dataset of class probabilities:
*           predict(ep_model, newdata = test_data, type = "probs") %>%
              as_tibble() %>%
              mutate(Next_Score_Half = test_data$Next_Score_Half,
                     season = x)
          })
```

---

### Calibration results for each scoring event

```r
ep_cv_loso_calibration_results <- init_loso_cv_preds %>%
  pivot_longer(No_Score:Touchdown,
               names_to = "next_score_type",
               values_to = "pred_prob") %>%
  mutate(bin_pred_prob = round(pred_prob / 0.05) * .05) %>%
  group_by(next_score_type, bin_pred_prob) %>%
  summarize(n_plays = n(),
            n_scoring_event = length(which(Next_Score_Half == next_score_type)),
            bin_actual_prob = n_scoring_event / n_plays,
*           bin_se = sqrt((bin_actual_prob * (1 - bin_actual_prob)) / n_plays)) %>%
  ungroup() %>%
* mutate(bin_upper = pmin(bin_actual_prob + 2 * bin_se, 1),
*        bin_lower = pmax(bin_actual_prob - 2 * bin_se, 0))
```

---

### Calibration results for each scoring event

```r
ep_cv_loso_calibration_results %>%
  mutate(next_score_type = fct_relevel(next_score_type, "Opp_Safety", "Opp_Field_Goal",
                                       "Opp_Touchdown", "No_Score", "Safety",
                                       "Field_Goal", "Touchdown"),
         next_score_type = fct_recode(next_score_type,
                                      "-Field Goal (-3)" = "Opp_Field_Goal",
                                      "-Safety (-2)" = "Opp_Safety",
                                      "-Touchdown (-7)" = "Opp_Touchdown",
                                      "Field Goal (3)" = "Field_Goal",
                                      "No Score (0)" = "No_Score",
                                      "Touchdown (7)" = "Touchdown",
                                      "Safety (2)" = "Safety")) %>%
  ggplot(aes(x = bin_pred_prob, y = bin_actual_prob)) +
  geom_abline(slope = 1, intercept = 0, color = "black", linetype = "dashed") +
  geom_smooth(se = FALSE) +
  geom_point(aes(size = n_plays)) +
  geom_errorbar(aes(ymin = bin_lower, ymax = bin_upper)) +
  #coord_equal() +
  scale_x_continuous(limits = c(0, 1)) +
  scale_y_continuous(limits = c(0, 1)) +
  labs(size = "Number of plays",
       x = "Estimated next score probability",
       y = "Observed next score probability") +
  theme_bw() +
  theme(strip.background = element_blank(),
        axis.text.x = element_text(angle = 90),
        legend.position = c(1, .05),
        legend.justification = c(1, 0)) +
  facet_wrap(~ next_score_type, ncol = 4)
```

---

### Calibration results for each scoring event

<img src="21-Multinomial-multilevel_files/figure-html/unnamed-chunk-3-1.png" width="864" style="display: block; margin: auto;" />

---

## How do we evaluate players?

__Expected points added (EPA)__: change in expected points between plays

Goal: divide credit among the players involved in a play, i.e., who deserves what portion of the EPA?
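--

As a reminder of where the expected points themselves come from, a minimal sketch converting the multinomial model's predicted probabilities into EP via `\(\sum_{y} y \cdot P(Y=y|\mathbf{X})\)` (using `init_ep_model` from earlier for illustration; in practice the out-of-sample LOSO predictions are preferred):

```r
# Minimal sketch: EP = sum over scoring events of (point value) x (predicted probability)
score_values <- c(Touchdown = 7, Field_Goal = 3, Safety = 2, No_Score = 0,
                  Opp_Safety = -2, Opp_Field_Goal = -3, Opp_Touchdown = -7)
pred_probs <- predict(init_ep_model, newdata = nfl_ep_model_data, type = "probs")
nfl_ep_model_data <- nfl_ep_model_data %>%
  mutate(expected_points = as.numeric(pred_probs[, names(score_values)] %*% score_values))
```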
--

Load dataset of 2021 passing plays:

```r
nfl_passing_plays <- 
  read_csv("http://www.stat.cmu.edu/cmsac/sure/2022/materials/data/sports/eda_projects/nfl_passing_plays_2021.csv") %>%
  # Only keep rows with passer and receiver information known:
  filter(!is.na(passer_player_id), !is.na(receiver_player_id), !is.na(epa)) %>%
  # Combine passer and receiver unique IDs:
  mutate(passer_name_id = paste0(passer_player_name, ":", passer_player_id),
         receiver_name_id = paste0(receiver_player_name, ":", receiver_player_id))
```

--

Data displays __group structure__ and __different levels of variation within groups__

- e.g., quarterbacks have more passing attempts than receivers have targets

--

Every play is a __repeated measure of performance__

- i.e., the plays (observations) are NOT independent

---

### Mixed-effects / random-effects / multilevel / hierarchical models

Example of a __varying-intercept__ model:

`$$EPA_{i} \sim Normal(Q_{q[i]} + C_{c[i]} + X_i \cdot \beta,\ \sigma_{EPA}^2), \text{ for}\ i\ =\ 1,\dots,n \text{ plays}$$`

--

__Groups are given a model__ - treating the levels of groups as similar to one another with __partial pooling__

`$$Q_q \sim Normal(\mu_{Q},\ \sigma_{Q}^2),\text{ for } q = 1,\dots, \text{ number of QBs}, \\ C_c \sim Normal(\mu_{C},\ \sigma_{C}^2),\text{ for } c = 1,\dots, \text{ number of receivers}$$`

--

Each individual estimate (e.g., `\(Q_q\)`) is pulled toward its group mean (e.g., `\(\mu_Q\)`)

- i.e., QBs and receivers involved in fewer plays will be pulled closer to their overall group averages as compared to those involved in more plays

- serves as a form of __regularization__ of coefficient estimates

- `\(Q_q\)` and `\(C_c\)` are __random effects__, while `\(\beta\)` are __fixed effects__ - but these are confusing terms that [no one agrees on](https://statmodeling.stat.columbia.edu/2005/01/25/why_i_dont_use/)

---

### Fitting multilevel models with [`lme4`](https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf)

Include variables as usual but now introduce a new term for varying intercepts: `(1|GROUP)`

```r
library(lme4)
*passing_lmer <- lmer(epa ~ shotgun + air_yards + (1|passer_name_id) + (1|receiver_name_id), data = nfl_passing_plays)
summary(passing_lmer)
```

```
## Linear mixed model fit by REML ['lmerMod']
## Formula: epa ~ shotgun + air_yards + (1 | passer_name_id) + (1 | receiver_name_id)
##    Data: nfl_passing_plays
## 
## REML criterion at convergence: 66882.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.8846 -0.5427 -0.0490  0.5632  5.3101 
## 
## Random effects:
##  Groups           Name        Variance Std.Dev.
##  receiver_name_id (Intercept) 0.005385 0.07338 
##  passer_name_id   (Intercept) 0.008804 0.09383 
##  Residual                     2.365379 1.53798 
## Number of obs: 18056, groups:  receiver_name_id, 538; passer_name_id, 115
## 
## Fixed effects:
##              Estimate Std. Error t value
## (Intercept)  0.132989   0.032591   4.080
## shotgun     -0.132724   0.030609  -4.336
## air_yards    0.017024   0.001146  14.853
## 
## Correlation of Fixed Effects:
##           (Intr) shotgn
## shotgun   -0.782       
## air_yards -0.288  0.024
```
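--

A rough way to see the partial pooling in action (a sketch; `qb_shrinkage` is a hypothetical name, and the random intercepts are on the EPA scale _after_ adjusting for the fixed effects, so this is only an approximate comparison): compare each QB's raw average EPA to their estimated random intercept

```r
# Rough sketch of shrinkage: QBs with fewer plays are pulled harder toward the group mean
qb_shrinkage <- nfl_passing_plays %>%
  group_by(passer_name_id) %>%
  summarize(n_plays = n(), raw_mean_epa = mean(epa)) %>%
  left_join(ranef(passing_lmer)$passer_name_id %>%
              rownames_to_column("passer_name_id") %>%
              rename(random_intercept = `(Intercept)`),
            by = "passer_name_id")
```

Plotting `raw_mean_epa` against `random_intercept`, sized by `n_plays`, makes the shrinkage visible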
---

### Variance partition coefficients and intraclass correlations

We partition the variance in the response between the groupings in the data

Want to know the proportion of variance attributable to __variation within groups__ compared to __between groups__

--

Can compute the variance partition coefficient (VPC) or intraclass correlation (ICC):

`$$\hat{\rho}_Q = \frac{\text{Between QB variability} }{ \text{Total variability}} = \frac{\hat{\sigma}^2_Q}{\hat{\sigma}^2_Q + \hat{\sigma}^2_C + \hat{\sigma}^2_{EPA}}$$`

- Closer to 0: responses are more independent, the multilevel model structure is not as relevant

- Closer to 1: repeated observations provide no new information, multilevel group structure is important

--

```r
VarCorr(passing_lmer) %>% 
  as_tibble() %>% 
  mutate(icc = vcov / sum(vcov)) %>% 
  dplyr::select(grp, icc)
```

```
## # A tibble: 3 x 2
##   grp                  icc
##   <chr>              <dbl>
## 1 receiver_name_id 0.00226
## 2 passer_name_id   0.00370
## 3 Residual         0.994  
```

- Here both ICCs are near 0: the QB and receiver groupings explain only a small share of the play-to-play variation in EPA

---

### Exploring the player-level effects using [`merTools`](https://cran.r-project.org/web/packages/merTools/vignettes/merToolsIntro.html)

Compare random effects with uncertainty via [parametric bootstrapping](https://cran.r-project.org/web/packages/merTools/vignettes/Using_predictInterval.html)

```r
library(merTools)
player_effects <- REsim(passing_lmer)
plotREsim(player_effects)
```

<img src="21-Multinomial-multilevel_files/figure-html/unnamed-chunk-7-1.png" width="720" style="display: block; margin: auto;" />

---

### Best and worst players? (by effects)

.pull-left[

```r
player_effects %>%
  as_tibble() %>%
  group_by(groupFctr) %>%
  arrange(desc(mean)) %>%
  slice(1:5, (n() - 4):n()) %>%
  ggplot(aes(x = reorder(groupID, mean))) +
  geom_point(aes(y = mean)) +
  geom_errorbar(aes(ymin = mean - 2 * sd,
                    ymax = mean + 2 * sd)) +
  facet_wrap(~groupFctr, ncol = 1,
             scales = "free_y") +
  geom_hline(yintercept = 0, linetype = "dashed",
             color = "red") +
  coord_flip() +
  theme_bw()
```

]

.pull-right[

<img src="21-Multinomial-multilevel_files/figure-html/unnamed-chunk-8-1.png" width="504" />

]
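--

The same parametric bootstrap idea also gives uncertainty for play-level predictions - a minimal sketch with `predictInterval()` from the vignette linked above (the first 10 plays and the settings shown are just for illustration; output not shown):

```r
# Minimal sketch: simulation-based prediction intervals for individual plays
play_pred_intervals <- predictInterval(passing_lmer,
                                       newdata = nfl_passing_plays[1:10, ],
                                       n.sims = 1000, level = 0.95)
head(play_pred_intervals)
```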