Introduction

Project Motivation: Investigate the impact of pandemic restrictions on hockey player development.

During 2020-2021, many hockey leagues (including NHL feeder leagues) had shortened seasons or no season due to the restrictions on play caused by the COVID-19 pandemic. Some players experiencing restrictions played in other leagues or tournaments during the 2020-21 season. Others did not play any league/tournament games during the 2020-21 season. This poses the question of whether not playing games during the 2020-2021 COVID season negatively impacted player development (or caused players to get worse).

To answer this question, we will examine data from the Ontario Hockey League, which did not play any games during the 2020-21 season. Some players from this league played in other leagues or tournaments while others did not.

Data

Our dataset is comprised of players who played in the Ontario Hockey League (OHL) during both the pre-COVID (2019-2020) and post-COVID (2021-2022) seasons. The dataset also includes information regarding other leagues they have played in during their career. There is information regarding season, team, league, points, games played, position, and drafted status. Each row is a player on a certain team in a specific season.

The data was was sourced from Elite Prospects and was supplied by our external advisor, Dr. Sam Ventura.

Snippet of Raw Data, Player Example:

ohl %>% 
  dplyr::select(name, team_name, season, league, position, 
                gp, g, a, pts, pm) %>%
  filter(name == "Shane Wright") %>%
  arrange(desc(name)) %>% 
  knitr::kable() %>% 
  kable_styling("striped")
name team_name season league position gp g a pts pm
Shane Wright Kingston Frontenacs 2019-2020 OHL F 58 39 27 66 -6
Shane Wright Kingston Frontenacs 2021-2022 OHL F 63 32 62 94 23
Shane Wright Canada Black U17 2019-2020 WHC-17 F 5 4 3 7 NA
Shane Wright Canada U18 2020-2021 WJC-18 F 5 9 5 14 12
Shane Wright Canada U20 2021-2022 WJC-20 Cancelled F 2 0 1 1 3

Wrangling

Variables added :

  • Player Performance : Approximated by points per game (PPG) in the post-COVID (2021-2022) season

    • If a player played for multiple teams during this season, PPG was averaged over both teams
  • GP : Games played (GP) in both pre-COVID season (combined if a player played for multiple teams in a season)

  • Treatment : Whether a player played at least one game during the COVID season

  • Age : The age of the player on January 1st, 2020

  • Player Quality : Approximated by player PPG in the pre-COVID season

  • Drafted : Whether a player was drafted in 2020 or earlier

  • Relative PM : Relative plus-minus (PM) is defined as a players PM relative to the average PM of their team (\(\text{Relative PM} = PM_{player} - \mu_{PM team}\))

  • Ranked PM : How a player’s PM ranks among those of their teammates

Snippet of Filtered Data:

ohl_filtered %>% 
  dplyr::select(first_name, last_name, position, drafted, ppg_19_20,
                gp_21_22, age_continuous, treatment, ppg_21_22) %>%
  filter(first_name %in% c("Shane","Brandt", "Logan", "Wyatt") & last_name %in% c("Wright", "Clarke", "Mailloux", "Johnston")) %>%
  arrange(desc(last_name)) %>%
  knitr::kable() %>% 
  kable_styling("striped")
first_name last_name position drafted ppg_19_20 gp_21_22 age_continuous treatment ppg_21_22
Shane Wright F FALSE 1.1379310 63 15.98904 Played 1.492063
Logan Mailloux D FALSE 0.0000000 12 16.71311 Played 0.750000
Wyatt Johnston F FALSE 0.5660377 68 16.63388 Played 1.823529
Brandt Clarke D FALSE 0.6666667 55 16.89315 Played 1.072727

Exploratory Data Analysis

Player Performance

# distribution of response
ggplot(ohl_filtered, aes(x = ppg_21_22)) +
  geom_density(color = "royalblue3") +
  labs(title = "Distribution of player performance in post-COVID season",
       x = "Player Performance (PPG Post-COVID Season)") +
  theme_bw()
Distribution of player performance is right-skewed

Distribution of player performance is right-skewed

Treatment vs Performance

# treatment vs ppg
ggplot(ohl_filtered, aes(x = ppg_21_22, color = treatment)) +
  geom_density() +
  labs(title = "Players who played during COVID season generally had higher PPG in\npost-COVID season",
       x = "Player Performance (PPG Post-COVID Season)",
       color = "COVID season") +
  scale_color_manual(values = c("royalblue3", "darkgoldenrod3")) +
  theme_bw()

Possible Confounding Variables for Treatment and Response

Age

# age vs ppg
ggplot(ohl_filtered, aes(x = age_continuous, y = ppg_21_22)) +
  geom_point(alpha = .5) +
  labs(title = "Weak, slightly positive relationship between player performance and age",
       x = "Age in Pre-COVID Season",
       y = "Player Performance (PPG Post-COVID Season)") +
  theme_bw() +
  geom_smooth(method = "lm") 

# age vs treatment
ggplot(ohl_filtered, aes(x = age_continuous, color = treatment)) +
  geom_density() +
  labs(title = "More 17- and 18-year-olds played during COVID season",
       x = "Age in Pre-COVID Season",
       color = "COVID Season") +
  scale_color_manual(values = c("royalblue3", "darkgoldenrod3")) +
  theme_bw() 

GP

# gp vs ppg
ggplot(ohl_filtered, aes(x = gp_21_22, y = ppg_21_22)) +
  geom_point(alpha = .5) +
  labs(title = "Positive linear relationship between player performance and GP in \npost-COVID season",
       x = "GP in Post-COVID Season", 
       y = "Player Performance (PPG Post-COVID Season)") +
  theme_bw() +
  geom_smooth(method = "lm")

ggplot(ohl_filtered, aes(x = gp_21_22, color = treatment)) +
  geom_density() +
  labs(title = "Skaters who played during COVID season played more games\nin post-COVID season",
       x = "GP in Post-COVID Season",
       color = "COVID Season") +
  scale_color_manual(values = c("royalblue3", "darkgoldenrod3")) +
  theme_bw() 

Drafted

# drafted vs ppg
ggplot(ohl_filtered) +
  geom_density(aes(x = ppg_21_22, color = drafted)) +
  labs(title = "Post-COVID season player performance generally greater for drafted players",
       x = "Player Performance (PPG in Post-COVID Season)",
       color = "Drafted") +
  scale_color_manual(values = c("royalblue3", "darkgoldenrod3")) +
  theme_bw()

# drafted status vs treatment
mosaicplot(table("Drafted" = ohl_filtered$drafted,
                 "COVID Season" = ohl_filtered$treatment),
           main = "Drafted players more likely to have played during COVID season",
           shade = TRUE)

Player Quality

# ppg 2019-20 vs ppg 21-22
ggplot(ohl_filtered, aes(x = ppg_19_20, y = ppg_21_22)) +
  geom_point(alpha = .5) +
  labs(title = "Positive linear relationship between post-COVID season player performance \nand player quality",
       x = "Player Quality (PPG in pre-COVID Season)",
       y = "Player Performance (PPG Post-COVID Season)") +
  geom_smooth(method = "lm") +
  theme_bw()

ggplot(ohl_filtered, aes(x = ppg_19_20, color = treatment)) +
  geom_density() +
  labs(title = "Skaters who played during COVID season were higher quality players",
       x = "Player Quality (PPG in Pre-COVID Season)",
       color = "COVID Season") +
  scale_color_manual(values = c("royalblue3", "darkgoldenrod3")) +
  theme_bw()

Position

# position vs ppg
ggplot(ohl_filtered) +
  geom_density(aes(x = ppg_21_22, color = position)) +
  labs(title = "Post-COVID season player performance generally greater for forwards",
       x = "Player Performance (PPG in Post-COVID Season)",
       color = "Position") +
  scale_color_manual(values = c("royalblue3", "darkgoldenrod3")) +
  theme_bw()

# position vs treatment
mosaicplot(table("Position" = ohl_filtered$position,
                 "COVID Season" = ohl_filtered$treatment),
           main = "Player position does not significantly influence probability\nof playing during COVID Season",
           shade = TRUE)

Methods

Regression

Before performing causal analysis, we wanted to determine whether playing during the COVID season had a significant effect on PPG when controlling for variables (and interactions between variables) suspected to be associated with the response through EDA. When possible, we attempted a few different measures of player quality/player performance, including PPG z-scores (to control for differences in overall scoring across seasons) and relative plus-minus (PM) scores (computes player’s PM relative to their team).

Ordinary Least Squares (OLS) : We fit OLS without interaction, because this model is the simplest and most interpretable model to assess the significance of playing during COVID while still controlling for variables that could be confounding our analysis.

Interaction OLS : We then fit OLS with interaction to control for the relationships we observed between explanatory variables in our EDA process. We also tested whether the additional complexity of this full model significantly increased our predictive power over the nested model.

Gamma : PPG is right skewed and bounded between 0 and some positive number, so we believed PPG may be Gamma-distributed. We performed Gamma regression with the log link function to see if this would more accurately model the relationship between our response and explanatory variables.

Mixed-effects Model : Lastly, we fit a mixed-effects model to determine if the effect of playing during the COVID season was significantly different across leagues. We let the slope and intercept of the regression line vary according to the league, with players who didn’t play during the COVID season all grouped into a league called “NONE”.

Causal Analysis

We attempted to estimate the causal impact of the treatment.

Matching: Matching on propensity scores allowed us to better control for confounding variables and estimate the true impact of the treatment. The results fit on matched data were compared to the results of the model fit on unmatched data. Due to the small sample size, we used a 1-to-1 ratio when matching.

BART: An appeal of BART is that it controls overfitting, which is one issue with normal additive regression trees. Each tree tries to account for something new in the model. Once all added, accurate estimates are produced. This allows for causal inferences to be drawn. All types of variable statistics were used in this model to allow for the most accurate results. In the end, the posterior draws data totaled 43,800 observations.

Results

The most appropriate regression model fit was OLS regression. The model explained approximately 59% of the variance in the data without sacrificing interpretability. Interaction OLS regression only marginally increased the variance explained and was not significantly different from OLS regression. Conditions for inference for both of these models were mostly met. Though Gamma regression fit the data well, the model conditions for inference were not met (the residuals were not gamma-distributed). Lastly, it was found that the effect of playing during COVID did not significantly differ across leagues in our mixed-effects model, so the extra complexity of the model was deemed unnecessary.

The results of the OLS regression model are as follows:

\(\widehat{\text{Player Performance}} = 0.901 + 0.0655*\text{Treatment} + 0.188*\text{Forward} + 0.862*\text{Player Quality} + 0.00559*\text{Games Played Post-COVID}\\- 0.0556*\text{Age}\)

ohl_filtered %>%
  mutate(pred_vals = predict(ohl_mlr_final)) %>%
  ggplot(aes(x = pred_vals,
             y = ppg_21_22)) +
  geom_point(alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed",
              color = "red",
              size = 2) +
  theme_bw() +
  labs(title = "Model fit: Actual PPG in post-COVID season vs predicted PPG in\npost-COVID season",
       x = "Predicted PPG in post-COVID season",
       y = "Actual PPG in Post-COVID Season")

  • Coefficients for position, games played in post-COVID season, and age were all found to be statistically significant from 0 at the \(\alpha = 0.05\) level.

  • The model explains approximately 59% of the variation in player performance in the post-COVID season, which is a statistically significant amount at the \(\alpha = .001\) level \(F(5, 213) = 62.78, p<.001\).

Propensity Score Matching

The first match we tried was using the nearest neighbor method. This approach does not optimize criterion, so we switched to optimal pair matching. The absolute within-pair difference of each covariate was smaller in the optimal match, hence better balanced.

opt_propensity_match <- 
  matchit(treatment ~ gp_19_20 + position + pts_19_20 + ppg_19_20 + 
            age_continuous + pm_rank_19_20, 
          data = ohl_filtered2, method = "optimal",
          distance = "gam",
          replace = FALSE, # do not reuse controls
          ratio = 1)
plot(opt_propensity_match, type = "jitter", interactive = FALSE)

The dots are aligned fairly well on this plot. They are similar in range and have minimal gaping.

plot(summary(opt_propensity_match))

The Love plot indicates that position, age, and plus-minus are well-balanced. Ideally, the all of the dots would be around the two lines. The points for distance, games played, points, and points per game are further left for the matched data than all of the data. The new model of matched data allows us to make more accurate inferences.

The following model was fit using only the matched data:

ppg_21_22 ~ position + ppg_19_20 + treatment + gp_21_22 + age_continuous + pts_19_20

opt_matched <- match.data(opt_propensity_match)

opt_match_lm <- lm(ppg_21_22 ~ position + ppg_19_20 + treatment + 
                      gp_21_22 + age_continuous,
                    data = opt_matched)
opt_matched %>%
  mutate(pred_vals_matched = predict(opt_match_lm)) %>%
  ggplot(aes(x = pred_vals_matched,
             y = ppg_21_22)) +
  geom_point(alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed",
              color = "red",
              size = 2) +
  theme_bw() +
  labs(title = "Matched model fit: Actual PPG in post-COVID season vs predicted PPG in\npost-COVID season",
       x = "Predicted PPG in post-COVID season",
       y = "Actual PPG in Post-COVID Season")

#summary(opt_match_lm)

\(\widehat{\text{Player Performance}} = 1.145 + 0.054*\text{Treatment} + 0.212*\text{Forward} + 0.852*\text{Player Quality} + 0.0037*\text{Games Played Post-COVID}\\- 0.0644*\text{Age}\)

  • Coefficients for position, games played in post-COVID season, and player quality were all found to be statistically significant from 0 at the \(\alpha = 0.05\) level.

  • The model explains approximately 58% of the variation in player performance in the post-COVID season, which is a statistically significant amount at the \(\alpha = .001\) level \(F(5, 128) = 37.53, p<.001\).

The linear model appears like a good fit for this data. It meets the majority of requirements for linearity. There are no large curves and the errors are evenly spread.

Distribution Plots of Individual Matched Variables

Distance
bal.plot(opt_propensity_match, var.name = "distance",
         colors = c("goldenrod1", "dodgerblue"))

Games Played
bal.plot(opt_propensity_match, var.name = "gp_19_20",
         colors = c("goldenrod1", "dodgerblue"))

Position
bal.plot(opt_propensity_match, var.name = "position",
         colors = c("goldenrod1", "dodgerblue"))

Points
bal.plot(opt_propensity_match, var.name = "pts_19_20",
         colors = c("goldenrod1", "dodgerblue"))

Points Per Game
bal.plot(opt_propensity_match, var.name = "ppg_19_20",
         colors = c("goldenrod1", "dodgerblue"))

Age
bal.plot(opt_propensity_match, var.name = "age_continuous",
         colors = c("goldenrod1", "dodgerblue"))

Plus-Minus
bal.plot(opt_propensity_match, var.name = "pm_rank_19_20",
         colors = c("goldenrod1", "dodgerblue"))

Fitting Bayesian Additive Regression Trees to Observe Treatment Effects

# fit variable selection model
# var_select_bart

# variable selection
covar_ranking <- covariate_importance(var_select_bart)
var_select <- covar_ranking %>% 
  filter(avg_inclusion >= quantile(avg_inclusion, 0.5)) %>% 
  pull(variable)

# fit a propensity score model
# prop_bart

# store propensity score in data
ohl_update$prop_score <- prop_bart$prob.train.mean

# fit the treatment effect model
# te_model

Treatment Effects

All Draws
# histogram of treatment effect (all draws)
posterior_treat_eff %>% 
  ggplot() +
  geom_histogram(aes(x = cte), bins = 50, color = "white") +
  geom_vline(xintercept = 0, color = "red", size = 1) +
  theme_bw() + ggtitle("Histogram of treatment effect (all draws)")

Mean For Each Subject
# histogram of treatment effect (mean for each subject)
posterior_treat_eff %>% summarise(cte_hat = mean(cte)) %>%
  ggplot() +
  geom_histogram(aes(x = cte_hat), bins = 60, colour = "white") + 
  geom_vline(xintercept = 0, color = "red", size = 1) +
  theme_bw() + 
  ggtitle("Histogram of treatment effect (mean for each subject)")

CIs of the CATEs
# posterior CIs of the CATEs
posterior_treat_eff %>% dplyr::select(-treatment) %>% point_interval() %>%
  arrange(cte) %>% mutate(.orow = 1:n()) %>% 
  ggplot() + 
  geom_interval(aes(x = .orow, y= cte, ymin = .lower, ymax = .upper)) +
  geom_point(aes(x = .orow, y = cte), shape = "circle open", alpha = 0.5) + 
  ylab("Median posterior CATE for each subject (95% CI)") +
  theme_bw() + coord_flip() + scale_colour_brewer() +
  theme(axis.title.y = element_blank(), 
        axis.text.y = element_blank(), 
        axis.ticks.y = element_blank(),
        legend.position = "none")

Propensity Scores

After selecting the linear regression model, we matched the data and refit the model to see if this affected the treatment impact. At alpha = 0.01, the position, age, and points variables were significant. There is no evidence that treatment is significant in the matched model.

BART

Both of the histograms showing cte are centered around 0. This indicates a lack of evidence of the treatment effect. In the graph for individual means there is a small group of players on the left, but the majority of observations are focused around 0. Because of the nature of the BART model, we were not able to isolate and individually investigate the outliers. The confidence intervals support the same conclusion. With such wide intervals, there is not sufficient support of a treatment effect.

Discussion

OLS Regression Model Interpretation

Player performance in the post-COVID season is expected to increase by .0655 PPG on average for skaters who played during COVID, when holding constant position, player quality, games played in the post-COVID season, points scored in the pre-COVID season, and continuous age.

Over the course of a 68 game season in the OHL, the treatment effect would result in only approximately 4 extra points!

In this model, the coefficient for whether someone played during the COVID-season was not statistically significantly different from zero. There is no evidence that playing during the COVID season impacted player performance in the post-COVID season for players who played in both the pre- and post-COVID seasons in the OHL.

We are 95% confident that the change in player performance in the post-COVID for players who played during the COVID season is between -0.0185 and 0.149 PPG, holding constant position, player quality, games played in the post-COVID season, points scored in the pre-COVID season, and continuous age.

This model can be used to predict player performance in the post-COVID season based on a variety of factors in the pre-, COVID, and post-COVID seasons among skaters who played in the OHL in both the pre- and post-COVID seasons.

Matched OLS Model Interpretation

Player performance in the post-COVID season is expected to increase by .054 PPG on average for skaters who played during COVID, when holding constant position, player quality, games played in the post-COVID season, points scored in the pre-COVID season, and continuous age.

Again, over the course of a 68 game season in the OHL, the treatment effect would result in only approximately 4 extra points!

In this model, the coefficient for whether someone played during the COVID-season was not statistically significantly different from zero. There is no evidence that playing during the COVID season impacted player performance in the post-COVID season for players who played in both the pre- and post-COVID seasons in the OHL.

We are 95% confident that the change in player performance in the post-COVID for players who played during the COVID season is between -0.0488 and 0.157 PPG, holding constant position, player quality, games played in the post-COVID season, points scored in the pre-COVID season, and continuous age.

BART

Both of the histograms showing cte are centered around 0. This indicates a lack of evidence of the treatment effect. In the graph for individual means there is a small group of players on the left, but the majority of observations are focused around 0. Because of the nature of the BART model, we were not able to isolate and individually investigate the outliers. The confidence intervals support the same conclusion. With such wide intervals, there is not sufficient support of a treatment effect.

Limitations

OLS Regression Model Limitations

The model conditions for inference were not all met.

  • Linearity: The relationship between our explanatory variables and post-COVID season PPG looks roughly linear. This condition is met.

  • Independence: All of the skaters are playing together in the same league and influencing each other’s player performance/PPG. Because we were using player level data, and not play level data, we could not attempt to account for this non-independence. Therefore this condition is not met.

  • Normality of Residuals: The residuals seem roughly normally distributed in the normal QQ plot. This condition is met.

  • Homogeneity of errors: There seems to be roughly constant variance in residuals across all levels of our explanatory variables. This condition is met.

plot(ohl_mlr_final, which = c(1, 2))

Player Performance

We defined player performance as PPG in the post-COVID season, but this may not adequately capture all aspects of a player’s performance.Though we control for position in our model, PPG will generally be higher for forwards than defensemen. defensive performance is not well-approximated by PPG, and this measure would bias offensive defensemen as the best-performing defensemen, even though defensive defensemen may be performing well by blocking a lot of shots, making successful breakout plays,and creating lots of turnovers. Good defensive forwards may be undervalued using this metric as well.

Player quality

We used PPG in the pre-COVID season as a proxy for player quality, even though this again does not account for all of the statistics that might measure the quality of a player. This has many of the same issues as player performance above.

Treatment Variable

Our treatment variable was defined as players who participated in at least one game during the COVID season in any league or tournament. Though we attempted to control for the quality of the league in our mixed effect model, we could have tried different groupings of leagues or implemented some league quality metric that would have better controlled for its effect. Additionally, the sample size of players in some leagues was small, so we could not get very reliable estimates of the treatment effect of that league.Future analysis could provide a more distinct view of the treatment variable in which both league quality and number of games played are thoroughly controlled for.

Future work

  • Future work could involve redefining/tweaking treatment and response variables (possibly with more detailed data) and refitting models to better estimate the effect of playing during the COVID season. A more nuanced measure that incorporates important defensive statistics like turnovers created, passes completed, shots blocked, etc. may better approximate player performance. In the future, we could study the effect of playing or not playing during the COVID season in other populations (NHL taxi squad, goalies, other minor leagues), in addition to studying the effect of taking time off due to injury/other circumstances on player performance long-term.

  • Additionally, adding information about what line somebody plays on could control for confounding in our model. Likely only the best players/first-liners were able to go to play in other leagues during the COVID season. If we could control for line status affecting who could go play during COVID, we could get a more accurate estimate of the treatment effect.

  • Another factor that could influence whether skaters played during the COVID season is drafted status prior to the COVID season. If a player was drafted prior to the OHL shutdown, their NHL team would have had more resources and power to convince a league to take that player on for the year. In this dataset, though we tried to control for this drafted effect, the number of players drafted was so small and the overlap between players who were drafted and players who were treated was so significant that we decided not to include draft status in our models.

  • In our dataset, we only studied players who played in the OHL both before and after the COVID season, but there could be a significant number of players who played during the COVID season (considered treated) and were so good that they moved on to better leagues like the AHL/NHL post-COVID season. This could be confounding our analysis, because if this were the case, then including these players would likely result in the treatment effect being significant.

  • We also did not utilize weighting in our model, which could have improved our estimate of the treatment effect. If we had weighted the model so players who played more games contributed more to the overall outcome of the model and significance of the treatment, we could better estimate the treatment effect. Player performance for players who only played a small number of games may not be as accurate as player performance for players with more games.

  • Another factor to investigate is the impact of our treatment on long-term development. Even if not playing during the COVID season negatively impacted the development of any population of players, would this impact last forever? Or is it the case that these players would catch up to their counterparts who played during the COVID season in the long-term? As time passes and more data is collected post-COVID, future work could focus on assessing the long-term impact of playing/not playing during the COVID season or not playing during other seasons.

  • We could also investigate the conditional average treatment effect (CATE) of playing during COVID more thoroughly. The time constraints and computational limits of our machines did not allow for a thorough analysis of CATE.

  • With additional play level data we could attempt to estimate the effect of player interaction and control for it in our model so the independence condition for inference is met.

Acknowledgments

Thank you to our external advisor Dr. Sam Ventura (and assistant Dominic Ventura) and our professor and advisor, Dr. Ron Yurko. We would also like to thank our TAs, Nick Kissel, Meg Ellingwood, Wanshan Li, Kenta Takatsu, and YJ Choe. Lastly, thank you to Professor Ben Baumer, Smith College and Dr. Ryne Vankrevelen, Elon University. We could not have completed this project without you all and we sincerely appreciate your help!