Deep Dive into Expected Goals Models

Sara Armstrong

Devin Basley

Friday, July 30

Motivation and Problem

  • Goals are arguably the most important statistic in hockey, because the team with more goals at the end of the game wins

  • Goals are rare in hockey, so the data are limited and any metric based on goals alone is highly variable

  • Shots are more plentiful, but counting them assumes all shots are equally valuable

  • An expected goals model estimates the probability of a goal based on location and other factors

  • Analysts use these models to predict overall team and player performance

  • One issue with public expected goals models is that they don't account for how shot location might change over time

Purpose

Search for best practices in expected goals modeling by building our own models

Data

  • NHL shot data from MoneyPuck, covering the 2010-11 through 2020-21 seasons

  • Each observation is a shot

    • Shot on goal, missed the net, or goal
  • Only shots during 5-on-5 play are used

  • Player data were used to calculate each player's age as of Sept. 15 of that season

| Season | Shooter | Goalie | Shooter Age | Goalie Age | Goal | Distance | Angle | Home Skaters | Away Skaters | Rebound | Rush |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019 | Brady Tkachuk | Frederik Andersen | 19.96992 | 29.91114 | 1 | 4.123106 | 14.036243 | 5 | 5 | 0 | 0 |
| 2019 | Nikita Zaitsev | Frederik Andersen | 27.84142 | 29.91114 | 0 | 74.000000 | 31.239215 | 5 | 5 | 0 | 0 |
| 2019 | Dmytro Timashov | Craig Anderson | 22.92276 | 38.26658 | 0 | 27.000000 | 9.090277 | 5 | 5 | 0 | 0 |
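
The age variable described above could be computed along these lines. This is a minimal sketch, not the code used for the slides: the function name `age_on_cutoff`, the 365.25-day year, and the reading of the season label as the year in which the season starts are all assumptions, and the birth date in the example is hypothetical.

```python
from datetime import date


def age_on_cutoff(birth_date: date, season_start_year: int) -> float:
    """Return a player's age in years on Sept. 15 of the given year.

    Assumption: the season label (e.g. 2019) is the year the season starts,
    and fractional ages use a 365.25-day year.
    """
    cutoff = date(season_start_year, 9, 15)
    return (cutoff - birth_date).days / 365.25


# Hypothetical player born 1995-03-01, evaluated for the season labelled 2019.
example_age = age_on_cutoff(date(1995, 3, 1), 2019)
print(round(example_age, 2))
```

Using one fixed cutoff date per season means every shot a player takes in that season carries the same age value.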

Our Expected Goals Models

Began with a logistic regression modeling goals as a function of shot distance, shot angle, whether the shot came on a rush, and whether the shot was a rebound

Individual Season Model: fit the model separately for each season

Logistic Regression Model: fit the model once on the data from all seasons

  • Looked to see if there was a trend in shot angle and distance coefficients over the seasons

Individual Season Model

  • Predictors: shot angle, shot distance, rush, rebound
  • Average Brier Score: 0.05036

Logistic Regression Model

  • Predictors: shot angle, shot distance, rush, rebound
  • Brier Score: 0.05035
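
To make the logistic regression concrete, here is a minimal sketch of how such a model turns shot features into a goal probability. The coefficient values in `COEFS` are hypothetical numbers chosen only for illustration; they are not the fitted values behind these slides, and the function name `xg` is likewise an assumption.

```python
import math

# Hypothetical coefficients for illustration only (NOT fitted values).
COEFS = {
    "intercept": -1.9,
    "distance": -0.04,
    "angle": -0.01,
    "rush": 0.3,
    "rebound": 0.6,
}


def xg(distance, angle, rush, rebound, coefs=COEFS):
    """Goal probability from a logistic model: sigmoid of a linear predictor."""
    eta = (
        coefs["intercept"]
        + coefs["distance"] * distance
        + coefs["angle"] * angle
        + coefs["rush"] * rush
        + coefs["rebound"] * rebound
    )
    return 1.0 / (1.0 + math.exp(-eta))


# Three shots as (distance, angle, rush, rebound), rounded from the data table.
shots = [(4.1, 14.0, 0, 0), (74.0, 31.2, 0, 0), (27.0, 9.1, 0, 0)]
probs = [xg(*shot) for shot in shots]

# Summing per-shot probabilities gives expected goals over a set of shots.
total_xg = sum(probs)
```

With negative distance and angle coefficients, as assumed here, closer and more central shots receive higher probabilities.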

Expected Goals Model including Player Effects

  • Used a hierarchical logistic model to predict goals and added in player effects

    • Shooter effects and goalie effects
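
One common way to write a model of this kind is as a logistic regression with random intercepts for shooter and goalie. The notation below is an assumption made for illustration (the slides do not give the exact specification): $s[i]$ and $g[i]$ denote the shooter and goalie on shot $i$.

```latex
\begin{aligned}
\operatorname{logit} \Pr(\text{goal}_i = 1)
  &= \beta_0 + \beta_1\,\text{distance}_i + \beta_2\,\text{angle}_i
     + \beta_3\,\text{rush}_i + \beta_4\,\text{rebound}_i
     + u_{s[i]} + v_{g[i]}, \\
u_s &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_g \sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
```

Under this form, $\sigma_u$ and $\sigma_v$ describe how much shooters and goalies, respectively, vary around the average player.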

Why a hierarchical model?

  • Information is pooled across all players, which acts as regularization: shooters and goalies involved in fewer shots are pulled toward the group mean
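
The pull toward the group mean can be illustrated with a much simpler stand-in than the hierarchical model itself. The sketch below shrinks a raw scoring rate toward a league-wide rate by adding pseudo-shots; the function name `shrunken_rate`, the `prior_shots` strength, and all numbers are hypothetical.

```python
def shrunken_rate(goals, shots, league_rate, prior_shots=200.0):
    """Shrink a raw goal rate toward the league rate.

    Simplified stand-in for partial pooling (not the mixed model itself):
    the player is credited with `prior_shots` extra shots that score at the
    league rate, so small samples move most of the way to the group mean.
    """
    return (goals + prior_shots * league_rate) / (shots + prior_shots)


league_rate = 0.05

# Hypothetical shooter with very few shots: raw rate 0.30.
small_sample = shrunken_rate(goals=3, shots=10, league_rate=league_rate)

# Hypothetical shooter with many shots: raw rate 0.15.
large_sample = shrunken_rate(goals=150, shots=1000, league_rate=league_rate)
```

The low-volume shooter ends up close to the league rate, while the high-volume shooter keeps most of the raw rate, which is the qualitative behavior the bullet above describes.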

More variation in shooter ability than goalie ability

Best and Worst Players by Player Effect

Expected Goals Model including Player Age

  • Started from the player effects model and added two variables: shooter age and goalie age

  • Used a quadratic polynomial term for each age, since there were few goalie observations at ages 18-20 and 38+
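
As a sketch of what the quadratic age terms add, let $\eta_i$ stand for the linear predictor of the player effects model for shot $i$, and let $a^{S}_i$ and $a^{G}_i$ be the shooter's and goalie's ages. These symbols are assumptions for illustration, and the exact parameterization used (for example raw versus orthogonal polynomials) is not stated on the slides.

```latex
\operatorname{logit} \Pr(\text{goal}_i = 1)
  = \eta_i
  + \gamma_1\, a^{S}_i + \gamma_2\, \bigl(a^{S}_i\bigr)^2
  + \delta_1\, a^{G}_i + \delta_2\, \bigl(a^{G}_i\bigr)^2
```

A quadratic describes the age effect with only two parameters per position, so sparsely observed ages borrow shape from the well-observed middle of the age range.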

Model Performance

Used leave-one-season-out cross-validation to assess our four models

Calculated the Brier score for overall holdout performance and for each season

  • Individual Season Model
    • Predictors: shot angle, shot distance, rush, rebound
    • Average Brier Score: 0.05036
  • Logistic Regression Model
    • Predictors: shot angle, shot distance, rush, rebound
    • Brier Score: 0.05035
  • Player Effects Hierarchical Model
    • Predictors: shot angle, shot distance, rush, rebound, shooter effect, goalie effect
    • Brier Score: 0.05029
  • Player Effects Model with Ages (cross-validation using 10% of the data)
    • Predictors: shot angle, shot distance, rush, rebound, shooter effect, goalie effect, shooter age, goalie age
    • Brier Score: 0.05069
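
The evaluation procedure can be sketched as follows. The Brier score is the mean squared difference between predicted probability and outcome (0 or 1), so lower is better. Everything else here is an assumption for illustration: the function names, the dict-based shot records, the toy data, and the placeholder baseline that predicts the training-set goal rate for every shot in place of a real model.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def leave_one_season_out(shots, fit, predict):
    """Hold out each season in turn; return (overall, per-season) Brier scores."""
    seasons = sorted({shot["season"] for shot in shots})
    per_season = {}
    all_probs, all_goals = [], []
    for held_out in seasons:
        train = [s for s in shots if s["season"] != held_out]
        test = [s for s in shots if s["season"] == held_out]
        model = fit(train)
        probs = [predict(model, s) for s in test]
        goals = [s["goal"] for s in test]
        per_season[held_out] = brier_score(probs, goals)
        all_probs.extend(probs)
        all_goals.extend(goals)
    return brier_score(all_probs, all_goals), per_season


# Placeholder baseline: predict the training goal rate for every shot.
def fit_rate(train):
    return sum(s["goal"] for s in train) / len(train)


def predict_rate(model, shot):
    return model


# Tiny hypothetical data set, two shots per season.
toy_shots = [
    {"season": 2018, "goal": 1}, {"season": 2018, "goal": 0},
    {"season": 2019, "goal": 0}, {"season": 2019, "goal": 0},
    {"season": 2020, "goal": 0}, {"season": 2020, "goal": 1},
]
overall, by_season = leave_one_season_out(toy_shots, fit_rate, predict_rate)
```

Swapping `fit_rate` and `predict_rate` for a real model's fit and predict functions would give holdout scores of the kind listed above, both overall and by season.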

Comparing Brier Scores

Discussion and Next Steps

Results

  • The Player Effects Hierarchical model performed best, with the lowest Brier score (0.05029)

Future Work

  • Creation of one large model allowing distance and angle coefficients to vary across seasons

  • Alternative approaches for including player ages

  • Adjustments for rink bias

  • Addition of player tracking data if/when it becomes available
