class: center, middle, inverse, title-slide

# Deep Dive into Expected Goals Models
### Sara Armstrong
### Devin Basley
### Friday, July 30

---

## Motivation and Problem

- Goals are arguably the most important statistic in hockey, because the team with the most goals at the end of the game wins
- Unfortunately, few goals are scored in hockey, so data are limited and anything based on goals alone has high variability
- Shots could be used instead, but that assumes all shots are equally valuable
- An expected goals model estimates the probability that a shot results in a goal, based on location and other factors

--

- Analysts use these models to predict overall team and player performance
- One issue with public expected goals models is that they don't factor in how shot location might change over time

--

### Purpose

Search for best practices in expected goals by creating our own models

---

## Data

- Used NHL shot data from MoneyPuck from the 2010-11 season to the 2020-21 season
- Each observation is a shot
  - Shot on goal, missed the net, goal
- Only using shots during 5-on-5 play
- Used player data to calculate each player's age as of Sept.
15 of that season

| Season|Shooter |Goalie | Shooter Age| Goalie Age| Goal| Distance| Angle| Home Skaters| Away Skaters| Rebound| Rush|
|------:|:---------------|:-----------------|-----------:|----------:|----:|---------:|---------:|------------:|------------:|-------:|----:|
| 2019|Brady Tkachuk |Frederik Andersen | 19.96992| 29.91114| 1| 4.123106| 14.036243| 5| 5| 0| 0|
| 2019|Nikita Zaitsev |Frederik Andersen | 27.84142| 29.91114| 0| 74.000000| 31.239215| 5| 5| 0| 0|
| 2019|Dmytro Timashov |Craig Anderson | 22.92276| 38.26658| 0| 27.000000| 9.090277| 5| 5| 0| 0|

---

## Our Expected Goals Models

Began with a logistic regression modeling goals as a function of shot distance, shot angle, whether the shot occurred on a rush, and whether the shot was a rebound

#### Individual Season Model: ran the model separately for each season

#### Logistic Regression Model: ran the model over the data from all seasons

- Looked for a trend in the shot angle and distance coefficients over the seasons

Individual Season Model

- Predictors: shot angle, shot distance, rush, rebound
- Average Brier Score: __0.05036__

Logistic Regression Model

- Predictors: shot angle, shot distance, rush, rebound
- Brier Score: __0.05035__

---

## Distance Coefficient Trends over Seasons

![](data:image/png;base64,#XGoal-10-min_files/figure-html/Distance Coefficients-1.png)

---

## Angle Coefficient Trends over Seasons

![](data:image/png;base64,#XGoal-10-min_files/figure-html/Angle Coefficients-1.png)

---

## Expected Goals Model including Player Effects

- Used a hierarchical logistic model to predict goals and added in player effects
  - Shooter effects and goalie effects

--

__Why a hierarchical model?__

- Information is pooled across all players, which serves as regularization: shooters and goalies involved in fewer shots are pulled towards the group mean

---

## More variation in shooter ability than goalie ability

<img src="data:image/png;base64,#XGoal-10-min_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" />

---

## Best and Worst Players by Player Effect

<img
src="data:image/png;base64,#XGoal-10-min_files/figure-html/unnamed-chunk-3-1.png" width="100%" style="display: block; margin: auto;" />

---

## Expected Goals Model including Player Age

- Took the same model that was used for player effects and added two variables, shooter age and goalie age
- Used a quadratic polynomial term for ages since there were few goalie observations between the ages 18-20 and 38+

.pull-left[
![](data:image/png;base64,#XGoal-10-min_files/figure-html/Shooter Age Graph-1.png)
]

.pull-right[
![](data:image/png;base64,#XGoal-10-min_files/figure-html/Goalie Age Graph-1.png)
]

---

## Model Performance

#### Used leave-one-season-out cross-validation to assess our four models

Calculated Brier score for overall hold-out performance and across seasons

- Individual Season Model
  - Predictors: shot angle, shot distance, rush, rebound
  - Average Brier Score: __0.05036__
- Logistic Regression Model
  - Predictors: shot angle, shot distance, rush, rebound
  - Brier Score: __0.05035__
- Player Effects Hierarchical Model
  - Predictors: shot angle, shot distance, rush, rebound, shooter effect, goalie effect
  - Brier Score: __0.05029__
- Player Effects Model with Ages (cross-validation using 10% of data)
  - Predictors: shot angle, shot distance, rush, rebound, shooter effect, goalie effect, shooter age, goalie age
  - Brier Score: __0.05069__

---

## Comparing Brier Scores

<img src="data:image/png;base64,#XGoal-10-min_files/figure-html/unnamed-chunk-4-1.png" width="100%" style="display: block; margin: auto;" />

---

## Discussion and Next Steps

__Results__

- The Player Effects Hierarchical model performed the best with a Brier Score of __0.05029__

--

__Future Work__

- Creation of one large model allowing distance and angle coefficients to vary across seasons
- Alternative approaches for including player ages
- Adjustments for rink bias
- Addition of player tracking data if/when it becomes available
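
---

## Sketch: Leave-One-Season-Out Brier Score

The evaluation described under Model Performance (hold out one season, fit on the rest, score the held-out shots with the Brier score) can be sketched roughly as below. This is a minimal illustration, not the code behind the reported numbers. The function names `brier_score`, `leave_one_season_out`, `fit_goal_rate`, and `predict_goal_rate` are hypothetical, the toy shots are invented for the example, and the stand-in model simply predicts the training goal rate rather than fitting the logistic regression from the slides.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def leave_one_season_out(shots, fit, predict):
    """Hold out each season in turn, fit on the others, score the held-out shots.

    shots: list of dicts with at least 'season' and 'goal' (0 or 1) keys.
    Returns (per-season Brier scores, Brier score over all held-out predictions).
    """
    per_season = {}
    all_probs, all_goals = [], []
    for held_out in sorted({s["season"] for s in shots}):
        train = [s for s in shots if s["season"] != held_out]
        test = [s for s in shots if s["season"] == held_out]
        model = fit(train)
        probs = [predict(model, s) for s in test]
        goals = [s["goal"] for s in test]
        per_season[held_out] = brier_score(probs, goals)
        all_probs.extend(probs)
        all_goals.extend(goals)
    return per_season, brier_score(all_probs, all_goals)


# Hypothetical stand-in model (assumption): predict the training goal rate
# for every shot. A real expected goals model would use distance, angle, etc.
def fit_goal_rate(train):
    return sum(s["goal"] for s in train) / len(train)


def predict_goal_rate(rate, shot):
    return rate


# Invented toy data, for illustration only (not MoneyPuck data).
toy_shots = [
    {"season": 2018, "goal": 1}, {"season": 2018, "goal": 0},
    {"season": 2019, "goal": 0}, {"season": 2019, "goal": 0},
    {"season": 2020, "goal": 1}, {"season": 2020, "goal": 0},
]

per_season, overall = leave_one_season_out(toy_shots, fit_goal_rate, predict_goal_rate)
print(per_season, overall)
```

Passing `fit` and `predict` as arguments keeps the cross-validation loop independent of the model, so the same loop could score any of the four models compared above.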