Introduction

Goaltending in the NHL is of vital importance. A goalie’s job is simple: stop the other team from scoring. Good (and timely) goaltending can be the difference between a mediocre team and a team that goes on a deep run in the playoffs. Many of the best “underdog” or “Cinderella” playoff runs can be attributed to a goaltender getting hot. For example, just this year, the 8-seed Florida Panthers made a run to the Stanley Cup Final on the back of Sergei Bobrovsky, who put up remarkable numbers in the playoffs.

Individual goaltending is hard, if not impossible, to predict on a year-to-year basis. The “best” goalies seem to change each year (a total of 12 different goalies have been Vezina Trophy finalists over the past 5 seasons, with only 2 appearing more than once). It isn’t uncommon for a goalie to follow up a spectacular year with an unremarkable one, or vice versa. Take Jacob Markstrom, for example, who in 2021-22 was a finalist for the Vezina Trophy with a .922 save percentage (SV%) and a 2.22 goals against average (GAA). The following season, he fell off a cliff, posting an .892 SV% and a 2.92 GAA. While individual goaltending may be hard to predict, one might expect league averages to be similar from year to year. Over the last handful of seasons, however, a striking anomaly has emerged.

Save Percentage by Year

Plot

Table
Season  SV     SOG    G     SV%
2012    66638  72945  6307  0.9135
2013    38143  41827  3684  0.9119
2014    67333  73682  6349  0.9138
2015    67050  73307  6257  0.9146
2016    66587  72784  6197  0.9149
2017    67638  74046  6408  0.9135
2018    73774  80872  7098  0.9122
2019    72371  79540  7169  0.9099
2020    61522  67638  6116  0.9096
2021    46995  51764  4769  0.9079
2022    74837  82511  7674  0.9070
2023    73728  81535  7807  0.9042

As you can see, league-wide save percentage has been on a significant decline since 2016. In just eight seasons, save percentage has fallen from nearly .915 to a mere .904 in 2022-23. What’s going on here? That is what we will attempt to uncover.
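As a quick check on the table, save percentage is simply saves divided by shots on goal. The sketch below recomputes the SV% column from the SV and SOG columns and measures the drop from 2016 to 2023. The names `ROWS`, `save_pct`, and `sv_by_season` are illustrative choices, not part of the original analysis.

```python
# (season, saves, shots on goal), copied from the table above.
ROWS = [
    (2012, 66638, 72945),
    (2013, 38143, 41827),
    (2014, 67333, 73682),
    (2015, 67050, 73307),
    (2016, 66587, 72784),
    (2017, 67638, 74046),
    (2018, 73774, 80872),
    (2019, 72371, 79540),
    (2020, 61522, 67638),
    (2021, 46995, 51764),
    (2022, 74837, 82511),
    (2023, 73728, 81535),
]


def save_pct(saves: int, shots_on_goal: int) -> float:
    """Save percentage = saves / shots on goal."""
    return saves / shots_on_goal


sv_by_season = {season: save_pct(sv, sog) for season, sv, sog in ROWS}

# Drop between the 2016 peak and the most recent season.
decline = sv_by_season[2016] - sv_by_season[2023]

for season, sv in sv_by_season.items():
    print(season, f"{sv:.4f}")
print("Decline from 2016 to 2023:", f"{decline:.4f}")
```

The decline works out to roughly 0.011, the figure used as the "total observed decline" later in the analysis.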

Data Summary

We used two data sets for our analysis. The first was pulled from Hockey Reference and includes yearly goalie stats for every goalie that has appeared in an NHL game from 2011-12 to 2022-23. Important variables in this data set include season, games played (GP), games started (GS), goals against (GA), shots against (SA), saves (SV), and save percentage (SV%).

Snippet of Hockey Reference Goalie Data
Player              Age  Team  Season  GP  GS  W   L   OT  GA   SA    SV    SV%    GAA
Connor Hellebuyck   29   WPG   2023    64  64  37  25  2   157  1964  1807  0.920  2.49
Juuse Saros         27   NSH   2023    64  63  33  23  7   171  2099  1928  0.919  2.69
Alexandar Georgiev  26   COL   2023    62  62  40  16  6   156  1904  1748  0.918  2.53
Jake Oettinger      24   DAL   2023    62  61  37  11  11  144  1776  1632  0.919  2.37
Ilya Sorokin        27   NYI   2023    62  60  31  22  7   140  1838  1698  0.924  2.34
Jordan Binnington   29   STL   2023    61  60  27  27  6   194  1826  1632  0.894  3.31

Our second data set is event-level play by play data courtesy of the hockeyR package (https://github.com/danmorse314/hockeyR) created by Dan Morse. The package scrapes and cleans play by play data from the NHL website. We used it to scrape data from the 2011-12 through 2022-23 seasons, and the resulting data set had over 4 million rows of event data. Important variables in this data set include game ID, event type, x and y coordinates of each event, strength code, shot distance, and shot angle (the full data set consists of over 100 attributes).

Snippet of hockeyR Play by Play Data

event_type      event_team           event_team_type  period  period_time  game_seconds_remaining  shot_distance  shot_angle  x_fixed  y_fixed
GAME_SCHEDULED  NA                   NA               1       00:00        3600                    NA             NA          NA       NA
FACEOFF         Boston Bruins        home             1       00:00        3600                    NA             NA          0        0
GIVEAWAY        Philadelphia Flyers  away             1       00:14        3586                    NA             NA          -13      1
SHOT            Boston Bruins        home             1       00:47        3553                    58.7           13.8        32       -14
MISSED_SHOT     Boston Bruins        home             1       01:01        3539                    37.1           3.1         52       -2
HIT             Philadelphia Flyers  away             1       01:12        3528                    NA             NA          10       -36

Note: Any time a season is referenced, it refers to the year the season concluded (for example, the 2011-12 season will appear as ‘2012’).

Exploratory Data Analysis

There are many potential explanations for the drastic decline in save percentage. The simplest hypothesis is that goalies have gotten worse. While this could be the case, our inclination is to believe that something else could be going on. So going forward, we will operate under the assumption that the relative skill of goaltenders has not changed.

Hypothesis 1: Worse Goalies are Playing More

Our first idea for what could explain the decline is simple: worse goalies are playing more. Each team has two goalies on the roster. Typically, one goalie will be the designated “starter” and get the majority of starts, while the “backup” will play a handful of games to give the starter some rest. In some cases, a team will run a “tandem” where both goalies are of a similar level and get a similar number of starts, although this is much less common. Simply put, it’s possible that teams are giving their starters more rest, which could lower save percentage on average.

Percentage of Games Started by a Starter

Plot

Table
Season  Avg Starts  Games in Season  Pct Started
2012    56.83333    82.00000         0.6930894
2013    32.09375    48.00000         0.6686198
2014    49.40000    82.00000         0.6024390
2015    54.83333    82.00000         0.6686992
2016    51.86667    82.00000         0.6325203
2017    54.61290    82.00000         0.6660110
2018    53.32258    82.00000         0.6502754
2019    50.83871    82.00000         0.6199843
2020    42.06452    70.58065         0.5959781
2021    32.54839    56.00000         0.5812212
2022    48.38235    82.00000         0.5900287
2023    47.71875    82.00000         0.5819360

Note: You’ll notice that in the 2020 season, the number of games played in the season was approximately 70.58. This is because the season was cut short by the COVID-19 pandemic, leaving teams with differing numbers of games played.

We observe that this is indeed the case. In the figure above, a starter was considered to be the goalie who played in the most games for a team over the course of a season. In general, the percentage of games a starter plays in has declined in a similar manner to save percentage. While this may be a contributing factor to the overall decline in save percentage, the figure below shows that there is something else going on.
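To make the starter definition concrete, here is a minimal sketch of how the "Pct Started" column could be derived: for each team and season, take the goalie with the most games played, then divide the average of those goalies' starts by the number of games in the season. The rows, team codes, and function name below are hypothetical and are not taken from the Hockey Reference data.

```python
from collections import defaultdict

# Hypothetical goalie-season rows: (team, season, player, games_played, games_started).
rows = [
    ("AAA", 2023, "Goalie A", 60, 58),
    ("AAA", 2023, "Goalie B", 26, 24),
    ("BBB", 2023, "Goalie C", 45, 42),
    ("BBB", 2023, "Goalie D", 41, 40),
]
games_in_season = {2023: 82}


def starter_share(rows, games_in_season):
    """Share of games started by each team's starter, averaged within a season."""
    by_team_season = defaultdict(list)
    for team, season, player, gp, gs in rows:
        by_team_season[(team, season)].append((gp, gs))

    starter_starts = defaultdict(list)
    for (team, season), goalies in by_team_season.items():
        # The starter is the goalie with the most games played for that team.
        gp, gs = max(goalies, key=lambda g: g[0])
        starter_starts[season].append(gs)

    return {
        season: (sum(starts) / len(starts)) / games_in_season[season]
        for season, starts in starter_starts.items()
    }


share = starter_share(rows, games_in_season)
print(share)
```

With these toy rows the two starters average 50 starts, so the share is 50 / 82.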

Unsurprisingly, it is clear that starters’ save percentages are better than the backups’ (any non-starter). But, even so, for both backups and starters, a decline in save percentage is still present. Even if we narrow it down to the top 15 starters by year based on save percentage, we still observe a decline:

The mean difference in save percentage between starters and non-starters is approximately 0.0064, and the standard deviation is 0.0026. After calculating a confidence interval, we can say with 95% confidence that the “true” difference in save percentage is between 0.0014 and 0.0115. If we assume that this “true” difference is constant across all seasons, we can calculate how much we expect save percentage to have fallen off given the proportion of games started by a starter/backup. The percentage of games that backups are starting has increased by 11 percentage points since 2012. Therefore, we would expect a total decline in overall save percentage between 0.0002 and 0.0013. This expected decline would make up between 1.5 and 12.1 percent of the total observed decline in save percentage (0.011). So while this is playing into the decline, the majority of the drop-off is being caused by some other phenomenon.
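The arithmetic behind these bounds can be sketched as follows. If starters and backups differ in save percentage by d, and a share s of starts moves from starters to backups, the blended save percentage falls by about s × d. The sketch assumes the shift in starts is measured from 2012 to 2023 and the observed decline from 2016 to 2023. That is one reading of the tables above, not a confirmed detail of the original calculation, and the published inputs are rounded, so the results only approximately match the quoted figures.

```python
# Interval endpoints quoted in the text.
ci_low, ci_high = 0.0014, 0.0115

# Share of games started by the starter, from the table above.
starter_share_2012 = 0.6930894
starter_share_2023 = 0.5819360

# League save percentage, recomputed from SV / SOG in the first table.
sv_2016 = 66587 / 72784
sv_2023 = 73728 / 81535

shift_to_backups = starter_share_2012 - starter_share_2023  # about 0.11
observed_decline = sv_2016 - sv_2023                        # about 0.011

# Expected drop in blended save percentage = shift in starts * skill gap.
expected_low = ci_low * shift_to_backups
expected_high = ci_high * shift_to_backups

# Fraction of the observed decline that the shift in starts could explain.
share_low = expected_low / observed_decline
share_high = expected_high / observed_decline

print(f"Expected decline: {expected_low:.4f} to {expected_high:.4f}")
print(f"Share of observed decline: {share_low:.1%} to {share_high:.1%}")
```

Under these assumptions the shift in starts explains somewhere between roughly 1.5 and 12 percent of the drop, which leaves most of the decline unexplained.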

Hypothesis 2: Increase in Power Plays

We then shifted our attention to power plays. A power play occurs when one team has more players on the ice than the other due to penalties. Is it possible that an increase in power plays is contributing to the decline in save percentage? As you can see in the figures below, the answer is a resounding no. In fact, power play opportunities have been trending down, until a recent spike in 2023. Power play shots per game hit a low in 2019 and have been rising since, despite the decline in power play opportunities.

Power Plays by Year

Opportunities

Shots

We also observe a decline in save percentage for both even strength and power play situations. Interestingly, the decline in save percentage for even strength seems to start much later than that of power plays. Power play save percentage seems to start a decline after 2014, whereas save percentage for even strength remains mostly constant through 2017 before taking a drop.

Hypothesis 3: Shooters are Becoming More Effective

Up to this point, we have ignored a key aspect of scoring goals: the shooter. After all, shooting percentage (goals/shots on goal) and save percentage will always add up to 1. If goalies aren’t getting worse, it’s not crazy to think that the decline in save percentage has more to do with shooters becoming more effective.

Not only do we want to know if this is the case, but we also want to understand how shooters have gotten better. To accomplish this, we turned to building an expected goals model. An expected goals (xG) model is a prediction model that estimates the likelihood of a shot being a goal based on past shots with similar characteristics. Oftentimes, these predictive models are trained on many seasons of data. However, our aim was to build an xG model on each season of data and analyze the coefficients for certain shot attributes, so that we can observe whether the way players are scoring goals has fundamentally changed over time.

Methods

Data Transformation

To get the play by play data ready to build the model, we first filtered out any shootout and penalty shot events and kept only Fenwick events (shots, missed shots, goals). We then created the following variables (with help from code given by the hockeyR package):

  • Rush - TRUE when preceded by a neutral/defensive zone event 4 or fewer seconds prior
  • Rebound - TRUE when preceded by a Fenwick event 2 or fewer seconds prior
  • Angle Change - Difference in angle from previous shot for shots classified as rebounds
  • Cross Ice - TRUE when preceded by an offensive zone event on the opposite side of the ice 2 or fewer seconds prior
  • Forecheck - TRUE when preceded by a takeaway/giveaway in the offensive zone 2 or fewer seconds prior

We believe that these are important factors to consider as to how goals are scored. We understand that creating these variables has limitations. Not every shot will be perfectly classified using this method, but we believe that it will be good enough to show any patterns in goal scoring.
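The variable definitions above can be sketched as a small labelling function that compares each Fenwick event with the event immediately before it. This is an illustration only, not the hockeyR code or our exact implementation. The `Event` fields (in particular `zone`, expressed from the shooting team's perspective) and the use of the sign of `y` to detect "opposite side of the ice" are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

FENWICK = {"SHOT", "MISSED_SHOT", "GOAL"}


@dataclass
class Event:
    seconds: int                         # elapsed game time in seconds
    event_type: str                      # e.g. "SHOT", "FACEOFF", "TAKEAWAY"
    zone: str                            # "O", "N" or "D" for the shooting team (assumed field)
    y: float                             # y coordinate; its sign gives the side of the ice
    shot_angle: Optional[float] = None   # degrees, present only for shot events


def label_shot(prev: Event, shot: Event) -> dict:
    """Derive the model flags for `shot` from the event that preceded it."""
    gap = shot.seconds - prev.seconds
    rebound = prev.event_type in FENWICK and gap <= 2
    rush = prev.zone in {"N", "D"} and gap <= 4
    cross_ice = prev.zone == "O" and gap <= 2 and prev.y * shot.y < 0
    forecheck = (
        prev.event_type in {"TAKEAWAY", "GIVEAWAY"} and prev.zone == "O" and gap <= 2
    )
    # Angle change is only defined for rebounds; 0.0 otherwise (an assumption).
    angle_change = 0.0
    if rebound and prev.shot_angle is not None and shot.shot_angle is not None:
        angle_change = abs(shot.shot_angle - prev.shot_angle)
    return {
        "rush": rush,
        "rebound": rebound,
        "angle_change": angle_change,
        "cross_ice": cross_ice,
        "forecheck": forecheck,
    }


# A shot one second after another shot from the other side of the slot.
rebound_flags = label_shot(
    Event(100, "SHOT", "O", 5.0, 10.0), Event(101, "GOAL", "O", -4.0, 25.0)
)
# A shot three seconds after a neutral zone faceoff.
rush_flags = label_shot(
    Event(200, "FACEOFF", "N", 0.0), Event(203, "SHOT", "O", 3.0, 12.0)
)
print(rebound_flags)
print(rush_flags)
```

Because each flag depends only on the single preceding event and a fixed time window, plays that develop over several events will be missed, which is the limitation noted above.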

Model Building

Once the data were cleaned and transformed, we built logistic regression models for even strength xG and power play xG separately. We decided against building a shorthanded model given the infrequency and inherent strangeness of shorthanded shot attempts. Here are the explanatory variables we used in each model:

  • Shot Distance
  • Shot Angle
  • Rush Event (Boolean)
  • Rebound Event (Boolean)
  • Rebound Event X Angle Change (Interaction Term)
  • Cross Ice Event (Boolean)
  • Forecheck - Even strength only (Boolean)

Note: You’ll notice that we did not use the forechecking variable for the power play model. A very small percentage of Fenwick events on the power play were classified as forechecking. This makes intuitive sense as teams generally try for controlled zone entries on the power play as opposed to a “dump and chase”.

After building the framework for the models, we trained a model for every individual season of data going back to the 2011-12 season.
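The idea of fitting one model per season and comparing coefficients can be sketched as below. This is a stand-in, not the code used for the analysis: it uses a hand-rolled gradient ascent fit rather than a statistics library, only two predictors, and synthetic shots generated inside the example. The function name, learning rate, and "true" coefficients are all invented for illustration.

```python
import math
import random
from collections import defaultdict


def fit_logistic(X, y, lr=0.2, n_iter=500):
    """Fit a logistic regression by batch gradient ascent.

    Returns [intercept, coef_1, coef_2, ...] on the log-odds scale.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = yi - 1.0 / (1.0 + math.exp(-z))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Synthetic shots (NOT real NHL data): distance in tens of feet and a rebound flag.
random.seed(0)
shots = []
for season in (2012, 2023):
    for _ in range(300):
        dist = random.uniform(0.5, 6.0)
        rebound = 1.0 if random.random() < 0.2 else 0.0
        p_goal = 1.0 / (1.0 + math.exp(-(0.5 - 0.6 * dist + 1.0 * rebound)))
        goal = 1.0 if random.random() < p_goal else 0.0
        shots.append((season, dist, rebound, goal))

# Group by season, then fit a separate model to each group.
by_season = defaultdict(lambda: ([], []))
for season, dist, rebound, goal in shots:
    X, y = by_season[season]
    X.append([dist, rebound])
    y.append(goal)

coefs = {season: fit_logistic(X, y) for season, (X, y) in by_season.items()}
for season, (intercept, b_dist, b_rebound) in sorted(coefs.items()):
    print(season, round(intercept, 2), round(b_dist, 2), round(b_rebound, 2))
```

Keeping the model specification fixed and refitting it season by season is what allows the coefficients to be compared over time.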

Results

Shot Distance

Coefficients

Frequency

Save Percentage

We can see that for all years, the distance coefficient is negative for both even strength and power play. This means that as distance from the net increases, the log odds of scoring a goal decrease, which makes intuitive sense (shots from farther away yield fewer goals). In the even strength model, the distance coefficient shows no signs of systematic change across seasons. The power play coefficients increase from 2012 to 2019 before dropping back down, although there is overlap in all of the error bars, so it could just be noise. Overall, these coefficients show that the likelihood of scoring a goal based on shot distance alone hasn’t changed much.
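For readers less familiar with log odds, a logistic regression coefficient can be turned into an odds ratio by exponentiating it. The sketch below uses a hypothetical distance coefficient of -0.05 per foot and a hypothetical intercept; neither value comes from our fitted models.

```python
import math


def odds_ratio(coef: float, delta: float) -> float:
    """Multiplicative change in the odds of a goal when a predictor rises by `delta`."""
    return math.exp(coef * delta)


def goal_probability(intercept: float, coef: float, distance: float) -> float:
    """Probability of a goal from a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + coef * distance)))


beta_distance = -0.05   # hypothetical coefficient, log odds per foot
intercept = -1.0        # hypothetical intercept

# Moving a shot 10 feet farther out multiplies the odds by exp(-0.5), about 0.61.
ratio_10ft = odds_ratio(beta_distance, 10)

p_close = goal_probability(intercept, beta_distance, 10)
p_far = goal_probability(intercept, beta_distance, 40)
print(round(ratio_10ft, 2), round(p_close, 3), round(p_far, 3))
```

A negative coefficient therefore always gives an odds ratio below 1, and a coefficient that moves toward 0 over the seasons means the penalty for distance (or angle) is shrinking.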

Interestingly, the data show a significant spike in the percentage of Fenwick events in the 0-10 ft range over the past few seasons for both even strength and power play. On the power play, there appears to be a steady increase in 20-30 and 30-40 ft shots.

Even more interesting is what we find when we dive into the save percentage on shots in certain ranges. For most distance bins at both even strength and on the power play, save percentage has stayed relatively consistent. However, we see a decline in the 20-30 and 30-40 ft ranges for both strengths, and in the 10-20 ft range for power play. This effect is even more apparent when widening the bins to 20 ft:

Over time, goalies have been saving 20-40 ft shots less and less. It is unlikely that the play of goalies has declined just for shots of this distance. What’s more likely is that shooters have gotten better at shooting these shots and they are taking more of them, as can be seen on power plays.
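The distance bins discussed above can be sketched with a small helper that groups shots on goal into fixed-width bins and computes the save percentage in each. The toy shots and the function name are made up for illustration; running the same helper with a width of 10 and then 20 mirrors the two bin sizes used in the figures.

```python
from collections import defaultdict


def save_pct_by_bin(shots, width=10):
    """Save percentage per distance bin.

    `shots` is an iterable of (distance_ft, is_goal) for shots on goal.
    Returns {bin_start_ft: save_percentage}.
    """
    saves = defaultdict(int)
    total = defaultdict(int)
    for distance, is_goal in shots:
        start = int(distance // width) * width
        total[start] += 1
        if not is_goal:
            saves[start] += 1
    return {start: saves[start] / total[start] for start in sorted(total)}


# Toy shots on goal: (distance in feet, whether it was a goal).
toy_shots = [
    (8.0, True), (9.5, False),
    (22.0, False), (27.0, True),
    (35.0, False), (38.0, False),
]

narrow = save_pct_by_bin(toy_shots, width=10)   # {0: 0.5, 20: 0.5, 30: 1.0}
wide = save_pct_by_bin(toy_shots, width=20)     # {0: 0.5, 20: 0.75}
print(narrow)
print(wide)
```

Wider bins pool more shots per bin, which smooths out season-to-season noise at the cost of resolution.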

Shot Angle

Coefficients

Frequency

Save Percentage

Similar to the distance coefficients, the shot angle coefficients for both even strength and power play are negative across all seasons. Shot angle is measured in degrees from the middle of the ice (a shot with angle 0 is right in front of the goalie, a shot with angle 90 is from the goal line). Therefore, it is no surprise that as angle increases, the log odds of scoring decrease. What’s interesting is the general upward trend (closer to 0) in the coefficients in both strength states. In general, this means that shots from wider angles are more likely to be goals than they once were. Once again, this could show that shooters have improved at scoring from “tougher” angles. We can see that save percentage has declined for most angles above 20 degrees (even strength only; power play seems to be quite random).
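The geometry behind shot distance and shot angle can be sketched from the x and y coordinates. The sketch assumes the goal line sits 89 feet from center ice at (89, 0) and that the shot is taken in the attacking half. Both are assumptions on our part, although they reproduce the two shot rows shown in the play by play snippet earlier.

```python
import math

NET_X = 89.0  # assumed distance from center ice to the goal line, in feet


def shot_geometry(x_fixed: float, y_fixed: float):
    """Return (distance in feet, angle in degrees) from a shot location to the net.

    Angle 0 is straight in front of the net; angle 90 is along the goal line.
    """
    dx = NET_X - abs(x_fixed)   # distance out from the goal line
    dy = abs(y_fixed)           # lateral offset from the middle of the ice
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))
    return distance, angle


# The SHOT and MISSED_SHOT rows from the play by play snippet.
shot = shot_geometry(32, -14)
missed = shot_geometry(52, -2)
print(round(shot[0], 1), round(shot[1], 1))      # 58.7 13.8
print(round(missed[0], 1), round(missed[1], 1))  # 37.1 3.1
```

Because the angle is unsigned, shots from the left and right sides at the same offset are treated identically by the model.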

Rebound Events

Coefficients

Frequency

Save Percentage

The rebound coefficients are positive across all seasons prior to 2023 for both even strength and power play. This means that a rebound shot will increase the log odds of scoring. However, these coefficients have dropped over time. For even strength, there was a steady drop until 2019, when a significant drop occurred (the 95% confidence intervals for 2018 and 2019 do not overlap). This was followed by another massive drop in 2023, where the coefficient turned negative. This means that, in 2023, a rebound had almost no positive or negative effect on the likelihood of a goal. Similar patterns can be observed in the power play model.

This large drop-off in 2023 coincides with a spike in the frequency of rebound events. However, this spike is accounted for only by missed shots. Shots and goals have remained steady. This is a strange anomaly, but it could be that the NHL’s play by play data has become more accurate over time. If this is the case, the coefficient for 2023 may be closer to the “true” effect that rebounds have on scoring.

Rush Events

Coefficients

Frequency

Save Percentage

The rush coefficients are positive across all seasons, which means shots defined as a rush lead to a higher chance of a goal. However, the rush coefficient shows almost no pattern across seasons, other than a drop in even strength for 2023, which again could point to some systematic change in the way the data is collected.

Cross Ice Events

Coefficients

Frequency

Save Percentage

Cross Ice coefficients are positive for all seasons and show no patterns. Once again we observe a spike in frequency in 2023. For even strength, save percentage seems to be gradually improving for cross ice events.

Forecheck Events

Coefficients

Frequency

Save Percentage

Forecheck coefficients (even strength only) are positive for all seasons but show no pattern over time.

Angle Change on Rebound Shots

Coefficients

Frequency

The angle change coefficients for rebound shots are slightly positive for all seasons, meaning that a rebound with a higher angle change from the previous shot is more likely to end up in a goal. The coefficients for even strength fell and then came back up; however, this is likely just due to randomness.

At this point, we have seen signs that seem to indicate an increase in shot quality from shooters. To put this to the test, we built another logistic model using the same variables, but with one key difference: instead of building the model on each individual season, we trained the model on every season before 2018 (in most cases, we observed the most drastic changes in 2018 and beyond). We then used this model to predict expected goals and expected Fenwick save percentage (1 - average xG). We were curious to know whether, on average, expected goals were increasing when judged against shot data from before the decline. If this is the case, we can reasonably say that shot quality has increased.

Note: Fenwick save percentage includes missed shots in the calculation, yielding a higher percentage (missed shots will never be goals).
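The calculation of expected Fenwick save percentage can be sketched as follows: score every Fenwick event with the baseline model, average the predicted goal probabilities, and subtract from 1. The coefficients and shots below are hypothetical placeholders rather than our fitted pre-2018 model, and only two predictors are shown.

```python
import math


def xg(shot: dict, coefs: dict) -> float:
    """Predicted goal probability for one shot under a logistic model."""
    z = coefs["intercept"] + sum(coefs[name] * value for name, value in shot.items())
    return 1.0 / (1.0 + math.exp(-z))


def expected_fenwick_sv(shots: list, coefs: dict) -> float:
    """Expected Fenwick save percentage = 1 - average xG over all Fenwick events."""
    return 1.0 - sum(xg(s, coefs) for s in shots) / len(shots)


# Hypothetical baseline coefficients (log-odds scale).
baseline = {"intercept": -1.5, "distance": -0.04, "rebound": 0.8}

# Hypothetical shot mixes for an earlier and a later season.
early_shots = [{"distance": 40.0, "rebound": 0.0}, {"distance": 30.0, "rebound": 0.0}]
late_shots = [{"distance": 25.0, "rebound": 0.0}, {"distance": 8.0, "rebound": 1.0}]

early_sv = expected_fenwick_sv(early_shots, baseline)
late_sv = expected_fenwick_sv(late_shots, baseline)
print(round(early_sv, 3), round(late_sv, 3))
```

Since the coefficients are frozen, any change in the expected value between seasons can only come from a change in the mix of shots being taken.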

As you can see, when we apply this model to recent seasons, we see a decline in expected Fenwick save percentage very similar to the decline in observed save percentage. This shows that, on average, shot quality (as judged by a model trained on 2012-17 data) has increased. The same trend can be observed on the power play:

Limitations

The nature of the play by play data should be of some concern with regard to the results. First of all, we are unsure how accurate the data truly are. Some anomalies, such as the spike in rebound percentage, could indicate that the data are becoming more accurate than they once were with regard to factors such as distance, angle, and time of the game. On top of that, without true labels for the binary descriptors such as rebound, rush, and cross ice event, it is impossible to assign them accurately. We assigned these variables based on strict definitions, but in reality these events can take many different forms.

Our external advisor, Sam Ventura of the Buffalo Sabres, was kind enough to provide a sample of privately collected shot data. Due to the nature of how it is tracked, this data is much more accurate and includes true labels for the binary variables used in our model. The privately tracked data includes data from the 2016 season onwards.

The figure above shows that overall, 5v5 Fenwick save percentage is pretty consistent across both data sets, with the private data being slightly higher across all years. This is likely a result of more precision in tracking missed shots within the private data (more missed shots means a higher Fenwick save percentage).
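A small worked example shows why counting more missed shots raises Fenwick save percentage even when goals and saves are unchanged. The counts are invented for illustration and do not come from either data set.

```python
def save_pct(goals: int, saves: int) -> float:
    """Traditional save percentage: saves / shots on goal."""
    return saves / (saves + goals)


def fenwick_save_pct(goals: int, saves: int, missed: int) -> float:
    """Fenwick save percentage: share of unblocked attempts that were not goals."""
    return 1.0 - goals / (goals + saves + missed)


goals, saves = 10, 90          # invented counts
fewer_missed = fenwick_save_pct(goals, saves, missed=40)
more_missed = fenwick_save_pct(goals, saves, missed=50)

print(round(save_pct(goals, saves), 4))   # 0.9
print(round(fewer_missed, 4))             # 0.9286
print(round(more_missed, 4))              # 0.9333
```

Missed shots enter only the denominator, so a data set that records more of them will always show a higher Fenwick save percentage for the same goals.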

The figures below show how the frequency of our predictor variables differs between the two data sets.

Frequency of Predictor Variables

Distance

Angle

Rebound

Rush

Cross Ice

Forecheck

As you can see, there are some notable differences. Specifically in the rush, cross ice, and forechecking variables, the privately tracked data has many more observations labeled. It’s no surprise that our “definitions” for these variables were unable to capture anything close to the full picture. Regardless of the differences, we were curious to see if our model results could hold water. So, we trained the “same” logistic model, this time using the privately tracked data (same variables used in the model despite the differences in labeling). Here’s how the coefficients compare:

Coefficients

Distance

Angle

Rebound

Rush

Cross Ice

Forecheck

In general, it appears that most of our variables follow the same pattern for both the private and public data. The largest difference is in the rush coefficients, and given how differently they were labeled, this is not a shock. Overall, these results suggest that our analysis of the public data has some merit. If our theory is correct that the NHL continues to become more accurate with its event tracking, it will be interesting to run this again on new years of data.

Conclusion

The decline in save percentage in the NHL brought up many hypotheses as to what could be causing it. Of course, it may be easy to assume goaltending is just getting worse. However, we chose to give goalies the benefit of the doubt and tried to uncover something else that may be driving this trend. We first examined whether this decline is due to starting goalies playing less. While it is true that backups have gotten more and more starts as time goes on, the difference in quality between starters and backups isn’t great enough for this to be the driving force. We next proposed the idea of more power plays, but immediately shot it down once the data showed that power play opportunities have actually been trending down. Our last hypothesis was that shot quality had improved over time. After building expected goals models on each season, we found some interesting trends in how goals were/are being scored. Save percentage has dropped for shots in the 20-40 ft range, as well as for shots with an angle of greater than 20 degrees.

After restructuring the model to be trained on the years before the decline, we applied it to recent years of data and observed a leap in expected goals, which shows that shooters are taking higher quality shots. We are very curious to see if this trend continues and how far it goes in future years. Or will it bounce back? Time will tell.

Although we found some interesting trends, it is clear that our data is quite imperfect. Thankfully, Sam Ventura of the Buffalo Sabres was able to provide us with a sample of privately tracked data so we could compare the results. While the variables aren’t one to one, we believe our results were close enough to hold water.

We are very excited to continue working with Sam on this project, so expect more from us soon.

Acknowledgements

We’d like to give a huge thank you to Sam Ventura for his guidance throughout the research process. We’d also like to thank our program instructors Meg Ellingwood and Shamindra Shrotriya as well as our TA Quang Nguyen.