Goaltending is vitally important in the NHL. A goalie’s job is simple: stop the other team from scoring. Good (and timely) goaltending can be the difference between a mediocre team and a team that goes on a deep playoff run. Many of the best “underdog” or “Cinderella” playoff runs can be attributed to a goaltender getting hot. For example, just this year, the eighth-seeded Florida Panthers made a run to the Stanley Cup Final on the back of Sergei Bobrovsky, who put up remarkable numbers in the playoffs.
Individual goaltending is hard, if not impossible, to predict on a year-to-year basis. The “best” goalies seem to change each year (a total of 12 different goalies have been Vezina Trophy finalists over the past 5 seasons, with only 2 appearing more than once). It isn’t uncommon for a goalie to follow up a spectacular year with an unremarkable one, or vice versa. Take Jacob Markstrom, for example, who in 2021-22 was a finalist for the Vezina Trophy with a .922 save percentage (SV%) and a 2.22 goals against average (GAA). The following season, he fell off a cliff, posting an .892 SV% and a 2.92 GAA. While individual goaltending may be hard to predict, one might expect league averages to be similar from year to year. Over the last handful of seasons, however, a striking anomaly has emerged.
Season | SV | SOG | G | SV% |
---|---|---|---|---|
2012 | 66638 | 72945 | 6307 | 0.9135 |
2013 | 38143 | 41827 | 3684 | 0.9119 |
2014 | 67333 | 73682 | 6349 | 0.9138 |
2015 | 67050 | 73307 | 6257 | 0.9146 |
2016 | 66587 | 72784 | 6197 | 0.9149 |
2017 | 67638 | 74046 | 6408 | 0.9135 |
2018 | 73774 | 80872 | 7098 | 0.9122 |
2019 | 72371 | 79540 | 7169 | 0.9099 |
2020 | 61522 | 67638 | 6116 | 0.9096 |
2021 | 46995 | 51764 | 4769 | 0.9079 |
2022 | 74837 | 82511 | 7674 | 0.9070 |
2023 | 73728 | 81535 | 7807 | 0.9042 |
As you can see, league-wide save percentage has been on a significant decline since 2016. In just eight seasons, save percentage has fallen from nearly .915 to a mere .904 in 2022-23. What’s going on here? That is what we will attempt to uncover.
We used two data sets for our analysis. The first was pulled from Hockey Reference and includes yearly stats for every goalie who appeared in an NHL game from 2011-12 to 2022-23. Important variables in this data set include season, games played (GP), games started (GS), goals against (GA), shots against (SA), saves (SV), and save percentage (SV%).
Player | Age | Team | Season | GP | GS | W | L | OT | GA | SA | SV | SV% | GAA |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Connor Hellebuyck | 29 | WPG | 2023 | 64 | 64 | 37 | 25 | 2 | 157 | 1964 | 1807 | 0.920 | 2.49 |
Juuse Saros | 27 | NSH | 2023 | 64 | 63 | 33 | 23 | 7 | 171 | 2099 | 1928 | 0.919 | 2.69 |
Alexandar Georgiev | 26 | COL | 2023 | 62 | 62 | 40 | 16 | 6 | 156 | 1904 | 1748 | 0.918 | 2.53 |
Jake Oettinger | 24 | DAL | 2023 | 62 | 61 | 37 | 11 | 11 | 144 | 1776 | 1632 | 0.919 | 2.37 |
Ilya Sorokin | 27 | NYI | 2023 | 62 | 60 | 31 | 22 | 7 | 140 | 1838 | 1698 | 0.924 | 2.34 |
Jordan Binnington | 29 | STL | 2023 | 61 | 60 | 27 | 27 | 6 | 194 | 1826 | 1632 | 0.894 | 3.31 |
Our second data set is event-level play-by-play data courtesy of the hockeyR package (https://github.com/danmorse314/hockeyR) created by Dan Morse. The package scrapes and cleans play-by-play data from the NHL website. We used it to scrape data from the 2011-12 through 2022-23 seasons, and the resulting data set had over 4 million rows of event data. Important variables in this data set include game ID, event type, x and y coordinates of each event, strength code, shot distance, and shot angle (the full data set consists of over 100 attributes).
event_type | event_team | event_team_type | period | period_time | game_seconds_remaining | shot_distance | shot_angle | x_fixed | y_fixed |
---|---|---|---|---|---|---|---|---|---|
GAME_SCHEDULED | NA | NA | 1 | 00:00 | 3600 | NA | NA | NA | NA |
FACEOFF | Boston Bruins | home | 1 | 00:00 | 3600 | NA | NA | 0 | 0 |
GIVEAWAY | Philadelphia Flyers | away | 1 | 00:14 | 3586 | NA | NA | -13 | 1 |
SHOT | Boston Bruins | home | 1 | 00:47 | 3553 | 58.7 | 13.8 | 32 | -14 |
MISSED_SHOT | Boston Bruins | home | 1 | 01:01 | 3539 | 37.1 | 3.1 | 52 | -2 |
HIT | Philadelphia Flyers | away | 1 | 01:12 | 3528 | NA | NA | 10 | -36 |
Note: Any time a season is referenced, it refers to the year the season concluded (for example, the 2011-12 season will appear as ‘2012’).
There are many potential explanations for the drastic decline in save percentage. The simplest hypothesis is that goalies have gotten worse. While this could be the case, our inclination is to believe that something else could be going on. So going forward, we will operate under the assumption that the relative skill of goaltenders has not changed.
Our first idea for what could explain the decline is simple: worse goalies are playing more. Each team has two goalies on the roster. Typically, one goalie will be the designated “starter” and get the majority of starts, while the “backup” will play a handful of games to give the starter some rest. In some cases, a team will run a “tandem” where both goalies are of a similar level and get a similar number of starts, although this is much less common. Simply put, it’s possible that teams are giving their starters more rest, which could lead to save percentage dropping on average.
Season | Avg Starts | Games in Season | Pct Started |
---|---|---|---|
2012 | 56.83333 | 82.00000 | 0.6930894 |
2013 | 32.09375 | 48.00000 | 0.6686198 |
2014 | 49.40000 | 82.00000 | 0.6024390 |
2015 | 54.83333 | 82.00000 | 0.6686992 |
2016 | 51.86667 | 82.00000 | 0.6325203 |
2017 | 54.61290 | 82.00000 | 0.6660110 |
2018 | 53.32258 | 82.00000 | 0.6502754 |
2019 | 50.83871 | 82.00000 | 0.6199843 |
2020 | 42.06452 | 70.58065 | 0.5959781 |
2021 | 32.54839 | 56.00000 | 0.5812212 |
2022 | 48.38235 | 82.00000 | 0.5900287 |
2023 | 47.71875 | 82.00000 | 0.5819360 |
Note: You’ll notice that in the 2020 season, the number of games played in the season was approximately 70.58. This is due to the season being cut short by the COVID-19 pandemic, leaving teams with differing numbers of games played.
We observe that this is indeed the case. In the table above, a starter was considered to be the goalie who played in the most games for a team over the course of a season. In general, the percentage of games a starter plays in has declined in a similar manner to save percentage. While this may be a contributing factor to the overall decline in save percentage, the figure below shows that there is something else going on.
Unsurprisingly, it is clear that starters’ save percentages are better than the backups’ (any non-starter). But, even so, for both backups and starters, a decline in save percentage is still present. Even if we narrow it down to the top 15 starters by year based on save percentage, we still observe a decline:
The mean difference in save percentage between starters and non-starters is approximately 0.0064, and the standard deviation is 0.0026. After calculating a confidence interval, we can say with 95% confidence that the “true” difference in save percentage is between 0.0014 and 0.0115. If we assume that this “true” difference is constant across all seasons, we can calculate how much we expect save percentage to have fallen given the shift in the proportion of games started by starters versus backups. The percentage of games that backups are starting has increased by 11 percentage points. Therefore, we would expect a total decline in overall save percentage of between 0.0002 and 0.0013. This expected decline would make up between 1.5 and 12.1 percent of the total observed decline in save percentage (0.011). So while this is playing into the decline, the majority of the drop-off is being caused by some other phenomenon.
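The arithmetic behind this estimate can be sketched as follows. This is only an illustration: the inputs are the rounded figures quoted above and the variable names are ours, so the resulting shares can differ by a few tenths of a percentage point from the reported ones, which presumably came from unrounded values.

```python
# Rounded inputs quoted in the text above.
gap_low, gap_high = 0.0014, 0.0115  # 95% interval for the starter-backup SV% gap
backup_share_increase = 0.11        # backups start 11 percentage points more games
observed_decline = 0.011            # total league-wide SV% decline


def expected_decline(gap, share_increase):
    """League SV% drop expected from shifting starts to backups,
    assuming the starter-backup gap is constant across seasons."""
    return gap * share_increase


low = expected_decline(gap_low, backup_share_increase)
high = expected_decline(gap_high, backup_share_increase)

print(f"expected decline: {low:.4f} to {high:.4f}")
print(f"share of observed decline: "
      f"{low / observed_decline:.1%} to {high / observed_decline:.1%}")
```

Even at the top of the interval, the shift toward backups accounts for only a small slice of the observed drop.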
We then shifted our attention to power plays. A power play occurs when one team has more players on the ice than the other team due to penalties. Is it possible that an increase in power plays is contributing to a decline in save percentage? As you can see in the figures below, the answer is a resounding no. In fact, power play opportunities have been trending down, until a recent spike in 2023. Power play shots per game hit a low in 2019 and have been rising since, despite the decline in power play opportunities.
We also observe a decline in save percentage for both even strength and power play situations. Interestingly, the decline in save percentage for even strength seems to start much later than that of power plays. Power play save percentage seems to start a decline after 2014, whereas save percentage for even strength remains mostly constant through 2017 before taking a drop.
Up to this point, we have ignored a key aspect of scoring goals: the shooter. After all, shooting percentage (goals/shots on goal) and save percentage will always add up to 1. If goalies aren’t getting worse, it’s not crazy to think that the decline in save percentage has more to do with shooters becoming more effective.
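As a quick check of that identity, here is a small sketch using two rows of the league-wide table from earlier (the function and variable names are ours):

```python
def rates(saves, shots_on_goal, goals):
    """Return (save percentage, shooting percentage) for one season."""
    return saves / shots_on_goal, goals / shots_on_goal


# (saves, shots on goal, goals) copied from the league-wide table above.
league_totals = {2016: (66587, 72784, 6197), 2023: (73728, 81535, 7807)}

for season, totals in league_totals.items():
    save_pct, shooting_pct = rates(*totals)
    # Every shot on goal is either saved or scored, so the two sum to 1.
    print(season, f"SV% {save_pct:.4f}", f"SH% {shooting_pct:.4f}",
          f"sum {save_pct + shooting_pct:.4f}")
```

Any drop in save percentage is therefore, by construction, an equal rise in shooting percentage.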
Not only do we want to know if this is the case, but we want to understand how they’ve gotten better. To accomplish this, we turned to building an expected goals model. An expected goals (xG) model is a prediction model that estimates the likelihood of a shot being a goal based on shots with similar characteristics in the past. Often, these predictive models are trained on many seasons of data. However, our aim was to build a separate xG model on each season of data and analyze the coefficients for certain shot attributes, so that we can observe whether the way players are scoring goals has fundamentally changed over time.
To get the play-by-play data ready to build the model, we first filtered out any shootout and penalty shot events and kept only Fenwick events (shots, missed shots, goals). We then created the following variables (with help from code given by the hockeyR package):
We believe that these are important factors to consider as to how goals are scored. We understand that creating these variables has limitations. Not every shot will be perfectly classified using this method, but we believe that it will be good enough to show any patterns in goal scoring.
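The filtering step described above can be sketched roughly like this. It is a minimal illustration, not our actual pipeline: the `event_type` values come from the sample rows shown earlier (with `GOAL` assumed for goal events), and the `period_type` field and its `SHOOTOUT` value are assumptions about the data layout. Penalty shots would need a similar check that is omitted here.

```python
FENWICK_EVENTS = {"SHOT", "MISSED_SHOT", "GOAL"}  # unblocked shot attempts


def fenwick_events(events):
    """Keep unblocked shot attempts taken outside of a shootout."""
    return [
        e for e in events
        if e["event_type"] in FENWICK_EVENTS
        and e.get("period_type") != "SHOOTOUT"  # assumed field name and value
    ]


# Rows abbreviated from the play-by-play sample shown earlier.
sample = [
    {"event_type": "FACEOFF", "period_time": "00:00"},
    {"event_type": "GIVEAWAY", "period_time": "00:14"},
    {"event_type": "SHOT", "period_time": "00:47"},
    {"event_type": "MISSED_SHOT", "period_time": "01:01"},
    {"event_type": "HIT", "period_time": "01:12"},
]

kept = fenwick_events(sample)
print([e["event_type"] for e in kept])
```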
Once the data were cleaned and transformed, we built logistic models for even strength xG and power play xG separately. We decided not to build a shorthanded model given the infrequency and inherent strangeness of shorthanded shot attempts. Here are the explanatory variables we used in each model:
Note: You’ll notice that we did not use the forechecking variable for the power play model. A very small percentage of Fenwick events on the power play were classified as forechecking. This makes intuitive sense, as teams generally try for controlled zone entries on the power play as opposed to a “dump and chase”.
After building the framework for the models, we trained a model for every individual season of data going back to the 2011-12 season.
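To make the "one model per season" idea concrete, here is a hedged sketch. It is not the code we ran: it fits a logistic model with a plain gradient descent written from scratch, uses only two of the predictors (distance and a rebound flag), and runs on a handful of invented toy shots. In practice a statistics package would do the fitting and also report standard errors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, goals, lr=0.2, steps=5000):
    """Fit [intercept, coef_1, ...] by batch gradient descent on the log loss."""
    weights = [0.0] * (len(features[0]) + 1)
    n = len(features)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(features, goals):
            z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Invented toy data: the same eight shot distances (in feet) for every group.
DISTANCES = [5, 10, 15, 20, 30, 40, 50, 60]
TOY_GOALS = {
    # season: (goal flags for non-rebound shots, goal flags for rebound shots)
    2012: ([1, 0, 1, 0, 0, 0, 0, 0], [1, 1, 1, 0, 1, 0, 0, 0]),
    2023: ([1, 1, 0, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 1, 0, 0]),
}


def fit_by_season(toy_goals):
    """Fit one model per season and return {season: {term: coefficient}}."""
    results = {}
    for season, (plain, rebound) in toy_goals.items():
        features, goals = [], []
        for is_rebound, flags in ((0, plain), (1, rebound)):
            for distance, goal in zip(DISTANCES, flags):
                # Distance is rescaled to tens of feet to keep the descent stable.
                features.append([distance / 10, is_rebound])
                goals.append(goal)
        b0, b_dist, b_reb = fit_logistic(features, goals)
        results[season] = {"intercept": b0, "distance_10ft": b_dist, "rebound": b_reb}
    return results


coefficients = fit_by_season(TOY_GOALS)
for season, terms in coefficients.items():
    print(season, {name: round(value, 2) for name, value in terms.items()})
```

Comparing the same term across the per-season dictionaries is the toy equivalent of the coefficient-by-season comparisons discussed next.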
We can see that for all years, the distance coefficient is negative for both even strength and power play. This means that as distance from the net increases, the log odds of scoring a goal decrease, which makes intuitive sense (shots from farther away yield fewer goals). In the even strength model, the distance coefficient shows no signs of systematic change across seasons. The power play coefficients increase from 2012 to 2019 before dropping back down, although all of the error bars overlap, so it could just be noise. Overall, these coefficients show that the likelihood of scoring a goal based on shot distance alone hasn’t changed much.
Interestingly, the data show a significant spike in the percentage of Fenwick events in the 0-10 foot range over the past few seasons for both even strength and power play. On the power play, there appears to be a steady increase in 20-30 and 30-40 ft shots.
Even more interesting is what happens when we dive into the save percentage on shots in certain ranges. For most distance bins on both even strength and power play, save percentage has stayed relatively consistent. However, we see a decline in the 20-30 and 30-40 ft ranges for both strengths, and in the 10-20 ft range for power play. This effect is even more apparent when widening the bins to 20 ft:
Over time, goalies have been saving 20-40 ft shots less and less. It is unlikely that the play of goalies has declined just for shots of this distance. What’s more likely is that shooters have gotten better at shooting these shots and they are taking more of them, as can be seen on power plays.
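The binning behind these comparisons can be sketched as below. The shots are invented toy values and the helper name is ours; the point is only to show how 10 ft and 20 ft bins are formed and how save percentage is computed within each.

```python
from collections import defaultdict


def save_pct_by_distance_bin(shots, bin_width=10):
    """shots: iterable of (shot_distance_ft, is_goal) for shots on goal.

    A distance exactly on a boundary (e.g. 10.0) falls in the upper bin.
    """
    totals = defaultdict(lambda: [0, 0])  # bin start -> [shots, goals]
    for distance, is_goal in shots:
        start = int(distance // bin_width) * bin_width
        totals[start][0] += 1
        totals[start][1] += int(is_goal)
    return {
        f"{start}-{start + bin_width} ft": 1 - goals / n
        for start, (n, goals) in sorted(totals.items())
    }


# Invented toy shots on goal: (distance in feet, was it a goal?)
toy_shots = [(8.0, True), (9.5, False), (24.0, False),
             (27.3, True), (35.0, False), (38.9, False)]

print(save_pct_by_distance_bin(toy_shots, bin_width=10))
print(save_pct_by_distance_bin(toy_shots, bin_width=20))
```

Wider bins pool more shots per group, which smooths out season-to-season noise at the cost of resolution.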
Similar to the distance coefficients, the shot angle coefficients for both even strength and power play are negative across all seasons. Shot angle is measured in degrees from the middle of the ice (a shot with angle 0 is right in front of the goalie, a shot with angle 90 is from the goal line). Therefore, it is no surprise that as angle increases, the log odds of scoring decrease. What’s interesting is the general upward trend (closer to 0) in the coefficients in both strength states. In general, this means that a shot from a wider angle is more likely to be a goal than it once was. Once again, this could show that shooters have improved at scoring from “tougher” angles. We can see that save percentage has declined for most angles above 20 degrees (even strength only; power play seems to be quite random).
The rebound coefficients are positive across all seasons prior to 2023 for both even strength and power play. This means that a rebound shot will increase the log odds of scoring. However, these coefficients have dropped over time. For even strength, there was a steady drop until 2019, when a significant drop occurred (the 95% confidence intervals for 2018 and 2019 do not overlap). This was followed by another massive drop in 2023, where the coefficient fell slightly below zero. This means that in 2023 a rebound had almost no positive or negative effect on the likelihood of a goal. Similar patterns can be observed in the power play model.
This large drop-off in 2023 coincides with a spike in the frequency of rebound events. However, this spike is accounted for only by missed shots. Shots and goals have remained steady. This is a strange anomaly, but it could be that the NHL’s play-by-play data has become more accurate over time. If this is the case, the coefficient for 2023 may be closer to the “true” effect that rebounds have on scoring.
The rush coefficients are positive across all seasons, which means shots defined as a rush lead to a higher chance of a goal. However, the rush coefficient shows almost no pattern across seasons, other than a drop in even strength for 2023, which again could point to some systematic change in the way the data are collected.
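For intuition, distance and angle can be recomputed from the event coordinates. The sketch below assumes that `x_fixed` and `y_fixed` are in feet from center ice, that the attacked net sits on the goal line 89 ft from center, and that the home team attacks the positive-x end; these assumptions reproduce the `shot_distance` and `shot_angle` values in the sample rows shown earlier. The function name is ours.

```python
import math

GOAL_LINE_X = 89.0  # feet from center ice to the goal line on an NHL rink


def shot_geometry(x, y, net_x=GOAL_LINE_X):
    """Return (distance_ft, angle_deg) from a shot location to the net center.

    Angle 0 is straight in front of the net; angle 90 is on the goal line.
    Pass net_x=-GOAL_LINE_X for a team attacking the negative-x end.
    """
    # Depth in front of the goal line (negative if the shot is from behind it).
    dx = (net_x - x) if net_x > 0 else (x - net_x)
    dy = abs(y)
    distance = math.hypot(dx, dy)
    angle = math.degrees(math.atan2(dy, dx))
    return distance, angle


# The two Boston (home) shot attempts from the sample play-by-play rows.
for x, y in [(32, -14), (52, -2)]:
    distance, angle = shot_geometry(x, y)
    print(f"x={x}, y={y}: distance {distance:.1f} ft, angle {angle:.1f} deg")
```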
Cross Ice coefficients are positive for all seasons and show no patterns. Once again we observe a spike in frequency in 2023. For even strength, save percentage seems to be gradually improving for cross ice events.
Forecheck coefficients (even strength only) are positive for all seasons but show no pattern over time.
The angle change coefficients for rebound shots are slightly positive for all seasons, meaning that a rebound with a larger angle change from the previous shot is more likely to end up as a goal. The coefficients for even strength fell and then came back up; however, this is likely just due to randomness.
At this point, we have seen signs that seem to indicate an increase in shot quality from shooters. To formally put this to the test, we built another logistic model using the same variables, but with one key difference: instead of building the model on each individual season, we trained the model on every season before 2018 (in most cases, we observed the most drastic changes in 2018 and beyond). We then used this model to predict expected goals and expected Fenwick save percentage (1 - Average xG). We were curious to know whether, on average, expected goals were increasing when judged by shot data from before the decline. If this is the case, we can reasonably say that shot quality has increased.
Note: Fenwick Save Percentage includes missed shots in the calculation, yielding a higher percentage (missed shots will never be goals).
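The two quantities can be written out as a short sketch. The counts and xG values below are invented toy numbers and the function names are ours; they only illustrate why the Fenwick version sits above ordinary save percentage and how the expected version follows from per-shot xG.

```python
def save_pct(goals, saved_shots):
    """Ordinary save percentage: shots on goal are saved shots plus goals."""
    return 1 - goals / (goals + saved_shots)


def fenwick_save_pct(goals, saved_shots, missed_shots):
    """Same idea, but missed shots are added to the denominator."""
    return 1 - goals / (goals + saved_shots + missed_shots)


def expected_fenwick_save_pct(xg_values):
    """1 minus the average xG over a set of Fenwick events."""
    return 1 - sum(xg_values) / len(xg_values)


# Toy counts: 1 goal, 8 saved shots, 3 missed shots.
print(round(save_pct(1, 8), 4), round(fenwick_save_pct(1, 8, 3), 4))

# Toy xG predictions for four unblocked attempts.
print(round(expected_fenwick_save_pct([0.02, 0.10, 0.30, 0.06]), 4))
```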
As you can see, when we apply this model to recent seasons, we see a decline in expected Fenwick save percentage very similar to the decline in observed save percentage. This shows that, on average, shot quality (as determined by 2012-17) has increased. The same trend can be observed on the power play:
The nature of the play-by-play data should be of some concern with regard to the results. First of all, we are unsure how accurate the data truly are. Some anomalies, such as the spike in rebound percentage, could point to the data becoming more accurate than they once were with regard to factors such as distance, angle, and time of the game. On top of that, without true labels for binary descriptors such as rebound, rush, and cross ice event, it is impossible to assign these accurately. We assigned these variables based on strict definitions, but in reality these events can take many different forms.
Our external advisor, Sam Ventura of the Buffalo Sabres, was kind enough to provide a sample of privately collected shot data. Due to the nature of how it is tracked, this data is much more accurate and includes true labels for the binary variables used in our model. The privately tracked data includes data from the 2016 season onwards.
The figure above shows that overall, 5v5 Fenwick save percentage is pretty consistent across both data sets, with the private data being slightly higher across all years. This is likely a result of more precision in tracking missed shots within the private data (more missed shots -> higher Fenwick save percentage).
The figures below show how the frequency of our predictor variables differs between the two data sets.
As you can see, there are some notable differences. Specifically, for the rush, cross ice, and forechecking variables, the privately tracked data has many more observations labeled. It’s no surprise that our “definitions” for these variables were unable to capture anything close to the full picture. Regardless of the differences, we were curious to see if our model results could hold water. So, we trained the “same” logistic model, this time using the privately tracked data (same variables used in the model despite the differences in labeling). Here’s how the coefficients compare:
In general, it appears that most of our variables follow the same pattern for both the private and public data. The largest difference is in the rush coefficients, and given how differently they were labeled, this is not a shock. Overall, these results suggest that our analysis of the public data has some merit. If our theory is correct and the NHL continues to become more accurate with its event tracking, it will be interesting to run this again on new years of data.
The decline in save percentage in the NHL brought up many hypotheses as to what could be causing it. Of course, it may be easy to assume goaltending is just getting worse. However, we chose to give goalies the benefit of the doubt and tried to uncover something else that may be driving this trend. We first examined whether this decline is due to starting goalies playing less. While it is true that backups have gotten more and more starts as time goes on, the difference in quality between starters and backups isn’t great enough for this to be the driving force. We next proposed the idea of more power plays, but immediately shot it down once the data showed that power play opportunities have actually been trending down. Our last hypothesis was that shot quality had improved over time. After building expected goal models on each season, we found some interesting trends in how goals were/are being scored. Save percentage has dropped for shots in the 20-40 foot range, as well as for shots with an angle of greater than 20 degrees.
After restructuring the model to be trained on the years before the decline, we applied it to recent years of data and observed a leap in expected goals, which shows that shooters are taking higher quality shots. We are very curious to see if this trend continues and how far it goes in future years. Or will it bounce back? Time will tell.
Although we found some interesting trends, it is clear that our data is quite imperfect. Thankfully, Sam Ventura of the Buffalo Sabres was able to provide us with a sample of privately tracked data so we could compare the results. While the variables aren’t one to one, we believe our results were close enough to hold water.
We are very excited to continue working with Sam on this project, so expect more from us soon.
We’d like to give a huge thank you to Sam Ventura for his guidance throughout the research process. We’d also like to thank our program instructors Meg Ellingwood and Shamindra Shrotriya as well as our TA Quang Nguyen.