In the 2022-2023 NHL season, Connor McDavid and Michael Bunting were credited with leading the league in penalties drawn, with 45 and 43, respectively. However, does this also make them the top players at drawing power plays, or were their penalties drawn offset by penalties taken? This report focuses on finding which players in the league are the most successful at drawing power plays.
A power play in hockey occurs when one or more players from the opposing team are serving penalties, granting the advantaged team an increased number of players on the ice. This numerical advantage allows the advantaged team to gain a favorable position to potentially score goals and gain control of the game.
Being on a power play is valuable because it increases scoring opportunities compared to playing at even strength. Having an extra player on the ice gives teams a higher chance of creating quality scoring opportunities and converting them into goals. Last season's league-average power-play percentage of 21.1% illustrates the value of the numerical advantage. As a result, teams value power plays, and knowing which players help put their team on a power play is beneficial.
For this project, we downloaded play-by-play data from the R package hockeyR. We used the data from the past ten NHL seasons, 2013-2014 through 2022-2023.
The play-by-play data from the hockeyR package contains a total of 111 variables and over 4.1 million entries. To streamline our analysis, we extracted 34 relevant variables from the built-in hockeyR dataset, utilizing only approximately 29 of them for our current investigations. We also created 11 new variables necessary for our analysis.
Variables | Description |
---|---|
game_id | Unique ID for Game |
period | Number representing the period |
event | The event that is happening at that given moment |
type | Gives types for certain events such as penalty type |
description | String describing event |
event_team | Team executing the event |
period_seconds | Seconds elapsed in the period |
strength_state | The id for strength during event (PP, EV, or SH) |
p1_name | First player involved in event |
p1_id | Unique ID of first player |
p2_name | Second player involved in event |
p2_id | Unique ID of second player |
p3_name | Third player involved in event |
p3_id | Unique ID of third player |
home_skaters | Number of players on the ice for the home team, excluding goalie |
away_skaters | Number of players on the ice for the away team, excluding goalie |
home_team | Home team abbreviation |
away_team | Away team abbreviation |
season | Season years |
season_type | Letter representing either regular season or playoffs |
Variables | Description |
---|---|
row_num | Row number (fixed) |
p1_team | Team of first player |
p2_team | Team of second player |
strength_diff | Difference in number of players on the ice (home - away) |
Penalty_on | Player who took a penalty during the grouped time |
Penalty_on_id | Unique ID of penalty on player |
Penalty_on_team | Team of penalty on player |
Drew_by | Player who drew a penalty during the grouped time |
Drew_by_id | Unique ID of drew by player |
Drew_by_team | Team of drew by player |
PP_penalty_player | Player whose drawn penalty resulted in a power play during the grouped time |
PP_penalty_player_id | Unique ID of power play penalty player |
PP_penalty_team | Team of power play penalty player |
Creating the penalty on and drew by variables required a substantial amount of data wrangling. Intuitively, you might expect player 1 to be the player who took the penalty and player 2 to be the player who drew it. This is not always the case: in some penalties, player 2 is the player who served the penalty rather than the one who drew it. We accounted for this by detecting the keywords “against” and “served by” in the description column to decide which player is assigned the penalty on and which is credited with drawing the penalty.
Most of the penalty events have a pattern in the description column. There are four different patterns. In order to get the penalty on and drew by players in a penalty event, we created the variables depending on the combinations of “against” and “served by” detected in the description. Table 3 shows examples of description patterns with the corresponding players involved in the event, as well as the results of penalty on and drew by variables.
Description | Player 1 | Player 2 | Player 3 | Penalty On | Drew By |
---|---|---|---|---|---|
Ryan Johansen Slashing against Noah Gregor | Ryan.Johansen | Noah.Gregor | NA | Ryan Johansen | Noah Gregor |
Kiefer Sherwood Delaying Game | Kiefer.Sherwood | NA | NA | Kiefer Sherwood | NA |
Nick Ritchie Roughing against Evgeni Malkin served by Zack Kassian | Nick.Ritchie | Evgeni.Malkin | Zack.Kassian | Nick Ritchie | Evgeni Malkin |
Jack Campbell Delaying Game - Puck over glass served by Connor McDavid | Jack.Campbell | Connor.McDavid | NA | Jack Campbell | NA |
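The keyword logic described above can be sketched as follows. This is a minimal Python illustration, not the project's own R code: the function name `assign_penalty_roles` and the `served_by` output are hypothetical, and the inputs are assumed to follow the `First.Last` form shown in the table.

```python
def assign_penalty_roles(description, p1, p2=None, p3=None):
    """Assign penalty roles from the keywords found in the description."""
    has_against = " against " in description
    has_served_by = " served by " in description

    def clean(name):
        # Player columns look like "Ryan.Johansen"; show them as "Ryan Johansen".
        return None if name is None else name.replace(".", " ")

    # In all four patterns, player 1 is the player who took the penalty.
    penalty_on = clean(p1)
    # Player 2 drew the penalty only when "against" appears.
    drew_by = clean(p2) if has_against else None
    # The serving player is player 3 when someone also drew the penalty,
    # otherwise player 2.
    served_by = clean(p3 if has_against else p2) if has_served_by else None
    return {"penalty_on": penalty_on, "drew_by": drew_by, "served_by": served_by}


example = assign_penalty_roles(
    "Jack Campbell Delaying Game - Puck over glass served by Connor McDavid",
    "Jack.Campbell",
    "Connor.McDavid",
)
```

In this sketch, Connor McDavid is recorded as serving the penalty and is not credited with drawing it, which matches the last row of the table.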
IMPORTANT: For this project, when multiple penalties occur simultaneously, players are credited only once for a penalty on, a penalty drawn, and a power play drawn. In other words, we treat multiple penalties that happen at the same time as one event. We are interested in which players draw the most power plays, so our focus is on how often a player draws penalties and whether this changes the game state to a power play. If a player commits or draws multiple penalties at the same moment in a game, they are credited once for penalty on or drew by, respectively. Drawing multiple penalties at once does not lead to multiple power plays; it leads to a single power play, although it may change the strength of that power play, depending on the situation.
Period | Seconds | Penalty On | Penalty On ID | Drew by | Drew by ID | Drew PP | Drew PP ID |
---|---|---|---|---|---|---|---|
3 | 697 | Jason Zucker | 8475722 | Brendan Lemieux | 8477962 | Brendan Lemieux | 8477962 |
3 | 697 | Brendan Lemieux | 8477962 | Tristan Jarry | 8477465 | NA | NA |
3 | 697 | Jason Zucker | NA | Brendan Lemieux | NA | NA | NA |
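The once-per-event crediting rule can be sketched as below. This is a simplified Python illustration under stated assumptions: the function name `count_once_per_event`, the dictionary keys, and the `game_id` value are hypothetical, and the sketch covers only penalties on and penalties drawn, not the step that decides whether a power play resulted.

```python
from collections import Counter


def count_once_per_event(penalty_rows):
    """Credit each player at most once per role for simultaneous penalties."""
    taken, drawn = Counter(), Counter()
    seen = set()
    for row in penalty_rows:
        # Penalties in the same game, period, and second form one event.
        event = (row["game_id"], row["period"], row["period_seconds"])
        for role, counts in (("penalty_on", taken), ("drew_by", drawn)):
            player = row[role]
            if player is None:
                continue
            key = (event, role, player)
            if key not in seen:
                seen.add(key)
                counts[player] += 1
    return taken, drawn


# The three simultaneous penalties from the table above (game_id is made up).
rows = [
    {"game_id": 1, "period": 3, "period_seconds": 697,
     "penalty_on": "Jason Zucker", "drew_by": "Brendan Lemieux"},
    {"game_id": 1, "period": 3, "period_seconds": 697,
     "penalty_on": "Brendan Lemieux", "drew_by": "Tristan Jarry"},
    {"game_id": 1, "period": 3, "period_seconds": 697,
     "penalty_on": "Jason Zucker", "drew_by": "Brendan Lemieux"},
]
taken, drawn = count_once_per_event(rows)
```

Here Jason Zucker's second penalty does not add to his count, and Brendan Lemieux is credited with one penalty drawn rather than two.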
To calculate per 60 statistics from the counts we found, we needed players’ games played and time on ice. We scraped this data from Natural Stat Trick.
Natural Stat Trick provides players’ time on ice and games played for the regular season and playoffs. The site can also split the data by team, so a player who was traded during the season has separate statistics for each team they played on.
Variables | Description |
---|---|
player_id | Player's unique 7-digit ID |
player | Full name of player |
team | Team player was on when procuring these statistics |
position | Position of player (forward, defense, or goalie) |
TOI_reg | Time on ice for the regular season |
GP_reg | Games played in the regular season |
TOI_po | Time on ice for the playoffs |
GP_po | Games played in the playoffs |
TOI_tot | Time on ice total (Regular Season + Playoffs) |
GP_tot | Games played total (Regular Season + Playoffs) |
season | Years of season |
After calculating the counts from the play-by-play data and joining them with the players’ statistics for each team, we calculated the per 60 metrics for penalties on, drew by, and power plays drawn.
The equation to calculate this is \(\text{Per 60 Stat} = \dfrac{\text{Total Count}}{\text{Total Time On Ice}} \times 60\), where time on ice is measured in minutes.
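As a worked illustration of the formula, here is a minimal Python sketch. The function name `per_60` and the example numbers are hypothetical, and time on ice is assumed to be in minutes.

```python
def per_60(total_count, toi_minutes):
    """Scale a raw count to a rate per 60 minutes of ice time."""
    if not toi_minutes:
        # No ice time recorded: the rate is undefined.
        return None
    return total_count * 60 / toi_minutes


# Hypothetical player: 12 power plays drawn in 480 minutes of ice time.
rate = per_60(12, 480.0)  # 1.5 power plays drawn per 60 minutes
```

Returning `None` for zero ice time keeps such players out of rate rankings instead of raising a division error.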
Variables | Description |
---|---|
season | Years of season |
season_type | Letter representing season type (R, P, T) |
player_id | Player's unique 7-digit ID |
name | Full name of player |
team | Team player was on when procuring these statistics |
Penalty_on_count | Raw count of penalty on |
Drew_by_count | Raw count of drew by |
PP_draw_count | Raw count of power plays drawn |
Penalties_sum | Sum of penalties |
position | Position of player (forward, defense, or goalie) |
TOI_reg | Time on ice for the regular season |
GP_reg | Games played in the regular season |
TOI_po | Time on ice for the playoffs |
GP_po | Games played in the playoffs |
TOI_tot | Time on ice total (Regular Season + Playoffs) |
GP_tot | Games played total (Regular Season + Playoffs) |
Penalty_on_per_60 | Per 60 metric for penalty on |
Drew_by_per_60 | Per 60 metric for drew by |
PP_draw_per_60 | Per 60 metric for power plays drawn |
season_type_words | String representing season type |
Since we created a new per 60 metric to apply to players, we were interested in its repeatability, or consistency. Does a player’s power plays drawn per 60 in one season indicate what his metric will be the next season?
In order to evaluate how repeatable this metric is, we chose to do a simple linear regression model. Since paired data are required to run the linear model, we filtered out the data where the power plays drawn per 60 metrics are NAs or players who never drew power plays. We also specifically only ran this for the regular season stats for players who played more than ten games and are not goalies. After running this model, we looked at the r-squared outputs from season to season to interpret how much of the variability of the power plays drawn per 60 metric is accounted for by the players’ previous season metric.
The simple linear regression model: \(\text{PP Drawn}_{x} = \beta_{0} + \beta_{1}\,\text{PP Drawn}_{x-1}\), where x is the response season and x - 1 is the previous (explanatory) season.
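The pairing and fitting steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's R code: the helper names and the rates keyed by player ID are made up, and the r-squared is computed with the closed form for one predictor.

```python
def pair_seasons(previous, current):
    """Keep players with a non-missing, non-zero rate in both seasons."""
    pairs = []
    for player_id, prev_rate in previous.items():
        cur_rate = current.get(player_id)
        if prev_rate and cur_rate:  # drops None (NA) and 0 (never drew a power play)
            pairs.append((prev_rate, cur_rate))
    return pairs


def simple_linear_fit(x, y):
    """Ordinary least squares with one predictor: intercept, slope, r-squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical power plays drawn per 60, keyed by player ID.
previous = {1: 0.8, 2: 1.2, 3: None, 4: 0.5, 5: 1.6}
current = {1: 0.9, 2: 1.0, 3: 0.7, 4: 0.0, 5: 1.4, 6: 1.1}

pairs = pair_seasons(previous, current)
x = [p[0] for p in pairs]
y = [p[1] for p in pairs]
intercept, slope, r_squared = simple_linear_fit(x, y)
```

Players 3, 4, and 6 drop out because they lack a usable rate in one of the two seasons, which mirrors the paired-data requirement described above.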
We are thrilled to introduce our interactive Shiny app, which provides a comprehensive overview of players’ performance in drawing power plays and penalties, along with teams’ patterns in taking and drawing penalties over the past ten seasons. You can access the app here.
This report focuses on players’ and teams’ performance during the 2022-2023 regular season. We encourage you to explore the Shiny app for a more in-depth look at the past ten seasons through interactive visualizations and tables of player and team statistics.
Table 5 shows the r-squared values from running the linear regression model from season to season. On average, 30.5% of the variability in the power plays drawn per 60 metric is accounted for by the previous season’s metric.
Response Season | Explanatory Season | R-Squared |
---|---|---|
2022-2023 | 2021-2022 | 0.324 |
2021-2022 | 2020-2021 | 0.279 |
2020-2021 | 2019-2020 | 0.313 |
2019-2020 | 2018-2019 | 0.301 |
2018-2019 | 2017-2018 | 0.284 |
2017-2018 | 2016-2017 | 0.323 |
2016-2017 | 2015-2016 | 0.327 |
2015-2016 | 2014-2015 | 0.281 |
2014-2015 | 2013-2014 | 0.311 |
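The 30.5% average can be checked directly from the table values with a few lines of Python (the variable names are illustrative):

```python
# R-squared values from the table above, newest season pair first.
r_squared_values = [0.324, 0.279, 0.313, 0.301, 0.284, 0.323, 0.327, 0.281, 0.311]

mean_r_squared = sum(r_squared_values) / len(r_squared_values)
rounded = round(mean_r_squared, 3)  # 0.305
```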
Out of curiosity, we added position as an explanatory variable to the model (forward or defense, since goalies were already filtered out). We were interested in seeing whether this increases the r-squared.
This changes our model equation to: \(\text{PP Drawn}_{x} = \beta_{0} + \beta_{1}\,\text{PP Drawn}_{x-1} + \beta_{2}\,\text{Position}_{x-1}\), where x is still the response season and x - 1 is the previous (explanatory) season.
These results show that adding position does increase the r-squared, though only slightly. Averaging the values in the table below gives 31.5%, about one percentage point above the 30.5% of the simple model. Hence, on average, 31.5% of the variability in the power plays drawn per 60 metric is explained by the previous year’s metric and the player’s position in the previous year.
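To make the comparison concrete, the sketch below fits both models by ordinary least squares in plain Python and compares their r-squared values. It is an illustration only: the function name `ols_r_squared`, the 0/1 coding of position (1 = forward), and all data values are assumptions, not the project's code or results.

```python
def ols_r_squared(predictor_rows, y):
    """R-squared of an OLS fit with an intercept, via the normal equations."""
    X = [[1.0] + list(row) for row in predictor_rows]
    k = len(X[0])
    # Build the augmented system [X'X | X'y].
    M = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    beta = [M[i][k] / M[i][i] for i in range(k)]
    fitted = [sum(b * xi for b, xi in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data: previous-season rate, forward indicator, next-season rate.
prev_rate = [0.2, 0.5, 0.9, 1.1, 0.3, 0.7]
is_forward = [0, 1, 1, 1, 0, 0]
next_rate = [0.3, 0.6, 0.8, 1.2, 0.2, 0.5]

r2_simple = ols_r_squared([[p] for p in prev_rate], next_rate)
r2_with_position = ols_r_squared(list(zip(prev_rate, is_forward)), next_rate)
```

Adding a predictor to a least-squares model can never lower the r-squared on the same data, so the useful question is how large the gain is, not whether one exists.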
Response Season | Explanatory Season | R-Squared |
---|---|---|
2022-2023 | 2021-2022 | 0.329 |
2021-2022 | 2020-2021 | 0.282 |
2020-2021 | 2019-2020 | 0.322 |
2019-2020 | 2018-2019 | 0.328 |
2018-2019 | 2017-2018 | 0.289 |
2017-2018 | 2016-2017 | 0.326 |
2016-2017 | 2015-2016 | 0.345 |
2015-2016 | 2014-2015 | 0.292 |
2014-2015 | 2013-2014 | 0.323 |
According to the Penalties Table, the regular season of 2022-2023 witnessed the following top 5 players with the highest power play drawn per 60 rates (for players with more than 10 games played):
On the other hand, the top 5 players with the highest raw count of power play drawn during the regular season 2022-2023 were (for players with more than 10 games played):
Moving on to penalty draws, the top 5 players with the highest penalty drawn per 60 were:
Finally, the top 5 players with the highest penalty drawn raw count were:
According to the bar charts, the top 5 types of penalties drawn that lead to power plays in the regular season of 2022-2023 are as follows:
These penalty types are considered clear-cut and obvious to the referees, resulting in the most power play opportunities.
It is worth noting that these statistics are generated from player-level data. If a player draws a penalty leading to a power play, that specific penalty type is counted once. Therefore, the sum of these statistics will be greater than the actual number of power plays, since every penalty in the same event that leads to a power play is credited.
While the top players with the highest powerplay-drawn count match the players with the most penalty-drawn, the raw count alone does not accurately reflect a player’s ability to draw power plays or penalties since players who spend more ice time usually have more opportunities to draw penalties and subsequently generate power plays. To account for this, we calculate the power play and penalty draw per 60 rates. This standardized approach balances the effect of players’ time on ice and provides a more accurate understanding of their ability to draw power plays and penalties.
Upon analyzing the per 60 rates, we find that none of the top 5 players based on power play drawn raw count rank in the top 5 for power play drawn per 60. This emphasizes the importance of considering the per-60 rate when evaluating a player’s performance in this aspect.
For example, Mark Friedman from the Pittsburgh Penguins played only 23 games in the 2022-2023 regular season, with a powerplay-drawn raw count of 10, which places him at rank 222 based on powerplay-drawn raw count. However, when considering his lower time on ice, Friedman emerges as a top 3 player in drawing power plays per 60. Therefore, analyzing the per 60 rates provides a more comprehensive and fair evaluation of players’ contributions to power plays.
From the scatter plots, two distinct patterns emerge, providing valuable insights into players’ behavior regarding penalties. Firstly, there is a clear positive correlation between the penalty draw count/per 60 and penalty take count/per 60. Players who draw more penalties are also more likely to take penalties. This correlation intuitively aligns with our expectations, serving as a self-check to validate the credibility of our findings.
The more significant observation lies in the relationship between players’ positions and their involvement in penalties. When considering players who take the same number of penalties, forwards draw a higher number of penalties compared to defense players. This pattern is evident in both the per 60 rate plot and the raw count plot. Forwards are more active in generating penalty opportunities due to their roles on ice. Furthermore, when examining the top players who excel in drawing power plays (both in per 60 and raw count), we find that they are predominantly forwards. This finding reinforces the idea that forwards significantly impact their team’s power play opportunities. Their ability to draw penalties and create power play situations can be a game-changer, highlighting the importance of offensive contributions from forwards.
The League Overview (raw count) scatterplot shows a clear positive correlation between penalties taken and penalties drawn. This pattern may be attributable to the NHL’s game management approach, wherein referees strive to balance the penalty numbers for both teams within a game. Therefore, analyzing players’ ability to draw actual power plays emerges as a valuable source of information for teams seeking a competitive edge.
Moreover, the game management strategy opens up the possibility of predicting the timing of upcoming penalties and the players likely to be called upon, which could be a future route of this research project.
To ensure data relevance in the playoffs, we adopt a 2-game played threshold for filtering players, considering the fewer games played during this period than the regular season (which has a 10-game threshold). However, this smaller threshold might introduce biased per-60 rates for players with limited ice time. Moreover, both the regular season and playoff thresholds could unintentionally exclude players with exceptional penalty-drawing or powerplay skills but limited games played.
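The games-played filter can be sketched as below. This Python illustration makes several assumptions: the function and field names are hypothetical, goalies are assumed to be coded as `"G"`, and both thresholds are treated as strict (more than 10 regular-season games, more than 2 playoff games), which the text states explicitly only for the regular season.

```python
# Assumed thresholds per season type ("R" = regular season, "P" = playoffs).
MIN_GAMES = {"R": 10, "P": 2}


def eligible(player, season_type):
    """Keep skaters who played more than the threshold number of games."""
    if player["position"] == "G":  # assumed code for goalies
        return False
    return player["games_played"] > MIN_GAMES[season_type]


# Hypothetical players.
players = [
    {"name": "Skater A", "position": "F", "games_played": 23},
    {"name": "Skater B", "position": "D", "games_played": 10},
    {"name": "Goalie C", "position": "G", "games_played": 50},
]
kept = [p["name"] for p in players if eligible(p, "R")]
```

Under these assumptions, a skater with exactly 10 regular-season games is excluded, which shows how sensitive borderline cases are to the choice of cutoff.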
The penalty-drawn and powerplay-drawn data are obtained through string analysis of the description column in the hockeyR dataset, which unfortunately is documented inconsistently. As a result, our dataset may contain errors in penalty counts and types.
Our current focus lies in modeling the time distribution of each type of penalty called during the period, alongside the time distribution of goals occurring during power plays. By leveraging this data, we aim to predict potential increases or losses in power play opportunities that could significantly impact game success if teams are allowed to complete their penalty time after the end of a period or game.
We are currently developing a penalty types breakdown for individual players’ powerplay-drawn, building upon our existing bar charts. In essence, we aim to identify the top penalty types contributing to each player’s powerplay-drawn. Our prototype for this analysis takes shape as follows:
Another crucial aspect of our work involves modeling the patterns of types of penalties called for each season. Our objective is to predict the penalty that could be the most frequently called in the upcoming season.
We are committed to regularly updating and enhancing the ShinyApp, optimizing the user experience, and ensuring its relevance and accuracy.
We will modify the data pipeline and resolve existing data-related issues in the current dataset.
We express our deepest gratitude to our project advisors, Katerina Wu, Caleb Peña, and Jacob Pavlovich from the Pittsburgh Penguins, for generously dedicating their time to meet with us and providing invaluable guidance throughout the project.
We sincerely thank our instructors, Meg Ellingwood, and Shamindra Shrotriya, for their support and encouragement. Special thanks go to our TAs Quang Nguyen and Nick Kissel for their help with our inquiries and Dr. Ron Yurko for overseeing this program.
The slopes below describe the relationship between penalties taken per 60 and penalties drawn per 60, by team. For each team in the 2022-2023 regular season, we fit a linear regression model of the form \(\widehat{\text{Penalty Drawn per 60}} = \beta_0 + \beta_1 \cdot \text{Penalty Taken per 60}\).
Team | Slope |
---|---|
PIT | 1.0538653 |
BOS | 1.0030632 |
WSH | 0.8891008 |
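The per-team fit can be sketched in Python as follows. The team labels, data values, and the function name `slopes_by_team` are hypothetical and chosen only to show the grouping and slope calculation. They are not the project's data.

```python
from collections import defaultdict


def slopes_by_team(rows):
    """Fit drawn-per-60 on taken-per-60 separately for each team."""
    by_team = defaultdict(list)
    for team, taken, drawn in rows:
        by_team[team].append((taken, drawn))
    slopes = {}
    for team, points in by_team.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slopes[team] = sxy / sxx  # least-squares slope for this team
    return slopes


# Hypothetical (team, penalties taken per 60, penalties drawn per 60) rows.
rows = [
    ("AAA", 0.5, 0.6), ("AAA", 1.0, 1.1), ("AAA", 1.5, 1.6),
    ("BBB", 0.5, 0.7), ("BBB", 1.0, 0.9), ("BBB", 1.5, 1.1),
]
slopes = slopes_by_team(rows)
```

In this made-up example, team AAA has a slope near 1.0 and team BBB a slope near 0.4, so an extra penalty taken per 60 is associated with a larger rise in penalties drawn for AAA.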
In the League Overview per 60 scatterplot, we can observe variation in how teams perform in drawing penalties. The Pittsburgh Penguins, Boston Bruins, and Washington Capitals stand out with the highest slope coefficients for penalties taken per 60 (with penalties drawn per 60 as the response). Taking the Penguins as an example, if penalties taken per 60 increases by 1, the estimated penalties drawn per 60 increases by 1.054, based on data from the 2022-2023 regular season. Because the Penguins’ slope is the highest, each additional penalty taken per 60 is associated with a larger increase in penalties drawn per 60 than for other teams, which suggests more opportunities to draw a power play.
Katherine (Shaojun) Gong, Mount Holyoke College, gong24s@mtholyoke.edu
Bethany Gozalez, University of Indianapolis, bgon5505@gmail.com