Introduction

In the 2022-2023 NHL season, Connor McDavid and Michael Bunting were credited with leading the league in penalties drawn, with 45 and 43, respectively. However, does this also make them the top players at drawing power plays, or were their penalties drawn offset by penalties taken? This report identifies which players in the league are most successful at drawing power plays.

A power play in hockey occurs when one or more players from the opposing team are serving penalties, granting the advantaged team an increased number of players on the ice. This numerical advantage allows the advantaged team to gain a favorable position to potentially score goals and gain control of the game.

Being on a power play is valuable because it increases scoring opportunities compared to playing at even strength. With an extra skater on the ice, a team has a better chance of creating high-quality scoring chances and converting them into goals. Last season's league-average power-play percentage of 21.1% illustrates the value of the numerical advantage. As a result, teams value power plays, and knowing which players help put their team on a power play is beneficial.

Data

HockeyR Data

For this project, we downloaded play-by-play data from the R package hockeyR. We used the play-by-play data from the past ten NHL seasons, 2013-2014 through 2022-2023.

The play-by-play data from the hockeyR package contains 111 variables and over 4.1 million entries. To streamline our analysis, we extracted 34 relevant variables from the built-in hockeyR dataset, of which we currently use about 29. We also created 11 new variables necessary for our analysis.

Table 1. hockeyR Variables
Variables Description
game_id Unique ID for Game
period Number representing the period
event The event that is happening at that given moment
type Gives types for certain events such as penalty type
description String describing event
event_team Team executing the event
period_seconds Seconds elapsed in the period
strength_state The id for strength during event (PP, EV, or SH)
p1_name First player involved in event
p1_id Unique ID of first player
p2_name Second player involved in event
p2_id Unique ID of second player
p3_name Third player involved in event
p3_id Unique ID of third player
home_skaters Number of players on the ice for the home team, excluding goalie
away_skaters Number of players on the ice for the away team, excluding goalie
home_team Home team abbreviation
away_team Away team abbreviation
season Years of season
season_type Letter representing either regular season or playoffs

Table 2. Created Variables
Variables Description
row_num Row number (fixed)
p1_team Team of first player
p2_team Team of second player
strength_diff Difference in number of players on the ice (home - away)
Penalty_on Player who took a penalty during the grouped time
Penalty_on_id Unique ID of penalty on player
Penalty_on_team Team of penalty on player
Drew_by Player who drew a penalty during the grouped time
Drew_by_id Unique ID of drew by player
Drew_by_team Team of drew by player
PP_penalty_player Player whose drawn penalty resulted in a power play during the grouped time
PP_penalty_player_id Unique ID of power play penalty player
PP_penalty_team Team of power play penalty player

Creating Penalty On, Drew by, Power Play Penalty Player

Creating the penalty on and drew by variables required substantial data wrangling. Intuitively, one might expect player 1 to be the player who took the penalty and player 2 to be the player who drew it. This is not always the case: in some penalties, player 2 is instead the player who served the penalty. We accounted for this by detecting keywords such as “against” and “served by” in the description column to decide which player is assigned the penalty on and which is credited with drawing the penalty.

Most penalty events follow one of four patterns in the description column. To identify the penalty on and drew by players in a penalty event, we created the variables based on which combination of “against” and “served by” appears in the description. Table 3 shows examples of description patterns with the corresponding players involved in the event, as well as the resulting penalty on and drew by variables.

Table 3. Description Patterns
Description | Player 1 | Player 2 | Player 3 | Penalty On | Drew By
Ryan Johansen Slashing against Noah Gregor | Ryan.Johansen | Noah.Gregor | NA | Ryan Johansen | Noah Gregor
Kiefer Sherwood Delaying Game | Kiefer.Sherwood | NA | NA | Kiefer Sherwood | NA
Nick Ritchie Roughing against Evgeni Malkin served by Zack Kassian | Nick.Ritchie | Evgeni.Malkin | Zack.Kassian | Nick Ritchie | Evgeni Malkin
Jack Campbell Delaying Game - Puck over glass served by Connor McDavid | Jack.Campbell | Connor.McDavid | NA | Jack Campbell | NA
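
The keyword rule can be sketched in a few lines of Python. This is only an illustration of the logic as described, not the project's actual wrangling code: the function name `assign_roles` and the dot-to-space name cleanup are assumptions made for the example, and the inputs are the four rows of Table 3.

```python
def assign_roles(description, p1_name, p2_name=None):
    """Return (penalty_on, drew_by) for one penalty event.

    Hypothetical helper: player 1 is treated as the penalized player, and
    player 2 is credited with drawing the penalty only when the description
    contains the keyword "against".
    """
    def clean(name):
        # Names in the player columns look like "Ryan.Johansen".
        return name.replace(".", " ") if name else None

    penalty_on = clean(p1_name)
    # Without "against", player 2 (if present) only served the penalty.
    drew_by = clean(p2_name) if " against " in description else None
    return penalty_on, drew_by


table_3_rows = [
    ("Ryan Johansen Slashing against Noah Gregor",
     "Ryan.Johansen", "Noah.Gregor"),
    ("Kiefer Sherwood Delaying Game",
     "Kiefer.Sherwood", None),
    ("Nick Ritchie Roughing against Evgeni Malkin served by Zack Kassian",
     "Nick.Ritchie", "Evgeni.Malkin"),
    ("Jack Campbell Delaying Game - Puck over glass served by Connor McDavid",
     "Jack.Campbell", "Connor.McDavid"),
]

for description, p1, p2 in table_3_rows:
    print(assign_roles(description, p1, p2))
```

A simple dot-to-space replacement would mishandle names that contain periods of their own, so a real pipeline would likely join on player IDs instead.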

IMPORTANT: For this project, when multiple penalties occur simultaneously, a player is credited only once for a penalty on, a penalty drawn, and a power play drawn. In other words, we treat multiple penalties that happen at the same time as one event. Our interest is in which players draw the most power plays, so our focus is on the number of times a player draws penalties and whether this changes the game state to a power play. We apply this logic consistently to penalties on, penalties drawn, and power plays drawn. Even if a player draws multiple penalties at once, this leads to a single power play rather than several, although it may change the strength of the team’s power play, depending on the situation.

Table 3. Example of Penalties that occurred at the same time
Period | Seconds | Penalty On | Penalty On ID | Drew by | Drew by ID | Drew PP | Drew PP ID
3 | 697 | Jason Zucker | 8475722 | Brendan Lemieux | 8477962 | Brendan Lemieux | 8477962
3 | 697 | Brendan Lemieux | 8477962 | Tristan Jarry | 8477465 | NA | NA
3 | 697 | Jason Zucker | NA | Brendan Lemieux | NA | NA | NA
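
A minimal sketch of the "credit once per simultaneous event" rule follows, assuming an event is identified by game, period, and period seconds. The function name, the dictionary keys, and the `game_id` value are hypothetical; the three rows mirror the example table above, using names rather than IDs because one row has missing IDs.

```python
from collections import Counter


def count_once_per_event(rows, player_key):
    """Count each player at most once per (game, period, second) event."""
    seen = set()
    for row in rows:
        player = row[player_key]
        if player is None:
            continue
        # A set drops repeated credits for the same player at the same time.
        seen.add((row["game_id"], row["period"], row["period_seconds"], player))
    return Counter(player for *_, player in seen)


rows = [
    {"game_id": 1, "period": 3, "period_seconds": 697,
     "penalty_on": "Jason Zucker", "drew_by": "Brendan Lemieux"},
    {"game_id": 1, "period": 3, "period_seconds": 697,
     "penalty_on": "Brendan Lemieux", "drew_by": "Tristan Jarry"},
    {"game_id": 1, "period": 3, "period_seconds": 697,
     "penalty_on": "Jason Zucker", "drew_by": "Brendan Lemieux"},
]

penalty_on_counts = count_once_per_event(rows, "penalty_on")
drew_by_counts = count_once_per_event(rows, "drew_by")
print(penalty_on_counts)
print(drew_by_counts)
```

Here Jason Zucker appears twice as the penalized player at the same second but is credited with one penalty on, and Brendan Lemieux is credited with one penalty drawn.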

Natural Stat Trick Data

To calculate per 60 statistics from the counts we found, we needed each player’s games played and time on ice. We scraped this data from Natural Stat Trick.

Natural Stat Trick provides players’ time on ice and games played for the regular season and playoffs. The site also has the option of splitting a player’s data by team, so you can see a player’s statistics for each team they played on that season if they were traded during the season.

Table 4. Natural Stat Trick Variables
Variables Description
player_id Player's unique 8-digit ID
player Full name of player
team Team player was on when procuring these statistics
position Position of player (forward, defense, or goalie)
TOI_reg Time on ice for the regular season
GP_reg Games played in the regular season
TOI_po Time on ice for the playoffs
GP_po Games played in the playoffs
TOI_tot Time on ice total (Regular Season + Playoffs)
GP_tot Games played total (Regular Season + Playoffs)
season Years of season

Player Penalty Statistic Data

After calculating the counts from the play-by-play data and joining these counts with each player’s statistics for that team, we calculated the per 60 metrics for penalties on, penalties drawn, and power plays drawn.

The equation to calculate this is \(\text{Per 60 Stat} = \dfrac{\text{Total Count}}{\text{Total Time On Ice}} \times 60\)
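
As a quick illustration of the formula, the sketch below assumes time on ice is recorded in minutes. The function name and the sample numbers are made up for the example and are not taken from the data.

```python
def per_60(total_count, toi_minutes):
    """Rate per 60 minutes of ice time; None when there is no ice time."""
    if not toi_minutes:
        return None
    # Algebraically the same as (total_count / toi_minutes) * 60.
    return total_count * 60 / toi_minutes


# Hypothetical player: 10 power plays drawn in 300 minutes of ice time.
example_rate = per_60(10, 300)
print(example_rate)
```

Returning `None` for zero ice time keeps such rows out of later rankings rather than raising a division error.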

Table 5. Data Set Used to Create ShinyApp
Variables Description
season Years of season
season_type Letter representing season type (R, P, T)
player_id Player's unique 8-digit ID
name Full name of player
team Team player was on when procuring these statistics
Penalty_on_count Raw count of penalty on
Drew_by_count Raw count of drew by
PP_draw_count Raw count of power plays drawn
Penalties_sum Sum of penalties
position Position of player (forward, defense, or goalie)
TOI_reg Time on ice for the regular season
GP_reg Games played in the regular season
TOI_po Time on ice for the playoffs
GP_po Games played in the playoffs
TOI_tot Time on ice total (Regular Season + Playoffs)
GP_tot Games played total (Regular Season + Playoffs)
Penalty_on_per_60 Per 60 metric for penalty on
Drew_by_per_60 Per 60 metric for drew by
PP_draw_per_60 Per 60 metric for power plays drawn
season_type_words String representing season type

Methods

Since we created a new per 60 metric to apply to players, we were interested in its repeatability, or consistency. Does a player’s power plays drawn per 60 in one season indicate what his metric will be the next season?

To evaluate how repeatable this metric is, we fit a simple linear regression model. Since the linear model requires paired data, we filtered out rows where the power plays drawn per 60 metric is NA, as well as players who never drew a power play. We also restricted the analysis to regular season statistics for players who played more than ten games and are not goalies. After fitting the model, we looked at the r-squared values from season to season to interpret how much of the variability in the power plays drawn per 60 metric is accounted for by the player’s previous season metric.

The simple linear regression model: \(\text{PP Drawn}_{x} = \beta_{0} + \beta_{1}\,\text{PP Drawn}_{x-1}\) where x is the response season, and x - 1 is the previous season (explanatory).
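
The pairing and fitting steps can be sketched as follows. This is a standard-library illustration under stated assumptions, not the code used for the report: the player labels and rates are invented, and `fit_simple_ols` is a hypothetical helper that applies the usual least-squares formulas.

```python
def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r_squared)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot


# Invented per 60 rates keyed by player, for two consecutive seasons.
previous_season = {"A": 0.5, "B": 1.0, "C": 1.5, "D": 2.0, "E": None}
response_season = {"A": 0.7, "B": 0.9, "C": 1.6, "D": 1.8, "F": 1.1}

# Keep only players with a usable, non-zero value in both seasons.
paired = sorted(
    p for p in previous_season
    if p in response_season and previous_season[p] and response_season[p]
)
x = [previous_season[p] for p in paired]
y = [response_season[p] for p in paired]

b0, b1, r_squared = fit_simple_ols(x, y)
print(paired, round(b0, 3), round(b1, 3), round(r_squared, 3))
```

Players E and F drop out because each lacks a usable value in one of the two seasons, which is the pairing requirement described above.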

Results

Shiny App

We are thrilled to introduce our interactive Shiny app, which provides a comprehensive overview of players’ performance in drawing power plays and penalties, along with teams’ patterns in taking and drawing penalties over the past ten seasons. You can access the app here.

This report focuses on players’ and teams’ performance during the 2022-2023 regular season. We encourage you to explore the Shiny app for a more in-depth analysis of the past ten seasons. The app lets you explore various aspects of player and team statistics over that period through interactive visualizations and tables.

Metric Stickiness

Table 6 shows the r-squared values from fitting the linear regression model from season to season. These results can be interpreted as follows: on average, 30.5% of the variability in the power plays drawn per 60 metric we created is accounted for by the previous season’s metric.

Table 6. Regression Results
Response Season Explanatory Season R-Squared
2022-2023 2021-2022 0.324
2021-2022 2020-2021 0.279
2020-2021 2019-2020 0.313
2019-2020 2018-2019 0.301
2018-2019 2017-2018 0.284
2017-2018 2016-2017 0.323
2016-2017 2015-2016 0.327
2015-2016 2014-2015 0.281
2014-2015 2013-2014 0.311
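
The 30.5% average can be checked directly from the nine r-squared values in Table 6. The snippet is only an arithmetic check, and the variable names are chosen for the example.

```python
# R-squared values copied from Table 6, most recent response season first.
table_6_r_squared = [0.324, 0.279, 0.313, 0.301, 0.284,
                     0.323, 0.327, 0.281, 0.311]

mean_r_squared = sum(table_6_r_squared) / len(table_6_r_squared)
print(f"{mean_r_squared:.3f}")  # 0.305
```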

Out of curiosity, we added position as an explanatory variable to the model (forward or defense, since goalies were already filtered out). We were interested in seeing whether this increases the r-squared.

This changes our model equation to: \(\text{PP Drawn}_{x} = \beta_{0} + \beta_{1}\,\text{PP Drawn}_{x-1} + \beta_{2}\,\text{Position}_{x-1}\) where x is still the response season, and x - 1 is the previous season (explanatory).

These results show that adding position does increase the r-squared. The average r-squared increases by almost 7 percentage points, to 37.4%. Hence, on average, 37.4% of the variability in the power plays drawn per 60 metric is explained by the previous season’s metric and the player’s position in the previous season.

Table 7. Regression including Position Results
Response Season Explanatory Season R-Squared
2022-2023 2021-2022 0.329
2021-2022 2020-2021 0.282
2020-2021 2019-2020 0.322
2019-2020 2018-2019 0.328
2018-2019 2017-2018 0.289
2017-2018 2016-2017 0.326
2016-2017 2015-2016 0.345
2015-2016 2014-2015 0.292
2014-2015 2013-2014 0.323

Top Power Play Drawers

Penalties Table

According to the Penalties Table, the top 5 players by power plays drawn per 60 in the 2022-2023 regular season were (for players with more than 10 games played):

  1. Jakub Lauko (BOS) - 2.65
  2. Kevin Rooney (CGY) - 1.99
  3. Mark Friedman (PIT) - 1.81
  4. Joey Anderson (TOR) - 1.79
  5. Klim Kostin (EDM) - 1.78

On the other hand, the top 5 players by raw count of power plays drawn in the 2022-2023 regular season were (for players with more than 10 games played):

  1. Connor McDavid (EDM) - 38
  2. Nazem Kadri (CGY) - 37
  3. Elias Pettersson (VAN) - 35
  4. Nikita Kucherov (TBL) - 34
  5. Brad Marchand (BOS), Troy Terry (ANA), and Tim Stützle (OTT) - 30

Moving on to penalty draws, the top 5 players with the highest penalty drawn per 60 were:

  1. Wayne Simmonds (TOR) - 4.03
  2. Givani Smith (FLA) - 3.1
  3. Jakub Lauko (BOS) - 2.94
  4. Brendan Lemieux (LAK) - 2.62
  5. Jonah Gadjovich (SJS) - 2.48

Finally, the top 5 players with the highest penalty drawn raw count were:

  1. Connor McDavid (EDM) - 44
  2. Brad Marchand (BOS) - 41
  3. Elias Pettersson (VAN) and Michael Bunting (TOR) - 39
  4. Nazem Kadri (CGY), Pierre-Luc Dubois (WPG), and Brady Tkachuk (OTT) - 38
  5. Matthew Tkachuk (FLA) and Nikita Kucherov (TBL) - 36

Bar Charts

Power Play per 60
Power Play Raw Count
Offsetting Penalty Drawn per 60

Penalty Types and Power Play

According to the bar charts, the top 5 types of penalties drawn that lead to power plays in the regular season of 2022-2023 are as follows:

  1. Tripping - 1655
  2. Hooking - 1117
  3. Interference - 806
  4. Holding - 757
  5. High Sticking - 745

These penalty types tend to be relatively clear-cut for referees to call, which may explain why they account for the most power play opportunities.

It is worth noting that these statistics are generated from player-level data. If a player draws a penalty leading to a power play, that penalty type is counted once. Therefore, the sum of these counts will be greater than the actual number of power plays, since every penalty in the same event that leads to a power play is credited.

Positions, Penalties and Power Play

Scatter Plot for Penalties per 60 Rates

Scatter Plot for Penalties Raw Count

League Overview

Raw Count Plot

Per 60 Plot

Discussion

Analyzing Player’s Performance

While the top players by power plays drawn largely match the top players by penalties drawn, the raw count alone does not accurately reflect a player’s ability to draw power plays or penalties, since players with more ice time usually have more opportunities to draw penalties and subsequently generate power plays. To account for this, we calculate per 60 rates for power plays drawn and penalties drawn. This standardization adjusts for players’ time on ice and provides a more accurate picture of their ability to draw power plays and penalties.

Upon analyzing the per 60 rates, we find that none of the top 5 players based on power play drawn raw count rank in the top 5 for power play drawn per 60. This emphasizes the importance of considering the per-60 rate when evaluating a player’s performance in this aspect.

For example, Mark Friedman from the Pittsburgh Penguins played only 23 games in the 2022-2023 regular season, with a powerplay-drawn raw count of 10, which places him at rank 222 based on powerplay-drawn raw count. However, when considering his lower time on ice, Friedman emerges as a top 3 player in drawing power plays per 60. Therefore, analyzing the per 60 rates provides a more comprehensive and fair evaluation of players’ contributions to power plays.
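
The difference between the two rankings can be reproduced on a toy roster. Everything in this sketch is hypothetical: the players, their games played, minutes, and counts are invented, and the more-than-10-games filter simply mirrors the threshold used for the regular season tables.

```python
def rank_players(players, min_games=10):
    """Return (names ranked by raw count, names ranked by per 60 rate)."""
    # Apply the games-played threshold and skip players with no ice time.
    eligible = [p for p in players
                if p["gp"] > min_games and p["toi_minutes"] > 0]
    rated = [dict(p, per_60=p["pp_draws"] * 60 / p["toi_minutes"])
             for p in eligible]
    by_count = [p["name"] for p in
                sorted(rated, key=lambda p: p["pp_draws"], reverse=True)]
    by_rate = [p["name"] for p in
               sorted(rated, key=lambda p: p["per_60"], reverse=True)]
    return by_count, by_rate


toy_roster = [
    {"name": "Player A", "gp": 82, "toi_minutes": 1800, "pp_draws": 36},
    {"name": "Player B", "gp": 23, "toi_minutes": 300, "pp_draws": 10},
    {"name": "Player C", "gp": 8, "toi_minutes": 60, "pp_draws": 4},
    {"name": "Player D", "gp": 70, "toi_minutes": 1200, "pp_draws": 18},
]

by_count, by_rate = rank_players(toy_roster)
print(by_count)
print(by_rate)
```

Player B, with few games but a high rate, moves from last by raw count to first by per 60 rate, while Player C is excluded by the games-played threshold.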

Positional Influence on Penalties and Power Play Opportunities

From the scatter plots, two distinct patterns emerge, providing valuable insights into players’ behavior regarding penalties. Firstly, there is a clear positive correlation between the penalty draw count/per 60 and penalty take count/per 60. Players who draw more penalties are also more likely to take penalties. This correlation intuitively aligns with our expectations, serving as a self-check to validate the credibility of our findings.

The more significant observation lies in the relationship between players’ positions and their involvement in penalties. When considering players who take the same number of penalties, forwards draw a higher number of penalties compared to defense players. This pattern is evident in both the per 60 rate plot and the raw count plot. Forwards are more active in generating penalty opportunities due to their roles on ice. Furthermore, when examining the top players who excel in drawing power plays (both in per 60 and raw count), we find that they are predominantly forwards. This finding reinforces the idea that forwards significantly impact their team’s power play opportunities. Their ability to draw penalties and create power play situations can be a game-changer, highlighting the importance of offensive contributions from forwards.

League Overview

The League Overview (raw count) scatterplot shows a clear positive correlation between penalties taken and penalties drawn. This pattern may be partly attributable to game management by NHL officials, wherein referees tend to balance the number of penalties called on each team within a game. Therefore, analyzing players’ ability to draw actual power plays emerges as a valuable source of information for teams seeking a competitive edge.

Moreover, the game management strategy opens up the possibility of predicting the timing of upcoming penalties and the players likely to be called upon, which could be a future route of this research project.

Limitations

  • To ensure data relevance in the playoffs, we adopt a 2-game played threshold for filtering players, considering the fewer games played during this period than the regular season (which has a 10-game threshold). However, this smaller threshold might introduce biased per-60 rates for players with limited ice time. Moreover, both the regular season and playoff thresholds could unintentionally exclude players with exceptional penalty-drawing or powerplay skills but limited games played.

  • The penalty-drawn and powerplay-drawn data are obtained through string analysis of the description column in the hockeyR dataset, which unfortunately is documented inconsistently. As a result, our dataset may contain errors in penalty counts and types.

Future Steps

  • Our current focus lies in modeling the time distribution of each type of penalty called during the period, alongside the time distribution of goals occurring during power plays. By leveraging this data, we aim to predict potential increases or losses in power play opportunities that could significantly impact game success if teams are allowed to complete their penalty time after the end of a period or game.

  • We are currently developing a penalty types breakdown for individual players’ powerplay-drawn, building upon our existing bar charts. In essence, we aim to identify the top penalty types contributing to each player’s powerplay-drawn. Our prototype for this analysis takes shape as follows:

  • Another crucial aspect of our work involves modeling the patterns of types of penalties called for each season. Our objective is to predict the penalty that could be the most frequently called in the upcoming season.

  • We are committed to regularly updating and enhancing the ShinyApp, optimizing the user experience, and ensuring its relevance and accuracy.

  • We will modify the data pipeline and resolve existing data-related issues in the current dataset.

Acknowledgement

We express our deepest gratitude to our project advisors, Katerina Wu, Caleb Peña, and Jacob Pavlovich from the Pittsburgh Penguins, for generously dedicating their time to meet with us and providing invaluable guidance throughout the project.

We sincerely thank our instructors, Meg Ellingwood and Shamindra Shrotriya, for their support and encouragement. Special thanks go to our TAs Quang Nguyen and Nick Kissel for their help with our inquiries and Dr. Ron Yurko for overseeing this program.

References

Shiny App Inspiration

The Masters

Appendix

League Overview Slope Analysis

The slope here describes the relationship between Penalty Taken Per 60 and Penalty Drawn Per 60, grouped by team. In other words, for each team in the 2022-2023 regular season, we fit a linear regression model expressed as \(\widehat{\text{Penalty Drawn per 60}} = \beta_0 + \beta_1\,\text{Penalty Taken per 60}\)

Teams with Highest Slope for Penalties Taken per 60 in Regular Season 2022-2023
Team Slope
PIT 1.0538653
BOS 1.0030632
WSH 0.8891008

In the League Overview per 60 scatterplot, we can observe variation in how different teams perform in drawing penalties. The Pittsburgh Penguins, Boston Bruins, and Washington Capitals stand out with the highest slope coefficients for penalties taken per 60 (with penalties drawn per 60 as the response). Taking the Penguins as an example, a one-unit increase in penalties taken per 60 is associated with an estimated increase of 1.054 in penalties drawn per 60, based on the 2022-2023 regular season data. Because the Penguins’ slope is higher than that of other teams, their estimated penalties drawn per 60 rises faster as penalties taken per 60 increases, which suggests more opportunities to draw a power play.
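
A per-team fit of this kind can be sketched by grouping player rows by team and applying the least-squares formulas within each group. The team codes and values below are invented for illustration, and `team_fits` is a hypothetical helper rather than the code behind the table above.

```python
from collections import defaultdict


def team_fits(rows):
    """Map each team to (slope, intercept) of drawn per 60 on taken per 60."""
    by_team = defaultdict(list)
    for team, taken, drawn in rows:
        by_team[team].append((taken, drawn))

    fits = {}
    for team, pairs in by_team.items():
        xs = [taken for taken, _ in pairs]
        ys = [drawn for _, drawn in pairs]
        mean_x = sum(xs) / len(xs)
        mean_y = sum(ys) / len(ys)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        if sxx == 0:
            continue  # slope is undefined when every player has the same x
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        fits[team] = (slope, mean_y - slope * mean_x)
    return fits


# Invented rows: (team, penalties taken per 60, penalties drawn per 60).
player_rows = [
    ("AAA", 0.5, 0.6), ("AAA", 1.0, 1.1), ("AAA", 1.5, 1.6),
    ("BBB", 0.5, 0.9), ("BBB", 1.0, 1.1), ("BBB", 1.5, 1.3),
]

fits = team_fits(player_rows)
for team, (slope, intercept) in sorted(fits.items()):
    print(team, round(slope, 3), round(intercept, 3))
```

The helper returns the intercept alongside the slope so that fitted values can be compared between teams at any given level of penalties taken per 60.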

Contact Information

Katherine (Shaojun) Gong, Mount Holyoke College,

Bethany Gozalez, University of Indianapolis,