In ice hockey, a player who commits a penalty is removed from the ice, and their team must continue the game with one fewer player on the ice than the opposition for the duration of the penalty (typically 2 minutes). This period is known as a power play for the offense (the team with more players) and as a penalty kill for the defense (the team with fewer players). The player advantage greatly increases the offensive team’s scoring rate during power plays, so these periods are considered very important to a hockey team’s success from both the offensive and defensive perspectives. With this in mind, we set out to study what leads to effective power play offense and defense.
Since the numerical disadvantage that a team on the penalty kill faces rules out a man-marking defensive approach, teams typically assume a compact defensive formation in an attempt to prevent dangerous passes and shots near their net. Since a penalty kill is all about stalling the offense until the penalized player is allowed back on the ice, any defensive action that stalls the offense is viewed as a positive play. This forces the offense to balance maintaining possession through low-risk passes against trying to create openings for quality shots through higher-risk passes. In this report we explore and analyze passes during the power play and how defensive actions affect their success rate.
The Big Data Cup is an annual hockey data competition hosted by Stathletes, in which they provide data for contestants to create projects answering pressing questions in hockey. The 2022 and 2024 editions of the Big Data Cup contained play-by-play data for 29 women’s hockey games, including data from the 2018 Winter Olympics in PyeongChang, the 2022 Winter Olympics in Beijing, and the 2019 and 2024 Rivalry Series between the USA and Canada. The 2022 Big Data Cup also contained player-tracking data from the 2022 Winter Olympics in Beijing, collected through computer vision using the main TV broadcast angle.
The play-by-play data for each game covers every event that took place during the action. The event types included in the dataset are: Shot, Pass, Puck Recovery, Zone Entry, Faceoff Win, Penalty Taken, Takeaway, and Dump In/Out. For each event the data also contains information such as which teams were playing, the time the event occurred during the game, how many players were on the ice, which player/players were involved in the event and their location on the ice, the event type, and whether the event was successful.
The player-tracking data covers all power plays during 6 of the 2022 Winter Olympic women’s hockey games. Since the tracking data was generated by computer vision using the main TV broadcast angle, a player who is outside of the camera’s view is not tracked at that moment. We quickly noticed that the data set contained many flaws and inaccuracies, including players being incorrectly identified and players being placed at incorrect coordinates (including coordinates out of bounds). Fortunately, we found that Alon Harell, Robyn Ritchie, and Phil Shreaves had cleaned and corrected the data in their 2022 Big Data Cup submission. Our work uses their cleaned version of the player-tracking data.[1]
To better understand the dynamics of power plays and penalty kills, we conducted exploratory data analysis using the play-by-play data from the 2022 Beijing Winter Olympics. The plot visualizes direct (tape-to-tape) passes made during power play situations. In the plot, the right side of the rink is the offensive zone. Each arrow represents a pass, with the arrow tracing from the origin of the pass to its intended target. The color of the arrows shows which teams are making the passes.
The plot provides insights into how teams use their numerical advantage on the power play to set up offensive strategies. We observed that the majority of passes came from the offensive zone, indicating that the numerical advantage allows teams to retain possession of the puck in the offensive zone and from there pass to create openings and scoring chances. The plot also reveals variations in passing patterns across teams, with some teams attempting to set up short passes near the net and others relying on longer passes along the outside of the zone to create scoring chances.
We also examined whether certain zones around the rink have higher average pass completion rates than others. The plot below is a heat map that displays the observed pass success percentage at different locations around the ice. In the plot, the left-hand end of the rink represents the defensive zone and the right-hand end of the rink represents the offensive zone.
The plot shows that the largest concentration of low-percentage pass locations is around the net on the offensive side of the rink. This makes sense, as there is likely to be a higher concentration of defenders around the passer in those locations than elsewhere on the rink.
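A heat map of this kind can be built by dividing the rink into a grid and computing the share of completed passes in each cell. The sketch below is only an illustration of that idea, not the code used for this report: the 20-foot cell size, the tuple layout, and the sample passes are all assumptions.

```python
from collections import defaultdict

def completion_rate_by_cell(passes, cell_size=20.0):
    """Return {(col, row): (completion_rate, attempts)} for each grid cell.

    `passes` is an iterable of (x, y, completed) tuples, where (x, y) is the
    passer's location in feet and `completed` is a bool.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [completed, attempted]
    for x, y, completed in passes:
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell][0] += int(completed)
        totals[cell][1] += 1
    return {cell: (done / n, n) for cell, (done, n) in totals.items()}

# Hypothetical passes (x, y, completed); not taken from the Big Data Cup data.
sample_passes = [
    (185.0, 40.0, False),  # near the offensive net
    (188.0, 45.0, True),
    (130.0, 10.0, True),   # near the blue line, along the boards
    (135.0, 12.0, True),
]
rates = completion_rate_by_cell(sample_passes)
```

Keeping the number of attempts next to each rate makes it easy to hide or down-weight cells with very few passes, where the observed percentage is noisy.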
We also wanted to view the most common event sequences leading to positive outcomes in the data. Sequences of events were defined as those starting with “Faceoff Win,” “Takeaway,” or “Puck Recovery” events and ending with “Incomplete Play,” “Shot,” “Goal,” “Dump In/Out,” or “Penalty” events. We found the most frequent sequences are:
As our EDA shows, passing tendencies vary across teams. However, we wanted to see if there were any pass types that were prevalent across teams and games. From our EDA we noticed that the start and end locations of passes seemed to be somewhat dependent. Because of this, we decided that using a clustering algorithm would be an effective way to categorize common pass types.
To identify common pass types, we used the play-by-play data from all 29 games and used the start and end locations of each pass. From here our goal was to capture the general patterns across different teams and tournaments to identify categories of frequently attempted passes.
Before starting the categorization process we first normalized our data and ensured that all the offensive and defensive zones were consistently defined in the same orientation. Additionally, we made sure to account for symmetry across the wings so that similar passes made near the left or right boards would be placed in the same cluster. For the categorization of passes, we used the K-Means clustering algorithm and clustered based on 4 variables: x-coordinate of the passer, y-coordinate of the passer, x-coordinate of the intended receiver, and y-coordinate of the intended receiver.
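The two steps described above (folding mirror-image passes together, then clustering on the four coordinates) can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than our actual pipeline: it assumes an 85-foot-wide rink with y measured from one side board, uses a plain implementation of Lloyd's algorithm instead of a library K-Means, skips feature scaling, and runs on made-up passes.

```python
import random

RINK_WIDTH = 85.0  # assumed rink width in feet (y runs from 0 to 85)

def fold_pass(x1, y1, x2, y2):
    """Mirror a pass across the rink's long axis so it starts on the lower half.

    Passes that are mirror images across the centre line then share coordinates.
    """
    if y1 > RINK_WIDTH / 2:
        y1, y2 = RINK_WIDTH - y1, RINK_WIDTH - y2
    return (x1, y1, x2, y2)

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct points so no two centroids coincide.
    centroids = rng.sample(sorted(set(points)), k)

    def nearest(p):
        return min(range(k), key=lambda j: sq_dist(p, centroids[j]))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]           # assignment step
        new_centroids = []
        for j in range(k):                              # update step
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(
                tuple(sum(c) / len(members) for c in zip(*members))
                if members else centroids[j]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]

# Hypothetical passes: (passer x, passer y, target x, target y).
raw_passes = [
    (130, 20, 130, 65),  # across the top of the zone
    (130, 65, 130, 20),  # the same pass from the opposite wing
    (132, 22, 131, 63),
    (185, 30, 188, 50),  # across the front of the net
    (185, 55, 188, 35),
    (186, 32, 187, 52),
]
points = [fold_pass(*p) for p in raw_passes]
centroids, labels = kmeans(points, k=2)
```

After folding, the second pass has exactly the same coordinates as the first, so the two land in the same cluster regardless of which wing they started from.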
The plot below illustrates passes made directly following a puck recovery. Each pass is represented by an arrow, with colors indicating the type of cluster of each pass.
Through clustering, we identified 10 distinct pass clusters as displayed on the plot above:
Since our exploratory data analysis showed that the majority of power play passes occur in the offensive zone, the most relevant types of passes for our project are those located in the offensive zone.
D to D passes are so named because they are made from one defenseman to another across the top of the offensive zone. These passes are typically less obstructed by defenders and are used as a low-risk way to switch the side of the ice that the attack is coming from and to induce movement from the defense. D to F passes are passes from defensemen to forwards in the offensive zone. These passes bring the puck closer to the net and increase the offensive threat, but they are also riskier because the defenders are closer to the puck. Possession passes, like D to D passes, are meant to maintain possession while switching the side of the ice that the attack is coming from. However, they are usually made from behind the opponent’s net and are typically more heavily pressured and obstructed, since they frequently result from a deflected pass or shot. The Cross-Ice pass is typically a pass from one forward to another forward on the other side of the ice. It is high risk because it requires sending the puck directly through the defense, but it is also very high reward: if the pass gets through, the defense will be out of position to stop an ensuing shot attempt.
To analyze the impact of different factors on the success of passes during power plays we implemented a binary classification model to predict the pass completion probability. The primary objective was to understand how factors such as pass distance, the number of defenders near the puck, and pass type influence the likelihood of pass completion. An XGBoost classifier was chosen for its ability to handle complex interactions between features and provide high accuracy. The model’s performance was evaluated using cross-validation to ensure reliability in its predictions.
The data preparation involved appending play-by-play data with team roster information and then merging with tracking data from overlapping time frames.
To handle the imbalance of the success and failure rates of passes in the data, we applied resampling techniques. We utilized both oversampling of the minority class and undersampling of the majority class to address the class imbalance and create a more balanced dataset when training the model.
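Combining oversampling and undersampling amounts to drawing each class toward a common target size. The sketch below illustrates the idea with simple random resampling. It is an assumption-laden example, not the procedure used for this report: the target size, the `rebalance` helper, and the record layout are all hypothetical.

```python
import random

def rebalance(rows, label_of, target_per_class, seed=0):
    """Resample so that each class has exactly `target_per_class` rows.

    Larger classes are undersampled without replacement. Smaller classes keep
    all of their original rows and are topped up by sampling with replacement.
    """
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)

    balanced = []
    for members in by_class.values():
        if len(members) >= target_per_class:
            balanced.extend(rng.sample(members, target_per_class))
        else:
            extra = rng.choices(members, k=target_per_class - len(members))
            balanced.extend(members + extra)
    rng.shuffle(balanced)
    return balanced

# Hypothetical training rows: 8 completed passes (1) and 2 incomplete passes (0).
train_rows = [{"id": i, "completed": 1} for i in range(8)]
train_rows += [{"id": i, "completed": 0} for i in range(8, 10)]

balanced = rebalance(train_rows, lambda r: r["completed"], target_per_class=5)
```

Resampling of this kind is normally applied to the training rows only, so that the evaluation data keeps the original class proportions.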
Another key step before we began modeling was to create predictor variables from the data. Some of the factors that stood out to us as being potentially impactful on the probability of pass completion included: how obstructed the passing lane (the path the puck travels between the passer and the intended target) was, if the passer was being pressured by a defender, if the intended target of the pass was being covered by a defender, and the distance of the pass.
Our predictor variables were defined as: the distance from the closest defender to the passing lane (calculated as the Euclidean distance between the defender and the point on the passing lane closest to them), the number of defenders within 5.5 feet of the passing lane, whether or not a defender was within 5.5 feet of the passer at the time of the pass, whether or not a defender was within 5.5 feet of the intended target at the time of the pass, and the distance between the passer and the intended target.
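These definitions translate fairly directly into code if the passing lane is treated as a line segment from the passer to the intended target. The sketch below is one possible reading of the definitions, not our production code: the function and key names are made up for illustration, and the player coordinates are hypothetical values in feet.

```python
import math

def distance_to_lane(defender, passer, target):
    """Euclidean distance from a defender to the closest point on the
    passer -> target segment. All points are (x, y) tuples in feet."""
    px, py = passer
    dx, dy = target[0] - px, target[1] - py
    lane_len_sq = dx * dx + dy * dy
    if lane_len_sq == 0:  # passer and target coincide
        return math.dist(defender, passer)
    # Project the defender onto the lane and clamp to the segment ends.
    t = ((defender[0] - px) * dx + (defender[1] - py) * dy) / lane_len_sq
    t = max(0.0, min(1.0, t))
    closest = (px + t * dx, py + t * dy)
    return math.dist(defender, closest)

def pass_features(passer, target, defenders, radius=5.5):
    """Build the five predictors for one pass."""
    lane = [distance_to_lane(d, passer, target) for d in defenders]
    return {
        "closest_defender_to_lane": min(lane),
        "defenders_near_lane": sum(d <= radius for d in lane),
        "passer_pressured": any(math.dist(d, passer) <= radius for d in defenders),
        "target_covered": any(math.dist(d, target) <= radius for d in defenders),
        "pass_distance": math.dist(passer, target),
    }

# Hypothetical pass with four penalty killers on the ice.
features = pass_features(
    passer=(150.0, 20.0),
    target=(180.0, 60.0),
    defenders=[(169.0, 37.0), (190.0, 42.5), (175.0, 30.0), (140.0, 20.0)],
)
```

Clamping the projection to the segment matters for defenders standing behind the passer or beyond the target: their distance is then measured to the nearest end of the lane rather than to an extension of it.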
We are interested in predicting the probability of pass success based on the following variables:
To understand the model’s behavior and the relative importance of each pass type in the context of the predictions, we produced a SHAP dependence plot. This plot allows us to visualize the influence of the different pass types, as well as the distance from the closest defender to the passing lane, on the XGBoost model’s predictions.
We observed that Cross Ice passes have the most significant positive impact on the model’s prediction. Conversely, In Front of the Net, Breakout, and Offensive Regroup passes show lower SHAP values, indicating a smaller or negative impact.
Additionally, the color gradient reveals that shorter distances are concentrated in the middle of the SHAP value range, which means shorter passes have a more consistently moderate impact. Longer passes are distributed across high and low SHAP values, showing that they have a more varied influence on the model’s prediction and can either significantly increase or significantly decrease the predicted outcome. To evaluate the impact of these variables on pass success, we computed SHAP (SHapley Additive exPlanations) values. SHAP values describe how much each variable contributes to an individual prediction of the model.
Based on the above waterfall plot for a single prediction, the most significant contributors to the model, in order of importance, are:
All these variables have a positive impact on the probability of a successful pass. We proceeded to use these key variables for our XGBoost model.
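To make the SHAP attributions discussed above more concrete, the sketch below computes exact Shapley values for a single prediction of a tiny hand-written scoring function. Everything in it is an assumption for illustration: `toy_model` is a stand-in, not our XGBoost model, and the feature names, coefficients, and baseline values are invented. SHAP libraries use faster, model-specific algorithms instead of this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition take their `baseline` value. The cost grows
    exponentially with the number of features, so this only suits tiny examples.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        inputs = {f: (x[f] if f in coalition else baseline[f]) for f in names}
        return model(inputs)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value(set(subset) | {f})
                without_f = value(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi

# Hypothetical stand-in for a fitted model (NOT the report's XGBoost model).
def toy_model(f):
    score = 0.5 + 0.02 * f["lane_gap"] - 0.004 * f["pass_distance"]
    if f["passer_pressured"]:
        score -= 0.1
    return score

one_pass = {"lane_gap": 10.0, "pass_distance": 50.0, "passer_pressured": 1}
average_pass = {"lane_gap": 5.0, "pass_distance": 30.0, "passer_pressured": 0}
phi = shapley_values(toy_model, one_pass, average_pass)
```

The attributions sum to the difference between the prediction for this pass and the prediction for the baseline pass, which is the property a waterfall plot displays when it stacks the per-variable contributions.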
To evaluate the performance of our XGBoost model, we employed a 5-fold cross-validation approach. We divided the dataset into 5 equally sized subsets, with four subsets serving as training data and the remaining subset serving as test data. We then rotated the subsets so that each of the 5 served as the test set once. This method helps us detect overfitting, because every observation is scored by a model that did not see it during training. The cross-validation results show that the test log loss stabilized at 0.2773 after the 20th iteration. This relatively low log loss suggests that the model is stable and generalizes well across different subsets of the data.
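The rotation of folds and the log loss metric can be sketched without any modeling library. The example below is a minimal illustration under stated assumptions: the base-rate "model" is a placeholder that stands in for the XGBoost classifier, the helper names are hypothetical, and the labels are made up.

```python
import math
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Shuffle row indices and split them into k near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def cross_validate(X, y, fit, predict_proba, k=5):
    """Return one test log loss per fold; each fold is held out exactly once."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        probs = predict_proba(model, [X[j] for j in test_idx])
        scores.append(log_loss([y[j] for j in test_idx], probs))
    return scores

# Placeholder model: predicts the training completion rate for every pass.
def fit_base_rate(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_base_rate(model, X_test):
    return [model] * len(X_test)

# Hypothetical data: 20 passes, 16 completed; the features are unused placeholders.
y = [1] * 16 + [0] * 4
X = [[0.0] for _ in y]
scores = cross_validate(X, y, fit_base_rate, predict_base_rate, k=5)
mean_log_loss = sum(scores) / len(scores)
```

A base-rate predictor like this also gives a useful reference point: a fitted classifier should achieve a clearly lower log loss than it does on the same folds.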
Using the XGBoost model we were able to derive the following three metrics:
Using these metrics we were able to measure and rank individual and team power play/penalty kill performance. Below are team and individual rankings for Average Pass-Completion Probability, and team and individual Defensive Contributions Above Average. Additionally, we created animations of power play goals overlaid with completion probability labels for each pass.
Canada scores on the USA to take a 1-0 lead in the first period of a 4-2 group stage win on February 8, 2022.
Switzerland scores on Canada to make the score 1-5 in the first period of a 10-3 Semifinal loss on February 14, 2022.
Finland scores on Switzerland to extend their lead to 3-0 in the third period of a 4-0 Bronze medal game win on February 16, 2022.
Finland scores on Switzerland to extend their lead to 4-0 in the third period of a 4-0 Bronze medal game win on February 16, 2022.
The team-level Defensive Contribution Above Average graph shows Canada as the clear best defensive team against power plays during the 2022 Winter Olympics in Beijing, where they won the gold medal. The offensive metric of pass-completion probability, on the other hand, though useful, should not be interpreted as a ranking of the best passing players and teams, but rather as a measure of a combination of their aggressiveness and decision-making. Although it is important to maintain possession of the puck by making high-percentage passes, the goal animations show that the passes that lead to goals often have a lower probability of success. In the case of our team rankings, we can see that the USA’s selection of passes was most likely very conservative, whereas Finland repeatedly opted to attempt more high-risk passes.
Despite our work yielding general trends and insights into team and individual tendencies in power play passes and defensive contributions on the penalty kill, there are still limitations to our work. The biggest limitation is the limited sample size of player tracking data. Despite having a large amount of play-by-play data, the Big Data Cup repositories contained very limited player-tracking data, and since the predictor variables used in our model were derived from the player-tracking data, our model could only be built on a fraction of the total passes. As player tracking data becomes more accurate and publicly available it is our hope that we, and others, are able to improve and expand on this work.
Beyond fine-tuning our existing pass-completion probability model, future work may be able to take into account other potential pass-completion factors to add nuance. A first step in that may be calculating individual defensive contributions for all defenders on the ice rather than the nearest defender to the passing lane, or taking into account defensive player momentum and speed at the time of the pass.
Other possible directions for future work would be to weight defensive contributions by the value/threat of a pass or to account for other passing lanes being defended. The value/threat of a pass may be calculated by using the expected goals (xG) of a shot taken from the position of the pass’s intended target or the expected possession added value (xPAV) of a pass completion. This leads to our second suggestion of accounting for other passing lanes defended. A defender’s contribution may seem small if they are far away from the selected passing lane, but that may be because they are defending a different, more dangerous pass. Accounting for this would let us credit defenders more accurately for their off-puck contributions and would lead to better player evaluation.
We are extremely grateful to our project advisor, Sam Ventura, from the Buffalo Sabres, for all the time, advice and guidance he gave us throughout the duration of this project.
We also want to thank our program director Ron Yurko, program instructor Quang Nguyen, and program TAs Daven Lagu, Yuchen Chen and JungHo Lee for sharing with us their experience and knowledge in the realm of statistical research.
To assess the model’s ability to distinguish between successful and unsuccessful passes, we split the data into training and test sets using a 70:30 ratio. We plotted the Receiver Operating Characteristic (ROC) curve and calculated its Area Under the Curve (AUC) value. The AUC value of 0.9561 indicates that our pass-completion probability model discriminates well between successful and unsuccessful passes.
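One way to read the AUC is as the probability that a randomly chosen completed pass receives a higher predicted probability than a randomly chosen incomplete pass. The sketch below computes it directly from that definition. It is an illustrative example with made-up labels and scores, not the evaluation code behind the value reported above.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical test-set labels (1 = completed) and predicted probabilities.
y_test = [1, 1, 0, 1, 0]
p_test = [0.9, 0.8, 0.7, 0.6, 0.2]
auc = roc_auc(y_test, p_test)  # 5 of the 6 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of passes, which is fine for a small test set; library implementations typically sort the scores instead.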
Furthermore, the confusion matrix below evaluates the accuracy, sensitivity, and other metrics of the model. The model has a high accuracy of 89.23%, indicating that the model performs well as a classifier, with a good balance in identifying true positives and true negatives.
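The metrics read off a confusion matrix follow from four counts. The sketch below shows the arithmetic on made-up predictions. The 0.5 decision threshold and the example values are assumptions for illustration and are not taken from our results.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a given probability threshold."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts

def summary_metrics(c):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    total = c["tp"] + c["fp"] + c["tn"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        "sensitivity": c["tp"] / (c["tp"] + c["fn"]),
        "specificity": c["tn"] / (c["tn"] + c["fp"]),
    }

# Hypothetical labels (1 = completed pass) and predicted probabilities.
y_test = [1, 1, 1, 0, 0, 1]
p_test = [0.9, 0.4, 0.8, 0.3, 0.6, 0.7]
counts = confusion_counts(y_test, p_test)
metrics = summary_metrics(counts)
```

Reporting sensitivity and specificity alongside accuracy is what shows whether a classifier is balanced across the two classes, since accuracy alone can be high when one class dominates.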
Frithjof Sanger, Carnegie Mellon University, fsanger@andrew.cmu.edu
Ian A. Peréz, University of Arizona, ianaperez@arizona.edu
Christina Vu, Texas Christian University, ngoc2003@gmail.com