Introduction

For our project, we decided to create several visualizations of the NBA Player Performance Stats dataset. This dataset can be found here: https://www.kaggle.com/datasets/iabdulw/nba-player-performance-stats

This dataset consists of 649 rows of performance statistics for NBA players. Each row corresponds to a player, but the same player can appear in multiple rows. This occurs when the same player was on different teams during different seasons. For each player, the dataset includes the following 29 variables:

Player: string - name of the player
Pos (Position): string - position played by the player
Age: integer - age of the player as of February 1, 2023
Tm (Team): string - team the player belongs to
G (Games Played): integer - number of games played by the player
GS (Games Started): integer - number of games started by the player
MP (Minutes Played): integer - total minutes played by the player
FG (Field Goals): integer - number of field goals made by the player
FGA (Field Goal Attempts): integer - number of field goal attempts by the player
FG% (Field Goal Percentage): float - percentage of field goals made by the player
3P (3-Point Field Goals): integer - number of 3-point field goals made by the player
3PA (3-Point Field Goal Attempts): integer - number of 3-point field goal attempts by the player
3P% (3-Point Field Goal Percentage): float - percentage of 3-point field goals made by the player
2P (2-Point Field Goals): integer - number of 2-point field goals made by the player
2PA (2-Point Field Goal Attempts): integer - number of 2-point field goal attempts by the player
2P% (2-Point Field Goal Percentage): float - percentage of 2-point field goals made by the player
eFG% (Effective Field Goal Percentage): float - effective field goal percentage of the player
FT (Free Throws): integer - number of free throws made by the player
FTA (Free Throw Attempts): integer - number of free throw attempts by the player
FT% (Free Throw Percentage): float - percentage of free throws made by the player
ORB (Offensive Rebounds): integer - number of offensive rebounds by the player
DRB (Defensive Rebounds): integer - number of defensive rebounds by the player
TRB (Total Rebounds): integer - total rebounds by the player
AST (Assists): integer - number of assists made by the player
STL (Steals): integer - number of steals made by the player
BLK (Blocks): integer - number of blocks made by the player
TOV (Turnovers): integer - number of turnovers made by the player
PF (Personal Fouls): integer - number of personal fouls made by the player
PTS (Points): integer - total points scored by the player

Research Questions

Given this background on our dataset, we use visualizations to answer three questions: (1) What patterns of playing style can be identified among NBA players in the NBA Player Performance Stats dataset? (2) How are 2-point field goals associated with NBA players’ playing experience (number of games played, minutes played)? (3) How is the age of an NBA player associated with the 3-point field goals that they score?

Exploration and Visualization

Question 1

Our first question asks what patterns of playing style can be identified among NBA players in the NBA Player Performance Stats dataset. This suggests performing a principal component analysis (PCA) on the 26 quantitative variables in this dataset (all variables other than Player, Pos, Tm and division). The principal components are linear combinations of these 26 variables. To identify which principal components would provide us with the most meaningful information, we drew a scree plot.
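To make the idea of a principal component as a linear combination concrete, here is a minimal Python sketch (not the code used for this report). It standardizes a tiny table, builds its correlation matrix, and extracts the first component by power iteration. The six rows of MP, FG and PTS values are hypothetical, and standardizing the variables before the PCA is an assumption about the analysis rather than something stated above.

```python
import math

def standardize(rows):
    """Center each column to mean 0 and scale it to unit sample standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def correlation_matrix(z):
    """Correlation matrix of already-standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)] for i in range(p)]

def first_component(c, iters=500):
    """Power iteration: returns (largest eigenvalue, unit-length loading vector)."""
    p = len(c)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v

# Hypothetical per-game values (MP, FG, PTS) for six made-up players.
toy = [
    [34.0, 9.1, 25.2],
    [28.5, 6.0, 16.1],
    [12.3, 1.9, 5.0],
    [20.7, 3.8, 9.9],
    [31.2, 7.5, 20.4],
    [8.4, 1.1, 2.8],
]

z = standardize(toy)
c = correlation_matrix(z)
eigenvalue, loadings = first_component(c)

# The eigenvalues of a correlation matrix sum to the number of variables,
# so this ratio is the share of variance a scree plot shows for PC1.
share = eigenvalue / len(c)

# Each player's PC1 score is a linear combination of the standardized variables.
scores = [sum(l * x for l, x in zip(loadings, row)) for row in z]
```

Repeating the extraction on what remains after removing the first component would give the later components. In practice a library routine would compute all of them at once.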

Looking at the above scree plot, we can see that the elbow appears to be around the 4th principal component. However, we chose to investigate only the first three principal components, both for simplicity and because the largest drops in variance appear to occur among them. We start by identifying the five most relevant variables making up each of these three principal components.

Principal Component 1:

##        FG       PTS       FGA      X2PA        MP 
## 0.2747644 0.2734788 0.2667576 0.2624751 0.2620917

Principal Component 2:

##       FG.       ORB      X3PA      X2P.       X3P 
## 0.4049951 0.3753415 0.3194117 0.3190018 0.3066483

Principal Component 3:

##      eFG.      X3P.       FG.       X3P      X2P. 
## 0.5474232 0.5365536 0.3254993 0.2673446 0.2353949

From the above display, we can see that Principal Component 1 loads most heavily on Field Goals, Points, Field Goal Attempts, 2-Point Attempts, and Minutes Played. Principal Component 2 loads most heavily on Field Goal Percentage, Offensive Rebounds, 3-Point Attempts, 2-Point Percentage, and 3-Pointers made. Principal Component 3 loads most heavily on Effective Field Goal Percentage, 3-Point Percentage, Field Goal Percentage, 3-Pointers made, and 2-Point Percentage. From this, it appears that each principal component corresponds to a different playing style. We can further assess these patterns by plotting the different principal components against each other.
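As an illustration of how such a "top five" listing can be produced, the following Python sketch ranks the variables of one component by the size of their loadings. The first five values are the Principal Component 1 loadings printed above. The two extra entries are hypothetical, added only so that the ranking has something to exclude, and ranking by absolute value is an assumption about how the listing was built.

```python
def top_loadings(loadings, k=5):
    """Return the k (variable, loading) pairs with the largest absolute loading."""
    return sorted(loadings.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]

pc1 = {
    # Values printed in the report for Principal Component 1.
    "FG": 0.2747644,
    "PTS": 0.2734788,
    "FGA": 0.2667576,
    "X2PA": 0.2624751,
    "MP": 0.2620917,
    # Hypothetical smaller loadings, for illustration only.
    "BLK": 0.12,
    "X3P.": -0.05,
}

top = top_loadings(pc1)
names = [name for name, _ in top]
```

Ranking by absolute value matters because a large negative loading contributes to a component as strongly as a large positive one.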

The above graph plots principal components 1, 2, and 3 against each other. In the plot of PC1 against PC2, most points cluster around 0, but in the tail where PC1 increases, PC2 increases as well. This leads us to believe that, while the spread across PC1 and PC2 is fairly even, these play styles complement each other at higher values of the components. In contrast, in the plot of PC1 against PC3, PC3 decreases as PC1 increases, suggesting that these play styles might not complement one another. Interestingly, PC2 and PC3 show no direct influence on one another: the trend between them stays steady across the plot, showing no complementary benefits or detriments.

In summary, the first three principal components appear to correspond to different playing styles, each driven by the variables listed above. The PCA plots suggest that the playing styles captured by principal components 1 and 2 complement each other, while those captured by components 1 and 3 might not. Furthermore, principal components 2 and 3 appear to have neither beneficial nor detrimental effects on each other.

Question 2

Our second question asks how 2-point field goals are associated with NBA players’ playing experience (number of games played, minutes played). To check whether there are differences that need to be accounted for, we start by plotting the number of points scored for each position in the two conferences (East and West). This allows us to compare the distributions and identify any strong differences that we might need to account for.

From the above graph, comparing Eastern and Western Conference players, the West appears to have more consistently scoring Shooting Guards and Power Forwards, since more players appear to be scoring points in the SG and PF panels for the West. Point Guards and Centers perform about evenly, with a slight edge to the West. Small Forwards are also more successful in the West than in the East, but only by a small margin. As a result, there are no stark differences that need to be accounted for when analyzing points (2-point field goals here and 3-point field goals in question 3).

Player experience involves two variables of interest: minutes played and number of games played. We start with minutes played and how it might be associated with the number of 2-point field goals, using a basic scatter plot of 2-point field goals per game versus minutes played.

We can see from the above graph that players who play more minutes per game tend to score more points on average. To explore this further, we made a heat map of 2-point field goals per game by position and minutes played, since separating by position might provide more insight into the question at hand.

From the above graph, we can see that (1) players who play more minutes per game tend to score more points on average, regardless of their position, and (2) point guards tend to score the most points per game on average, whereas centers tend to score the fewest. This is shown by the fact that the darker blue cells are mostly located in the PG row and the lighter cells are mostly located in the C row. We can also see that (3) some positions have greater variance in points per game than others. For example, the SF and PF rows have a wider range of colors, indicating a wider range of points per game for players in these positions compared to other positions. This graph also shows that, irrespective of position, greater player experience appears to be associated with more 2-point field goals scored.
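The cell values behind a heat map of this kind can be sketched in Python as follows. Each cell is taken to be the mean 2-point field goals per game for one position within one band of minutes played. The 10-minute bin width, the use of the mean, and the six example rows are all assumptions made for illustration, not details taken from the plot above.

```python
from collections import defaultdict

def heatmap_cells(players, bin_width=10):
    """Mean 2-point field goals per (position, minutes bin) cell.

    `players` is an iterable of (position, minutes_per_game, two_pointers_per_game).
    Each bin is identified by its lower edge, e.g. 30 for minutes in [30, 40).
    """
    sums = defaultdict(lambda: [0.0, 0])  # cell -> [running total, count]
    for pos, minutes, two_p in players:
        lower_edge = int(minutes // bin_width) * bin_width
        cell = sums[(pos, lower_edge)]
        cell[0] += two_p
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical rows: (position, minutes per game, 2-point field goals per game).
players = [
    ("PG", 34.2, 5.1), ("PG", 31.0, 4.3), ("PG", 12.5, 1.2),
    ("C", 33.1, 6.0), ("C", 14.9, 2.0), ("C", 11.0, 1.0),
]

cells = heatmap_cells(players)
```

Reading along one position's row from low to high minute bins shows the within-position trend, and comparing rows at the same bin shows the between-position difference.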

We can now consider the second element of player experience, the number of games played. To see how this variable might be associated with the number of 2-point field goals scored, we created a contour plot of these two variables.

In the above graph, we compare the number of games played with the number of 2-point field goals. The contour plot shows two modes. The first is a concentration of data points around 1 game played with approximately 2 attempted field goals. These individuals are likely bench players, who are rarely on the court and are less likely to receive the ball when they are. The second is a concentration of data points around 55 games played, where players shoot much more frequently than players who play fewer games. Overall, we observe that players who appear in more games tend to shoot more often, which is likely indicative of their success and confidence in the game.

Question 3

Our third question asks how the age of an NBA player is associated with the 3-point field goals that they score. We saw in our analysis for question 2 that points scored show no differences across conferences and positions that need to be accounted for. However, we decided that it would be helpful to look at the distribution of player age across the different regions to see whether there are any differences that need to be accounted for with respect to this variable.

From the above graph, we can see that the age distributions across the different divisions are generally skewed to the right, with a strong concentration of younger players. Most distributions are unimodal, although some are bimodal. Across the facets, the exact shape of the age distribution varies by regional division and position. Most notably, the distributions level off once players reach 35 years of age, although there are a few players in their 40s. Overall, the regions have broadly similar age distributions, so there are no differences that need to be accounted for prior to our analysis.

To explore our question, we first looked at a plot of 3-point field goals against age. We faceted this plot by team but, to keep the graph clear rather than showing 31 different panels, we included only the team that scored the most 3-pointers in each region. This allows a cross-region comparison while holding the teams to a common standard, rather than picking a random team from each region.
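The team selection described here can be sketched in Python as a group-and-maximize step. The sketch assumes that "scored the most 3-pointers" means the largest sum of the players' 3P values within a team, which is one possible reading. The division labels, team labels and numbers are placeholders, not values from the dataset.

```python
from collections import defaultdict

def top_team_per_division(rows):
    """Pick, for each division, the team with the largest total of 3P values.

    `rows` is an iterable of (division, team, three_pointers), one per player.
    """
    totals = defaultdict(float)
    for division, team, threes in rows:
        totals[(division, team)] += threes

    best = {}  # division -> (team, total)
    for (division, team), total in totals.items():
        if division not in best or total > best[division][1]:
            best[division] = (team, total)
    return {division: team for division, (team, _) in best.items()}

# Placeholder player rows: (division, team, 3-point field goals).
rows = [
    ("A", "T1", 2.1), ("A", "T1", 1.4), ("A", "T2", 3.0),
    ("B", "T3", 0.9), ("B", "T4", 1.0), ("B", "T4", 0.5),
]

selected = top_team_per_division(rows)
```

Filtering the player table to the selected teams then leaves one facet per division instead of one per team.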

We can see from this graph that, for all the teams observed, the number of 3-point field goals either increases with age or stays the same. Only for MIA and MIL does the change appear to be nonexistent or very small. For the remaining 4 teams, the relationship appears to be fairly strong and linear. To explore this further, we made a heat map of the overall relationship between Age and 3-point field goals scored.

From the heat map above, the density of players who score more 3-point field goals appears similarly strong across all ages. The density appears lower for players between the ages of 20 and 25 who score fewer 3-point field goals. Around the age of 40, the density appears similar for players scoring any number of 3-pointers. Therefore, the relationship between the number of 3-pointers and age might not be as strong as we initially thought.

To explore this further, we decided to carry out a regression analysis between these two variables (age and number of 3 point field goals scored).

## 
## Call:
## lm(formula = X3P ~ Age, data = nba)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.6047 -0.6977 -0.1551  0.4908  3.6027 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.009295   0.212882  -0.044    0.965    
## Age          0.038429   0.008061   4.767 2.33e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8731 on 622 degrees of freedom
##   (25 observations deleted due to missingness)
## Multiple R-squared:  0.03525,    Adjusted R-squared:  0.0337 
## F-statistic: 22.73 on 1 and 622 DF,  p-value: 2.326e-06

Looking at the results of the linear regression model, the slope for Age is positive and statistically significant (p = 2.33e-06), but the linear relationship between age and number of 3-point field goals scored is weak. The multiple R-squared value is 0.03525, which means that only about 3.5% of the variance in the number of 3-point field goals scored is explained by age in this regression model. As a result, there is not a strong linear relationship between age and the number of 3-point field goals scored.
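The figures in the regression output are tied together by standard identities for a regression with a single predictor: the t value is the estimate divided by its standard error, the F-statistic is the square of that t value, and R-squared equals F / (F + residual degrees of freedom). The short Python check below uses only numbers printed in the output above. The variable names are chosen here for illustration.

```python
# Values copied from the lm() summary for X3P ~ Age.
estimate = 0.038429    # slope for Age
std_error = 0.008061   # standard error of the slope
residual_df = 622      # residual degrees of freedom

# t value for the slope: estimate divided by its standard error (about 4.767).
t_value = estimate / std_error

# With one predictor, the overall F-statistic is the squared t value (about 22.73).
f_stat = t_value ** 2

# R-squared recovered from F and the residual degrees of freedom (about 0.03525).
r_squared = f_stat / (f_stat + residual_df)

percent_explained = 100 * r_squared  # about 3.5 percent of the variance
```

This shows how a slope can be clearly distinguishable from zero with 624 usable observations while age still accounts for only a small share of the variation in 3-point field goals.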

Conclusion

From our various visualizations, we can see that there are many possible relationships between the different variables in this dataset. There appear to be multiple playing styles, which can complement each other, fail to complement each other, or have no impact on each other. Player performance in terms of 2-point field goals scored appears to be positively associated with player experience (minutes played and number of games played). Finally, there does not appear to be a strong linear relationship between age and the number of 3-point field goals scored.

Along with our three research questions, there are many additional questions that could be explored through this dataset. One such area is how position affects the number or percentage of field goals scored. It might also be worth examining whether there is a different, non-linear kind of relationship between age and the number of 3-point field goals scored. Another topic of interest is the relationship between the number of assists and steals made by a player and how that might relate to their performance. With a total of 29 variables in this dataset, there are many additional questions and topics that could be examined further.