Motivation

Understanding the factors that influence an NFL team’s win-loss percentage matters to fans, to team decision-makers, and to 36-315 students studying this dataset. By analyzing these factors, we can uncover patterns and correlations associated with a team’s success or downfall. This report aims to bridge the gap between raw season statistics and their relationship to outcomes, which helps us understand the dynamics of winning professional football games.

Dataset

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

nfl_data <- read.csv("nfl_dataset.csv")

This dataset contains comprehensive statistics for NFL teams spanning the 2003-2023 seasons. Broadly, it gives each team’s record and performance in each season, for a total of 672 team-seasons (32 teams across 21 seasons). The data file, “team_stats_2003_2023.csv”, contains 35 variables, which fall into the groups below.

Season and record: the season’s year, the team name, the number of wins, losses, and ties, the win-loss percentage (the share of its games a team won that season), and the number of games played.

Scoring: the total points a team scored in the season, the points it allowed (the total points other teams scored against it), the point differential (points scored minus points allowed, so positive is good and negative is bad), and the margin of victory.

Overall offense: total offensive yards gained, the number of offensive plays run, the average yards per offensive play, first downs gained, team turnovers lost, and fumbles lost.

Passing: passes completed, pass attempts, passing yards, passing touchdowns, interceptions thrown, net yards gained per pass attempt, and passing first downs.

Rushing: rushing attempts, rushing yards, rushing touchdowns, rushing yards per attempt, and rushing first downs.

Penalties: penalties committed, total penalty yards, and the number of first downs gained by penalty.

Drive outcomes: the percentage of drives ending in a score, the percentage of drives ending in a turnover, and the expected points contributed by the offense.

Research Question 1

How do offensive efficiency metrics (e.g., yards per play, turnovers, scoring percentage) relate to team success (e.g., win-loss percentage) in the NFL?

This question is well-motivated because it ties into strategic decisions in football, such as prioritizing offense, managing turnovers, and maximizing scoring opportunities. Insights here could guide team management and coaching strategies.

Graph 1: PCA Biplot of Offensive Metrics and Team Performance

This PCA biplot visualizes the relationship between three offensive metrics (yards per play, turnovers, and scoring percentage) and win-loss percentage, which is included in the PCA as a fourth variable. The biplot displays the first two principal components (PC1 and PC2), which are the two directions that capture the most variance in these four variables. Each point represents one team-season, and the ellipses group the seasons belonging to the same franchise.

library(ggplot2)
library(FactoMineR)
library(factoextra)

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

# Keep the team label next to the metrics so that rows dropped by na.omit()
# are dropped from the grouping variable as well
pca_data <- nfl_data %>%
  dplyr::select(team, win_loss_perc, yds_per_play_offense, turnovers, score_pct) %>%
  na.omit()

offensive_metrics <- pca_data %>%
  dplyr::select(-team) %>%
  mutate(across(everything(), as.numeric))

# PCA() standardizes the variables by default (scale.unit = TRUE)
pca_result <- PCA(offensive_metrics, graph = FALSE)

fviz_pca_biplot(pca_result, 
                label = "var",                        # label variable arrows only
                habillage = as.factor(pca_data$team), # color points by franchise
                addEllipses = TRUE) +                 # one ellipse per franchise
  labs(title = "PCA Biplot of Offensive Metrics",
       x = "Principal Component 1",
       y = "Principal Component 2") +
  theme_minimal()

## Too few points to calculate an ellipse

## Too few points to calculate an ellipse

The PCA biplot illustrates the relationship between offensive metrics and NFL team performance. Scoring percentage and yards per play point along the positive direction of PC1 together with win-loss percentage, so team-seasons with efficient offenses also tend to have better records. Conversely, turnovers are negatively aligned with PC1, reflecting their association with worse records. Because the points are colored by franchise, each ellipse summarizes where one franchise’s seasons fall. The ellipses overlap heavily, so no franchise occupies its own region of the plot, and the lack of clear separation suggests that factors beyond the metrics analyzed contribute to team success.
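The biplot does not show how much of the total variance PC1 and PC2 actually carry. The sketch below shows one way to read that off, using base R’s prcomp() on simulated stand-in data (the values are invented, not the NFL data, and prcomp() is swapped in for PCA() only so the example runs without extra packages). For the FactoMineR object used above, the analogous table is stored in pca_result$eig.

```r
set.seed(2003)  # arbitrary seed so the sketch is reproducible
n <- 200

# Simulated stand-ins for four correlated metrics (NOT the NFL data)
base <- rnorm(n)
sim <- data.frame(
  win_loss_perc        = base + rnorm(n),
  yds_per_play_offense = base + rnorm(n),
  turnovers            = -base + rnorm(n),
  score_pct            = base + rnorm(n, sd = 0.5)
)

# Center and scale so every variable contributes on the same footing
pca <- prcomp(sim, center = TRUE, scale. = TRUE)

# Share of total variance carried by each principal component
var_share <- pca$sdev^2 / sum(pca$sdev^2)
round(var_share, 2)

# Loadings: direction and size of each variable's contribution to PC1 and PC2
round(pca$rotation[, 1:2], 2)
```

If PC1 and PC2 together carry most of the variance, the two-dimensional biplot is a fair summary of the four variables; if not, some structure is hidden in the remaining components.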

Statistical Analysis

To assess the relationship between offensive metrics (e.g., scoring percentage, yards per play, turnovers) and win-loss percentage, we can perform a multiple linear regression.

regression_model <- lm(win_loss_perc ~ score_pct + yds_per_play_offense + turnovers, data = nfl_data)

summary(regression_model)

## 
## Call:
## lm(formula = win_loss_perc ~ score_pct + yds_per_play_offense + 
##     turnovers, data = nfl_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46290 -0.09987  0.00223  0.09421  0.47394 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.0639721  0.0738091   0.867    0.386    
## score_pct             0.0169984  0.0015149  11.221  < 2e-16 ***
## yds_per_play_offense -0.0051152  0.0179955  -0.284    0.776    
## turnovers            -0.0046924  0.0009925  -4.728 2.77e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1385 on 668 degrees of freedom
## Multiple R-squared:  0.4787, Adjusted R-squared:  0.4764 
## F-statistic: 204.5 on 3 and 668 DF,  p-value: < 2.2e-16

The regression analysis investigates the relationship between several offensive metrics and team performance, as measured by win-loss percentage. The model includes three key predictors: scoring percentage (score_pct), yards per offensive play (yds_per_play_offense), and turnovers.

The results show that scoring percentage has a statistically significant positive relationship with win-loss percentage. Specifically, holding the other predictors fixed, each one-percentage-point increase in scoring percentage is associated with an increase of approximately 0.017 in win-loss percentage (about 1.7 percentage points). This suggests that teams that score on a larger share of their drives are more likely to win. The p-value for this variable is less than 2e-16, which is very strong evidence of an association. On the other hand, yards per play shows no significant relationship with win-loss percentage, as evidenced by a p-value of 0.776, much greater than the typical threshold of 0.05. This means that, within this model, yards per play adds little information about a team’s win-loss record once scoring percentage and turnovers are accounted for. It does not mean that yards per play is unrelated to winning on its own.

Turnovers, however, do have a significant negative relationship with win-loss percentage. For each additional turnover, win-loss percentage decreases by approximately 0.0047, holding the other predictors fixed. The p-value for turnovers is 2.77e-06, which is highly significant and is consistent with the view that limiting turnovers matters for team performance. The overall model fit is moderate, with an R-squared value of 0.4787, meaning that approximately 48% of the variance in win-loss percentage is explained by these three predictors. The F-statistic is also significant (p-value < 2.2e-16), indicating that the model as a whole explains more variance than an intercept-only model.

In conclusion, scoring percentage and turnovers emerge as the most significant predictors of team success, while yards per play does not appear to contribute meaningfully to explaining win-loss performance. Teams that are more efficient at scoring and commit fewer turnovers are more likely to achieve a higher win-loss percentage.
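To make the coefficients concrete, the sketch below plugs one made-up team-season into the fitted equation. The coefficients are copied from the regression summary above; the input values (scoring on 40% of drives, 5.5 yards per play, 20 turnovers) are hypothetical and were chosen only for illustration.

```r
# Coefficients copied from summary(regression_model) above
coefs <- c(intercept            = 0.0639721,
           score_pct            = 0.0169984,
           yds_per_play_offense = -0.0051152,
           turnovers            = -0.0046924)

# Hypothetical team-season (illustrative values, not from the dataset)
team <- c(score_pct = 40, yds_per_play_offense = 5.5, turnovers = 20)

# Predicted win-loss percentage: intercept plus each coefficient times its value
predicted <- unname(coefs["intercept"] + sum(coefs[names(team)] * team))
round(predicted, 3)

# Predicted change from committing five fewer turnovers, all else unchanged
gain <- unname(-5 * coefs["turnovers"])
round(gain, 3)
```

Under the model, this hypothetical team is predicted to win about 62% of its games, and five fewer turnovers would shift that prediction by roughly 0.023, or about 0.4 wins over a 16- or 17-game season.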

Graph 2: Correlation Heat Map Between Offensive Metrics and Team Success

The correlation heat map visualizes the relationships between four key variables: win-loss percentage (win_loss_perc), yards per play on offense (yds_per_play_offense), turnovers, and scoring percentage (score_pct). The color gradient indicates the strength and direction of correlations, with red representing positive correlations, blue representing negative correlations, and white indicating weak or no correlation.

library(reshape2)

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

key_variables <- nfl_data[, c("win_loss_perc", "yds_per_play_offense", "turnovers", "score_pct")]

cor_matrix <- cor(key_variables, use = "complete.obs")

cor_data <- melt(cor_matrix)

ggplot(cor_data, aes(Var1, Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0) +
  labs(title = "Correlation Heat Map of Key Variables",
       x = "Variables",
       y = "Variables",
       fill = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

The heat map reveals several important relationships among the variables analyzed. There is a strong positive correlation between scoring percentage (score_pct) and win-loss percentage (win_loss_perc), indicating that teams with higher scoring efficiency tend to have better win-loss records. Turnovers, on the other hand, show a moderate negative correlation with win-loss percentage, so teams that turn the ball over more often tend to perform worse. Additionally, yards per play on offense (yds_per_play_offense) shows a moderate positive correlation with win-loss percentage, although it is weaker than the correlation for scoring percentage. This pairwise correlation does not conflict with the regression result: yards per play is related to winning on its own, but it adds little once scoring percentage is in the model. Finally, turnovers are negatively correlated with the other offensive metrics, scoring percentage and yards per play, so offenses that turn the ball over more also tend to be less efficient.
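A predictor can correlate with the outcome in a heat map and still be non-significant in a multiple regression when it overlaps with another predictor. The sketch below illustrates that pattern on simulated stand-in data. The variable names, the seed, and the assumption that yards per play affects winning only through scoring are all ours and are not estimated from the NFL data.

```r
set.seed(315)  # arbitrary seed so the sketch is reproducible
n <- 500

# Simulated stand-ins, NOT the NFL data:
# yards per play tracks scoring percentage, and only scoring drives wins
score <- rnorm(n)
ypp   <- 0.8 * score + rnorm(n, sd = 0.6)
win   <- score + rnorm(n)

# Pairwise correlation: clearly positive when yards per play is viewed alone
pairwise_cor <- cor(ypp, win)

# Partial effect: near zero once scoring percentage is also in the model
fit <- lm(win ~ score + ypp)
partial_coef <- unname(coef(fit)["ypp"])

round(c(pairwise = pairwise_cor, partial = partial_coef), 2)
```

This is one possible explanation for the pattern in our results, not a demonstration that it is the right one. Checking it on the real data would involve looking at the correlation between score_pct and yds_per_play_offense.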

Graph 3: Time Series Reflecting the Trends in Win-Loss Percentage Over Time

nfl_data <- nfl_data %>%
  mutate(efficiency_group = ifelse(yds_per_play_offense >= median(yds_per_play_offense, na.rm = TRUE), 
                                   "High Efficiency", "Low Efficiency"))

time_series_data <- nfl_data %>%
  group_by(year, efficiency_group) %>%
  summarise(avg_win_loss = mean(win_loss_perc, na.rm = TRUE))

## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.

ggplot(time_series_data, aes(x = year, y = avg_win_loss, color = efficiency_group, group = efficiency_group)) +
  geom_line(linewidth = 1) +  # linewidth replaces size for lines in ggplot2 >= 3.4.0
  geom_point(size = 2) +
  labs(title = "Time Series of Win-Loss Percentage by Offensive Efficiency",
       x = "Year",
       y = "Average Win-Loss Percentage",
       color = "Efficiency Group") +
  theme_minimal()

The graph displays the time series of average win-loss percentage grouped by offensive efficiency (classified as “High Efficiency” or “Low Efficiency”) across multiple years. Teams with higher offensive efficiency consistently achieved higher win-loss percentages compared to teams with lower offensive efficiency. The red line, representing high-efficiency teams, shows a steady performance advantage over the blue line, which represents low-efficiency teams. Notably, there are fluctuations over time for both groups, but the gap between the two groups remains evident throughout the years.

This trend highlights the critical role of offensive efficiency in team success, as teams with higher yards per play on offense are more likely to have better overall performance.

Taken together, the three graphs and the regression tell a consistent story. The PCA biplot shows scoring percentage and yards per play aligning positively with win-loss percentage, while turnovers point in the opposite direction. However, the overlapping ellipses indicate that offensive metrics alone cannot cleanly separate teams by level of success, which implies the need to consider additional factors beyond the metrics provided.

In the regression analysis, scoring percentage and turnovers significantly predict win-loss percentage. Each one-point increase in scoring percentage corresponds to a 0.017 increase in win-loss percentage (highly significant). Each additional turnover corresponds to a 0.0047 decrease in win-loss percentage (also highly significant). Yards per play is not significantly related to win-loss percentage once the other variables are considered. The R-squared value (0.4787) indicates a moderate model fit, and the findings point to scoring efficiency and turnover minimization as the offensive factors most closely tied to success.

The heat map shows a strong positive correlation between scoring percentage and win-loss percentage and a moderate negative correlation between turnovers and win-loss percentage. A moderate positive correlation exists between yards per play and win-loss percentage, although it is weaker than the one for scoring percentage. These results are consistent with the regression and PCA analyses and add clarity to the roles of turnovers and scoring efficiency.

Finally, the time series graph shows that teams with high offensive efficiency consistently post better win-loss percentages over time. While performance fluctuates for both high- and low-efficiency teams, the gap between them persists across seasons. This supports the research question by showing that the association between offensive efficiency and team success is stable over the long term, not specific to a few seasons.

Research Question 2

How does an NFL offense’s balance between running and passing the ball correlate with the team’s success (win-loss percentage)?

The question of offensive balance is of paramount importance to every team in the NFL, as coaches must decide between rushing and passing the football on every play. A better understanding of the importance of these play types could promote more optimized offensive game plans, helping teams that adopt them win more games.

Graph 1: Heatmap of Rushing Play Percentage vs Winning Percentage

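# rush_perc: rushing share of all rushing and passing plays, in percent.
# Note: win_loss_perc is rescaled from a 0-1 proportion to 0-100 here,
# so every later plot of nfl_data uses the percentage scale.
nfl_data <- nfl_data %>%
  mutate(rush_perc = 100 * rush_att / (rush_att + pass_att),
         win_loss_perc = win_loss_perc * 100)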

nfl_data %>%
  ggplot(aes(x = rush_perc, y = win_loss_perc)) +
  stat_density2d(aes(fill = after_stat(density)), geom = "tile", contour = FALSE) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE, color = "gray20") +
  scale_fill_gradient(low = "white", high = "darkred") +
  labs(title = "Heatmap of Rushing Play Percentage vs Winning Percentage",
       subtitle = "*Only counting rushing and passing plays",
       x = "Percentage of Rushing Attempts in a Season",
       y = "Win %")

## `geom_smooth()` using formula = 'y ~ x'

The preceding heat map highlights the tendency for NFL teams to hover around a 45-55 split between rushing and passing plays, with many teams dipping as low as 40-60. This cluster of teams sits around a 50% win-loss percentage for their respective seasons, which suggests that organizations are finding moderate success by following the established balance between run and pass offense. However, the accompanying line of best fit shows a positive correlation between a team’s percentage of rushing plays and its overall winning percentage. In fact, the model predicts that a team running 55% of its offensive plays as rushes will win more than 10 percentage points more of its games than a team running the aforementioned 45-55 split. Many factors are at play in a team’s success, and this is a correlation, not proof of cause: game situations and team strategy also shape how often a team runs. With that caveat, the graph suggests that teams that rush more and pass less record more success on average.
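The two calculations behind this paragraph can be sketched as follows: the rushing share itself, and the predicted gap between a 55% and a 45% rushing share. The helper rush_share() and the six team-seasons in toy are hypothetical and invented for illustration, so the slope here is not the one in our graph.

```r
# Share of rushing plays, counting only rushes and passes
# (same formula as rush_perc above)
rush_share <- function(rush_att, pass_att) {
  100 * rush_att / (rush_att + pass_att)
}

rush_share(450, 550)  # a 45-55 split

# Hypothetical team-seasons, invented for illustration only
toy <- data.frame(
  rush_perc = c(38, 41, 44, 47, 50, 53),
  win_pct   = c(31, 44, 40, 56, 50, 69)
)
fit <- lm(win_pct ~ rush_perc, data = toy)

# Predicted win percentage at a 55% and a 45% rushing share
pred <- predict(fit, newdata = data.frame(rush_perc = c(55, 45)))
gap <- unname(pred[1] - pred[2])

# For a straight-line fit, the gap is simply 10 times the slope
all.equal(gap, unname(10 * coef(fit)["rush_perc"]))
```

Because the fit is linear, a 10-point gap in predicted win percentage corresponds to a slope of about one win-percentage point per rushing-share point.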

Graph 2: Scatterplots of Rushing Yards and Passing Yards per Play versus Win-Loss Percentage

library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

g1 <- nfl_data %>%
  ggplot(aes(x = rush_yds_per_att, y = win_loss_perc)) +
  geom_smooth(method = "lm", se = FALSE, color = "darkgreen") +
  geom_point(alpha = 0.3) +
  labs(title = "Net Yards per Play, by Play Type",
       x = "Rushing Yards per Attempt",
       y = "Win/Loss %")
g2 <- nfl_data %>%
  ggplot(aes(x = pass_net_yds_per_att, y = win_loss_perc)) +
  geom_smooth(method = "lm", se = FALSE, color = "darkgreen") +
  geom_point(alpha = 0.3) +
  labs(title = "", x = "Passing Yards per Attempt",
       y = "Win/Loss %")
grid.arrange(g1, g2, ncol = 2)

## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

From the results of the heat map alone, it appeared that rushing plays were more beneficial to an NFL team’s success. As such, one would expect that a team’s average yardage per rush would be a strong factor in determining the team’s likelihood of winning in any given season. From the left scatter plot above, we do observe that more rushing yards per attempt is correlated with greater success, but the slope is shallow and the points are widely scattered around the line. In fact, as we look to the right, there is a much stronger positive correlation between a team’s net passing yards per attempt and its win-loss percentage. Without extrapolating beyond the observed data, the passing model predicts that the most efficient passing teams will have a win-loss percentage more than 50 percentage points higher than the least efficient. This gap is enormous, especially in relation to the mere 10-point discrepancy in the rushing model. As such, it’s reasonable to conclude that per-play passing yardage is a major factor in the success of NFL teams.

Graph 3: Pairs Plot of Rush Percent, Average Passing Yards, and Win-Loss Percentage

library(GGally)

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

nfl_important_data <- nfl_data %>%
  select(win_loss_perc, rush_perc, pass_net_yds_per_att)
ggpairs(nfl_important_data, columns = 1:3) +
labs(title = "Pairs Plot of Rush Percent, Average Passing Yards, and Win-Loss Percentage")

We’ve observed two notably positive factors in relation to the research question: rushing play percentage (rush_perc) and net yards per passing play (pass_net_yds_per_att). A higher value for each of these factors independently correlated with higher win rates overall, raising the question: do rushing tendencies and efficient passing offenses also correlate with each other? Intuitively, this would make sense, as opposing defenses forced to guard the rush could be more exposed to larger passing plays; however, the data disagree. The pairs plot above displays a patternless scatter plot between the two factors, along with a correlation coefficient of a mere -0.030 (an extremely weak, slightly negative correlation). Teams that rush more often than their peers do not gain more yards per pass, on average. This is not a contradiction: two predictors can be unrelated to each other while each is related to the outcome, and the pairs plot shows that rushing percentage and average passing yardage both correlate with win-loss percentage, with correlation coefficients of 0.287 and 0.617 respectively. By interpreting the pairs plot, we reach the conclusion that, on average, an NFL team can expect greater success if it increases its proportion of running plays or increases its average passing yardage, and that doing one does not imply doing the other. Regarding the pass/rush balance as a whole, it would appear that passing is slightly overused in terms of play volume, while passing efficiency is perhaps undervalued.
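Since the two predictors are nearly uncorrelated, their contributions roughly add up. As a back-of-the-envelope check, the sketch below applies the standard formula for the R-squared of a two-predictor linear regression written in terms of the three pairwise correlations. It uses the rounded values reported in the pairs plot, so the result is approximate, and it was not obtained by fitting a model to the data.

```r
# Correlations as reported in the pairs plot (rounded to three decimals)
r_rush <- 0.287    # rush_perc vs win_loss_perc
r_pass <- 0.617    # pass_net_yds_per_att vs win_loss_perc
r_both <- -0.030   # rush_perc vs pass_net_yds_per_att

# R-squared of a two-predictor linear regression, from the correlations alone
r2_joint <- (r_rush^2 + r_pass^2 - 2 * r_rush * r_pass * r_both) /
  (1 - r_both^2)

round(c(rush_only = r_rush^2, pass_only = r_pass^2, joint = r2_joint), 3)
```

Under these assumptions, rushing share alone accounts for about 8% of the variance in win-loss percentage, passing efficiency alone for about 38%, and the two together for about 47%, which is close to the sum of the separate shares.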

Starting with the first graph, the heatmap of rushing play percentage versus win percentage, teams running a higher percentage of rushing plays (e.g., 55%) tend to achieve better win-loss percentages than those with a more pass-heavy approach (e.g., a 45-55 split). The heatmap and line of best fit suggest that this difference in rushing share is associated with an increase of roughly 10 percentage points in win percentage. The claim is supported by the heatmap and fits within the research question’s scope. However, correlation does not imply causation. The conclusion addresses how the balance between running and passing relates to success, but it may overlook confounding variables like game situations and team strategy.

The second graph, the pair of scatterplots of rushing and passing yards per attempt, shows that rushing yards per attempt has only a weak correlation with win-loss percentage, while passing efficiency (net yards per pass attempt) has a much stronger relationship. The most efficient passing teams are predicted to have win-loss percentages more than 50 percentage points higher than the least efficient teams, indicating that passing efficiency is a critical factor in success. The stronger relationship for passing efficiency than for rushing efficiency challenges the earlier emphasis on rushing and provides a more nuanced perspective on the research question.

Finally, the pairs plot shows that both rushing play percentage (correlation coefficient: 0.287) and average passing yardage (correlation coefficient: 0.617) positively correlate with win-loss percentage, suggesting that either can independently contribute to success. However, rushing frequency and passing yardage have almost no correlation with each other (-0.030), so how often a team runs tells us little about how efficiently it passes. The data suggest that efficient passing is undervalued relative to its association with success, while rushing remains a modest but viable strategy. The emphasis on efficient passing as the stronger predictor agrees with the findings in Graph 2. Overall, the answer to the research question is that balance alone appears less important than efficiency, particularly passing efficiency.

Research Question 3

How do penalties, interceptions, and fumbles affect a team’s win-loss percentage in the NFL?

This question is compelling because it incorporates multiple aspects of the game, including penalties, interceptions, and fumbles, that have a direct effect on a team’s performance. These elements are largely within a team’s control, and even one mistake can have a major impact on the outcome of a game. These errors can either set back the team’s own offense or give the opposing team a chance to capitalize. Fans, coaches, and analysts can use the results to understand how costly these mistakes are, encouraging teams to focus on limiting them to improve their chances of winning. Additionally, through the data, we can analyze whether penalties, fumbles, or interceptions have the largest association with a team’s success.

Graph 1: Scatterplot of Penalties per Game versus Win-Loss Percentage

ggplot(nfl_data, aes(x = penalties/g, y = win_loss_perc)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE) + 
  labs(title = "Scatterplot of Penalties per Game versus Win-Loss Percentage",
       x = "Penalties per Game",
       y = "Win-Loss Percentage")

## `geom_smooth()` using formula = 'y ~ x'

The scatterplot shows that as the number of penalties per game increases, the team’s win-loss percentage generally decreases. However, there is a noticeable bump in the curve at around 6 penalties per game, where the win-loss percentage rises slightly to just above 50% (.500). The highest concentration of points falls between 5 and 7 penalties per game, with a few low outliers under 3 penalties and high outliers above 9 penalties. There is high variability in the data, as seen in the wide spread of the points around the LOESS line. The LOESS curve was chosen for its flexibility: it follows local trends in the data without assuming a straight-line relationship. The gentle downward slope of the curve indicates a weak negative relationship between penalties per game and win-loss percentage.

Graph 2: Heatmap of Fumbles and Interceptions versus Win-Loss Percentage

ggplot(nfl_data, aes(x = fumbles_lost, y = pass_int, fill = win_loss_perc)) +
  geom_tile() +
  scale_fill_gradient(low = "red", high = "green") +
  labs(title = "Heatmap of Fumbles and Interceptions versus Win-Loss Percentage", 
       x = "Fumbles Lost", y = "Pass Interceptions", fill = "Win-Loss Percentage")

The heatmap generally shows that teams with the fewest combined fumbles lost and interceptions tend to have the highest win-loss percentages. The highest concentration of data points is around 5 to 15 fumbles lost and 10 to 20 pass interceptions over the course of a season. Teams in this main cluster have a mix of win-loss percentages, but for the most part they are in the middle of the pack (25% to 75%, or .250 to .750). There are a few high outliers for fumbles (over 20) and interceptions (over 25); for these teams, the win-loss percentage is low (usually less than 25%). As expected, as the number of interceptions and fumbles increases, the team’s win-loss percentage decreases.
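One caveat when reading this heatmap: geom_tile() draws one tile per row of data, so when several team-seasons share the same pair of fumble and interception counts, their tiles are stacked and only the last one drawn is visible. A possible refinement, which is our suggestion and not what the graph above does, is to average win-loss percentage within each cell before plotting. The sketch below shows the aggregation step on a few hypothetical team-seasons.

```r
# Hypothetical team-seasons; the first two share the same cell
toy <- data.frame(
  fumbles_lost  = c(8, 8, 12, 15),
  pass_int      = c(12, 12, 14, 20),
  win_loss_perc = c(75, 25, 50, 31.25)
)

# Mean win-loss percentage per (fumbles, interceptions) cell
cell_mean <- aggregate(win_loss_perc ~ fumbles_lost + pass_int,
                       data = toy, FUN = mean)

# Number of team-seasons behind each cell
cell_n <- aggregate(win_loss_perc ~ fumbles_lost + pass_int,
                    data = toy, FUN = length)
names(cell_n)[3] <- "n_seasons"

# One row per cell, ready to pass to geom_tile()
cells <- merge(cell_mean, cell_n)
cells
```

Here the two seasons with 8 fumbles lost and 12 interceptions collapse into a single cell whose value is their average (50), instead of showing only whichever season happened to be drawn last.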

Graph 3: Pairs Plot of Penalties, Interceptions, Fumbles, and Win-Loss Percentage

data.subset <- nfl_data %>%
  dplyr::select(penalties, pass_int, fumbles_lost, win_loss_perc)

ggpairs(data.subset, columns = 1:4) + 
  labs(title = "Pairs Plot of Penalties, Interceptions, Fumbles, and Win-Loss Percentage")

The pairs plot reveals that while the distributions of win-loss percentage and penalties are roughly symmetric and bell-shaped, both pass interceptions and fumbles lost are right-skewed. This suggests that a few NFL teams have significantly higher numbers of fumbles and interceptions than the majority of the league. There is a slight negative correlation between the number of penalties and a team’s win-loss percentage, indicating that teams that commit more penalties tend to lose somewhat more games. A stronger negative correlation is observed for both pass interceptions and fumbles lost with win-loss percentage, showing how closely turnovers are tied to a team’s overall success. It is also interesting to see a positive correlation between pass interceptions and fumbles lost; this could indicate that teams prone to one kind of turnover are prone to the other, compounding the effect on overall performance. Overall, a team’s success is associated with limiting penalties, interceptions, and fumbles, both in individual games and across the entire season.

To summarize, the scatterplot indicates a weak negative relationship between penalties per game and win-loss percentage. Teams committing fewer penalties tend to have slightly better win-loss records, though there is considerable variability in the data. Interestingly, there is a minor bump near six penalties per game, where win-loss percentage rises slightly above 50% (.500). The weak negative relationship is supported by both the scatterplot and the LOESS curve. This addresses the research question by showing that penalties are related to team success, even if the relationship is not as pronounced as it is for turnovers.

The heatmap shows that teams with fewer combined fumbles and interceptions have higher win-loss percentages. Most teams fall within a moderate range of 5 to 15 fumbles and 10 to 20 interceptions and exhibit mixed success. However, extreme outliers (e.g., more than 20 fumbles or 25 interceptions) are consistently associated with very poor win-loss percentages (below 25%). This reflects the detrimental impact of turnovers on team success and underlines the need to minimize turnovers for better performance.

Finally, in the pairs plot, both interceptions and fumbles lost have a stronger negative correlation with win-loss percentage than penalties do, emphasizing the role of turnovers in determining success. Penalties show only a slight negative correlation with win-loss percentage, making them a lesser factor than turnovers. The positive correlation between interceptions and fumbles suggests that teams prone to turnovers in one category may also struggle with turnovers in the other, potentially compounding their performance issues. Across all three graphs, turnovers emerge as more important than penalties.

Conclusion

There are many unanswered questions that we have yet to explore, such as the impact of other contextual factors (e.g., weather, home-field advantage) on win-loss percentage, or how the relationships between offensive metrics, turnovers, and penalties differ in postseason games versus regular season games. We left these questions unanswered because of specific limitations of this report. The first is data: we worked within the dataset we were assigned, which records season-level totals, and we did not have access to more granular data. The second is time: more comprehensive modeling or more advanced statistical techniques would have required considerably more time for thorough research and for creating clearer, more advanced data visualizations. Stating these limitations allows future statisticians to build on our work by taking up the unanswered questions above.

Breaking down the play: Exploring Factors Impacting NFL Team Win/Loss Percentages

Bear Bottonari, Avery Campbell, Savannah Gibbs, Yeoni Rhee