Motivation

Understanding NFL player performance is a cornerstone of effective team strategy, in-depth fan engagement, and data-driven analysis. The ability to dissect the nuances of player contributions across positions and game situations can benefit analysts, coaches, and fantasy football enthusiasts alike. Leveraging the nflverse load_player_stats dataset, this project delves into a comprehensive range of metrics providing insights into player impact. This project aims to explore the factors contributing to a player’s offensive success, uncover long-term trends in player performance, and explore how player performance changes between regular and postseason games. Through our study, we aim to attain an enhanced understanding of player dynamics and how they shape team outcomes.

Dataset Description

The nflverse load_player_stats dataset provides a detailed snapshot of NFL player activities, capturing a wealth of metrics relevant to offensive performance. Key variables include:

  • Passing Metrics: Passing yards, completions, air yards, and touchdowns
  • Rushing Metrics: Rushing yards, touchdowns, carries, and first downs
  • Receiving Metrics: Receptions, targets, yards, and efficiency metrics such as YAC (Yards After Catch)
  • Advanced Metrics: Expected Points Added (EPA), fantasy points, and usage shares such as target share and WOPR
  • Contextual Data: Game type (regular vs. postseason), season, and team

This dataset offers robust opportunities to explore player contributions and trends, making it a valuable resource for our analysis.

The complete list of variables and the first rows of the data are shown below:

library(nflreadr)
player_stats <- load_player_stats(seasons = TRUE)
names(player_stats)
##  [1] "player_id"                   "player_name"                
##  [3] "player_display_name"         "position"                   
##  [5] "position_group"              "headshot_url"               
##  [7] "recent_team"                 "season"                     
##  [9] "week"                        "season_type"                
## [11] "opponent_team"               "completions"                
## [13] "attempts"                    "passing_yards"              
## [15] "passing_tds"                 "interceptions"              
## [17] "sacks"                       "sack_yards"                 
## [19] "sack_fumbles"                "sack_fumbles_lost"          
## [21] "passing_air_yards"           "passing_yards_after_catch"  
## [23] "passing_first_downs"         "passing_epa"                
## [25] "passing_2pt_conversions"     "pacr"                       
## [27] "dakota"                      "carries"                    
## [29] "rushing_yards"               "rushing_tds"                
## [31] "rushing_fumbles"             "rushing_fumbles_lost"       
## [33] "rushing_first_downs"         "rushing_epa"                
## [35] "rushing_2pt_conversions"     "receptions"                 
## [37] "targets"                     "receiving_yards"            
## [39] "receiving_tds"               "receiving_fumbles"          
## [41] "receiving_fumbles_lost"      "receiving_air_yards"        
## [43] "receiving_yards_after_catch" "receiving_first_downs"      
## [45] "receiving_epa"               "receiving_2pt_conversions"  
## [47] "racr"                        "target_share"               
## [49] "air_yards_share"             "wopr"                       
## [51] "special_teams_tds"           "fantasy_points"             
## [53] "fantasy_points_ppr"
head(player_stats)
##     player_id player_name   player_display_name position position_group
## 1: 00-0000003        <NA> Abdul-Karim al-Jabbar       RB             RB
## 2: 00-0000003        <NA> Abdul-Karim al-Jabbar       RB             RB
## 3: 00-0000003        <NA> Abdul-Karim al-Jabbar       RB             RB
## 4: 00-0000003        <NA> Abdul-Karim al-Jabbar       RB             RB
## 5: 00-0000003        <NA> Abdul-Karim al-Jabbar       RB             RB
## 6: 00-0000003        <NA> Abdul-Karim al-Jabbar       RB             RB
##    headshot_url recent_team season week season_type opponent_team completions
## 1:         <NA>         MIA   1999    1         REG           DEN           0
## 2:         <NA>         MIA   1999    2         REG           ARI           0
## 3:         <NA>         MIA   1999    4         REG           BUF           0
## 4:         <NA>         CLE   1999    7         REG            LA           0
## 5:         <NA>         CLE   1999    8         REG            NO           0
## 6:         <NA>         CLE   1999    9         REG           BAL           0
##    attempts passing_yards passing_tds interceptions sacks sack_yards
## 1:        0             0           0             0     0          0
## 2:        0             0           0             0     0          0
## 3:        0             0           0             0     0          0
## 4:        0             0           0             0     0          0
## 5:        0             0           0             0     0          0
## 6:        0             0           0             0     0          0
##    sack_fumbles sack_fumbles_lost passing_air_yards passing_yards_after_catch
## 1:            0                 0                 0                         0
## 2:            0                 0                 0                         0
## 3:            0                 0                 0                         0
## 4:            0                 0                 0                         0
## 5:            0                 0                 0                         0
## 6:            0                 0                 0                         0
##    passing_first_downs passing_epa passing_2pt_conversions pacr dakota carries
## 1:                   0          NA                       0   NA     NA      16
## 2:                   0          NA                       0   NA     NA       9
## 3:                   0          NA                       0   NA     NA       3
## 4:                   0          NA                       0   NA     NA       6
## 5:                   0          NA                       0   NA     NA      13
## 6:                   0          NA                       0   NA     NA       9
##    rushing_yards rushing_tds rushing_fumbles rushing_fumbles_lost
## 1:            60           1               0                    0
## 2:            33           0               0                    0
## 3:             2           0               0                    0
## 4:            27           0               0                    0
## 5:            39           0               0                    0
## 6:            23           0               0                    0
##    rushing_first_downs rushing_epa rushing_2pt_conversions receptions targets
## 1:                   4   6.2487711                       0          1       1
## 2:                   1  -1.4349502                       0          3       4
## 3:                   0  -1.5399517                       0          0       1
## 4:                   0   0.2160509                       0          2       2
## 5:                   2  -2.9722589                       0          0       0
## 6:                   1  -1.7450201                       0          1       2
##    receiving_yards receiving_tds receiving_fumbles receiving_fumbles_lost
## 1:               7             0                 0                      0
## 2:              18             0                 0                      0
## 3:               0             0                 0                      0
## 4:               8             0                 0                      0
## 5:               0             0                 0                      0
## 6:               2             0                 0                      0
##    receiving_air_yards receiving_yards_after_catch receiving_first_downs
## 1:                   0                           0                     0
## 2:                   0                           0                     1
## 3:                   0                           0                     0
## 4:                   0                           0                     0
## 5:                   0                           0                     0
## 6:                   0                           0                     0
##    receiving_epa receiving_2pt_conversions racr target_share air_yards_share
## 1:     0.2923782                         0    0   0.05263158             NaN
## 2:     0.3770089                         0    0   0.11764706             NaN
## 3:    -0.6995777                         0   NA   0.02380952             NaN
## 4:    -0.2284540                         0    0   0.05000000             NaN
## 5:            NA                         0   NA           NA              NA
## 6:    -1.1106944                         0    0   0.06250000             NaN
##    wopr special_teams_tds fantasy_points fantasy_points_ppr
## 1:  NaN                 0           12.7               13.7
## 2:  NaN                 0            5.1                8.1
## 3:  NaN                 0            0.2                0.2
## 4:  NaN                 0            3.5                5.5
## 5:   NA                 0            3.9                3.9
## 6:  NaN                 0            2.5                3.5

Research Questions

Based on our motivation and the dataset, the three research questions we aim to address through this project are:

  1. What are the key factors contributing to a player’s offensive success across different positions?
  2. How have player performance metrics changed over time?
  3. How does a player’s contribution to their team’s performance change between regular and postseason games?

Question 1: What are the key factors contributing to a player’s offensive success across different positions?

This question focuses on exploratory data analysis, investigating associations between metrics that measure offensive success—such as fantasy points—and other key variables like passing yards, rushing yards, and receiving yards. The goal is to uncover how these variables interrelate and contribute to offensive performance across positions.

First, we produce a simple bar plot of the number of rows per position, which will help us interpret the later graphs.

library(ggplot2)
library(dplyr)

# Count positions
position_counts <- player_stats %>%
  count(position) %>%
  arrange(desc(n))

# bar plot
ggplot(position_counts, aes(x = reorder(position, -n), y = n, fill = position)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Number of Data Points per Position",
    x = "Position",
    y = "Count",
    fill = "Position"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

From this we see that the data is concentrated in the offensive “skill positions,” the positions that make the plays that gain yards and score points. The data also contains some other positions, each with very few data points, likely due to edge-case plays like turnovers or special teams. From this we conclude that we should focus on receivers, running backs, quarterbacks, and tight ends. Further, much of our analysis concerns fantasy football, which also focuses on offensive skill positions, since points are generally not awarded to individual defensive players.

Thus we will clean our data to remove these other positions. We will also combine fullbacks into running backs, as they are effectively the same position (a fullback is a type of running back).

player_stats <- player_stats %>%
  filter(position %in% c("WR", "RB", "TE", "QB", "FB")) %>%
  mutate(
    position = ifelse(position == "FB", "RB", position)
  )

Next we want to look at how distinguishable these positions are from each other using PCA.

# Filter data for skill positions and relevant variables
pca_data <- player_stats %>%
  filter(position %in% c("QB", "RB", "WR", "TE")) %>%
  mutate(
    passing_yards = ifelse(is.na(passing_yards), 0, passing_yards),
    rushing_yards = ifelse(is.na(rushing_yards), 0, rushing_yards),
    receiving_yards = ifelse(is.na(receiving_yards), 0, receiving_yards),
    fantasy_points = ifelse(is.na(fantasy_points), 0, fantasy_points)
  ) %>%
  select(position, passing_yards, rushing_yards, receiving_yards, fantasy_points)

# prcomp() centers and scales the columns itself, so a separate scale() call is not needed
pca_matrix <- pca_data %>%
  select(-position) %>%
  as.matrix()

pca_result <- prcomp(pca_matrix, center = TRUE, scale. = TRUE)

pca_scores <- as.data.frame(pca_result$x) %>%
  mutate(position = pca_data$position)

# Plot PCA
ggplot(pca_scores, aes(x = PC1, y = PC2, color = position)) +
  geom_point(alpha = 0.7, size = 3) +
  labs(
    title = "PCA of Skill Positions",
    x = "Principal Component 1",
    y = "Principal Component 2",
    color = "Position"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(size = 14, face = "bold")
  )

Our PCA provides some interesting insight. The positions are clearly distinguishable from each other, except for tight ends and wide receivers, which follow similar trends. This makes sense, as tight ends are effectively wide receivers of a special type that blocks more. Blocking is not captured in this data, so in these statistics both are effectively receivers.
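
To judge how much of the separation the first two components actually capture, it can help to look at the proportion of variance explained and the loadings. The following is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not taken from the dataset); `prcomp()` and its `sdev` and `rotation` elements are base R.

```r
# Hypothetical toy data: three yardage columns for six player-games
toy <- data.frame(
  passing_yards   = c(250, 310, 0, 0, 0, 0),
  rushing_yards   = c(10, 5, 80, 95, 0, 3),
  receiving_yards = c(0, 0, 15, 20, 70, 110)
)

# Center and scale inside prcomp()
pca_toy <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component
var_explained <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
print(round(var_explained, 3))

# Loadings: how each original variable contributes to each component
print(round(pca_toy$rotation, 3))
```

Applied to `pca_result`, the same two quantities would show whether PC1 and PC2 are enough to describe the positions or whether a third component carries meaningful variance.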

Next we will create scatterplots to understand how each of these positions factors into three of the most important statistics in football: passing, receiving, and rushing yards. We will compare them in the context of fantasy points, a statistic used to measure a player’s performance in a given game.

yard_data <- data.frame(
  fantasy_points = rep(player_stats$fantasy_points, 3),
  #We repeat the 'fantasy_points' column three times for the three yard types
  yard_type = rep(c("Passing", "Rushing", "Receiving"), each = nrow(player_stats)),
  yard_value = c(player_stats$passing_yards, player_stats$rushing_yards, player_stats$receiving_yards),
  #We repeat the 'position' column three times to match the rows in the new data frame
  position = rep(player_stats$position, 3))

ggplot(yard_data, aes(x = yard_value, y = fantasy_points, color = position)) +
  geom_point(alpha = 0.7) +
  facet_wrap(~yard_type) +
  labs(title = "Fantasy Points by Player Position vs. Offensive Yards by Type",
    x = "Yard Value (Passing, Rushing, Receiving)",
    y = "Fantasy Points",
    color = "Position")

The graph illustrates the relationship between fantasy points and offensive yardage, segmented by yard type—passing, rushing, and receiving. Each panel represents one type of yardage and highlights how different player positions contribute to their respective yard categories. Fantasy points are displayed on the y-axis, while offensive yardage is on the x-axis. The data points are color-coded by player position, such as Quarterbacks (QBs), Running Backs (RBs), and Wide Receivers (WRs), providing a clear visualization of positional trends and contributions to offensive performance.

Fantasy points serve as a comprehensive metric for player success because they integrate multiple aspects of offensive performance into a single, interpretable value. This makes them an ideal proxy for evaluating how effectively a player contributes to their team’s offense, and to fans they are a standardized and widely accepted measure of a player’s output.
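
As a rough check of how yardage and touchdowns roll up into fantasy points, the sketch below recomputes three of the rows shown in the `head(player_stats)` output. The weights (0.1 points per rushing or receiving yard, 6 per rushing or receiving touchdown, and 1 extra point per reception for PPR) are assumed standard-scoring values rather than something stated in this report, and the helper `skill_points()` is hypothetical. Passing, turnover, and two-point terms are left out because they are zero in these rows.

```r
# Values copied from rows 1, 2, and 4 of the head(player_stats) output
rows <- data.frame(
  rushing_yards   = c(60, 33, 27),
  rushing_tds     = c(1, 0, 0),
  receiving_yards = c(7, 18, 8),
  receiving_tds   = c(0, 0, 0),
  receptions      = c(1, 3, 2)
)

# Hypothetical helper with assumed standard-scoring weights
skill_points <- function(d, ppr = FALSE) {
  pts <- 0.1 * (d$rushing_yards + d$receiving_yards) +
    6 * (d$rushing_tds + d$receiving_tds)
  if (ppr) pts <- pts + d$receptions
  pts
}

standard <- skill_points(rows)
ppr      <- skill_points(rows, ppr = TRUE)
print(standard)  # compare with fantasy_points in those rows: 12.7, 5.1, 3.5
print(ppr)       # compare with fantasy_points_ppr in those rows: 13.7, 8.1, 5.5
```

Under these assumed weights the recomputed values agree with the printed rows, which illustrates why yardage and touchdowns dominate the metric.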

In the first panel, which represents passing yards, Quarterbacks (QBs) dominate the distribution. The majority of players with high passing yardage also have high fantasy points, indicating a strong positive correlation between these two metrics. This reinforces the role of QBs as primary contributors to passing offenses. In the second panel, receiving yards are predominantly associated with Wide Receivers (WRs) and Tight Ends (TEs). There is noticeable variability in fantasy points even for players with similar receiving yardage. This suggests that other factors, such as touchdowns or yards after catch, may also play a significant role in determining fantasy performance. The final panel focuses on rushing yards, where Running Backs (RBs) stand out as the key contributors. Here, a clear upward trend is visible, showing that players with higher rushing yardage tend to earn significantly more fantasy points.

This plot reinforces our findings from the previous plot and also identifies the yardage type that matters most for each position.

options(scipen = 999) # Avoid scientific notation so the table is easier to read
summary_stats <- player_stats %>%
  group_by(position) %>%
  summarize(
    avg_passing_yards = mean(passing_yards, na.rm = TRUE),
    avg_rushing_yards = mean(rushing_yards, na.rm = TRUE),
    avg_receiving_yards = mean(receiving_yards, na.rm = TRUE),
    avg_fantasy_points = mean(fantasy_points, na.rm = TRUE),
    total_players = n())

print(summary_stats)
## # A tibble: 4 × 6
##   position avg_passing_yards avg_rushing_yards avg_receiving_yards
##   <chr>                <dbl>             <dbl>               <dbl>
## 1 QB                197.                11.0                 0.161
## 2 RB                  0.0434            33.8                13.6  
## 3 TE                  0.135              0.125              25.9  
## 4 WR                  0.122              0.996              41.4  
## # ℹ 2 more variables: avg_fantasy_points <dbl>, total_players <int>

To complement the visual insights from the graph, the summary statistics break down offensive yardage by position. Quarterbacks (QBs) lead in passing yards, with an average of about 197 yards per game, far higher than any other position, which emphasizes their pivotal role in passing offenses. Running Backs (RBs) dominate rushing yardage with an average of 33.8 yards per game and also contribute 13.6 receiving yards, highlighting their importance in both the rushing and receiving facets of the game. The fantasy-point averages are cut off in the printed tibble; the values reported for this analysis were about 13.7 for QBs and 7.2 for RBs.

The remaining two positions are receivers. Wide Receivers (WRs) average 41.4 receiving yards per game and Tight Ends (TEs) average 25.9, with negligible passing and rushing yardage for both. Fullbacks no longer appear as their own row because they were merged into RB during cleaning, and non-offensive positions such as punters and cornerbacks were removed, which makes sense because they contribute almost nothing to offensive yardage. Note that `total_players` counts player-game rows, not distinct players.

Tying back to the research question, these findings point to one dominant yardage type per position: passing yards for quarterbacks, rushing yards for running backs, and receiving yards for wide receivers and tight ends.

library(tidyr)
heatmap_data <- player_stats %>%
  filter(position_group %in% c("QB", "RB", "WR", "TE")) %>%
  group_by(position) %>%
  summarise(
    avg_passing_yards = mean(passing_yards, na.rm = TRUE),
    avg_passing_tds = mean(passing_tds, na.rm = TRUE),
    avg_rushing_yards = mean(rushing_yards, na.rm = TRUE),
    avg_rushing_tds = mean(rushing_tds, na.rm = TRUE),
    avg_receiving_yards = mean(receiving_yards, na.rm = TRUE),
    avg_receiving_tds = mean(receiving_tds, na.rm = TRUE),
    avg_fantasy_points = mean(fantasy_points, na.rm = TRUE)
  ) %>%
  pivot_longer(cols=starts_with("avg_"),names_to="metric",values_to="value")

ggplot(heatmap_data, aes(x = metric, y = position, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low="lightblue", high="darkblue", name="Avg Value") +
  labs(title = "Offensive Success Metrics by Position",x="Metric",y="Position") +
  theme(axis.text.x = element_text(angle=45, hjust=1))

One of the biggest aspects of the NFL is the fantasy football that comes with it. With tens of millions of people competing with their friends, family, or strangers on the internet, there is a lot of interest in picking the best possible team to avoid the dreaded last-place punishment that many leagues play with. The plot above helps us analyze the importance of different positions in fantasy football, and the specific stats that matter most for each position.

First, the darkest tile, and seemingly the single most important metric for a successful fantasy football team, is the passing yards of the quarterback. It is hard to dispute that quarterback is the most important offensive position, with efficient passing as its main responsibility. The data back this up: the quarterback’s average passing yards are nearly five times the next-highest value in the plot (about 197 versus 41.4 receiving yards for wide receivers). Keep in mind that all metrics share one color scale even though yards, touchdowns, and fantasy points are measured in different units, so part of this gap simply reflects the larger magnitude of passing yardage. The takeaway? When playing fantasy football, make sure to have a quarterback you can trust. Further, fantasy football is designed to closely mirror the actual game, so this also highlights the importance of the quarterback position on any given team.
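
Because the heatmap uses one color scale for metrics measured in different units, one option is to rescale each metric to the 0 to 1 range before plotting, so that each column shows which position leads that metric. This is a sketch on hypothetical values, not the method used in this report; `rescale01()` is an illustrative helper and the `toy_long` numbers are invented.

```r
# Hypothetical long-format averages, shaped like heatmap_data
toy_long <- data.frame(
  position = rep(c("QB", "RB", "WR"), times = 2),
  metric   = rep(c("avg_passing_yards", "avg_rushing_tds"), each = 3),
  value    = c(200, 0, 0, 0.1, 0.3, 0.0)
)

# Min-max rescaling; returns 0 for a constant metric to avoid dividing by zero
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) return(rep(0, length(x)))
  (x - rng[1]) / (rng[2] - rng[1])
}

# ave() applies the function within each metric and keeps the row order
toy_long$scaled <- ave(toy_long$value, toy_long$metric, FUN = rescale01)
print(toy_long)
```

With the real data, mapping `fill` to the rescaled column instead of `value` would make small-magnitude metrics such as touchdowns readable next to yardage.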

ggplot(player_stats, aes(x = position, y = fantasy_points, fill = position)) +
  geom_boxplot() +
  labs(
    title = "Fantasy Points Across Positions",
    x = "Position",
    y = "Fantasy Points"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  )

This box plot gives us more insight into the fantasy points produced by each position. One new takeaway is that although quarterbacks have higher average fantasy points, the choice of fantasy running back could actually matter more. The outliers for running backs stretch further above the box and are noticeably more numerous. This means that selecting the right running back over an average one would benefit your team more than selecting the right quarterback over an average one. The data would therefore suggest prioritizing running back picks over quarterback picks when drafting for fantasy football, due to the larger potential gain.
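
One way to put a number on this “upside” argument is to compare, for each position, the gap between a high quantile and the median of fantasy points. The sketch below uses made-up scores (the `toy` values are hypothetical), and the 90th percentile is an arbitrary stand-in for “the right pick”.

```r
# Hypothetical fantasy points for two positions
toy <- data.frame(
  position = rep(c("QB", "RB"), each = 5),
  fantasy_points = c(12, 14, 15, 16, 18,   # QB: tight spread
                     2, 4, 6, 10, 30)      # RB: long right tail
)

# Gap between the 90th percentile and the median, per position
upside <- tapply(toy$fantasy_points, toy$position, function(x) {
  unname(quantile(x, 0.9) - median(x))
})
print(upside)
```

A larger gap for one position means a top pick at that position gains more over a typical pick, which is the comparison the box plot suggests visually.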

Question 2: How have player performance metrics changed over time?

First we plot player performance over the weeks of the 2024 regular season, using league-wide weekly averages to see how these statistics evolve.

library(broom)

# Keep each yardage stat only for the positions it matters to, and use only 2024 data for the week-by-week analysis
player_stats_2024 <- player_stats %>%
  filter(season == 2024) %>%
  mutate(
    passing_yards = ifelse(position == "QB", passing_yards, NA),
    rushing_yards = ifelse(position == "RB", rushing_yards, NA),
    receiving_yards = ifelse(position %in% c("WR", "TE"), receiving_yards, NA)
  )

# Aggregating by week for each of the stats
nfl_weekly_summary_2024 <- player_stats_2024 %>%
  group_by(week) %>%
  summarize(
    avg_passing_yards = mean(passing_yards, na.rm = TRUE),
    avg_rushing_yards = mean(rushing_yards, na.rm = TRUE),
    avg_receiving_yards = mean(receiving_yards, na.rm = TRUE)
  )

#Linear regression models
passing_lm <- lm(avg_passing_yards ~ week, data = nfl_weekly_summary_2024)
rushing_lm <- lm(avg_rushing_yards ~ week, data = nfl_weekly_summary_2024)
receiving_lm <- lm(avg_receiving_yards ~ week, data = nfl_weekly_summary_2024)

#slopes and confidence intervals
passing_slope <- tidy(passing_lm) %>% filter(term == "week")
passing_ci <- confint(passing_lm)["week", ]

rushing_slope <- tidy(rushing_lm) %>% filter(term == "week")
rushing_ci <- confint(rushing_lm)["week", ]

receiving_slope <- tidy(receiving_lm) %>% filter(term == "week")
receiving_ci <- confint(receiving_lm)["week", ]

# Plot of time series data for each statistic
ggplot(nfl_weekly_summary_2024, aes(x = week)) +
  geom_line(aes(y = avg_passing_yards, color = "Passing Yards")) +
  geom_line(aes(y = avg_rushing_yards, color = "Rushing Yards")) +
  geom_line(aes(y = avg_receiving_yards, color = "Receiving Yards")) +
  geom_smooth(aes(y = avg_passing_yards, color = "Passing Yards"), method = "lm", se = FALSE, linetype = "dashed") +
  geom_smooth(aes(y = avg_rushing_yards, color = "Rushing Yards"), method = "lm", se = FALSE, linetype = "dashed") +
  geom_smooth(aes(y = avg_receiving_yards, color = "Receiving Yards"), method = "lm", se = FALSE, linetype = "dashed") +
  labs(
    title = "Average NFL Player Performance Metrics by Week (2024)",
    x = "Week",
    y = "Average Yards",
    color = "Metric"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Slope values and confidence intervals
cat("Passing Yards: Slope =", passing_slope$estimate, 
    ", 95% Confidence Intervals = [", passing_ci[1], ", ", passing_ci[2], "]\n")
## Passing Yards: Slope = 2.48447 , 95% Confidence Intervals = [ 0.06238619 ,  4.906553 ]
cat("Rushing Yards: Slope =", rushing_slope$estimate, 
    ", 95% Confidence Intervals = [", rushing_ci[1], ", ", rushing_ci[2], "]\n")
## Rushing Yards: Slope = -0.02530769 , 95% Confidence Intervals = [ -0.4892483 ,  0.4386329 ]
cat("Receiving Yards: Slope =", receiving_slope$estimate, 
    ", 95% Confidence Intervals = [", receiving_ci[1], ", ", receiving_ci[2], "]\n")
## Receiving Yards: Slope = 0.3470138 , 95% Confidence Intervals = [ -0.06576434 ,  0.7597919 ]

The plot shows some interesting trends in passing, receiving, and rushing yards through the 2024 season. First, the 95% confidence intervals for the rushing and receiving slopes both contain 0, so there is not enough evidence to reject the null hypothesis that these averages have no linear trend across weeks. Passing yards are the exception: the slope is about 2.48 yards per week, with a 95% confidence interval of [0.06, 4.91]. Because the lower bound sits just above 0, the evidence is weak, but it does suggest an upward trend in quarterbacks’ average passing yards as the season goes on. This could mean several things. Quarterbacks may perform better in the passing game as the season progresses and they develop their skills. Alternatively, teams may settle on a single quarterback, giving each one more playing time per game and thus higher average yards. This has implications for fantasy football: later in the season quarterback priority may rise, which could justify trades that prioritize a good quarterback.
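
The interval reported by `confint()` is the slope estimate plus or minus a t quantile times its standard error. The sketch below reproduces that calculation by hand on hypothetical weekly averages (the `toy` values are invented) to make explicit what “the interval contains 0” is testing.

```r
# Hypothetical weekly averages, not taken from the dataset
toy <- data.frame(
  week = 1:8,
  avg_yards = c(210, 205, 218, 221, 215, 230, 226, 235)
)

fit <- lm(avg_yards ~ week, data = toy)
coefs <- summary(fit)$coefficients

slope <- coefs["week", "Estimate"]
se    <- coefs["week", "Std. Error"]

# 97.5% quantile of the t distribution with the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))
manual_ci <- c(slope - t_crit * se, slope + t_crit * se)

print(manual_ci)
print(confint(fit)["week", ])  # same interval from the built-in helper
```

An interval that excludes 0 corresponds to rejecting a zero slope at the 5% level. With only one averaged point per week, the residual degrees of freedom are small, so the intervals are wide.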

Next we plot the same metrics averaged over a whole season at a time, to understand the league’s overall trends across the years.

# Filter data to relevant positions
player_stats <- player_stats %>%
  mutate(
    passing_yards = ifelse(position == "QB", passing_yards, NA),
    rushing_yards = ifelse(position == "RB", rushing_yards, NA),
    receiving_yards = ifelse(position %in% c("WR", "TE"), receiving_yards, NA)
  )

# Aggregate metrics by season (year)
nfl_yearly_summary <- player_stats %>%
  group_by(season) %>%
  summarize(
    avg_passing_yards = mean(passing_yards, na.rm = TRUE),
    avg_rushing_yards = mean(rushing_yards, na.rm = TRUE),
    avg_receiving_yards = mean(receiving_yards, na.rm = TRUE)
  )

# Linear regression models
passing_lm <- lm(avg_passing_yards ~ season, data = nfl_yearly_summary)
rushing_lm <- lm(avg_rushing_yards ~ season, data = nfl_yearly_summary)
receiving_lm <- lm(avg_receiving_yards ~ season, data = nfl_yearly_summary)

# calculate confidence intervals
passing_slope <- tidy(passing_lm) %>% filter(term == "season")
passing_ci <- confint(passing_lm)["season", ]

rushing_slope <- tidy(rushing_lm) %>% filter(term == "season")
rushing_ci <- confint(rushing_lm)["season", ]

receiving_slope <- tidy(receiving_lm) %>% filter(term == "season")
receiving_ci <- confint(receiving_lm)["season", ]

# Plotting time series data
ggplot(nfl_yearly_summary, aes(x = season)) +
  geom_line(aes(y = avg_passing_yards, color = "Passing Yards")) +
  geom_line(aes(y = avg_rushing_yards, color = "Rushing Yards")) +
  geom_line(aes(y = avg_receiving_yards, color = "Receiving Yards")) +
  geom_smooth(aes(y = avg_passing_yards, color = "Passing Yards"), method = "lm", se = FALSE, linetype = "dashed") +
  geom_smooth(aes(y = avg_rushing_yards, color = "Rushing Yards"), method = "lm", se = FALSE, linetype = "dashed") +
  geom_smooth(aes(y = avg_receiving_yards, color = "Receiving Yards"), method = "lm", se = FALSE, linetype = "dashed") +
  labs(
    title = "Average NFL Player Performance Metrics by Year",
    x = "Year",
    y = "Average Yards",
    color = "Metric"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Slope values and confidence intervals
cat("Passing Yards: Slope =", passing_slope$estimate, 
    ", 95% Confidence Intervals = [", passing_ci[1], ", ", passing_ci[2], "]\n")
## Passing Yards: Slope = 1.270682 , 95% Confidence Intervals = [ 0.5980243 ,  1.943341 ]
cat("Rushing Yards: Slope =", rushing_slope$estimate, 
    ", 95% Confidence Intervals = [", rushing_ci[1], ", ", rushing_ci[2], "]\n")
## Rushing Yards: Slope = 0.05575585 , 95% Confidence Intervals = [ -0.03525171 ,  0.1467634 ]
cat("Receiving Yards: Slope =", receiving_slope$estimate, 
    ", 95% Confidence Intervals = [", receiving_ci[1], ", ", receiving_ci[2], "]\n")
## Receiving Yards: Slope = -0.1031624 , 95% Confidence Intervals = [ -0.2055482 ,  -0.0007765008 ]

This graph aggregates the data by season to analyze trends in the three statistics over the years covered by the dataset. Passing yards again show a significant upward trend (slope about 1.27 yards per season, 95% confidence interval [0.60, 1.94]), suggesting that the average quarterback’s passing output has increased over the years. The implications are the same as for the previous graph, just on the scale of NFL seasons rather than the weeks of the 2024 season. Receiving yards show a small negative trend (slope about -0.10) whose confidence interval, [-0.206, -0.001], only barely excludes 0, so the evidence is marginal. The rushing slope is slightly positive (about 0.06), but its interval, [-0.035, 0.147], contains 0, so there is no clear evidence of a trend in average rushing yards. One plausible explanation for the receiving decline is that the NFL has moved toward rotating more receivers (and running backs) through games rather than relying only on the starters, which spreads the yards among more players and lowers the per-player average. Taken together, these trends suggest that quarterbacks have risen in fantasy importance relative to other offensive skill positions over the roughly 25 seasons in the data.
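
The rotation explanation can be checked by comparing per-player-game averages with the number of player-games and the total yards in each season: if totals hold steady while counts rise, averages fall. A sketch on hypothetical numbers (the `toy` rows are invented for illustration):

```r
# Hypothetical receiving lines for two seasons with equal total yards
toy <- data.frame(
  season = c(2000, 2000, 2020, 2020, 2020, 2020),
  receiving_yards = c(80, 40, 40, 30, 30, 20)
)

by_season <- aggregate(
  receiving_yards ~ season, data = toy,
  FUN = function(x) c(total = sum(x), n = length(x), avg = mean(x))
)
# aggregate() returns a matrix column here; flatten it into ordinary columns
by_season <- do.call(data.frame, by_season)
names(by_season) <- c("season", "total_yards", "player_games", "avg_yards")
print(by_season)
```

Running the same aggregation on the real receiving data would show whether the falling average comes with a rising count of player-games, as the rotation hypothesis predicts.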

Question 3: How does a player’s contribution to their team’s performance change between regular and postseason games?

We will create a faceted scatterplot to understand how each position’s performance changes from the regular season to the postseason. To keep the plot readable, we combine each player’s games into one data point per season type by averaging separately over their regular-season and postseason games. We use 2023 data for this plot, as it is the most recent season with a completed postseason.

# Calculate a total EPA statistic for the data using 2023 datapoints
scatter_data <- player_stats %>%
  filter(season == 2023, position %in% c("QB", "RB", "WR", "TE"), season_type %in% c("REG", "POST")) %>%
  mutate(
    rushing_epa = ifelse(is.na(rushing_epa), 0, rushing_epa),
    receiving_epa = ifelse(is.na(receiving_epa), 0, receiving_epa),
    passing_epa = ifelse(is.na(passing_epa), 0, passing_epa),
    total_epa = rushing_epa + receiving_epa + passing_epa
  ) %>%
  group_by(player_id, position, season_type) %>%
  summarize(
    avg_total_epa = mean(total_epa, na.rm = TRUE),
    .groups = "drop"
  )

# Create faceted scatterplots for REG and POST performances
ggplot(scatter_data, aes(x = position, y = avg_total_epa, color = position)) +
  geom_jitter(width = 0.2, alpha = 0.7, size = 3) +
  geom_boxplot(alpha = 0.3, outlier.shape = NA, color = "black") +
  facet_wrap(~ season_type, scales = "free_y", ncol = 2) +
  labs(
    title = "Total EPA by Position for Regular and Postseason (2023)",
    x = "Position",
    y = "Average Total EPA (Expected Points Added)",
    color = "Position"
  ) +
  theme(
    plot.title = element_text(size = 14, face = "bold"),
    legend.position = "bottom"
  )

The outcome of this plot is interesting. As expected, there are far more data points for the regular season than for the postseason. Running backs, tight ends, and wide receivers follow very similar patterns in expected points added between the regular season and the postseason. Quarterbacks, on the other hand, behave differently. Their EPA is visibly more spread out in the postseason than in the regular season, with more extreme values. Their postseason median is also noticeably higher than their regular-season median, suggesting that postseason quarterback performance is generally better. Because the panels use free y-axis scales, these comparisons should be read from the axis values rather than from box heights alone. In the postseason the remaining teams are stronger, so the quarterbacks are better, but they also face better defenses. The plot therefore suggests that quarterbacks are the position that best steps up and delivers top performances when the playoffs come around.
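
Since the two panels contain different sets of players, a paired view restricted to players who appear in both season types is a useful complement. The sketch below uses a hypothetical data frame shaped like `scatter_data` (the IDs and EPA values are invented) and base R’s `merge()`.

```r
# Hypothetical per-player averages, shaped like scatter_data
toy <- data.frame(
  player_id     = c("a", "a", "b", "b", "c"),
  position      = c("QB", "QB", "RB", "RB", "WR"),
  season_type   = c("REG", "POST", "REG", "POST", "REG"),
  avg_total_epa = c(3.0, 5.5, 0.4, -0.1, 1.2)
)

keep <- c("player_id", "position", "avg_total_epa")
reg  <- toy[toy$season_type == "REG",  keep]
post <- toy[toy$season_type == "POST", keep]

# Inner join keeps only players with both regular-season and postseason games
paired <- merge(reg, post, by = c("player_id", "position"),
                suffixes = c("_reg", "_post"))
paired$epa_change <- paired$avg_total_epa_post - paired$avg_total_epa_reg
print(paired)
```

Summarizing `epa_change` by position would separate a genuine postseason lift from the effect of only stronger teams reaching the playoffs.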