Overview

This project will begin on Monday, June 13th, and conclude with a 10-15 minute presentation on Friday, June 24th (either during the morning session from 10:30 AM to 12:00 PM or during the afternoon session from 1:30 PM to 3:00 PM). The goal of this project is to practice understanding the structure of a dataset and generating and evaluating hypotheses using fundamental EDA and data visualization techniques.

Deliverables

Your team is expected to produce R Markdown slides (an example template will be provided shortly) to accompany your 10-15 minute presentation with the following information:

Timeline

There will be two submission deadlines:

Friday, June 17th @ 5:00 PM EST - Each student will push their individual code for the project thus far to their GitHub accounts for review. We will then provide feedback on the code submitted.

Thursday, June 23rd @ 11:59 PM EST - Slides and full code must be completed and ready for presentation. Send your slides to Prof Yurko’s email (ryurko@andrew.cmu.edu). All code, visualizations, and presentations must be made in R. Take advantage of examples from lecture and the presentation template, but also feel free to explore material online that may be relevant!

Data

Your team is assigned the NFL passing plays data. This dataset contains all passing plays from the 2021 NFL regular season, drawn from nflfastR play-by-play data and loaded with the nflreadr package. The code chunk at the end shows how this dataset was constructed in R.

Each row of the dataset corresponds to a single passing play (including sacks) and has the following columns:

Note that a full glossary of the features available for NFL play-by-play data can be found here.
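
As one possible starting point for the EDA, the sketch below computes a grouped summary and a simple bar chart from a few of the columns selected in the code chunk at the end (`pass_location`, `complete_pass`, `epa`, `sack`). The helper name `summarize_passes_by_location` and the toy rows are assumptions made purely for illustration. They are not part of the project materials, and the real dataset should be read in from the file your team was given.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper (not part of the project materials): completion rate
# and mean EPA by pass location. Sacks are dropped since no pass was thrown.
summarize_passes_by_location <- function(plays) {
  plays %>%
    filter(sack == 0, !is.na(pass_location)) %>%
    group_by(pass_location) %>%
    summarize(n_passes = n(),
              completion_rate = mean(complete_pass),
              mean_epa = mean(epa),
              .groups = "drop")
}

# Made-up toy rows for illustration only; swap in the real passing plays data.
toy_plays <- tibble(
  pass_location = c("left", "left", "middle", "right", NA),
  complete_pass = c(1, 0, 1, 1, 0),
  epa           = c(0.5, -0.3, 1.2, 0.1, -1.0),
  sack          = c(0, 0, 0, 0, 1)
)

location_summary <- summarize_passes_by_location(toy_plays)
print(location_summary)

# Bar chart of completion rate by pass location
completion_plot <- location_summary %>%
  ggplot(aes(x = pass_location, y = completion_rate)) +
  geom_col() +
  labs(x = "Pass location", y = "Completion rate",
       title = "Completion rate by pass location (toy data)") +
  theme_bw()
```

A summary like this only suggests a hypothesis (for example, that passes over the middle are completed more often). Whether the pattern holds in the real data is what the visualizations in your presentation should examine.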

Code to build dataset

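# Load all play-by-play data from the 2021 season
# (filtered down to regular season passing plays below):
library(tidyverse) # provides %>%, filter(), select(), and write_csv()
library(nflreadr)
nfl_2021_data <- nflreadr::load_pbp(2021, file_type = "rds")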

# Keep regular season passing plays with a defined EPA and possession team:
nfl_passing_plays <- nfl_2021_data %>%
  filter(play_type == "pass", season_type == "REG", 
         !is.na(epa), !is.na(posteam), posteam != "") %>%
  select(# Player info attempting the pass:
         passer_player_name, passer_player_id, posteam, 
         # Info about the pass:
         complete_pass, interception, yards_gained, touchdown, 
         pass_location, pass_length, air_yards, yards_after_catch, epa, wpa,
         shotgun, no_huddle, qb_dropback, qb_hit, sack,
         # Context about the receiver:
         receiver_player_name, receiver_player_id,
         # Team context (posteam is already selected above):
         defteam, posteam_type, 
         # Play and game context:
         play_id, yardline_100, side_of_field, down, qtr, play_clock,
         half_seconds_remaining, game_half, game_id,
         home_team, away_team, home_score, away_score,
         # Description of play
         desc)
# Save this file:
write_csv(nfl_passing_plays, 
          "data/sports/eda_projects/nfl_passing_plays_2021.csv")