class: center, middle, inverse, title-slide

# Welcome to CMSACamp
## Background and overview
### June 1st, 2021

---

## Meet the instructors

- Teaching Assistant: __Beomjo Park__
  - Started @CMU_Stats PhD in '18
  - Previously Korea University, and research assistant at NCSoft (owner of the NC Dinos!)
  - Research: Robust Statistical Inference, Model misspecification, Bayesian Nonparametrics

- Teaching Assistant: __Nicholas Kissel__
  - Started @CMU_Stats PhD in '19
  - Previously University of Pittsburgh '19, MS in statistics and BS in math & statistics
  - Research: Creating inferential procedures for machine learning modeling methods

--

.pull-left[

- Instructor: __Ron Yurko__ (@Stat_Ron)
  - @CMU_Stats '15, started PhD in '17
  - Pittsburgh Pirates analytics intern '14
  - Part-time Data Scientist @ZelusAnalytics
  - Research: statistical genetics, _statistics in sports_ / _sports analytics_, and variable selection for model-based clustering

]

---

## Statistics in sports research?

You might think statistics in sports or sports analytics research is relatively new...

--

.pull-left[

Professors Brad Efron and Carl Morris disagree

- "Data analysis using Stein's estimator and its generalizations"
- _Journal of the American Statistical Association_ (__1975__)
- Introduction of __Empirical Bayes__ to sports
- Improves accuracy by pooling information from other players

]

---

## Sports analytics research __starts with the data__

Cervone et al. "A multiresolution stochastic process model for predicting basketball possession outcomes." _Journal of the American Statistical Association_ (2016)

---

## NFL Big Data Bowl tracking data example

Yurko et al. "Going deep: models for continuous-time within-play valuation of game outcomes in American football with tracking data." _Journal of Quantitative Analysis in Sports_ (2020)

---

### General outline and key dates (subject to change, all times in ET)

__Lectures__: Monday through Friday, 12 to 1:30 PM (Ron's office hours are Mondays 4:30 to 5:30 PM)

__Labs__: Monday through Thursday, 2:30 to 4 PM (Beomjo's office hours are Thursdays 4 to 5 PM, Nick's are Fridays 2 to 3 PM)

- Will begin with mini projects & practice presentations before shifting focus to the main projects

__CMSAConvo speaker series__: 3 to 4:30 PM every Friday

--

.pull-left[

- First two weeks, June 1-11:
  - EDA, data visualization, clustering
  - Presentations from project advisors
  - __June 13: Project preference deadline__

- June 14-25:
  - Linear models and model assessment
  - Regularization and dimension reduction
  - __June 21-23: Resume check / career conversations with Professor Nugent__
  - __June 24-25: EDA presentations__

]

--

.pull-right[

- June 28 - July 9:
  - Flexible models, machine learning
  - Labs will shift focus to main projects
  - __July 8-9: Modeling presentations__

- July 12 - 30:
  - Special topics (e.g. text analysis)
  - Focus on projects!
  - __July 30: Final project presentations__

Plus other guest speakers (__check your email!__)

]

---

## Goals for the summer

.pull-left[

- Develop fundamental research skills: data wrangling, visualization, modeling, communication

- Become familiar with `R`, `tidyverse`, `ggplot2`, `markdown`, GitHub

- Complete statistical learning bootcamp

- Create a portfolio of projects with GitHub and practice reproducible research

- __All presentations will be made using `R` Markdown with xaringan!__

- Network with academic researchers and industry professionals

]

--

.pull-right[

__Ask questions, learn, and grow__

Senior Academic Advisor Samantha Nielsen

]

---

## Resources to remember!

- CMSACamp website:
  - Check out the References tab for links to online textbooks and other useful references
  - Data Sources tab for links to various public datasets

- We will also use Slack to communicate and to share interesting articles and materials throughout the summer
  - See previous email from Professor Nugent with the workspace invitation link

---

## CMSACamp alumni

.pull-left[

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Just realized I haven’t announced this on twitter yet, but I have officially committed to earn my PhD in statistics at <a href="">@virginia_tech</a> !!
</p>&mdash; Danielle Sebring (@DSebring17) <a href="">March 26, 2020</a></blockquote>

]

.pull-right[

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Happy and humbled to announce that I'll be returning to the one and only <a href="">@CMU_Stats</a> as a PhD student this fall!<br>See you in Pittsburgh!</p>&mdash; Thea Sukianto (@stats_sukianto) <a href="">March 19, 2021</a></blockquote>

]

---

## CMSACamp alumni

.pull-left[

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Super excited and honored to be chosen as one of the winners of this year's <a href="">#BigDataBowl</a>! Big shoutouts to <a href="">@sarahrunbailey</a> for being a fantastic mentor and <a href="">@StatsbyLopez</a> and crew for putting this amazing competition together!</p>&mdash; Jill Reiner (@jillhreiner) <a href="">February 5, 2021</a></blockquote>

]

.pull-right[

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Thank you <a href="">@SloanSportsConf</a> for an outstanding conference. <a href="">@j_bosch10</a></p>&mdash; Sam Kalman (@sam_kalman_) <a href="">March 7, 2020</a></blockquote>

]

---

## CMSACamp alumni

.pull-left[

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Getting to work in the NHL has been a dream come true💛🖤! Super excited to continue working with Sam, Nick, and others in the Penguins organizations moving forward! <a href="">#LetsGoPens</a></p>&mdash; Katerina Wu (@kattaqueue) <a href="">March 9, 2021</a></blockquote>

]

.pull-right[

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">It’s unusual time but I am happy to share that I am graduating. First one to go to college and to graduate from both sides of my family <a href="">#firstgen</a> and <a href="">#rstats</a> graduate.
</p>&mdash; Kapil.Khanal (@almost_kapil) <a href="">May 8, 2020</a></blockquote>

]

---

class: center, middle

# And now it's your turn...

--

# (but we're here to help!)