Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (completed outside of class time if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an knitted PDF file on Gradescope, by Friday 9pm, this week.

This week’s agenda: practicing SQLite queries, performing simple computations and joins, and testing our understanding by writing equivalent R code for these database manipulations.

Lahman baseball database

Thanks to Sean Lahman, extensive baseball data is freely available all the way back to the 1871 season. We’re going ot use a SQLite version of the baseball database put together by Jeff Knecht, at https://github.com/jknecht/baseball-archive-sqlite. The most recent SQLite database was recently updated to include the 2016 season. It has been posted to the class website at http://www.stat.cmu.edu/~ryantibs/statcomp/data/lahman2016.sqlite. Download this file (it’s about 50 MB) and save it in the working directory for your lab.

Q1. Practice with SQL data extraction

##     playerID yearID  AB  H HR
## 1  abercda01   1871   4  0  0
## 2   addybo01   1871 118 32  0
## 3  allisar01   1871 137 40  0
## 4  allisdo01   1871 133 44  2
## 5  ansonca01   1871 120 39  0
## 6  armstbo01   1871  49 11  0
## 7  barkeal01   1871   4  1  0
## 8  barnero01   1871 157 63  0
## 9  barrebi01   1871   5  1  0
## 10 barrofr01   1871  86 13  0
##     playerID yearID  AB   H HR
## 1  bondsba01   2001 476 156 73
## 2  mcgwima01   1998 509 152 70
## 3   sosasa01   1998 643 198 66
## 4  mcgwima01   1999 521 145 65
## 5   sosasa01   2001 577 189 64
## 6   sosasa01   1999 625 180 63
## 7  marisro01   1961 590 159 61
## 8   ruthba01   1927 540 192 60
## 9   ruthba01   1921 540 204 59
## 10  foxxji01   1932 585 213 58
##     playerID yearID  AB   H HR
## 1  bondsba01   2001 476 156 73
## 2  mcgwima01   1998 509 152 70
## 3   sosasa01   1998 643 198 66
## 4  mcgwima01   1999 521 145 65
## 5   sosasa01   2001 577 189 64
## 6   sosasa01   1999 625 180 63
## 7  marisro01   1961 590 159 61
## 8   ruthba01   1927 540 192 60
## 9   ruthba01   1921 540 204 59
## 10  foxxji01   1932 585 213 58
## 11 greenha01   1938 556 175 58
## 12 howarry01   2006 581 182 58
## 13 gonzalu01   2001 609 198 57
## 14 rodrial01   2002 624 187 57
## 15 griffke02   1997 608 185 56
## 16 griffke02   1998 633 180 56
## 17 wilsoha01   1930 585 208 56
##     playerID yearID  AB   H HR
## 1  mcgwima01   1998 509 152 70
## 2   sosasa01   1998 643 198 66
## 3  mcgwima01   1999 521 145 65
## 4   sosasa01   1999 625 180 63
## 5  griffke02   1997 608 185 56
## 6  griffke02   1998 633 180 56
## 7  mcgwima01   1996 423 132 52
## 8  fieldce01   1990 573 159 51
## 9  anderbr01   1996 579 172 50
## 10 belleal01   1995 546 173 50

Q2. Practice with SQL computations

##    AVG(HR)
## 1 2.813599
##   SUM(HR)
## 1  289283
##    playerID yearID teamID MAX(HR)
## 1 bondsba01   2001    SFN      73
##    AVG(HR)
## 1 3.585889
##   teamID  AVG(HR)
## 1    ANA 3.928783
## 2    ARI 3.610227
## 3    ATL 3.822374
## 4    BAL 3.958401
## 5    BOS 3.621142
##   teamID  AVG(HR)
## 1    ML1 5.029412
## 2    SFN 4.616438
## 3    LAN 3.950617
## 4    NYP 3.882353
## 5    BSP 3.176471
##    teamID yearID  AVG(HR)
## 1     DET   1992 5.352941
## 2     DET   1991 5.097561
## 3     CIN   1991 4.100000
## 4     CHN   1991 4.076923
## 5     NYA   1992 4.075000
## 6     TOR   1992 4.075000
## 7     BAL   1991 4.047619
## 8     MIN   1991 4.000000
## 9     BAL   1992 3.894737
## 10    NYA   1991 3.675000
## 11    CHA   1991 3.657895
## 12    TEX   1991 3.612245
## 13    SDN   1992 3.461538
## 14    OAK   1991 3.456522
## 15    SFN   1991 3.439024

Q3. More practice with computations

Q4. Practice with SQL join operations

Q5. All about the money

Q6. Batting averages (optional)