Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas by Sunday at 11:59pm this week. Make sure to complete your weekly check-in (which can be done by coming to lecture, recitation, lab, or any office hour), as this counts for a small number of points toward your lab score.

This week’s agenda: exploratory data analysis, cleaning data, fitting linear/logistic models, and using associated utility functions.

Prostate cancer data

Recall the data set on 97 men who have prostate cancer (from the book The Elements of Statistical Learning). We read it into our R session:

pros.df = 
  read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F19/data/pros.dat")
dim(pros.df)
## [1] 97  9
head(pros.df, 3)
##       lcavol  lweight age      lbph svi       lcp gleason pgg45       lpsa
## 1 -0.5798185 2.769459  50 -1.386294   0 -1.386294       6     0 -0.4307829
## 2 -0.9942523 3.319626  58 -1.386294   0 -1.386294       6     0 -0.1625189
## 3 -0.5108256 2.691243  74 -1.386294   0 -1.386294       7    20 -0.1625189
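
As a rough sketch of the kind of first-pass checks that are useful right after reading in a data frame, the snippet below looks at dimensions, column classes, and simple summaries. To keep it runnable without network access, it uses a hypothetical stand-in called pros.head, typed in by hand from the three rows printed above; the name pros.head is an assumption made for illustration and is not part of the lab. With the full data loaded, the same calls would apply to pros.df.

```r
# Hypothetical stand-in for pros.df: only the three rows shown by head() above,
# entered by hand so this sketch does not depend on the download succeeding.
pros.head = data.frame(
  lcavol  = c(-0.5798185, -0.9942523, -0.5108256),
  lweight = c(2.769459, 3.319626, 2.691243),
  age     = c(50, 58, 74),
  lbph    = c(-1.386294, -1.386294, -1.386294),
  svi     = c(0, 0, 0),
  lcp     = c(-1.386294, -1.386294, -1.386294),
  gleason = c(6, 6, 7),
  pgg45   = c(0, 0, 20),
  lpsa    = c(-0.4307829, -0.1625189, -0.1625189)
)

dim(pros.head)            # number of rows and columns
sapply(pros.head, class)  # storage class of each column
summary(pros.head$age)    # min, quartiles, median, mean, max for one column
colMeans(pros.head)       # column-wise means across all variables
```

Three rows are far too few for any modeling; the point is only to show the calls. On the full pros.df, the same functions give a quick sanity check that the 97 rows and 9 columns were read in with the expected types.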

Simple exploration and linear modeling

Reading in, exploring wage data

Wage linear regression modeling

Wage logistic regression modeling

Wage generalized additive modeling (optional)