Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must list their names above, and you must submit your own lab as an Rmd file on Blackboard by 11:59pm on the day of the lab.

There are no homework questions here. (Lucky you! But don’t worry, you still have something to think about at home: you should be working on your final project…)

Cross-validation with the prostate cancer data

pros.df = read.table("http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/prostate.data")
dim(pros.df)
## [1] 97 10
head(pros.df)
##       lcavol  lweight age      lbph svi       lcp gleason pgg45       lpsa train
## 1 -0.5798185 2.769459  50 -1.386294   0 -1.386294       6     0 -0.4307829  TRUE
## 2 -0.9942523 3.319626  58 -1.386294   0 -1.386294       6     0 -0.1625189  TRUE
## 3 -0.5108256 2.691243  74 -1.386294   0 -1.386294       7    20 -0.1625189  TRUE
## 4 -1.2039728 3.282789  58 -1.386294   0 -1.386294       6     0 -0.1625189  TRUE
## 5  0.7514161 3.432373  62 -1.386294   0 -1.386294       6     0  0.3715636  TRUE
## 6 -1.0498221 3.228826  50 -1.386294   0 -1.386294       6     0  0.7654678  TRUE
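One common form of cross-validation is k-fold cross-validation: split the rows at random into k folds, fit the model on k-1 folds, and measure prediction error on the fold left out, cycling through all k folds. The sketch below assumes ordinary least squares via lm() as the model and mean squared error as the loss; the helper name cv_mse_lm() and its arguments are hypothetical, not part of the lab, and the demo runs on simulated data rather than on pros.df.

```r
# Hypothetical helper: k-fold cross-validated mean squared error for lm().
# Assumes `resp` names the response column of `df` and every other
# column of `df` is used as a predictor.
cv_mse_lm = function(df, resp, k = 5, seed = 0) {
  set.seed(seed)  # reproducible folds (note: this resets the global RNG)
  n = nrow(df)
  # Randomly assign each row to one of k folds of (nearly) equal size
  folds = sample(rep(1:k, length.out = n))
  form = as.formula(paste(resp, "~ ."))
  fold_err = numeric(k)
  for (i in 1:k) {
    train_rows = folds != i
    fit = lm(form, data = df[train_rows, ])
    pred = predict(fit, newdata = df[!train_rows, ])
    # Mean squared error on the held-out fold
    fold_err[i] = mean((df[!train_rows, resp] - pred)^2)
  }
  mean(fold_err)  # average of the k per-fold errors
}

# Toy example on simulated data (not the prostate data)
set.seed(1)
toy = data.frame(x1 = rnorm(100), x2 = rnorm(100))
toy$y = 1 + 2 * toy$x1 + rnorm(100)
cv_err = cv_mse_lm(toy, "y", k = 5)
cv_err
```

To apply it to the prostate data, the logical train indicator (column 10) would be dropped first so that it is not treated as a predictor, e.g. cv_mse_lm(pros.df[, 1:9], "lpsa", k = 5). Averaging the per-fold errors, as done here, weights every fold equally even when fold sizes differ by one row; pooling all squared errors before averaging is an equally reasonable choice.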

Making predictions with the HIV data set

hiv.df = read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/hiv.dat")
dim(hiv.df)
## [1] 1073  241
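# Show the response y (column 1) and 8 randomly chosen predictor columns;
# without a seed, the columns printed will differ from run to run
hiv.df[1:5, c(1, sample(2:ncol(hiv.df), 8))]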
##           y p215 p41 p126 p25 p116 p154 p230 p17
## 1 14.612804    1   1    0   0    0    0    0   0
## 2 25.527251    1   1    0   0    0    0    0   0
## 3  0.000000    0   0    0   0    0    0    0   0
## 4  7.918125    1   0    0   0    0    0    0   0
## 5 11.394335    1   0    0   0    0    0    0   0
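A minimal sketch of making predictions on held-out data, assuming a linear model fit by lm() and a random half/half train/test split (both are assumptions, not choices made in the lab). The hypothetical helper holdout_mse() reports the test-set MSE of the fitted model next to that of a baseline that always predicts the training mean of the response. The demo uses simulated 0/1 predictors that mimic the binary columns printed above; it does not use hiv.df.

```r
# Hypothetical helper: random train/test split, lm() fit on the training
# rows, and test-set MSE compared against a mean-only baseline.
holdout_mse = function(df, resp, test_frac = 0.5, seed = 0) {
  set.seed(seed)  # reproducible split (note: this resets the global RNG)
  n = nrow(df)
  test_rows = sample(n, size = round(test_frac * n))
  train = df[-test_rows, ]
  test = df[test_rows, ]
  fit = lm(as.formula(paste(resp, "~ .")), data = train)
  pred = predict(fit, newdata = test)
  baseline = mean(train[[resp]])  # predict the training mean for every row
  c(lm = mean((test[[resp]] - pred)^2),
    mean_only = mean((test[[resp]] - baseline)^2))
}

# Toy example: 5 simulated binary predictors, 2 of which affect y
set.seed(2)
n = 200
toy = data.frame(matrix(rbinom(n * 5, 1, 0.3), n, 5))
names(toy) = paste0("p", 1:5)
toy$y = 3 * toy$p1 + 2 * toy$p2 + rnorm(n)
errs = holdout_mse(toy, "y")
errs
```

With the HIV data the call would be holdout_mse(hiv.df, "y"). Since many of the 240 binary columns look mostly zero in the rows printed above, some may be constant within the training half; lm() then reports NA for those coefficients, and predict() may warn that the fit is rank-deficient.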