Name:
Andrew ID:
Collaborated with:

On this homework, you can collaborate with your classmates, but you must identify their names above, and you must submit your own homework as a knitted HTML file on Canvas, by 10pm this Sunday.

States data set

Below we construct a data frame of 50 states x 10 variables. The first 8 variables are numeric and the last 2 are factors. The numeric variables come from the built-in state.x77 matrix, which records various demographic factors for the 50 US states, measured in the 1970s. You can learn more about this state data set by typing ?state.x77 into your R console.

state.df <- data.frame(state.x77, Region=state.region, Division=state.division)
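As a quick sanity check on the description above, the following sketch confirms the dimensions and column types of state.df. It uses only the built-in state data sets and base R functions; it is not part of the assigned questions.

```r
# Rebuild the data frame from the built-in state data sets
state.df <- data.frame(state.x77, Region = state.region, Division = state.division)

# Number of rows (states) and columns (variables)
dim(state.df)

# Class of each column: the first 8 should be numeric, the last 2 factors
sapply(state.df, class)
```

Note that data.frame() converts column names containing spaces into syntactically valid names, so the "Life Exp" column of state.x77 appears as Life.Exp in state.df.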

Basic data frame manipulations

Prostate cancer data set

Let’s return to the prostate cancer data set that we looked at in the lab/homework from Week 2 (taken from the book The Elements of Statistical Learning). Below we read in a data frame of 97 men x 9 variables. You can remind yourself about what’s been measured by looking back at the lab/homework (or by visiting the URL linked above in your web browser, clicking on “Data” on the left-hand menu, and clicking “Info” under “Prostate”).

pros.dat <- 
  read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-S18/data/pros.dat")

Practice with the apply family

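# Two-sample t-test comparing the values of x where ind is 0
# against the values of x where ind is 1
t.test.by.ind <- function(x, ind) {
  # ind must contain only 0s and 1s
  stopifnot(all(ind %in% c(0, 1)))
  return(t.test(x[ind == 0], x[ind == 1]))
}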
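To illustrate how a function like t.test.by.ind can be combined with the apply family, here is a minimal sketch on simulated data. The names sim.mat, sim.ind, tests, and p.vals are hypothetical and chosen only for this example; the data are random draws, not the prostate cancer measurements.

```r
t.test.by.ind <- function(x, ind) {
  stopifnot(all(ind %in% c(0, 1)))
  return(t.test(x[ind == 0], x[ind == 1]))
}

set.seed(1)
# Hypothetical data: 3 simulated measurement columns for 40 subjects
sim.mat <- matrix(rnorm(40 * 3), nrow = 40,
                  dimnames = list(NULL, c("a", "b", "c")))
# Hypothetical 0/1 indicator splitting the subjects into two groups of 20
sim.ind <- rep(c(0, 1), each = 20)

# lapply() over a data frame iterates over its columns; the extra
# argument ind is passed through to t.test.by.ind on every call
tests <- lapply(as.data.frame(sim.mat), t.test.by.ind, ind = sim.ind)

# Pull the p-value out of each test result
p.vals <- sapply(tests, function(tt) tt$p.value)
p.vals
```

Because lapply() returns the full test objects in a list, other components (such as the test statistic or confidence interval) can be extracted the same way without rerunning the tests.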

Rio Olympics data set

It’s Winter Olympics time! To get into the Olympics spirit, we’re going to examine data from the 2016 Summer Olympics in Rio de Janeiro, taken from https://github.com/flother/rio2016 (itself put together by scraping the official Summer Olympics website for information about the athletes). Below we read in the data and store it as rio.

rio <- read.csv("http://www.stat.cmu.edu/~ryantibs/statcomp-S18/data/rio.csv")

More practice with data frames and apply

Some advanced practice with apply