Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas by Sunday 11:59pm this week.

This week’s agenda: practicing grouping, spreading and gathering, and joins.

# Load the tidyverse!
library(tidyverse)

Practice with grouping

Below we read in a data frame sprint.m.df containing the top men’s times in the 100m sprint, as seen in previous labs. In the following, unless stated otherwise, use pipes and dplyr verbs to solve each part as cleanly and succinctly as you can.

sprint.m.df = read.table(
  file="http://www.stat.cmu.edu/~ryantibs/statcomp-F18/data/sprint.m.dat",
  sep="\t", header=TRUE, quote="", stringsAsFactors=TRUE)
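As a reminder of the grouping pattern, here is a minimal sketch on a small made-up data frame. The data frame toy.df and its values are hypothetical and are not taken from sprint.m.dat; only the column names Name, Country, and Time mirror the real data.

```r
library(dplyr)

# Hypothetical toy data, not taken from sprint.m.dat
toy.df = data.frame(
  Name = c("A", "A", "B", "B", "C"),
  Country = c("X", "X", "Y", "Y", "Y"),
  Time = c(9.8, 9.9, 10.0, 9.7, 10.1))

# Group by country, then compute the fastest time and the row count per group
toy.summary = toy.df %>%
  group_by(Country) %>%
  summarise(Fastest = min(Time), N = n())
```

The result has one row per country, with the summary columns computed within each group.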

Practice spreading and gathering

In the following, use pipes and dplyr or tidyr verbs to solve each part as cleanly and succinctly as you can. In some parts, it might make more sense to use direct indexing, and that’s perfectly fine.
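For reference, a minimal sketch of gathering and spreading with tidyr, on a made-up data frame. The names wide.df, long.df, back.df and the columns Y2017 and Y2018 are hypothetical and chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format data: one column per year
wide.df = data.frame(
  Name = c("A", "B"),
  Y2017 = c(9.9, 10.1),
  Y2018 = c(9.8, 10.0))

# Gather the year columns into key-value pairs (wide to long)
long.df = wide.df %>%
  gather(key = "Year", value = "Time", Y2017, Y2018)

# Spread the key-value pairs back into one column per year (long to wide)
back.df = long.df %>%
  spread(key = Year, value = Time)
```

Here gather() stacks the two year columns into a Year column and a Time column, and spread() undoes that, recovering the original layout.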

Practice with joins

Below we read in a data frame sprint.w.df containing the top women’s times in the 100m sprint, as seen in previous labs. In the following, use pipes and dplyr verbs to solve each part as cleanly and succinctly as you can. Note: when you make joins, you’ll receive warnings about the conversion of factors to characters. That’s fine; don’t worry about it.

sprint.w.df = read.table(
  file="http://www.stat.cmu.edu/~ryantibs/statcomp-F18/data/sprint.w.dat",
  sep="\t", header=TRUE, quote="", stringsAsFactors=TRUE)
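As a quick illustration of how two join types differ, here is a minimal sketch on two made-up data frames. The names men.df, women.df, MenBest, and WomenBest are hypothetical, and the key column is stored as character rather than factor to keep the example simple.

```r
library(dplyr)

# Hypothetical toy data with a shared key column, Country
men.df = data.frame(
  Country = c("X", "Y", "Z"),
  MenBest = c(9.8, 9.9, 10.0),
  stringsAsFactors = FALSE)
women.df = data.frame(
  Country = c("Y", "Z", "W"),
  WomenBest = c(10.8, 10.9, 11.0),
  stringsAsFactors = FALSE)

# Inner join: keeps only countries present in both data frames
inner.df = inner_join(men.df, women.df, by = "Country")

# Left join: keeps every row of men.df, filling unmatched columns with NA
left.df = left_join(men.df, women.df, by = "Country")
```

In this sketch, inner.df has rows for Y and Z only, while left.df also keeps X with a missing WomenBest.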

More grouping and joining

Below is some solution code from Lab 8, where we convert the Birthdate and Date columns in the sprint.m.df and sprint.w.df data frames to numeric form. In what follows, you will re-solve some of the questions from Lab 8, but using pipes and dplyr and tidyr verbs.

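# Convert a date string of the form "dd.mm.yyyy" (or "dd.mm.yy") to a
# number of the form yyyymmdd
date.to.numeric = function(val) {
  val = as.character(val)
  vec = strsplit(val, split = "\\.")[[1]]
  # Two-digit years are assumed to be in the 1900s
  if (nchar(vec[3]) == 2) vec[3] = paste0("19", vec[3])
  vec = as.numeric(vec)
  vec[3]*10^4 + vec[2]*10^2 + vec[1]
}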

sprint.m.df$Birthdate = sapply(sprint.m.df$Birthdate, date.to.numeric)
sprint.m.df$Date = sapply(sprint.m.df$Date, date.to.numeric)
sprint.w.df$Birthdate = sapply(sprint.w.df$Birthdate, date.to.numeric)
sprint.w.df$Date = sapply(sprint.w.df$Date, date.to.numeric)
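To make the conversion concrete, here is a small worked example. It repeats the function so the block runs on its own; the input strings are illustrative and assume the day.month.year format that the function's arithmetic implies.

```r
# Copy of the Lab 8 helper, repeated so this block is self-contained
date.to.numeric = function(val) {
  val = as.character(val)
  vec = strsplit(val, split = "\\.")[[1]]
  if (nchar(vec[3]) == 2) vec[3] = paste0("19", vec[3])
  vec = as.numeric(vec)
  vec[3]*10^4 + vec[2]*10^2 + vec[1]
}

# Four-digit year: 1986*10^4 + 8*10^2 + 21
full.year = date.to.numeric("21.08.1986")

# Two-digit year: "59" is first expanded to "1959"
short.year = date.to.numeric("21.12.59")
```

Here full.year is 19860821 and short.year is 19591221. Because the result has the form yyyymmdd, comparing two converted dates with < or > orders them chronologically.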

head(sprint.m.df, 5)
##   Rank Time Wind        Name Country Birthdate     City     Date
## 1    1 9.58  0.9  Usain Bolt     JAM  19860821   Berlin 20090816
## 2    2 9.63  1.5  Usain Bolt     JAM  19860821   London 20120805
## 3    3 9.69  0.0  Usain Bolt     JAM  19860821  Beijing 20080816
## 4    3 9.69  2.0   Tyson Gay     USA  19820809 Shanghai 20090920
## 5    3 9.69 -0.1 Yohan Blake     JAM  19891226 Lausanne 20120823
head(sprint.w.df, 5)
##   Rank  Time Wind                     Name Country Birthdate         City
## 1    1 10.49  0,0 Florence Griffith-Joyner     USA  19591221 Indianapolis
## 2    2 10.61 +1,2 Florence Griffith-Joyner     USA  19591221 Indianapolis
## 3    3 10.62 +1,0 Florence Griffith-Joyner     USA  19591221        Seoul
## 4    4 10.64 +1,2          Carmelita Jeter     USA  19791124     Shanghai
## 5    5 10.65 +1,1             Marion Jones     USA  19751012 Johannesburg
##       Date
## 1 19880716
## 2 19880717
## 3 19880924
## 4 20090920
## 5 19980912