Name:
Andrew ID:
Collaborated with:

This lab is to be done in class (and completed outside of class if need be). You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as a knitted HTML file on Canvas by Sunday at 11:59pm this week. Make sure to complete your weekly check-in (which can be done by coming to lecture, recitation, lab, or any office hour), as this will count for a small number of points towards your lab score.

This week’s agenda: learning to master pipes and dplyr.

# Load the tidyverse!
library(tidyverse)

Pipes to base R

For each of the following code blocks, which are written with pipes, write equivalent code in base R (to do the same thing).
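
As a warm-up that is not one of the exercises, here is a minimal sketch of how a pipe chain maps onto nested base R calls. The vector `x` and the string `"a-b-c"` are made up for illustration. The sketch relies on `%>%` passing its left-hand side as the first argument of the right-hand side, unless a `.` placeholder appears among the arguments.

```r
library(magrittr)  # provides %>%; library(tidyverse) makes it available too

# Hypothetical input, chosen only for illustration
x = c(4, 9, 16)

# Pipes: read left to right
piped = x %>% sqrt %>% sum

# Base R: the same calls, nested from the inside out
nested = sum(sqrt(x))

identical(piped, nested)

# The dot marks where the left-hand side goes when it is
# not the first argument of the next function
dotted = "a-b-c" %>% gsub("-", "+", ., fixed=TRUE)
plain = gsub("-", "+", "a-b-c", fixed=TRUE)

identical(dotted, plain)
```

Both comparisons should come out `TRUE`: the pipe only changes how the calls are written, not what is computed.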

# Pipes:
letters %>%
  toupper %>%
  paste(collapse="+") 
## [1] "A+B+C+D+E+F+G+H+I+J+K+L+M+N+O+P+Q+R+S+T+U+V+W+X+Y+Z"
# Base R:

# Pipes:
"     Ceci n'est pas une pipe     " %>% 
  gsub("une", "un", .) %>%
  trimws
## [1] "Ceci n'est pas un pipe"
# Base R:

# Pipes:
rnorm(1000) %>% 
  hist(breaks=30, main="N(0,1) draws", col="pink", prob=TRUE) 

# Base R:

# Pipes:
rnorm(1000) %>% 
  hist(breaks=30, plot=FALSE) %>%
  `[[`("density") %>%
  max
## [1] 0.45
# Base R:

Base R to pipes

For each of the following code blocks, which are written in base R, write equivalent code with pipes (to do the same thing).
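
One possible way to go in this direction is to start from the innermost call and work outwards, turning each layer of nesting into one pipe stage. The sketch below does this on a made-up vector `v` (it is not one of the exercises). It also shows that indexing can be piped, since `x[i]` is the function call `` `[`(x, i) ``.

```r
library(magrittr)  # provides %>%; library(tidyverse) makes it available too

# Hypothetical input, chosen only for illustration
v = c(3, 1, 3, 2)

# Base R: innermost call runs first
nested = paste(sort(unique(v)), collapse="<")

# Pipes: the innermost call becomes the first stage
piped = v %>%
  unique %>%
  sort %>%
  paste(collapse="<")

identical(nested, piped)

# Indexing as a pipe stage: same as letters[2]
second = letters %>% `[`(2)
identical(second, letters[2])
```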

# Base R:
paste("Your grade is", sample(c("A","B","C","D","R"), size=1))
## [1] "Your grade is R"
# Pipes:

# Base R: 
state.name[which.max(state.x77[,"Illiteracy"])] 
## [1] "Louisiana"
# Pipes:

str.url = "http://www.stat.cmu.edu/~ryantibs/statcomp-F19/data/trump.txt"

# Base R:
lines = readLines(str.url)
text = paste(lines, collapse=" ")
words = strsplit(text, split="[[:space:]]|[[:punct:]]")[[1]]
wordtab = table(words)
wordtab = sort(wordtab, decreasing=TRUE)
head(wordtab, 10)
## words
##       the  and   of   to  our will    I   in have 
##  592  189  146  127  126   90   83   73   69   58
# Pipes:

# Base R:
lines = readLines(str.url)
text = paste(lines, collapse=" ")
words = strsplit(text, split="[[:space:]]|[[:punct:]]")[[1]]
words = words[words != ""]
wordtab = table(words)
wordtab = sort(wordtab, decreasing=TRUE)
head(wordtab, 10)
## words
##  the  and   of   to  our will    I   in have    a 
##  189  146  127  126   90   83   73   69   58   51
# Pipes:

Sprints data, revisited

Below we read in a data frame sprint.w.df containing the top women’s times in the 100m sprint, as seen in previous labs. We also define a function factor.to.numeric() that was used in Lab 8, to convert the Wind column to numeric values. In what follows, use dplyr and pipes to answer the following questions on sprint.w.df.

sprint.w.df = read.table(
  file="http://www.stat.cmu.edu/~ryantibs/statcomp-F19/data/sprint.w.dat",
  sep="\t", header=TRUE, quote="", stringsAsFactors=TRUE)

factor.to.numeric = Vectorize(function(x) {
  x = strsplit(as.character(x), split = ",")[[1]]
  ifelse(length(x) > 1, 
         as.numeric(paste(x, collapse=".")), 
         as.numeric(x))
})
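
To illustrate what `factor.to.numeric()` does inside a dplyr pipeline, here is a small sketch on a toy data frame. The data frame `toy.df` and its values are hypothetical (they are not rows from `sprint.w.dat`), and the sketch assumes that some wind readings use a comma as the decimal mark, which is the case the helper is written to handle.

```r
library(dplyr)

# Copy of the lab's helper, repeated so this snippet runs on its own
factor.to.numeric = Vectorize(function(x) {
  x = strsplit(as.character(x), split = ",")[[1]]
  ifelse(length(x) > 1, 
         as.numeric(paste(x, collapse=".")), 
         as.numeric(x))
})

# Hypothetical toy data, for illustration only
toy.df = data.frame(
  Name = c("A", "B", "C"),
  Wind = factor(c("1,5", "-0,3", "2")))

toy.out = toy.df %>%
  mutate(Wind = factor.to.numeric(Wind)) %>%  # "1,5" becomes 1.5
  filter(Wind > 0)                            # keep positive wind values

toy.out
```

Here `mutate()` overwrites the factor column with its numeric version, so that later stages such as `filter()` can compare it with numbers.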

Prostate cancer data, revisited

Below we read in a data frame pros.df containing measurements on men with prostate cancer, as seen in previous labs. As before, in what follows, use dplyr and pipes to answer the following questions on pros.df.

pros.df = 
  read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F19/data/pros.dat")
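
As a reminder of how dplyr verbs chain together with pipes, here is a minimal sketch on R's built-in `mtcars` data set rather than on `pros.df`, so that it makes no assumptions about the columns of the prostate data. The thresholds and the summary names `n` and `mean.mpg` are arbitrary choices for illustration.

```r
library(dplyr)

res = mtcars %>%
  filter(hp > 100) %>%                  # keep rows meeting a condition
  group_by(cyl) %>%                     # one group per cylinder count
  summarise(n = n(),                    # group size
            mean.mpg = mean(mpg)) %>%   # group average
  arrange(desc(mean.mpg))               # largest average first

res
```

The same pattern of `filter()`, `group_by()`, `summarise()`, and `arrange()` stages should carry over to other data frames once the relevant column names are substituted.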