Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 3 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 3 questions from other labs. Your homework writeup must start the same way as this one, by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 6pm on Sunday September 25. This document contains 19 of the 45 total points for Homework 3.

Important remark on compiling the homework: many homework questions depend on variables that are defined in the surrounding lab. It is easiest just to copy and paste the contents of all these labs into one big Rmd file, with your lab solutions and homework solutions filled in, knit it, and submit the HTML file.

Plot basics

n = 50
set.seed(0)
x = runif(n, min=-2, max=2)
y = x^3 + rnorm(n)
plot(x, y, type="p")

plot(x, y, type="l")
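
One caveat about the line plot: with type="l", the segments are drawn in the order in which the points appear in the vectors, so unsorted x values like those from runif() make the line jump back and forth. A minimal sketch of one way to get a left-to-right curve, assuming it is acceptable to reorder both vectors (the name ord is introduced here only for illustration):

```r
# Same simulated data as above
n = 50
set.seed(0)
x = runif(n, min=-2, max=2)
y = x^3 + rnorm(n)

# order(x) gives the indices that sort x; apply them to both vectors
# so that each y value stays paired with its x value
ord = order(x)
plot(x[ord], y[ord], type="l")
```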

Hw3 Q1 (3 points). Again plot y versus x, only showing points whose x values are between -1 and 1. But this time, define x.trimmed to be the subset of x between -1 and 1, and define y.trimmed to be the corresponding subset of y. Then plot y.trimmed versus x.trimmed without setting xlim and ylim: now you should see that the y limit is (automatically) set as “tight” as possible. (Hint: use logical indexing to define x.trimmed, y.trimmed.)
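
As a reminder of how logical indexing works, here is a small sketch on hypothetical toy vectors (u, v, and keep are illustrative names, and the bounds 0 and 1 are chosen arbitrarily). The same logical vector is applied to both vectors so that the pairs stay aligned:

```r
# Hypothetical toy vectors, not the lab data
u = c(-3, -0.5, 0.2, 4, 0.9)
v = c(10, 20, 30, 40, 50)

# A logical vector: TRUE where u lies strictly between 0 and 1
keep = u > 0 & u < 1

# Indexing both vectors with the same mask keeps the pairs aligned
u[keep]   # 0.2 0.9
v[keep]   # 30 50
```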

plot(1:10, 1:10, pch=1:10)

The call above displays the first 10 point types. If pch is a vector whose length is shorter than the total number of points to be plotted, then its entries are recycled as needed. Plot y versus x, with the point type alternating between an empty circle and a filled circle.
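
To see what recycling means concretely, rep_len() can spell out the full-length vector that a short pch is effectively expanded to. This is only a sketch with hypothetical names (pts, sym, recycled) and two point types picked for illustration:

```r
# Seven points, but only two point types supplied
pts = 1:7
sym = c(2, 3)          # 2 = triangle, 3 = plus sign

# rep_len() mirrors the recycling that plot() applies to pch
recycled = rep_len(sym, length(pts))
recycled               # 2 3 2 3 2 3 2

plot(pts, pts, pch=sym)   # drawn with the recycled point types
```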

Hw3 Q2 (2 points). The col argument, recall, controls the color of the points in the display. It operates similarly to pch, in the sense that it can be a vector, and if the length of this vector is shorter than the total number of points, then it is recycled appropriately. Plot y versus x, and repeat the following pattern for the displayed points: a black empty circle, a blue filled circle, a black empty circle, a red filled circle.
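
One point worth keeping in mind when combining the two arguments: pch and col are each recycled on their own. The sketch below uses hypothetical names (k, sym, shade, combo) and arbitrary symbols and colors to show that vectors of lengths 2 and 3 give a combined pattern that repeats every 6 points:

```r
k = 12
sym = c(0, 17)                          # open square, filled triangle
shade = c("black", "orange", "purple")

# Each argument is recycled independently, so the combined pattern
# repeats every 6 points here (least common multiple of 2 and 3)
combo = paste(rep_len(sym, k), rep_len(shade, k))
length(unique(combo))                   # 6

plot(1:k, 1:k, pch=sym, col=shade)
```

Supplying pch and col vectors of the same length keeps each symbol tied to one color.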

Adding to plots

plot(1:10, 1:10, type="o")
legend("topleft", legend=c("Point", "Line"),  pch=c(21, NA), lty=c(NA, 1))

The code above produces a legend whose first symbol is a point and whose second symbol is a line. Now, reproduce the plot in the previous question, then add a legend to the top left corner. The legend text should be: “Data”, “Mean”, “Baseline”. The symbols corresponding to this text should be: an empty black circle, a thick red line, a dashed black line, respectively.
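
The arguments to legend() are matched to the entries by position, and NA switches an attribute off for that entry. The following sketch, on a made-up plot with hypothetical labels, shows the same idea with a color per entry; lg is just an illustrative name for the value that legend() returns invisibly:

```r
plot(1:10, (1:10)^2, pch=17)
lines(1:10, (1:10)^2, lty=3, col="blue")

# Entry 1 is a symbol only (lty=NA); entry 2 is a line only (pch=NA).
# col is matched to the entries by position in the same way.
lg = legend("bottomright", legend=c("Observed", "Trend"),
            pch=c(17, NA), lty=c(NA, 3), col=c("black", "blue"))
```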

Hw3 Q3 (6 points). Produce a layered plot of y versus x, but with a gray tube displayed underneath the points. Specifically, this tube is defined by filling in the space between the two curves lines(x, x^3 + qnorm(0.10)) and lines(x, x^3 + qnorm(0.90)). (Hint: use polygon(); this function requires that the x coordinates of the polygon be passed in an appropriate order; you might find it useful to use c(x, rev(x)) for the x coordinates.) Lastly, add a legend to the bottom right corner of the plot, with the text: “Data”, “Confidence band”, and corresponding symbols: an empty circle and a very thick gray line, respectively.
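
polygon() joins its vertices in the order given, so a band between two curves is usually traced along one curve from left to right and back along the other from right to left. The sketch below illustrates this on a hypothetical sine curve rather than the lab data (g, lower, upper, band.x, band.y are illustrative names). It assumes the x values are already increasing; unsorted values would need to be sorted first, for example with order().

```r
# Hypothetical example curve, not the lab data
g = seq(0, 2*pi, length.out=100)    # already in increasing order
lower = sin(g) - 0.5
upper = sin(g) + 0.5

# Set up empty axes large enough to hold the band
plot(g, sin(g), type="n", ylim=range(lower, upper))

# Trace the lower curve left to right, then the upper curve right to
# left, so the outline closes without crossing itself
band.x = c(g, rev(g))
band.y = c(lower, rev(upper))
polygon(band.x, band.y, col="gray", border=NA)

# Draw the curve last so it sits on top of the band
lines(g, sin(g))
```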

Text manipulation and plotting

sprint.tab = read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/sprint.dat",
                        sep="\t", quote="", header=TRUE)
head(sprint.tab)
##   Rank Time Wind        Name Country Birthdate     City       Date
## 1    1 9.58  0.9  Usain Bolt     JAM  21.08.86   Berlin 16.08.2009
## 2    2 9.63  1.5  Usain Bolt     JAM  21.08.86   London 05.08.2012
## 3    3 9.69  0.0  Usain Bolt     JAM  21.08.86  Beijing 16.08.2008
## 4    3 9.69  2.0   Tyson Gay     USA  09.08.82 Shanghai 20.09.2009
## 5    3 9.69 -0.1 Yohan Blake     JAM  26.12.89 Lausanne 23.08.2012
## 6    6 9.71  0.9   Tyson Gay     USA  09.08.82   Berlin 16.08.2009
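
The Date column stores dates as text in the form dd.mm.yyyy, so the year has to be pulled out of each string. Two possible approaches are sketched below on date strings copied from the output above; the names dates, pieces, and years are only illustrative, and other approaches would work equally well:

```r
# Date strings copied from the first three rows shown above
dates = c("16.08.2009", "05.08.2012", "16.08.2008")

# Option 1: the year occupies characters 7 through 10
as.numeric(substr(dates, 7, 10))            # 2009 2012 2008

# Option 2: split on the literal "." and keep the third piece
pieces = strsplit(dates, split=".", fixed=TRUE)
years = as.numeric(sapply(pieces, function(p) p[3]))
years                                       # 2009 2012 2008
```

Note that fixed=TRUE matters in the second option, since "." would otherwise be treated as a regular expression matching any character.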

Hw3 Q4 (8 points). Reproduce the previous plot, but leave off the last bit of text (of the form “N men”). Define sprint.names to be the 4th column of sprint.tab, converted into a character vector. Then draw on the plot large blue empty circles around the times run by Usain Bolt. (Hint: use appropriate indexing based on sprint.names; and recall cex.) Also, identify the first man to break 10 seconds, and draw a large green circle around his time. (Hint: it is probably easiest just to define sprint.times.10, sprint.years.10, sprint.names.10 to be subsets of sprint.times, sprint.years, sprint.names, respectively, that correspond to the times under 10 seconds, and then work with the former set of vectors.) Lastly, add a legend to the bottom left corner of the plot, with legend text: “Usain Bolt”, and the name of the man who first broke 10 seconds, and corresponding symbols: a large blue empty circle, and a large green empty circle, respectively.
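
The subset-then-search idea in the hint can be sketched on hypothetical toy data. None of the values or names below come from sprint.dat; they are made up for illustration. which.min() returns the position of the smallest value (the first one, if there are ties), and points() with cex larger than 1 draws an oversized circle around an existing point:

```r
# Hypothetical toy data, not taken from sprint.dat
toy.times = c(9.9, 10.2, 9.8, 10.1, 9.95)
toy.years = c(1991, 1975, 1999, 1968, 1983)
toy.names = c("A", "B", "C", "D", "E")

# Keep only the entries under the threshold
under = toy.times < 10
times.u = toy.times[under]
years.u = toy.years[under]
names.u = toy.names[under]

# which.min() gives the position of the earliest year in the subset
first = which.min(years.u)
names.u[first]     # "E"

plot(toy.years, toy.times)
# An oversized open circle drawn around the chosen point
points(years.u[first], times.u[first], cex=3, col="purple")
```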