Name:
Andrew ID:
Collaborated with:
This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.
There are Homework 3 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 3 questions from other labs. Your homework writeup must start like this one: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 6pm on Sunday September 25. This document contains 19 of the 45 total points for Homework 3.
Important remark on compiling the homework: many homework questions depend on variables that are defined in the surrounding lab. It is easiest just to copy and paste the contents of all these labs into one big Rmd file, with your lab solutions and homework solutions filled in, knit it, and submit the HTML file.
Why does the plot() result with type="p" look normal, but the plot() result with type="l" look abnormal, having crossing lines? Then modify the code below (hint: modify the definition of x) so that the lines on the second plot do not cross.
n = 50
set.seed(0)
x = runif(n, min=-2, max=2)
y = x^3 + rnorm(n)
plot(x, y, type="p")
plot(x, y, type="l")
The cex argument can be used to shrink or expand the size of the points that are drawn. Its default value is 1 (no shrinking or expansion). Values between 0 and 1 will shrink points, and values larger than 1 will expand points. Plot y versus x, first with cex equal to 0.5 and then 2 (so, two separate plots). Give the titles “Shrunken points” and “Expanded points” to the two plots, respectively.
The xlim and ylim arguments can be used to change the limits on the x-axis and y-axis, respectively. Each argument takes a vector of length 2, as in xlim = c(-1, 0), to set the x limit to be from -1 to 0. Plot y versus x, with the x limit set to be from -1 to 1, and the y limit set to be from -5 to 5. Assign x and y labels “Trimmed x” and “Trimmed y”, respectively.
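As a sketch of what xlim and ylim do, here is an example on a hypothetical grid u with v = u^3. Setting the limits only zooms the view. Points outside the window are still in the data; they are simply not drawn inside the plot region.

```r
# Hypothetical toy data on a grid from -3 to 3 in steps of 0.5
u = seq(-3, 3, by=0.5)
v = u^3
# Zoom in: points with |u| > 1 stay in the data but fall outside the plot region
plot(u, v, xlim=c(-1, 1), ylim=c(-5, 5), xlab="u (zoomed)", ylab="v (zoomed)")
# Count how many of the grid points lie inside the x window
n.inside = sum(u >= -1 & u <= 1)
```

One detail worth knowing: with R's default axis style (par(xaxs="r"), and likewise yaxs), the requested range is extended by 4% at each end. The visible limits are therefore slightly wider than c(-1, 1).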
Hw3 Q1 (3 points). Again plot y versus x, only showing points whose x values are between -1 and 1. But this time, define x.trimmed to be the subset of x between -1 and 1, and define y.trimmed to be the corresponding subset of y. Then plot y.trimmed versus x.trimmed without setting xlim and ylim: now you should see that the y limit is (automatically) set as “tight” as possible. (Hint: use logical indexing to define x.trimmed and y.trimmed.)
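The logical-indexing idea in the hint can be sketched on hypothetical paired vectors a and b, which stand in for x and y. Because one logical vector indexes both, the pairs stay aligned. Including the endpoints -1 and 1 is a choice made here, not something the question specifies.

```r
# Hypothetical paired vectors (a plays the role of x, b the role of y)
a = c(-2.5, -0.3, 0.8, 1.7, -1.1)
b = c(10, 20, 30, 40, 50)
# Logical vector: TRUE where a lies in [-1, 1] (endpoints included here by choice)
keep = a >= -1 & a <= 1
# Indexing both vectors with the same logical vector keeps the (a, b) pairs aligned
a.sub = a[keep]
b.sub = b[keep]
# Without xlim/ylim, the axis limits are computed from the retained points only
plot(a.sub, b.sub)
```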
The pch argument, recall, controls the point type in the display. In the “Plot Basics” mini-lecture, we set it to a single number. But it can also be a vector of numbers, with one entry per point in the plot. So, e.g., plot(1:10, 1:10, pch=1:10) displays the first 10 point types. If pch is a vector whose length is shorter than the total number of points to be plotted, then its entries are recycled, as appropriate. Plot y versus x, with the point type alternating between an empty circle and a filled circle.
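To see pch recycling in isolation, here is a minimal sketch on a hypothetical set of 6 points. pch = 1 is an empty circle and pch = 19 a solid circle. The rep_len() call previews the cycling pattern that a length-2 pch vector follows across the points.

```r
# Hypothetical toy data: 6 points
u = 1:6
# The length-2 pch vector is recycled over the 6 points
plot(u, u, pch=c(1, 19))
# Preview the symbol code each point receives under recycling
pch.used = rep_len(c(1, 19), length(u))
```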
Hw3 Q2 (2 points). The col argument, recall, controls the color of the points in the display. It operates similarly to pch, in the sense that it can be a vector, and if the length of this vector is shorter than the total number of points, then it is recycled appropriately. Plot y versus x, and repeat the following pattern for the displayed points: a black empty circle, a blue filled circle, a black empty circle, a red filled circle.
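The way pch and col recycle together can be sketched with a deliberately different 3-step pattern on hypothetical toy points. The pattern here is an assumption for illustration and is not the one the question asks for. pch = 2 is an empty triangle and pch = 15 a filled square.

```r
# Hypothetical toy data: 9 points
u = 1:9
# A 3-step pattern, different from the one the question asks for
pch.pattern = c(2, 15, 15)                       # empty triangle, filled square, filled square
col.pattern = c("darkgreen", "orange", "purple")
plot(u, u, pch=pch.pattern, col=col.pattern, cex=1.5)
# Preview the color assigned to each point after recycling
col.used = rep_len(col.pattern, length(u))
```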
Reproduce your previous plot of y versus x, with the x limit set to be from -1 to 1, and the y limit set to be from -5 to 5. Add the curve \(y = x^3\) to the plot, using lines(), and have the curve be drawn in red with twice the normal thickness: recall the arguments col and lwd. Also add a straight horizontal line at 0 to the plot, using abline(), and have the line be dashed: recall lty.
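Layering lines() and abline() on top of a scatterplot can be sketched on a hypothetical noisy parabola. The noise values are made up. Note that lines() joins points in the order they are given, so the curve is drawn on a sorted grid.

```r
# Hypothetical toy data: a noisy parabola with fixed (made-up) noise values
u = seq(-2, 2, by=0.5)
v = u^2 + c(0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.2)
plot(u, v)
# lines() joins points in the order given, so draw the curve on a sorted grid
grid.u = seq(-2, 2, length.out=100)
lines(grid.u, grid.u^2, col="red", lwd=2)   # red, twice the default line width
abline(h=0, lty=2)                          # horizontal line at 0; lty=2 is dashed
```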
The legend() function, recall, adds a legend to an existing plot. In the “Adding to Plots” mini-lecture, we just set lty to a single number, but lty, lwd, pch, and col can all be vectors, whose length is equal to the length of the legend argument (if any of these are shorter, then they are recycled). They indicate the symbols that should be matched to the text in legend. So, e.g.,
plot(1:10, 1:10, type="o")
legend("topleft", legend=c("Point", "Line"), pch=c(21, NA), lty=c(NA, 1))
produces a legend with the first symbol being a point, and the second symbol a line. Now, reproduce the plot in the previous question, then add a legend to the top left corner. The legend text should be: “Data”, “Mean”, “Baseline”. The symbols corresponding to this text should be: an empty black circle, a thick red line, a dashed black line, respectively.
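Extending the two-item example above to three items, here is a sketch on a hypothetical toy plot with three layers. Each legend vector has one entry per item. NA in pch or lty switches off the point or the line for that item. legend() invisibly returns a list (with components rect and text), which is captured here only so it can be inspected.

```r
# Hypothetical toy plot with three layers: points, a thick blue line, a dotted line
u = 1:10
plot(u, sqrt(u))
lines(u, sqrt(u), col="blue", lwd=3)
abline(h=2, lty=3)
# One entry per legend item; NA switches off the point or the line for that item
leg = legend("bottomright",
             legend=c("Points", "Thick blue line", "Dotted line"),
             pch=c(1, NA, NA), lty=c(NA, 1, 3), lwd=c(1, 3, 1),
             col=c("black", "blue", "black"))
```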
Hw3 Q3 (6 points). Produce a layered plot of y versus x, but with a gray tube displayed underneath the points. Specifically, this tube is defined by filling in the space between the two curves lines(x, x^3 + qnorm(0.10)) and lines(x, x^3 + qnorm(0.90)). (Hint: use polygon(); this function requires that the x coordinates of the polygon be passed in an appropriate order; you might find it useful to use c(x, rev(x)) for the x coordinates.) Lastly, add a legend to the bottom right corner of the plot, with the text: “Data”, “Confidence band”, and corresponding symbols: an empty circle, a very thick gray line, respectively.
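The c(x, rev(x)) trick in the hint can be sketched on a hypothetical band of fixed width around the line 2u. The polygon traces the lower curve left to right and then the upper curve right to left, which closes the shape. This sketch assumes u is already sorted. With unsorted x coordinates the outline would zig-zag back and forth.

```r
# Hypothetical band around the line center = 2 * u (u must be sorted for this to work)
u = seq(0, 10, by=1)
center = 2 * u
lower = center - 3
upper = center + 3
# type="n" sets up the axes without drawing anything yet
plot(u, center, type="n", ylim=range(lower, upper))
# Trace the lower curve left to right, then the upper curve right to left
poly.x = c(u, rev(u))
poly.y = c(lower, rev(upper))
polygon(poly.x, poly.y, col="lightgray", border=NA)
# Points drawn after the polygon sit on top of it
points(u, center)
```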
sprint.tab = read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/sprint.dat",
sep="\t", quote="", header=TRUE)
head(sprint.tab)
## Rank Time Wind Name Country Birthdate City Date
## 1 1 9.58 0.9 Usain Bolt JAM 21.08.86 Berlin 16.08.2009
## 2 2 9.63 1.5 Usain Bolt JAM 21.08.86 London 05.08.2012
## 3 3 9.69 0.0 Usain Bolt JAM 21.08.86 Beijing 16.08.2008
## 4 3 9.69 2.0 Tyson Gay USA 09.08.82 Shanghai 20.09.2009
## 5 3 9.69 -0.1 Yohan Blake JAM 26.12.89 Lausanne 23.08.2012
## 6 6 9.71 0.9 Tyson Gay USA 09.08.82 Berlin 16.08.2009
Define sprint.times to be the 2nd column of sprint.tab. Define sprint.dates to be the 8th column of sprint.tab, converted into a character vector. Define sprint.years to be a character vector, with the last 4 characters of each entry of sprint.dates. (Hint: have you forgotten? Better not! Use substr().) Finally, convert sprint.years into a numeric vector.
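The substr() step can be sketched on a few hypothetical dates in the same dd.mm.yyyy format, stored as a factor. In R versions before 4.0.0, read.table() turned string columns into factors by default, which is one reason the explicit conversion to character matters. Calling as.numeric() directly on a factor returns its internal level codes, not the years.

```r
# Hypothetical dates in the same dd.mm.yyyy format, stored as a factor
dates = factor(c("16.08.2009", "05.08.2012", "23.08.2012"))
dates.chr = as.character(dates)
# Keep the last 4 characters of each string (start and stop are vectorized)
n.chr = nchar(dates.chr)
years.chr = substr(dates.chr, n.chr - 3, n.chr)
years = as.numeric(years.chr)
# Pitfall: as.numeric(dates) would give the factor's level codes, not years
```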
Plot sprint.times versus sprint.years. For the point type, use small, filled black circles. Label the x-axis “Year” and the y-axis “Time (seconds)”. Title the plot “The 2829 fastest 100m sprint times”. Draw a dashed red horizontal line at 10 seconds. Below this line, add the text “N men” to the plot, replacing “N” here by the number of men who have run under 10 seconds. (Hint: have you forgotten? Better not! Use paste().)
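Counting with a logical vector and building the label with paste() can be sketched on a hypothetical mini data set with made-up runner labels. sum() on a logical vector counts TRUE values, so it counts sub-10 times. If one runner appears more than once, the number of distinct runners is smaller. Which count the question intends is for you to judge.

```r
# Hypothetical mini data set (runner labels are made up)
times = c(9.58, 9.63, 9.69, 9.98, 10.02, 10.05)
years = c(2009, 2012, 2008, 1991, 2001, 1999)
runner = c("A", "A", "B", "C", "D", "E")
under = times < 10
n.times = sum(under)                     # TRUE counts as 1: number of sub-10 times
n.men = length(unique(runner[under]))    # number of distinct runners with a sub-10 time
label = paste(n.men, "men")
plot(years, times, pch=19, cex=0.5, xlab="Year", ylab="Time (seconds)")
abline(h=10, lty=2, col="red")
# Place the text just below the 10-second line, centered horizontally
text(x=mean(range(years)), y=9.9, labels=label)
```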
Hw3 Q4 (8 points). Reproduce the previous plot, but leave off the last bit of text (of the form “N men”). Define sprint.names to be the 4th column of sprint.tab, converted into a character vector. Then draw large blue empty circles on the plot around the times run by Usain Bolt. (Hint: use appropriate indexing based on sprint.names; and recall cex.) Also, identify the first man to break 10 seconds, and draw a large green circle around his time. (Hint: it is probably easiest just to define sprint.times.10, sprint.years.10, sprint.names.10 to be the subsets of sprint.times, sprint.years, sprint.names, respectively, that correspond to the times under 10 seconds, and then work with these subset vectors.) Lastly, add a legend to the bottom left corner of the plot, with legend text: “Usain Bolt”, and the name of the man who first broke 10 seconds, and corresponding symbols: a large blue empty circle, and a large green empty circle, respectively.
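The highlight-and-subset pattern in the hints can be sketched on a hypothetical mini data set. The runner labels and years are made up. Empty circles (pch = 1) with a large cex are overlaid on selected points with points(). which.min() on the subset years then picks the earliest sub-10 time. which.min() returns the first minimum, so if two sub-10 times shared the earliest year, the year alone could not decide and you would need the full date.

```r
# Hypothetical mini data set (runner labels are made up)
times = c(9.58, 9.63, 9.69, 9.95, 9.99, 10.03)
years = c(2009, 2012, 2008, 1968, 1983, 1964)
runner = c("A", "A", "B", "C", "D", "E")
plot(years, times, pch=19, cex=0.5)
# Large empty blue circles around runner A's times (pch=1 is an empty circle)
is.A = runner == "A"
points(years[is.A], times[is.A], pch=1, col="blue", cex=2)
# Subset to sub-10 times, then find the earliest year among them
under = times < 10
times.10 = times[under]
years.10 = years[under]
runner.10 = runner[under]
first = which.min(years.10)   # index of the first minimum if several share a year
points(years.10[first], times.10[first], pch=1, col="darkgreen", cex=2)
# pt.cex enlarges the legend symbols to match the circles on the plot
legend("bottomleft", legend=c("A", runner.10[first]),
       pch=1, col=c("blue", "darkgreen"), pt.cex=2)
```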