Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 6 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 6 questions from other labs. Your homework writeup must begin the same way this lab does: with your name, your Andrew ID, and the names of anyone you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 6pm on Sunday October 16. This document contains 14 of the 45 total points for Homework 6.

Back to the states data frame

head(state.x77)
##            Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20
## Alaska            365   6315        1.5    69.31   11.3    66.7   152
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65
## California      21198   5114        1.1    71.71   10.3    62.6    20
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166
##              Area
## Alabama     50708
## Alaska     566432
## Arizona    113417
## Arkansas    51945
## California 156361
## Colorado   103766

Hw6 Q7 (2 points). For each division, compute and display the median graduate-by-literate percentage. (Hint: use state.df.by.div and sapply().) Which division has the highest median graduate-by-literate percentage?

Hw6 Q8 (2 points). For each division, compute and display the median HS graduation percentage. Do so using sapply() on state.df.by.div, with the FUN input defined “on-the-fly”. Are these percentages generally higher or lower than the median graduate-by-literate percentages, and are you surprised by this result? Which division has the highest median HS graduation percentage?
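As a minimal sketch of the sapply() pattern behind Hw6 Q7 and Hw6 Q8, the code below applies a summary to each piece of a split data frame, once with a named function and once with FUN defined on the fly. It assumes state.df.by.div comes from splitting on the built-in state.division factor (the lab's own construction is not shown in this section), the helper name median.frost is hypothetical, and the Frost column is used so that the sketch does not stand in for either answer.

```r
# Assumption: rebuild the per-division list from R's built-in data sets
state.df = data.frame(state.x77, Division = state.division)
state.df.by.div = split(state.df, f = state.df$Division)

# Named helper (hypothetical name): median of the Frost column in one piece
median.frost = function(df) { median(df$Frost) }
frost.named = sapply(state.df.by.div, median.frost)

# Same computation, with FUN defined on the fly as an anonymous function
frost.inline = sapply(state.df.by.div, function(df) median(df$Frost))

# Display the medians, then the division with the largest one
frost.inline
names(which.max(frost.inline))
```

Both calls return a named numeric vector with one entry per division, so which.max() and names() are enough to report the top division.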

The sprints data frame

sprint.df = read.table("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/sprint.dat",
                       sep="\t", quote="", header=TRUE)
class(sprint.df)
## [1] "data.frame"
head(sprint.df)
##   Rank Time Wind        Name Country Birthdate     City       Date
## 1    1 9.58  0.9  Usain Bolt     JAM  21.08.86   Berlin 16.08.2009
## 2    2 9.63  1.5  Usain Bolt     JAM  21.08.86   London 05.08.2012
## 3    3 9.69  0.0  Usain Bolt     JAM  21.08.86  Beijing 16.08.2008
## 4    3 9.69  2.0   Tyson Gay     USA  09.08.82 Shanghai 20.09.2009
## 5    3 9.69 -0.1 Yohan Blake     JAM  26.12.89 Lausanne 23.08.2012
## 6    6 9.71  0.9   Tyson Gay     USA  09.08.82   Berlin 16.08.2009

Hw6 Q9 (2 points). Compute, from sprint.df and the newly created Year column, the fastest 100m sprint time in each year of the data frame, calling the result fast.time.by.year. Plot this by year, as in the last question. Has the fastest sprint time roughly gone down, or gone up, over the years?
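One way to take a minimum within groups is tapply(). The sketch below runs it on only the six rows printed by head(sprint.df) above, so its output is not the answer for the full data. It assumes the Year column holds the last four characters of Date; the names toy.df, toy.fast, and toy.years are made up for illustration.

```r
# First six rows of sprint.df, copied from the head() output above
toy.df = data.frame(
  Time = c(9.58, 9.63, 9.69, 9.69, 9.69, 9.71),
  Name = c("Usain Bolt", "Usain Bolt", "Usain Bolt",
           "Tyson Gay", "Yohan Blake", "Tyson Gay"),
  Date = c("16.08.2009", "05.08.2012", "16.08.2008",
           "20.09.2009", "23.08.2012", "16.08.2009"),
  stringsAsFactors = FALSE)

# Assumption: Year is the last four characters of Date (format dd.mm.yyyy)
toy.df$Year = as.numeric(substr(toy.df$Date, 7, 10))

# Minimum time within each year; the names of the result are the years
toy.fast = tapply(toy.df$Time, INDEX = toy.df$Year, FUN = min)
toy.fast

# Years recovered as numbers, e.g. for the x-axis of a plot
toy.years = as.numeric(names(toy.fast))
```

Because tapply() stores the group labels as names, converting them back with as.numeric() gives x-values that line up with the minimum times.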

Hw6 Bonus. Given a set of (x, y) pairs, the greatest convex minorant is defined as the largest convex function that lies on or below the graph of the pairs. We call a point an extreme point if the greatest convex minorant passes through it. Looking at your plot from the last question (so that x denotes years and y the fastest sprint times), which points are extreme points? You can answer this visually or programmatically. Which runner was responsible for these times, and what does that roughly suggest about their performances?
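For the programmatic route, one possible approach (not necessarily the intended one) is a left-to-right scan that builds the lower convex hull, whose vertices are the points the greatest convex minorant passes through. The function name lower.hull and the toy values are made up for illustration, and the sketch assumes the x values are distinct, as years would be.

```r
# Hypothetical helper: indices of the points on the lower convex hull,
# assuming distinct x values
lower.hull = function(x, y) {
  ord = order(x)
  xs = x[ord]; ys = y[ord]
  keep = integer(0)
  for (i in seq_along(xs)) {
    # Drop the last kept point while it lies strictly above the chord
    # joining the point before it to point i (collinear points are kept)
    while (length(keep) >= 2) {
      a = keep[length(keep) - 1]; b = keep[length(keep)]
      cross = (xs[b] - xs[a]) * (ys[i] - ys[a]) -
              (ys[b] - ys[a]) * (xs[i] - xs[a])
      if (cross < 0) keep = keep[-length(keep)] else break
    }
    keep = c(keep, i)
  }
  ord[keep]  # indices into the original x and y
}

# Made-up toy values, not sprint data
toy.x = 1:6
toy.y = c(5, 3, 4, 2, 3.5, 4)
hull.idx = lower.hull(toy.x, toy.y)
hull.idx
```

On real data, indexing the years by the returned positions gives the extreme years, which can then be matched back to runner names in the data frame.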

Hw6 Q10 (8 points). Read in the data table at “http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/sprint.w.dat”, on 2018 women’s 100m sprint times, saving it as a data frame sprint.w.df. Repeat the steps leading up to and including Hw6 Q9 to produce fast.w.time.by.year, a vector with the fastest women’s sprint time in each year, and plot this by year. Does it look like the fastest sprint time for women has roughly gone down, or gone up, over the years?

Finally, produce a single plot that shows both trends: fast.time.by.year by year and fast.w.time.by.year by year. Make sure to set the x and y limits so that all points are visible. Use different colors for the men’s times and the women’s times, and draw a legend indicating which is which. Label the axes and title the plot appropriately.
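The mechanics of overlaying two series with shared limits and a legend can be sketched as follows. All values and names here (years.a, times.a, years.b, times.b, the series labels) are made up for illustration and are not taken from the sprint data.

```r
# Made-up illustrative values, not taken from the sprint data
years.a = c(2000, 2004, 2008, 2012); times.a = c(9.9, 9.8, 9.7, 9.6)
years.b = c(2001, 2005, 2009, 2013); times.b = c(10.9, 10.8, 10.75, 10.7)

# Limits computed over both series, so no point falls outside the plot
xlim = range(c(years.a, years.b))
ylim = range(c(times.a, times.b))

# First series sets up the axes, labels, and title
plot(years.a, times.a, type = "o", col = "blue", xlim = xlim, ylim = ylim,
     xlab = "Year", ylab = "Time (seconds)", main = "Two series on shared axes")

# Second series is added to the existing plot
points(years.b, times.b, type = "o", col = "red")

# Legend matching colors to series
legend("right", legend = c("Series A", "Series B"),
       col = c("blue", "red"), lty = 1, pch = 1)
```

Computing the limits with range() over the combined vectors is what keeps the second series from being clipped by axes chosen for the first.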