Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 6 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 6 questions from other labs. Your homework writeup must start the same way this lab does: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 6pm on Sunday October 16. This document contains 15 of the 45 total points for Homework 6.

The states data frame

state.df = data.frame(state.x77, Region=state.region, Division=state.division)
head(state.df)
##            Population Income Illiteracy Life.Exp Murder HS.Grad Frost
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20
## Alaska            365   6315        1.5    69.31   11.3    66.7   152
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65
## California      21198   5114        1.1    71.71   10.3    62.6    20
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166
##              Area Region           Division
## Alabama     50708  South East South Central
## Alaska     566432   West            Pacific
## Arizona    113417   West           Mountain
## Arkansas    51945  South West South Central
## California 156361   West            Pacific
## Colorado   103766   West           Mountain

Hw6 Q1 (6 points). Plot the state centers in state.df, i.e., plot the Center.y column (on the y-axis) versus the Center.x column (on the x-axis). (If state.df does not yet contain these columns, they can be added from R's built-in state.center list, e.g., by passing Center=state.center to data.frame().) Use the regular point type, but set cex=5 to get very large empty circles. Then, in the center of these empty circles, draw the state abbreviations. (Hint: use text().) Label the axes and title the plot appropriately.
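As a rough illustration of the text() hint, the sketch below labels large empty circles for just the first six states. It assumes the coordinates come directly from R's built-in state.center list and the labels from state.abb, rather than from state.df; the name idx is made up for this example.

```r
# Assumption: use the built-in state.center (x, y) and state.abb directly,
# instead of the Center.x / Center.y columns of state.df.
idx = 1:6                          # only the first six states, for illustration
x = state.center$x[idx]
y = state.center$y[idx]

# Large empty circles (the default point type), with axis labels and a title
plot(x, y, cex = 5, xlab = "Center x-coordinate", ylab = "Center y-coordinate",
     main = "Centers of the first six states")

# text() draws each label at the same (x, y) location, so it lands inside its circle
text(x, y, labels = state.abb[idx])
```

Because text() is called with the same coordinates as plot(), each abbreviation is centered on its circle by default.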

Now let’s do something more interesting with colors. Plot the state centers, with cex=5 again, but this time use filled circles, with colors that reflect the values in the Frost column. The highest Frost value should be assigned a light blue color, and the lowest Frost value a pink color, with appropriate interpolation of colors in between. (Hint: recall colorRampPalette(), and the function get.col.from.val(), from the “Curves, Surfaces, and Colors” mini-lecture.) Then, again, draw the state abbreviations in the center of these filled circles, label the axes, and title the plot appropriately. Does the plot make sense to you, i.e., do you see the geographic pattern you would expect in where Frost (the average number of days with minimum temperature below freezing) tends to be highest?
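One possible way to turn numeric values into interpolated colors is sketched below. The helper val.to.col() is hypothetical and is not the lecture's get.col.from.val(), whose exact interface is not shown here; it simply rescales the values to palette indices. It assumes the values are not all equal.

```r
# Hypothetical helper (NOT the lecture's get.col.from.val()): map each value
# to one of n colors interpolated between low.col and high.col.
val.to.col = function(val, low.col = "pink", high.col = "lightblue", n = 50) {
  pal = colorRampPalette(c(low.col, high.col))(n)    # n interpolated colors
  # Rescale to [0, 1] (assumes max(val) > min(val)), then to indices 1..n
  scaled = (val - min(val)) / (max(val) - min(val))
  pal[1 + round(scaled * (n - 1))]
}

# Frost values of the six states shown in the head() output above
frost = c(20, 152, 15, 65, 20, 166)
cols = val.to.col(frost)
cols
```

The resulting vector can be passed as the col argument of plot(), together with pch=19 (or another filled point type) to get filled circles.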

Access tasks with the states data frame

Hw6 Bonus. In each of the last two plots, add the line of best fit.

Hw6 Q2 (4 points). Recall in the “Data Frames” mini-lecture we saw that the apply() function could be used on columns (or rows, as well) of data frames. E.g., the code below calculates the maximum value of each of the first 8 numeric variables in state.df, which are, recall, just taken from the matrix state.x77.

head(state.x77) # We'll consider only the numeric variables
##            Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20
## Alaska            365   6315        1.5    69.31   11.3    66.7   152
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65
## California      21198   5114        1.1    71.71   10.3    62.6    20
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166
##              Area
## Alabama     50708
## Alaska     566432
## Arizona    113417
## Arkansas    51945
## California 156361
## Colorado   103766
max.vals = apply(state.x77, 2, max) # Compute the max of each column
max.vals
## Population     Income Illiteracy   Life Exp     Murder    HS Grad 
##    21198.0     6315.0        2.8       73.6       15.1       67.3 
##      Frost       Area 
##      188.0   566432.0

Using apply(), compute which state achieves the max value in each column, saving this as max.state. (Hint: this should only take one line of code.) Then compute the min value in each column, and which state achieves the min for each column, saving these as min.vals and min.state, respectively. Finally, create a new data frame, called state.extremes: it should have 8 rows, and 4 columns. The rows should be assigned the names of the first 8 numeric columns in state.df, and the columns should be assigned the names “Max.Value”, “Max.State”, “Min.Value”, and “Min.State”. The columns should be populated using max.vals, max.state, min.vals, and min.state. Display the entries of this new data frame.
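The general pattern of asking "which row achieves the extreme in each column" can be sketched on a small made-up matrix. Everything below (toy, toy.max.row, toy.extremes) is invented for illustration and is not part of the lab data.

```r
# Made-up 3 x 2 matrix with row and column names
toy = matrix(c(3, 9, 5,
               8, 2, 7), nrow = 3,
             dimnames = list(c("A", "B", "C"), c("v1", "v2")))

# which.max() gives, for each column, the index of the row holding the max;
# indexing rownames() by those positions turns the indices into row names
toy.max.row = rownames(toy)[apply(toy, 2, which.max)]

# Collect values and row names side by side, one row per column of toy
toy.extremes = data.frame(Max.Value = apply(toy, 2, max),
                          Max.Row = toy.max.row)
toy.extremes
```

Swapping which.max() and max() for which.min() and min() gives the corresponding minimum columns in the same way.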

The strikes data frame

strike.df = read.csv("http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/strikes.csv")
class(strike.df)
## [1] "data.frame"
head(strike.df)
##     country year strike.volume unemployment inflation left.parliament
## 1 Australia 1951           296          1.3      19.8            43.0
## 2 Australia 1952           397          2.2      17.2            43.0
## 3 Australia 1953           360          2.5       4.3            43.0
## 4 Australia 1954             3          1.7       0.7            47.0
## 5 Australia 1955           326          1.4       2.0            38.5
## 6 Australia 1956           352          1.8       6.3            38.5
##   centralization density
## 1      0.3748588      NA
## 2      0.3751829      NA
## 3      0.3745076      NA
## 4      0.3710170      NA
## 5      0.3752675      NA
## 6      0.3716072      NA

Hw6 Q3 (5 points). Write a function called country.var.summary() that takes the following inputs: strike.df, the strikes data frame; where, a string giving the name of a country that appears in the strikes data frame, with a default value of “USA”; what, a string giving the name of a variable that appears in the strikes data frame, in columns 3 through 8, with a default value of “strike.volume”; and plot.it, a boolean signaling whether we should produce a plot, with a default value of TRUE. As a side effect, if plot.it is TRUE, then the function should produce a plot of the specified variable versus the year, for the specified country. The labels and title should be set appropriately. The output of the function should be a vector of summary statistics on the specified variable, in the specified country, as computed by summary(). As an example, your function should produce the same plot as in the last question when where="Canada" and what="unemployment", and its output should be as follows.

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  2.400   4.350   5.600   6.043   7.250  11.800 

After you’ve written it, use your function to produce plots and summaries of the strike volume in France and the USA. Then use it to produce plots and summaries of the unemployment rate in Italy and Germany. Then use it to produce summaries (no plots) of the inflation rate in Denmark and Finland.
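The combination of default arguments, an optional plotting side effect, and a returned summary can be sketched on made-up data. The data frame toy.df and the function var.summary.sketch() below are hypothetical and only illustrate the structure; they are not the required country.var.summary().

```r
# Made-up data for illustration (not the strikes data)
toy.df = data.frame(country = rep(c("X", "Y"), each = 4),
                    year = rep(2001:2004, times = 2),
                    rate = c(1, 2, 3, 4, 10, 20, 30, 40))

var.summary.sketch = function(df, where = "X", what = "rate", plot.it = TRUE) {
  rows = df$country == where       # logical index for the chosen country
  vals = df[rows, what]            # select the column by its name (a string)
  if (plot.it) {
    # Side effect: plot the chosen variable against the year
    plot(df$year[rows], vals, xlab = "Year", ylab = what,
         main = paste(what, "in", where))
  }
  summary(vals)                    # returned value: summary statistics
}

var.summary.sketch(toy.df, where = "Y", plot.it = FALSE)
```

Selecting the column with df[rows, what] is what lets the variable be passed as a string; df$what would look for a column literally named "what".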