Name:
Andrew ID:
Collaborated with:
This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.
There are Homework 10 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 10 questions from other labs. Your homework writeup must start like this one, by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 11:59pm on Tuesday, November 15. This document contains 22 of the 45 total points for Homework 10.
Data on the political economy of strikes, as described in the “Split-Apply-Combine” mini-lectures, is up at http://www.stat.cmu.edu/~ryantibs/statcomp-F16/data/strikes.csv. Read this into your R session and call the resulting data frame strikes.df. Check that it has 625 rows and 8 columns, and display its first 5 rows.
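A minimal sketch of the read-and-check step. So that it runs offline, it reads a small made-up CSV through the text argument of read.csv() rather than the course URL; with the real file you would pass the URL as the first argument and expect dim() to report 625 rows and 8 columns. The column names and values below are assumptions for illustration only.

```r
# Hypothetical 3-row CSV standing in for strikes.csv (values are made up)
csv.text <- "country,year,unemployment,inflation
Canada,1951,2.4,10.5
Canada,1952,2.9,2.4
USA,1951,3.3,7.9"

toy.df <- read.csv(text = csv.text)

# dim() gives c(number of rows, number of columns)
print(dim(toy.df))

# head() shows at most the first n rows
print(head(toy.df, 5))
```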
Split strikes.df by country, using the split() function. Call the resulting list strikes.by.country, and show the names of the elements of the list, as well as the first 5 rows of the data frame for Canada.
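A sketch of splitting a data frame by a grouping column, on a hypothetical toy data frame (toy.df and toy.by.country are illustrative names, and the values are invented). split() returns a named list of data frames, and a single country's rows can be pulled out with [[ ]].

```r
# Hypothetical toy data (not the real strikes.df values)
toy.df <- data.frame(
  country = c("Canada", "USA", "Canada", "USA", "Canada"),
  year = c(1951, 1951, 1952, 1952, 1953),
  unemployment = c(2.4, 3.3, 2.9, 3.0, 3.0)
)

# split() on a data frame returns a list of data frames, one per country
toy.by.country <- split(toy.df, toy.df$country)

print(names(toy.by.country))
# Rows keep their original row names from toy.df
print(head(toy.by.country[["Canada"]], 5))
```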
Using strikes.by.country and sapply(), compute the average unemployment rate for each country. Which country has the highest average unemployment rate? The lowest?
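One way to combine sapply() with which.max() and which.min(), sketched on invented toy data rather than strikes.df:

```r
# Hypothetical toy data (not the real strikes.df values)
toy.df <- data.frame(
  country = c("Canada", "USA", "Canada", "USA", "Canada"),
  unemployment = c(2.4, 3.3, 2.9, 3.0, 3.0)
)
toy.by.country <- split(toy.df, toy.df$country)

# sapply() applies the function to each data frame and simplifies to a named vector
avg.unemp <- sapply(toy.by.country, function(df) mean(df$unemployment))
print(avg.unemp)

# which.max()/which.min() return a position, named by country
print(names(which.max(avg.unemp)))
print(names(which.min(avg.unemp)))
```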
Hw10 Bonus. Using the map() function from the maps package, draw a map of the world, with the countries in the strikes.df data frame colored according to their average unemployment rate. For the color palette, use terrain.colors(). Color all countries not found in the strikes.df data frame gray.
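A rough sketch for the bonus, under several assumptions: that polygon names in the maps "world" database have the form "Country:Subregion", that the country spellings in strikes.df match those names, and that binning the averages into equal-width intervals with cut() is an acceptable way to index terrain.colors(). The helper polygon.colors() is hypothetical, the averages are made up, and the map is drawn only if the maps package is installed.

```r
# Hypothetical average unemployment rates (illustrative only, not from strikes.df)
avg.unemp <- c(Canada = 2.77, USA = 3.15, Japan = 1.2)

# Assign a fill color to each map polygon. Polygon names look like
# "Canada:Baffin Island", so the part before ":" is taken as the country.
polygon.colors <- function(poly.names, avg, pal = terrain.colors(10)) {
  region <- sub(":.*", "", poly.names)
  # Bin the averages into length(pal) equal-width intervals
  bins <- cut(avg, breaks = length(pal), labels = FALSE)
  names(bins) <- names(avg)
  cols <- rep("gray", length(region))   # gray for countries not in avg
  hit <- region %in% names(avg)
  cols[hit] <- pal[bins[region[hit]]]
  cols
}

# Draw the map only if the maps package is installed
if (requireNamespace("maps", quietly = TRUE)) {
  poly.names <- maps::map("world", fill = TRUE, plot = FALSE, namesonly = TRUE)
  maps::map("world", fill = TRUE, col = polygon.colors(poly.names, avg.unemp))
}
```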
Using strikes.by.country and sapply(), compute a summary (min, quartiles, max) of the unemployment rate for each country. Study the output: do its dimensions make sense to you?
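A sketch on toy data showing the shape that sapply() returns when the applied function gives back several named values per group (the values are invented):

```r
# Hypothetical toy data (not the real strikes.df values)
toy.df <- data.frame(
  country = c("Canada", "USA", "Canada", "USA", "Canada"),
  unemployment = c(2.4, 3.3, 2.9, 3.0, 3.0)
)
toy.by.country <- split(toy.df, toy.df$country)

# summary() of a numeric vector returns 6 named numbers (it includes the mean),
# so sapply() stacks them as the columns of a 6 x (number of countries) matrix
unemp.summary <- sapply(toy.by.country, function(df) summary(df$unemployment))
print(unemp.summary)
print(dim(unemp.summary))
```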
Using strikes.by.country and just one call to sapply(), compute the average unemployment rate, inflation rate, and strike volume for each country. The output should be a matrix of dimension 3 x 18. Also, with just that one call to sapply(), figure out how to give the output matrix appropriate row names (of your choosing).
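A sketch on invented data of one way to get row names from a single sapply() call: have the applied function return a named vector, and sapply() uses those names as the row names. (colMeans() on a subset of columns is another option that carries the column names through automatically.)

```r
# Hypothetical toy data (not the real strikes.df values)
toy.df <- data.frame(
  country = c("Canada", "USA", "Canada", "USA"),
  unemployment = c(2.4, 3.3, 2.8, 3.1),
  inflation = c(10.5, 7.9, 2.5, 2.1),
  strike.volume = c(100, 50, 300, 150)
)
toy.by.country <- split(toy.df, toy.df$country)

# Returning a named vector from the function makes those names the row names
country.stats <- sapply(toy.by.country, function(df) {
  c(unemployment = mean(df$unemployment),
    inflation = mean(df$inflation),
    strike.volume = mean(df$strike.volume))
})
print(country.stats)  # 3 x 2 here; 3 x 18 for the full data
```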
Hw10 Q1 (8 points). Using split() and sapply(), compute the average unemployment rate, inflation rate, and strike volume for each year in the strikes.df data set. The output should be a matrix of dimension 3 x 35. Show the columns for 1960, 1977, 1980, and 1985. Then, display the average unemployment rate by year and the average inflation rate by year in the same plot. Label the axes and title the plot appropriately. Include an informative legend.
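A sketch of the by-year version on a tiny made-up data set, including one way to overlay two series with plot(), lines(), and legend(). The colors, axis labels, and legend position are arbitrary choices.

```r
# Hypothetical toy data: two observations per year over three years
toy.df <- data.frame(
  year = rep(1960:1962, each = 2),
  unemployment = c(2, 4, 3, 5, 4, 6),
  inflation = c(1, 3, 2, 2, 5, 7),
  strike.volume = c(10, 30, 20, 40, 30, 50)
)

# Split by year; the column names of the result are the years as strings
yearly <- sapply(split(toy.df, toy.df$year), function(df) {
  c(unemployment = mean(df$unemployment),
    inflation = mean(df$inflation),
    strike.volume = mean(df$strike.volume))
})
print(yearly[, c("1960", "1962")])  # select columns by year label

yrs <- as.numeric(colnames(yearly))
plot(yrs, yearly["unemployment", ], type = "l", col = "red",
     ylim = range(yearly[c("unemployment", "inflation"), ]),
     xlab = "Year", ylab = "Average rate",
     main = "Average unemployment and inflation by year")
lines(yrs, yearly["inflation", ], col = "blue")
legend("topleft", legend = c("Unemployment", "Inflation"),
       col = c("red", "blue"), lty = 1)
```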
Using strikes.df, split(), and sapply(), compute the average inflation rate for each country, pre and post 1975. The output should be a numeric vector of length 36. (Hint: the hard part here is the splitting. There are several ways to do this. One way is as follows: define a new column (say) yearPre1975 to be the indicator that the year column is less than or equal to 1975. Then define a new column (say) countryPre1975 to be the string concatenation of the country and yearPre1975 columns. Then split on countryPre1975 and proceed as usual.)
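The hint's approach, sketched on hypothetical toy data with two countries A and B. Note that paste() separates the pieces with a space by default, so the group names look like "A TRUE" (pre) and "A FALSE" (post).

```r
# Hypothetical toy data: two countries, years straddling 1975
toy.df <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year = rep(1974:1977, times = 2),
  inflation = c(2, 4, 6, 8, 1, 1, 3, 5)
)

# Auxiliary columns, as in the hint
toy.df$yearPre1975 <- toy.df$year <= 1975
toy.df$countryPre1975 <- paste(toy.df$country, toy.df$yearPre1975)

# One group per (country, pre/post) pair; names look like "A TRUE", "A FALSE"
avg.infl <- sapply(split(toy.df, toy.df$countryPre1975),
                   function(df) mean(df$inflation))
print(avg.infl)
```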
Using the result from the last question, compute for each country the difference in average inflation rate post and pre 1975. Which country had the biggest increase in average inflation rate from pre to post 1975? The biggest decrease?
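A sketch of the post-minus-pre difference, assuming the group averages sit in a named vector whose names were built with paste() and its default space separator, e.g. "A TRUE" for year <= 1975 and "A FALSE" for later years (the values here are made up). Stripping the suffix with sub() recovers the country names even when a name itself contains a space.

```r
# Hypothetical named vector: "TRUE" means year <= 1975 (pre), "FALSE" means post
avg.infl <- c("A FALSE" = 7, "A TRUE" = 3, "B FALSE" = 4, "B TRUE" = 1)

# Recover the country names by stripping the " TRUE"/" FALSE" suffix
countries <- unique(sub(" (TRUE|FALSE)$", "", names(avg.infl)))

# Post minus pre: a positive value means the average went up after 1975
post.minus.pre <- avg.infl[paste(countries, "FALSE")] - avg.infl[paste(countries, "TRUE")]
names(post.minus.pre) <- countries
print(post.minus.pre)
print(names(which.max(post.minus.pre)))  # biggest increase
print(names(which.min(post.minus.pre)))  # biggest decrease (or smallest increase)
```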
Hw10 Q2 (4 points). Show how to compute the average inflation rate for each country pre and post 1975, from strikes.df, using a single call to daply(), i.e., without adding any auxiliary columns to strikes.df like the yearPre1975 and countryPre1975 columns you created above. You will need to have gone through the “Plyr: d*ply()” mini-lecture to do this question, so you might want to come back to this one after class on Wednesday or Friday. (Hint: recall the function I().) Check that the results are the same as those you computed above with split() and sapply().
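A sketch of a single daply() call on the same kind of toy data, assuming the plyr package is installed (the code is skipped otherwise). The grouping expression is wrapped in I() inside plyr's .() so that it is evaluated within the data frame instead of requiring an auxiliary column; the result is a matrix with countries as rows and the FALSE (post) and TRUE (pre) groups as columns, rather than a flat vector.

```r
# Hypothetical toy data: two countries, years straddling 1975
toy.df <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year = rep(1974:1977, times = 2),
  inflation = c(2, 4, 6, 8, 1, 1, 3, 5)
)

if (requireNamespace("plyr", quietly = TRUE)) {
  # I() marks year <= 1975 as an expression to evaluate inside toy.df
  avg.infl.mat <- plyr::daply(toy.df, plyr::.(country, I(year <= 1975)),
                              function(df) mean(df$inflation))
  print(avg.infl.mat)
}
```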
In the “Split-Apply-Combine” mini-lecture, we computed the coefficients from regressing strike.volume onto left.parliament, separately for each country in the strikes.df data frame. Following this code structure, regress strike.volume onto left.parliament, unemployment, and inflation, separately for each country. The output should be a matrix of dimension 4 x 18 (1 row for the intercept, then 3 rows for the coefficients of left.parliament, unemployment, and inflation). Display the columns for Belgium, Canada, UK, and USA.
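A sketch of the countrywise regression pattern on randomly generated toy data (the coefficients carry no meaning). It is not the mini-lecture's code, just one way to get a coefficient matrix with one column per country.

```r
# Hypothetical toy data: 2 countries x 12 years, generated at random
set.seed(1)
n <- 24
toy.df <- data.frame(
  country = rep(c("A", "B"), each = n / 2),
  left.parliament = runif(n, 20, 50),
  unemployment = runif(n, 1, 10),
  inflation = runif(n, 0, 15)
)
toy.df$strike.volume <- 100 + 5 * toy.df$left.parliament + rnorm(n, sd = 50)

# One regression per country; coef() returns 4 named numbers, which sapply()
# stacks into a 4 x (number of countries) matrix
coefs <- sapply(split(toy.df, toy.df$country), function(df) {
  coef(lm(strike.volume ~ left.parliament + unemployment + inflation, data = df))
})
print(coefs)
```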
Following the code at the end of the “Split-Apply-Combine” mini-lecture, plot the coefficients of left.parliament from the countrywise regressions of strike.volume onto left.parliament, unemployment, and inflation.
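The mini-lecture's plotting code is not reproduced here; the following is one possible layout, on made-up coefficient values: one point per country, country names on the x axis, and a dashed reference line at 0.

```r
# Hypothetical left.parliament coefficients, one per country (made-up values)
lp.coef <- c(A = 1.8, B = -0.6, C = 0.4, D = 3.1)

plot(seq_along(lp.coef), lp.coef, pch = 19, xaxt = "n",
     xlab = "", ylab = "Coefficient of left.parliament",
     main = "Countrywise coefficients of left.parliament")
# Label the x axis with country names, rotated vertically to fit
axis(1, at = seq_along(lp.coef), labels = names(lp.coef), las = 2)
abline(h = 0, lty = 2)  # reference line at zero
```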
Hw10 Q3 (10 points). Modify your code for computing the coefficients from regressing strike.volume onto left.parliament, unemployment, and inflation, separately for each country in the strikes.df data frame, so that instead of just reporting the coefficients, you also report their standard errors. (Hint: you will need to figure out how to extract the standard errors from the result of calling summary() on the object returned by lm(). Look at the solution to one of the bonus questions on Hw9.) The output should be a matrix of dimension 8 x 18 (1 row for the intercept, 3 rows for the coefficients of left.parliament, unemployment, and inflation, and 4 rows for their standard errors). Display the columns for Belgium, Canada, UK, and USA.
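A sketch of extracting standard errors alongside the estimates, on the same kind of random toy data. It relies on coef(summary(fit)) returning the coefficient table with a "Std. Error" column; the ".se" suffix on the row names is an arbitrary naming choice.

```r
# Hypothetical toy data: 2 countries x 12 years, generated at random
set.seed(1)
n <- 24
toy.df <- data.frame(
  country = rep(c("A", "B"), each = n / 2),
  left.parliament = runif(n, 20, 50),
  unemployment = runif(n, 1, 10),
  inflation = runif(n, 0, 15)
)
toy.df$strike.volume <- 100 + 5 * toy.df$left.parliament + rnorm(n, sd = 50)

coefs.se <- sapply(split(toy.df, toy.df$country), function(df) {
  fit <- lm(strike.volume ~ left.parliament + unemployment + inflation, data = df)
  # Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
  tab <- coef(summary(fit))
  se <- tab[, "Std. Error"]
  names(se) <- paste0(names(se), ".se")
  c(tab[, "Estimate"], se)  # 4 estimates followed by 4 standard errors
})
print(coefs.se)  # 8 x (number of countries)
```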
Finally, reproduce your earlier plot of the coefficients of left.parliament from the countrywise regressions of strike.volume onto left.parliament, unemployment, and inflation. But now, on top of each point (each point denotes the coefficient value of left.parliament for a different country), draw a vertical line segment through the point, extending from the coefficient value minus one standard error to the coefficient value plus one standard error. (Hint: segments().) Make sure that these line segments do not extend past the y limits of your plot. For how many countries does the line segment (from the coefficient value minus one standard error to the coefficient value plus one standard error) not intersect the 0 line? Which countries are they?
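A sketch of the error-bar plot and the count, on made-up coefficients and standard errors. Setting ylim from the segment endpoints keeps every segment inside the plot, and a segment misses the 0 line exactly when its lower end is above 0 or its upper end is below 0.

```r
# Hypothetical coefficients and standard errors (made-up values)
lp.coef <- c(A = 1.8, B = -0.6, C = 0.4, D = 3.1)
lp.se   <- c(A = 0.5, B = 0.9, C = 0.3, D = 1.0)
lo <- lp.coef - lp.se
hi <- lp.coef + lp.se

x <- seq_along(lp.coef)
# ylim covers all segment endpoints (and 0), so no segment is clipped
plot(x, lp.coef, pch = 19, xaxt = "n", ylim = range(lo, hi, 0),
     xlab = "", ylab = "Coefficient of left.parliament",
     main = "Coefficients of left.parliament, +/- 1 SE")
axis(1, at = x, labels = names(lp.coef), las = 2)
abline(h = 0, lty = 2)
segments(x0 = x, y0 = lo, x1 = x, y1 = hi)

# A segment misses the 0 line when it lies entirely above or entirely below it
misses.zero <- lo > 0 | hi < 0
print(sum(misses.zero))
print(names(lp.coef)[misses.zero])
```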