Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 10 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 10 questions from other labs. Your homework writeup must start as this one does: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 11:59pm on Tuesday November 15. This document contains 23 of the 45 total points for Homework 10.

Split-apply-combine practice with the debt data

Gross domestic product (GDP) is a measure of the total market value of all goods and services produced in a given country in a given year. The percentage growth rate of GDP in year \(t\) is \[ 100 \cdot \left(\frac{GDP_{t+1} - GDP_{t}}{GDP_{t}}\right) \] (This formula is not important for your lab; it is just given by way of background.) An important claim in economics is that the rate of GDP growth is closely related to the level of government debt, specifically to the ratio of the government’s debt to the GDP. Data on GDP growth and the debt-to-GDP ratio for twenty countries around the world, from the 1940s to 2010, is available at http://stat.cmu.edu/~ryantibs/statcomp-F16/data/debt.csv. Note that not every country has data for the same years, and some years in the middle of the period are missing data for some countries but not others. Read this data into your R session, call the resulting data frame debt.df, check that it has dimension 1171 x 4, and display its first 4 rows.
Install the package plyr if you haven’t done so already, and load it into your R session with library(plyr). Use daply() to calculate the average GDP growth rate for each country in debt.df (averaging over years). Then use ddply() and dlply() to calculate the same results, but just in different formats.
Show how the same result as in the last question can be computed with the built-in apply functions in R. (Hint: you’ll need one extra step, to split according to country.)
Now use daply() to calculate average GDP growth rate for each year in debt.df (averaging over countries). Check that the average growth rates for 1972 and 1989 are 5.6300 and 3.1868, respectively. Then, plot the average growth rate versus the year. Label the axes and title the plot appropriately. What does the general trend appear to be?
An overall trend seems to be there, but how certain are we of the individual yearwise averages? To address this, it is useful to plot “error bars” on top of each average. To this end, also calculate the standard error of the average GDP growth rates for each year in debt.df. To be precise, if \(\bar{x}\) is the average of numbers \(x_1,\ldots,x_n\), then the standard error of \(\bar{x}\) is defined to be \(\hat\sigma/\sqrt{n}\), where \(\hat\sigma\) is the standard deviation of \(x_1,\ldots,x_n\). Check that the standard errors for 1972 and 1989 are 0.5440 and 0.3912, respectively.
Now reproduce your previous plot of average GDP growth rates versus years in debt.df, but now on top of each point (denoting an average growth rate for a particular year) draw a vertical line segment through this point, extending from the average growth rate minus one standard error to the average growth rate plus one standard error. (Hint: segments().) Make sure that these line segments do not extend past the y limits on your plot.
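As a minimal sketch of the split-apply-combine pattern practiced above, the following runs on a small made-up data frame rather than on debt.df. The names toy.df, group, value, avg.value and std.err are hypothetical and introduced only for illustration; the sketch assumes plyr is installed.

```r
library(plyr)

# Toy data, made up for illustration (not the debt data)
toy.df = data.frame(group = c("a", "a", "a", "b", "b"),
                    value = c(1, 2, 3, 10, 20))

# Hypothetical helper: mean of the value column in one piece of the data
avg.value = function(df) { return(mean(df$value)) }

# Same computation, three output formats
means.arr = daply(toy.df, .(group), avg.value)   # 1-d array named by group
means.df = ddply(toy.df, .(group), avg.value)    # data frame; result column is V1
means.list = dlply(toy.df, .(group), avg.value)  # list with one element per group

# Base R equivalent: split by group first, then apply
means.base = sapply(split(toy.df, toy.df$group), avg.value)

# Hypothetical helper: standard error of the mean, sd / sqrt(n)
std.err = function(df) { return(sd(df$value) / sqrt(nrow(df))) }
ses.arr = daply(toy.df, .(group), std.err)

# Endpoints of one-standard-error bars, and y limits wide enough to hold them
bar.lo = as.numeric(means.arr) - as.numeric(ses.arr)
bar.hi = as.numeric(means.arr) + as.numeric(ses.arr)
ylim = range(c(bar.lo, bar.hi))
```

Passing ylim to plot() before drawing the bars with segments(x, bar.lo, x, bar.hi), where x holds the horizontal positions of the points, keeps each segment inside the plotting region.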

Investigating correlations in the debt data

Use daply() to calculate the correlation between GDP growth rate and debt-to-GDP ratio for each country in debt.df. As a check, the mean of these countrywise correlations should be -0.1778. Plot a histogram of these correlations, with 10 breaks, and an appropriate x-axis label and title. Are there any countries whose correlation stands out (large positive or negative)? If so, which ones?
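A groupwise correlation follows the same pattern as a groupwise mean. The sketch below uses a made-up data frame, not debt.df; the names toy.df, group, x, y and cor.xy are hypothetical, and plyr is assumed to be installed.

```r
library(plyr)

# Toy data, made up for illustration: y rises with x in group a, falls in group b
toy.df = data.frame(group = rep(c("a", "b"), each = 4),
                    x = c(1, 2, 3, 4, 1, 2, 3, 4),
                    y = c(2, 4, 6, 8, 8, 6, 4, 2))

# Hypothetical helper: correlation between columns x and y in one piece
cor.xy = function(df) { return(cor(df$x, df$y)) }

# One correlation per group, returned as a 1-d array named by group
cors = daply(toy.df, .(group), cor.xy)

# Names of the groups whose correlation falls below a threshold
low.groups = names(cors)[cors < -0.5]
```

Because daply() keeps the group labels as names, the groups that stand out can be read off directly by logical indexing on the result.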

Hw10 Q4 (8 points). There are 4 countries whose correlations, between GDP growth rate and debt-to-GDP ratio, are less than -0.5. Identify them, and define debt.df.low to be the subset of rows of debt.df corresponding to the data from these 4 countries. Then, using a single call to d_ply() on debt.df.low, produce a separate scatter plot for each country of the GDP growth rate versus debt-to-GDP ratio, over the years in which these were observed. You should thus have 4 plots in total, arranged in a 2 x 2 plotting grid. Each plot should have appropriately labeled x- and y-axes, and should have an appropriate title portraying the country’s name. Also, on each plot, draw the line-of-best-fit (linear regression line, from regressing growth onto ratio) in red, on top of the scatter points.

Hw10 Bonus. Using just one line of code, in which you call one of the d*ply() functions, create a matrix whose entries are GDP growth by year (rows) and country (columns). Check that it has dimension 64 x 20. Show the first 6 rows and 6 columns.

Economists: which ones are right?

Some economists claim that high levels of government debt lead to slower growth. Other economists claim that low economic growth just propagates forward. The debt data lets us relate (say) this year’s debt to this year’s growth rate; but to investigate economists’ claims, we need to relate this year’s debt to next year’s growth. First, create a new data frame debt.df.france which contains just the rows of debt.df for France. Check that it has dimension 54 x 4, and display its first 5 rows.
Create a new column in debt.df.france, called next.growth, which gives next year’s growth if the next year is in the data frame, or NA if the next year is missing. Make sure that your construction of the next.growth column is entirely programmatic, i.e., nothing “by hand”, so you should be determining programmatically if the next year is in the data frame. (Hint: you may rely on the fact that the rows of the data frame are sorted by years.) To check your answers, next.growth for 1971 should be 5.8858, but for 1972 it should be NA.
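One possible way to look up a "next year" value programmatically is with match(), sketched here on a made-up data frame with a gap in its years. This is only one approach under stated assumptions, not necessarily the intended one; the names toy.df and add.next are hypothetical.

```r
# Toy data, made up for illustration: the year 2002 is missing
toy.df = data.frame(year = c(2000, 2001, 2003, 2004),
                    growth = c(1.5, 2.0, 3.0, 0.5))

# Hypothetical helper: for each row, find the row holding year + 1.
# match() returns NA when year + 1 is absent, and indexing by NA yields NA.
add.next = function(df) {
  df$next.growth = df$growth[match(df$year + 1, df$year)]
  return(df)
}

toy.df = add.next(toy.df)
```

Here the row for 2001 gets NA because 2002 is absent, and the last row gets NA because nothing follows it. Since the lookup goes by year value rather than by row position, this sketch does not depend on the rows being sorted.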

Hw10 Q5 (8 points). Add a next.growth column, as you did in the last question, but now to the whole debt.df data frame. Make sure that you do not accidentally put the first growth value for one country as the next.growth value for another. So, to check your answers, the next.growth for France in 2009 should be NA, not 9.1670. (Hint: write a function to encapsulate what you did in the last question, and then use ddply().) Show the first 5 and last 5 rows of the modified debt.df data frame.

Hw10 Q6 (7 points). Plot next year’s GDP growth against this year’s debt ratio, over all the data in debt.df, with appropriate axis labels and an appropriate title. Report the coefficients from regressing next year’s growth rate on the current year’s debt ratio, again over all the data in debt.df. Add this regression line to your plot.

Then, plot next year’s GDP growth against the current year’s GDP growth. Similarly, report the coefficients from regressing next year’s growth rate onto this year’s growth rate, and add this regression line to your plot.

Can you tell, from comparing the latter two regressions, whether current growth or current debt is a better predictor of future growth?

Hw10 Bonus. Add a new column called delta.growth to the debt.df data frame, giving the difference between next year’s GDP growth rate and this year’s GDP growth rate. Then, report the coefficients from regressing the change in GDP growth on the current GDP growth rate and the current debt-to-GDP ratio, over all the data in debt.df.

Some economists have claimed that there is a “tipping point”, or even a “point of no return”, when the ratio of government debt-to-GDP crosses 90%, above which growth slows dramatically or even becomes negative. Add an indicator column high.debt to the debt.df data frame, that takes the value TRUE when the debt-to-GDP ratio is over 90% and FALSE otherwise. Now regress the change in GDP growth on the current GDP growth rate, the current debt-to-GDP ratio, as well as the indicator that the debt is above 90%. Report the coefficients. What does the coefficient of the indicator variable high.debt tell you about the claim?
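To illustrate how a logical indicator enters a regression, here is a small sketch on made-up numbers (not the debt data). The names toy.df, ratio, growth and high are hypothetical.

```r
# Toy data, made up for illustration
toy.df = data.frame(ratio = c(20, 50, 80, 95, 110, 130),
                    growth = c(4.0, 3.5, 3.1, 2.0, 1.8, 1.1))

# Indicator column: TRUE when the ratio exceeds a threshold of 90
toy.df$high = toy.df$ratio > 90

# Regress growth on the ratio and the indicator together
fit = lm(growth ~ ratio + high, data = toy.df)
fit.coefs = coef(fit)
```

lm() codes the logical column as a 0/1 variable, so its coefficient appears under the name highTRUE. It estimates the shift in the fitted response when the indicator is TRUE, holding the other predictors fixed.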

Lab 11f: Plyr: `d*ply()`

Statistical Computing, 36-350

Friday November 11, 2016
