
Name:
Andrew ID:
Collaborated with:

This lab is to be completed in class. You can collaborate with your classmates, but you must identify their names above, and you must submit your own lab as an Rmd file on Blackboard, by 11:59pm on the day of the lab.

There are Homework 6 questions dispersed throughout. These must be written up in a separate Rmd document, together with all Homework 6 questions from other labs. Your homework writeup must start the same way as this one: by listing your name, Andrew ID, and who you collaborated with. You must submit your own homework as a knit HTML file on Blackboard, by 6pm on Sunday October 16. This document contains 16 of the 45 total points for Homework 6.

Don’t use apply(), just yet!

head(state.x77)
##            Population Income Illiteracy Life Exp Murder HS Grad Frost
## Alabama          3615   3624        2.1    69.05   15.1    41.3    20
## Alaska            365   6315        1.5    69.31   11.3    66.7   152
## Arizona          2212   4530        1.8    70.55    7.8    58.1    15
## Arkansas         2110   3378        1.9    70.66   10.1    39.9    65
## California      21198   5114        1.1    71.71   10.3    62.6    20
## Colorado         2541   4884        0.7    72.06    6.8    63.9   166
##              Area
## Alabama     50708
## Alaska     566432
## Arizona    113417
## Arkansas    51945
## California 156361
## Colorado   103766

Hw6 Q4 (3 points). For each variable, what is the state that achieves the largest value? What is the state that achieves the smallest value? (Hint: you’re allowed to use the transpose operator, t(), here.)

Hw6 Q5 (3 points). For each variable, what is the standard deviation, and what is the mean absolute deviation? (Hint: recall that the mean absolute deviation of a vector is the average of the absolute differences between its entries and their mean. You’re allowed to use the function scale() here; with the input center set to TRUE and the input scale set to FALSE, it returns a matrix in which each column has been centered, i.e., has had its mean subtracted from it.)
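As a small illustration of the hint, here is a sketch on a made-up matrix (the numbers and the names m, centered, and mad.a are invented for this example and are not part of the assignment). It shows what scale() returns with center=TRUE and scale=FALSE, and computes the mean absolute deviation of one vector directly from its definition.

```r
# Toy matrix with made-up numbers (not the state.x77 data)
m <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 60), ncol = 2,
            dimnames = list(NULL, c("a", "b")))

# Subtract each column's mean; scale = FALSE skips dividing by the column sd
centered <- scale(m, center = TRUE, scale = FALSE)

# Every centered column now has mean zero (up to floating-point error)
colMeans(centered)

# Mean absolute deviation of a single vector, straight from the definition
x <- m[, "a"]
mad.a <- mean(abs(x - mean(x)))
mad.a  # 1
```

Note that base R's mad() computes the median absolute deviation, which is a different quantity from the mean absolute deviation asked for here.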

OK, now let’s use apply()!

plot(xvec <- seq(-3,3,length=101), xvec^3)
text(-1, 15, paste("Pearson's correlation:", 
                   round(cor(xvec, xvec^3, method="pearson"),3))) # Default
text(-1, 20, paste("Spearman's correlation:", 
                   round(cor(xvec, xvec^3, method="spearman"),3)))
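To see why the two numbers in the plot differ, recall that Spearman's correlation is Pearson's correlation computed on the ranks of the data. The sketch below checks this on the same xvec as above; the names yvec, pearson, spearman, and spearman.by.ranks are introduced only for this illustration.

```r
xvec <- seq(-3, 3, length.out = 101)
yvec <- xvec^3

pearson  <- cor(xvec, yvec, method = "pearson")
spearman <- cor(xvec, yvec, method = "spearman")

# Spearman's correlation is Pearson's correlation applied to the ranks
spearman.by.ranks <- cor(rank(xvec), rank(yvec))
all.equal(spearman, spearman.by.ranks)

# x^3 is strictly increasing in x, so the ranks agree exactly and the
# Spearman correlation is 1; the relationship is not linear, so the
# Pearson correlation is below 1
spearman
pearson
```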

Modify your function cor.v1.v2() so that it takes a third argument, method, whose default value is "pearson", but that can also be "spearman" (or "kendall"), and signifies what type of correlation we should be computing with cor(). Check that cor.v1.v2(v1=state.x77[,"Life Exp"], method="spearman") gives you 0.298391, and cor.v1.v2(v1=state.x77[,"Frost"], method="spearman") still gives you 1.

Hw6 Q6 (10 points). The Spearman correlations between Illiteracy and Frost, as well as Murder and Frost, both look reasonably large in absolute value (so do the Pearson correlations, but let’s suppose the Spearman correlations are more interesting for now). However, it’s hard to judge them without assigning some notion of variability to our computed Spearman correlations. The jackknife is a super neat tool for doing just this. Here is a general description of how the jackknife works.
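For reference, the textbook jackknife works like this: with n data points, recompute the estimate n times, each time leaving out one point, and then take the square root of (n-1)/n times the sum of squared deviations of these n leave-one-out estimates from their average. The sketch below applies that textbook formula to the sample mean of some made-up data. It assumes the standard formula is the one intended here, and jack.se is a hypothetical helper name, not a function from the lab.

```r
# Standard (textbook) jackknife standard error, assumed here:
#   se = sqrt( (n - 1) / n * sum( (theta_i - mean(theta))^2 ) )
# where theta_i is the estimate recomputed with the i-th point left out.
jack.se <- function(x, estimator) {   # hypothetical helper name
  n <- length(x)
  theta <- numeric(n)
  for (i in seq_len(n)) {
    theta[i] <- estimator(x[-i])      # leave out point i and re-estimate
  }
  sqrt((n - 1) / n * sum((theta - mean(theta))^2))
}

x <- c(2, 4, 4, 5, 7, 9)              # made-up data
jack.se(x, mean)
sd(x) / sqrt(length(x))               # for the sample mean, the two agree
```

The sample mean is a convenient sanity check because its jackknife standard error reduces exactly to the familiar sd(x)/sqrt(n).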

Write a function called cor.v1.v2.jack(), which takes the same inputs as your latest version of cor.v1.v2(), but instead of computing a correlation between v1 and v2 (of the specified type, in method), the function cor.v1.v2.jack() should compute a jackknife standard error of this correlation. (Hint: you’ll want to use a for() loop, in which you leave out one point at a time; in the body of this for() loop, you can in fact just call cor.v1.v2(). Then collect the correlations, and compute the jackknife standard error per the above formula.) Check that cor.v1.v2.jack(v1=state.x77[,"Life Exp"], method="spearman") gives you 0.1540541, and cor.v1.v2.jack(v1=state.x77[,"Frost"], method="spearman") still gives you 0. Explain why it makes sense to get 0, for this last jackknife standard error.

Finally, using apply() and cor.v1.v2.jack(), compute the jackknife standard errors of the Spearman correlations between each one of the 8 variables in the state.x77 matrix and the Frost variable. Create a matrix of dimension 3 x 8, where the first row is the Spearman correlation between each variable and Frost, the second row is the Spearman correlation plus twice the jackknife standard error of this Spearman correlation, and the third row is the Spearman correlation minus twice the jackknife standard error of this Spearman correlation. Assign row names "Spear cor", "Spear cor + 2se", and "Spear cor - 2se". Display this matrix. You can think of each Spearman correlation plus/minus 2 jackknife standard errors as defining a (rough) 95% confidence interval. Which variables have confidence intervals that do not contain zero?
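The mechanics of this last step can be sketched on a toy matrix: apply() with MARGIN=2 calls a function once per column and forwards any extra arguments to it, and rbind() stacks the resulting vectors into named rows. The matrix m, the reference column ref, and the offset 0.1 below are all made up for illustration; the offset is an arbitrary placeholder, not a standard error.

```r
# Made-up 5 x 3 matrix standing in for a data matrix
m <- cbind(a = c(1, 2, 3, 4, 5),
           b = c(2, 1, 4, 3, 5),
           c = c(5, 4, 3, 2, 1))
ref <- m[, "a"]                       # reference column

# MARGIN = 2 applies the function to each column; the arguments after the
# function (here y and method) are passed along to it on every call
spear <- apply(m, 2, cor, y = ref, method = "spearman")

# Stack the vectors into rows and label them; 0.1 is an arbitrary placeholder
out <- rbind(spear, spear + 0.1, spear - 0.1)
rownames(out) <- c("est", "est + 0.1", "est - 0.1")
out
```

Because apply() keeps the column names, the columns of out are labeled a, b, and c without any extra work.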