36-467/667
Lecture 6 (17 September 2020)
World PC1 (\(\approx 35\%\) of between-population variance)
World PC2 (\(\approx 18\%\) of between-population variance)
World PC3 (\(\approx 12\%\) of between-population variance)
## year month day RPT VAL ROS KIL SHA BIR DUB CLA MUL CLO
## 1 61 1 1 15.04 14.96 13.17 9.29 13.96 9.87 13.67 10.25 10.83 12.58
## 2 61 1 2 14.71 16.88 10.83 6.50 12.62 7.67 11.50 10.04 9.79 9.67
## 3 61 1 3 18.50 16.88 12.33 10.13 11.17 6.17 11.25 8.04 8.50 7.67
## 4 61 1 4 10.58 6.63 11.75 4.58 4.54 2.88 8.63 1.79 5.83 5.88
## 5 61 1 5 13.33 13.25 11.42 6.17 10.71 8.21 11.92 6.54 10.92 10.34
## 6 61 1 6 13.21 8.12 9.96 6.67 5.37 4.50 10.67 4.42 7.17 7.50
## BEL MAL time
## 1 18.50 15.04 1961-01-01 12:00:00
## 2 17.54 13.83 1961-01-02 12:00:00
## 3 12.75 12.71 1961-01-03 12:00:00
## 4 5.46 10.88 1961-01-04 12:00:00
## 5 12.92 11.83 1961-01-05 12:00:00
## 6 8.12 13.17 1961-01-06 12:00:00
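# PCA on the 12 station columns (RPT through MAL); columns 1-3 are the date
wind.pca.1 <- prcomp(wind[, 4:15])
# Standard deviation of the data along each principal component
wind.pca.1$sdev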
## [1] 15.149749 4.806761 3.848214 2.840283 2.796445 1.932717 1.809999
## [8] 1.559231 1.408849 1.355770 1.164033 1.079990
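The standard deviations above become shares of total variance once they are squared and normalized. A minimal sketch, which copies the printed values into a vector so that it runs without the `wind` data frame; the names `wind.sdev`, `wind.var`, `wind.share` and `wind.cumshare` are introduced here for illustration only:

```r
# Standard deviations copied from the prcomp() output above
# (wind.sdev is a name introduced for this sketch)
wind.sdev <- c(15.149749, 4.806761, 3.848214, 2.840283, 2.796445, 1.932717,
               1.809999, 1.559231, 1.408849, 1.355770, 1.164033, 1.079990)
# Variance along each PC is the squared standard deviation
wind.var <- wind.sdev^2
# Share of total variance captured by each PC, and the running total
wind.share <- wind.var / sum(wind.var)
wind.cumshare <- cumsum(wind.share)
round(wind.share[1:3], 3)
round(wind.cumshare[1:3], 3)
```

With these values the first component alone accounts for roughly three quarters of the total variance.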
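# Loadings of PC1 on each station, sign-flipped (the sign of a PC is arbitrary)
plot(-wind.pca.1$rotation[, 1], ylim = c(0, 1))
# Label each point with its station code
text(1:12, -wind.pca.1$rotation[, 1], pos = 3, labels = colnames(wind)[4:15])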
A pattern over space
A function of space
A function of time
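# scale. = TRUE standardizes each variable to unit variance before the PCA
state.pca <- prcomp(state.x77, scale. = TRUE)
# Loadings of the 8 variables on the first two PCs, to 2 significant digits
signif(state.pca$rotation[, 1:2], 2)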
## PC1 PC2
## Population 0.130 0.410
## Income -0.300 0.520
## Illiteracy 0.470 0.053
## Life Exp -0.410 -0.082
## Murder 0.440 0.310
## HS Grad -0.420 0.300
## Frost -0.360 -0.150
## Area -0.033 0.590
States are locations; PCs are patterns of variables
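One way to read the loadings: each state's score on a PC is a weighted sum of its standardized variables, with the loadings as the weights. A short check of that relationship, using only the built-in `state.x77` data (the names `state.std` and `scores.by.hand` are introduced here for illustration):

```r
state.pca <- prcomp(state.x77, scale. = TRUE)
# Standardize each variable the same way prcomp() does with scale. = TRUE
state.std <- scale(state.x77)
# Scores = standardized data projected on to the PC directions
scores.by.hand <- state.std %*% state.pca$rotation
# Should agree with the scores prcomp() stores in $x
all.equal(unname(scores.by.hand), unname(state.pca$x))
# Each PC direction is a unit vector, so these sums of squares are all 1
colSums(state.pca$rotation^2)
```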
Turn the data on its side
state.vars.pca <- prcomp(t(scale(state.x77))) # What's t()?
length(state.vars.pca$sdev) # Why 8?
## [1] 8
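As for the two questions in the comments: `t()` is the matrix transpose, so the 50 states become the columns (the features) and the 8 variables become the rows (the observations). `prcomp()` returns at most as many components as the smaller dimension of its input, which here is 8. A sketch that checks this; `state.t` is a name introduced here, and the remark about the last component relies on `prcomp()` centering columns by default:

```r
state.t <- t(scale(state.x77))   # state.t is a name introduced for this sketch
dim(state.t)                     # 8 rows (variables), 50 columns (states)
state.vars.pca <- prcomp(state.t)
length(state.vars.pca$sdev)      # number of components = min(8, 50)
# Centering 8 rows leaves at most 7 independent directions,
# so the last standard deviation is zero up to rounding error
state.vars.pca$sdev[8] < 1e-8
```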
head(signif(state.vars.pca$rotation[, 1:2]), 4)
## PC1 PC2
## Alabama -0.2801370 0.03161830
## Alaska 0.0147876 0.56532600
## Arizona -0.0700666 0.00872764
## Arkansas -0.1653660 0.03283480
signif(state.vars.pca$x[, 1], 2)
## Population Income Illiteracy Life Exp Murder HS Grad Frost
## -2.60 2.90 -6.80 4.90 -6.70 4.80 4.30
## Area
## -0.69
New PC1 vector \(\propto\) old scores on PC1, etc.
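The proportionality can be checked numerically. The sketch below departs from the transposed call above in one respect, which is an assumption made here: it passes `center = FALSE`, so that both analyses decompose the same standardized matrix and the relationship holds exactly, up to sign. With the default re-centering of the transposed data, the match need not be exact. The names `state.vars.nocenter`, `new.pc1` and `old.scores1` are introduced for illustration.

```r
state.std <- scale(state.x77)
state.pca <- prcomp(state.x77, scale. = TRUE)
# Transposed PCA without re-centering (a choice made for this sketch)
state.vars.nocenter <- prcomp(t(state.std), center = FALSE)
# New PC1 vector over states vs. old PC1 scores of the states
new.pc1 <- state.vars.nocenter$rotation[, 1]
old.scores1 <- state.pca$x[, 1]
# Proportional vectors have correlation +1 or -1 (PC signs are arbitrary)
abs(cor(new.pc1, old.scores1))
```

This is the singular value decomposition seen from both sides: if the standardized matrix is \(\mathbf{u}\mathbf{d}\mathbf{v}^T\), the old scores are \(\mathbf{u}\mathbf{d}\), and the PC vectors of the uncentered transpose are the columns of \(\mathbf{u}\).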