Principal Components Analysis II

36-467/667

Lecture 6 (17 September 2020)

\[ \newcommand{\X}{\mathbf{x}} \newcommand{\w}{\mathbf{w}} \newcommand{\V}{\mathbf{v}} \newcommand{\S}{\mathbf{s}} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\dof}{DoF} \DeclareMathOperator{\det}{det} \newcommand{\TrueNoise}{\epsilon} \newcommand{\EstNoise}{\widehat{\TrueNoise}} \]

In our last episode…

Some properties of the PCs

Some properties of the eigenvalues

Some properties of PCA as a whole

Some properties of PC scores

\[\begin{eqnarray} \Var{\text{scores}} & = & \frac{1}{n} \S^T \S\\ & = & \frac{1}{n} (\X\w)^T(\X\w)\\ & = & \frac{1}{n}\w^T \X^T \X \w\\ & = & \w^T \V\w ~\text{ by definition of} ~ \V\\ & = & \w^T ( \w \mathbf{\Lambda} \mathbf{w}^T) \w ~\text{by eigendecomposition}\\ & = & (\w^T \w) \mathbf{\Lambda} (\w^T\w)\\ & = & \mathbf{\Lambda} \end{eqnarray}\]
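The derivation above can be checked numerically. The following is a minimal sketch, assuming the built-in `state.x77` data (standardized) stands in for \(\X\); the names `ortho_err` and `diag_err` are illustrative only. It uses the \(1/n\) convention of the derivation, whereas `prcomp` divides by \(n-1\).

```r
# Numerical check: the variance matrix of the scores is the diagonal matrix
# of eigenvalues. Assumption: state.x77 (standardized) plays the role of x.
X <- scale(state.x77)            # centered (and rescaled) data matrix
n <- nrow(X)

V <- crossprod(X) / n            # v = (1/n) x^T x, as in the derivation
eig <- eigen(V, symmetric = TRUE)
w <- eig$vectors                 # columns are the PC vectors
Lambda <- diag(eig$values)       # diagonal matrix of eigenvalues

S <- X %*% w                     # scores
score_var <- crossprod(S) / n    # (1/n) s^T s

# w is orthogonal, so w^T w should be the identity matrix
ortho_err <- max(abs(crossprod(w) - diag(ncol(X))))
# the variance matrix of the scores should equal Lambda
diag_err <- max(abs(score_var - Lambda))

c(ortho_err = ortho_err, diag_err = diag_err)
```

Both errors should be at the level of floating-point rounding: the scores are uncorrelated, and the variance of each one is the matching eigenvalue.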

Another way to think about PCA

PCA can be used for any multivariate data

PCA with spatial data

A famous example

Some maps from Cavalli-Sforza, Menozzi, and Piazza (1993)

World PC1

(\(\approx 35\%\) of between-population variance)

World PC2

(\(\approx 18\%\) of between-population variance)

World PC3

(\(\approx 12\%\) of between-population variance)

PCA with multiple time series

Irish wind data

##   year month day   RPT   VAL   ROS   KIL   SHA  BIR   DUB   CLA   MUL   CLO
## 1   61     1   1 15.04 14.96 13.17  9.29 13.96 9.87 13.67 10.25 10.83 12.58
## 2   61     1   2 14.71 16.88 10.83  6.50 12.62 7.67 11.50 10.04  9.79  9.67
## 3   61     1   3 18.50 16.88 12.33 10.13 11.17 6.17 11.25  8.04  8.50  7.67
## 4   61     1   4 10.58  6.63 11.75  4.58  4.54 2.88  8.63  1.79  5.83  5.88
## 5   61     1   5 13.33 13.25 11.42  6.17 10.71 8.21 11.92  6.54 10.92 10.34
## 6   61     1   6 13.21  8.12  9.96  6.67  5.37 4.50 10.67  4.42  7.17  7.50
##     BEL   MAL                time
## 1 18.50 15.04 1961-01-01 12:00:00
## 2 17.54 13.83 1961-01-02 12:00:00
## 3 12.75 12.71 1961-01-03 12:00:00
## 4  5.46 10.88 1961-01-04 12:00:00
## 5 12.92 11.83 1961-01-05 12:00:00
## 6  8.12 13.17 1961-01-06 12:00:00

Irish wind data — one time series

Irish wind data — all the time series

PCA: \(n = 6574\), \(p=12\)

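# PCA on the 12 station columns (RPT through MAL); prcomp centers each
# column by default but does not rescale it
wind.pca.1 <- prcomp(wind[, 4:15])
wind.pca.1$sdev  # standard deviation along each PC (square root of its eigenvalue)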
##  [1] 15.149749  4.806761  3.848214  2.840283  2.796445  1.932717  1.809999
##  [8]  1.559231  1.408849  1.355770  1.164033  1.079990
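The standard deviations above can be turned into shares of variance, since each eigenvalue is the square of the corresponding `sdev`. This is a small sketch that re-enters the printed values by hand, so it runs without the `wind` data; `var_share` and `cum_share` are illustrative names.

```r
# Standard deviations along each PC, copied from the prcomp output above
sdev <- c(15.149749, 4.806761, 3.848214, 2.840283, 2.796445, 1.932717,
          1.809999, 1.559231, 1.408849, 1.355770, 1.164033, 1.079990)

eigenvalues <- sdev^2                         # variance along each PC
var_share <- eigenvalues / sum(eigenvalues)   # fraction of total variance
cum_share <- cumsum(var_share)                # cumulative fraction

round(rbind(var_share, cum_share), 3)
```

On these numbers the first component carries roughly three quarters of the total variance, which is why the slides that follow concentrate on PC1.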

PC1: The eigenvector

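# Entries of the PC1 vector, one per station. The sign of a PC is arbitrary,
# so it is flipped here to make the weights positive.
plot(-wind.pca.1$rotation[, 1], ylim = c(0, 1))
text(1:12, -wind.pca.1$rotation[, 1], pos = 3, labels = colnames(wind)[4:15])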

A pattern over space

A function of space

PC1: The scores

A function of time

Try to describe the first component here
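The two views of a component (the eigenvector as a pattern across series, the scores as a function of time) can be reproduced on any set of parallel time series. The sketch below assumes the built-in `EuStockMarkets` data as a stand-in, since it ships with base R; it is not the wind data, and taking log returns first is a choice made for this example only.

```r
# Daily log returns of four European stock indices (built-in data set);
# apply() returns a plain numeric matrix with one column per index
returns <- apply(log(EuStockMarkets), 2, diff)
times <- as.numeric(time(EuStockMarkets))[-1]   # time stamp of each return

stock_pca <- prcomp(returns)
pc1_loadings <- stock_pca$rotation[, 1]   # one weight per series
pc1_scores <- stock_pca$x[, 1]            # one score per time point

# The sign of a PC is arbitrary; flip it so that the weights are positive
if (sum(pc1_loadings) < 0) {
  pc1_loadings <- -pc1_loadings
  pc1_scores <- -pc1_scores
}

# Plots are only drawn in an interactive session
if (interactive()) {
  # The eigenvector: a pattern over the series
  plot(pc1_loadings, ylim = c(0, 1), xaxt = "n", xlab = "", ylab = "PC1 weight")
  axis(1, at = seq_along(pc1_loadings), labels = names(pc1_loadings))
  # The scores: a function of time
  plot(times, pc1_scores, type = "l", xlab = "time", ylab = "PC1 score")
}
```

As with the wind data, a first component whose weights all share one sign is roughly an overall level: it is high when all the series are high together.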

PCA with spatio-temporal data

Interpreting PCA results

PCA is exploratory analysis, not statistical inference

Some alternatives to PCA

Summing up

Details and asides

Some more maps from Cavalli-Sforza, Menozzi, and Piazza (1993)

Principal Components Regression
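In its usual form, principal components regression regresses the response on the scores of the first few PCs of the predictors, rather than on the predictors themselves. A minimal sketch on the built-in `state.x77` data follows; the choice of `Life Exp` as the response and of `q = 2` components is arbitrary and made only for illustration.

```r
# Assumption for illustration: predict life expectancy from the other variables
y <- state.x77[, "Life Exp"]
X <- state.x77[, colnames(state.x77) != "Life Exp"]

pca <- prcomp(X, scale. = TRUE)
q <- 2                                    # number of PCs kept (arbitrary here)
scores <- pca$x[, 1:q, drop = FALSE]      # scores on the first q PCs

# Ordinary least squares of the response on the retained scores
pcr_fit <- lm(y ~ scores)

# Implied coefficients on the standardized original variables:
# each score is a linear combination of them, with weights from the PC vectors
beta_std <- pca$rotation[, 1:q, drop = FALSE] %*% coef(pcr_fit)[-1]

# Same fitted values, computed from the standardized variables directly
fitted_direct <- coef(pcr_fit)[1] + scale(X) %*% beta_std
```

Because the scores are uncorrelated with one another, adding a further component to the regression does not change the coefficients of the components already included.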

Orthogonal matrices

Recall the states…

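# PCA of the state data, with every variable rescaled to unit variance
state.pca <- prcomp(state.x77, scale. = TRUE)
signif(state.pca$rotation[, 1:2], 2)  # entries of the first two PC vectors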
##               PC1    PC2
## Population  0.130  0.410
## Income     -0.300  0.520
## Illiteracy  0.470  0.053
## Life Exp   -0.410 -0.082
## Murder      0.440  0.310
## HS Grad    -0.420  0.300
## Frost      -0.360 -0.150
## Area       -0.033  0.590

states are locations, PCs are patterns of variables

Each score is spatially distributed

Try it the other way

Turn the data on its side

state.vars.pca <- prcomp(t(scale(state.x77)))  # What's t()?
length(state.vars.pca$sdev)  # Why 8?
## [1] 8
head(signif(state.vars.pca$rotation[, 1:2]), 4)
##                 PC1        PC2
## Alabama  -0.2801370 0.03161830
## Alaska    0.0147876 0.56532600
## Arizona  -0.0700666 0.00872764
## Arkansas -0.1653660 0.03283480
signif(state.vars.pca$x[, 1], 2)
## Population     Income Illiteracy   Life Exp     Murder    HS Grad      Frost 
##      -2.60       2.90      -6.80       4.90      -6.70       4.80       4.30 
##       Area 
##      -0.69

The states turned on their sides…

PCA of \(\X\) vs. PCA of \(\X^T\)

\[\begin{eqnarray} \mathbf{u}\mathbf{\Psi}\mathbf{u}^T & = & p^{-1} \X \X^T\\ \mathbf{u}\mathbf{\Psi}\mathbf{u}^T & = & p^{-1} \S \w^T (\S \w^T)^T ~\text{since} ~ \X = \S \w^T\\ \mathbf{u}\mathbf{\Psi}\mathbf{u}^T & = & p^{-1} \S \w^T \w \S^T\\ \mathbf{u}\mathbf{\Psi}^{1/2} \mathbf{\Psi}^{1/2}\mathbf{u}^T & = & p^{-1/2} \S \w^T \w \S^T p^{-1/2}\\ (\mathbf{u}\mathbf{\Psi}^{1/2}) (\mathbf{u}\mathbf{\Psi}^{1/2})^T & = & (p^{-1/2} \S) (p^{-1/2} \S)^T\\ \mathbf{u}\mathbf{\Psi}^{1/2} & = & p^{-1/2} \S ~\text{(up to the signs of the columns)}\\ \mathbf{u} & = & p^{-1/2} \S \mathbf{\Psi}^{-1/2} \end{eqnarray}\]

New PC1 vector \(\propto\) old scores on PC1, etc.
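The proportionality can be checked numerically. This sketch assumes the standardized `state.x77` data and turns off centering in both calls to `prcomp`, so that both decompositions use exactly the same matrix, as the derivation does; `cosine` and `abs_cos` are illustrative names.

```r
X <- scale(state.x77)                    # centered and rescaled, 50 x 8

# PCA of x: the scores are s = x w
pca <- prcomp(X, center = FALSE)         # x is already centered
S <- pca$x

# PCA of x^T, without any further centering
pca_t <- prcomp(t(X), center = FALSE)
u <- pca_t$rotation                      # 50 x 8: one entry per state

# Cosine of the angle between two vectors; +1 or -1 means proportional
cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

abs_cos <- sapply(seq_len(ncol(S)), function(j) abs(cosine(u[, j], S[, j])))
round(abs_cos, 6)
```

Each new PC vector lines up with the old scores on the corresponding component, up to sign. With `prcomp`'s default centering, as in the `state.vars.pca` call earlier, the transposed matrix is re-centered first, so the match there need not be exact.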

No, really, PCA doesn’t do statistical inference

Other alternatives to PCA

References

Anthony, David W. 2007. The Horse, the Wheel and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Princeton: Princeton University Press.

Cavalli-Sforza, Luigi L. 2000. Genes, Peoples, and Languages. New York: North Point Press.

Cavalli-Sforza, Luigi L., Paolo Menozzi, and Alberto Piazza. 1993. “Demic Expansions and Human Evolution.” Science 259:639–46. https://doi.org/10.1126/science.8430313.

———. 1994. The History and Geography of Human Genes. Princeton: Princeton University Press.

Dhillon, Paramveer S., Dean P. Foster, Sham M. Kakade, and Lyle H. Ungar. 2013. “A Risk Comparison of Ordinary Least Squares Vs Ridge Regression.” Journal of Machine Learning Research 14:1505–11. http://jmlr.org/papers/v14/dhillon13a.html.

Feuerverger, Andrey, Yu He, and Shashi Khatri. 2012. “Statistical Significance of the Netflix Challenge.” Statistical Science 27:202–31. https://doi.org/10.1214/11-STS368.

Glymour, Clark. 1998. “What Went Wrong? Reflections on Science by Observation and The Bell Curve.” Philosophy of Science 65:1–32. http://www.hss.cmu.edu/philosophy/glymour/glymour1998.pdf.

Goerg, Georg M. 2013. “Forecastable Component Analysis (ForeCA).” In Proceedings of the 30th International Conference on Machine Learning [ICML 2013], edited by Sanjoy Dasgupta and David McAllester, 28(2):64–72. http://proceedings.mlr.press/v28/goerg13.html.

Novembre, John, and Matthew Stephens. 2008. “Interpreting Principal Component Analyses of Spatial Population Genetic Variation.” Nature Genetics 40:646–49. https://doi.org/10.1038/ng.139.

Shalizi, Cosma Rohilla. n.d. Advanced Data Analysis from an Elementary Point of View. Cambridge, England: Cambridge University Press. http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV.

Stone, James V. 2004. Independent Component Analysis: A Tutorial Introduction. Cambridge, Massachusetts: MIT Press.

Wall, Michael E., Andreas Rechtsteiner, and Luis M. Rocha. 2003. “Singular Value Decomposition and Principal Component Analysis.” In A Practical Approach to Microarray Data Analysis, edited by D. P. Berrar, W. Dubitsky, and M. Granzow, 91–109. Norwell, Massachusetts: Kluwer. https://arxiv.org/abs/physics/0208101.

Zeller, Richard A., and Edward G. Carmines. 1980. Measurement in the Social Sciences: The Link Between Theory and Data. Cambridge, England: Cambridge University Press.