36-467/667
11 September 2019
\[ \newcommand{\X}{\mathbf{x}} \newcommand{\w}{\mathbf{w}} \newcommand{\V}{\mathbf{v}} \newcommand{\S}{\mathbf{s}} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \]
Do it for one vector first:
\[\begin{eqnarray} {\|\vec{x_i} - (\vec{w}\cdot\vec{x_i})\vec{w}\|}^2 & =& \left(\vec{x_i} - (\vec{w}\cdot\vec{x_i})\vec{w}\right)\cdot\left(\vec{x_i} - (\vec{w}\cdot\vec{x_i})\vec{w}\right)\\ & = & \vec{x_i}\cdot\vec{x_i} -\vec{x_i}\cdot (\vec{w}\cdot\vec{x_i})\vec{w}\\ \nonumber & & - (\vec{w}\cdot\vec{x_i})\vec{w}\cdot\vec{x_i} + (\vec{w}\cdot\vec{x_i})\vec{w}\cdot(\vec{w}\cdot\vec{x_i})\vec{w}\\ & = & {\|\vec{x_i}\|}^2 -2(\vec{w}\cdot\vec{x_i})^2 + (\vec{w}\cdot\vec{x_i})^2\vec{w}\cdot\vec{w}\\ & = & \vec{x_i}\cdot\vec{x_i} - (\vec{w}\cdot\vec{x_i})^2 \end{eqnarray}\](\(\Expect{Z^2} = (\Expect{Z})^2 + \Var{Z}\))
But \[ \frac{1}{n}\sum_{i=1}^{n}{\vec{x_i} \cdot \vec{w}} = 0 \] (Why?)
so \[ L(\vec{w}) = \SampleVar{\vec{w}\cdot\vec{x_i}} \]
The direction which gives us the best approximation of the data is the direction with the greatest variance
Matrix form: all lengths of projections \(=\mathbf{x}\mathbf{w}\) \([n\times 1]\)
\[\begin{eqnarray} \SampleVar{\vec{w}\cdot\vec{x_i}} & = & \frac{1}{n}\sum_{i}{{\left(\vec{x_i} \cdot \vec{w}\right)}^2}\\ & = & \frac{1}{n}{\left(\X \w\right)}^{T} \left(\X \w\right)\\ & = & \frac{1}{n} \w^T \X^T \X \w\\ & = & \w^T \frac{\X^T \X}{n} \w\\ \end{eqnarray}\]Fact: \(\V \equiv \frac{\X^T \X}{n} =\) sample covariance matrix of the vectors
Constraint: \(\vec{w}\) has length 1 \(\Leftrightarrow\) \(\w^T \w = 1\)
Add a Lagrange multiplier \(\lambda\) and take derivatives
THIS IS AN EIGENVALUE/EIGENVECTOR EQUATION!
At the solution, \[ \SampleVar{\vec{w}\cdot\vec{x_i}} = \w^T \V \w = \w^T \lambda \w = \lambda \] so the maximum is the leading eigenvector of \(\V\)
\(\V\) is a special matrix: symmetric and non-negative definite:
\[ \S = \X \w \]
Show that this \(=\mathbf{\Lambda}\)
(Hint: \(\V = \w \mathbf{\Lambda} \w^T\))
\[ \S = \X \w \]
prcomp is the best built-in PCA commandTurchin et al. (2018)
9 variable measuring social complexity, over time, for dozens of locations across the world
Dates from -9600 to 1900
summary(soccomp[, complexities], digits = 2)##      PolPop       PolTerr          CapPop        levels      government  
##  Min.   :1.4   Min.   :-0.22   Min.   :1.4   Min.   :0.0   Min.   :0.00  
##  1st Qu.:4.2   1st Qu.: 3.65   1st Qu.:3.5   1st Qu.:1.8   1st Qu.:0.24  
##  Median :6.0   Median : 5.18   Median :4.3   Median :3.0   Median :0.62  
##  Mean   :5.5   Mean   : 4.78   Mean   :4.2   Mean   :2.9   Mean   :0.55  
##  3rd Qu.:6.8   3rd Qu.: 5.97   3rd Qu.:5.1   3rd Qu.:4.0   3rd Qu.:0.86  
##  Max.   :8.5   Max.   : 7.40   Max.   :6.3   Max.   :6.6   Max.   :1.00  
##     infrastr       writing         texts          money    
##  Min.   :0.00   Min.   :0.00   Min.   :0.00   Min.   :0.0  
##  1st Qu.:0.34   1st Qu.:0.26   1st Qu.:0.10   1st Qu.:1.8  
##  Median :0.75   Median :0.82   Median :0.93   Median :4.0  
##  Mean   :0.64   Mean   :0.65   Mean   :0.63   Mean   :3.4  
##  3rd Qu.:0.90   3rd Qu.:0.86   3rd Qu.:0.97   3rd Qu.:5.0  
##  Max.   :1.00   Max.   :1.00   Max.   :1.00   Max.   :6.0soccomp.pca <- prcomp(soccomp[, complexities], scale = TRUE)What are the parts of the return value?
str(soccomp.pca)## List of 5
##  $ sdev    : num [1:9] 2.634 0.733 0.646 0.581 0.481 ...
##  $ rotation: num [1:9, 1:9] 0.351 0.32 0.339 0.341 0.332 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:9] "PolPop" "PolTerr" "CapPop" "levels" ...
##   .. ..$ : chr [1:9] "PC1" "PC2" "PC3" "PC4" ...
##  $ center  : Named num [1:9] 5.515 4.779 4.229 2.923 0.552 ...
##   ..- attr(*, "names")= chr [1:9] "PolPop" "PolTerr" "CapPop" "levels" ...
##  $ scale   : Named num [1:9] 1.59 1.561 1.112 1.449 0.325 ...
##   ..- attr(*, "names")= chr [1:9] "PolPop" "PolTerr" "CapPop" "levels" ...
##  $ x       : num [1:414, 1:9] -4.35 -4.24 -4.11 -3.67 -3.51 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : NULL
##   .. ..$ : chr [1:9] "PC1" "PC2" "PC3" "PC4" ...
##  - attr(*, "class")= chr "prcomp"sdev = standard deviations along PCs \(=\sqrt{\lambda_i}\)rotation = matrix of eigenvectors \(=\w\)x = scores on all PCs \(=\S\)plot(cumsum(soccomp.pca$sdev^2)/sum(soccomp.pca$sdev^2), xlab = "Number of components used", 
    ylab = expression(R^2), ylim = c(0, 1))
abline(h = 0.75, lty = "dashed")
abline(h = 0.9, lty = "dotted")One principal component already keeps more than 75% of the variance (3 components keep just under 90%)
plot(soccomp.pca$rotation[, 1], xlab = "", main = "PCs of the complexity measures", 
    xaxt = "n", ylab = "PC weight", ylim = c(-1, 1), type = "l")
axis(side = 1, at = 1:9, labels = colnames(soccomp)[complexities], las = 2, 
    cex.axis = 0.5)
lines(soccomp.pca$rotation[, 2], col = "red")
lines(soccomp.pca$rotation[, 3], col = "blue")
legend("bottomright", legend = c("PC1", "PC2", "PC3"), lty = "solid", col = c("black", 
    "red", "blue"))
abline(h = 0, col = "grey")soccomp$PC1 <- soccomp.pca$x[, 1]
with(soccomp, plot(PC1 ~ Time))plot(soccomp.pca$x[, 1], soccomp.pca$x[, 2], xlab = "Score on PC1", ylab = "Score on PC2")source("http://www.stat.cmu.edu/~cshalizi/dm/19/hw/01/nytac-and-bow.R")
art.stories <- read.directory("nyt_corpus/art")
music.stories <- read.directory("nyt_corpus/music")
art.BoW.list <- lapply(art.stories, table)
music.BoW.list <- lapply(music.stories, table)
nyt.BoW.frame <- make.BoW.frame(c(art.BoW.list, music.BoW.list), row.names = c(paste("art", 
    1:length(art.BoW.list), sep = "."), paste("music", 1:length(music.BoW.list), 
    sep = ".")))
nyt.nice.frame <- div.by.euc.length(idf.weight(nyt.BoW.frame))
nyt.pca <- prcomp(nyt.nice.frame)First component:
nyt.latent.sem <- nyt.pca$rotation
signif(sort(nyt.latent.sem[, 1], decreasing = TRUE)[1:30], 2)##       music        trio     theater   orchestra   composers       opera 
##       0.110       0.084       0.083       0.067       0.059       0.058 
##    theaters           m    festival        east     program           y 
##       0.055       0.054       0.051       0.049       0.048       0.048 
##      jersey     players   committee      sunday        june     concert 
##       0.047       0.047       0.046       0.045       0.045       0.045 
##    symphony       organ     matinee   misstated instruments           p 
##       0.044       0.044       0.043       0.042       0.041       0.041 
##         X.d       april      samuel        jazz     pianist     society 
##       0.041       0.040       0.040       0.039       0.038       0.038signif(sort(nyt.latent.sem[, 1], decreasing = FALSE)[1:30], 2)##       she       her        ms         i      said    mother    cooper 
##    -0.260    -0.240    -0.200    -0.150    -0.130    -0.110    -0.100 
##        my  painting   process paintings        im        he       mrs 
##    -0.094    -0.088    -0.071    -0.070    -0.068    -0.065    -0.065 
##        me  gagosian       was   picasso     image sculpture      baby 
##    -0.063    -0.062    -0.058    -0.057    -0.056    -0.056    -0.055 
##   artists      work    photos       you    nature    studio       out 
##    -0.055    -0.054    -0.051    -0.051    -0.050    -0.050    -0.050 
##      says      like 
##    -0.050    -0.049Second component:
signif(sort(nyt.latent.sem[, 2], decreasing = TRUE)[1:30], 2)##         art      museum      images     artists   donations     museums 
##       0.150       0.120       0.095       0.092       0.075       0.073 
##    painting         tax   paintings   sculpture     gallery  sculptures 
##       0.073       0.070       0.065       0.060       0.055       0.051 
##     painted       white    patterns      artist      nature     service 
##       0.050       0.050       0.047       0.047       0.046       0.046 
##  decorative        feet     digital      statue       color    computer 
##       0.043       0.043       0.043       0.042       0.042       0.041 
##       paris         war collections     diamond       stone     dealers 
##       0.041       0.041       0.041       0.041       0.041       0.040signif(sort(nyt.latent.sem[, 2], decreasing = FALSE)[1:30], 2)##          her          she      theater        opera           ms 
##       -0.220       -0.220       -0.160       -0.130       -0.130 
##            i         hour   production         sang     festival 
##       -0.083       -0.081       -0.075       -0.075       -0.074 
##        music      musical        songs        vocal    orchestra 
##       -0.070       -0.070       -0.068       -0.067       -0.067 
##           la      singing      matinee  performance         band 
##       -0.065       -0.065       -0.061       -0.061       -0.060 
##       awards    composers         says           my           im 
##       -0.058       -0.058       -0.058       -0.056       -0.056 
##         play     broadway       singer       cooper performances 
##       -0.056       -0.055       -0.052       -0.051       -0.051story.classes <- c(rep("art", times = length(art.stories)), rep("music", times = length(music.stories)))
plot(nyt.pca$x[, 1:2], pch = ifelse(story.classes == "music", "m", "a"), col = ifelse(story.classes == 
    "music", "blue", "red"))Use \(k\) directions in a \(p\times k\) matrix \(\w\)
Require: \(\mathbf{w}^T\mathbf{w} = \mathbf{I}\), the basis vectors are orthonormal
\(\X \w =\) matrix of projection lengths \([n\times k]\)
\(\X \w \w^T =\) matrix of projected vectors \([n\times p]\)
\(\X - \X \w \w^T =\) matrix of vector residuals \([n\times p]\)
\((\X-\X\w\w^T)(\X-\X\w\w^T)^T =\) matrix of inner products of vector residuals \([n\times n]\)
\(\tr{((\X-\X\w\w^T)(\X-\X\w\w^T)^T)} =\) sum of squared errors \([1\times 1]\)
so maximize \(\frac{1}{n}\tr{(\X\w\w^T\X^T)}\)
“trace is cyclic” so \[ \tr{(\X\w\w^T\X^T)} = \tr{(\X^T\X\w\w^T)} = \tr{(\w^T\X^T\X\w)} \] so we want to maximize \[ \tr{\left(\w^T \frac{\X^T \X}{n}\w\right)} \] under the constraint \[ \w^T \w = \mathbf{I} \]
This is the same form we saw before, so it has the same sort of solution: each column of \(\w\) must be an eigenvector of \(\V\).
\[ \max_{w, \lambda}{\mathcal{L}(w,\lambda)} \]
Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman. 1990. “Indexing by Latent Semantic Analysis.” Journal of the American Society for Information Science 41:391–407. https://doi.org/10.1002/(SICI)1097-4571(199009)41:6<391::AID-ASI1>3.0.CO;2-9.
Feuerverger, Andrey, Yu He, and Shashi Khatri. 2012. “Statistical Significance of the Netflix Challenge.” Statistical Science 27:202–31. https://doi.org/10.1214/11-STS368.
Landauer, Thomas K., and Susan T. Dumais. 1997. “A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.” Psychological Review 104:211–40. http://lsa.colorado.edu/papers/plato/plato.annote.html.
Turchin, Peter, Thomas E. Currie, Harvey Whitehouse, Pieter François, Kevin Feeney, Daniel Mullins, Daniel Hoyer, et al. 2018. “Quantitative Historical Analysis Uncovers a Single Dimension of Complexity That Structures Global Variation in Human Social Organization.” Proceedings of the National Academy of Sciences (USA) 115:E144–E151. https://doi.org/10.1073/pnas.1708800115.