Dimension Reduction III — Random Linear Projections and Locally Linear Embeddings

36-462/662, Fall 2019

18 September 2019

\[ \newcommand{\X}{\mathbf{x}} \newcommand{\Y}{\mathbf{y}} \newcommand{\w}{\mathbf{w}} \newcommand{\V}{\mathbf{v}} \newcommand{\S}{\mathbf{s}} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \]

Recap

PCA and distance preservation

Distance-preserving projections

The random projection trick

Random projections are nearly distance-preserving with high probability
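To make the heading concrete, here is a small numerical illustration (not from the original slides; the sample size, dimensions, and scaling are arbitrary choices): multiply synthetic high-dimensional data by a \(p \times q\) matrix of independent Gaussians scaled by \(1/\sqrt{q}\), and compare pairwise distances before and after projecting.

# Illustration (not from the slides): a random linear projection roughly preserves distances
set.seed(462)                                   # arbitrary seed, for reproducibility
n = 100; p = 1000; q = 50                       # many original features, few projected ones
z = matrix(rnorm(n * p), nrow = n)              # synthetic high-dimensional data
rproj = matrix(rnorm(p * q), nrow = p)/sqrt(q)  # random p*q projection matrix, scaled
z.proj = z %*% rproj                            # n*q projected data
ratios = dist(z.proj)/dist(z)                   # projected vs. original pairwise distances
summary(as.vector(ratios))                      # should concentrate near 1

The ratios of projected to original distances concentrate around 1; this is the Johnson-Lindenstrauss phenomenon, and Mahoney (2011) and Boucheron, Lugosi, and Massart (2013) give precise statements.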

The problem with nonlinear structure

# a spiral: a one-dimensional curve curling through two dimensions,
# with the radius growing exponentially as the angle winds around
theta = -(1:300)/10
x = matrix(c(exp(-0.2 * theta) * cos(theta), exp(-0.2 * theta) * sin(theta)), ncol = 2)
plot(x)

PCA just fails here

fit.all = prcomp(x)
approx.all = sweep(fit.all$x[, 1] %*% t(fit.all$rotation[, 1]), 2, colMeans(x), "+")  # rank-one reconstruction, column means added back
plot(x, xlab = expression(x[1]), ylab = expression(x[2]))
points(approx.all, pch = 4)

Manifold learning

PCA doesn’t do too badly around any small part of the curve

fit = prcomp(x[270:280, ])
pca.approx = sweep(fit$x[, 1] %*% t(fit$rotation[, 1]), 2, colMeans(x[270:280, ]), "+")  # rank-one reconstruction, column means added back
plot(rbind(x[270:280, ], pca.approx), type = "n", xlab = expression(x[1]), ylab = expression(x[2]))
points(x[270:280, ])
points(pca.approx, pch = 4)

Local(ly) Linear Embedding

LLE (Roweis and Saul 2000; Saul and Roweis 2003)

  1. For each \(\vec{x}_i\), find the \(k\) nearest neighbors.
  2. Find optimal weights for reconstructing \(\vec{x}_i\) from its neighbors, i.e., minimize \[\begin{equation} MSE(\mathbf{w}) \equiv \frac{1}{n}\sum_{i=1}^{n}{{\| \vec{x}_i - \sum_{j \neq i}{w_{ij} \vec{x}_j}\|}^2} \end{equation}\] with \(w_{ij} = 0\) unless \(j\) is a nearest neighbor of \(i\), and \(\sum_{j}{w_{ij}} = 1\)
  3. Find new coordinates \(\mathbf{Y}\) which, holding the weights fixed, minimize the reconstruction error, \[\begin{equation} \Phi(\mathbf{Y}) \equiv \sum_{i=1}^{n}{{\|\vec{y}_i - \sum_{j\neq i}{w_{ij} \vec{y}_j}\|}^2} \end{equation}\] with constraints \(\sum_{i}{y_{ij}} = 0\) and \(\mathbf{Y}^T \mathbf{Y} = \mathbf{I}\) (centered, de-correlated coordinates)

Finding neighbors in \(p\)-dimensional space
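The lle() implementation below calls a helper find.kNNs that is not shown on these slides. Here is one plausible brute-force sketch, using the full distance matrix; this is fine for a few hundred points, though specialized data structures do better for large n.

# Sketch of a brute-force nearest-neighbor finder (an assumed implementation;
# the helper actually used in lecture is not shown on these slides)
# Inputs: n*p matrix of data vectors, number of neighbors k
# Output: n*k matrix, row i holding the indices of the k nearest neighbors of point i
find.kNNs <- function(x, k) {
    d = as.matrix(dist(x))   # n*n matrix of Euclidean distances
    diag(d) = Inf            # a point doesn't count as its own neighbor
    t(apply(d, 1, function(d.i) order(d.i)[1:k]))
}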

Finding weights

\[ \min_{\w}{\frac{1}{n}\sum_{i=1}^{n}{{\| \vec{x}_i - \sum_{j \neq i}{w_{ij} \vec{x}_j}\|}^2}} \]
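Because each point's weights appear in only one term of the sum, this splits into a separate small least-squares problem for each \(\vec{x}_i\). As a filled-in step (the standard derivation, not quoted from the slides): using \(\sum_{j}{w_{ij}} = 1\), write the residual in terms of the centered neighbors, whose Gram matrix is \(G^{(i)}_{jk} = (\vec{x}_i - \vec{x}_j)\cdot(\vec{x}_i - \vec{x}_k)\), and impose the sum-to-one constraint with a Lagrange multiplier:

\[ {\left\| \vec{x}_i - \sum_{j \neq i}{w_{ij} \vec{x}_j}\right\|}^2 = \sum_{j,k}{w_{ij} w_{ik} G^{(i)}_{jk}} \quad \Rightarrow \quad \vec{w}_i \propto {\left(G^{(i)}\right)}^{-1}\mathbf{1}, \quad \text{rescaled so that } \sum_{j}{w_{ij}} = 1 \]

When \(k > p\), or when the neighbors are nearly collinear, \(G^{(i)}\) is singular or badly conditioned; that is why the implementation below takes a scalar regularization setting (alpha) and adds a small ridge term before solving.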

Finding the weights
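The reconstruction.weights helper called by lle() below is not shown on these slides either. The sketch below solves the regularized local system just described; exactly how alpha enters (here, a ridge term scaled by the trace of the local Gram matrix) is an assumption and may differ from the helper used in lecture.

# Sketch of the per-point weight solver (an assumed implementation)
# Inputs: n*p data matrix, n*k matrix of neighbor indices, regularization setting alpha
# Output: n*n weight matrix, row i nonzero only at the neighbors of point i
reconstruction.weights <- function(x, kNNs, alpha) {
    n = nrow(x)
    w = matrix(0, nrow = n, ncol = n)
    for (i in 1:n) {
        neighbors = kNNs[i, ]
        z = t(t(x[neighbors, , drop = FALSE]) - x[i, ])   # neighbors centered at x_i
        G = z %*% t(z)                                    # local k*k Gram matrix
        G = G + alpha * sum(diag(G)) * diag(ncol(G))      # ridge term to keep G invertible
        w.i = solve(G, rep(1, ncol(G)))                   # solve G w = 1
        w[i, neighbors] = w.i/sum(w.i)                    # rescale to sum to one
    }
    return(w)
}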

Finding the new coordinates
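The last helper, coords.from.weights, is also not shown here. One standard way to implement it (again a sketch, not necessarily the code used in lecture) is to take the bottom eigenvectors of \(\mathbf{M} = (\mathbf{I} - \w)^T (\mathbf{I} - \w)\), discarding the constant eigenvector that goes with the eigenvalue of zero.

# Sketch of the coordinate-finding step (an assumed implementation)
# Inputs: n*n weight matrix, number of dimensions q
# Output: n*q matrix of new coordinates
coords.from.weights <- function(w, q) {
    n = nrow(w)
    M = t(diag(n) - w) %*% (diag(n) - w)   # reconstruction error as a quadratic form
    ev = eigen(M, symmetric = TRUE)        # eigenvalues come back in decreasing order
    # the smallest eigenvalue is 0, with the constant eigenvector (rows of w sum to 1);
    # skip it and keep the eigenvectors for the next q smallest eigenvalues
    coords = ev$vectors[, (n - 1):(n - q), drop = FALSE]
    return(coords)
}

Since the eigenvectors are orthonormal and orthogonal to the constant vector, the returned coordinates are automatically centered and de-correlated, matching the constraints in step 3 of the algorithm.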

Implementation

# Local linear embedding of data vectors
# Inputs: n*p matrix of vectors, number of dimensions q to find (< p),
#   number of nearest neighbors per vector, scalar regularization setting
# Calls: find.kNNs, reconstruction.weights, coords.from.weights
# Output: n*q matrix of new coordinates
lle <- function(x, q, k = q + 1, alpha = 0.01) {
    stopifnot(q > 0, q < ncol(x), k > q, alpha > 0)  # sanity checks
    kNNs = find.kNNs(x, k)  # should return an n*k matrix of indices
    w = reconstruction.weights(x, kNNs, alpha)  # n*n weight matrix
    coords = coords.from.weights(w, q)  # n*q coordinate matrix
    return(coords)
}

spiral.lle = lle(x, 1, 2)  # embed into q = 1 dimension, using k = 2 nearest neighbors
plot(spiral.lle, ylab = "Coordinate on manifold")

all(diff(spiral.lle) > 0)  # the recovered coordinate is monotone along the spiral
## [1] TRUE

# color the points of the spiral by their recovered one-dimensional coordinate
plot(x, col = rainbow(300, end = 5/6)[cut(spiral.lle, 300, labels = FALSE)], pch = 16)

Afternotes

References

Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart. 2013. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford: Oxford University Press.

Mahoney, Michael W. 2011. “Randomized Algorithms for Matrices and Data.” Foundations and Trends in Machine Learning 3:123–224. https://doi.org/10.1561/2200000035.

Roweis, Sam T., and Lawrence K. Saul. 2000. “Nonlinear Dimensionality Reduction by Locally Linear Embedding.” Science 290:2323–6. https://doi.org/10.1126/science.290.5500.2323.

Saul, Lawrence K., and Sam T. Roweis. 2003. “Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds.” Journal of Machine Learning Research 4:119–55. http://jmlr.csail.mit.edu/papers/v4/saul03a.html.