Optimal Linear Prediction

36-467/667

20 September 2020 (Lecture 7)

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\det}{det} \newcommand{\TrueNoise}{\epsilon} \newcommand{\EstNoise}{\widehat{\TrueNoise}} \]

In our previous episodes

Today: use correlations to do prediction

Optimal prediction in general

What’s the best constant guess for a random variable \(Y\)?

\[\begin{eqnarray} \TrueRegFunc & = & \argmin_{m \in \mathbb{R}}{\Expect{(Y-m)^2}}\\ & = & \argmin_{m \in \mathbb{R}}{\left(\Var{(Y-m)} + (\Expect{Y-m})^2\right)}\\ & = & \argmin_{m \in \mathbb{R}}{\left(\Var{Y} + (\Expect{Y} - m)^2\right)}\\ & = & \argmin_{m \in \mathbb{R}}{ (\Expect{Y} - m)^2}\\ & = & \Expect{Y} \end{eqnarray}\]
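
A quick numeric sanity check (a minimal sketch in Python with numpy; the simulated distribution is arbitrary, not from the lecture): the empirical mean squared error of a constant guess \(m\) is smallest at (approximately) the sample mean.

```python
import numpy as np

rng = np.random.default_rng(467)
y = rng.gamma(shape=2, scale=3, size=100_000)   # any distribution works here

# empirical MSE of each constant guess m on a grid
guesses = np.linspace(0, 12, 1201)
mse = np.array([np.mean((y - m) ** 2) for m in guesses])

print(guesses[np.argmin(mse)])   # ~ the sample mean
print(y.mean())                  # ~ E[Y] = 6
```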

Optimal prediction in general

For each \(z \in \mathcal{Z}\), best \(m(z)\) is \(\Expect{Y|Z=z}\) (by previous slide), so \[ \TrueRegFunc(z) = \Expect{Y|Z=z} \]
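
A small illustration of the same idea conditionally (again a sketch, with a made-up joint distribution): estimating \(\Expect{Y|Z=z}\) by averaging \(Y\) within narrow bins of \(Z\) gives a predictor with a visibly smaller mean squared error than the best constant guess.

```python
import numpy as np

rng = np.random.default_rng(467)
z = rng.uniform(-2, 2, size=200_000)
y = np.sin(z) + 0.5 * rng.normal(size=z.size)     # so E[Y|Z=z] = sin(z)

# crude estimate of E[Y|Z=z]: average Y inside narrow bins of Z
edges = np.linspace(-2, 2, 81)
which = np.digitize(z, edges)                      # bin index, 1..80
bin_means = np.array([y[which == k].mean() for k in range(1, len(edges))])
y_hat = bin_means[which - 1]

print(np.mean((y - y_hat) ** 2))     # ~ 0.25, the noise variance
print(np.mean((y - y.mean()) ** 2))  # much bigger: MSE of the best constant
```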

Optimal prediction in general

Optimal linear prediction with univariate predictor

Our prediction will be of the form \[ m(z) = a + b z \] and we want the best \(a, b\)

Optimal linear prediction, univariate case

\[ (\alpha, \beta) = \argmin_{a \in \mathbb{R}, b \in \mathbb{R}}{\Expect{(Y-(a+bZ))^2}} \]

Expand out that expectation, then take derivatives and set them to 0

The intercept

\[\begin{eqnarray} \Expect{(Y-(a+bZ))^2} & = & \Expect{Y^2} - 2\Expect{Y(a+bZ)} + \Expect{(a+bZ)^2}\\ & = & \Expect{Y^2} - 2a\Expect{Y} - 2b\Expect{YZ} +\\ & & a^2 + 2 ab \Expect{Z} + b^2 \Expect{Z^2}\\ \left. \frac{\partial}{\partial a}\Expect{(Y-(a+bZ))^2} \right|_{a=\alpha, b=\beta} & = & -2\Expect{Y} + 2\alpha + 2\beta\Expect{Z} = 0\\ \alpha & = & \Expect{Y} - \beta\Expect{Z} \end{eqnarray}\]

Remember: optimal linear predictor is \(\alpha + \beta Z\)

\(\therefore\) optimal linear predictor looks like \[ \Expect{Y} + \beta(Z-\Expect{Z}) \] \(\Rightarrow\) centering \(Z\) and/or \(Y\) won’t change the slope
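
A quick check on the centering claim (a sketch; the data are simulated, and the least-squares fit is just standing in for the optimal linear predictor): shifting \(Z\) and/or \(Y\) by constants leaves the fitted slope alone.

```python
import numpy as np

rng = np.random.default_rng(467)
z = rng.normal(size=50_000)
y = 3 + 2 * z + rng.normal(size=z.size)

print(np.polyfit(z, y, 1)[0])            # fitted slope, ~ 2
print(np.polyfit(z + 7, y - 10, 1)[0])   # same slope after shifting Z and Y
```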

The slope

\[\begin{eqnarray} \left. \frac{\partial}{\partial b}\Expect{(Y-(a+bZ))^2}\right|_{a=\alpha, b=\beta} & = & -2\Expect{YZ} + 2\alpha \Expect{Z} + 2\beta \Expect{Z^2} = 0\\ 0 & = & -\Expect{YZ} + (\Expect{Y} - \beta\Expect{Z})\Expect{Z} + \beta\Expect{Z^2} \\ 0 & = & \Expect{Y}\Expect{Z} - \Expect{YZ} + \beta(\Expect{Z^2} - \Expect{Z}^2)\\ 0 & = & -\Cov{Y,Z} + \beta \Var{Z}\\ \beta & = & \frac{\Cov{Y,Z}}{\Var{Z}} \end{eqnarray}\]

The optimal linear predictor of \(Y\) from \(Z\)

The optimal linear predictor of \(Y\) from a single \(Z\) is always

\[ \alpha + \beta Z = \Expect{Y} + \left(\frac{\Cov{Z,Y}}{\Var{Z}}\right) (Z - \Expect{Z}) \]
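
Numerically (a sketch with simulated data): plugging sample moments into \(\Expect{Y} + \left(\Cov{Z,Y}/\Var{Z}\right)(Z - \Expect{Z})\) recovers the line used to generate the data.

```python
import numpy as np

rng = np.random.default_rng(467)
z = rng.normal(loc=1.0, scale=2.0, size=100_000)
y = 5 - 1.5 * z + rng.normal(size=z.size)

beta = np.cov(y, z)[0, 1] / np.var(z, ddof=1)   # Cov(Y,Z) / Var(Z)
alpha = y.mean() - beta * z.mean()              # E[Y] - beta E[Z]
print(alpha, beta)                              # ~ (5, -1.5)
```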

What did we not assume?

A little worked example (I)

We know: \[ m(Z) = \Expect{Y} + \frac{\Cov{Y, Z}}{\Var{Z}}(Z - \Expect{Z}) \]

Substituting in: \[ m(X_1) = \mu + \frac{\gamma}{\sigma^2}(X_1 - \mu) \]
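
A simulation check of this formula (a sketch; I'm assuming the example's setup gives \(Y\) and \(X_1\) common mean \(\mu\) and common variance \(\sigma^2\), with \(\Cov{Y, X_1} = \gamma\); the particular numbers below are made up):

```python
import numpy as np

mu, sigma2, gamma = 2.0, 1.5, 0.9     # made-up parameter values
rng = np.random.default_rng(467)
cov = np.array([[sigma2, gamma],
                [gamma, sigma2]])
x1, y = rng.multivariate_normal([mu, mu], cov, size=200_000).T

m = mu + (gamma / sigma2) * (x1 - mu)   # the formula above
print(np.mean((y - m) ** 2))            # ~ sigma2 - gamma^2/sigma2 = 0.96
print(np.mean((y - y.mean()) ** 2))     # ~ sigma2 = 1.5, i.e. worse
```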

Some general properties of the optimal linear predictor

  1. The prediction errors average out to zero
  2. The prediction errors are uncorrelated with \(Z\)
  3. The variance of the prediction errors \(\leq\) the variance of \(Y\)

The prediction errors average out to zero

\[\begin{eqnarray} \Expect{Y-m(Z)} & = & \Expect{Y - (\Expect{Y} + \beta(Z-\Expect{Z}))}\\ & = & \Expect{Y} - \Expect{Y} - \beta(\Expect{Z} - \Expect{Z}) = 0 \end{eqnarray}\]

The prediction errors are uncorrelated with \(Z\)

\[\begin{eqnarray} \Cov{Z, Y-m(Z)} & = & \Expect{Z(Y-m(Z))} ~\text{(by previous slide)}\\ & = & \Expect{Z(Y - \Expect{Y} - \frac{\Cov{Y,Z}}{\Var{Z}}(Z-\Expect{Z}))}\\ & = & \Expect{ZY - Z\Expect{Y} - \frac{\Cov{Y,Z}}{\Var{Z}}(Z^2) + \frac{\Cov{Y,Z}}{\Var{Z}} (Z \Expect{Z})}\\ & = & \Expect{ZY} - \Expect{Z}\Expect{Y} - \frac{\Cov{Y,Z}}{\Var{Z}}\Expect{Z^2} + \frac{\Cov{Y,Z}}{\Var{Z}} (\Expect{Z})^2\\ & = & \Cov{Z,Y} - \frac{\Cov{Y,Z}}{\Var{Z}}(\Var{Z})\\ & = & 0 \end{eqnarray}\]

The prediction errors are uncorrelated with \(Z\)

Alternate take:

\[\begin{eqnarray} \Cov{Z, Y-m(Z)} & = & \Cov{Z, Y} - \Cov{Z, \alpha + \beta Z}\\ & = & \Cov{Y,Z} - \Cov{Z, \beta Z}\\ & = & \Cov{Y,Z} - \beta\Cov{Z,Z}\\ & = & \Cov{Y,Z} - \beta\Var{Z}\\ & = & \Cov{Y,Z} - \Cov{Y,Z} = 0 \end{eqnarray}\]

How big are the prediction errors?

\[\begin{eqnarray} \Var{Y-m(Z)} & = & \Var{Y - \alpha - \beta Z}\\ & = & \Var{Y - \beta Z}\\ & = & \Var{Y} + \beta^2\Var{Z} - 2\beta\Cov{Y,Z} \end{eqnarray}\]

but \(\beta = \Cov{Y,Z}/\Var{Z}\) so

\[\begin{eqnarray} \Var{Y-m(Z)} & = & \Var{Y} + \frac{(\Cov{Y,Z})^2}{\Var{Z}} - 2\frac{(\Cov{Y,Z})^2}{\Var{Z}}\\ & = & \Var{Y} - \frac{(\Cov{Y,Z})^2}{\Var{Z}}\\ & < & \Var{Y} ~ \text{unless}\ \Cov{Y,Z} = 0 \end{eqnarray}\]

\(\Rightarrow\) Optimal linear predictor is (almost) always better than nothing…
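
All three properties can be verified on one simulation (a sketch; the nonlinear relationship is deliberate, to emphasize that none of this requires linearity):

```python
import numpy as np

rng = np.random.default_rng(467)
z = rng.exponential(scale=2.0, size=200_000)
y = np.sqrt(z) + rng.normal(size=z.size)        # deliberately nonlinear in Z

beta = np.cov(y, z)[0, 1] / np.var(z, ddof=1)
m = y.mean() + beta * (z - z.mean())            # optimal linear predictor, plugged in
err = y - m

print(err.mean())                               # property 1: ~ 0
print(np.cov(err, z)[0, 1])                     # property 2: ~ 0
print(err.var(ddof=1),                          # property 3: first number
      y.var(ddof=1))                            #   is smaller than the second
```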

Multivariate case

We try to predict \(Y\) from a whole bunch of variables

Bundle those predictor variables into \(\vec{Z}\)

Solution:

\[ m(\vec{Z}) = \alpha+\vec{\beta}\cdot \vec{Z} = \Expect{Y} + (\Var{\vec{Z}})^{-1} \Cov{\vec{Z},Y} \cdot (\vec{Z} - \Expect{\vec{Z}}) \]

and

\[ \Var{Y-m(\vec{Z})} = \Var{Y} - \Cov{Y,\vec{Z}}^T (\Var{\vec{Z}})^{-1} \Cov{Y,\vec{Z}} \]
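
In code, the multivariate solution is a single linear solve (a sketch; the covariances come from simulated data, and the coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(467)
n, p = 200_000, 3
A = np.array([[1.0, 0.3, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
Z = rng.normal(size=(n, p)) @ A                    # correlated predictors
y = 1.0 + Z @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

full = np.cov(np.column_stack([Z, y]), rowvar=False)
var_Z, cov_Zy = full[:p, :p], full[:p, p]
beta = np.linalg.solve(var_Z, cov_Zy)              # (Var Z)^{-1} Cov(Z, Y)
alpha = y.mean() - beta @ Z.mean(axis=0)

print(beta)                                        # ~ [0.5, -1.0, 2.0]
print(y.var(ddof=1) - cov_Zy @ beta)               # predicted error variance
print(np.mean((y - (alpha + Z @ beta)) ** 2))      # matches, up to sampling noise
```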

What we don’t assume, again

Some possible contexts

Interpolating or extrapolating a single variable

\[\begin{eqnarray} Y & = & X(r_0, t_0)\\ \vec{Z} & = & [X(r_1, t_1), X(r_2, t_2), \ldots X(r_n, t_n)] \end{eqnarray}\]
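
A concrete sketch of this setup for a pure time series (hypothetical AR(1) data, just to have something to predict): \(Y\) is the current value, \(\vec{Z}\) stacks the \(p\) previous values, and the multivariate formula gives the prediction weights.

```python
import numpy as np

rng = np.random.default_rng(467)
T, p = 100_000, 3
x = np.zeros(T)
for t in range(1, T):                    # a toy AR(1) series, just for illustration
    x[t] = 0.7 * x[t - 1] + rng.normal()

# Y = X(t); Z = [X(t-1), ..., X(t-p)], stacked over all usable t
Z = np.column_stack([x[p - k - 1:T - k - 1] for k in range(p)])
y = x[p:]

full = np.cov(np.column_stack([Z, y]), rowvar=False)
beta = np.linalg.solve(full[:p, :p], full[:p, p])
print(beta)                              # ~ [0.7, 0, 0] for this AR(1)
```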

A little worked example (II)

Work out \(\vec{\beta}\) (off-line!) and get \[\begin{eqnarray} m(x_1, x_2) = \mu + \frac{\gamma}{\sigma^2 + \rho}\left( (x_1 - \mu) + (x_2 - \mu)\right) \end{eqnarray}\] vs. with one predictor \[ \mu + \frac{\gamma}{\sigma^2}(x_1 - \mu) \]
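
Filling in the matrix algebra behind this (a sketch; I'm taking the setup to be \(\Var{X_1} = \Var{X_2} = \sigma^2\), \(\Cov{X_1, X_2} = \rho\), and \(\Cov{Y, X_1} = \Cov{Y, X_2} = \gamma\)):

\[\begin{eqnarray} \Var{\vec{Z}} = \left[\begin{array}{cc} \sigma^2 & \rho \\ \rho & \sigma^2 \end{array}\right] & , & \Cov{\vec{Z}, Y} = \left[\begin{array}{c} \gamma \\ \gamma \end{array}\right]\\ \vec{\beta} = (\Var{\vec{Z}})^{-1}\Cov{\vec{Z}, Y} & = & \frac{1}{\sigma^4 - \rho^2}\left[\begin{array}{cc} \sigma^2 & -\rho \\ -\rho & \sigma^2 \end{array}\right]\left[\begin{array}{c} \gamma \\ \gamma \end{array}\right]\\ & = & \frac{\gamma(\sigma^2 - \rho)}{(\sigma^2 - \rho)(\sigma^2 + \rho)}\left[\begin{array}{c} 1 \\ 1 \end{array}\right] = \frac{\gamma}{\sigma^2 + \rho}\left[\begin{array}{c} 1 \\ 1 \end{array}\right] \end{eqnarray}\]

and then \(\alpha = \mu - \vec{\beta} \cdot (\mu, \mu) = \mu\left(1 - \frac{2\gamma}{\sigma^2 + \rho}\right)\), which together give the \(m(x_1, x_2)\) above.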

Predicting one variable from another

\[\begin{eqnarray} Y & = & X(r_0, t_0)\\ \vec{Z} & = & [U(r_1, t_1), U(r_2, t_2), \ldots U(r_n, t_n)]\\ \end{eqnarray}\]

Predicting one variable from 2+ others

\[\begin{eqnarray} Y & = & X(r_0, t_0)\\ \vec{Z} & = & [U(r_1, t_1), V(r_1, t_1), U(r_2, t_2), V(r_2, t_2), \ldots U(r_n, t_n), V(r_n, t_n)] \end{eqnarray}\]

Optimal prediction depends on variances and covariances

so how do we get these?

Summing up

Backup: Gory details for multivariate predictors

\[\begin{eqnarray} m(\vec{Z}) & = & a + \vec{b} \cdot \vec{Z}\\ (\alpha, \vec{\beta}) & = & \argmin_{a \in \mathbb{R}, \vec{b} \in \mathbb{R}^n}{\Expect{(Y-(a + \vec{b} \cdot \vec{Z}))^2}}\\ \Expect{(Y-(a+\vec{b}\cdot \vec{Z}))^2} & = & \Expect{Y^2} + a^2 + \Expect{(\vec{b}\cdot \vec{Z})^2}\\ \nonumber & & - 2\Expect{Y (\vec{b}\cdot \vec{Z})} - 2 \Expect{Y a} + 2 \Expect{a \vec{b} \cdot \vec{Z}}\\ & = & \Expect{Y^2} + a^2 + \vec{b} \cdot \Expect{\vec{Z} \otimes \vec{Z}} \vec{b} \\ \nonumber & & -2a\Expect{Y} - 2 \vec{b} \cdot \Expect{Y\vec{Z}} + 2a\vec{b}\cdot \Expect{\vec{Z}}\\ \end{eqnarray}\]

(\(\vec{u} \otimes \vec{v}\) is the outer product, the square matrix where \((\vec{u} \otimes \vec{v})_{ij} = u_i v_j\))

Backup: Gory details: the intercept

Take derivative w.r.t. \(a\), set to 0 at \(a=\alpha\), \(\vec{b}=\vec{\beta}\):

\[\begin{eqnarray} 0 & = & -2\Expect{Y} + 2\vec{\beta} \cdot \Expect{\vec{Z}} + 2\alpha \\ \alpha & = & \Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}\\ \end{eqnarray}\]

just like when \(Z\) was univariate

Backup: Gory details: the slopes

\[\begin{eqnarray} -2 \Expect{Y\vec{Z}} + 2 \Expect{\vec{Z} \otimes \vec{Z}} \vec{\beta} + 2 \alpha \Expect{\vec{Z}} & = & 0\\ \Expect{Y\vec{Z}} - \alpha\Expect{\vec{Z}} & = & \Expect{\vec{Z} \otimes \vec{Z}} \vec{\beta}\\ \Expect{Y\vec{Z}} - (\Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}) \Expect{\vec{Z}} & = & \Expect{\vec{Z} \otimes \vec{Z}} \vec{\beta}\\ \Expect{Y\vec{Z}} - \Expect{Y}\Expect{\vec{Z}} & = & \left(\Expect{\vec{Z} \otimes \vec{Z}} - \Expect{\vec{Z}} \otimes \Expect{\vec{Z}}\right) \vec{\beta}\\ \Cov{Y,\vec{Z}} & = & \Var{\vec{Z}} \vec{\beta}\\ \vec{\beta} & = & (\Var{\vec{Z}})^{-1} \Cov{Y,\vec{Z}} \end{eqnarray}\]

(using \((\vec{\beta} \cdot \Expect{\vec{Z}})\Expect{\vec{Z}} = (\Expect{\vec{Z}} \otimes \Expect{\vec{Z}})\vec{\beta}\) to move that term to the right-hand side)

Reduces to \(\Cov{Y,Z}/\Var{Z}\) when \(Z\) is univariate

Backup: Gory details: the PCA view

Backup: Estimation I: “plug-in”

so for univariate \(Z\), \[ \hat{m}(z) = \overline{y} + \frac{\widehat{\Cov{Y,Z}}}{\widehat{\Var{Z}}}(z-\overline{z}) \]
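
As a sketch (simulated data): the plug-in estimate agrees with an ordinary-least-squares line fit to the same sample.

```python
import numpy as np

rng = np.random.default_rng(467)
z = rng.normal(size=10_000)
y = 1.0 + 0.8 * z + rng.normal(size=z.size)

beta_hat = np.cov(y, z)[0, 1] / np.var(z, ddof=1)      # sample Cov / sample Var
m_hat = lambda znew: y.mean() + beta_hat * (znew - z.mean())

print(m_hat(0.5))                                       # plug-in prediction at z = 0.5
print(np.polyval(np.polyfit(z, y, 1), 0.5))             # OLS gives the same number
```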

Backup: Estimation II: ordinary least squares

Backup: Estimation: When does OLS/plug-in work?

Backup: Square roots of a matrix