Trends and Smoothing II
36-467/36-667
4 September 2018
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\dof}{DoF}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\]
In our last episode…
- Data \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\)
- \(\TrueRegFunc\) deterministic (=trend), \(\TrueNoise\) stochastic and mean-zero (=fluctuations)
- Wanted: estimates of \(\TrueRegFunc\) and/or \(\TrueNoise\) from one data set
- Hope: \(\TrueRegFunc\) is a smooth function \(\Rightarrow\) average nearby \(X\)’s
- Linear smoother: \(\EstRegFunc(t) = \sum_{j=1}^{n}{w(t, t_j) x_j}\)
- Fitted values on the data \(\mathbf{\EstRegFunc} = \mathbf{w}\mathbf{x}\)
- \(\mathbf{w}\) is the source of all knowledge
Expectation of the fitted values
\[\begin{eqnarray}
\Expect{\mathbf{\EstRegFunc}} & = & \Expect{\mathbf{w}\mathbf{X}}\\
& = & \mathbf{w}\Expect{\mathbf{X}}\\
& = & \mathbf{w} \mathbf{\mu}
\end{eqnarray}\]
Unbiased estimate \(\Leftrightarrow \mathbf{w} \mathbf{\mu} = \mathbf{\mu}\)
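A worked special case: if every row of \(\mathbf{w}\) sums to 1 (as in the moving-average example below), then a constant trend passes through unchanged, so the estimate is unbiased whenever \(\TrueRegFunc\) is constant:
\[
\TrueRegFunc(t) = c \Rightarrow {(\mathbf{w} \mathbf{\mu})}_i = c \sum_{j=1}^{n}{w_{ij}} = c
\]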
Expanding in eigenvectors
- Generally, \(\mathbf{w}\) has \(n\) linearly-independent eigenvectors \(\mathbf{e}_1, \ldots \mathbf{e}_n\), with eigenvalues \(\lambda_1, \ldots \lambda_n\)
- So \(\mathbf{x} = \sum_{j=1}^{n}{c_j \mathbf{e}_j}\)
- So \(\mathbf{w}\mathbf{x} = \mathbf{w}\sum_{j=1}^{n}{c_j \mathbf{e}_j} = \sum_{j=1}^{n}{c_j \lambda_j \mathbf{e}_j}\)
- Components of the data which match large-\(\lambda\) eigenvectors are enhanced
- Components of the data which match small-\(\lambda\) eigenvectors are shrunk
A little example
The weight matrix \(\mathbf{w}\) for a 3-point moving average on 10 time points, with the weights renormalized at the ends:
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0.5000000 0.5000000 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000 0.0000000
## [3,] 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000 0.0000000
## [4,] 0.0000000 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000
## [5,] 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333 0.3333333
## [6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333 0.3333333
## [7,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3333333
## [8,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [9,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [10,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## [,7] [,8] [,9] [,10]
## [1,] 0.0000000 0.0000000 0.0000000 0.0000000
## [2,] 0.0000000 0.0000000 0.0000000 0.0000000
## [3,] 0.0000000 0.0000000 0.0000000 0.0000000
## [4,] 0.0000000 0.0000000 0.0000000 0.0000000
## [5,] 0.0000000 0.0000000 0.0000000 0.0000000
## [6,] 0.3333333 0.0000000 0.0000000 0.0000000
## [7,] 0.3333333 0.3333333 0.0000000 0.0000000
## [8,] 0.3333333 0.3333333 0.3333333 0.0000000
## [9,] 0.0000000 0.3333333 0.3333333 0.3333333
## [10,] 0.0000000 0.0000000 0.5000000 0.5000000
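The code behind this output isn't shown above; here is a sketch in R that builds a weight matrix like this one (a 3-point moving average, with the two endpoints averaging their two in-range points) and takes its eigendecomposition:

```r
n <- 10
w <- matrix(0, n, n)
for (i in 1:n) {
    neighbors <- max(1, i - 1):min(n, i + 1)  # point i plus its in-range neighbors
    w[i, neighbors] <- 1 / length(neighbors)  # equal weights, so each row sums to 1
}
eigen.w <- eigen(w)
eigen.w$values        # eigenvalues, sorted by decreasing magnitude
eigen.w$vectors[, 1]  # leading eigenvector (constant, up to sign)
```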
A little example
The eigenvalues of \(\mathbf{w}\), in decreasing order of magnitude:
## [1] 1.00000000 0.96261129 0.85490143 0.68968376 0.48651845
## [6] -0.31012390 0.26920019 -0.23729622 -0.11137134 0.06254301
The leading eigenvector, with eigenvalue 1, is constant (\(1/\sqrt{10} \approx 0.316\) in every coordinate):
## [1] 0.3162278 0.3162278 0.3162278 0.3162278 0.3162278 0.3162278 0.3162278
## [8] 0.3162278 0.3162278 0.3162278
Variance of the fitted values
\[\begin{eqnarray}
\Var{\mathbf{\EstRegFunc}} & = & \Var{\mathbf{w}\mathbf{X}}\\
& = & \mathbf{w}\Var{\mathbf{X}}\mathbf{w}^T\\
& = & \mathbf{w}\Var{\mathbf{\TrueRegFunc} + \mathbf{\TrueNoise}}\mathbf{w}^T\\
& = & \mathbf{w}\Var{\mathbf{\TrueNoise}}\mathbf{w}^T
\end{eqnarray}\]
IF \(\Var{\mathbf{\TrueNoise}} = \sigma^2 \mathbf{I}\), THEN \(\Var{\mathbf{\EstRegFunc}} = \sigma^2\mathbf{w}\mathbf{w}^T\)
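With the moving-average \(\mathbf{w}\) from the little example, and taking \(\sigma^2 = 1\), this is one line to check (a sketch, reusing w from above):

```r
var.fitted <- w %*% t(w)  # Var of fitted values when sigma^2 = 1
diag(var.fitted)          # 1/2 at the ends, 1/3 inside: smoothing shrinks variance
```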
How much do the fitted values respond to the data?
\[\begin{eqnarray}
\sum_{i=1}^{n}{\Cov{\EstRegFunc_i, X_i}} & = & \sum_{i=1}^{n}{\Cov{\sum_{j=1}^{n}{w_{ij} X_j}, X_i}}\\
& = & \sum_{i=1}^{n}{\sum_{j=1}^{n}{w_{ij} \Cov{X_i, X_j}}}\\
& = & \sum_{i=1}^{n}{\sum_{j=1}^{n}{w_{ij} \Cov{\TrueNoise_i, \TrueNoise_j}}}
\end{eqnarray}\]
IF \(\Var{\mathbf{\TrueNoise}} = \sigma^2 \mathbf{I}\), THEN this \(= \sigma^2\tr{\mathbf{w}} = \sigma^2 \text{(sum of eigenvalues)}\)
\(\tr{\mathbf{w}} =\) (effective) degrees of freedom
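For the little example, the effective degrees of freedom comes out the same whether we take the trace or sum the eigenvalues (a sketch, reusing w from above):

```r
sum(diag(w))          # trace: 2*(1/2) + 8*(1/3) = 11/3, about 3.67
sum(eigen(w)$values)  # sum of the eigenvalues: the same number
```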
Data = trend + fluctuation
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\)
- \(\Rightarrow\) \(\TrueNoise(t) = X(t) - \TrueRegFunc(t)\)
- \(\Rightarrow\) \(\EstNoise(t) \equiv X(t) - \EstRegFunc(t) =\) residuals
\[\begin{eqnarray}
\mathbf{\EstNoise} & = & \mathbf{x} - \mathbf{\EstRegFunc}\\
& = & \mathbf{x} - \mathbf{w}\mathbf{x}\\
& = & (\mathbf{I} - \mathbf{w})\mathbf{x}
\end{eqnarray}\]
Convince yourself: \(\mathbf{I}-\mathbf{w}\) has same eigenvectors as \(\mathbf{w}\), but eigenvalues \(1-\lambda\)
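One line of algebra confirms it: if \(\mathbf{w}\mathbf{e}_j = \lambda_j \mathbf{e}_j\), then
\[
(\mathbf{I}-\mathbf{w})\mathbf{e}_j = \mathbf{e}_j - \mathbf{w}\mathbf{e}_j = (1-\lambda_j)\mathbf{e}_j
\]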
Expected residuals
\[\begin{eqnarray}
\Expect{\mathbf{\EstNoise}} & = & \Expect{(\mathbf{I}-\mathbf{w})\mathbf{X}}\\
& = & (\mathbf{I}-\mathbf{w})\mathbf{\TrueRegFunc}
\end{eqnarray}\]
Biased trend estimate \(\Leftrightarrow\) biased fluctuation estimate
Break for the in-class exercise
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\Var{\TrueNoise(t)} = \sigma^2\) and \(\Cov{\TrueNoise(t_1), \TrueNoise(t_2)} = 0\) for \(t_1 \neq t_2\)
- Set \(\EstRegFunc(t) = \frac{1}{3}\sum_{s=t-1}^{t+1}{X(s)}\)
- Ignore the ends of the data where we don’t have neighbors on both sides
- What is \(\Cov{\EstRegFunc(t), \EstRegFunc(t+1)}\)?
- What is \(\Cov{\EstRegFunc(t), \EstRegFunc(t+2)}\)?
- What is \(\Cov{\EstRegFunc(t), \EstRegFunc(t+3)}\)?
- Why aren’t all of these 0?
Variance and covariance of the residuals
\[
\Var{\mathbf{\EstNoise}} = (\mathbf{I}-\mathbf{w}) \Var{\mathbf{\epsilon}} (\mathbf{I}-\mathbf{w})^T
\]
IF \(\Var{\mathbf{\epsilon}} = \sigma^2 \mathbf{I}\), THEN this \(= \sigma^2 (\mathbf{I}-\mathbf{w})(\mathbf{I}-\mathbf{w})^T\)
NB: Even when the noise is uncorrelated, the residuals are correlated, because of the off-diagonal entries in \(\mathbf{w}\)
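A sketch of the point, reusing the moving-average w from the little example: even with uncorrelated noise (\(\sigma^2 = 1\)), the residual covariance matrix has non-zero off-diagonal entries:

```r
resid.cov <- (diag(10) - w) %*% t(diag(10) - w)  # Var of residuals when sigma^2 = 1
round(resid.cov[1:4, 1:4], 3)                    # off-diagonal entries are not 0
```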
Splines
\[
\EstRegFunc = \argmin_{m}{\frac{1}{n}\sum_{i=1}^{n}{(x_i - m(t_i))^2} + \lambda\int{(m^{\prime\prime}(t))^2 dt}}
\]
- This \(\lambda\) is not an eigenvalue (sorry)
- Trades off fitting the data points against over-all curvature
- Minimization is over all functions
- Solution is always a piecewise cubic polynomial, continuous and with continuous 1st and 2nd derivatives
- \(\lambda \rightarrow 0\) \(\Rightarrow\) Curve interpolates the data points exactly
- \(\lambda \rightarrow \infty\) \(\Rightarrow\) Global linear fit
- \(\downarrow\) degrees of freedom as \(\uparrow \lambda\)
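A sketch of the two limits in R, on simulated data (the sine trend and all names here are illustrative, not from the lecture):

```r
time <- 1:100
x.sim <- sin(2 * pi * time / 100) + rnorm(100, sd = 0.5)  # smooth trend + noise
fit.wiggly <- smooth.spline(time, x.sim, df = 50)  # small penalty: nearly interpolates
fit.stiff <- smooth.spline(time, x.sim, df = 2)    # large penalty: nearly a straight line
plot(time, x.sim)
lines(fit.wiggly, col = "grey")
lines(fit.stiff, col = "blue")
```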
How do we pick \(\lambda\)?
- Want trend to predict not-yet-seen stuff (interpolate, extrapolate, filter)
- A good \(\lambda\) predicts new stuff well
- Hold out part of the data and try to predict that from the rest
Leave-one-out cross-validation (LOOCV)
- For each of the \(n\) data points \(i\):
    - Fit using every data point except \(i\), getting \(\EstRegFunc^{(-i)}\);
    - Find the prediction \(\EstRegFunc^{(-i)}(t_i)\);
    - Find the squared error \((x_i - \EstRegFunc^{(-i)}(t_i))^2\).
- Average over all data points: \(n^{-1}\sum_{i=1}^{n}{(x_i - \EstRegFunc^{(-i)}(t_i))^2}\)
- Low LOOCV \(\Leftrightarrow\) good ability to predict new data
This is what smooth.spline does automatically
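By default smooth.spline picks \(\lambda\) by generalized cross-validation; setting cv = TRUE asks for ordinary leave-one-out CV instead (reusing the simulated data from above):

```r
fit <- smooth.spline(time, x.sim, cv = TRUE)  # leave-one-out CV over lambda
fit$lambda   # the selected penalty
fit$cv.crit  # the cross-validation score at that penalty
```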
Leave-one-out cross-validation (LOOCV)
Don’t have to re-fit linear smoothers \(n\) times
\[\begin{eqnarray}
\EstRegFunc^{(-i)}(t_i) & = & \frac{{(\mathbf{w}\mathbf{x})}_i - w_{ii} x_i}{1-w_{ii}}\\
x_i - \EstRegFunc^{(-i)}(t_i) & = & \frac{x_i - \EstRegFunc(t_i)}{1-w_{ii}}\\
\mathrm{LOOCV} & = & \frac{1}{n}\sum_{i=1}^{n}{\left(\frac{x_i-\EstRegFunc(t_i)}{1-w_{ii}}\right)^2}
\end{eqnarray}\]
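For any linear smoother with a known weight matrix, the shortcut is a one-liner; a sketch, reusing the 10×10 w from the little example with a placeholder data vector:

```r
x.small <- rnorm(10)     # placeholder data
mu.hat <- w %*% x.small  # fitted values from the smoother
mean(((x.small - mu.hat) / (1 - diag(w)))^2)  # LOOCV by the shortcut formula
```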
Many variants
- \(h\)-block CV: omit a buffer of radius \(h\) around the hold-out point from the training set
- \(k\)- or \(v\)-fold CV: divide data into \(k\) equal-sized “folds”, try to predict each fold using the rest of the data
- \(hv\)-block CV: \(v\)-fold with a buffer
- etc., etc.
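A sketch of \(v\)-fold CV for choosing the spline penalty (5 folds; the grid of penalties and all names are illustrative, reusing the simulated data from above):

```r
v <- 5
folds <- sample(rep(1:v, length.out = length(x.sim)))  # random fold labels
lambdas <- 10^seq(-8, 0, length.out = 9)               # candidate penalties
cv.score <- sapply(lambdas, function(lam) {
    mean(sapply(1:v, function(f) {
        fit <- smooth.spline(time[folds != f], x.sim[folds != f], lambda = lam)
        pred <- predict(fit, time[folds == f])$y
        mean((x.sim[folds == f] - pred)^2)  # held-out MSE for this fold
    }))
})
lambdas[which.min(cv.score)]  # penalty with the best held-out error
```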
The moral
- Never care about how good the in-sample fit is (\(R^2\), \(R^2_{adj}\), etc.)
- Always care about ability to predict new data
Summing up
- If the trend is smooth, we can estimate it by smoothing
- Every smoother is biased towards some patterns and against others
- Properties of the fitted values come from the weights
- Fluctuations are residuals after removing a trend
- De-trending can create correlations
- We decide how to smooth by cross-validation