Trends and Smoothing II
36-467/36-667, Fall 2020
10 September 2020
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\dof}{DoF}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\]
In our last episode…
- Data \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\)
- \(\TrueRegFunc\) deterministic (=trend), \(\TrueNoise\) stochastic and mean-zero (=fluctuations)
- Wanted: estimates of \(\TrueRegFunc\) and/or \(\TrueNoise\) from one data set
- Hope: \(\TrueRegFunc\) is a smooth function \(\Rightarrow\) average nearby \(X\)’s
- Linear smoother: \(\EstRegFunc(t) = \sum_{j=1}^{n}{w(t, t_j) x_j}\)
- Fitted values on the data \(\mathbf{\EstRegFunc} = \mathbf{w}\mathbf{x}\)
- \(\mathbf{w}\) determines the properties of the fitted values
- \(\mathbf{w}\) has \(n\) eigenvalues \(\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n\), and eigenvectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\)
- Components of \(\mathbf{x}\) which look like eigenvectors \(\mathbf{v}\) with big eigenvalues \(\lambda\) will be preserved by the smoothing (\(\lambda = 1\)), or only shrunk a little (\(\lambda\) nearly 1)
- Components of \(\mathbf{x}\) which look like eigenvectors with small eigenvalues will be shrunk a lot by smoothing (\(\lambda\) near zero) or completely eliminated (\(\lambda = 0\))
- The smoother is biased towards finding trends that look like its leading eigenvectors
- degrees of freedom \(=\) covariance between data and fitted values \(=\) sum of the eigenvalues of \(\mathbf{w}\)
An example to make things concrete: GDP Per Capita
- GDP = gross domestic product = annual total money value of all goods and services sold in a country
- GDP per capita = average annual income per person
- US GDP per capita, adjusted for inflation, measured quarterly:
An example to make things concrete: GDPPC growth rate
Trend=mean?
- One possible trend = constant at the global mean
- Interpretation: steady growth plus random fluctuations
- Might make more sense to omit some extreme values when estimating the mean?
The math for taking the trend to be the global mean
- \(\mathbf{w}\) is the \(n\times n\) matrix with \(1/n\) everywhere
- You can check: eigenvalues are 1 and 0 (repeated \(n-1\) times)
- You can check: eigenvector for \(\lambda=1\) is \([1, 1, \ldots, 1]\)
- Degrees of freedom = 1
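A quick numerical check of the three claims above, in R (a small sketch added here, not from the handout):

n <- 5                                # any small n will do
w <- matrix(1/n, nrow = n, ncol = n)  # 1/n everywhere
round(eigen(w)$values, 10)            # 1, then 0 repeated n-1 times
eigen(w)$vectors[, 1]                 # constant vector (up to normalization)
sum(diag(w))                          # trace = sum of eigenvalues = DoF = 1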
Trend=short moving average?
- Moving average going out 0.5 yr (=2 quarters) on either side:
The math for moving averages
- Every entry in \(\mathbf{w}\) is \(1/k\) (here \(1/5\)) or 0
- \(1/k\) along the diagonal (why?)
- Degrees of freedom = \(\tr{\mathbf{w}} = n/k\)
- Leading eigenvalue is always 1
- Generally true when we do (weighted) averages; see handout
- Other eigenvalues get smaller as width of the averaging window grows
- Leading eigenvector is constant (like with the global mean)
- After that we get sine waves (as we saw last time)
- Bigger eigenvalues \(\Leftrightarrow\) longer wavelength sine waves
- “low-pass filter”
- The moving average “likes” (preserves, is biased towards) patterns in the data that look like slowly-changing sine waves
- Or sums of slowly-changing sine waves
- This bias gets stronger as the width of the averaging grows
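To make this concrete, here is a sketch that builds a small moving-average smoother and inspects its eigenvalues and eigenvectors; I use a circular (wrap-around) average so that \(\mathbf{w}\) is symmetric and the eigenvalues come out real, which is a convenience assumption, not how the GDP analysis below treats the boundaries:

n <- 20; k <- 5; h <- (k - 1) %/% 2
w <- matrix(0, n, n)
for (i in 1:n) {
  window <- ((i - h):(i + h) - 1) %% n + 1  # wrap the window around the ends
  w[i, window] <- 1/k
}
sum(diag(w))                        # DoF = tr(w) = n/k = 4
eig <- eigen(w)
round(head(eig$values), 3)          # leading eigenvalue is exactly 1
eig$vectors[, 1]                    # constant, just like the global mean
plot(eig$vectors[, 2], type = "l")  # later eigenvectors: slow sine waves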
Trend=wider moving average?
- Moving average going out 2.5 yr (=10 quarters) on either side
- Correlations are pretty small after \(\approx 10\) quarters
Trend=one-sided moving average?
- Might seem weird to have 2020 affecting trend for 2018…
- Moving average over the previous four years (=16 quarters)
Data = trend + fluctuation
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\)
- \(\Rightarrow\) \(\TrueNoise(t) = X(t) - \TrueRegFunc(t)\)
- \(\Rightarrow\) \(\EstNoise(t) \equiv X(t) - \EstRegFunc(t) =\) residuals
Some residuals
Residuals from using a constant trend:
Some residuals
Residuals from using an MA(5):
Some residuals
Residuals from using an MA(16) on the past:
Some math of residuals
\[\begin{eqnarray}
\mathbf{\EstNoise} & = & \mathbf{x} - \mathbf{\EstRegFunc}\\
& = & \mathbf{x} - \mathbf{w}\mathbf{x}\\
& = & (\mathbf{I} - \mathbf{w})\mathbf{x}
\end{eqnarray}\]
- Convince yourself: \(\mathbf{I}-\mathbf{w}\) has same eigenvectors as \(\mathbf{w}\), but eigenvalues \(1-\lambda\)
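Continuing the sketch above (the circular moving-average \(\mathbf{w}\)), one line confirms the eigenvalue flip:

round(head(eigen(diag(n) - w)$values), 3)  # values are 1 - lambda, re-sorted
# the residual-maker kills the constant component (1 - 1 = 0) and keeps
# most of what the smoother shrank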
Expected residuals
\[\begin{eqnarray}
\Expect{\mathbf{\EstNoise}} & = & \Expect{(\mathbf{I}-\mathbf{w})\mathbf{X}}\\
& = & (\mathbf{I}-\mathbf{w})\mathbf{\TrueRegFunc}
\end{eqnarray}\]
Biased trend estimate \(\Leftrightarrow\) biased fluctuation estimate
Variance and covariance of the residuals
\[
\Var{\mathbf{\EstNoise}} = (\mathbf{I}-\mathbf{w}) \Var{\mathbf{\epsilon}} (\mathbf{I}-\mathbf{w})^T
\]
IF \(\Var{\mathbf{\epsilon}} = \sigma^2 \mathbf{I}\), THEN \(\Var{\mathbf{\EstNoise}}= \sigma^2 (\mathbf{I}-\mathbf{w})(\mathbf{I}-\mathbf{w})^T\)
NB: Correlations from off-diagonal entries in \(\mathbf{w}\), even though there are no correlations for the true fluctuations
- The way that smoothing creates correlations in fitted and detrended values is sometimes called the Yule-Slutsky effect
- See the handout for the origin of the name and more details
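A small simulation (mine, not from the handout) makes the effect visible: feed pure i.i.d. noise through a moving average, and the residuals come out serially correlated anyway:

set.seed(42)                          # arbitrary seed, for reproducibility
x <- rnorm(200)                       # white noise: no trend, no correlation
trend.hat <- stats::filter(x, rep(1/5, 5))  # centered MA(5)
resid.hat <- x - trend.hat
acf(na.omit(resid.hat))               # nonzero autocorrelation at short lags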
Splines
\[
\EstRegFunc = \argmin_{m}{\frac{1}{n}\sum_{i=1}^{n}{(x_i - m(t_i))^2} + \lambda\int{(m^{\prime\prime}(t))^2 dt}}
\]
- This \(\lambda\) is not an eigenvalue (sorry)
- it’s more like the price at which we’ll trade more curvature (\(m^{\prime\prime}\)) for less mean squared error
- Fit to the data points vs. over-all curvature
- Minimization is over all functions
- Solution is always a piecewise cubic polynomial, continuous and with continuous 1st and 2nd derivatives
- \(\lambda \rightarrow 0\) \(\Rightarrow\) curve that exactly interpolates the data points
- \(\lambda \rightarrow \infty\) \(\Rightarrow\) Global linear fit
- \(\downarrow\) degrees of freedom as \(\uparrow \lambda\)
- Easiest R command: smooth.spline() (described in detail in the handout)
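A sketch of the two limits, on made-up data (the sine-plus-noise series is just for illustration):

t <- seq(0, 10, length.out = 50)
x <- sin(t) + rnorm(50, sd = 0.3)              # hypothetical noisy signal
wiggly <- smooth.spline(t, x, lambda = 1e-10)  # nearly interpolates the data
stiff <- smooth.spline(t, x, lambda = 1e6)     # nearly the least-squares line
plot(t, x)
lines(wiggly, col = "blue")
lines(stiff, col = "red")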
How do we pick \(\lambda\)?
- Want trend to predict not-yet-seen stuff (interpolate, extrapolate, filter)
- A good \(\lambda\) predicts new stuff well
- Hold out part of the data and try to predict that from the rest
Leave-one-out cross-validation (LOOCV)
- For each of the \(n\) data points:
- Fit using every data point except \(i\), get \(\EstRegFunc^{(-i)}\);
- Find \(\EstRegFunc^{(-i)}(t_i)\);
- Find \((x_i - \EstRegFunc^{(-i)}(t_i))^2\).
Average over all data points, \(n^{-1}\sum_{i=1}^{n}{(x_i - \EstRegFunc^{(-i)}(t_i))^2}\)
- Low LOOCV \(\Leftrightarrow\) good ability to predict new data
This is what smooth.spline() does automatically
Leave-one-out cross-validation (LOOCV)
Don’t have to re-fit linear smoothers \(n\) times
\[\begin{eqnarray}
\EstRegFunc^{(-i)}(t_i) & = & \frac{(\mathbf{w} \mathbf{x})_i - w_{ii} x_i}{1-w_{ii}}\\
x_i - \EstRegFunc^{(-i)}(t_i) & = & \frac{x_i - \EstRegFunc(t_i)}{1-w_{ii}}\\
\mathrm{LOOCV} & = & \frac{1}{n}\sum_{i=1}^{n}{\left(\frac{x_i-\EstRegFunc(t_i)}{1-w_{ii}}\right)^2}
\end{eqnarray}\]
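A sketch checking the shortcut against the brute-force loop, reusing the circular moving-average \(\mathbf{w}\) from earlier; here "re-fitting without point \(i\)" means zeroing out its weight and renormalizing the rest of the row, which is the convention under which the identity is exact:

loocv.shortcut <- function(x, w) {
  mu.hat <- as.vector(w %*% x)
  mean(((x - mu.hat) / (1 - diag(w)))^2)
}
loocv.brute <- function(x, w) {
  mean(sapply(seq_along(x), function(i) {
    wi <- w[i, ]; wi[i] <- 0; wi <- wi / sum(wi)  # redistribute w_ii
    (x[i] - sum(wi * x))^2
  }))
}
x <- rnorm(n)                                        # any series of length n
all.equal(loocv.shortcut(x, w), loocv.brute(x, w))   # TRUE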
Many variants
- \(h\)-block CV: omit a buffer of radius \(h\) around the hold-out point from the training set
- \(k\)- or \(v\)-fold CV: divide data into \(k\) equal-sized “folds”, try to predict each fold using the rest of the data
- \(hv\)-block CV: \(v\)-fold with a buffer
- etc., etc.
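For concreteness, a sketch of plain \(v\)-fold CV for choosing a spline's degrees of freedom; vfold.cv is a hypothetical helper, and note that randomly scattered folds ignore the time ordering, which is exactly what the blocked variants above are meant to repair:

vfold.cv <- function(t, x, df, v = 5) {
  folds <- sample(rep(1:v, length.out = length(x)))  # random fold labels
  mean(sapply(1:v, function(f) {
    fit <- smooth.spline(t[folds != f], x[folds != f], df = df)
    mean((x[folds == f] - predict(fit, t[folds == f])$y)^2)
  }))
}
t <- seq(0, 10, length.out = 100)
x <- sin(t) + rnorm(100, sd = 0.3)    # made-up series again
sapply(c(2, 5, 10, 20), function(df) vfold.cv(t, x, df))  # pick the minimizer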
The moral on CV
- Never care about how good the in-sample fit is (\(R^2\), \(R^2_{adj}\), etc.)
- Always care about ability to predict new data
Spline smoothing of economic growth
smooth.spline() is pretty robust but chokes on NA values:
growth.ss <- with(na.omit(gdppc), smooth.spline(x = year, y = growth))
growth.ss
## Call:
## smooth.spline(x = year, y = growth)
##
## Smoothing Parameter spar= 0.2029895 lambda= 4.923235e-08 (16 iterations)
## Equivalent Degrees of Freedom (Df): 83.07802
## Penalized Criterion (RSS): 0.1871268
## GCV: 0.001244193
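The fitted object can then be queried at arbitrary time points (the query years below are made up for illustration):

plot(growth ~ year, data = na.omit(gdppc))
lines(growth.ss, col = "blue")
predict(growth.ss, x = c(1980.5, 2010.25))$y  # trend estimates at new times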
Summing up
- If the trend is smooth, we can estimate it by smoothing
- Every smoother is biased towards some patterns and against others
- Properties of the fitted values come from the weights
- Fluctuations are estimated as residuals after removing a trend
- De-trending can create correlations
- We decide how to smooth by cross-validation