Trends and Smoothing II
36-467/36-667, Fall 2020
10 September 2020
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\dof}{DoF}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\]
In our last episode…
- Data \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\)
- \(\TrueRegFunc\) deterministic (=trend), \(\TrueNoise\) stochastic and mean-zero (=fluctuations)
- Wanted: estimates of \(\TrueRegFunc\) and/or \(\TrueNoise\) from one data set
- Hope: \(\TrueRegFunc\) is a smooth function \(\Rightarrow\) average nearby \(X\)’s
- Linear smoother: \(\EstRegFunc(t) = \sum_{j=1}^{n}{w(t, t_j) x_j}\)
- Fitted values on the data \(\mathbf{\EstRegFunc} = \mathbf{w}\mathbf{x}\)
- \(\mathbf{w}\) determines the properties of the fitted values
- \(\mathbf{w}\) has \(n\) eigenvalues \(\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_n\), and eigenvectors \(\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\)
- Components of \(\mathbf{x}\) which look like eigenvectors \(\mathbf{v}\) with big eigenvalues \(\lambda\) will be preserved by the smoothing (\(\lambda = 1\)), or only shrunk a little (\(\lambda\) nearly 1)
- Components of \(\mathbf{x}\) which look like eigenvectors with small eigenvalues will be shrunk a lot by smoothing (\(\lambda\) near zero) or completely eliminated (\(\lambda = 0\))
- The smoother is biased towards finding trends that look like its leading eigenvectors
- degrees of freedom \(=\) covariance between data and fitted values \(=\) sum of the eigenvalues of \(\mathbf{w}\)
An example to make things concrete: GDP Per Capita
- GDP = gross domestic product = annual total money value of all goods and services sold in a country
- GDP per capita = average annual income per person
- US GDP per capita, adjusted for inflation, measured quarterly:
An example to make things concrete: GDPPC growth rate
Trend=mean?
- One possible trend = constant at the global mean
- Interpretation: steady growth plus random fluctuations
- Might make more sense to omit some extreme values when estimating the mean?
The math for taking the trend to be the global mean
- \(\mathbf{w}\) is the \(n\times n\) matrix with \(1/n\) everywhere
- You can check: eigenvalues are 1 and 0 (repeated \(n-1\) times)
- You can check: eigenvector for \(\lambda=1\) is \([1, 1, \ldots, 1]\)
- Degrees of freedom = 1
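A quick numerical check of the three claims above, in R (a small sketch added here, not from the handout):

n <- 5                                # any small n will do
w <- matrix(1/n, nrow = n, ncol = n)  # 1/n everywhere
round(eigen(w)$values, 10)            # 1, then 0 repeated n-1 times
eigen(w)$vectors[, 1]                 # constant vector (up to normalization)
sum(diag(w))                          # trace = sum of eigenvalues = DoF = 1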
Trend=short moving average?
- Moving average going out 0.5 yr (=2 quarters) on either side:
The math for moving averages
- Every entry in \(\mathbf{w}\) is \(1/k\) (here \(1/5\)) or 0
- \(1/k\) along the diagonal (why?)
- Degrees of freedom = \(\tr{\mathbf{w}} = n/k\)
- Leading eigenvalue is always 1
- Generally true when we do (weighted) averages; see handout
- Other eigenvalues get smaller as width of the averaging window grows
- Leading eigenvector is constant (like with the global mean)
- After that we get sine waves (as we saw last time)
- Bigger eigenvalues \(\Leftrightarrow\) longer wavelength sine waves
- “low-pass filter”
- The moving average “likes” (preserves, is biased towards) patterns in the data that look like slowly-changing sine waves
- Or sums of slowly-changing sine waves
- This bias gets stronger as the width of the averaging grows
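To make this concrete, here is a sketch that builds a small moving-average smoother and inspects its eigenvalues and eigenvectors; I use a circular (wrap-around) average so that \(\mathbf{w}\) is symmetric and the eigenvalues come out real, which is a convenience assumption, not how the GDP analysis below treats the boundaries:

n <- 20; k <- 5; h <- (k - 1) %/% 2
w <- matrix(0, n, n)
for (i in 1:n) {
  window <- ((i - h):(i + h) - 1) %% n + 1  # wrap the window around the ends
  w[i, window] <- 1/k
}
sum(diag(w))                        # DoF = tr(w) = n/k = 4
eig <- eigen(w)
round(head(eig$values), 3)          # leading eigenvalue is exactly 1
eig$vectors[, 1]                    # constant, just like the global mean
plot(eig$vectors[, 2], type = "l")  # later eigenvectors: slow sine waves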
Trend=wider moving average?
- Moving average going out 2.5 yr (=10 quarters) on either side
- Correlations are pretty small after \(\approx 10\) quarters
Trend=one-sided moving average?
- Might seem weird to have 2020 affecting trend for 2018…
- Moving average over the previous four years (=16 quarters)
Data = trend + fluctuation
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\)
- \(\Rightarrow\) \(\TrueNoise(t) = X(t) - \TrueRegFunc(t)\)
- \(\Rightarrow\) \(\EstNoise(t) \equiv X(t) - \EstRegFunc(t) =\) residuals
Some residuals
Residuals from using a constant trend:
Some residuals
Residuals from using an MA(5):
Some residuals
Residuals from using an MA(16) on the past:
Some math of residuals
\[\begin{eqnarray}
\mathbf{\EstNoise} & = & \mathbf{x} - \mathbf{\EstRegFunc}\\
& = & \mathbf{x} - \mathbf{w}\mathbf{x}\\
& = & (\mathbf{I} - \mathbf{w})\mathbf{x}
\end{eqnarray}\]
- Convince yourself: \(\mathbf{I}-\mathbf{w}\) has same eigenvectors as \(\mathbf{w}\), but eigenvalues \(1-\lambda\)
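Continuing the sketch above (the circular moving-average \(\mathbf{w}\)), one line confirms the eigenvalue flip:

round(head(eigen(diag(n) - w)$values), 3)  # values are 1 - lambda, re-sorted
# the residual-maker kills the constant component (1 - 1 = 0) and keeps
# most of what the smoother shrank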
Expected residuals
\[\begin{eqnarray}
\Expect{\mathbf{\EstNoise}} & = & \Expect{(\mathbf{I}-\mathbf{w})\mathbf{X}}\\
& = & (\mathbf{I}-\mathbf{w})\mathbf{\TrueRegFunc}
\end{eqnarray}\]
Biased trend estimate \(\Leftrightarrow\) biased fluctuation estimate
Variance and covariance of the residuals
\[
\Var{\mathbf{\EstNoise}} = (\mathbf{I}-\mathbf{w}) \Var{\mathbf{\epsilon}} (\mathbf{I}-\mathbf{w})^T
\]
IF \(\Var{\mathbf{\epsilon}} = \sigma^2 \mathbf{I}\), THEN \(\Var{\mathbf{\EstNoise}}= \sigma^2 (\mathbf{I}-\mathbf{w})(\mathbf{I}-\mathbf{w})^T\)
NB: Correlations from off-diagonal entries in \(\mathbf{w}\), even though there are no correlations for the true fluctuations
- The way that smoothing creates correlations in fitted and detrended values is sometimes called the Yule-Slutsky effect
- See the handout for the origin of the name and more details
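A small simulation (mine, not from the handout) makes the effect visible: feed pure i.i.d. noise through a moving average, and the residuals come out serially correlated anyway:

set.seed(42)                          # arbitrary seed, for reproducibility
x <- rnorm(200)                       # white noise: no trend, no correlation
trend.hat <- stats::filter(x, rep(1/5, 5))  # centered MA(5)
resid.hat <- x - trend.hat
acf(na.omit(resid.hat))               # nonzero autocorrelation at short lags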
Splines
\[
\EstRegFunc = \argmin_{m}{\frac{1}{n}\sum_{i=1}^{n}{(x_i - m(t_i))^2} + \lambda\int{(m^{\prime\prime}(t))^2 dt}}
\]
- This \(\lambda\) is not an eigenvalue (sorry)
- it’s more like the price at which we’ll trade more curvature (\(m^{\prime\prime}\)) for less mean squared error
- Fit to the data points vs. over-all curvature
- Minimization is over all functions
- Solution is always a piecewise cubic polynomial, continuous and with continuous 1st and 2nd derivatives
- \(\lambda \rightarrow 0\) \(\Rightarrow\) curve that exactly interpolates the data points
- \(\lambda \rightarrow \infty\) \(\Rightarrow\) Global linear fit
- \(\downarrow\) degrees of freedom as \(\uparrow \lambda\)
- Easiest R command: smooth.spline() (described in detail in the handout)
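A sketch of the two limits, on made-up data (the sine-plus-noise series is just for illustration):

t <- seq(0, 10, length.out = 50)
x <- sin(t) + rnorm(50, sd = 0.3)              # hypothetical noisy signal
wiggly <- smooth.spline(t, x, lambda = 1e-10)  # nearly interpolates the data
stiff <- smooth.spline(t, x, lambda = 1e6)     # nearly the least-squares line
plot(t, x)
lines(wiggly, col = "blue")
lines(stiff, col = "red")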
How do we pick \(\lambda\)?
- Want trend to predict not-yet-seen stuff (interpolate, extrapolate, filter)
- A good \(\lambda\) predicts new stuff well
- Hold out part of the data and try to predict that from the rest
Leave-one-out cross-validation (LOOCV)
- For each of the \(n\) data points:
- Fit using every data point except \(i\), get \(\EstRegFunc^{(-i)}\);
- Find \(\EstRegFunc^{(-i)}(t_i)\);
- Find \((x_i - \EstRegFunc^{(-i)}(t_i))^2\).
Average over all data points, \(n^{-1}\sum_{i=1}^{n}{(x_i - \EstRegFunc^{(-i)}(t_i))^2}\)
- Low LOOCV \(\Leftrightarrow\) good ability to predict new data
This is what smooth.spline() does automatically
Leave-one-out cross-validation (LOOCV)
Don’t have to re-fit linear smoothers \(n\) times
\[\begin{eqnarray}
\EstRegFunc^{(-i)}(t_i) & = & \frac{(\mathbf{w} \mathbf{x})_i - w_{ii} x_i}{1-w_{ii}}\\
x_i - \EstRegFunc^{(-i)}(t_i) & = & \frac{x_i - \EstRegFunc(t_i)}{1-w_{ii}}\\
\mathrm{LOOCV} & = & \frac{1}{n}\sum_{i=1}^{n}{\left(\frac{x_i-\EstRegFunc(t_i)}{1-w_{ii}}\right)^2}
\end{eqnarray}\]
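A sketch checking the shortcut against the brute-force loop, reusing the circular moving-average \(\mathbf{w}\) from earlier; here "re-fitting without point \(i\)" means zeroing out its weight and renormalizing the rest of the row, which is the convention under which the identity is exact:

loocv.shortcut <- function(x, w) {
  mu.hat <- as.vector(w %*% x)
  mean(((x - mu.hat) / (1 - diag(w)))^2)
}
loocv.brute <- function(x, w) {
  mean(sapply(seq_along(x), function(i) {
    wi <- w[i, ]; wi[i] <- 0; wi <- wi / sum(wi)  # redistribute w_ii
    (x[i] - sum(wi * x))^2
  }))
}
x <- rnorm(n)                                        # any series of length n
all.equal(loocv.shortcut(x, w), loocv.brute(x, w))   # TRUE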
Many variants
- \(h\)-block CV: omit a buffer of radius \(h\) around the hold-out point from the training set
- \(k\)- or \(v\)-fold CV: divide data into \(k\) equal-sized “folds”, try to predict each fold using the rest of the data
- \(hv\)-block CV: \(v\)-fold with a buffer
- etc., etc.
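For concreteness, a sketch of plain \(v\)-fold CV for choosing a spline's degrees of freedom; vfold.cv is a hypothetical helper, and note that randomly scattered folds ignore the time ordering, which is exactly what the blocked variants above are meant to repair:

vfold.cv <- function(t, x, df, v = 5) {
  folds <- sample(rep(1:v, length.out = length(x)))  # random fold labels
  mean(sapply(1:v, function(f) {
    fit <- smooth.spline(t[folds != f], x[folds != f], df = df)
    mean((x[folds == f] - predict(fit, t[folds == f])$y)^2)
  }))
}
t <- seq(0, 10, length.out = 100)
x <- sin(t) + rnorm(100, sd = 0.3)    # made-up series again
sapply(c(2, 5, 10, 20), function(df) vfold.cv(t, x, df))  # pick the minimizer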
The moral on CV
- Never care about how good the in-sample fit is (\(R^2\), \(R^2_{adj}\), etc.)
- Always care about ability to predict new data
Spline smoothing of economic growth
smooth.spline() is pretty robust but chokes on NA values:
growth.ss <- with(na.omit(gdppc), smooth.spline(x = year, y = growth))
growth.ss
## Call:
## smooth.spline(x = year, y = growth)
##
## Smoothing Parameter spar= 0.2029895 lambda= 4.923235e-08 (16 iterations)
## Equivalent Degrees of Freedom (Df): 83.07802
## Penalized Criterion (RSS): 0.1871268
## GCV: 0.001244193
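The fitted object can then be queried at arbitrary time points (the query years below are made up for illustration):

plot(growth ~ year, data = na.omit(gdppc))
lines(growth.ss, col = "blue")
predict(growth.ss, x = c(1980.5, 2010.25))$y  # trend estimates at new times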
Summing up
- If the trend is smooth, we can estimate it by smoothing
- Every smoother is biased towards some patterns and against others
- Properties of the fitted values come from the weights
- Fluctuations are estimated as residuals after removing a trend
- De-trending can create correlations
- We decide how to smooth by cross-validation