\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\det}{det} \newcommand{\TrueNoise}{\epsilon} \newcommand{\EstNoise}{\widehat{\TrueNoise}} \newcommand{\SignalNoise}{N} \]

In our previous episodes

General approach to optimal linear prediction
- Predict \(Y\) from \(\vec{Z} = [Z_1, Z_2, \ldots Z_p]\)
- Prediction is \(\alpha + \vec{\beta} \cdot \vec{Z}\)
- Best \(\alpha = \Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}\)
- Best \(\vec{\beta} = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\)
Previously:
- Interpolating and extrapolating time series
- Interpolating and extrapolating random fields
Today:
- Extracting a random signal from random noise
- Extracting a deterministic, periodic trend from noise

Optimal linear filtering for time series

Given: \(X(t_1), X(t_2), \ldots X(t_n)\)
Assumption: \(X(t) = S(t) + \SignalNoise(t)\), with \(\Expect{\SignalNoise(t)} = 0\)
- Signal \(S(t)\) plus noise \(\SignalNoise(t)\)
- NOTE: We’re assuming signal \(S\) is also a random process!
Desired: prediction of \(S(t_0)\)
- This is filtering

Optimal linear filtering for time series

\[\begin{eqnarray} \hat{S}(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\ \alpha & = & \Expect{S(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right]\\ \vec{\beta} & = & {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\ \Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\ \vdots & \vdots & \ldots & \vdots\\ \Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{S(t_0), X(t_1)}\\ \Cov{S(t_0), X(t_2)}\\ \vdots \\ \Cov{S(t_0), X(t_n)}\end{array}\right] \end{eqnarray}\]

This should look familiar

It’s almost the same as optimal linear prediction for time series
It’s exactly the same as optimal linear prediction of one time series from another

Simple case, to build intuition

Assume \(S\) is stationary, with \(\Expect{S(t)} = 0\), autocovariance function \(\gamma(h)\)
Assume \(\SignalNoise\) is stationary, with variance \(\sigma^2\) and no autocorrelation or correlation with \(S\)

\[\begin{eqnarray} \Cov{X(t), X(t+h)} & = & \Cov{S(t)+\SignalNoise(t), S(t+h)+\SignalNoise(t+h)}\\ & = & \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\\ & & + \Cov{S(t), \SignalNoise(t+h)} + \Cov{S(t+h), \SignalNoise(t)}\\ & = & \gamma(h) + \sigma^2\mathbf{1}(h=0)\\ \Cov{S(t), X(t+h)} & = & \Cov{S(t), S(t+h) + \SignalNoise(t+h)}\\ & = & \Cov{S(t), S(t+h)} + \Cov{S(t), \SignalNoise(t+h)}\\ & = & \gamma(h) \end{eqnarray}\]

(Last time I said it’s common to add an extra “nugget” of auto-covariance at lag 0 for the observations; that’s handling measurement noise)

One observation, one estimate

Given: \(X(t)\)
Desired: \(S(t)\)

\[\begin{eqnarray} \hat{S}(t) & = & \beta X(t)\\ \beta & = & \frac{\Cov{S(t), X(t)}}{\Var{X(t)}}\\ &= & \frac{\gamma(0)}{\gamma(0) + \sigma^2} < 1\\ \hat{S}(t) & = & \frac{\gamma(0)}{\gamma(0) + \sigma^2} X(t) \end{eqnarray}\]

Two observations, one estimate

Given: \(X(t-1)\), \(X(t)\)
Desired: \(S(t)\)
Estimate is \[\begin{eqnarray} \hat{S}(t) & = & \beta_0 X(t) + \beta_1 X(t-1)\\ \left[ \begin{array}{cc} \beta_0 \\ \beta_1 \end{array}\right] & = & \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)} \end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right] \end{eqnarray}\]

Two observations, one estimate

\[\begin{eqnarray} \left[ \begin{array}{cc} \beta_0 \\ \beta_1 \end{array}\right] & = & \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)}\end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right]\\ & = & \left[\begin{array}{cc} \gamma(0)+\sigma^2 & \gamma(1)\\ \gamma(1) & \gamma(0)+\sigma^2\end{array}\right]^{-1} \left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\ &=& \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{cc} \gamma(0)+\sigma^2 & -\gamma(1)\\ -\gamma(1) & \gamma(0)+\sigma^2\end{array}\right]\left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\ & = & \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{c} (\gamma(0)+\sigma^2)\gamma(0) - \gamma^2(1) \\ \gamma(1)\sigma^2\end{array}\right] \end{eqnarray}\]

What’s going on here?

Slope on \(X(t) < 1\) \(\Rightarrow\) Estimate of \(S(t)\) should not change 1-for-1 with \(X(t)\)
Weak noise \(\Rightarrow\) pay more attention to the data
Strong correlation in the signal \(\Rightarrow\) pay more attention to \(X(t-1)\) when estimating \(S(t)\)
- Reduces to the one observation, one signal case when \(\gamma(1) = 0\)

The general pattern: the Wiener filter

Assume \(X(t) = S(t) + \SignalNoise(t)\)
Assume \(\SignalNoise\) uncorrelated with \(S\)
Then \(\Cov{X(t), X(t+h)} = \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
and \(\Cov{S(t), X(t+h)} = \Cov{S(t), S(t+h)}\)
Calculate coefficients in the usual way
Wiener filter = applying these coefficients to the \(X(t)\) series to get \(\hat{S}(t)\)
Need to know, or guess, at the correlations of either the signal or the noise to do de-noising
- Do not need any direct observations of the signal

Extracting periodic or cyclic patterns

Back to the \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\) representation
Trend may be periodic, \(\TrueRegFunc(t) = \TrueRegFunc(t+T)\) for period \(T\)
- Also call this cyclic or seasonal
- Can also have \(\TrueRegFunc(t) = m(t) + p(t)\) with \(p\) periodic and \(m\) not
How can we use this, if we know \(T\)?

Estimating the periodic component

For simplicity, say we have \(X(0), X(1), \ldots X(n-1)\), \(n=kT\)

\[\begin{eqnarray} \EstRegFunc(0) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(iT)}\\ \EstRegFunc(1) & = & \frac{1}{k}\sum_{i=0}^{k -1}{X(1+iT)}\\ \EstRegFunc(t) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(t+iT)}\\ & \vdots & \\ \EstRegFunc(T-1) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(T-1+iT)} \end{eqnarray}\]

Estimating the periodic component

library(gstat)
data(wind)
dub.jan1 <- wind[wind$month == 1 & wind$day == 1, "DUB"]
mean(dub.jan1)

## [1] 12.67722

colMeans(wind[wind$month == 1 & wind$day == 1, -(1:3)])

##       RPT       VAL       ROS       KIL       SHA       BIR       DUB       CLA 
## 14.610000 11.200556 12.931667  6.845556 10.798333  7.678333 12.677222  8.952778 
##       MUL       CLO       BEL       MAL 
## 10.057222  9.936667 13.987222 18.536667

Now repeat for every day of the calendar

After extracting the periodic component

Observation = trend + fluctuation \[\begin{eqnarray} X(t) & = & \TrueRegFunc(t) + \TrueNoise(t)\\ X(t) & = & \EstRegFunc(t) + \EstNoise(t)\\ \EstNoise(t) & = & X(t) - \EstRegFunc(t) \end{eqnarray}\]

If we think \(\TrueRegFunc\) is periodic, we can estimate it by averaging over periods
Then remove the periodic component from the observations
- a.k.a. “Seasonally adjusted” values

Summary

Extracting a random signal from noise looks just like any other linear prediction problem
- \(X(t) = S(t) + \SignalNoise(t)\), predict \(S(t)\) from \(X\)
- Need to make assumptions about the noise, \(\Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
Extracting periodic components by averaging
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\TrueRegFunc\) deterministic
- Periodic/cyclic/seasonal component means \(\TrueRegFunc(t) = \TrueRegFunc(t+T)\)
- Estimate the periodic trend by averaging over periods
- Need to know period \(T\)

Trend + periodicity + fluctuations

\[\begin{eqnarray} X(t) & = & m(t) + p(t) + \TrueNoise(t)\\ & = & \text{long-run trend} + \text{periodic component} + \text{random fluctuations} \end{eqnarray}\]

Smooth and remove the smooth trend
- Often: deliberately under-smooth
Average over periods and remove the periodic component
What’s left is \(\approx\) fluctuations

Backup: Trend + Period for CO2 Concentration

Atmospheric CO2 concentration measured at the Mauna Loa observatory in Hawaii

This data set is famous as the “Keeling curve”, and goes back to the 1950s

Photo: Mauna Loa Observatory by Rick Peterson

Backup: Trend + Period for CO2 Concentration

Clearly some sort of repeating deviation from the long-run trend

Backup: Trend + Period for CO2 Concentration

Extract a linear time trend

Backup: Trend + Period for CO2 Concentration

Residuals from the linear trend = periodic component + fluctuations

Notice that each month clearly has its own distribution
Red line = smoothing spline applied to these data
Trough in carbon dioxide at end of northern hemisphere summer, and peak at end of northern hemisphere winter¹

Backup: Trend + Period for CO2 Concentration

Random fluctuations = residuals after removing the long-run trend and the periodic component

These are very structured-looking because the initial linear trend isn’t that good (there’s acceleration)
Iterative approach: use the initial estimate of the periodic component to get a better idea of the long-run trend

Backup: Trend + Period for CO2 Concentration

We could now go back to re-estimate the annual cycle:

Backup: Trend + Period for CO2 Concentration

Alternatively: if you know (or are willing to guess) a parametric form for the long-run trend, \(m(t) = f(t;\theta)\), you can introduce extra “dummy” (indicator) variables for each phase in the cycle (here, months), and use least squares to estimate \(\theta\) and the coefficients on those dummies
This is going to be somewhat more efficient, both statistically and computationally, than the iterative procedure if you have a good idea of the form of the trend
Here a reasonable guess at the long-run trend would be a quadratic, \(f(t;\theta) = \theta_0 + \theta_1 t + \theta_2 t^2\)

quad.trend.plus.cycle <- lm(co2ppm ~ date + I(as.numeric(date)^2) + factor(month), 
    data = mauna_loa_monthly)
signif(coefficients(quad.trend.plus.cycle), 2)

##           (Intercept)                  date I(as.numeric(date)^2) 
##               3.3e+02               3.0e-03               9.7e-08 
##        factor(month)2        factor(month)3        factor(month)4 
##               6.3e-01               1.4e+00               2.5e+00 
##        factor(month)5        factor(month)6        factor(month)7 
##               3.0e+00               2.2e+00               6.1e-01 
##        factor(month)8        factor(month)9       factor(month)10 
##              -1.5e+00              -3.2e+00              -3.3e+00 
##       factor(month)11       factor(month)12 
##              -2.1e+00              -9.5e-01

Note: one fewer monthly dummy coefficient than the length of the cycle
- Always one fewer dummy coefficients than levels of the factor variable (do you remember why from linear regression?)
- Notice that the lowest (most negative) coefficients are for September and October, and the highest (most positive) are for April-May-June

Backup: Trend + Period for CO2 Concentration

The fitted values will capture the combination of the trend and the periodic cycle:

Backup: Trend + Period for CO2 Concentration

The residuals will be estimates of the fluctuations, which now are a lot more random looking:

Backup: Trend + Period for CO2 Concentration

In photosynthesis, plants use carbon dioxide and water (and sunlight) to make the sugar glucose and oxygen; like us, they later metabolize that sugar for energy, which involves taking in oxygen and releasing carbon dioxide. So plants are going to remove carbon dioxide from the atmosphere, on net, when they photosynthesize more than they burn sugar. In other words, plants will be net carbon dioxide removers when they are green and growing in the spring and summer, and net carbon dioxide releasers when they aren’t, in the fall and winter. Since there are more plants in the northern hemisphere than the southern, because there’s more land mass in the northern hemisphere, the annual carbon dioxide cycle tracks the northern hemisphere seasons rather than the southern.↩

Linear Methods for Noise Removal and Filtering

In our previous episodes

Optimal linear filtering for time series

Optimal linear filtering for time series

This should look familiar

Simple case, to build intuition

One observation, one estimate

Two observations, one estimate

Two observations, one estimate

What’s going on here?

The general pattern: the Wiener filter

Extracting periodic or cyclic patterns

Estimating the periodic component

Estimating the periodic component

After extracting the periodic component

Summary

Trend + periodicity + fluctuations

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration

Backup: Trend + Period for CO2 Concentration