Linear Methods for Noise Removal and Filtering
36-467/36-667
27 September 2018
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\newcommand{\SignalNoise}{N}
\]
In our previous episodes
- General approach to optimal linear prediction
- Predict \(Y\) from \(\vec{Z} = [Z_1, Z_2, \ldots Z_p]\)
- Prediction is \(\alpha + \vec{\beta} \cdot \vec{Z}\)
- Best \(\alpha = \Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}\)
- Best \(\vec{\beta} = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\) (see the code sketch below)
- Previously:
- Interpolating and extrapolating time series
- Interpolating and extrapolating random fields
- Today:
- Extracting a random signal from random noise
- Extracting a deterministic, periodic trend from noise
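As a reminder in code form, a minimal R sketch of the plug-in version of these formulas, with sample moments standing in for the true ones (`linear.predictor` is a hypothetical helper, not from the lecture):

```r
# Plug-in optimal linear predictor: Z is an n x p matrix, y an n-vector
linear.predictor <- function(Z, y) {
  beta <- drop(solve(var(Z), cov(Z, y)))       # Var[Z]^{-1} Cov[Z, Y]
  alpha <- mean(y) - sum(colMeans(Z) * beta)   # E[Y] - beta . E[Z]
  function(z.new) { drop(alpha + z.new %*% beta) }   # alpha + beta . z
}
```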
Optimal linear filtering for time series
- Given: \(X(t_1), X(t_2), \ldots X(t_n)\)
- Assumption: \(X(t) = S(t) + \SignalNoise(t)\), with \(\Expect{\SignalNoise(t)} = 0\)
- Signal \(S(t)\) plus noise \(\SignalNoise(t)\)
- NOTE: Assuming signal \(S\) is also a random process!
- Desired: prediction of \(S(t_0)\)
Optimal linear filtering for time series
\[\begin{eqnarray}
\hat{S}(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\
\alpha & = & \Expect{S(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right] ~ \text{(goes away if everything's centered)}\\
\vec{\beta} & = & {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\
\Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\
\vdots & \vdots & \ddots & \vdots\\
\Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{S(t_0), X(t_1)}\\
\Cov{S(t_0), X(t_2)}\\ \vdots \\ \Cov{S(t_0), X(t_n)}\end{array}\right]
\end{eqnarray}\]
This should look familiar
- It’s almost the same as optimal linear prediction for time series
- It’s exactly the same as optimal linear prediction of one time series from another
Simple case, to build intuition
- Assume \(S\) is stationary, with \(\Expect{S(t)} = 0\), autocovariance function \(\gamma(h)\)
- Assume \(\SignalNoise\) is stationary, with variance \(\sigma^2\) and no autocorrelation or correlation with \(S\)
\[\begin{eqnarray}
\Cov{X(t), X(t+h)} & = & \Cov{S(t)+\SignalNoise(t), S(t+h)+\SignalNoise(t+h)}\\
& = & \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\\
& & + \Cov{S(t), \SignalNoise(t+h)} + \Cov{S(t+h), \SignalNoise(t)}\\
& = & \gamma(h) + \sigma^2\mathbf{1}(h=0)\\
\Cov{S(t), X(t+h)} & = & \Cov{S(t), S(t+h) + \SignalNoise(t+h)}\\
& = & \Cov{S(t), S(t+h)} + \Cov{S(t), \SignalNoise(t+h)}\\
& = & \gamma(h)
\end{eqnarray}\]
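These two identities are easy to check by simulation; a minimal sketch, assuming (purely for illustration) an AR(1) signal with unit-variance innovations, so that \(\gamma(h) = \phi^{|h|}/(1-\phi^2)\):

```r
# Simulate S + N and compare sample autocovariances of X with the theory
phi <- 0.8; sigma2 <- 1; n <- 1e4
S <- arima.sim(model = list(ar = phi), n = n)
X <- S + rnorm(n, sd = sqrt(sigma2))
gamma0 <- 1 / (1 - phi^2)   # Var of a unit-innovation AR(1)
drop(acf(X, lag.max = 1, type = "covariance", plot = FALSE)$acf)
c(gamma0 + sigma2, gamma0 * phi)   # theory: gamma(h) + sigma^2 1(h=0), h = 0, 1
```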
One observation, one estimate
- Given: \(X(t)\)
- Desired: \(S(t)\)
\[\begin{eqnarray}
\hat{S}(t) & = & \beta X(t)\\
\beta & = & \frac{\Cov{S(t), X(t)}}{\Var{X(t)}}\\
&= & \frac{\gamma(0)}{\gamma(0) + \sigma^2} < 1\\
\hat{S}(t) & = & \frac{\gamma(0)}{\gamma(0) + \sigma^2} X(t)
\end{eqnarray}\]
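In code, this filter is a single multiplication; a sketch where \(\gamma(0)\) and \(\sigma^2\) are taken as known rather than estimated (`shrink1` is a hypothetical name):

```r
# One observation, one estimate: shrink X(t) toward the signal's mean of zero
# by the factor gamma(0) / (gamma(0) + sigma^2)
shrink1 <- function(x, gamma0, sigma2) { gamma0 / (gamma0 + sigma2) * x }
shrink1(x = 3, gamma0 = 1, sigma2 = 1)   # returns 1.5: pulled halfway to 0
```

The noisier the observations (larger \(\sigma^2\)), the more the estimate gets pulled toward the mean.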
Two observations, one estimate
- Given: \(X(t-1)\), \(X(t)\)
- Desired: \(S(t)\)
Estimate is \[\begin{eqnarray}
\hat{S}(t) & = & \beta_0 X(t) + \beta_1 X(t-1)\\
\left[ \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right]
& = & \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)} \end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right]
\end{eqnarray}\]
EXERCISE: Find \(\beta_0, \beta_1\) in terms of the function \(\gamma\) and \(\sigma^2\)
Hint 1: \(\left[\begin{array}{cc} a & b \\ c & d\end{array}\right]^{-1} = \frac{1}{ad - bc}\left[\begin{array}{cc} d & -b \\ -c & a\end{array}\right]\)
Hint 2: \(\Cov{X(t), X(t+h)} = \gamma(h) + \sigma^2\mathbf{1}(h=0)\) and \(\Cov{S(t), X(t+h)} = \gamma(h)\)
Two observations, one estimate
\[\begin{eqnarray}
\left[ \begin{array}{c} \beta_0 \\ \beta_1 \end{array}\right]
& = & \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)}\end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right]\\
& = & \left[\begin{array}{cc} \gamma(0)+\sigma^2 & \gamma(1)\\ \gamma(1) & \gamma(0)+\sigma^2\end{array}\right]^{-1} \left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\
&=& \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{cc} \gamma(0)+\sigma^2 & -\gamma(1)\\ -\gamma(1) & \gamma(0)+\sigma^2\end{array}\right]\left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\
& = & \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{c} (\gamma(0)+\sigma^2)\gamma(0) - \gamma^2(1) \\ \gamma(1)\sigma^2\end{array}\right]
\end{eqnarray}\]
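The closed form is easy to check numerically against a direct matrix solve; a quick sketch with assumed values \(\gamma(0) = 2\), \(\gamma(1) = 1\), \(\sigma^2 = 0.5\):

```r
# Compare the closed-form coefficients to a direct solve of the 2x2 system
g0 <- 2; g1 <- 1; s2 <- 0.5
V <- matrix(c(g0 + s2, g1, g1, g0 + s2), nrow = 2)   # Var of (X(t), X(t-1))
solve(V, c(g0, g1))                                  # numerical (beta_0, beta_1)
c((g0 + s2) * g0 - g1^2, g1 * s2) / ((g0 + s2)^2 - g1^2)  # closed form: same
```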
What’s going on here?
- Slope on \(X(t)\) is \(< 1\) \(\Rightarrow\) estimate of \(S(t)\) should not change 1-for-1 with \(X(t)\)
- Weak noise \(\Rightarrow\) pay more attention to the data
- Strong correlation in the signal \(\Rightarrow\) pay more attention to \(X(t-1)\) when estimating \(S(t)\)
- Reduces to the one-observation, one-estimate case when \(\gamma(1) = 0\)
The general pattern: the Wiener filter
- Assume \(X(t) = S(t) + \SignalNoise(t)\)
- Assume \(\SignalNoise\) uncorrelated with \(S\)
- Then \(\Cov{X(t), X(t+h)} = \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
- and \(\Cov{S(t), X(t+h)} = \Cov{S(t), S(t+h)}\)
- Calculate coefficients in the usual way
- Wiener filter = applying these coefficients to the same series to get \(\hat{S}(t)\)
- Need to know, or guess at, the correlations of either the signal or the noise to do de-noising
- Do not need any direct observations of the signal
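Putting the pieces together, here is a minimal sketch of a Wiener filter in R, assuming everything is centered (so \(\alpha = 0\)) and that we can supply, or guess, the signal's autocovariance function \(\gamma\) and the noise variance \(\sigma^2\); the function name and the exponentially-decaying \(\gamma\) in the example call are made up for illustration:

```r
# Wiener filter sketch: build Var[X] from the signal autocovariance plus the
# noise variance, then apply the optimal linear coefficients at every time point
wiener.filter <- function(x, gamma, sigma2) {
  n <- length(x)
  H <- abs(outer(1:n, 1:n, "-"))     # matrix of lags |t_i - t_j|
  V <- gamma(H) + sigma2 * diag(n)   # Cov[X(t_i), X(t_j)]
  G <- gamma(H)                      # Cov[S(t_i), X(t_j)]
  drop(G %*% solve(V, x))            # row i gives the estimate of S(t_i)
}
s.hat <- wiener.filter(x = rnorm(100), gamma = function(h) { 0.9^h }, sigma2 = 1)
```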
Estimating the periodic component
Setting: \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\TrueRegFunc\) deterministic and periodic with known period \(T\), so \(\TrueRegFunc(t) = \TrueRegFunc(t+T)\); estimate \(\TrueRegFunc\) by averaging observations one full period apart
For simplicity, say we have \(X(0), X(1), \ldots X(n)\), with \(n=kT\)
\[\begin{eqnarray}
\EstRegFunc(0) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(iT)}\\
\EstRegFunc(1) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(1+iT)}\\
\EstRegFunc(t) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(t+iT)}\\
& \vdots & \\
\EstRegFunc(T-1) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(T-1+iT)}
\end{eqnarray}\]
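In code this averaging is one line: if `x` holds \(k\) complete periods in time order, reshaping it into a matrix puts all the observations at a common phase into a common row (`periodic.component` is a made-up name):

```r
# Average over periods: column i of matrix(x, nrow = period) is period i,
# so row t collects the k observations at phase t
periodic.component <- function(x, period) {
  stopifnot(length(x) %% period == 0)
  rowMeans(matrix(x, nrow = period))   # entry t estimates mu(t - 1)
}
```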
Estimating the periodic component
```r
library(gstat)
data(wind)   # daily average wind speeds at 12 meteorological stations in Ireland
# Pull out every January 1 on record for Dublin, then average across years
dub.jan1 <- wind[wind$month == 1 & wind$day == 1, "DUB"]
mean(dub.jan1)
## [1] 12.67722
# The same average for all twelve stations at once (columns 1:3 are year/month/day)
colMeans(wind[wind$month == 1 & wind$day == 1, -(1:3)])
##       RPT       VAL       ROS       KIL       SHA       BIR       DUB
## 14.610000 11.200556 12.931667  6.845556 10.798333  7.678333 12.677222
##       CLA       MUL       CLO       BEL       MAL
##  8.952778 10.057222  9.936667 13.987222 18.536667
```
Now repeat for every day of the calendar
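One way to do that in a single call (`day.means` is a made-up name): `aggregate()` averages each station over all the years sharing a given calendar date:

```r
# Mean wind speed for every station, for each (month, day) pair in the record
day.means <- aggregate(wind[, -(1:3)],
                       by = wind[, c("month", "day")], FUN = mean)
head(day.means)   # one row per calendar day, one column per station
```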
Summary
- Extracting a random signal from noise looks just like any other linear prediction problem
- \(X(t) = S(t) + \SignalNoise(t)\), predict \(S(t)\) from \(X\)
- Need to make assumptions about the noise, \(\Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
- Extracting periodic components by averaging
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\TrueRegFunc\) deterministic
- Periodic/cyclic/seasonal component means \(\TrueRegFunc(t) = \TrueRegFunc(t+T)\)
- Estimate the periodic trend by averaging over periods
- Need to know period \(T\)
- Next week: how to analyze any process into a sum of periodic components with multiple periods
- Helps with identifying \(T\)
- Helps with covariance estimation
- Helps with any sort of linear filtering
- Changes how you see the world
Trend + periodicity + fluctuations
\[\begin{eqnarray}
X(t) & = & m(t) + p(t) + \TrueNoise(t)\\
& = & \text{long-run trend} + \text{periodic component} + \text{random fluctuations}
\end{eqnarray}\]
- Estimate the long-run trend by smoothing, and remove it
- Average over periods and remove the periodic component
- What’s left is \(\approx\) fluctuations
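For regularly-spaced seasonal data, R's built-in `decompose()` implements essentially this recipe: a moving-average estimate of the trend, period-averaging for the seasonal component, and the leftovers as the fluctuations. For instance, on the built-in monthly `co2` series (so \(T = 12\)):

```r
# Classical trend + seasonal + random decomposition of the Mauna Loa CO2 series
co2.parts <- decompose(co2)
plot(co2.parts)   # panels: observed, trend, seasonal, random
```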