Linear Methods for Noise Removal and Filtering
36-467/36-667
1 October 2020 (Lecture 10)
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\newcommand{\SignalNoise}{N}
\]
In our previous episodes
- General approach to optimal linear prediction
- Predict \(Y\) from \(\vec{Z} = [Z_1, Z_2, \ldots Z_p]\)
- Prediction is \(\alpha + \vec{\beta} \cdot \vec{Z}\)
- Best \(\alpha = \Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}\)
- Best \(\vec{\beta} = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\)
- Previously:
- Interpolating and extrapolating time series
- Interpolating and extrapolating random fields
- Today:
- Extracting a random signal from random noise
- Extracting a deterministic, periodic trend from noise
Optimal linear filtering for time series
- Given: \(X(t_1), X(t_2), \ldots X(t_n)\)
- Assumption: \(X(t) = S(t) + \SignalNoise(t)\), with \(\Expect{\SignalNoise(t)} = 0\)
- Signal \(S(t)\) plus noise \(\SignalNoise(t)\)
- NOTE: We’re assuming signal \(S\) is also a random process!
- Desired: prediction of \(S(t_0)\)
Optimal linear filtering for time series
\[\begin{eqnarray}
\hat{S}(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\
\alpha & = & \Expect{S(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right]\\
\vec{\beta} & = & {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\
\Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\
\vdots & \vdots & \ldots & \vdots\\
\Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{S(t_0), X(t_1)}\\
\Cov{S(t_0), X(t_2)}\\ \vdots \\ \Cov{S(t_0), X(t_n)}\end{array}\right]
\end{eqnarray}\]
This should look familiar
- It’s almost the same as optimal linear prediction for time series
- It’s exactly the same as optimal linear prediction of one time series from another
Simple case, to build intuition
- Assume \(S\) is stationary, with \(\Expect{S(t)} = 0\), autocovariance function \(\gamma(h)\)
- Assume \(\SignalNoise\) is stationary, with variance \(\sigma^2\) and no autocorrelation or correlation with \(S\)
\[\begin{eqnarray}
\Cov{X(t), X(t+h)} & = & \Cov{S(t)+\SignalNoise(t), S(t+h)+\SignalNoise(t+h)}\\
& = & \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\\
& & + \Cov{S(t), \SignalNoise(t+h)} + \Cov{S(t+h), \SignalNoise(t)}\\
& = & \gamma(h) + \sigma^2\mathbf{1}(h=0)\\
\Cov{S(t), X(t+h)} & = & \Cov{S(t), S(t+h) + \SignalNoise(t+h)}\\
& = & \Cov{S(t), S(t+h)} + \Cov{S(t), \SignalNoise(t+h)}\\
& = & \gamma(h)
\end{eqnarray}\]
(Last time I said it’s common to add an extra “nugget” of auto-covariance at lag 0 for the observations; that’s handling measurement noise)
One observation, one estimate
- Given: \(X(t)\)
- Desired: \(S(t)\)
\[\begin{eqnarray}
\hat{S}(t) & = & \beta X(t)\\
\beta & = & \frac{\Cov{S(t), X(t)}}{\Var{X(t)}}\\
&= & \frac{\gamma(0)}{\gamma(0) + \sigma^2} < 1\\
\hat{S}(t) & = & \frac{\gamma(0)}{\gamma(0) + \sigma^2} X(t)
\end{eqnarray}\]
Two observations, one estimate
- Given: \(X(t-1)\), \(X(t)\)
Desired: \(S(t)\)
Estimate is \[\begin{eqnarray}
\hat{S}(t) & = & \beta_0 X(t) + \beta_1 X(t-1)\\
\left[ \begin{array}{cc} \beta_0 \\ \beta_1 \end{array}\right]
& = & \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)} \end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right]
\end{eqnarray}\]
Two observations, one estimate
\[\begin{eqnarray}
\left[ \begin{array}{cc} \beta_0 \\ \beta_1 \end{array}\right]
& = & \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)}\end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right]\\
& = & \left[\begin{array}{cc} \gamma(0)+\sigma^2 & \gamma(1)\\ \gamma(1) & \gamma(0)+\sigma^2\end{array}\right]^{-1} \left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\
&=& \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{cc} \gamma(0)+\sigma^2 & -\gamma(1)\\ -\gamma(1) & \gamma(0)+\sigma^2\end{array}\right]\left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\
& = & \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{c} (\gamma(0)+\sigma^2)\gamma(0) - \gamma^2(1) \\ \gamma(1)\sigma^2\end{array}\right]
\end{eqnarray}\]
What’s going on here?
- Slope on \(X(t) < 1\) \(\Rightarrow\) Estimate of \(S(t)\) should not change 1-for-1 with \(X(t)\)
- Weak noise \(\Rightarrow\) pay more attention to the data
- Strong correlation in the signal \(\Rightarrow\) pay more attention to \(X(t-1)\) when estimating \(S(t)\)
- Reduces to the one observation, one signal case when \(\gamma(1) = 0\)
The general pattern: the Wiener filter
- Assume \(X(t) = S(t) + \SignalNoise(t)\)
- Assume \(\SignalNoise\) uncorrelated with \(S\)
- Then \(\Cov{X(t), X(t+h)} = \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
- and \(\Cov{S(t), X(t+h)} = \Cov{S(t), S(t+h)}\)
- Calculate coefficients in the usual way
- Wiener filter = applying these coefficients to the \(X(t)\) series to get \(\hat{S}(t)\)
- Need to know, or guess, at the correlations of either the signal or the noise to do de-noising
- Do not need any direct observations of the signal
Estimating the periodic component
For simplicity, say we have \(X(0), X(1), \ldots X(n-1)\), \(n=kT\)
\[\begin{eqnarray}
\EstRegFunc(0) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(iT)}\\
\EstRegFunc(1) & = & \frac{1}{k}\sum_{i=0}^{k -1}{X(1+iT)}\\
\EstRegFunc(t) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(t+iT)}\\
& \vdots & \\
\EstRegFunc(T-1) & = & \frac{1}{k}\sum_{i=0}^{k-1}{X(T-1+iT)}
\end{eqnarray}\]
Estimating the periodic component
library(gstat)
data(wind)
dub.jan1 <- wind[wind$month == 1 & wind$day == 1, "DUB"]
mean(dub.jan1)
## [1] 12.67722
colMeans(wind[wind$month == 1 & wind$day == 1, -(1:3)])
## RPT VAL ROS KIL SHA BIR DUB CLA
## 14.610000 11.200556 12.931667 6.845556 10.798333 7.678333 12.677222 8.952778
## MUL CLO BEL MAL
## 10.057222 9.936667 13.987222 18.536667
Now repeat for every day of the calendar
Summary
- Extracting a random signal from noise looks just like any other linear prediction problem
- \(X(t) = S(t) + \SignalNoise(t)\), predict \(S(t)\) from \(X\)
- Need to make assumptions about the noise, \(\Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
- Extracting periodic components by averaging
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\TrueRegFunc\) deterministic
- Periodic/cyclic/seasonal component means \(\TrueRegFunc(t) = \TrueRegFunc(t+T)\)
- Estimate the periodic trend by averaging over periods
- Need to know period \(T\)
Trend + periodicity + fluctuations
\[\begin{eqnarray}
X(t) & = & m(t) + p(t) + \TrueNoise(t)\\
& = & \text{long-run trend} + \text{periodic component} + \text{random fluctuations}
\end{eqnarray}\]
- Smooth and remove the smooth trend
- Often: deliberately under-smooth
- Average over periods and remove the periodic component
- What’s left is \(\approx\) fluctuations
Backup: Trend + Period for CO2 Concentration
- Clearly some sort of repeating deviation from the long-run trend
Backup: Trend + Period for CO2 Concentration
- Extract a linear time trend
Backup: Trend + Period for CO2 Concentration
- Residuals from the linear trend = periodic component + fluctuations
- Notice that each month clearly has its own distribution
- Red line = smoothing spline applied to these data
- Trough in carbon dioxide at end of northern hemisphere summer, and peak at end of northern hemisphere winter
Backup: Trend + Period for CO2 Concentration
- Random fluctuations = residuals after removing the long-run trend and the periodic component
These are very structured-looking because the initial linear trend isn’t that good (there’s acceleration)
Iterative approach: use the initial estimate of the periodic component to get a better idea of the long-run trend
Backup: Trend + Period for CO2 Concentration
Backup: Trend + Period for CO2 Concentration
- We could now go back to re-estimate the annual cycle:
Backup: Trend + Period for CO2 Concentration
- Alternatively: if you know (or are willing to guess) a parametric form for the long-run trend, \(m(t) = f(t;\theta)\), you can introduce extra “dummy” (indicator) variables for each phase in the cycle (here, months), and use least squares to estimate \(\theta\) and the coefficients on those dummies
- This is going to be somewhat more efficient, both statistically and computationally, than the iterative procedure if you have a good idea of the form of the trend
- Here a reasonable guess at the long-run trend would be a quadratic, \(f(t;\theta) = \theta_0 + \theta_1 t + \theta_2 t^2\)
quad.trend.plus.cycle <- lm(co2ppm ~ date + I(as.numeric(date)^2) + factor(month),
data = mauna_loa_monthly)
signif(coefficients(quad.trend.plus.cycle), 2)
## (Intercept) date I(as.numeric(date)^2)
## 3.3e+02 3.0e-03 9.7e-08
## factor(month)2 factor(month)3 factor(month)4
## 6.3e-01 1.4e+00 2.5e+00
## factor(month)5 factor(month)6 factor(month)7
## 3.0e+00 2.2e+00 6.1e-01
## factor(month)8 factor(month)9 factor(month)10
## -1.5e+00 -3.2e+00 -3.3e+00
## factor(month)11 factor(month)12
## -2.1e+00 -9.5e-01
- Note: one fewer monthly dummy coefficient than the length of the cycle
- Always one fewer dummy coefficients than levels of the factor variable (do you remember why from linear regression?)
- Notice that the lowest (most negative) coefficients are for September and October, and the highest (most positive) are for April-May-June
Backup: Trend + Period for CO2 Concentration
- The fitted values will capture the combination of the trend and the periodic cycle:
Backup: Trend + Period for CO2 Concentration
- The residuals will be estimates of the fluctuations, which now are a lot more random looking:
Backup: Trend + Period for CO2 Concentration