Linear Methods for Noise Removal and Filtering
  
36-467/36-667
  
  27 September 2018
 
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\newcommand{\SignalNoise}{N}
\]
 
In our previous episodes
- General approach to optimal linear prediction
- Predict \(Y\) from \(\vec{Z} = [Z_1, Z_2, \ldots Z_p]\)
- Prediction is \(\alpha + \vec{\beta} \cdot \vec{Z}\)
- Best \(\alpha = \Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}\)
- Best \(\beta = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\)
 
- Previously:
- Interpolating and extrapolating time series
- Interpolating and extrapolating random fields
 
- Today:
- Extracting a random signal from random noise
- Extracting a deterministic, periodic trend from noise
 
 
Optimal linear filtering for time series
- Given: \(X(t_1), X(t_2), \ldots X(t_n)\)
- Assumption: \(X(t) = S(t) + \SignalNoise(t)\), with \(\Expect{\SignalNoise(t)} = 0\)
- Signal \(S(t)\) plus noise \(\SignalNoise(t)\)
- NOTE: Assuming signal \(S\) is also a random process!
 
- Desired: prediction of \(S(t_0)\)
 
Optimal linear filtering for time series
\[\begin{eqnarray}
\hat{S}(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\
\alpha & = & \Expect{S(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right] ~ \text{(goes away if everything's centered)}\\
\vec{\beta} & = &  {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\
\Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\
\vdots & \vdots & \ldots & \vdots\\
\Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{S(t_0), X(t_1)}\\
\Cov{S(t_0), X(t_2)}\\ \vdots \\ \Cov{S(t_0), X(t_n)}\end{array}\right]
\end{eqnarray}\]
This should look familiar
- It’s almost the same as optimal linear prediction for time series
- It’s exactly the same as optimal linear prediction of one time series from another
 
Simple case, to build intuition
- Assume \(S\) is stationary, with \(\Expect{S(t)} = 0\), autocovariance function \(\gamma(h)\)
- Assume \(\SignalNoise\) is stationary, with variance \(\sigma^2\) and no autocorrelation or correlation with \(S\)
\[\begin{eqnarray}
\Cov{X(t), X(t+h)} & = & \Cov{S(t)+\SignalNoise(t), S(t+h)+\SignalNoise(t+h)}\\
& = & \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\\
& & + \Cov{S(t), \SignalNoise(t+h)} + \Cov{S(t+h), \SignalNoise(t)}\\
& = & \gamma(h) + \sigma^2\mathbf{1}(h=0)\\
\Cov{S(t), X(t+h)} & = & \Cov{S(t), S(t+h) + \SignalNoise(t+h)}\\
& = & \Cov{S(t), S(t+h)} + \Cov{S(t), \SignalNoise(t+h)}\\
& = & \gamma(h)
\end{eqnarray}\] 
One observation, one estimate
- Given: \(X(t)\)
- Desired: \(S(t)\)
\[\begin{eqnarray}
\hat{S}(t) & = & \beta X(t)\\
\beta & = & \frac{\Cov{S(t), X(t)}}{\Var{X(t)}}\\
&= & \frac{\gamma(0)}{\gamma(0) + \sigma^2} < 1\\
\hat{S}(t) & = & \frac{\gamma(0)}{\gamma(0) + \sigma^2} X(t)
\end{eqnarray}\] 
Two observations, one estimate
- Given: \(X(t-1)\), \(X(t)\)
- Desired: \(S(t)\) 
- Estimate is \[\begin{eqnarray}
\hat{S}(t) & = & \beta_0 X(t) + \beta_1 X(t-1)\\
\left[ \begin{array}{cc} \beta_0 \\ \beta_1 \end{array}\right]
& = &  \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)} \end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right]
\end{eqnarray}\] 
EXERCISE: Find \(\beta_0, \beta_1\) in terms of the function \(\gamma\) and \(\sigma^2\)
Hint 1: \(\left[\begin{array}{cc} a & b \\ c & d\end{array}\right]^{-1} = \frac{1}{ad - bc}\left[\begin{array}{cc} d & -b \\ -c & a\end{array}\right]\)
Hint 2: \(\Cov{X(t), X(t+h)} = \gamma(h) + \sigma^2\mathbf{1}(h=0)\) and \(\Cov{S(t), X(t+h)} = \gamma(h)\)
 
Two observations, one estimate
\[\begin{eqnarray}
\left[ \begin{array}{cc} \beta_0 \\ \beta_1 \end{array}\right]
& = & \left[\begin{array}{cc} \Var{X(t)} & \Cov{X(t), X(t-1)}\\ \Cov{X(t-1), X(t)} & \Var{X(t-1)}\end{array}\right]^{-1} \left[\begin{array}{c} \Cov{X(t), S(t)} \\ \Cov{X(t-1), S(t)}\end{array} \right]\\
& = & \left[\begin{array}{cc} \gamma(0)+\sigma^2 & \gamma(1)\\ \gamma(1) & \gamma(0)+\sigma^2\end{array}\right]^{-1} \left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\
&=& \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{cc} \gamma(0)+\sigma^2 & -\gamma(1)\\ -\gamma(1) & \gamma(0)+\sigma^2\end{array}\right]\left[\begin{array}{c} \gamma(0) \\ \gamma(1) \end{array} \right]\\
& = & \frac{1}{(\gamma(0)+\sigma^2)^2-\gamma^2(1)}\left[\begin{array}{c} (\gamma(0)+\sigma^2)\gamma(0) - \gamma^2(1) \\  \gamma(1)\sigma^2\end{array}\right]
\end{eqnarray}\]
What’s going on here?
- Slope on \(X(t) < 1\) \(\Rightarrow\) Estimate of \(S(t)\) should not change 1-for-1 with \(X(t)\)
- Weak noise \(\Rightarrow\) pay more attention to the data
- Strong correlation in the signal \(\Rightarrow\) pay more attention to \(X(t-1)\) when estimating \(S(t)\)
- Reduces to the one observation, one signal case when \(\gamma(1) = 0\)
 
 
The general pattern: the Wiener filter
- Assume \(X(t) = S(t) + \SignalNoise(t)\)
- Assume \(\SignalNoise\) uncorrelated with \(S\)
- Then \(\Cov{X(t), X(t+h)} = \Cov{S(t), S(t+h)} + \Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
- and \(\Cov{S(t), X(t+h)} = \Cov{S(t), S(t+h)}\)
- Calculate coefficients in the usual way
- Wiener filter = applying these coefficients to the same series to get \(\hat{S}(t)\)
- Need to know, or guess, at the correlations of either the signal or the noise to do de-noising
- Do not need any direct observations of the signal
 
 
Estimating the periodic component
For simplicity, say we have \(X(0), X(1), \ldots X(n)\), \(n=kT\)
\[\begin{eqnarray}
\EstRegFunc(0) & = & \frac{1}{k-1}\sum_{i=0}^{k-1}{X(iT)}\\
\EstRegFunc(1) & = & \frac{1}{k-1}\sum_{i=0}^{k -1}{X(1+iT)}\\
\EstRegFunc(t) & = & \frac{1}{k-1}\sum_{i=0}^{k-1}{X(t+iT)}\\
& \vdots & \\
\EstRegFunc(T-1) & = & \frac{1}{k-1}\sum_{i=0}^{k-1}{X(T-1+iT)}
\end{eqnarray}\]
 
Estimating the periodic component
library(gstat)
data(wind)
dub.jan1 <- wind[wind$month == 1 & wind$day == 1, "DUB"]
mean(dub.jan1)
## [1] 12.67722
colMeans(wind[wind$month == 1 & wind$day == 1, -(1:3)])
##       RPT       VAL       ROS       KIL       SHA       BIR       DUB 
## 14.610000 11.200556 12.931667  6.845556 10.798333  7.678333 12.677222 
##       CLA       MUL       CLO       BEL       MAL 
##  8.952778 10.057222  9.936667 13.987222 18.536667
Now repeat for every day of the calendar
 
Summary
- Extracting a random signal from noise looks just like any other linear prediction problem
- \(X(t) = S(t) + \SignalNoise(t)\), predict \(S(t)\) from \(X\)
- Need to make assumptions about the noise, \(\Cov{\SignalNoise(t), \SignalNoise(t+h)}\)
 
- Extracting periodic components by averaging
- \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\TrueRegFunc\) deterministic
- Periodic/cyclic/seasonal component means \(\TrueRegFunc(t) = \TrueRegFunc(t+T)\)
- Estimate the periodic trend by averaging over periods
- Need to know period \(T\)
 
 
- Next week: how to analyze any process into a sum of periodic components with multiple periods
- Helps with identifying \(T\)
- Helps with covariance estimation
- Helps with any sort of linear filtering
- Changes how you see the world
 
 
Trend + periodicity + fluctuations
\[\begin{eqnarray}
X(t) & = & m(t) + p(t) + \TrueNoise(t)\\
& = & \text{long-run trend} + \text{periodic component} + \text{random fluctuations}
\end{eqnarray}\]
- Smooth and remove the smooth trend
- Average over periods and remove the periodic component
- What’s left is \(\approx\) fluctuations