Linear Prediction for Time Series
36-467/36-667
24 September 2020
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\newcommand{\TrueRegFunc}{\mu}
\newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}}
\DeclareMathOperator{\tr}{tr}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator{\det}{det}
\newcommand{\TrueNoise}{\epsilon}
\newcommand{\EstNoise}{\widehat{\TrueNoise}}
\]
In our last episode
- General approach to optimal linear prediction
- Predict \(Y\) from \(\vec{Z} = [Z_1, Z_2, \ldots Z_p]\)
- Prediction is \(\alpha + \vec{\beta} \cdot \vec{Z}\)
- Best \(\alpha = \Expect{Y} - \vec{\beta} \cdot \Expect{\vec{Z}}\)
- Best \(\vec{\beta} = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\)
- Prediction is also \(\Expect{Y} + \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y} \cdot (\vec{Z} - \Expect{\vec{Z}})\)
- Today: application to time series
- What can this do for us?
- How do we find the covariances?
Optimal linear prediction for time series
- Given: \(X(t_1), X(t_2), \ldots X(t_n)\)
- Desired: prediction of \(X(t_0)\)
\[\begin{eqnarray}
m(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\
\alpha & = & \Expect{X(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right] ~ \text{(goes away if everything's centered)}\\
\vec{\beta} & = & {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\
\Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\
\vdots & \vdots & \ldots & \vdots\\
\Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{X(t_0), X(t_1)}\\
\Cov{X(t_0), X(t_2)}\\ \vdots \\ \Cov{X(t_0), X(t_n)}\end{array}\right]
\end{eqnarray}\]
Interpolation
- Time series often have gaps
- Instruments fail, people mess up, circumstances…
- What happened at the times in between the observations?
Back to Kyoto
- A lot of what we see in this plot is just made up
What we didn’t tell R to make up
When did the cherries flower in 1015?
- We need \(\Cov{X(1015), X(t_i)}\) for every year \(t_i\) where we have data
- We need \(\Expect{X(t_i)}\), \(\Var{X(t_i)}\) and \(\Cov{X(t_i), X(t_j)}\) ditto
- We need \(\Expect{X(1015)}\)
Getting the expectations and covariances
- We only see each \(X(t_i)\) once
- Maybe gives us an idea of \(\Expect{X(t_i)}\)
- but not \(\Var{X(t_i)}\) or \(\Cov{X(t_i), X(t_j)}\)
- let alone \(\Cov{X(t_0), X(t_i)}\)
- We could repeat the experiment many times
- We could make assumptions
Repeating the experiment
\[\begin{eqnarray}
\overline{x}(t) \equiv \frac{1}{n}\sum_{i=1}^{n}{x^{(i)}(t)} & \rightarrow & \Expect{X(t)}\\
\frac{1}{n}\sum_{i=1}^{n}{(x^{(i)}(t) - \overline{x}(t)) (x^{(i)}(s) - \overline{x}(s))} & \rightarrow & \Cov{X(t), X(s)}
\end{eqnarray}\]
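As a minimal sketch of this pooling across replicates in R (the matrix x of replicates below is simulated, purely for illustration):
# Sketch: estimating E[X(t)] and Cov[X(t), X(s)] from replicated series
# x is a hypothetical matrix with one replicate per row, one time point per column
n.reps <- 200
n.times <- 50
x <- matrix(rnorm(n.reps * n.times), nrow = n.reps)  # stand-in replicates
xbar <- colMeans(x)  # estimates E[X(t)] at each time point
# Average of products of centered values, matching the formula above
Sigma.hat <- crossprod(sweep(x, 2, xbar)) / n.reps
Sigma.hat[1, 2]  # estimate of Cov[X(t_1), X(t_2)]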
Making assumptions
- Assume some covariances (and expectations) are equal
- Weaker: assume some of them are similar
Stationarity (assuming covariances are equal)
- A time series is weakly stationary when \(\Expect{X(t)} = \Expect{X(s)}\) (constant in time) and \(\Cov{X(t), X(s)} = \gamma(|t-s|)\)
- \(\gamma(h)\) is the autocovariance function at lag \(h\)
- this is also called “second-order” or “wide-sense” stationarity
- We’ll cover strong / full / strict stationarity later
- Weak stationarity implies all the expectations are the same
- Weak stationarity implies \(\Cov{X(t), X(t+h)} = \Cov{X(s), X(s+h)}\) so lots of covariances are the same
- Weak stationarity lets us pool information: \[\begin{eqnarray}
\overline{x} \equiv \frac{1}{n}\sum_{i=1}^{n}{x(t_i)} & \rightarrow & \Expect{X(0)}\\
\frac{1}{n}\sum_{i=1}^{n}{(x(t_i) - \overline{x})(x(t_i + h) - \overline{x})} & \rightarrow & \gamma(h)
\end{eqnarray}\]
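A minimal sketch of this pooled estimate at a single lag, using a simulated, equally-spaced series with no gaps (not the Kyoto data):
# Sketch: pooled (stationary) estimate of gamma(h) for one lag, by hand
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 500))  # stand-in series
xbar <- mean(x)
gamma.hat <- function(h) {
    n <- length(x)
    sum((x[1:(n - h)] - xbar) * (x[(1 + h):n] - xbar)) / n
}
gamma.hat(2)
# acf() does the same averaging at every lag up to lag.max:
acf(x, lag.max = 2, type = "covariance", plot = FALSE)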
The autocovariance function
- \(\gamma(0) = \Var{X(t)}\) (constant in \(t\))
- \(\rho(h) \equiv \frac{\gamma(h)}{\gamma(0)} =\) autocorrelation function
Assuming stationarity…
\[\begin{eqnarray}
\mathbf{v} & \equiv & \left[\begin{array}{cccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\
\gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\
\vdots & \vdots & \ldots & \vdots\\
\gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]\\
\mathbf{c} & = & \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots \\ \gamma(|t_0 -t_n|)\end{array} \right]\\
\vec{\beta} & = & \mathbf{v}^{-1}\mathbf{c}\\
m(t_0) & = & \overline{x} + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) -\overline{x}\\
X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right] = \overline{x} + \mathbf{c}^{T}\mathbf{v}^{-1} \left[\begin{array}{c} X(t_1) -\overline{x}\\
X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]\\
\Var{X(t_0) - m(t_0)} & = & \gamma(0) - \mathbf{c}^T\mathbf{v}^{-1} \mathbf{c}
\end{eqnarray}\]
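A minimal sketch of this linear algebra, using a made-up exponential autocovariance rather than anything estimated from data (every number here is illustrative):
# Sketch of the calculation above with an assumed autocovariance function
gamma <- function(h) { exp(-abs(h) / 10) }  # made-up ACF, not from data
t.obs <- c(1, 3, 4, 8, 9)  # times with observations
t0 <- 6  # time to predict
v <- outer(t.obs, t.obs, function(t, s) gamma(t - s))  # gamma(|t_i - t_j|)
c.vec <- gamma(t0 - t.obs)  # gamma(|t_0 - t_i|)
beta <- solve(v, c.vec)  # equivalent to solve(v) %*% c.vec
x.obs <- rnorm(length(t.obs))  # stand-in, already-centered observations
m.t0 <- sum(beta * x.obs)  # prediction of the (centered) X(t_0)
pred.var <- gamma(0) - sum(c.vec * beta)  # Var[X(t_0) - m(t_0)]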
In R
acf(x, lag.max, type, plot, na.action, ...)
- x = a time series (e.g., a vector) or a data frame of multiple time series
- lag.max = maximum value of the lag \(h\)
  - Calculates \(\gamma(0), \gamma(1), \ldots \gamma(h)\) and stops there
- type = correlation (default) or covariance?
- plot = make a plot? (default)
- na.action = how to handle NAs? The default is to give up; na.pass will use complete pairs
In R
kyoto.acf <- acf(kyoto$Flowering.DOY, lag.max = 100, type = "covariance", na.action = na.pass)
In R
kyoto.acf[0:5]
##
## Autocovariances of series 'kyoto$Flowering.DOY', by lag
##
## 0 1 2 3 4 5
## 41.54 8.71 8.30 8.46 6.69 6.78
kyoto.acf[0:5]$acf
## , , 1
##
## [,1]
## [1,] 41.540228
## [2,] 8.706940
## [3,] 8.304284
## [4,] 8.464531
## [5,] 6.692514
## [6,] 6.780411
When was the flowering in 1015?
# Years near 1015 where we actually have flowering dates
available.years <- with(na.omit(kyoto), Year.AD[Year.AD > 1015 - 49 & Year.AD < 1015 + 49])
historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
# Covariance of X(1015) with each of those years, read off the ACF by lag
CovYZ <- matrix(kyoto.acf[abs(available.years - 1015)]$acf, ncol = 1)
# Variance matrix of the conditioning years: autocovariance at every pairwise lag
year.lags <- outer(available.years, available.years, "-")
year.lags <- abs(year.lags)
VarZ <- kyoto.acf[year.lags]$acf
VarZ <- matrix(VarZ, ncol = length(available.years), byrow = FALSE)
Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
# Optimal linear prediction: mean + beta . (Z - mean), with beta = VarZ^{-1} CovYZ
(fitted.value <- historical.mean + (Z - historical.mean) %*% solve(VarZ) %*% CovYZ)
## [,1]
## [1,] 106.2388
# Standard error of the prediction: sqrt(gamma(0) - c^T v^{-1} c)
(fitted.value.se <- sqrt(as.matrix(kyoto.acf[0]$acf) - t(CovYZ) %*% solve(VarZ) %*% CovYZ))
## [,1]
## [1,] 5.87276
When was the flowering in 1015?
This is a lot of work…
…and we’d need to re-do most of it for every other year
… so we should write a function
kyoto.prediction <- function(times) {
    historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
    historical.variance <- as.matrix(kyoto.acf[0]$acf)
    fits <- matrix(0, nrow = length(times), ncol = 3)
    colnames(fits) <- c("time", "fit", "se")
    fits[, "time"] <- times
    rownames(fits) <- paste(times)
    for (t in times) {
        # Years with data within roughly +/- 48 years of t, excluding t itself
        available.years <- with(na.omit(kyoto), Year.AD[Year.AD > t - 49 & Year.AD < t + 49])
        available.years <- setdiff(available.years, t)
        # Covariance of X(t) with the conditioning years, and their variance matrix
        CovYZ <- matrix(kyoto.acf[abs(available.years - t)]$acf, ncol = 1)
        year.lags <- outer(available.years, available.years, "-")
        year.lags <- abs(year.lags)
        VarZ <- kyoto.acf[year.lags]$acf
        VarZ <- matrix(VarZ, ncol = length(available.years), byrow = FALSE)
        Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
        # Optimal linear prediction and its standard error
        fits[paste(t), "fit"] <- historical.mean + (Z - historical.mean) %*% solve(VarZ) %*% CovYZ
        fits[paste(t), "se"] <- sqrt(max(0, historical.variance - t(CovYZ) %*% solve(VarZ) %*% CovYZ))
    }
    return(data.frame(fits))
}
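The interpolation plots on the next slides, and the all.fits object used in the backup slides, presumably come from a call along these lines (the exact arguments are my guess):
# Presumed call producing fits for every year in the record
all.fits <- kyoto.prediction(kyoto$Year.AD)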
Finally, some interpolation: close up
Finally, some interpolation: zoom out
Stationarity and trend-plus-fluctuation
- We can always say \(X(t) = \TrueRegFunc(t) + \TrueNoise(t)\), with \(\Expect{X(t)} =\TrueRegFunc(t)\) and \(\Expect{\TrueNoise(t)} = 0\)
- If \(X\) is (weakly) stationary, then
- \(\TrueRegFunc(t) = \mu(0)\), a constant
- \(\Cov{\TrueNoise(t), \TrueNoise(s)} = \Cov{X(t), X(s)} = \gamma(|t-s|)\)
- Assuming a flat trend is pretty strong…
Stationary fluctuations around a trend
- If \(\TrueNoise\) is weakly stationary, then \(X - \TrueRegFunc\) is weakly stationary
- so \(X - \EstRegFunc\) should be (approximately) weakly stationary
- Recipe:
- Use smoothing to estimate \(\Expect{X(t)}\) as \(\EstRegFunc(t)\)
- Find ACF from \(\EstNoise(t) = X(t) - \EstRegFunc(t)\)
- Warning: Yule-Slutsky says there are bound to be some correlations in \(\EstNoise\)…
- Now get the coefficients and predict
Detrending the cherry blossoms with a spline
kyoto.spline <- with(na.omit(kyoto), smooth.spline(x = Year.AD, y = Flowering.DOY))
# Residuals exist only for years with data; pad with NAs for the missing years
residuals.with.NAs <- rep(NA, times = nrow(kyoto))
residuals.with.NAs[!is.na(kyoto$Flowering.DOY)] <- na.omit(kyoto$Flowering.DOY) - kyoto.spline$y
kyoto$fluctuation <- residuals.with.NAs
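The next step of the recipe would be to estimate the ACF from these fluctuations; a sketch, reusing the earlier acf() settings (the lag.max choice is mine):
# ACF of the detrended fluctuations, for use in place of kyoto.acf above
fluctuation.acf <- acf(kyoto$fluctuation, lag.max = 100, type = "covariance", na.action = na.pass)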
Detrending the cherry blossoms with a spline
Covariance estimation
- acf uses the sample covariance
- Estimates covariance at each lag separately
- Sometimes inefficient
- Covariances might have known form, e.g., \(\gamma(h) = \gamma(0)e^{-h/\tau}\)
- We’ll look more at that next time, in the spatial context
- Even if we don’t believe something like that, we might believe that \(\gamma(h)\) should be close to \(\gamma(h-1)\) and \(\gamma(h+1)\) \(\Rightarrow\) use smoothing
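A sketch of that smoothing idea, applied to the lag-by-lag sample autocovariances already stored in kyoto.acf (the choice of smoother is arbitrary):
# Smooth the sample autocovariances across lags
lags <- 0:100  # matches lag.max used for kyoto.acf
gamma.sample <- as.vector(kyoto.acf$acf)  # sample autocovariances at lags 0, ..., 100
ok <- !is.na(gamma.sample)
gamma.smooth <- smooth.spline(x = lags[ok], y = gamma.sample[ok])
plot(lags, gamma.sample, xlab = "lag h", ylab = "autocovariance")
lines(gamma.smooth)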
Checking stationarity
- Divide the data into intervals
- Re-estimate ACF on each interval
- ACF shouldn’t change much
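A minimal sketch of the kind of computation behind the next two plots (the original code isn't shown; the number of intervals and the plotting details are guesses):
# Re-estimate the ACF on successive blocks of years and overlay the results
blocks <- split(kyoto$Flowering.DOY, cut(kyoto$Year.AD, breaks = 4))
block.acfs <- lapply(blocks, acf, lag.max = 50, plot = FALSE, na.action = na.pass)
plot(0:50, block.acfs[[1]]$acf, type = "l", ylim = c(-1, 1), xlab = "lag h", ylab = "ACF")
for (b in 2:length(block.acfs)) {
    lines(0:50, block.acfs[[b]]$acf, col = b)
}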
Checking stationarity: raw data
Checking stationarity: after spline detrending
Checking stationarity
- Divide the data into intervals
- Re-estimate ACF on each interval
- Shouldn’t change much
- How much difference is too much? (Will answer later with simulation)
- Some inherent pitfalls:
- Very slight non-stationarity looks pretty stationary
- Stationary but with \(\gamma(h) \rightarrow 0\) as \(h\rightarrow\infty\) very slowly looks pretty non-stationary
- “Long-range correlations”, “slowly-decaying correlations”
- Trends show up as slowly-decaying correlations
Multiple time series
- \(X\) and \(U\) are jointly weakly stationary when
- \(X\) is weakly stationary, with ACF \(\gamma_X\)
- \(U\) is weakly stationary, with ACF \(\gamma_U\)
- \(\Cov{X(t), U(s)} = \gamma_{UX}(|t-s|)\)
- ccf in R will calculate the cross-covariance (or cross-correlation) function
- So will acf if it's given a matrix or data frame
- Plug and chug: if predicting \(X\) from \(U\)
- \(\gamma_U\) goes into the variance matrix \(\mathbf{v}\)
- \(\gamma_{UX}\) goes into the covariance matrix \(\mathbf{c}\)
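A small sketch with simulated series, just to show the bookkeeping (u and x here are made up; ccf() and acf() do the estimation):
# Cross- and auto-covariances for two jointly stationary series (simulated)
u <- as.numeric(arima.sim(model = list(ar = 0.7), n = 400))
x <- 0.5 * u + rnorm(400)
joint.acf <- acf(cbind(u, x), lag.max = 20, type = "covariance", plot = FALSE)
joint.acf$acf[1, , ]  # lag-0 variance/covariance matrix of (U, X)
ccf(u, x, lag.max = 20, type = "covariance", plot = FALSE)  # the cross-covariances alone
# To predict X(t_0) from U(t_1), ..., U(t_n): fill v with gamma_U(|t_i - t_j|),
# fill c with gamma_UX(|t_0 - t_i|), then beta <- solve(v, c) as before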
Where did all this come from?
- Norbert Wiener (1894–1964)
- One of the great mathematicians of the 20th century
- If you think you’ve got issues with parents pushing you to succeed in school, read Wiener (1953)
- Worked very closely with engineers
- His ideas are why we talk about “feedback”, “information”, “cyber-”, etc. (Wiener 1961, 1954)
- (Photo from Kline (2017))
Wiener’s research in 1942
- Anti-aircraft fire control
- Aim at where the target aircraft will be
- The target moves erratically
- \(\Rightarrow\) Need to extrapolate a random process
- Wiener’s solution: basically what we’ve just done
- Plus the continuous-time version
- Plus implementation using 1940s electronics
- Declassified after the war as Wiener (1949)
- Parallel work by Kolmogorov (1941)
- Wiener and Kolmogorov’s big idea was to realize that optimal linear prediction didn’t need the usual regression assumptions (Gaussian noise, independent variables, etc.)
Summing up
- Optimal linear prediction needs to know the trend and the autocovariance function
- Estimating them from one time series needs assumptions
- Remove trend
- Find ACF from residuals / estimated fluctuations
- Once we have the ACF, finding coefficients is just linear algebra
Backup: Not inverting the variance matrix
- \(\vec{\beta} = \Var{\vec{Z}}^{-1} \Cov{\vec{Z}, Y}\)
- But inverting an \(n\times n\) matrix takes \(O(n^3)\) operations
- Then matrix-multiplying \(\Var{\vec{Z}}^{-1}\) by \(\Cov{\vec{Z}, Y}\) is \(O(n^2)\), and multiplying \(\vec{\beta}\) by \(\vec{Z}\) is \(O(n)\); since we have to do all of this \(n\) times, the overall time complexity of getting \(n\) fits is \(n(O(n^3) + O(n^2) + O(n)) = O(n^4)\)
- Strictly speaking we don’t need to invert \(\Var{\vec{Z}}\); we just need to find the \(\vec{\beta}\) which solves the equation \(\Var{\vec{Z}} \vec{\beta} = \Cov{\vec{Z}, Y}\)
- There are algorithms for solving a linear system of equations like this which take only \(O(n^2)\) time
- So getting all \(n\) predictions is just \(O(n^3)\)
- May not sound impressive but if \(n=1000\) you’ll be very glad of that factor of \(n\)
- Details of how to solve without inverting are left to numerical linear algebra texts
- Programming exercise: re-do my code for the optimal linear predictor so it doesn't invert the variance matrix but does solve() for the optimal coefficients
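The key difference, on a toy system (not the kyoto objects):
# solve(A, b) solves A %*% beta == b directly, without forming A^{-1}
A <- matrix(c(2, 1, 1, 3), nrow = 2)
b <- c(1, 2)
solve(A) %*% b  # inverts, then multiplies
solve(A, b)  # solves the system directly: same answer, no explicit inverse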
Backup: How good is the optimal linear predictor?
all.fits$std.residuals <- (kyoto$Flowering.DOY - all.fits$fit)/all.fits$se
mean(all.fits$std.residuals, na.rm = TRUE)
## [1] -0.008035619
sd(all.fits$std.residuals, na.rm = TRUE)
## [1] 1.007411
Ideally: mean 0, standard deviation 1
Backup: How good is the optimal linear predictor?
This is a pretty good random scatter of points
Backup: fancier covariance estimation
- Two (equivalent) definitions of the covariance: \[
\Cov{X(t), X(s)} = \Expect{X(t) X(s)} - \Expect{X(t)}\Expect{X(s)} = \Expect{(X(t) - \Expect{X(t)}) (X(s) - \Expect{X(s)})}
\]
- Use the 2nd form: define \[
\Gamma(t,s) \equiv (X(t) - \Expect{X(t)})(X(s) - \Expect{X(s)})
\] so \(\Cov{X(t), X(s)} = \Expect{\Gamma(t,s)}\)
- Now assume constant mean so we can estimate \[
\widehat{\Gamma}(t,s) = (X(t) - \overline{x}) (X(s) - \overline{x})
\]
- Now assume stationarity so \(\Cov{X(t), X(s)} = \Expect{\Gamma(t,s)} = \gamma(|t-s|)\)
- Finally assume \(\gamma(h)\) changes slowly in \(h\)
- We can estimate \(\gamma(h)\) by finding pairs \(t, s\) with \(|t-s| \approx h\) and averaging \(\widehat{\Gamma}(t,s)\)
- Or kernel smoothing or spline smoothing or…
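A sketch of that last idea, reusing the spline fluctuations from earlier (the 100-year lag cutoff and the spline smoother are my choices):
# Pairwise products Gamma-hat(t, s), grouped by lag and smoothed
x <- kyoto$fluctuation  # detrended fluctuations, roughly constant mean
xbar <- mean(x, na.rm = TRUE)
Gamma.hat <- outer(x - xbar, x - xbar)  # (X(t) - xbar)(X(s) - xbar) for all pairs
lag <- abs(outer(kyoto$Year.AD, kyoto$Year.AD, "-"))
keep <- !is.na(Gamma.hat) & lag <= 100
gamma.curve <- smooth.spline(x = lag[keep], y = Gamma.hat[keep])
predict(gamma.curve, x = 0:10)$y  # smoothed estimates of gamma(0), ..., gamma(10)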
Backup: stationary linear predictor vs. spline
References
Kline, Ronald R. 2017. The Cybernetics Moment: Or Why We Call Our Age the Information Age. Baltimore, Maryland: Johns Hopkins University Press.
Wiener, Norbert. 1949. Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. Cambridge, Massachusetts: The Technology Press of the Massachusetts Institute of Technology.
———. 1953. Ex-Prodigy: My Childhood and Youth. New York: Simon & Schuster.
———. 1954. The Human Use of Human Beings: Cybernetics and Society. 2nd ed. Garden City, New York: Doubleday.
———. 1961. Cybernetics: Or, Control and Communication in the Animal and the Machine. 2nd ed. Cambridge, Massachusetts: MIT Press.