Linear Prediction for Time Series

36-467/36-667

20 September 2018

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\det}{det} \newcommand{\TrueNoise}{\epsilon} \newcommand{\EstNoise}{\widehat{\TrueNoise}} \]

In our last episode

Optimal linear prediction for time series

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\ \alpha & = & \Expect{X(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right] ~ \text{(goes away if everything's centered)}\\ \vec{\beta} & = & {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\ \Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\ \vdots & \vdots & \ldots & \vdots\\ \Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{X(t_0), X(t_1)}\\ \Cov{X(t_0), X(t_2)}\\ \vdots \\ \Cov{X(t_0), X(t_n)}\end{array}\right] \end{eqnarray}\]
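
As a small illustration of these formulas (not from the lecture; every number here is made up), the coefficients take just a couple of lines of linear algebra in R:

# Sketch: the optimal linear predictor from (assumed known) moments,
# with hypothetical numbers for n = 3 observation times
mu <- c(104, 104, 104)  # E[X(t_1)], E[X(t_2)], E[X(t_3)]
mu0 <- 104              # E[X(t_0)]
Sigma <- matrix(c(4, 2, 1,
                  2, 4, 2,
                  1, 2, 4), nrow = 3)  # variance matrix of the observations
cvec <- c(3, 2, 1)      # Cov[X(t_0), X(t_i)]
beta <- solve(Sigma, cvec)     # beta = (variance matrix)^{-1} (covariance vector)
alpha <- mu0 - sum(beta * mu)  # intercept; 0 if everything's centered
x.obs <- c(105, 102, 107)      # hypothetical observed values
alpha + sum(beta * x.obs)      # the predicted X(t_0)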

Interpolation

Back to Kyoto

What we didn’t tell R to make up

When did the cherries flower in 1015?

Similarly for extrapolation

Getting the expectations and covariances

Repeating the experiment

\[\begin{eqnarray} \overline{x}(t) \equiv \frac{1}{n}\sum_{i=1}^{n}{x^{(i)}(t)} & \rightarrow & \Expect{X(t)}\\ \frac{1}{n}\sum_{i=1}^{n}{(x^{(i)}(t) - \overline{x}(t)) (x^{(i)}(s) - \overline{x}(s))} & \rightarrow & \Cov{X(t), X(s)} \end{eqnarray}\]
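
If we actually had replicates, these averages would take only a few lines of R. A minimal sketch, assuming a hypothetical matrix x whose rows are independent runs of the series and whose columns are time points:

# n = 50 hypothetical replicate series, T = 20 time points each
x <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
xbar <- colMeans(x)                  # estimates E[X(t)] at each time point
xc <- sweep(x, 2, xbar)              # subtract each column's mean
Sigma.hat <- t(xc) %*% xc / nrow(x)  # entry (t, s) estimates Cov[X(t), X(s)]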

Making assumptions

Stationarity

The autocovariance function
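
For reference, the definitions the following slides rely on: a process is (weakly) stationary when its mean is constant over time and the covariance between two values depends only on how far apart in time they are. The autocovariance function \(\gamma\) and the autocorrelation function \(\rho\) are then

\[\begin{eqnarray} \Expect{X(t)} & = & \mu ~ \text{(constant in } t \text{)}\\ \Cov{X(t), X(t+h)} & = & \gamma(h) ~ \text{(depends only on the lag } h \text{)}\\ \rho(h) & \equiv & \gamma(h)/\gamma(0) \end{eqnarray}\]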

Assuming stationarity…

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\ \gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]}^{-1} \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots\\ \gamma(|t_0 -t_n|)\end{array}\right]\\ & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} \mathbf{v}^{-1} \mathbf{c}\\ \Var{X(t_0) - \EstRegFunc(t_0)} & = & \gamma(0) - \mathbf{c}^T \mathbf{v}^{-1} \mathbf{c} \end{eqnarray}\]
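
A minimal numerical sketch of this formula, with a made-up autocovariance function (nothing here comes from the Kyoto data):

# Sketch: stationary linear prediction under an assumed (hypothetical) ACVF
acvf <- function(h) 4 * 0.5^abs(h)  # an AR(1)-style autocovariance, made up
tt <- c(1, 2, 4, 7)                 # observation times
t0 <- 5                             # the time we want to predict
x.obs <- c(105, 103, 108, 104)      # hypothetical observations
xbar <- mean(x.obs)
v <- outer(tt, tt, function(a, b) acvf(abs(a - b)))  # matrix of gamma(|t_i - t_j|)
cc <- acvf(abs(t0 - tt))                             # vector of gamma(|t_0 - t_i|)
xbar + (x.obs - xbar) %*% solve(v, cc)               # point prediction
acvf(0) - cc %*% solve(v, cc)                        # prediction variance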

In-class exercise

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\ \gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]}^{-1} \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots\\ \gamma(|t_0 -t_n|)\end{array}\right]\\ \gamma(h) & = & \gamma(0) \rho(h) \end{eqnarray}\]

Re-write the predictor in terms of \(\rho\) — do we need \(\gamma(0)\)?

In-class exercise — solution

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\ \gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]}^{-1} \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots\\ \gamma(|t_0 -t_n|)\end{array}\right]\\ \gamma(h) & = & \gamma(0) \rho(h)\\ \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} 1 & \rho(|t_1 - t_2|) & \ldots & \rho(|t_1 - t_n|)\\ \rho(|t_2-t_1|) & 1 & \ldots & \rho(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \rho(|t_n-t_1|) & \rho(|t_n-t_2|) & \ldots & 1\end{array}\right]}^{-1} \frac{1}{\gamma(0)} \gamma(0) \left[\begin{array}{c} \rho(|t_0 - t_1|)\\ \rho(|t_0-t_2|)\\ \vdots\\ \rho(|t_0 -t_n|)\end{array}\right] \end{eqnarray}\]

The \(1/\gamma(0)\) pulled out of the inverse cancels the \(\gamma(0)\) factored out of the covariance vector, so the point predictor needs only the autocorrelation function \(\rho\); \(\gamma(0)\) matters again only for the prediction variance.

In R

acf(x, lag.max, type, plot, na.action, ...)

Here x is the series, lag.max is the largest lag to estimate, type is one of "correlation" (the default), "covariance", or "partial", plot says whether to draw the result, and na.action says how to handle missing values (e.g., na.pass).

In R

kyoto.acf <- acf(kyoto$Flowering.DOY, lag.max = 100, type = "covariance", na.action = na.pass)

In R

kyoto.acf[0:5]
## 
## Autocovariances of series 'kyoto$Flowering.DOY', by lag
## 
##     0     1     2     3     4     5 
## 41.00  7.96  7.47  7.74  5.96  6.07
kyoto.acf[0:5]$acf
## , , 1
## 
##           [,1]
## [1,] 41.000475
## [2,]  7.963781
## [3,]  7.470460
## [4,]  7.744336
## [5,]  5.960905
## [6,]  6.070874

When was the flowering in 1015?

# Years with observed flowering dates within 48 years of 1015, so every
# lag we need is well inside lag.max = 100
available.years <- with(na.omit(kyoto), Year.AD[Year.AD > 1015 - 49 & Year.AD < 
    1015 + 49])
# Estimate the constant mean E[X(t)] by the overall sample mean
historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
# c: estimated covariances between the target year and each observed year
CovYZ <- matrix(kyoto.acf[abs(available.years - 1015)]$acf, ncol = 1)
# v: estimated covariances among the observed years, indexed by |t_i - t_j|
year.lags <- abs(outer(available.years, available.years, "-"))
VarZ <- matrix(kyoto.acf[year.lags]$acf, ncol = length(available.years), 
    byrow = FALSE)
# Z: the observed flowering dates themselves
Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
# Point prediction: xbar + (Z - xbar) v^{-1} c
(fitted.value <- historical.mean + (Z - historical.mean) %*% solve(VarZ) %*% 
    CovYZ)
##          [,1]
## [1,] 106.1969
# Prediction standard error: sqrt(gamma(0) - c^T v^{-1} c)
(fitted.value.se <- sqrt(as.matrix(kyoto.acf[0]$acf) - t(CovYZ) %*% solve(VarZ) %*% 
    CovYZ))
##          [,1]
## [1,] 5.886278

When was the flowering in 1015?

This is a lot of work, so let's wrap it in a function that predicts at any set of years…

# Interpolate/extrapolate flowering dates at the given times, returning
# point predictions and prediction standard errors
kyoto.prediction <- function(times) {
    historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
    historical.variance <- as.matrix(kyoto.acf[0]$acf)  # gamma(0)
    fits <- matrix(0, nrow = length(times), ncol = 2)
    colnames(fits) <- c("fit", "se")
    rownames(fits) <- paste(times)
    for (t in times) {
        # Observed years within 48 years of the target, excluding the
        # target itself (so in-sample years get leave-one-out predictions)
        available.years <- with(na.omit(kyoto), Year.AD[Year.AD > t - 49 & Year.AD < 
            t + 49])
        available.years <- setdiff(available.years, t)
        # c: covariances between the target year and the observed years
        CovYZ <- matrix(kyoto.acf[abs(available.years - t)]$acf, ncol = 1)
        # v: covariances among the observed years, indexed by |t_i - t_j|
        year.lags <- abs(outer(available.years, available.years, "-"))
        VarZ <- matrix(kyoto.acf[year.lags]$acf, ncol = length(available.years), 
            byrow = FALSE)
        Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
        # Point prediction xbar + (Z - xbar) v^{-1} c, and its standard
        # error; max(0, .) guards against small negative values produced
        # by noise in the estimated ACF
        fits[paste(t), "fit"] <- historical.mean + (Z - historical.mean) %*% 
            solve(VarZ) %*% CovYZ
        fits[paste(t), "se"] <- sqrt(max(0, historical.variance - t(CovYZ) %*% 
            solve(VarZ) %*% CovYZ))
    }
    return(data.frame(fits))
}

Finally, some interpolation

plot(Flowering.DOY ~ Year.AD, data = kyoto, type = "p", pch = 16, cex = 0.5, 
    ylab = "Day in year of full flowering", xlab = "Year (AD)", xlim = c(1000, 
        1050))
all.fits <- kyoto.prediction(801:2015)
lines(801:2015, all.fits$fit, col = "blue")
lines(801:2015, all.fits$fit - all.fits$se, col = "blue", lty = "dashed")
lines(801:2015, all.fits$fit + all.fits$se, col = "blue", lty = "dashed")


Some extrapolation

plot(Flowering.DOY ~ Year.AD, data = kyoto, type = "p", pch = 16, cex = 0.5, 
    xlim = c(1940, 2020))
new.fits <- kyoto.prediction(1940:2020)
lines(1940:2020, new.fits$fit, col = "blue")
lines(1940:2020, new.fits$fit - new.fits$se, col = "blue", lty = "dashed")
lines(1940:2020, new.fits$fit + new.fits$se, col = "blue", lty = "dashed")

Stationarity and trend-plus-fluctuation

Stationary fluctuations around a non-stationary trend
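
In symbols, the model here is a deterministic trend plus a weakly stationary fluctuation around it:

\[ X(t) = \TrueRegFunc(t) + \TrueNoise(t), \quad \Expect{\TrueNoise(t)} = 0, \quad \Cov{\TrueNoise(t), \TrueNoise(t+h)} = \gamma(h) \]

We estimate the trend, subtract it, and apply the stationary machinery to what is left.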

Detrending the cherry blossoms with a spline

# Estimate the trend with a smoothing spline on the non-missing dates
kyoto.spline <- with(na.omit(kyoto), smooth.spline(x = Year.AD, y = Flowering.DOY))
# Residuals from the trend, padded with NAs so they line up with the
# original rows of kyoto
residuals.with.NAs <- rep(NA, times = nrow(kyoto))
residuals.with.NAs[!is.na(kyoto$Flowering.DOY)] <- na.omit(kyoto$Flowering.DOY) - 
    kyoto.spline$y
kyoto$fluctuation <- residuals.with.NAs


Covariance estimation
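
Presumably the covariance estimation is the same acf() call as before, now applied to the detrended fluctuations rather than the raw dates; a sketch:

# ACF of the fluctuations around the spline trend, by analogy with the
# earlier call on the raw flowering dates
fluctuation.acf <- acf(kyoto$fluctuation, lag.max = 100, type = "covariance", 
    na.action = na.pass)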

Checking stationarity
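
One common informal check (one possibility among several; the slide may have used a different diagnostic): estimate the autocovariances separately on the early and late halves of the record and see whether the two estimates roughly agree.

# Split-sample check of stationarity: do the two halves of the record
# have similar autocovariances?
halfway <- floor(nrow(kyoto)/2)
acf.early <- acf(kyoto$fluctuation[1:halfway], lag.max = 50, type = "covariance", 
    na.action = na.pass, plot = FALSE)
acf.late <- acf(kyoto$fluctuation[(halfway + 1):nrow(kyoto)], lag.max = 50, 
    type = "covariance", na.action = na.pass, plot = FALSE)
plot(0:50, acf.early$acf, type = "l", xlab = "lag", ylab = "autocovariance")
lines(0:50, acf.late$acf, lty = "dashed")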

Multiple time series
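
With two or more series, the analogue of the autocovariance function is the cross-covariance function, which R estimates with ccf(). A sketch with hypothetical series x and y:

# Cross-covariances between two made-up series; at lag k, this compares
# x at time t + k with y at time t
x <- rnorm(200)
y <- filter(x, rep(1/3, 3), sides = 1)  # y: a lagged moving average of x
ccf(x, y, lag.max = 20, type = "covariance", na.action = na.pass)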

Where did all this come from?

Wiener’s research in 1942

Summing up

Backup: Not inverting the variance matrix
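
The numerical point (sketched on the assumption that this is what the backup slide covers): never form the inverse explicitly; solve the linear system instead. solve(v, c) is both faster and more numerically stable than solve(v) %*% c.

# Instead of explicitly inverting the variance matrix ...
weights <- solve(VarZ) %*% CovYZ
# ... solve the linear system v w = c directly:
weights <- solve(VarZ, CovYZ)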

Backup: How good is the prediction standard error?

all.fits$std.residuals <- (kyoto$Flowering.DOY - all.fits$fit)/all.fits$se
mean(all.fits$std.residuals, na.rm = TRUE)
## [1] -0.008311344
sd(all.fits$std.residuals, na.rm = TRUE)
## [1] 1.00777

Ideally: mean 0, standard deviation 1


References

Kline, Ronald R. 2017. The Cybernetics Moment: Or Why We Call Our Age the Information Age. Baltimore, Maryland: Johns Hopkins University Press.

Kolmogorov, Andrei N. 1941. “Interpolation und Extrapolation von stationären zufälligen Folgen.” Bulletin of the Academy of Sciences, USSR, Mathematical Series 5:3–14.

Wiener, Norbert. 1949. Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. Cambridge, Massachusetts: The Technology Press of the Massachusetts Institute of Technology.