Linear Prediction for Time Series

36-467/36-667

20 September 2018

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\TrueRegFunc}{\mu} \newcommand{\EstRegFunc}{\widehat{\TrueRegFunc}} \DeclareMathOperator{\tr}{tr} \DeclareMathOperator*{\argmin}{argmin} \DeclareMathOperator{\det}{det} \newcommand{\TrueNoise}{\epsilon} \newcommand{\EstNoise}{\widehat{\TrueNoise}} \]

In our last episode

Optimal linear prediction for time series

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \alpha + \vec{\beta} \cdot \left[\begin{array}{c} X(t_1) \\ X(t_2) \\ \vdots \\ X(t_n) \end{array}\right]\\ \alpha & = & \Expect{X(t_0)} - \vec{\beta} \cdot \left[\begin{array}{c} \Expect{X(t_1)}\\ \Expect{X(t_2)} \\ \vdots \\ \Expect{X(t_n)}\end{array}\right] ~ \text{(goes away if everything's centered)}\\ \vec{\beta} & = & {\left[\begin{array}{cccc} \Var{X(t_1)} & \Cov{X(t_1), X(t_2)} & \ldots & \Cov{X(t_1), X(t_n)}\\ \Cov{X(t_1), X(t_2)} & \Var{X(t_2)} & \ldots & \Cov{X(t_2), X(t_n)}\\ \vdots & \vdots & \ldots & \vdots\\ \Cov{X(t_1), X(t_n)} & \Cov{X(t_2), X(t_n)} & \ldots & \Var{X(t_n)}\end{array}\right]}^{-1} \left[\begin{array}{c} \Cov{X(t_0), X(t_1)}\\ \Cov{X(t_0), X(t_2)}\\ \vdots \\ \Cov{X(t_0), X(t_n)}\end{array}\right] \end{eqnarray}\]
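
As a small illustration of these formulas (not from the lecture; every number here is made up), the coefficients take just a couple of lines of linear algebra in R:

# Sketch: the optimal linear predictor from (assumed known) moments,
# with hypothetical numbers for n = 3 observation times
mu <- c(104, 104, 104)  # E[X(t_1)], E[X(t_2)], E[X(t_3)]
mu0 <- 104              # E[X(t_0)]
Sigma <- matrix(c(4, 2, 1,
                  2, 4, 2,
                  1, 2, 4), nrow = 3)  # variance matrix of the observations
cvec <- c(3, 2, 1)      # Cov[X(t_0), X(t_i)]
beta <- solve(Sigma, cvec)     # beta = (variance matrix)^{-1} (covariance vector)
alpha <- mu0 - sum(beta * mu)  # intercept; 0 if everything's centered
x.obs <- c(105, 102, 107)      # hypothetical observed values
alpha + sum(beta * x.obs)      # the predicted X(t_0)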

Interpolation

Back to Kyoto

What we didn’t tell R to make up

When did the cherries flower in 1015?

Similarly for extrapolation

Getting the expectations and covariances

Repeating the experiment

\[\begin{eqnarray} \overline{x}(t) \equiv \frac{1}{n}\sum_{i=1}^{n}{x^{(i)}(t)} & \rightarrow & \Expect{X(t)}\\ \frac{1}{n}\sum_{i=1}^{n}{(x^{(i)}(t) - \overline{x}(t)) (x^{(i)}(s) - \overline{x}(s))} & \rightarrow & \Cov{X(t), X(s)} \end{eqnarray}\]
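
If we actually had replicates, these averages would take only a few lines of R. A minimal sketch, assuming a hypothetical matrix x whose rows are independent runs of the series and whose columns are time points:

# n = 50 hypothetical replicate series, T = 20 time points each
x <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
xbar <- colMeans(x)                  # estimates E[X(t)] at each time point
xc <- sweep(x, 2, xbar)              # subtract each column's mean
Sigma.hat <- t(xc) %*% xc / nrow(x)  # entry (t, s) estimates Cov[X(t), X(s)]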

Making assumptions

Stationarity

The autocovariance function
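
For reference, the definitions the following slides rely on: a process is (weakly) stationary when its mean is constant over time and the covariance between two values depends only on how far apart in time they are. The autocovariance function \(\gamma\) and the autocorrelation function \(\rho\) are then

\[\begin{eqnarray} \Expect{X(t)} & = & \mu ~ \text{(constant in } t \text{)}\\ \Cov{X(t), X(t+h)} & = & \gamma(h) ~ \text{(depends only on the lag } h \text{)}\\ \rho(h) & \equiv & \gamma(h)/\gamma(0) \end{eqnarray}\]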

Assuming stationarity…

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\ \gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]}^{-1} \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots\\ \gamma(|t_0 -t_n|)\end{array}\right]\\ & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} \mathbf{v}^{-1} \mathbf{c}\\ \Var{X(t_0) - \EstRegFunc(t_0)} & = & \gamma(0) - \mathbf{c}^T \mathbf{v}^{-1} \mathbf{c} \end{eqnarray}\]
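
A minimal numerical sketch of this formula, with a made-up autocovariance function (nothing here comes from the Kyoto data):

# Sketch: stationary linear prediction under an assumed (hypothetical) ACVF
acvf <- function(h) 4 * 0.5^abs(h)  # an AR(1)-style autocovariance, made up
tt <- c(1, 2, 4, 7)                 # observation times
t0 <- 5                             # the time we want to predict
x.obs <- c(105, 103, 108, 104)      # hypothetical observations
xbar <- mean(x.obs)
v <- outer(tt, tt, function(a, b) acvf(abs(a - b)))  # matrix of gamma(|t_i - t_j|)
cc <- acvf(abs(t0 - tt))                             # vector of gamma(|t_0 - t_i|)
xbar + (x.obs - xbar) %*% solve(v, cc)               # point prediction
acvf(0) - cc %*% solve(v, cc)                        # prediction variance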

In-class exercise

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\ \gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]}^{-1} \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots\\ \gamma(|t_0 -t_n|)\end{array}\right]\\ \gamma(h) & = & \gamma(0) \rho(h) \end{eqnarray}\]

Re-write the predictor in terms of \(\rho\) — do we need \(\gamma(0)\)?

In-class exercise — solution

\[\begin{eqnarray} \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} \gamma(0) & \gamma(|t_1 - t_2|) & \ldots & \gamma(|t_1 - t_n|)\\ \gamma(|t_2-t_1|) & \gamma(0) & \ldots & \gamma(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \gamma(|t_n-t_1|) & \gamma(|t_n-t_2|) & \ldots & \gamma(0)\end{array}\right]}^{-1} \left[\begin{array}{c} \gamma(|t_0 - t_1|)\\ \gamma(|t_0-t_2|)\\ \vdots\\ \gamma(|t_0 -t_n|)\end{array}\right]\\ \gamma(h) & = & \gamma(0) \rho(h)\\ \EstRegFunc(t_0) & = & \overline{x} + {\left[\begin{array}{c} X(t_1) -\overline{x}\\ X(t_2) - \overline{x} \\ \vdots \\ X(t_n) -\overline{x}\end{array}\right]}^{T} {\left[\begin{array}{ccc} 1 & \rho(|t_1 - t_2|) & \ldots & \rho(|t_1 - t_n|)\\ \rho(|t_2-t_1|) & 1 & \ldots & \rho(|t_2-t_n|)\\ \vdots & \vdots & \ldots & \vdots\\ \rho(|t_n-t_1|) & \rho(|t_n-t_2|) & \ldots & 1\end{array}\right]}^{-1} \frac{1}{\gamma(0)} \gamma(0) \left[\begin{array}{c} \rho(|t_0 - t_1|)\\ \rho(|t_0-t_2|)\\ \vdots\\ \rho(|t_0 -t_n|)\end{array}\right] \end{eqnarray}\]

The \(1/\gamma(0)\) pulled out of the inverse cancels the \(\gamma(0)\) factored out of the covariance vector, so the point predictor needs only the autocorrelation function \(\rho\); \(\gamma(0)\) matters again only for the prediction variance.

In R

acf(x, lag.max, type, plot, na.action, ...)

Here x is the series, lag.max is the largest lag to estimate, type is one of "correlation" (the default), "covariance", or "partial", plot says whether to draw the result, and na.action says how to handle missing values (e.g., na.pass).

In R

kyoto.acf <- acf(kyoto$Flowering.DOY, lag.max = 100, type = "covariance", na.action = na.pass)

In R

kyoto.acf[0:5]
## 
## Autocovariances of series 'kyoto$Flowering.DOY', by lag
## 
##     0     1     2     3     4     5 
## 41.00  7.96  7.47  7.74  5.96  6.07
kyoto.acf[0:5]$acf
## , , 1
## 
##           [,1]
## [1,] 41.000475
## [2,]  7.963781
## [3,]  7.470460
## [4,]  7.744336
## [5,]  5.960905
## [6,]  6.070874

When was the flowering in 1015?

# Years with observed flowering dates within 48 years of 1015, so every
# lag we need is well inside lag.max = 100
available.years <- with(na.omit(kyoto), Year.AD[Year.AD > 1015 - 49 & Year.AD < 
    1015 + 49])
# Estimate the constant mean E[X(t)] by the overall sample mean
historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
# c: estimated covariances between the target year and each observed year
CovYZ <- matrix(kyoto.acf[abs(available.years - 1015)]$acf, ncol = 1)
# v: estimated covariances among the observed years, indexed by |t_i - t_j|
year.lags <- abs(outer(available.years, available.years, "-"))
VarZ <- matrix(kyoto.acf[year.lags]$acf, ncol = length(available.years), 
    byrow = FALSE)
# Z: the observed flowering dates themselves
Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
# Point prediction: xbar + (Z - xbar) v^{-1} c
(fitted.value <- historical.mean + (Z - historical.mean) %*% solve(VarZ) %*% 
    CovYZ)
##          [,1]
## [1,] 106.1969
# Prediction standard error: sqrt(gamma(0) - c^T v^{-1} c)
(fitted.value.se <- sqrt(as.matrix(kyoto.acf[0]$acf) - t(CovYZ) %*% solve(VarZ) %*% 
    CovYZ))
##          [,1]
## [1,] 5.886278

When was the flowering in 1015?

This is a lot of work, so let's wrap it in a function that predicts at any set of years…

# Interpolate/extrapolate flowering dates at the given times, returning
# point predictions and prediction standard errors
kyoto.prediction <- function(times) {
    historical.mean <- mean(kyoto$Flowering.DOY, na.rm = TRUE)
    historical.variance <- as.matrix(kyoto.acf[0]$acf)  # gamma(0)
    fits <- matrix(0, nrow = length(times), ncol = 2)
    colnames(fits) <- c("fit", "se")
    rownames(fits) <- paste(times)
    for (t in times) {
        # Observed years within 48 years of the target, excluding the
        # target itself (so in-sample years get leave-one-out predictions)
        available.years <- with(na.omit(kyoto), Year.AD[Year.AD > t - 49 & Year.AD < 
            t + 49])
        available.years <- setdiff(available.years, t)
        # c: covariances between the target year and the observed years
        CovYZ <- matrix(kyoto.acf[abs(available.years - t)]$acf, ncol = 1)
        # v: covariances among the observed years, indexed by |t_i - t_j|
        year.lags <- abs(outer(available.years, available.years, "-"))
        VarZ <- matrix(kyoto.acf[year.lags]$acf, ncol = length(available.years), 
            byrow = FALSE)
        Z <- with(kyoto, Flowering.DOY[Year.AD %in% available.years])
        # Point prediction xbar + (Z - xbar) v^{-1} c, and its standard
        # error; max(0, .) guards against small negative values produced
        # by noise in the estimated ACF
        fits[paste(t), "fit"] <- historical.mean + (Z - historical.mean) %*% 
            solve(VarZ) %*% CovYZ
        fits[paste(t), "se"] <- sqrt(max(0, historical.variance - t(CovYZ) %*% 
            solve(VarZ) %*% CovYZ))
    }
    return(data.frame(fits))
}

Finally, some interpolation

plot(Flowering.DOY ~ Year.AD, data = kyoto, type = "p", pch = 16, cex = 0.5, 
    ylab = "Day in year of full flowering", xlab = "Year (AD)", xlim = c(1000, 
        1050))
all.fits <- kyoto.prediction(801:2015)
lines(801:2015, all.fits$fit, col = "blue")
lines(801:2015, all.fits$fit - all.fits$se, col = "blue", lty = "dashed")
lines(801:2015, all.fits$fit + all.fits$se, col = "blue", lty = "dashed")


Some extrapolation

plot(Flowering.DOY ~ Year.AD, data = kyoto, type = "p", pch = 16, cex = 0.5, 
    xlim = c(1940, 2020))
new.fits <- kyoto.prediction(1940:2020)
lines(1940:2020, new.fits$fit, col = "blue")
lines(1940:2020, new.fits$fit - new.fits$se, col = "blue", lty = "dashed")
lines(1940:2020, new.fits$fit + new.fits$se, col = "blue", lty = "dashed")

Stationarity and trend-plus-fluctuation

Stationary fluctuations around a non-stationary trend
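
In symbols, the model here is a deterministic trend plus a weakly stationary fluctuation around it:

\[ X(t) = \TrueRegFunc(t) + \TrueNoise(t), \quad \Expect{\TrueNoise(t)} = 0, \quad \Cov{\TrueNoise(t), \TrueNoise(t+h)} = \gamma(h) \]

We estimate the trend, subtract it, and apply the stationary machinery to what is left.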

Detrending the cherry blossoms with a spline

# Estimate the trend with a smoothing spline on the non-missing dates
kyoto.spline <- with(na.omit(kyoto), smooth.spline(x = Year.AD, y = Flowering.DOY))
# Residuals from the trend, padded with NAs so they line up with the
# original rows of kyoto
residuals.with.NAs <- rep(NA, times = nrow(kyoto))
residuals.with.NAs[!is.na(kyoto$Flowering.DOY)] <- na.omit(kyoto$Flowering.DOY) - 
    kyoto.spline$y
kyoto$fluctuation <- residuals.with.NAs


Covariance estimation
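
Presumably the covariance estimation is the same acf() call as before, now applied to the detrended fluctuations rather than the raw dates; a sketch:

# ACF of the fluctuations around the spline trend, by analogy with the
# earlier call on the raw flowering dates
fluctuation.acf <- acf(kyoto$fluctuation, lag.max = 100, type = "covariance", 
    na.action = na.pass)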

Checking stationarity
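
One common informal check (one possibility among several; the slide may have used a different diagnostic): estimate the autocovariances separately on the early and late halves of the record and see whether the two estimates roughly agree.

# Split-sample check of stationarity: do the two halves of the record
# have similar autocovariances?
halfway <- floor(nrow(kyoto)/2)
acf.early <- acf(kyoto$fluctuation[1:halfway], lag.max = 50, type = "covariance", 
    na.action = na.pass, plot = FALSE)
acf.late <- acf(kyoto$fluctuation[(halfway + 1):nrow(kyoto)], lag.max = 50, 
    type = "covariance", na.action = na.pass, plot = FALSE)
plot(0:50, acf.early$acf, type = "l", xlab = "lag", ylab = "autocovariance")
lines(0:50, acf.late$acf, lty = "dashed")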

Multiple time series
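
With two or more series, the analogue of the autocovariance function is the cross-covariance function, which R estimates with ccf(). A sketch with hypothetical series x and y:

# Cross-covariances between two made-up series; at lag k, this compares
# x at time t + k with y at time t
x <- rnorm(200)
y <- filter(x, rep(1/3, 3), sides = 1)  # y: a lagged moving average of x
ccf(x, y, lag.max = 20, type = "covariance", na.action = na.pass)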

Where did all this come from?

Wiener’s research in 1942

Summing up

Backup: Not inverting the variance matrix
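
The numerical point (sketched on the assumption that this is what the backup slide covers): never form the inverse explicitly; solve the linear system instead. solve(v, c) is both faster and more numerically stable than solve(v) %*% c.

# Instead of explicitly inverting the variance matrix ...
weights <- solve(VarZ) %*% CovYZ
# ... solve the linear system v w = c directly:
weights <- solve(VarZ, CovYZ)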

Backup: How good is the prediction standard error?

all.fits$std.residuals <- (kyoto$Flowering.DOY - all.fits$fit)/all.fits$se
mean(all.fits$std.residuals, na.rm = TRUE)
## [1] -0.008311344
sd(all.fits$std.residuals, na.rm = TRUE)
## [1] 1.00777

Ideally: mean 0, standard deviation 1


References

Kline, Ronald R. 2017. The Cybernetics Moment: Or Why We Call Our Age the Information Age. Baltimore, Maryland: Johns Hopkins University Press.

Kolmogorov, Andrei N. 1941. “Interpolation und Extrapolation von stationären zufälligen Folgen.” Bulletin of the Academy of Sciences, USSR, Mathematical Series 5:3–14.

Wiener, Norbert. 1949. Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications. Cambridge, Massachusetts: The Technology Press of the Massachusetts Institute of Technology.