Linear Generative Models for Time Series
36-467/36-667
16 October 2018
\[
\newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]}
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\SampleVar}[1]{\widehat{\mathrm{Var}}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
\]
Previously
- Aimed to be as close to descriptive and exploratory as possible
    - Very weak assumptions (like stationarity)
    - Or no assumptions (as in PCA)
- Advantages:
    - Security / robustness / reliability: less can go wrong
    - Still able to say something
- Drawbacks:
    - Weak on inferences beyond the observables
    - Weak on uncertainty
- Going forward:
    - Generative models (= distributions over the whole data)
    - Uncertainty in inference
Linear autoregressions
- The first-order linear autoregression, or AR(1):
\[\begin{eqnarray}
X(t) & = & a + b X(t-1) + \epsilon(t)\\
X(0) & = & \text{some random variable or other}
\end{eqnarray}\]
- The innovations \(\epsilon(t)\) are
    - All expectation 0
    - All uncorrelated with each other
    - All uncorrelated with \(X(0)\)
    - Typically constant-variance
- To fully specify the model, need to give the distributions of \(X(0)\) and \(\epsilon\)
    - In nice situations, all Gaussian and IID
Generating a new time series
- Draw \(X(0)\) from its distribution
- Draw \(\epsilon(1)\) from its distribution and set \(X(1) \leftarrow a+bX(0)+\epsilon(1)\)
- Iterate:
- Draw \(\epsilon(t)\) from its distribution
- Set \(X(t) \leftarrow a+bX(t-1)+\epsilon(t)\)
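A minimal R sketch of the recipe above, assuming Gaussian IID innovations; the function name `ar1.sim` and the default values (a = 0, b = 0.9, etc.) are illustrative, not part of the model:

ar1.sim <- function(n, a = 0, b = 0.9, sd.innov = 1, x0 = rnorm(1)) {
    x <- numeric(n + 1)
    x[1] <- x0                                              # X(0)
    for (t in 2:(n + 1)) {
        x[t] <- a + b * x[t - 1] + rnorm(1, sd = sd.innov)  # X(t) = a + b X(t-1) + eps(t)
    }
    return(x)
}
plot(ar1.sim(n = 200), type = "l", xlab = "t", ylab = "X(t)")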
Unroll the AR(1) a little
\[\begin{eqnarray}
X(t) & = & a + b X(t-1) + \epsilon(t)\\
& = & a + \epsilon(t) + b(a+b X(t-2) + \epsilon(t-1))\\
& = & a + ba + b^2 a + \ldots + b^{t-1} a + \epsilon(t) + b\epsilon(t-1) + b^2 \epsilon(t-2) + \ldots + b^{t-1} \epsilon(1) + b^t X(0)\\
& = & a\sum_{k=0}^{t-1}{b^k} + \sum_{k=0}^{t-1}{b^k \epsilon(t-k)} + b^t X(0)
\end{eqnarray}\]
- At each time, get a random (\(\epsilon(t)\)) plus a deterministic (\(a\)) kick, whose impact is multiplied by \(b\) at each subsequent time step, forever
- (infinite impulse response)
Think about the deterministic version
Set \(a=0\) to simplify book-keeping
\[\begin{eqnarray}
x(t) & = & b x(t-1)\\
& = & \text{???}
\end{eqnarray}\]
In-class exercise 1: Write \(x(t)\) in terms of \(b\), \(t\), and \(x(0)\)
Think about the deterministic version
Set \(a=0\) to simplify book-keeping
\[\begin{eqnarray}
x(t) & = & b x(t-1)\\
& = & b^t x(0)
\end{eqnarray}\]
If \(|b|<1\) then \(b^t \rightarrow 0\) as \(t\rightarrow \infty\)
So if \(|b| < 1\) then \(x(t) \rightarrow 0\)
First-order dynamics are exponential decay to 0 or growth to \(\pm \infty\)
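A quick picture of this (the values of \(b\) here are arbitrary):

tt <- 0:50
plot(tt, 0.9^tt, type = "l", ylim = c(-1, 1), xlab = "t", ylab = "x(t)/x(0)")  # 0 < b < 1: decay
lines(tt, (-0.9)^tt, lty = 2)   # -1 < b < 0: decay with alternating sign
# for |b| > 1, b^t grows without bound (not drawn)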
Adding the noise back in
Constantly being perturbed away from the deterministic path
How would we predict?
Intuitively:
\[
\hat{X}(t+k) = b^{k} x(t)
\]
Rigorously:
\[\begin{eqnarray}
\Expect{X(t+k)|X(t)=x} & = & \Expect{b^k X(t) + \epsilon(t+k) + b \epsilon(t+k-1) + \ldots + b^{k-1}\epsilon(t+1)|X(t) = x}\\
& = & b^k x + 0
\end{eqnarray}\]
because the innovations after time \(t\) all have expectation 0 and are uncorrelated with \(X(t)\)
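A small simulation check of the \(k\)-step-ahead prediction, with \(a = 0\), Gaussian innovations, and otherwise arbitrary numbers:

b <- 0.9; x.now <- 2; k <- 10
futures <- replicate(1e4, {
    x <- x.now
    for (i in 1:k) x <- b * x + rnorm(1)   # run the AR(1) forward k steps
    x
})
mean(futures)   # empirical average of X(t+k) given X(t) = x.now
b^k * x.now     # the prediction b^k x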
What about covariances?
\[\begin{eqnarray}
\Cov{X(t+h), X(t)} & = & \Cov{b^h X(t) + \epsilon(t+h) + b\epsilon(t+h-1) + \ldots + b^{h-1}\epsilon(t+1), X(t)}\\
& = & b^h \Cov{X(t), X(t)} + 0\\
& = & b^h \Var{X(t)}
\end{eqnarray}\]
\(\Rightarrow\) if \(\Var{X(t)}\) and \(\Expect{X(t)}\) are constant in \(t\), this is (weakly) stationary
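Checking this against a long, approximately stationary simulated AR(1), using `acf` with `type="covariance"`; the numbers are again arbitrary:

b <- 0.9; n <- 1e5
x <- numeric(n)
x[1] <- rnorm(1, sd = sqrt(1/(1 - b^2)))       # start from the stationary distribution
for (t in 2:n) x[t] <- b * x[t - 1] + rnorm(1)
h <- 5
acf(x, lag.max = h, type = "covariance", plot = FALSE)$acf[h + 1]  # empirical Cov[X(t+h), X(t)]
b^h * var(x)                                                        # b^h Var[X(t)]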
Higher-order autoregressions
\[
X(t) = a + b_1 X(t-1) + b_2 X(t-2) + \ldots + b_p X(t-p) + \epsilon(t)
\]
- Same rules about innovations
- Same idea about how to generate
- Same rules for prediction: \(\Expect{X(t)|X(t-1)=x_1, \ldots, X(t-p)=x_p} = a + b_1 x_1 + b_2 x_2 + \ldots + b_p x_p\)
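Base R's `arima.sim` will generate from such a model (with \(a=0\)), given the vector of coefficients; the AR(2) coefficients below are just an example of a stationary case:

x <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 500)  # X(t) = 0.5 X(t-1) + 0.3 X(t-2) + eps(t)
plot(x)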
What about multiple variables?
Vector autoregression of order 1, or VAR(1)
\[
\vec{X}(t) = \vec{a} + \mathbf{b} \vec{X}(t-1) + \vec{\epsilon}(t)
\]
\(\vec{X}(t) =\) random vector of dimension \(p\), the state at time \(t\)
\(\vec{a} =\) deterministic vector of dimension \(p\)
\(\mathbf{b} =\) deterministic matrix of dimension \(p\times p\)
\(\vec{\epsilon}(t) =\) random vector of dimension \(p\), the innovation
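A minimal sketch of simulating a VAR(1) in R, with independent standard-Gaussian innovations and placeholder values for \(\vec{a}\) and \(\mathbf{b}\):

p <- 2; n <- 100
a <- c(0, 0)
b <- matrix(c(0.9, -0.1, 0.1, 0.9), nrow = p)   # an illustrative p x p matrix
X <- matrix(0, nrow = n + 1, ncol = p)          # row t+1 holds X(t)
X[1, ] <- rnorm(p)                              # X(0)
for (t in 2:(n + 1)) {
    X[t, ] <- a + b %*% X[t - 1, ] + rnorm(p)   # vector AR(1) update
}
matplot(0:n, X, type = "l", xlab = "t", ylab = "X(t)")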
What about multiple variables?
Zero out the offset \(\vec{a}\) for now
\[
\vec{X}(t) = \mathbf{b} \vec{X}(t-1) + \vec{\epsilon}(t)
\]
What are the deterministic dynamics?
Linear dynamical systems in multiple dimensions
\[
\vec{x}(t) = \mathbf{b}\vec{x}(t-1)
\]
- Suppose the eigenvectors \(\vec{v}_1, \ldots \vec{v}_p\) of \(\mathbf{b}\), with eigenvalues \(\lambda_1, \ldots \lambda_p\), form a basis
- Then, for some coefficients \(c_1, \ldots c_p\), \[
\vec{x}(0) = \sum_{j=1}^{p}{c_j \vec{v}_j}
\]
Linear dynamical systems in multiple dimensions
Dynamics are just multiplying: \[\begin{eqnarray}
\vec{x}(t) & = & \mathbf{b}\vec{x}(t-1)\\
& = & \mathbf{b}^t \vec{x}(0)
\end{eqnarray}\]
In-class exercise: Write \(\vec{x}(t)\) in terms of the \(c_j\), the \(\lambda_j\) and \(\vec{v}_j\)
Linear dynamical systems in multiple dimensions
- Dynamics are just multiplying: \[\begin{eqnarray}
\vec{x}(t) & = & \mathbf{b}\vec{x}(t-1)\\
& = & \mathbf{b}^t \vec{x}(0)\\
& = & \sum_{j=1}^{p}{ \lambda^t_j c_j \vec{v}_j}
\end{eqnarray}\]
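A numerical check of this formula, using an arbitrary diagonalizable matrix `m` (any matrix whose eigenvectors form a basis would do):

m <- matrix(c(0.8, 0.2, 0.2, 0.8), nrow = 2)  # illustrative matrix with real eigenvalues
eig <- eigen(m)
x0 <- c(1, -3)
cs <- solve(eig$vectors, x0)        # coefficients c_j of x(0) in the eigenbasis
steps <- 10
xt <- x0
for (i in 1:steps) xt <- m %*% xt   # direct iteration: m^t x(0)
xt
eig$vectors %*% (eig$values^steps * cs)   # sum_j lambda_j^t c_j v_j -- should match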
Eigenvalues determine the dynamics of a linear system
The easy case: all eigenvalues \(\lambda_1, \ldots \lambda_p\) are real
- \(\lambda_j > 1\): grow along that direction
- \(0 \leq \lambda_j < 1\): shrink along that direction towards the origin \(\vec{0}\)
- \(\lambda_j < -1\): flip around the origin, grow in that direction
- \(-1 < \lambda_j \leq 0\): flip around the origin and shrink
Eigenvalues determine the dynamics of a linear system
- Some eigenvalues can be complex
    - The corresponding coefficients are also complex
- These always come in complex-conjugate pairs
    - so the coefficients are complex-conjugate pairs
- The formula \(\vec{x}(t) = \sum_{j=1}^{p}{\lambda^t_j c_j \vec{v}_j}\) still works
    - the imaginary parts always cancel exactly
- Complex eigenvalues \(\Leftrightarrow\) rotation
Rotation with complex eigenvalues
b
## [,1] [,2]
## [1,] 0.99 0.01
## [2,] -0.01 0.99
eigen(b)
## eigen() decomposition
## $values
## [1] 0.99+0.01i 0.99-0.01i
##
## $vectors
## [,1] [,2]
## [1,] 0.0000000-0.7071068i 0.0000000+0.7071068i
## [2,] 0.7071068+0.0000000i 0.7071068+0.0000000i
Mod(eigen(b)$values)
## [1] 0.9900505 0.9900505
Rotation with complex eigenvalues
(x <- matrix(c(1, 2), nrow = 2))
## [,1]
## [1,] 1
## [2,] 2
b %*% x
## [,1]
## [1,] 1.01
## [2,] 1.97
b %*% b %*% x
## [,1]
## [1,] 1.0196
## [2,] 1.9402
Rotation with complex eigenvalues
(Plot: the trajectory \(\vec{x}(t) = \mathbf{b}^t \vec{x}(0)\) rotates around the origin while slowly shrinking, since both eigenvalues have modulus \(\approx 0.99 < 1\).)
Rotation with complex eigenvalues
Reset \(\mathbf{b}^{\prime} = \mathbf{b}/|\lambda_1|\) so now both eigenvalues have modulus 1
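In R (with `b` the \(2\times 2\) matrix shown a few slides back), the rescaling is one line, and both rescaled eigenvalues then have modulus 1:

b.prime <- b / Mod(eigen(b)$values[1])   # divide by |lambda_1|
Mod(eigen(b.prime)$values)               # both equal to 1 (up to rounding)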
What’s going on here?
- Having both \(x_1(t-1)\) and \(x_2(t-1)\) is like having \(x(t-1)\) and \(x(t-2)\)
- Higher-order ARs are like VARs with extra variables to keep track of past states
Morals on linear, deterministic dynamical systems
- Find the eigenvalues and eigenvectors
- \(|\lambda_j| < 1 \Rightarrow\) exponential decay along \(\vec{v}_j\)
- \(|\lambda_j| > 1 \Rightarrow\) exponential growth
- \(|\lambda_j| = 1 \Rightarrow\) eternal recurrence
- \(\mathrm{Im}(\lambda_j) \neq 0 \Rightarrow\) rotations in the space spanned by those eigenvectors
- Higher-order dependence on the past \(\Rightarrow\) first-order dependence with extra memory variables
Adding on noise
- If all the eigenvalues have modulus less than 1, \(\mathbf{b}\vec{X}(t-1)\) has less variance than \(\vec{X}(t-1)\)
- The innovation contributes some extra variance
- To be stationary, the shrinkage has to exactly balance the new variance
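For example, in the AR(1) with \(|b|<1\): \(\epsilon(t)\) is uncorrelated with \(X(t-1)\), so a stationary variance \(\sigma^2_X = \Var{X(t)} = \Var{X(t-1)}\) must satisfy
\[
\sigma^2_X = b^2 \sigma^2_X + \Var{\epsilon(t)} \Rightarrow \sigma^2_X = \frac{\Var{\epsilon(t)}}{1-b^2}
\]
i.e., the innovation variance exactly makes up for the shrinkage by the factor \(b^2\).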
Summarizing
- Autoregressive models are linear dynamical systems plus noise
- Linear dynamics are governed by the eigenvalues and eigenvectors of the linear operator
- The noise perturbs away from the exact linear dynamics
- Stationarity means that the perturbations have to counter-balance the dynamics
- What I haven’t told you: how to estimate these models
Backup: VAR(1) vs. 2nd-order dynamics
Remember that sine and cosine waves obey \(\frac{d^2 x}{dt^2}(t) = - \omega^2 x(t)\)
Say \(x_1(t)\) is a sine wave with \(\omega = 1\), and define \(x_2(t) = dx_1/dt\)
\[
\left[ \begin{array}{c} \frac{dx_1}{dt} \\ \frac{dx_2}{dt} \end{array}\right] = \left[ \begin{array}{c} x_2 \\ -x_1 \end{array}\right] = \left[\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right] \left[\begin{array}{c} x_1 \\ x_2 \end{array}\right]
\]
Backup: VAR(1) vs. 2nd-order dynamics
Fix a small time increment \(h\)
\[
\left[ \begin{array}{c} \frac{x_1(t) - x_1(t-h)}{h} \\ \frac{x_2(t)-x_2(t-h)}{h} \end{array}\right] = \left[ \begin{array}{c} x_2(t-h) \\ -x_1(t-h) \end{array}\right]
\]
Backup: VAR(1) vs. 2nd-order dynamics
\[
\left[ \begin{array}{c} x_1(t) - x_1(t-h) \\ x_2(t)-x_2(t-h) \end{array}\right] = \left[ \begin{array}{c} hx_2(t-h) \\ -hx_1(t-h) \end{array}\right]
\]
Backup: VAR(1) vs. 2nd-order dynamics
\[
\left[ \begin{array}{c} x_1(t) \\ x_2(t) \end{array}\right] = \left[ \begin{array}{c} x_1(t-h) + hx_2(t-h) \\ x_2(t-h) -hx_1(t-h) \end{array}\right]
\]
Backup: VAR(1) vs. 2nd-order dynamics
\[
\left[ \begin{array}{c} x_1(t) \\ x_2(t) \end{array}\right] = \left[ \begin{array}{cc} 1 & h \\ -h & 1 \end{array}\right] \left[\begin{array}{c} x_1(t-h) \\ x_2(t-h) \end{array}\right]
\]
Backup: VAR(1) vs. 2nd-order dynamics
- The extra variable \(x_2\) helps \(x_1\) keep track of where it is in its cycle
- Higher-order dynamics = first-order dynamics with extra variables
- Tweaking to have \(|\lambda| = 1\) gives an exact cycle
- Complex eigenvalues off the unit circle = cycle \(\times\) exponential growth or decay
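A quick check of those last two points in R (`h` is an arbitrary small time step):

h <- 0.01
b.euler <- matrix(c(1, -h, h, 1), nrow = 2)   # the update matrix above
Mod(eigen(b.euler)$values)                     # = sqrt(1 + h^2), slightly above 1
# so this discretized rotation spirals slowly outward; dividing b.euler by
# sqrt(1 + h^2) puts both eigenvalues on the unit circle and gives an exact cycle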
Backup: the intercept and re-centering
- In 1D: if \(x(t) = a + bx(t-1)\) and \(y(t)=x(t) - \frac{a}{1-b}\), then \(y(t) = by(t-1)\)
\[\begin{eqnarray}
x(t) & = & a+bx(t-1) ~ \text{(assumption)}\\
x(t) - c & = & b(x(t-1) - c) ~\text{(asserted to find right} ~ c)\\
-c & = & -a -bc ~\text{(subtract the 1st line from the 2nd)}\\
(1-b)c & = & a ~ \text{(re-arrange)}
\end{eqnarray}\]
\[\begin{eqnarray}
\vec{x}(t) & = & \vec{a} + \mathbf{b}\vec{x}(t-1)\\
\vec{x}(t) - \vec{c} & = & \mathbf{b}(\vec{x}(t-1)-\vec{c})\\
(\mathbf{I} - \mathbf{b})\vec{c} & = & \vec{a}
\end{eqnarray}\]
- Always has a solution \(\vec{c}\) if \((\mathbf{I}-\mathbf{b})\) is invertible
- or, more generally, if \(\vec{a}\) is in the column space of \(\mathbf{I}-\mathbf{b}\)
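In R, finding the centering vector \(\vec{c}\) is one call to `solve()`; the particular \(\vec{a}\) and \(\mathbf{b}\) below are placeholders:

a <- c(1, 2)
b <- matrix(c(0.5, 0.1, -0.2, 0.7), nrow = 2)
c.vec <- solve(diag(2) - b, a)   # solves (I - b) c = a
a + b %*% c.vec                  # reproduces c.vec: c is the fixed point of the map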