Markov Chains I

36-467/36-667

13 November 2018

\[ \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Prob}[1]{\mathbb{P}\left[ #1 \right]} \newcommand{\Probwrt}[2]{\mathbb{P}_{#1}\left( #2 \right)} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Expectwrt}[2]{\mathbb{E}_{#1}\left[ #2 \right]} \newcommand{\InitDist}{p_{\mathrm{init}}} \newcommand{\InvDist}{p^{*}} \]

In our previous episodes…

Markov Processes

The Markov property between two extremes

Markov processes are generative models

# Simulate a generic Markov process
# Inputs: number of steps (n)
#   function drawing the initial state (rinitial)
#   function drawing the next state given the current one (rtransition)
# Output: vector containing the trajectory
rmarkov <- function(n, rinitial, rtransition) {
  x <- vector(length=n)
  x[1] <- rinitial()
  for (t in 2:n) {
    x[t] <- rtransition(x[t-1])  # next state depends only on the current state
  }
  return(x)
}
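
For instance, a Gaussian random walk is a Markov process: each step adds fresh noise to the current state. A minimal sketch (these particular rinitial and rtransition choices are illustrative, not from the slides):

rw <- rmarkov(n=100,
              rinitial=function() { rnorm(1) },
              rtransition=function(x) { x + rnorm(1) })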

Markov Chains

Graph vs. matrix

[Transition diagram of the two-state chain] \(\Leftrightarrow \mathbf{q}=\left[\begin{array}{cc} 0.5 & 0.5 \\ 0.75 & 0.25 \end{array} \right]\)
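
Entry \(i,j\) of the transition matrix gives the probability of moving from state \(i\) to state \(j\) in one step,

\[ q_{ij} = \Prob{X(t+1)=j \mid X(t)=i} \]

so every row of \(\mathbf{q}\) sums to 1.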

Your Basic Markov Chain

# Simulate a finite-state Markov chain
# Inputs: number of steps (n)
#   initial distribution over the k states (p0)
#   k*k transition matrix (q)
# Output: vector of states, coded 1:k
rmarkovchain <- function(n, p0, q) {
  k <- length(p0)
  # q must be square, match p0 in size, and have rows summing to 1
  stopifnot(k==nrow(q), k==ncol(q), all.equal(rowSums(q), rep(1,times=k)))
  rinitial <- function() { sample(1:k, size=1, prob=p0) }
  rtransition <- function(x) { sample(1:k, size=1, prob=q[x,]) }
  return(rmarkov(n, rinitial, rtransition))
}
q <- matrix(c(0.5, 0.5, 0.75, 0.25), byrow=TRUE, nrow=2)
x <- rmarkovchain(1e4,c(0.5,0.5),q)
head(x)
## [1] 1 1 1 2 1 2

Checking that it works

ones <- which(x[-1e4]==1)  # Why omit the last step?
twos <- which(x[-1e4]==2)
signif(table(x[ones+1])/length(ones),3)
## 
##     1     2 
## 0.499 0.501
signif(table(x[twos+1])/length(twos),3)
## 
##     1     2 
## 0.752 0.248

vs. the ideal \((0.5,0.5)\) and \((0.75, 0.25)\), the rows of \(\mathbf{q}\)

Why is this a check?

Each row of \(\mathbf{q}\) is the distribution of the next state conditional on the current one, so by the law of large numbers the conditional sample frequencies should converge on the corresponding rows of \(\mathbf{q}\).

How trajectories evolve

Where trajectories end in the long run

How distributions evolve
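
Writing \(p(t)\) for the distribution of \(X(t)\) as a row vector, the distribution evolves by \(p(t+1) = p(t) \mathbf{q}\). A minimal sketch of iterating this (variable names illustrative):

p <- c(0.5, 0.5)  # any initial distribution
for (t in 1:20) {
  p <- p %*% q    # p(t+1) = p(t) q
}
p  # approaches the invariant distribution (0.6, 0.4) computed below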

More fun with eigenvalues and eigenvectors

Special properties of stochastic matrices

Irreducible, aperiodic Markov chains

Solution

Irreducible, aperiodic Markov chains

Invariant distributions
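
Treating distributions as row vectors, an invariant distribution \(\InvDist\) is one that the transition matrix leaves unchanged,

\[ \InvDist \mathbf{q} = \InvDist \]

i.e., \(\InvDist\) is a left eigenvector of \(\mathbf{q}\) with eigenvalue 1, normalized so that its entries sum to 1.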

Ergodicity and Weak Dependence

Sample frequencies vs. probabilities

eigen(t(q))
## eigen() decomposition
## $values
## [1]  1.00 -0.25
## 
## $vectors
##           [,1]       [,2]
## [1,] 0.8320503 -0.7071068
## [2,] 0.5547002  0.7071068
eigen(t(q))$vectors[,1]/sum(eigen(t(q))$vectors[,1])
## [1] 0.6 0.4
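
The normalized leading left eigenvector should thus be left unchanged by \(\mathbf{q}\); a quick sketch of the check (p.star is an illustrative name):

p.star <- eigen(t(q))$vectors[,1]
p.star <- p.star / sum(p.star)  # normalize to a probability distribution
p.star %*% q                    # should reproduce p.star, i.e., (0.6, 0.4)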

table(rmarkovchain(1e4,c(0.5,0.5),q))
## 
##    1    2 
## 6010 3990
table(rmarkovchain(1e4,c(0.5,0.5),q))
## 
##    1    2 
## 6003 3997
table(rmarkovchain(1e4,c(0,1),q))
## 
##    1    2 
## 6057 3943
table(rmarkovchain(1e4,c(1,0),q))
## 
##    1    2 
## 5953 4047

Central limit theorem

time.avgs <- replicate(100, mean(rmarkovchain(1e4, c(0.5, 0.5), q)))
qqnorm(time.avgs); qqline(time.avgs)
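
Under the invariant distribution, \(\Expect{X} = 0.6 \times 1 + 0.4 \times 2 = 1.4\), so the time averages should be centered there; a quick sketch of the check:

mean(time.avgs)  # should be close to 0.6*1 + 0.4*2 = 1.4
sd(time.avgs)    # spread shrinks like 1/sqrt(chain length)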

What if there’s more than one irreducible set?

What if the state space isn’t finite?

What if the state space is continuous?

Variations on the theme

Summary

Backup: Further reading

Backup: The philosophical origins of the Markov property

Backup: TL;DR on the origins of the Markov property

Backup: Markov chain Monte Carlo

  1. Set \(X(0)\) however we like, and initialize \(t \leftarrow 0\).
  2. Draw a proposal \(Z(t)\) from some conditional distribution \(r(\cdot|X(t))\), for instance a Gaussian or uniform centered on \(X(t)\), or anything else that is easy to draw from and has smooth enough noise.
  3. Draw \(U(t)\) independently from a uniform distribution on \([0,1)\).
  4. If \(U(t) < p(Z(t))/p(X(t))\), then set \(X(t+1) = Z(t)\); otherwise \(X(t+1)=X(t)\).
  5. Increase \(t\) by 1 and go to step 2.
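
A minimal sketch of these steps in R, reusing rmarkov from above; it assumes (as in step 2's examples) a symmetric proposal, which is what makes \(p(Z(t))/p(X(t))\) the right acceptance ratio. The function name, the Gaussian proposal, and the scale parameter are illustrative choices:

rmetropolis <- function(n, p, x0, scale=1) {
  rinitial <- function() { x0 }
  rtransition <- function(x) {
    z <- rnorm(1, mean=x, sd=scale)  # step 2: symmetric Gaussian proposal
    u <- runif(1)                    # step 3: independent uniform draw
    if (u < p(z)/p(x)) {             # step 4: accept the proposal...
      return(z)
    } else {
      return(x)                      # ... or keep the current state
    }
  }
  return(rmarkov(n, rinitial, rtransition))
}
# e.g., sampling from a standard Gaussian target:
# x <- rmetropolis(1e4, dnorm, x0=0)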

References

Basharin, Gely P., Amy N. Langville, and Valeriy A. Naumov. 2004. “The Life and Work of A. A. Markov.” Linear Algebra and Its Applications 386:3–26. http://langvillea.people.cofc.edu/MarkovReprint.pdf.

Cassirer, Ernst. 1944. An Essay on Man: An Introduction to a Philosophy of Human Culture. New Haven, Connecticut: Yale University Press.

Graham, Loren, and Jean-Michel Kantor. 2009. Naming Infinity: A True Story of Religious Mysticism and Mathematical Creativity. Cambridge, Massachusetts: Harvard University Press.

Grimmett, G. R., and D. R. Stirzaker. 1992. Probability and Random Processes. 2nd ed. Oxford: Oxford University Press.

Hacking, Ian. 1990. The Taming of Chance. Cambridge, England: Cambridge University Press.

Lasota, Andrzej, and Michael C. Mackey. 1994. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics. Berlin: Springer-Verlag.

Mackey, Michael C. 1992. Time’s Arrow: The Origins of Thermodynamic Behavior. Berlin: Springer-Verlag.

Metropolis, Nicholas, Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller. 1953. “Equation of State Calculations by Fast Computing Machines.” Journal of Chemical Physics 21:1087–92. https://doi.org/10.1063/1.1699114.

Meyn, S. P., and R. L. Tweedie. 1993. Markov Chains and Stochastic Stability. Berlin: Springer-Verlag.

Porter, Theodore M. 1986. The Rise of Statistical Thinking, 1820–1900. Princeton, New Jersey: Princeton University Press.