Lessons (?) for causal discovery from Markov models

Cosma Shalizi

22 September 2018

\[ \newcommand{\indep}{\perp} \]

Two pieces of conventional wisdom

Causal discovery in multivariate data

The classic view (Spirtes, Glymour, and Scheines 1993)

This seems fine

Consistent conditional independence tests

So what’s the problem?

Brief excursion into information theory

The point of the information theory

Let’s talk about Markov models

More conditional independence

Focus on learning the Markov order

More information theory

Likelihood ratio to the rescue?

A general result

What’s the intuition?

Nonetheless, there is consistency

A closer look at one of those approaches

(continued)

(continued)

(continued)

Why is the Peres-Shields estimator consistent?

A little more about variable-length chains

Summing up

What’s the moral?

(original)

References

Billingsley, Patrick. 1961. Statistical Inference for Markov Processes. Chicago: University of Chicago Press.

Cappé, Olivier, Eric Moulines, and Tobias Rydén. 2005. Inference in Hidden Markov Models. New York: Springer.

Csiszár, Imre, and Paul C. Shields. 2000. “The Consistency of the BIC Markov Order Estimator.” Annals of Statistics 28:1601–19. http://projecteuclid.org/euclid.aos/1015957472.

Csiszár, Imre, and Zsolt Talata. 2006. “Context Tree Estimation for Not Necessarily Finite Memory Processes, via BIC and MDL.” IEEE Transactions on Information Theory 52:1007–16. https://doi.org/10.1109/TIT.2005.864431.

Foulkes, J. D. 1959. “A Class of Machines Which Determine the Statistical Structure of a Sequence of Characters.” In Wescon Convention Record, 4:66–73. Institute of Radio Engineers.

Galves, Antonio, and Florencia Leonardi. 2008. “Exponential Inequalities for Empirical Unbounded Context Trees.” In In and Out of Equilibrium 2, edited by V. Sidoravicius and M. E. Vares, 257–69. Basel: Birkhäuser. https://doi.org/10.1007/978-3-7643-8786-0_12.

Palomar, Daniel P., and Sergio Verdú. 2008. “Lautum Information.” IEEE Transactions on Information Theory 54:964–75. http://www.princeton.edu/~verdu/lautum.info.pdf.

Peres, Yuval, and Paul Shields. 2005. “Two New Markov Order Estimators.” arxiv:math.ST/0506080. http://arxiv.org/abs/math.ST/0506080.

Robins, James M., Richard Scheines, Peter Spirtes, and Larry Wasserman. 2003. “Uniform Consistency in Causal Inference.” Biometrika 90:491–515. http://www.stat.cmu.edu/tr/tr725/tr725.html.

Shalizi, Cosma Rohilla, and Kristina Lisa Klinkner. 2004. “Blind Construction of Optimal Nonlinear Recursive Predictors for Discrete Sequences.” In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (Uai 2004), edited by Max Chickering and Joseph Y. Halpern, 504–11. Arlington, Virginia: AUAI Press. http://arxiv.org/abs/cs.LG/0406011.

Shannon, Claude E. 1948. “A Mathematical Theory of Communication.” Bell System Technical Journal 27:379–423.

Spirtes, Peter, Clark Glymour, and Richard Scheines. 1993. Causation, Prediction, and Search. 1st ed. Berlin: Springer-Verlag.