The Statistical Analysis of Complex Systems Models

Cosma Rohilla Shalizi

Attention conservation notice (November 2018): I started working on this book in 2007--2008, but the associated course hasn't been offered since then, and I haven't really worked on it in a decade. I'd like to come back to it at some point, but I can't make any promises about when. In the meanwhile, some of the planned material has been incorporated into Advanced Data Analysis from an Elementary Point of View, and into my lecture notes for Data over Space and Time. I leave this page up, more or less as it was ten years ago, mostly to help remind me that writing the book would be a Good Thing. I have no suggestions for a substitute book that you could read instead.

Why This Book?

Complex systems are ones with a large effective number of strongly-interdependent variables. This excludes both low-dimensional systems, and high-dimensional ones where the variables are either independent, or so strongly coupled that only a few variables effectively determine all the rest. Since the 1980s, an interdisciplinary movement of physicists, mathematicians, economists, computer scientists, biologists, anthropologists and other scientists has explored techniques for modeling a broad range of such systems, and their common features and inter-connections. These techniques rely heavily on intensive, sophisticated computer simulations, and notions of information, search and adaptation feature prominently in the theories.

Complex systems can now point to a solid record of scientific accomplishment, improving our understanding of processes ranging from pattern formation in chemical oscillators and metabolic networks to ecological succession, Balinese agriculture, and the persistence of concentrated poverty in wealthy societies. Someone wishing to assimilate these results can now find reasonable textbooks on the construction of such models, as well as on the mathematical foundations of the complex systems approach, and a range of excellent specialized monographs.

What is not available, either in books or in the journal literature, is any systematic treatment of the statistical analysis of these models: that is, how to fit, test, compare and otherwise evaluate these models in the light of data from the real world. Within complex systems, it is increasingly recognized that confronting models with data is crucial to further progress, but almost no one in the field has been trained in modern methods of statistics, which has evolved considerably beyond fitting straight lines subject to independent additive Gaussian noise. Not so coincidentally, the period during which the field of complex systems developed was also the period during which statistical theory coalesced with machine learning, to develop powerful methods for reliably inferring models with large numbers of variables which interact in complex, nonlinear ways. The reason this is not a coincidence is that the new statistical learning is also founded on the mathematical theories of information and search, and its application is also completely reliant on cheap, high-speed computing.

Put slightly differently, there are two essential components to statistical analysis: there must be a class of stochastic models, and inferential procedures for linking the models to data. The new statistical learning theory has developed a range of such procedures, as well as general principles for evaluating their reliability and performance. What complex systems can provide is, precisely, interesting stochastic models of important phenomena. Many of the main complex systems models fall under broad categories which are already familiar in statistics and machine learning (agent-based models can be seen, for instance, as interacting hidden Markov models), but with wrinkles and special features of intrinsic interest.

Complex systems models and statistical learning theory, then, are pretty much made for each other. The purpose of this book is to perform an introduction.

Why This Page?

Currently, this is a place-holder, intended to nag me to work on the manuscript more regularly. (I was supposed to deliver it to the publisher in the fall of 2009. Obviously, that didn't happen.) Portions of it will draw from earlier manuscripts on "Methods and Techniques of Complex Systems Science" and lecture notes on basic probability, statistics and stochastic processes.

Contents

  1. Introduction
  2. General ideas of statistical learning and data-mining, including cross-validation and bootstrapping
  3. Information theory, hypothesis testing, large deviations principle
  4. Graphical models and conditional independence
  5. Geometric view of statistical inference, including maximum-likelihood estimation and the EM algorithm
  6. More advanced theory of statistical learning, emphasizing structural risk minimization and process-oriented evaluation
  7. Using simulations: Monte Carlo and indirect inference
  8. Power laws and other heavy-tailed distributions
  9. Time series analysis, prediction, state estimation
  10. Symbolic dynamics, discrete time series, and the construction of optimal nonlinear models
  11. Network models: structure
  12. Cellular automata
  13. Network models: dynamics
  14. Agent-based models
  15. General issues in evaluating complex systems models
  16. Complexity measures
  17. Appendix: guide to further reading
  18. Appendix: review of basics of probability, stochastic processes, and statistical procedures

Why That Image?

Because I really like Bosch; and, while a lot of strange stuff is happening in the Garden of Earthly Delights, not only is it possible to understand what's going on, it looks like it's fun. (By the end of the writing project I might feel more like using Melencolia I as an emblem.) The publisher, however, seems to prefer more subdued cover designs.
Page made 2 December 2007; updated 5 May 2008; updated again 12 November 2018.