Advanced Data Analysis from an Elementary Point of View
This is a draft textbook on data analysis methods, intended for a
one-semester course for advanced undergraduate students who have already taken
classes in probability, mathematical statistics, and linear regression. It
began as the lecture notes for 36-402 at Carnegie Mellon
University.
By making this draft generally available, I am not promising to provide any
assistance or even clarification whatsoever. Comments are,
however, generally welcome.
The book is under contract to Cambridge
University Press; it should be turned over to the press when I can
manage (earlier target dates, from the end of 2013 through 2019, have
passed). A copy of the next-to-final version will
remain freely accessible here permanently.
What you're probably looking for
Complete draft in PDF
Directory of chapter-by-chapter R files for examples
Directory of data sets used in examples
Table of contents
I. Regression and Its Generalizations
- Regression Basics
- The Truth about Linear Regression
- Model Evaluation
- Smoothing in Regression
- Simulation
- The Bootstrap
- Splines
- Additive Models
- Testing Regression Specifications
- Weighting and Variance
- Logistic Regression
- Generalized Linear Models and Generalized Additive Models
- Classification and Regression Trees
II. Distributions and Latent Structure
- Density Estimation
- Principal Components Analysis
- Factor Models
- Mixture Models
- Graphical Models
III. Causal Inference
- Graphical Causal Models
- Identifying Causal Effects
- Estimating Causal Effects
- Discovering Causal Structure
IV. Dependent Data
- Time Series
- Simulation-Based Inference
Online-only Appendices
- Big O and Little o Notation
- Taylor Expansions
- Propagation of Error, and Standard Errors for Derived Quantities
- Optimization
- Relative Distributions and Smooth Tests of Goodness of Fit
- Nonlinear Dimensionality Reduction
- Rudimentary Graph Theory
- Missing Data
- Writing R Functions
Data-Analysis Assignments
Planned changes
- Remove redundant versions of the data-analysis assignments; provide solutions as a separate document through publisher
- Unified treatment of information theory as an appendix
- Improved (=correct) treatment of nonparametric instrumental variables
- Trim time-series chapter so it's less of a catalog of everything that might be useful
- Break out stuff on heuristic essential asymptotics as a separate appendix
- Make sure notation is consistent throughout: insist that vectors are
always matrices, or use more geometric notation?
- Figure out how to cut at least 50 pages
- Index: currently (21 March 2021) done for selected (i.e., shorter) chapters, but not proofed or unified; need to do this for all chapters and then go back to fix on style, terms, etc.
(Text last updated 15 February 2024; this page last updated 15 January 2024)