The Truth About Linear Regression

This is basically a compilation of the lecture notes I wrote when teaching 36-401, Modern Regression, in fall 2015. I offer it here on the chance that it might be of interest to those learning, or teaching, linear regression. There's no shortage of resources on that, but I have tried to present the subject as though statistics had made some progress since 1960. That means de-emphasizing bits of theory that rely on Gaussian noise and correctly-specified linear models, in favor of more computationally intensive but robust techniques. If anything, I did not go far enough.

The manuscript has some overlap with Advanced Data Analysis from an Elementary Point of View (especially that book's second chapter, "The Truth About Linear Regression"), but also a lot of new and lower-level material. Comments and (especially) corrections are appreciated.

---Cosma Shalizi

Current outline

  1. Optimal Prediction
  2. Introducing Statistical Modeling
  3. Simple Linear Regression Models, with Hints at Their Estimation
  4. The Method of Least Squares for Simple Linear Regression
  5. The Method of Maximum Likelihood for Simple Linear Regression
  6. Diagnostics and Modifications for Simple Regression
  7. Inference on Parameters
  8. Predictive Inference for the Simple Linear Model
  9. Interpreting Parameters after Transformation
  10. F-Tests, R^2, and Other Distractions
  11. Simple Linear Regression in Matrix Format
  12. Multiple Linear Regression
  13. Diagnostics and Inference for Multiple Linear Regression
  14. Polynomial and Categorical Regression
  15. Multicollinearity
  16. Tests and Confidence Sets
  17. Interactions
  18. Outliers and Influential Points
  19. Model Selection
  20. Review
  21. Weighted and Generalized Least Squares
  22. Variable Selection
  23. Trees
  24. The Bootstrap I
  25. The Bootstrap II

(Last text update: typo corrections, 6 May 2024)