References
Recommended Books
- Statistics for High-Dimensional Data: Methods, Theory and
Applications, by P. Bühlmann and S. van de Geer, Springer, 2011.
- Statistical Learning with Sparsity: The Lasso and Generalizations, by
T. Hastie, R. Tibshirani and M. Wainwright, Chapman & Hall, 2015.
- Introduction to High-Dimensional Statistics, by C. Giraud, Chapman &
Hall, 2015.
- Testing Statistical Hypotheses, by E. L. Lehmann and J. P. Romano, Springer,
3rd Edition, 2005.
- Asymptotic Statistics, by A. van der Vaart, Springer, 2000.
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Rigollet, P. (2015). High-Dimensional Statistics, Lecture Notes for the MIT
course 18.S997.
Lecture 1, Mon Aug 29
To read more about what I referred to as the "master theorem on the asymptotics
of parametric models," see these notes by Jon Wellner. In particular, I highly
recommend the excellent notes he wrote for the sequence of three classes on
theoretical statistics he has been teaching at the University of Washington.
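For orientation, the canonical result of this type, stated loosely (Wellner's
notes give the precise regularity conditions), is the asymptotic normality of
the maximum likelihood estimator: for a sufficiently smooth parametric model
with Fisher information $I(\theta_0)$,

$$\sqrt{n}\,(\hat{\theta}_n - \theta_0) \rightsquigarrow N\bigl(0,\ I(\theta_0)^{-1}\bigr),$$

so that the MLE is asymptotically efficient.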
Parameter consistency and central limit theorems for models with increasing
dimension d (but still d < n):
- Wasserman, L., Kolar, M. and Rinaldo, A. (2014). Berry-Esseen bounds for
estimating undirected graphs, Electronic Journal of Statistics, 8(1),
1188-1224.
- Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a
diverging number of parameters, The Annals of Statistics, 32(3),
928-961.
- Portnoy, S. (1984). Asymptotic Behavior of M-Estimators of p Regression
Parameters when p^2/n is Large. I. Consistency, The Annals of
Statistics, 12(4), 1298-1309.
- Portnoy, S. (1985). Asymptotic Behavior of M-Estimators of p Regression
Parameters when p^2/n is Large. II. Normal Approximation, The Annals of
Statistics, 13(4), 1403-1417.
- Portnoy, S. (1988). Asymptotic Behavior of Likelihood Methods
for Exponential Families when the Number of Parameters Tends to
Infinity, The Annals of Statistics, 16(1), 356-366.
Some central limit theorem results in increasing dimension (in the second mini we will
see more specialized and stronger results).
- Chernozhukov, V., Chetverikov, D. and Kato, K. (2016). Central
Limit Theorems and Bootstrap in High Dimensions, arXiv.
- Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on
dimension, Journal of Statistical Planning and Inference, 113,
385-402.
- Portnoy, S. (1986). On the central limit theorem in $R^p$ when
$p \rightarrow \infty$, Probability Theory and Related Fields,
73(4), 571-583.
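To give a sense of these results, stated very loosely: if $X_1, \ldots, X_n$
are independent mean-zero random vectors in $R^p$ satisfying suitable moment
and tail conditions, then, with $G \sim N(0, \Sigma)$ matching the covariance
of the $X_i$'s,

$$\sup_{t \in \mathbb{R}}\, \Bigl| P\Bigl(\max_{1 \le j \le p} \frac{1}{\sqrt{n}}\sum_{i=1}^n X_{ij} \le t\Bigr) - P\Bigl(\max_{1 \le j \le p} G_j \le t\Bigr) \Bigr| \to 0$$

even when $p$ grows exponentially in a power of $n$; see the Chernozhukov,
Chetverikov and Kato paper for the exact rates.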
Lecture 2, Wed Aug 31
Some references on concentration inequalities:
- Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Concentration Inequalities and Model Selection, by P. Massart, Springer Lecture
Notes in Mathematics, vol. 1896, 2007.
- The Concentration of Measure Phenomenon, by M. Ledoux, AMS, 2005.
- Concentration of Measure for the Analysis of Randomized Algorithms, by D. P.
Dubhashi and A. Panconesi, Cambridge University Press, 2012.
For a comprehensive treatment of sub-gaussian variables and processes (and more)
see:
- Metric Characterization of Random Variables and Random Processes, by V. V.
Buldygin and Yu. V. Kozachenko, AMS, 2000.
- Introduction to the non-asymptotic analysis of random matrices, by R. Vershynin,
Chapter 5 of: Compressed Sensing, Theory and Applications. Edited by Y. Eldar
and G. Kutyniok. Cambridge University Press, 210–268, 2012. pdf
References for Chernoff bounds for Bernoulli (and their multiplicative forms):
- Check out the Wikipedia page.
- A guided tour of Chernoff bounds, by T. Hagerup and C. Rüb, Information
Processing Letters, 33(6), 305-308, 1990.
- Chapter 4 of the book Probability and Computing: Randomized Algorithms and
Probabilistic Analysis, by M. Mitzenmacher and E. Upfal, Cambridge University
Press, 2005.
- The Probabilistic Method, 3rd Edition, by N. Alon and J. H. Spencer, Wiley,
2008, Appendix A.1.
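For convenience, here is one common multiplicative form: if $X_1, \ldots, X_n$
are independent Bernoulli random variables and $\mu = E[\sum_{i=1}^n X_i]$,
then for any $0 < \delta \le 1$

$$P\Bigl(\sum_{i=1}^n X_i \ge (1+\delta)\mu\Bigr) \le e^{-\mu\delta^2/3}
\quad \text{and} \quad
P\Bigl(\sum_{i=1}^n X_i \le (1-\delta)\mu\Bigr) \le e^{-\mu\delta^2/2}.$$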
Finally, here is the traditional bound on the mgf of a centered bounded random
variable (due to Hoeffding), implying that bounded centered variables are
sub-Gaussian. It should be compared to the proof given in class.
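The bound in question, usually called Hoeffding's lemma, reads: if
$X \in [a, b]$ almost surely and $E[X] = 0$, then for every
$\lambda \in \mathbb{R}$

$$E\bigl[e^{\lambda X}\bigr] \le \exp\Bigl(\frac{\lambda^2(b-a)^2}{8}\Bigr),$$

that is, $X$ is sub-Gaussian with variance proxy $(b-a)^2/4$.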
Lecture 4, Mon Sep 12
For an example of the improvement afforded by Bernstein versus Hoeffding, see
Theorem 7.1 of
- László Györfi, Michael Kohler, Adam Krzyżak and Harro Walk (2002). A
Distribution-Free Theory of Nonparametric Regression, Springer,
available here.
By the way, this is an excellent book.
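To recall the comparison: for i.i.d. random variables with mean $\mu$,
variance $\sigma^2$ and $|X_i - \mu| \le c$ almost surely, Hoeffding's
inequality gives

$$P\bigl(|\bar{X}_n - \mu| \ge t\bigr) \le 2\exp\Bigl(-\frac{n t^2}{2 c^2}\Bigr),$$

while Bernstein's inequality gives

$$P\bigl(|\bar{X}_n - \mu| \ge t\bigr) \le 2\exp\Bigl(-\frac{n t^2}{2\sigma^2 + 2 c t/3}\Bigr),$$

which is a substantial improvement whenever $\sigma^2 \ll c^2$.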
For details on the derivation of the concentration inequality for quadratic
forms of Gaussians, see
- Example 2.12 in Concentration Inequalities: A Nonasymptotic Theory of Independence, by S.
Boucheron, G. Lugosi and P. Massart, Oxford University Press, 2013.
- Lemma 1 in Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic
functional by model selection, Annals of Statistics, 28(5), 1302-1338.
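The chi-square special case of their Lemma 1 is worth committing to memory: if
$Z \sim \chi^2_d$, then for every $x > 0$

$$P\bigl(Z \ge d + 2\sqrt{dx} + 2x\bigr) \le e^{-x}
\quad \text{and} \quad
P\bigl(Z \le d - 2\sqrt{dx}\bigr) \le e^{-x}.$$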
For the Hanson-Wright inequality, see
- Rudelson, M. and Vershynin, R. (2013). Hanson-Wright inequality and
sub-gaussian concentration, Electron. Commun. Probab.,
18(82), 1-9.
I strongly encourage you to read the paper!
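Their version of the inequality reads: if $X = (X_1, \ldots, X_n)$ has
independent mean-zero components with $\max_i \|X_i\|_{\psi_2} \le K$ and $A$
is an $n \times n$ matrix, then for every $t \ge 0$

$$P\bigl(|X^T A X - E[X^T A X]| > t\bigr)
\le 2\exp\Bigl(-c\,\min\Bigl(\frac{t^2}{K^4\|A\|_F^2},\ \frac{t}{K^2\|A\|}\Bigr)\Bigr),$$

where $\|A\|_F$ and $\|A\|$ are the Frobenius and operator norms and $c > 0$
is a universal constant.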
Lecture 7, Wed Sep 21
To read up on matrix concentration inequalities, I recommend:
- Tropp, J. (2012). User-friendly tail bounds for sums of random matrices,
Found. Comput. Math., Vol. 12, num. 4, pp. 389-434.
- Tropp, J. (2015). An Introduction to Matrix Concentration Inequalities,
Found. Trends Mach. Learning, Vol. 8, num. 1-2, pp. 1-230.
An excellent paper on the linear regression model. Recall: in practice you can
almost never justify the assumption of linearity, and the X's are random!
- Andreas Buja, Richard Berk, Lawrence Brown, Edward George,
Emil Pitkin, Mikhail Traskin, Linda Zhao and Kai Zhang (2015).
Models as Approximations — A Conspiracy of Random Regressors and Model
Deviations Against Classical Inference in Regression. pdf
Lecture 9, Wed Sep 28
To read about ridge regression and lasso-type estimators, a good reference is
- Knight, K. and Fu, W. (2000). Asymptotics for lasso-type
estimators, The Annals of Statistics, 28(5), 1356-1378.
About uniqueness of the lasso (and other interesting properties):
- Tibshirani, R. (2013). The lasso problem and uniqueness, Electronic Journal
of Statistics, 7, 1456-1490.
For the use of cross-validation in selecting the lasso parameter see:
- Homrighausen, D. and McDonald, D. (2013). The lasso, persistence, and
cross-validation, Proceedings of the 30th International Conference
on Machine Learning, JMLR W&CP, 28. pdf
- Homrighausen, D. and McDonald, D. (2013b). Risk consistency of
cross-validation with Lasso-type procedures,
arXiv:1308.0810.
- Chatterjee, S. and Jafarov, J. (2015). Prediction error of
cross-validated Lasso,
arXiv:1502.06291.
- Chetverikov, D. and Liao, Z. (2016). On cross-validated Lasso,
arXiv:1605.02214.
And for the one standard error rule, which seems to work well in
practice (but apparently has no theoretical justification), see
these lectures by Ryan Tibshirani:
pdf
and pdf.
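Since the rule itself is one line of logic, here is a minimal sketch in Python
(the function name and its inputs, the per-lambda cross-validation means and
standard errors, are illustrative and not tied to any particular library):

```python
import numpy as np

def one_se_rule(lambdas, cv_mean, cv_se):
    """One standard error rule: among all lambdas whose CV error is within
    one standard error of the best CV error, pick the largest lambda,
    i.e. the most heavily regularized (most parsimonious) model."""
    lambdas = np.asarray(lambdas, dtype=float)
    cv_mean = np.asarray(cv_mean, dtype=float)
    cv_se = np.asarray(cv_se, dtype=float)
    i_best = np.argmin(cv_mean)               # lambda minimizing CV error
    cutoff = cv_mean[i_best] + cv_se[i_best]  # one-SE band above the minimum
    return lambdas[cv_mean <= cutoff].max()   # largest lambda inside the band

# Toy usage: CV error is minimized at lambda = 0.1, but lambda = 1.0 is
# within one SE of the minimum, so the rule selects 1.0.
print(one_se_rule([0.01, 0.1, 1.0, 10.0],
                  [0.52, 0.50, 0.53, 0.70],
                  [0.05, 0.05, 0.05, 0.05]))
```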
Lecture 10, Wed Oct 5
For further references on rates for the lasso, restricted eigenvalue conditions,
oracle inequalities, etc., see
- Statistics for High-Dimensional Data: Methods, Theory and
Applications, by P. Bühlmann and S. van de Geer, Springer, 2011. Chapters 6
and 7.
- Belloni, A., Chernozhukov, V. and Hansen, C. (2010). Inference for
High-Dimensional Sparse Econometric Models, Advances in Economics and
Econometrics, ES World Congress 2010, arXiv link.
- Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous
analysis of Lasso and Dantzig selector,
The Annals of Statistics, 37(4), 1705-1732.
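To preview the flavor of these results, stated informally and with constants
suppressed: if the design satisfies a restricted eigenvalue condition with
constant $\kappa > 0$ over the cone $\{v : \|v_{S^c}\|_1 \le 3\|v_S\|_1\}$,
where $S$ is the true support with $|S| = s$, then choosing
$\lambda \asymp \sigma\sqrt{\log p / n}$ the lasso solution $\hat{\beta}$
satisfies, with high probability,

$$\frac{1}{n}\bigl\|X(\hat{\beta} - \beta^*)\bigr\|_2^2 \lesssim \frac{s\,\sigma^2 \log p}{\kappa^2 n}
\quad \text{and} \quad
\bigl\|\hat{\beta} - \beta^*\bigr\|_1 \lesssim \frac{s\,\sigma}{\kappa^2}\sqrt{\frac{\log p}{n}}.$$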
Someone asked about references for selective inference. Here is a nicely
compiled list of papers from the website of WHOA-PSI 2016, a very recent
conference on this topic.