References
Books, Lecture Notes, Expository Articles
- Concentration Inequalities: A Nonasymptotic Theory of
Independence, by S. Boucheron, G. Lugosi and P. Massart, Oxford
University Press, 2013.
- Concentration Inequalities and Model Selection, by P. Massart,
Springer Lecture Notes in Mathematics, vol. 1896, 2007.
- The Concentration of Measure Phenomenon, by M. Ledoux, 2005,
AMS.
- Lecture notes on concentration of measure
inequalities, by G. Lugosi, pdf.
- A Probabilistic Theory of Pattern Recognition, by L. Devroye,
L. Györfi and G. Lugosi, Springer, 1996. pdf
- Concentration of Measure Inequalities in Information Theory,
Communications and Coding, by M. Raginsky and I. Sason, Foundations
and Trends in Communications and Information Theory, 2014.
- A New Look at Independence -- Special Invited Paper, by M.
Talagrand, the Annals of Probability, 24(1), 1--34, 1996.
- Concentration of Measure for the Analysis of Randomized
Algorithms, by D. P. Dubhashi and A. Panconesi, Cambridge
University Press, 2012.
- Concentration, by C. McDiarmid. In M. Habib, C. McDiarmid,
J. Ramirez-Alfonsin and B. Reed, editors, Probabilistic Methods
for Algorithmic Discrete Mathematics, 195--248, Springer, 1998.
- Introduction to the non-asymptotic analysis of random matrices,
by R. Vershynin. In: Compressed Sensing: Theory and Applications,
eds. Y. Eldar and G. Kutyniok, Cambridge University Press, 2012.
- Measure Concentration, Lecture Notes for Math 710, by
Alexander Barvinok, 2005. pdf
- Probability Theory and Combinatorial Optimization,
by M. Steele, CBMS Regional Conference Series in
Applied Mathematics, vol. 69, 1997.
Lecture 1, Tue Sep 1
Some applications of concentration inequalities in high-dimensional statistics:
- Theorem 1 in M. J. Wainwright (2009), Sharp thresholds for noisy
and high-dimensional recovery of sparsity using $\ell_1$-constrained
quadratic programming (Lasso), IEEE Transactions on Information
Theory, 55:2183--2202.
- Theorem 5.2 in A simple proof of the restricted isometry property
for random matrices, by R. Baraniuk, M. Davenport, R. DeVore and
M. Wakin, Constructive Approximation, 28(3), 253-263.
- For the covariance estimation example, see HW1.
Parameter consistency and central limit theorems for models with increasing
dimension d (but still d < n):
- Wasserman, L, Kolar, M. and Rinaldo, A. (2014). Berry-Esseen bounds for
estimating undirected graphs, Electronic Journal of Statistics, 8(1),
1188-1224.
- Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a
diverging number of parameters, the Annals of Statistics, 32(3),
928-961.
- Portnoy, S. (1984). Asymptotic Behavior of M-Estimators of p
Regression Parameters when p^2/n is Large. I. Consistency, the
Annals of Statistics, 12(4), 1298--1309.
- Portnoy, S. (1985). Asymptotic Behavior of M Estimators of p Regression Parameters
when p^2/n is Large; II. Normal Approximation, the Annals of
Statistics, 13(4), 1403-1417.
- Portnoy, S. (1988). Asymptotic Behavior of Likelihood Methods
for Exponential Families when the Number of Parameters Tends to
Infinity, the Annals of Statistics, 16(1), 356-366.
Some central limit theorem results in increasing dimension (in the second mini we will
see more specialized and stronger results).
- Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on
dimension, Journal of Statistical Planning and Inference, 113,
385-402.
- Portnoy, S. (1986). On the central limit theorem in $\mathbb{R}^p$
when $p \rightarrow \infty$, Probability Theory and Related Fields,
73(4), 571-583.
Lecture 2, Thu Sep 3
For those interested in the mathematical theory of the concentration of
measure, these are good references:
- The Concentration of Measure Phenomenon, by M. Ledoux, 2005,
AMS.
- Measure Concentration, Lecture Notes for Math 710, by
Alexander Barvinok, 2005. pdf
- Concentration of measure, lecture notes by N. Berestycki and R.
Nickl, pdf
Lecture 3, Tue Sep 8
Most of the material on sub-gaussianity was taken from here:
- Omar Rivasplata, Subgaussian random variables: An expository note,
Sections 1, 2 and 3. pdf
- Lecture 6 of the course "Machine learning and applications", by Z.
Harchaoui, J. Mairal and J. Salmon, pdf
More comprehensive treatments of sub-gaussian variables and processes
(and more) are:
- Metric Characterization of Random
Variables and Random Processes, by V. V. Buldygin, AMS, 2000.
- Introduction to the non-asymptotic analysis of random matrices, by R.
Vershynin, Chapter 5 of: Compressed Sensing, Theory and Applications.
Edited by Y. Eldar and G. Kutyniok. Cambridge University Press,
210–268, 2012. pdf
References for Chernoff bounds for Bernoulli (and their multiplicative
forms):
- A guided tour of Chernoff bounds, by T. Hagerup and C. Rüb,
Information Processing Letters, 33(6), 305--308, 1990.
- Chapter 4 of the book Probability and Computing: Randomized
Algorithms and Probabilistic Analysis, by M. Mitzenmacher and
E. Upfal, Cambridge University Press, 2005.
- The Probabilistic Method, 3rd Edition, by N. Alon and J. H.
Spencer, Wiley, 2008, Appendix A.1.
Improvement of Hoeffding's inequality by Berend and Kontorovich:
- On the concentration of the missing mass, by D. Berend and A.
Kontorovich, Electron. Commun. Probab., 18(3), 1--7, 2013.
- Section 2.2.4 in Raginsky's monograph (see references at the
top).
Example of how the relative or multiplicative version of Chernoff
bounds will lead to substantial improvements:
- Minimax-optimal classification with dyadic decision trees, by
C. Scott and R. Nowak, IEEE Transactions on Information Theory,
52(4), 1335-1353, 2006.
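As a numerical illustration of why the multiplicative form helps, here is a small sketch (all parameter values are illustrative choices) comparing the additive Hoeffding bound with a standard multiplicative Chernoff bound for a Binomial tail in the rare-event regime:

```python
import math

# S ~ Binomial(n, p); we bound P(S >= mu + t) with mu = n*p.
# Additive Hoeffding:      P(S - mu >= n*eps)   <= exp(-2*n*eps**2)
# Multiplicative Chernoff: P(S >= (1+delta)*mu) <= exp(-mu*delta**2/3), delta in (0, 1]
n, p = 10_000, 0.001        # rare events: mu = 10
t = 10                      # deviation of 10 above the mean

eps = t / n                 # additive deviation per sample: 0.001
delta = t / (n * p)         # relative deviation: 1.0

hoeffding = math.exp(-2 * n * eps**2)          # exp(-0.02): vacuous
chernoff = math.exp(-(n * p) * delta**2 / 3)   # exp(-10/3): informative

print(f"Hoeffding: {hoeffding:.4f}, Chernoff: {chernoff:.4f}")
```

The additive bound cannot exploit the fact that the mean is tiny, while the relative bound scales with the mean; this is the kind of improvement exploited in analyses like the dyadic decision tree paper above.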
Lecture 4, Thu Sep 10
For more information on sub-exponential variables, consult:
- Metric Characterization of Random
Variables and Random Processes, by V. V. Buldygin, AMS, 2000.
Lecture 5, Tue Sep 15
Empirical Bernstein Inequality:
- Empirical Bernstein Bounds and Sample-Variance Penalization, by
A. Maurer and M. Pontil, COLT 2009.
Concentration of chi-squared variables:
- Lemma 1 in Adaptive estimation of a quadratic functional by model
selection, by B. Laurent and P. Massart, Annals of Statistics, 2000,
28(5), 1302-1338.
For classic proofs of Hoeffding, Bennett and Bernstein, see, e.g.,
- Chernoff, Hoeffding's and Bennett's Inequalities, a write-up by
Jimmy Jin, James Wilson and Andrew Nobel. pdf
For an example of how Bernstein's inequality is preferable to Hoeffding,
see Lemma 13 in
- Minimax Rates for Homology Inference, by S. Balakrishnan, A.
Rinaldo, D. Sheehy, A. Singh and L. Wasserman, 2011, AISTATS.
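A quick simulation sketch of the chi-squared tail bound in Lemma 1 of the Laurent–Massart paper above (the degrees of freedom, deviation level, and sample count below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, x, n_sim = 50, 3.0, 200_000   # illustrative choices

# Laurent-Massart, Lemma 1: for X ~ chi^2_d and any x > 0,
#   P(X - d >= 2*sqrt(d*x) + 2*x) <= exp(-x)
threshold = d + 2 * np.sqrt(d * x) + 2 * x
samples = rng.chisquare(d, size=n_sim)
empirical = (samples >= threshold).mean()
bound = np.exp(-x)

print(f"empirical tail: {empirical:.5f}  <=  bound: {bound:.5f}")
```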
Lecture 6, Thu Sep 17
Some references on JL Lemma and random projections:
- An Elementary Proof of a Theorem of Johnson and Lindenstrauss,
S. Dasgupta and A. Gupta, Random Structures & Algorithms, 22(1),
60-65, 2003.
- D. Achlioptas, Database-friendly random projections, Proc. 20th
ACM Symp. on Principles of Database Systems, 2001.
- P. Indyk and R. Motwani, Approximate nearest neighbors:
Towards removing the curse of dimensionality, Proc 30th Annu
ACM Symp Theory of Computing, Dallas, TX, 1998, pp. 604 – 613.
- The Fast Johnson–Lindenstrauss Transform and Approximate
Nearest Neighbors, N. Ailon and B. Chazelle, SIAM J.
Comput., 39(1), 302–322, 2009.
- Locality-sensitive hashing scheme based on p-stable distributions,
by M. Datar, N. Immorlica, P. Indyk and V. Mirrokni, Proceedings of
the twentieth annual Symposium on Computational Geometry (SCG '04),
253-262, 2004.
On embeddings of finite metric spaces (of which the JL Lemma is an example):
- Chapter 15 of Lectures on Discrete Geometry, by J.
Matousek, 2002, Springer.
A nice, dense monograph devoted to uses of random projections:
- The Random Projection Method, by S. Vempala, AMS, 2004.
A sharpening of the JL Lemma, based on empirical process theory:
- Empirical processes and random projections, by B. Klartag and S.
Mendelson, Journal of Functional Analysis, 225(1), 229--245, 2005.
JL Lemma and the RIP condition in compressed sensing:
- The Restricted Isometry Property and Its Implications for
Compressed Sensing, E. Candes, Comptes Rendus Mathematique,
Vol. 346, No. 9-10, 2008.
- Theorem 5.2 in A simple proof of the restricted isometry property
for random matrices, by R. Baraniuk, M. Davenport, R. DeVore and
M. Wakin, Constructive Approximation, 28(3), 253-263.
The JL Lemma and manifolds:
- Distance Preserving Embeddings for General n-Dimensional Manifolds, N.
Verma, JMLR, 14 (2013) 2415-2448.
- Tighter bounds for random projections of manifolds, K. Clarkson. Symposium on
Computational Geometry (SoCG), 24:39–49, 2008.
- Random projections of smooth manifolds, R. Baraniuk and M. Wakin,
Foundations of Computational Mathematics (FoCM), 9(1):51--77, 2009.
And finally, a nice reading course:
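A minimal simulation sketch of the JL Lemma with a Gaussian random projection (the constant 8 in the target dimension is a common rule of thumb, not a sharp constant, and all sizes are illustrative):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(42)
n_points, d, eps = 50, 10_000, 0.5   # illustrative choices

# target dimension k = O(log(n)/eps^2); entries N(0, 1/k), so that
# E ||P^T u||^2 = ||u||^2 for every fixed vector u
k = int(np.ceil(8 * np.log(n_points) / eps**2))
X = rng.normal(size=(n_points, d))
P = rng.normal(size=(d, k)) / np.sqrt(k)
Y = X @ P

# ratio of projected to original squared distances over all pairs
ratios = np.array([np.sum((Y[i] - Y[j])**2) / np.sum((X[i] - X[j])**2)
                   for i, j in combinations(range(n_points), 2)])
print(f"k = {k}, distortion range: [{ratios.min():.3f}, {ratios.max():.3f}]")
```

With these choices all pairwise squared distances are preserved up to a factor close to 1 +/- eps, even though k is far smaller than d.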
Lecture 7, Tue Sep 22
The DKW inequality with the best constant:
- Massart, P. (1990), "The tight constant in the
Dvoretzky–Kiefer–Wolfowitz inequality", The Annals of Probability,
18(3), 1269--1283.
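A simulation sketch of the DKW inequality with Massart's tight constant, computing the exact Kolmogorov–Smirnov statistic for uniform samples (sample size, threshold, and trial count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, eps, n_trials = 1_000, 0.05, 2_000   # illustrative choices

# DKW with the tight constant: P(sup_x |F_n(x) - F(x)| > eps) <= 2*exp(-2*n*eps^2)
bound = 2 * np.exp(-2 * n * eps**2)

exceed = 0
i = np.arange(1, n + 1)
for _ in range(n_trials):
    u = np.sort(rng.uniform(size=n))   # true CDF F is the identity on [0, 1]
    # exact KS statistic from the order statistics
    d_n = max(np.max(i / n - u), np.max(u - (i - 1) / n))
    exceed += d_n > eps

print(f"empirical: {exceed / n_trials:.4f}, bound: {bound:.4f}")
```

With sqrt(n)*eps about 1.58 the constant 2 is nearly attained, so the empirical frequency should be of the same order as the bound rather than far below it.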
Lecture 8, Thu Sep 24
Ale out of town.
Lecture 9, Tue Sep 29
References for martingale methods to get concentration inequalities:
- Colin McDiarmid, chapter titled Concentration, in Probabilistic
Methods for Algorithmic Discrete Mathematics, volume 16 of the
series Algorithms and Combinatorics, pp. 195-248.
- Chung, F. and Lu, L., Concentration inequalities and martingale
inequalities — a survey, 2006, pdf
- Section 2.2 in Raginsky's monograph (see references at the
top).
Interesting improvements:
- Rio, E. (2013). On McDiarmid's concentration inequality,
Electronic Communications in Probability, 18(44), 1-11.
- Kontorovich, A. (2014). Concentration in unbounded metric spaces and algorithmic stability.
ICML 2014: 28-36
- Sason, I. (2011). On Refined Versions of the Azuma-Hoeffding Inequality
with Applications in Information Theory, pdf
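A small sketch of the bounded-differences (McDiarmid) inequality applied to the sample mean of [0,1]-valued variables, where each coordinate has influence c_i = 1/n (all parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, t, n_trials = 500, 0.05, 20_000   # illustrative choices

# f(X_1,...,X_n) = sample mean of [0,1]-valued variables has bounded
# differences c_i = 1/n, so McDiarmid gives
#   P(f - E f >= t) <= exp(-2*t^2 / sum_i c_i^2) = exp(-2*n*t^2)
bound = np.exp(-2 * n * t**2)

X = rng.uniform(size=(n_trials, n))   # E f = 1/2 for Uniform(0, 1)
empirical = np.mean(X.mean(axis=1) - 0.5 >= t)

print(f"empirical: {empirical:.5f}, bound: {bound:.5f}")
```

For the sample mean this recovers exactly the one-sided Hoeffding bound; the point of McDiarmid's inequality is that the same argument applies to any function with small coordinate-wise influence.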
Lecture 10, Thu Oct 1
References for the entropy method:
- Ledoux, M. (1997). On Talagrand’s deviation inequalities for
product measures. ESAIM Probability
and Statistics, 1, 63–87.
- Massart, P. (2000a). About the constants in Talagrand’s
concentration inequalities for empirical processes. The Annals of
Probability, 28, 863–884.
- S. Boucheron, G. Lugosi and P Massart: On concentration of
self-bounding functions. Electronic Journal of Probability.14 (2009)
1884-1899.
- Maurer, A. (2006). Concentration inequalities for functions of
independent variables. Random Structures & Algorithms, 29,
121–138.
- S. Boucheron, G. Lugosi and P. Massart: Concentration
inequalities using the entropy method (2003), Annals of
Probability, 31, 1583-1614.
- S. Boucheron, G. Lugosi and P. Massart: A sharp concentration
inequality with applications, Random Structures & Algorithms,
16, 277-292, 2000.
Lecture 11, Tue Oct 6
References for Max's lecture:
- Arratia, Goldstein and Gordon, Two moments suffice for Poisson
approximations: the Chen-Stein method, 1989.
- Arratia, Goldstein and Gordon, Poisson Approximation and the
Chen-Stein Method, Statistical Science, 1990.
- Ledoux, M. (1997). On Talagrand's deviation inequalities for
product measures. ESAIM Probability and Statistics, 1, 63–87.
Lecture 12, Thu Oct 8
For today's lecture, the main references are:
- S. Boucheron, G. Lugosi and P. Massart: Concentration
inequalities using the entropy method (2003), Annals of
Probability, 31, 1583-1614.
- Maurer, A. (2006). Concentration inequalities for functions of
independent variables. Random Structures & Algorithms, 29,
121–138.
- A. Maurer (2012). Thermodynamics and concentration,
Bernoulli, 18(2), 434-454.
Lecture 13, Tue Oct 13
Same references as the previous lecture.
Lecture 13, Thu Oct 15
The basic references for matrix concentration inequalities are:
- Tropp, J. (2012). User-friendly tail bounds for sums of random
matrices, Found. Comput. Math., 12(4), 389-434.
- Tropp, J. (2015). An Introduction to Matrix Concentration
Inequalities, Found. Trends Mach. Learning, 8(1-2), 1-230,
and references therein.
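A simulation sketch of the matrix Bernstein inequality (in the form given in Tropp's survey) for a Rademacher series of rank-one matrices; the dimensions and the threshold t below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, n_trials = 10, 200, 2_000   # illustrative choices

# X_i = eps_i * a_i a_i^T with fixed unit vectors a_i and Rademacher
# signs eps_i; then ||X_i|| <= L = 1 and the variance parameter is
#   sigma2 = ||sum_i E X_i^2|| = ||sum_i a_i a_i^T||.
A = rng.normal(size=(n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)
L = 1.0
sigma2 = np.linalg.norm(A.T @ A, 2)   # spectral norm of sum_i a_i a_i^T

# Matrix Bernstein: P(||sum_i X_i|| >= t) <= 2d * exp(-(t^2/2)/(sigma2 + L*t/3))
t = 4.0 * np.sqrt(sigma2)
bound = 2 * d * np.exp(-(t**2 / 2) / (sigma2 + L * t / 3))

norms = []
for _ in range(n_trials):
    eps = rng.choice([-1.0, 1.0], size=n)
    S = (A * eps[:, None]).T @ A      # sum_i eps_i a_i a_i^T
    norms.append(np.linalg.norm(S, 2))
empirical = np.mean(np.array(norms) >= t)

print(f"empirical: {empirical:.4f}, bound: {bound:.4f}")
```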
Lecture 14, Tue Oct 20
Same references as the last time.
Lecture 15, Thu Oct 22
- Hoeffding, W. (1963). Probability inequalities for sums of
bounded random variables. Journal of the American Statistical
Association 58 (301): 13–30.
- Hoeffding, W. (1948). A Class of Statistics with Asymptotically Normal
Distribution, The Annals of Mathematical Statistics, 19(3), 293-325.
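To connect the two Hoeffding papers: the unbiased sample variance is the canonical example of a U-statistic from the 1948 paper, with pair kernel h(a, b) = (a - b)^2 / 2. A small sketch (the data below are arbitrary illustrative values):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(5)
x = rng.normal(size=30)   # arbitrary illustrative data

# U-statistic with kernel h(a, b) = (a - b)^2 / 2, averaged over all
# unordered pairs; this equals the unbiased sample variance exactly.
u_stat = np.mean([(a - b)**2 / 2 for a, b in combinations(x, 2)])

print(f"U-statistic:       {u_stat:.6f}")
print(f"np.var(x, ddof=1): {np.var(x, ddof=1):.6f}")
```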