36-465/665, Spring 2021
25 March 2021 (Lecture 15)
\[ \newcommand{\Prob}[1]{\mathbb{P}\left( #1 \right)} \newcommand{\Expect}[1]{\mathbb{E}\left[ #1 \right]} \newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]} \newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]} \newcommand{\Risk}{r} \newcommand{\EmpRisk}{\hat{\Risk}} \newcommand{\Loss}{\ell} \newcommand{\OptimalStrategy}{\sigma} \DeclareMathOperator*{\argmin}{argmin} \newcommand{\ModelClass}{S} \newcommand{\OptimalModel}{s^*} \DeclareMathOperator{\tr}{tr} \newcommand{\Indicator}[1]{\mathbb{1}\left\{ #1 \right\}} \newcommand{\myexp}[1]{\exp{\left( #1 \right)}} \newcommand{\eqdist}{\stackrel{d}{=}} \newcommand{\Rademacher}{\mathcal{R}} \newcommand{\EmpRademacher}{\hat{\Rademacher}} \newcommand{\Growth}{\Pi} \newcommand{\VCD}{\mathrm{VCdim}} \newcommand{\OptDomain}{\Theta} \newcommand{\OptDim}{p} \newcommand{\optimand}{\theta} \newcommand{\altoptimand}{\optimand^{\prime}} \newcommand{\ObjFunc}{M} \newcommand{\outputoptimand}{\optimand_{\mathrm{out}}} \newcommand{\Hessian}{\mathbf{h}} \newcommand{\Penalty}{\Omega} \newcommand{\Lagrangian}{\mathcal{L}} \newcommand{\HoldoutRisk}{\tilde{\Risk}} \]
Pick \(\hat{k}\) by data splitting. Suppose the loss function is bounded, \(0 \leq \Loss \leq m\). Then, for any \(\alpha \in (0,1)\), \[ \Prob{\Risk(\hat{s}_{\hat{k}}) \leq \Risk(\hat{s}_{k^*}) + m\sqrt{\frac{2\log{(2q/\alpha)}}{n_s}}} \geq 1-\alpha \] Here \(q\) is the number of candidate models and \(n_s\) is the number of data points in the selection (hold-out) half of the split.
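Where does the \(\sqrt{2\log(2q/\alpha)/n_s}\) term come from? A sketch of the standard argument, writing \(\HoldoutRisk\) for the risk estimated on the selection half: conditionally on the estimation half, \(\HoldoutRisk(\hat{s}_k)\) is an average of \(n_s\) independent losses lying in \([0,m]\), so Hoeffding's inequality gives, for each fixed \(k\), \[ \Prob{\left|\HoldoutRisk(\hat{s}_k) - \Risk(\hat{s}_k)\right| > \epsilon} \leq 2\myexp{-\frac{2 n_s \epsilon^2}{m^2}} \] A union bound over the \(q\) candidates bounds the probability that any of them deviates by more than \(\epsilon\) by \(2q\myexp{-2n_s\epsilon^2/m^2}\); setting this to \(\alpha\) and solving gives \(\epsilon = m\sqrt{\log(2q/\alpha)/(2n_s)}\). On the complementary event, since \(\hat{k}\) minimizes the hold-out risk, \[ \Risk(\hat{s}_{\hat{k}}) \leq \HoldoutRisk(\hat{s}_{\hat{k}}) + \epsilon \leq \HoldoutRisk(\hat{s}_{k^*}) + \epsilon \leq \Risk(\hat{s}_{k^*}) + 2\epsilon = \Risk(\hat{s}_{k^*}) + m\sqrt{\frac{2\log(2q/\alpha)}{n_s}} \] which is the statement above.

As a minimal illustration (not from the notes), the sketch below simulates model selection by data splitting over polynomial regressions and evaluates the slack term; the toy data, the truncation of the squared-error loss at \(m\) (so the boundedness assumption holds), and all variable names are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = sin(4x) + Gaussian noise.
n = 400
x = rng.uniform(0.0, 1.0, n)
y = np.sin(4.0 * x) + 0.3 * rng.standard_normal(n)

# Data splitting: fit on the first half, select on the second half of size n_s.
half = n // 2
x_fit, y_fit = x[:half], y[:half]
x_sel, y_sel = x[half:], y[half:]
n_s = len(y_sel)

# Candidate models: polynomial regressions of degree 0 through 5, so q = 6.
degrees = list(range(6))
q = len(degrees)

# Squared-error loss truncated at m, so that 0 <= loss <= m as the bound requires.
m = 4.0
def loss(y_true, y_pred):
    return np.minimum((y_true - y_pred) ** 2, m)

# Hold-out risk estimate for each candidate, each fitted on the estimation half only.
holdout_risk = []
for k in degrees:
    coef = np.polyfit(x_fit, y_fit, deg=k)
    holdout_risk.append(loss(y_sel, np.polyval(coef, x_sel)).mean())

k_hat = int(np.argmin(holdout_risk))  # model picked by data splitting

# Slack term from the oracle inequality at confidence level 1 - alpha.
alpha = 0.05
slack = m * np.sqrt(2.0 * np.log(2.0 * q / alpha) / n_s)

print("selected degree:", k_hat)
print("hold-out risk of selected model:", round(holdout_risk[k_hat], 3))
print("slack m*sqrt(2*log(2q/alpha)/n_s):", round(slack, 3))
```

As the formula indicates, the slack shrinks like \(1/\sqrt{n_s}\) but grows only logarithmically in the number of candidates \(q\), which is why comparing many models on a single hold-out set is relatively cheap.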