First I will list a couple of websites. Then I will list a couple of papers. In my comments on the papers are some suggestions about how to use the lasso in practice, using library(glmnet) in R. The same R practices apply to ridge regression and elasticnet.

You can get a quick idea of how and why the lasso works from the following two websites:

https://stats.stackexchange.com/questions/74542/why-does-the-lasso-provide-variable-selection
This gives a nice "calculus" explanation of why the lasso forces some coefficients to be zero if lambda is large enough, using simple regression (y = b0 + b1 x + epsilon); for the explanation when p>1, the Tibshirani paper below is useful.

https://newonlinecourses.science.psu.edu/stat508/lesson/5/5.4
This page gives a nice overview of the lasso, and explains a little about how the geometry of the L1 penalty for the lasso (vs. L2 for ridge regression) forces some coefficients to be zero. It is more intuitive than the previous website, but also more in line with the mathematics (Tibshirani paper below) that is actually needed when p>1. The webpage also gives some hints about how one could construct standard errors for the beta-hats estimated by the lasso (Park & Casella's (2008) "Bayesian lasso" seems best to me, although a complete recipe is not given), and an extension called the "group lasso" (Yuan & Lin, 2007) which could work with categorical variables.

You can learn much more by googling
  * why does the lasso work
  * how does the lasso work

The basic papers/books on the lasso are here:

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 1, 267-288.
The "original paper". A basic idea, which I did not go into in class, is that predictions using beta-hats from the lasso can have lower MSE for predicting new observations than predictions using least-squares beta-hats.

https://web.stanford.edu/~hastie/glmnet/glmnet_alpha.html
A quick tutorial on the glmnet package.

If you are more interested in variable selection per se (because you want a few good models to discuss with a client or collaborator in order to choose the "scientifically best one"), then the approach in the class notes is better, i.e. just use a few different values of lambda to select a few different models to discuss with your collaborator. Using library(glmnet) you might do something like this:

  lasso.fits <- glmnet(Xmatrix, Yvector, alpha=1)  # alpha=1 for lasso
  plot(lasso.fits, xvar="lambda")                  # to get a visualization
  abline(h=0, lty=2)                               # helpful guideline
  coef(lasso.fits)                                 # coefficients for "all" values of lambda
  coef(lasso.fits, s=30)                           # coefficients at lambda=s (30, in this case)
  predict(lasso.fits, s=30, newx=Xmatrix.new)      # predictions based on beta-hats at lambda=s

If you are more interested in prediction, then using the beta-hats from the lasso is definitely better. In that case you want the "single best lambda", which you can choose with cross-validation. Using library(glmnet) you might do something like this:

  cv.choice <- cv.glmnet(Xmatrix, Yvector, alpha=1)  # alpha=1 for lasso
  plot(cv.choice)                                    # to get a visualization
  cv.choice$lambda.min                               # the lambda that minimized the cv MSE
  cv.choice$lambda.1se                               # the largest lambda whose cv MSE is within
                                                     #   1 se of the minimum (this choice may
                                                     #   guard against overfitting)
  coef(cv.choice, s=30)                              # beta-hats at lambda=s (s=30 here)
  predict(cv.choice, newx=Xmatrix.new)               # predictions using the best lasso beta-hats
                                                     #   (by default, at lambda.1se)
  predict(cv.choice, s=30, newx=Xmatrix.new)         # predictions using beta-hats at lambda=s
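One practical note: glmnet() and cv.glmnet() want a numeric predictor matrix rather than a formula and data frame. Below is a minimal sketch of how you might build Xmatrix and Xmatrix.new (including dummy coding of factors) before running the code above; the data frame and column names (mydata, newdata, y) are made up for illustration.

  library(glmnet)
  # Hypothetical data frames: 'mydata' holds the response 'y' plus the predictors
  # (possibly including factors); 'newdata' holds the same predictor columns.
  Xmatrix     <- model.matrix(y ~ ., data = mydata)[, -1]   # drop the intercept column
  Yvector     <- mydata$y
  Xmatrix.new <- model.matrix(~ ., data = newdata)[, -1]    # assumes newdata's factors have the
                                                            #   same levels as mydata's
  cv.choice   <- cv.glmnet(Xmatrix, Yvector, alpha = 1)     # lasso, lambda chosen by cross-validation
  coef(cv.choice, s = "lambda.1se")                         # sparse beta-hats at lambda.1se
  predict(cv.choice, newx = Xmatrix.new, s = "lambda.min")  # predictions at lambda.min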
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 67, 2, 301-320.
Introduces the "elasticnet", which combines the lasso and ridge penalties (for glmnet, alpha between 0 and 1). For problems with high collinearity among the X's (and even when p>n), elasticnet can produce lower prediction MSE than the lasso. (A short glmnet sketch for elasticnet and a lasso-penalized logistic regression is given after the references below.)

Friedman, J., Hastie, T., and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33, 1. https://www.jstatsoft.org/v33/i01
Lasso and related ideas for generalized linear models (logistic regression, Poisson regression, etc.). Basically all about the guts and uses of library(glmnet).

Hastie, T., Tibshirani, R., and Friedman, J. (2008). The Elements of Statistical Learning, 2nd edition. Springer, New York.
Great book. You can get a free PDF from one of the authors' websites (I think it's Hastie's, but I forget...).

--------------------------------------------------------------
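As promised above, here is a small sketch tying the Zou & Hastie and Friedman et al. papers back to library(glmnet): an elasticnet fit (alpha between 0 and 1) and a lasso-penalized logistic regression (family="binomial"). Xmatrix, Yvector, and Xmatrix.new are as in the code earlier; 'Ybinary' is a hypothetical 0/1 response vector added for illustration.

  library(glmnet)
  # Elasticnet: alpha between 0 (ridge) and 1 (lasso); alpha = 0.5 is just an illustrative value
  enet.cv <- cv.glmnet(Xmatrix, Yvector, alpha = 0.5)
  coef(enet.cv, s = "lambda.1se")                      # beta-hats at the 1-se lambda
  # Lasso-penalized logistic regression, as in Friedman, Hastie & Tibshirani (2010);
  # 'Ybinary' is a hypothetical 0/1 response of the same length as Yvector
  logit.cv <- cv.glmnet(Xmatrix, Ybinary, family = "binomial", alpha = 1)
  predict(logit.cv, newx = Xmatrix.new, s = "lambda.min", type = "response")  # predicted probabilities

Note that cv.glmnet() only cross-validates over lambda; if you want to tune alpha as well, you have to compare a few alpha values yourself (e.g. reusing the same folds via the foldid argument).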