Ch 7: Variable selection

7.1 Evaluating subsets of predictors
  7.1.1 R^2_adj
    (interlude: ML theory for regression - seems I might have done or alluded to this before...)
    (what might be more useful is a review of the LR test & the partial F test...)
    - nested vs non-nested
  7.1.2 AIC
    * related to the LR test
    * p+2 vs p+1 (the extra parameter is sigma^2, I assume)
  7.1.3 CAIC, corrected AIC ("magic")
  7.1.4 BIC (just "magic")
  7.1.5 comparison... "eh..."
7.2 "Deciding the collection"... really: organizing the search
  - nested vs non-nested...
  7.2.1 All subsets
  7.2.2 Stepwise
    - backward vs forward
    - relative merits
  7.2.3 Inference after selection
    - double-dipping
7.3 Cross-validation methods
  - two-sample methods protect against double-dipping (small-sample effects may occur...)
  - k-fold produces a better estimate of prediction accuracy
7.4 Lasso...

---------
Ch 6 of ISLR
Gelman-Hill recs, p. 68
---------
https://www.stata.com/support/faqs/statistics/stepwise-regression-problems/
https://www4.stat.ncsu.edu/~post/josh/LASSO_Ridge_Elastic_Net_-_Examples.html (a bit lengthy)
https://www.analyticsvidhya.com/blog/2017/06/a-comprehensive-guide-for-linear-ridge-and-lasso-regression/ (long, but has the right details about regularization)
https://onlinecourses.science.psu.edu/stat501/node/330/ (good description of R^2_adj and Mallows' Cp)
---------

1. MSE-related measures
   - MSE and in-sample error
   - R^2_adj
   - Mallows' Cp
   (both basically track MSE)
   problems with in-sample approaches:
   * capitalization on chance
   * double-dipping for inference
2. Other in-sample methods
   - F & t tests - we've seen those
   - likelihood ratio tests - boil down to minimizing RSS
   - penalized likelihood methods: -2 LL + penalty
     in regression: n*log(RSS/n) + penalty
     AIC, CAIC, BIC
3. Typical search methods
   - all subsets
   - stepwise (forward or backward)
   - Harrell's warning
4. Penalized estimation methods
   recall LL ~ RSS
   minimize RSS + (smoothing penalty)
   - ridge
   - lasso
   - elastic net
5. Inference after fitting
   - two-sample cross-validation
   - k-fold cross-validation (on the training sample!)
   - in-sample corrections (AIC, BIC; current research on the lasso & related methods, to get good SEs of parameters, etc.)
6. What do I really do?
   One can use automatic methods to suggest variables that might or might not be important, but subject-matter expertise rules.
   Gelman & Hill recs.
   The people you work with will expect that:
   (a) you are a high priest of variable selection who knows the "right" way to do it - they will tend to accept anything you say
   (b) you can provide them with statistical cover for whatever model they want
   Neither is correct!
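The penalized-likelihood criteria in item 2 (-2 LL + penalty, which for Gaussian regression reduces to n*log(RSS/n) + penalty) can be sketched in a few lines. This is a minimal pure-Python illustration, not from the notes: the toy data, the helper names (`ols_simple`, `aic`, `bic`), and the parameter count k = p + 2 (slope terms, intercept, and sigma^2 - the "p+2" above) are my own choices for the example.

```python
import math

def ols_simple(x, y):
    """Closed-form OLS for y = b0 + b1*x; returns (b0, b1, RSS)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, rss

def aic(rss, n, k):
    # Gaussian -2 log-likelihood up to a constant is n*log(RSS/n);
    # k counts all estimated parameters, including sigma^2.
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    # same fit term, but the penalty grows with log(n)
    return n * math.log(rss / n) + math.log(n) * k

# toy data with a genuine linear signal (illustrative only)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
n = len(x)

# null model: intercept only (k = 2: mean + sigma^2)
ybar = sum(y) / n
rss0 = sum((yi - ybar) ** 2 for yi in y)

# simple regression: intercept + slope (k = 3)
_, _, rss1 = ols_simple(x, y)

print(aic(rss0, n, 2), aic(rss1, n, 3))  # slope model has the lower AIC here
print(bic(rss0, n, 2), bic(rss1, n, 3))
```

Because both criteria share the n*log(RSS/n) fit term, the comparison boils down to whether the drop in RSS buys off the extra penalty - the same trade-off the LR test formalizes for nested models.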
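The penalized estimation idea in item 4 - minimize RSS + (smoothing penalty) - can be made concrete with the lasso. A sketch of cyclic coordinate descent, assuming standardized columns (centered, mean-square one) so each update is a simple soft-threshold; the function names, the tiny orthogonal design, and the penalty value are all assumptions for illustration, not the notes' own code.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: the source of the lasso's exact zeros."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, iters=50):
    """Lasso by cyclic coordinate descent.
    Minimizes (1/2n)*RSS + lam * sum(|beta_j|), assuming each column of X
    is centered with mean square 1 (so the coordinate update is exact)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # partial residual: leave coordinate j out of the fit
            r = [y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j)
                 for i in range(n)]
            rho = sum(X[i][j] * r[i] for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam)
    return beta

# toy orthogonal design; y depends only on the first column
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2, 2, -2, -2]
beta = lasso_cd(X, y, lam=0.5)
print(beta)  # first coefficient shrunk below its OLS value of 2; second exactly 0
```

The example shows the two behaviors the outline contrasts with ridge: the active coefficient is shrunk (here from the OLS value 2 toward 0 by exactly lam) and the irrelevant one is set to exactly zero, which is why the lasso does selection and ridge does not.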
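The k-fold cross-validation in item 5 can be sketched as follows: hold out each fold in turn, fit on the rest, and average the held-out squared errors. A minimal pure-Python version comparing an intercept-only fit to a simple linear fit; the data, fold scheme (contiguous folds, no shuffling), and helper names are my own illustrative choices.

```python
def fit_mean(xtr, ytr):
    """Intercept-only model: predict the training mean everywhere."""
    m = sum(ytr) / len(ytr)
    return lambda x: m

def fit_line(xtr, ytr):
    """Simple linear regression fit, returned as a prediction function."""
    n = len(xtr)
    xbar, ybar = sum(xtr) / n, sum(ytr) / n
    sxx = sum((xi - xbar) ** 2 for xi in xtr)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xtr, ytr))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return lambda x: b0 + b1 * x

def kfold_mse(x, y, fit, k=4):
    """Average held-out squared error over k contiguous folds."""
    n = len(x)
    errs = []
    for f in range(k):
        test = set(range(f * n // k, (f + 1) * n // k))
        train = [i for i in range(n) if i not in test]
        pred = fit([x[i] for i in train], [y[i] for i in train])
        errs.extend((y[i] - pred(x[i])) ** 2 for i in test)
    return sum(errs) / n

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
print(kfold_mse(x, y, fit_line), kfold_mse(x, y, fit_mean))
```

Because every prediction is made on data the model never saw, this estimate does not capitalize on chance the way in-sample RSS does - which is exactly the double-dipping protection the outline credits to cross-validation. As noted above, when CV is used to pick a model, it should be run on the training sample only, with a final test sample kept aside.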