Journal Club: Hot Ideas in Statistics

Statistics 36-825

Instructors:
Rob Tibshirani (tibs at stanford dot edu)
Ryan Tibshirani (ryantibs at cmu dot edu)

Class time: ~~Wednesdays 1-2:30pm, Baker 232M~~
Fridays 1:30-3pm, Porter A18C

Office hour: Mondays 1:30-2:30pm, Baker 228A or 229B

We will critically read and discuss hot papers in statistics. These papers can be new and potentially influential works, or they can be older important works that you may not have seen in other classes.

Participation

Each week, a pair of students will lead the presentation and discussion of the paper. Rough format: ~30 minutes of paper summary, then ~50 minutes of class discussion. Sign up here in pairs to lead one of the sessions.

In addition, the pair of students will produce scribed notes for the session they lead. These are to be submitted (by email to the instructors) no later than 1 week after the session. Rough format: overview of the paper, small simulations or examples if possible, and a comprehensive summary of the points made during the class discussion. Aim for 4-10 pages. Click here for the LaTeX template.

Paper list

Statistical Modeling: The Two Cultures by Leo Breiman, 2001

----- Golden Oldies -----
Regression Models and Life-Tables by David Cox, 1972
- Scribed notes:
- R code (Rob): cox.R
The Central Role of the Propensity Score in Observational Studies for Causal Effects by Paul Rosenbaum and Don Rubin, 1983
- Scribed notes (Mike and Jess): propensity.pdf, zipped source folder: propensity.zip
- R code (Mike and Jess): prop.R, R code (Rob): prop2.R
- Suggested further reading: Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data
----- False Discovery Rates -----
Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing by Yoav Benjamini and Yosef Hochberg, 1995
- Scribed notes (Philipp and Beau): fdr.pdf, zipped source folder: fdr.zip
- R code (Philipp and Beau): FDR shiny app
- R code (Ryan): fdr-test.R, fdr-funs.R; explanation: fdr-note.pdf
- Suggested further reading: A Direct Approach to False Discovery Rates by John Storey, 2002
- Nice retrospective paper: Discovering the False Discovery Rate by Yoav Benjamini, 2010
Sequential Selection Procedures and False Discovery Rate Control by Max Grazier G'Sell et al., 2014
- Scribed notes (Mattia and Calvin): seqfdr.pdf, zipped source folder: seqfdr.zip
- R code (Mattia and Calvin): forstop.zip, glasso.zip
----- Testing and Correlation -----
Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis by Jeff Leek and John Storey, 2007
- R code (Ryan): sva-test.R
A Kernel Two-Sample Test by Arthur Gretton et al., 2012
- Scribed notes (Anthony and Sam): mmd.pdf, zipped source folder: mmd.zip
- Matlab code (Anthony and Sam): mmd-sim.zip
Brownian Distance Covariance by Gabor Szekely and Maria Rizzo, 2009
- Scribed notes (Giuseppe and Ben): dcov.pdf, zipped source folder: dcov.zip
- R code (Giuseppe and Ben): dcov-r.zip, Matlab code: dcov-matlab.zip
- Related earlier paper: A Consistent Test for Bivariate Dependence
- Interesting connection to MMD: Equivalence of Distance-Based and RKHS-Based Statistics in Hypothesis Testing
----- High-dimensional Inference -----
Confidence Intervals and Hypothesis Testing for High-Dimensional Regression by Adel Javanmard and Andrea Montanari, 2014
- Scribed notes (Yu-Xiang and Yen-Chi): hdconf.pdf, zipped source folder: hdconf.zip
Exact Post-Selection Inference for Forward Stepwise and Least Angle Regression by Jonathan Taylor et al., 2014
- Scribed notes (Veeru and Jisu): spacing.pdf, zipped source folder: spacing.zip
Controlling the False Discovery Rate via Knockoffs by Rina Foygel Barber and Emmanuel Candes, 2014
- Scribed notes (Justin and Willie): knockoff.pdf, zipped source folder: knockoff.zip
- R code (Justin and Willie): knockoff-sim.zip
----- Supervised Learning -----
Stability Selection by Nicolai Meinshausen and Peter Buhlmann, 2010
- Scribed notes (Micol and Shiqiong): stability.pdf, zipped source folder: stability.zip
- R code (Micol and Shiqiong): stability.R
Dropout Training as Adaptive Regularization by Stefan Wager, Sida Wang, and Percy Liang, 2013
- Scribed notes (Calvin and Kirstin): dropout.pdf, zipped source folder: dropout.zip
- Matlab code (Calvin and Kirstin): dropout-matlab.zip
Why Does Unsupervised Pre-training Help Deep Learning? by Dumitru Erhan et al., 2010
- Scribed notes (Avi, Mrinmaya, Jerzy): deep.pdf, zipped source folder: link
- Matlab code (Jerzy): link, simulation results (Avi, Mrinmaya, Jerzy): link
----- Reproducibility -----
Why Most Published Research Findings are False by John Ioannidis, 2005
- Interesting counterargument: Empirical Estimates Suggest Most Published Medical Research is True
- Scribed notes (Dallas and Sashank): ioannidis.pdf, zipped source folder: ioannidis.zip
- Link to IPython widget (Dallas and Sashank): http://nbviewer.ipython.org/github/dallascard/ioannidis_demo/blob/master/Ioannidis%20widgets.ipynb