36-743/744: (Some) Statistical methods for reproducibility (2019)
(aka: How to stop you from fooling yourself)
Location: BH 235B (MW 10:30am)
Tentative course syllabus
Crowd-scribing sign-up sheet
Crowd-scribed class notes can be viewed here
Homework 1
Homework 2
Non-technical opinion/blurb: (Inspired by some recent conversations with a collaborator.) There are many differentiating factors between the scientific pursuit and the AI pursuit, but one of them is the following: in one, we are trying to create algorithms+data to do things that humans are very good at (speech, vision, math, etc.), and in the other we are trying to use algorithms+data to explore areas where we have much less idea what the ground truth is (how the brain works, how the genetic system works, what's there in outer space, etc.). The first set are high-SNR problems: the signal is very strong (the fact that humans can easily solve many language/vision/RL tasks tells us that there is a clear signal, even if we don't yet know how to make machines capture it). The second set are typically very low-SNR problems: the signal in genetic, brain, or astronomical data is often weak, and it is very hard for any human (or machine) to tell signal from noise. In such settings, when humans are poor judges and signal and noise are easily confused, we need methods to stop us from fooling ourselves: methods that provide some guarantee that the discoveries we claim are likely to be correct (most of them, at least) even without knowing the ground truth, and will stand up to further scrutiny by future scientists. This course will explore both classical and modern ideas about how biases can creep into data analysis, and how one can identify, quantify, and correct for them.
Technical blurb: Some topics that we will cover are
multiple hypothesis testing,
post-selection inference,
selective and conditional inference,
simultaneous inference,
adaptive data analysis,
interactive testing with a human-in-the-loop,
doubly sequential experimentation,
false discovery rate and false coverage rate.
Some methods you may or may not have heard of:
knockoffs,
SLOPE,
reusable holdout,
Benjamini-Hochberg procedure (sketched in code right after this list),
closed testing,
differential privacy,
confidence sequences and always valid p-values,
data carving.
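To give a small taste of these methods, here is a minimal sketch (my own illustration in Python, not official course code) of the Benjamini-Hochberg step-up procedure, which controls the false discovery rate at level alpha when the p-values are independent (and also under the PRDS positive dependence condition covered in L8):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Rejects the hypotheses with the k smallest p-values, where k is the
    largest rank with p_(k) <= alpha * k / n; controls FDR at level alpha
    under independence (or positive dependence, PRDS).
    """
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    order = np.argsort(pvals)                      # indices, smallest p first
    thresholds = alpha * np.arange(1, n + 1) / n   # the BH step-up line
    below = pvals[order] <= thresholds
    rejected = np.zeros(n, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # last rank under the line
        rejected[order[:k + 1]] = True             # reject everything up to rank k
    return rejected

# Example: at FDR level 0.05, only the two smallest p-values are rejected
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.2, 0.5, 0.8, 0.9]))
```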
The course will be somewhat mathematically rigorous (we will not necessarily cover many proofs in class, but we will care about methods that have provable error control, and about the assumptions they require). The central focus of the course is methodology: interesting algorithms to control a variety of error rates in different structured settings, and conceptual ideas that connect these algorithms and error rates together.
Who should take it: People working with scientific data of any kind could benefit. Folks doing research in these areas (or who want to) are obviously welcome. Other curious people may also attend to learn more about these areas (but must audit).
Logistics: The class is from 10:30am-11:50am in Baker Hall 235B, on Mondays and Wednesdays. It is split into two minis (for some logistical reason). Feel free to audit the class if you don't want to take it for credit (but you must still register; no sit-ins allowed unless you are a faculty member). It is a PhD-level class, but it is not really targeted at first-year students, unless you arrived at CMU already having a first-year PhD-level background in statistics.
Prerequisites: Everyone must be familiar with p-values, confidence intervals, their duality, and more generally should have taken at least one intermediate statistics class.
Tentative lectures [Mini 1: Static settings]
L1 (Aug 26): Motivation, defining reproducibility and replicability, computational and systemic issues
L2 (Aug 28): The jellybeans-acne case study: p-values, multiple testing, global-null, FWER, FDR (and more)
L3 (Sep 2): no class (Labor Day)
L4 (Sep 4): Global null testing using Fisher, Stouffer, Simes, higher criticism, Vovk (and more)
L5 (Sep 9): Algorithms for FWER control like Bonferroni, Šidák, Holm (and more); see the code sketch after this schedule
L6 (Sep 11): Coherence, consonance, the principles of closure, partitioning and sequential rejection
L7 (Sep 16): Special topics: gatekeeping and graphical procedures, adapting to dependence using permutation testing
L8 (Sep 18): FDR (BHY, proof under independence, PRDS and arbitrary dependence)
L9 (Sep 23): Weights and null-proportion adaptivity (prior information)
L10 (Sep 25): The Empirical Bayes perspective and local FDR methods
L11 (Sep 30): From p-hacking to interval-hacking: false coverage rate, false sign rate
L12 (Oct 2): The variable selection problem: a solution using knockoffs, model-X knockoffs
L13 (Oct 7): Interactive multiple testing using data carving
L14 (Oct 9): Special topic TBD or spillover class
L15 (Oct 14): Reading day: historical perspectives, recap and summary
L16 (Oct 16): In-class exam
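As a concrete preview of L5, here is a minimal sketch (again my own illustration, not course code) of the Bonferroni and Holm corrections. On the same p-values, Holm rejects everything Bonferroni does, and possibly more, while still controlling the family-wise error rate under arbitrary dependence:

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Bonferroni: reject H_i iff p_i <= alpha / n; controls FWER always."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / len(pvals)

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: uniformly more powerful than Bonferroni,
    and it also controls FWER under arbitrary dependence."""
    pvals = np.asarray(pvals, dtype=float)
    n = len(pvals)
    rejected = np.zeros(n, dtype=bool)
    for step, idx in enumerate(np.argsort(pvals)):
        # Compare the (step+1)-th smallest p-value to alpha / (n - step).
        if pvals[idx] <= alpha / (n - step):
            rejected[idx] = True
        else:
            break  # step-down: stop at the first p-value over its threshold
    return rejected

pvals = [0.001, 0.011, 0.02, 0.3]
print(bonferroni(pvals))  # rejects the first two (threshold 0.05/4 = 0.0125)
print(holm(pvals))        # rejects the first three (thresholds 0.0125, 0.0167, 0.025)
```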
Tentative lectures [Mini 2: Dynamic settings]
L17 (Oct 21): The peeking problem: power-posing and sequential type-1 error (see the simulation sketch at the end of this page)
L18 (Oct 23): The LIL (law of the iterated logarithm) and nonasymptotic variants, confidence sequences
L19 (Oct 28): The doubly-sequential setting: online control of error rates
L20 (Oct 30): The lasso post-selection inference problem: conditional inference using the polyhedral lemma
L21 (Nov 4): Simultaneous inference using the PoSI framework
L22 (Nov 6): The post-hoc inference problem: closed testing to the rescue
L23 (Nov 11): Overfitting the test set and the Kaggle leaderboard paradox: a solution using the reusable holdout
L24 (Nov 13): Overfitting by adaptively querying a database: a solution using differential privacy (adaptive data analysis)
L25 (Nov 18): The garden of forking paths
L26 (Nov 20): A generalized-Kaggle solution by revealing only single bits
L27 (Nov 25): Selection bias in adaptive sequential experimentation
L28 (Nov 27): no class (Thanksgiving)
L29 (Dec 2): Special topic TBD or spillover class
L30 (Dec 4): Fresh data: the universal solution, recap and summary
L31 (Dec 9): In-class exam
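Finally, as a teaser for L17's peeking problem, here is a minimal simulation sketch (my own illustration, assuming nothing beyond numpy): even when the null is true, an analyst who runs a z-test after every new observation and stops as soon as p < 0.05 will "discover" an effect far more than 5% of the time, and with enough patience almost surely (that is the law of the iterated logarithm from L18 at work):

```python
import numpy as np

rng = np.random.default_rng(0)

def peeked_to_significance(n_max=1000):
    """Generate N(0,1) data under the null H0: mean = 0, and peek at a
    two-sided z-test (known variance 1) after every observation from the
    10th onward. Return True if the test ever crosses |z| > 1.96."""
    x = rng.standard_normal(n_max)
    n = np.arange(1, n_max + 1)
    z = np.cumsum(x) / np.sqrt(n)            # z-statistic after each observation
    return bool(np.any(np.abs(z[9:]) > 1.96))

trials = 2000
rate = np.mean([peeked_to_significance() for _ in range(trials)])
print(f"Fraction of null experiments that 'found an effect': {rate:.2f}")
# Much larger than the nominal 0.05, and it tends to 1 as n_max grows,
# since |z| exceeds any fixed threshold infinitely often by the LIL.
```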