Ann B Lee

Professor, Co-Director of the PhD Program in Statistics

Department of Statistics & Data Science / Machine Learning Department, Carnegie Mellon University

About me

I am a professor in the Department of Statistics & Data Science at Carnegie Mellon University, with a joint appointment in the Machine Learning Department. Before joining CMU, I was the J.W. Gibbs Assistant Professor in the Department of Mathematics at Yale University. Before that, I spent a year as a visiting research associate in the Division of Applied Mathematics at Brown University.

My research focuses on developing statistical methodology for complex data and problems in the physical sciences. I am particularly interested in trustworthy scientific inference and reliable uncertainty quantification, and in bridging classical statistics and machine learning for simulation-based inference and experimental design. My recent work includes likelihood-free inference, calibrated probabilistic forecasting, interpretable diagnostics of generative models, and applications to astronomy (large surveys) and hurricane intensity guidance (satellite imagery).

In 2018, I started the STAtistical Methods for Physical Sciences (STAMPS) research group together with Mikael Kuusela. STAMPS hosts public colloquium-style webinars open to all members of the scientific community, as well as weekly research group meetings for students and faculty at CMU and the University of Pittsburgh. In Fall 2024, STAMPS will become a CMU Research Center, with a public launch event on September 20, 2024 (details TBA).

Interests

  • Scientific Machine Learning
  • Trustworthy Uncertainty Quantification (UQ)
  • Likelihood-Free Inference
  • Statistical Methods for the Physical Sciences

Education

  • PhD in Physics

    Brown University

  • MSc/BSc in Engineering Physics

    Chalmers University of Technology, Sweden

News & Events

Recent Papers

(2024). Classification under Nuisance Parameters and Generalized Label Shift in Likelihood-Free Inference. Proceedings of the Forty-First International Conference on Machine Learning (ICML 2024), PMLR 235, 2024.

Preprint

(2022). Simulator-Based Inference with WALDO: Confidence Regions by Leveraging Prediction Algorithms and Posterior Estimators for Inverse Problems. Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS 2023), PMLR 206:2960-2974, 2023. (Finalist at the ASA SPES and Q&P Student Paper Competition).

Preprint PDF Code

(2022). Detecting Distributional Differences in Labeled Sequence Data with Application to Tropical Cyclone Satellite Imagery. Annals of Applied Statistics 17(2):1260-1284, June 2023. (Selected for The Best of AOAS invited paper session at JSM 2023).

Preprint DOI

(2021). Diagnostics for Conditional Density Models and Bayesian Inference Algorithms. Proceedings of the 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021). PMLR 161:1830-1840, 2021.

Preprint PDF

(2020). Wildfire Smoke and Air Quality: How Machine Learning Can Guide Forest Management. Tackling Climate Change with Machine Learning workshop at NeurIPS 2020 (Spotlight talk).

Preprint Slides Video

(2020). Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting. Proceedings of the Thirty-Seventh International Conference on Machine Learning (ICML 2020), PMLR 119:2323-2334, 2020.

Preprint PDF

Talks

(some with video recordings)

  • “Calibrated Uncertainty Quantification in Simulator-Based Inference,” Hammers & Nails: Frontiers in Machine Learning in Cosmology, Astro & Particle Physics, Ascona, November 2, 2023. Slides.
  • “Detecting Distributional Differences in Labeled Sequences of Tropical Cyclone Satellite Imagery,” “Best of AOAS” invited session, Joint Statistical Meetings, Toronto, August 9, 2023. Slides.
  • “2-Sample and GoF Testing via Regression,” PHYSTAT-2samples workshop, June 2, 2023. Slides. Video recording.
  • “Likelihood-Free Frequentist Inference: Confidence Sets with Correct Conditional Coverage,” ISSI-STAMPS joint seminar with discussant Minge Xie (Rutgers University), June 16, 2022. Poster. Slides. Video recording.

Workshops

Some recent workshops on statistics and machine learning for physics that I have co-organized:

Group

I coordinate the STAtistical Methods for the Physical Sciences (STAMPS) Research Group at CMU together with Mikael Kuusela.

I am fortunate to advise the following amazing students:

Current PhD Students

  • Luca Masserano (thesis)
  • James Carzon (thesis)
  • Alex Shen (thesis)
  • Antonio Carlos Herling Ribeiro Junior (project)

Alumni & Collaborators

  • Biprateep Dey (Pitt Physics & Astronomy)
  • Tria McNeely (Microsoft, PhD 2022)

PhD Graduates

  • David Zhao
    – PhD May 2023, Department of Statistics & Data Science and MLD, CMU
    – Thesis title: Calibrated Conditional Density Models and Predictive Inference via Local Diagnostics

  • Trey (Tria) McNeely
    – PhD June 2022, Department of Statistics & Data Science, CMU
    – Thesis title: Quantifying Spatio-temporal Convective Structure in Tropical Cyclones

  • Niccolò (Nic) Dalmasso
    – PhD May 2021, Department of Statistics & Data Science, CMU
    – Thesis title: Uncertainty Quantification in Simulation-based Inference
    – 2021 ASA Student of the Year, Pittsburgh Chapter

  • Taylor Pospisil
    – PhD May 2019, Department of Statistics & Data Science, CMU
    – Thesis title: Conditional Density Estimation for Regression and Likelihood-Free Inference

  • Rafael Izbicki
    – PhD April 2014, Department of Statistics, CMU
    – Thesis title: A Spectral Series Approach to High-Dimensional Nonparametric Inference
    – 2014 Best Thesis Award, Department of Statistics, CMU

  • Di Liu
    – PhD July 2012, Department of Statistics, CMU
    – Thesis title: Comparing Data Sources in High Dimensions

  • Andrew Crossett
    – co-advised with Kathryn Roeder
    – PhD May 2012, Department of Statistics, CMU
    – Thesis title: Using Dimension Reduction Techniques to Model Genetic Relationships for Association Studies

  • Susan Buchman
    – co-advised with Chad Schafer
    – PhD March 2011, Department of Statistics, CMU
    – Thesis title: High-Dimensional Adaptive Basis Density Estimation

  • Joseph W. Richards
    – co-advised with Chad Schafer
    – PhD July 2010, Department of Statistics, CMU
    – Thesis title: Fast and Accurate Estimation for Astrophysical Problems in Large Databases
    – 2010 ASA Student of the Year, Pittsburgh Chapter

  • Diana Luca
    – co-advised with Kathryn Roeder
    – PhD September 2008, Department of Statistics, CMU
    – Thesis title: Genetic Matching by Ancestry in Genome-Wide Association Studies

Teaching

  • Probability and Mathematical Statistics (STAT 36-700). Fall 2024.
  • Regression Analysis (STAT 36-707). Fall 2021, 2023.
  • Modern Ideas in Statistics and AI for Climate and Environmental Sciences (STAT 36-722). Spring 2021.
  • Advanced Methods for Data Analysis (STAT 36-402/608). Spring 2017-2023.
  • Modern Regression (STAT 36-401/607). Fall 2018, 2022.
  • Advanced Data Analysis II (STAT 36-758). Fall 2015-2017.
  • Mathematical Statistics Honors (STAT 36-326). Spring 2014-2016.
  • Probability and Statistics I (STAT 36-625). Fall 2005-2007, 2013-2014.
  • Statistical Practice (STAT 36-726). Spring 2012, 2016.
  • Engineering Statistics and Quality Control (STAT 36-220). Fall 2010-2011.
  • Machine Learning Journal Club (ML 10-915), Machine Learning Department, CMU. Fall 2009-2010.
  • Probability and Statistics II (STAT 36-626). Spring 2006-2008, 2010.
  • Probability and Statistics for Business Applications (STAT 36-207). Fall 2009.
  • Applied Mathematics and Engineering I (AMTH 251), Yale University. Fall 2003, 2004.
  • Introduction to Calculus in Several Variables (MATH 118), Yale University. Spring 2004.
  • Pattern Theory and its Applications (STAT 2), 12th Jyväskylä Ph.D. Summer School, Aug 2002, Finland.

Contact