Sam Ventura

I earned my Ph.D. in Statistics from Carnegie Mellon University in 2015, after which I was hired as a visiting assistant professor in the Statistics department for 36-217: Probability Theory and Random Processes. I am also president-elect of the American Statistical Association Pittsburgh Chapter; co-creator and alumnus of WAR On Ice (statistical analysis in ice hockey); consultant for the Pittsburgh Penguins professional hockey club; faculty advisor for the Tartan Sports Analytics Club; assistant coach of the Carnegie Mellon University Hockey Club; and 2015 winner of the "Turkey For A Day" charity competition.

Lab Groups

Specific Research Interests

Broadly, I am interested in prediction, clustering, and statistical computing (including software design). For the Census research group, I am working on problems involving hierarchical clustering and classification/prediction. I am interested in prediction with ensembles (in the classification setting) and clustering with a distribution of dissimilarity estimates. The specific application area of these methods is "record linkage" (also called "deduplication", "entity resolution", "disambiguation", etc.), where I am also interested in designing open-source software (R packages) (NSF Census Research Network).

I am also part of the "Models of Infectious Disease Agent Study" (MIDAS) group at CMU and Pitt. We design software for generating synthetic ecosystems of different populations, for use in agent-based modeling. The goal is to model the spread of (new? deadly?) infectious diseases in a given population.

Finally, my other research areas include statistics in sports and model-based clustering.