Classical multi-level and Bayesian approaches to population size estimation using multiple lists

Draft

Stephen E. Fienberg

Matthew S. Johnson

Brian W. Junker

October 9, 1998

To be presented October 20, 1998, at the Royal Statistical Society Conference on Applications of Random Effects/Multilevel Models to Categorical Data in Social Sciences and Medicine.

One of the major objections to the standard multiple-recapture approach to population estimation is the assumption of homogeneity of individual "capture" probabilities. Modeling individual capture heterogeneity is complicated by the fact that it shows up as a restricted form of interaction between lists in the contingency table cross-classifying list memberships for all individuals. Traditional log-linear modeling approaches to capture-recapture problems are well-suited to modeling interactions among lists, but ignore the special dependence structure that individual heterogeneity induces. A random-effects approach, based on the Rasch (1960) model from educational testing and introduced in this context by Darroch et al. (1993) and Agresti (1994), provides one way to introduce the dependence resulting from heterogeneity into the log-linear model; however, previous efforts to combine the Rasch-like heterogeneity terms additively with the usual log-linear interaction terms suggest that a more flexible approach is required. In this paper we consider both classical multi-level approaches and fully Bayesian hierarchical approaches to modeling individual heterogeneity and list interactions. Our framework encompasses both the traditional log-linear approach and various elements from the full Rasch model. We compare these approaches on two examples, the first arising out of an epidemiological study of a population of diabetics in Italy, and the second a study intended to assess the "size" of the World Wide Web. We also explore extensions allowing for interactions between the Rasch and log-linear portions of the models in both the classical and Bayesian contexts.

Keywords: Log-linear models; Markov chain Monte Carlo methods; Multiple-recapture census; Quasi-symmetry; Rasch model.

View the draft paper in PDF format (for Adobe Acrobat and compatible previewers); or in PS format (for PostScript printers and previewers).
View the handout to accompany the talk in PDF format (for Adobe Acrobat and compatible previewers); or in PS format (for PostScript printers and previewers).

by Brian Junker

Brian Junker                    (412) 268 - 8873
Department of Statistics        brian@stat.cmu.edu
232 Baker Hall                  FAX: (412) CMU-STAT
Carnegie Mellon University        or (412) 268-7828
Pittsburgh PA 15213
