The following abstracts have been accepted and will be part of the poster session at the Bayesian Workshop.

NAME: TITLE
Jean-Francois Angers and Atanu Biswas: A Bayesian analysis of the four-year follow-up data of the Wisconsin epidemiologic study of diabetic retinopathy
Sudipto Banerjee and Bradley P. Carlin: Hierarchical Semiparametric Proportional Hazards Models for Spatially Correlated Survival Data
Michael Baron: Bayes and asymptotically pointwise optimal stopping rules for the detection of influenza epidemics
Sam Behseta and Robert E. Kass: Bayes Factors for Testing Equality of Neuron Firing Intensity Functions
Halima Bensmail: A Case Study of Bayesian Cluster Analysis of Galaxy Data
Peter Bouman, Vanja Dukic and Xiao-Li Meng: Interval Censored Reporting Delays in CDC AIDS Data
Eric Bradlow and David Schmittlein: Launching New Nondurables in Japan: Marketing Practices and Market Consequences
Angela Maria de Souza Bueno, Carlos Alberto de Bragança Pereira, M. Nazareth Rabello-Gay and Julio Michael Stern: Environmental genotoxicity evaluation: Bayesian approach
Can Cai, Robert E. Kass and Valerie Ventura: Bayesian Bivariate Free-Knot Splines for Neuron Spike Train Processes
Catherine A. Calder, David Higdon and Christopher Holloman: A Space-Time Model for Ozone Concentrations Using Process Convolutions
Yu-mei Chang, Daniel Gianola, Bjørg Heringstad and Gunnar Klemetsdal: Inferring genetic and residual correlations between clinical mastitis in different periods of first lactation for Norwegian Cattle with a multivariate threshold model
Meng Chen, Mario Peruggia and Trisha Van Zandt: Was it a car or a cat I saw? An analysis of response times for word recognition
Erin M. Conlon: Bayesian Markov Chain Monte Carlo Oligogenic Segregation Analysis of Familial Prostate Cancer Pedigrees
Samantha Cook: Working Memory Impairments in Schizophrenia Patients and their Relatives: A Bayesian Item Response Theory Analysis
Ciprian Crainiceanu, David Ruppert, Jery Stedinger and Christopher Behr: Bayesian Hierarchical Modeling to Assess Pathogen Risk in Natural Water Supplies
J.M. Marin Diazaraque, R. Montes Diez and D. Rios Insua: Screening Models for Down's Syndrome
Michele DiPietro: Risk-neutral Valuation of Financial Derivatives in a Bayesian Framework
Richard Evans and Helen Stein: Comparing Measures of Adult Attachment
Marco Ferreira, Zhuoxin Bi, Mike West, Herbie Lee and David Higdon: Multi-scale Modeling of 1-D Permeability Fields
Christina L. Geyer: Detecting Fraud in Datasets Using Benford's Law
Mark E. Glickman: Rating Colleges Through Choice Modeling
Cong Han, Kathryn Chaloner and Alan S. Perelson: Bayesian Analysis of a Population HIV Dynamic Model
Murali Haran, Bradley P. Carlin, John L. Adgate, Gurumurthy Ramachandran, Lance Waller and Alan E. Gelfand: Hierarchical Bayes Models for Relating Particulate Matter Exposure Measures
Jennifer Hill: Who benefits most from higher child care quality?
Christopher Holloman, Dave Higdon and Herbert Lee: Parallel Computing for Multi-scale Problems
Gabriel Huerta, Bruno Sanso and Jonathan R. Stroud: A Space-time model for Mexico City ozone levels
Lurdes Y.T. Inoue, Peter F. Thall and Donald A. Berry: Seamlessly Expanding a Randomized Phase II Trial to Phase III
Telba Z. Irony: Bayesian Methodology at the Center for Devices and Radiological Health - Past, Present and Perspectives for the Future
Shane T. Jensen: A Bayesian approach to reducing heterogeneity in laboratory performance measures: An illustration from schizophrenia research
Beatrix Jones: A Hierarchical Approach to Modeling Sperm Fitness in Insects
Marc Kennedy and David Higdon: Bayesian calibration of a resistance spot weld model
Jacob Laading, Tore Anders Husebo and Thor Aage Dragsten: Recalibration of a credit risk model using multiple data sources
Michael Lavine, Brian Beckage and James Clark: Statistical Modelling of Seedling Mortality
Herbert Lee and Dave Higdon: A Flexible Convolution Approach to Modeling Spatial Processes in Porous Media
Ilya A. Lipkovich and Eric P. Smith: Evaluating the Impact of Environmental Variables on Benthic Microinvertebrate Community via Bayesian Model Averaging
J.R. Lockwood, Mark Schervish, Patrick Gurian and Mitchell Small: Hierarchical Bayesian Methods for Estimating Joint Contaminant Occurrence in Community Water Systems
Tanya Logvinenko: Hidden Markov Model Approach to Local and Global Protein or DNA Sequence Alignments
Hedibert F. Lopes and Helio S. Migon: Comovements and Contagion in Emergent Markets: Stock Indexes Volatilities
Louis T. Mariano: The Hierarchical Rater Model: Accounting for Information Accumulation and Rater Behavior in Constructed Response Student Assessments
German Molina, Susie Bayarri and James Berger: Assessing and Propagating Uncertainty in Model Inputs in Computer Traffic Simulators (CORSIM)
Vicente J. Monleon, Alix I. Gitelman and Andrew Gray: Multiscale Relationships Between Coarse Woody Debris and Presence/Absence of Western Hemlock in the Oregon Coast Range
Peter Mueller, Gary L. Rosner and Maria de Iorio: Borrowing Strength: Incorporating Information from Early Phase Cancer Clinical Studies into the Analysis of Large, Phase III Cancer Clinical Trials
Stephen Ponisciak and Valen Johnson: Bayesian Analysis of Essay Grading
Surajit Ray and Bruce Lindsay: Multivariate Mixture Models: A Tool for Analyzing Gene Expression Data
C. Shane Reese, James A. Calvin, John C. George and Raymond J. Tarpley: Estimation of Fetal Growth and Gestation in Bowhead Whales
Cavan Reilly, Ashley Haase, Timothy Schacker, David Krason and Steve Wietgreft: The clustering of infected SIV cells in lymphatic tissue
Marc Sobel and Indrajit Sinha: A Bayesian Analysis of Consumer Preferences
Elizabeth Stuart: A Bayesian Method for Using Administrative Records to Predict Census Day Residency
Franz Torres, Gisela Gonzalez, Tania Crombet and Agustin Lage: Use of the Bayesian approach in the analysis of clinical trials in patients with advanced lung cancer
Mario Trottini, M. Jesus Bayarri and Stephen E. Fienberg: Disclosure Risk and Information Loss in an Attitudinal Survey
David van Dyk and Yaming Yu: Accounting for Pile-Up in the Chandra X-ray Observatory
Garrick Wallstrom and Robert E. Kass: Correction of Ocular Artifacts in the EEG using Bayesian Adaptive Regression Splines
Lara J. Wolfson: Who Did Nader Really Raid? A Bayesian Analysis of Exit Poll Data from the 2000 US Presidential Elections
Xiao Yang, Keying Ye and Ina Hoeschele: Identifying Differentially Expressed Genes in cDNA Microarray Experiments: An Application of Bayesian Methods using Noninformative Priors
Yangang Zhang, Mark Schervish, Ercan U. Acar and Howie Choset: Probabilistic Methods for Robotic Landmine Search

Back to Bayes 01 Homepage




A Bayesian analysis of the four-year follow-up data of the Wisconsin epidemiologic study of diabetic retinopathy

by
Jean-Francois Angers and Atanu Biswas
Université de Montréal and Indian Statistical Institute
Dép. de mathématiques et de statistique
C.P. 6128, succ. "Centre-ville"
Montréal, QC H3C 3J7
Applied Statistics Unit
Indian Statistical Institute
203 B.T. Road, Calcutta - 700 035, India
jean-francois.angers@umontreal.ca and atanu@isical.ac.in

Abstract:

The Wisconsin Epidemiologic Study of Diabetic Retinopathy (WESDR) is a population-based epidemiologic study carried out in southern Wisconsin during the 1980s. The resulting data have been analyzed by different statisticians and ophthalmologists over the last two decades. Most of the analyses were carried out on the baseline data, although there were two follow-up studies on the same population. In the present paper we provide a Bayesian analysis of the first follow-up data, which were collected four years after the baseline study. Our Bayesian analysis provides estimates of the associated covariate effects, and we also choose the best model in terms of covariate inclusion. The baseline data were used to set the prior for the parameters. Extensive numerical computations illustrate the methodology.

Back to the top of the page






Hierarchical Semiparametric Proportional Hazards Models for Spatially Correlated Survival Data

by
Sudipto Banerjee and Bradley P. Carlin
Div. of Biostatistics, School of Public Health, University of Minnesota
A460 Mayo Building
Minneapolis, Minnesota 55455
sudiptob@biostat.umn.edu

Abstract:

Recent developments in GIS have encouraged health science databases to incorporate geographical information about the subjects under study. Such databases have in turn generated interest among statisticians and epidemiologists in developing and analysing models that account for spatial clustering, variation, etc. Among such databases, the National Cancer Institute's SEER program is the most authoritative source on cancer incidence and survival in the United States. While geographical clustering and smoothing issues have gained considerable prominence for disease incidence (based upon count data), the same cannot be said for time-to-event (survival) data. Using the SEER program, we first present an investigation of survival data on breast cancer patients in Iowa. In addition to individual-level covariate information, the database also includes geographical information (e.g. county of residence). In the absence of appropriate surrogates for standard of health in the different counties, epidemiologists and health professionals are particularly interested in discerning spatial patterns that might be present among the counties for treatment of breast cancer. We develop a hierarchical Bayesian framework to model such data. In particular, we incorporate stratum-specific frailties (viewed as partial realizations of a spatial process) and expand upon the more popular survival models (such as the Cox model and cure-rate models). We will also present the analysis of a challenging dataset on infant mortality in Minnesota. While the basic modelling framework is similar to that for the SEER data, the complexity of these data is increased by the large number of "censored" individuals (99.5%) and the consideration of post-natal and neo-natal deaths. Model comparison and model selection issues are addressed. Choropleth maps are used to present the spatial patterns, while most of the modelling technicalities are illustrated through graphical models (BUGS style).

Back to the top of the page






Bayes and asymptotically pointwise optimal stopping rules for the detection of influenza epidemics

by
Michael Baron
Department of Mathematical Sciences, University of Texas at Dallas
Richardson, TX 75083-0688
mbaron@utdallas.edu

Abstract:

In this study, Morbidity and Mortality Weekly Reports published by the Centers for Disease Control and Prevention (CDC) are used for the fast detection of the beginning of influenza epidemics. The data represent the percentage of specimens testing positive for influenza-like illnesses in two national labs during consecutive weeks of the year and the proportion of deaths attributed to pneumonia and influenza. It is customary to declare the beginning of an epidemic when influenza mortality exceeds an ``epidemic'' threshold. On the other hand, one can often detect the beginning of an epidemic even before this threshold is exceeded, by solving a suitable change-point problem. A hierarchical Bayesian model is proposed where the prior probabilities of a change point depend on (random) factors that affect the spread of influenza. The theory of optimal stopping is used to obtain Bayes stopping rules under loss functions penalizing detection delay and false alarms. The solution is based on the corresponding payoff function, whose form is rather complicated. Alternatively, asymptotically pointwise optimal stopping rules can be computed easily and under much weaker assumptions.
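
The abstract does not spell out the computational details, but the flavor of a Bayesian change-point stopping rule can be conveyed with a small sketch. The code below assumes Gaussian pre- and post-change models, a geometric prior on the change point, and a made-up surveillance series; the recursion is the standard Shiryaev-type posterior update, stopping when the posterior probability that an epidemic has begun exceeds a threshold. All numbers (baseline and epidemic levels, hazard rho, threshold) are illustrative assumptions, not values from the poster.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical weekly surveillance series: baseline level 2.0, epidemic level 4.0,
# with the change occurring at week 20 (all values invented for illustration).
x = np.concatenate([rng.normal(2.0, 0.5, 19), rng.normal(4.0, 0.5, 15)])

rho = 0.05                      # geometric prior hazard of an epidemic starting each week
f0 = stats.norm(2.0, 0.5)       # pre-change (non-epidemic) model
f1 = stats.norm(4.0, 0.5)       # post-change (epidemic) model
threshold = 0.95                # declare an epidemic when P(change by now | data) > 0.95

p = 0.0                         # posterior probability that the change has already occurred
for week, xt in enumerate(x, start=1):
    prior_change = p + (1.0 - p) * rho     # prob. of a change by this week, before seeing x_t
    num = prior_change * f1.pdf(xt)
    den = num + (1.0 - prior_change) * f0.pdf(xt)
    p = num / den                           # Shiryaev-type recursive update
    if p > threshold:
        print(f"declare epidemic at week {week}, posterior probability {p:.3f}")
        break
```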

Back to the top of the page






Bayes Factors for Testing Equality of Neuron Firing Intensity Functions

by
Sam Behseta and Robert E. Kass
Carnegie Mellon University
Department of Statistics
Pittsburgh, PA 15213
sbehseta@stat.cmu.edu

Abstract:

In an experiment concerning the function of primary motor cortex, data from 347 neurons were collected in Dr. Peter Strick's laboratory at the Center for the Neural Basis of Cognition (CNBC) in Pittsburgh. The high-level goal of the study was to contrast neuron firing patterns in repetitive and randomly assigned hand movements. A "Serial Reaction Time" (SRT) experiment was performed on a rhesus monkey. In an SRT experiment, the subject responds to a series of targets appearing on a touch-sensitive monitor. A sequence of successive targets is highlighted in either a repeating or a pseudo-random order on the screen, and the monkey is trained to respond by hitting those targets. We fit the neuronal firing rate as a function of time using cubic splines, assuming the firing rate is a Poisson process intensity function and applying Poisson regression. We used both fixed-knot splines and free-knot splines as described by DiMatteo, Genovese, and Kass (2000). We then computed Bayes factors to test equality of the firing rate intensity functions in the repetitive and random conditions.
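
As a rough illustration of the spline-based Poisson regression and Bayes-factor comparison described above (not the free-knot method of DiMatteo, Genovese, and Kass, which samples knot locations), the sketch below fits fixed-knot cubic splines to invented binned spike counts from two conditions and uses a BIC (Schwarz) approximation to the Bayes factor for separate versus equal intensity functions. The knots, bin structure, data, and use of statsmodels are all assumptions for this sketch.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def cubic_basis(t, knots):
    """Fixed-knot truncated-power cubic spline basis (intercept included)."""
    cols = [np.ones_like(t), t, t**2, t**3]
    cols += [np.clip(t - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

# Hypothetical binned spike counts for one neuron under two conditions.
t = np.linspace(0, 1, 100)                       # time bins (arbitrary units)
knots = [0.25, 0.5, 0.75]                        # fixed knots (assumed, not from the poster)
lam_rep  = np.exp(1.0 + 1.5 * np.sin(2 * np.pi * t))
lam_rand = np.exp(1.0 + 1.2 * np.sin(2 * np.pi * t))
y = np.concatenate([rng.poisson(lam_rep), rng.poisson(lam_rand)])
cond = np.repeat([0, 1], t.size)                 # 0 = repeating, 1 = random condition

B = cubic_basis(np.tile(t, 2), knots)
X_equal = B                                       # one shared intensity function
X_sep = np.column_stack([B, B * cond[:, None]])   # condition-specific intensities

def bic(model_X):
    fit = sm.GLM(y, model_X, family=sm.families.Poisson()).fit()
    return -2 * fit.llf + model_X.shape[1] * np.log(y.size)

# Schwarz/BIC approximation: BF(separate vs equal) ~ exp((BIC_equal - BIC_sep) / 2)
log_bf = (bic(X_equal) - bic(X_sep)) / 2
print(f"approximate log Bayes factor (separate vs equal intensities): {log_bf:.2f}")
```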

Back to the top of the page






A Case Study of Bayesian Cluster Analysis of Galaxy Data

by
Halima Bensmail
Department of Statistics
University of Tennessee
Knoxville, TN 37996-0532
hbensmai@utk.edu

Abstract:

Until fairly recently, it was believed that the Galaxy consists of two stellar populations, the disk and the halo. More recently, it has been hypothesized that there are in fact three stellar populations, the old (or thin) disk, the thick disk, and the halo, distinguished by their spatial distributions, their velocities, and their metallic elements. These hypotheses have different implications for theories of the formation of the Galaxy. In this paper, we propose a new Bayesian mixture-model cluster analysis approach to the question of how many stellar populations there are. To choose the best-fitting model and the number of populations (i.e., clusters) we use Bayes factors. For comparative purposes we also introduce a new information complexity (ICOMP) criterion, computed via Gibbs sampling.

Back to the top of the page






Interval Censored Reporting Delays in CDC AIDS Data

by
Peter Bouman, Vanja Dukic, and Xiao-Li Meng
University of Chicago, Harvard University
bouman@galton.uchicago.edu

Abstract:

A standard AIDS surveillance data source is the Centers for Disease Control (CDC) AIDS Public Information Dataset (APIDS), which includes month and year of diagnosis and month and year of case report receipt for each reported United States AIDS case. Because projecting the course of the AIDS epidemic involves adjusting the number of observed AIDS cases for the delay in case reporting to the CDC, it is of interest to estimate the underlying distribution of this delay. This estimation is complicated by the fact that date of diagnosis and date of report receipt are truncated to the month for privacy reasons, creating up to two months' uncertainty in the delay time. For our case study, we study a lognormal family of reporting delay distributions, complicated by interval censoring and convolution with uniformly distributed starting times. Bayesian inference is used to obtain posterior distributions for the delay time parameters as well as the "missing" (i.e., censored) dates of diagnosis and report. This poster will address Bayesian methods of inference and model criticism, as well as implications for larger issues in censored data analysis and the construction of general AIDS epidemic models.
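
A minimal sketch of the interval-censoring piece: when diagnosis and report dates are truncated to the month, a recorded delay of d months is consistent with a true delay roughly in (d-1, d+1) months, so each case contributes a difference of lognormal CDFs to the likelihood. The code below maximizes that likelihood for invented delays; the poster's Bayesian treatment would add priors, handle the uniform within-month start times explicitly, and sample the posterior.

```python
import numpy as np
from scipy import stats, optimize

# Hypothetical reporting delays recorded in whole months (invented for illustration).
delay_months = np.array([1, 2, 2, 3, 5, 4, 8, 2, 6, 3, 1, 7, 4, 2, 9])

def neg_log_lik(params):
    """Interval-censored lognormal likelihood: true delay lies in (d-1, d+1) months."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    dist = stats.lognorm(s=sigma, scale=np.exp(mu))
    lo = np.clip(delay_months - 1, 0, None)
    hi = delay_months + 1
    prob = dist.cdf(hi) - dist.cdf(lo)
    return -np.sum(np.log(prob))

# Maximize the interval-censored likelihood (a Bayesian version would add priors
# on mu and sigma and sample the posterior, e.g. by MCMC).
res = optimize.minimize(neg_log_lik, x0=[1.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(f"lognormal delay parameters: mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```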

Keywords: AIDS, interval censoring, delay distributions, Bayesian inference, MCMC and epidemic models.

Back to the top of the page






Launching New Nondurables in Japan: Marketing Practices and Market Consequences

by
Eric T. Bradlow and David Schmittlein
The Wharton School
ebradlow@wharton.upenn.edu

Abstract:

In this research we are interested in summarizing, across a large number of nondurable product categories, both the marketing programs used to launch new products in Japan, and the Japanese market's responsiveness to those marketing mix variables. This project represents, then, a meta-analysis of the market's responsiveness to marketing mix activities for 1193 new items launched across 83 product categories. The main contribution of the research is to provide a reliable and valid measure of the impact of price and promotional variables on the key outcomes of percent trial and market share for newly launched nondurables. The empirical work is based on a hierarchical Bayes model specification that considers only the variation across stores in adoption patterns for a single new item launched, avoiding the endogeneity problems that have characterized some prior research. The work investigates how the impact of price and promotion varies across type of product and type of manufacturer, and tests several hypotheses regarding this variation. Having assessed the general impact of these marketing variables, we next match up the differential responsiveness across product groups with actual new product launch practice. That is, since the impact of price/promotional variables is shown to vary across products, do manufacturers and retailers tend to use the most impactful type of promotions for a particular product category? Finally, the empirical results enable us to answer a basic question regarding the determinants of new product success. Namely, for the Japanese market, how much of the observed sales variation across stores (or regions) and across products is due to product design and brand image factors, the price/promotional program at launch, differences in new product acceptance across stores/regions, and the product/market fit for a particular set of customers? Our results will provide guidelines for the introductory marketing programs of firms launching new products in Japan.

Back to the top of the page






Bayesian Bivariate Free-Knot Splines for Neuron Spike Train Processes

by
Can Cai
Carnegie Mellon University
Department of Statistics
Pittsburgh, PA 15232
ccai@stat.cmu.edu

Abstract:

Ventura, Kass and colleagues at the Center for the Neural Basis of Cognition have used inhomogeneous Poisson processes to analyze neural data collected from the supplementary eye field of a Macaque monkey while performing a delayed eye movement task. Inhomogeneous Poisson processes are commonly used to model the point processes of single neuron spike trains, which assumes that the spikes are generated independently at a time-vary intensity rate $\lambda(t)$. However, in reality, due to biological features of neurons, the intensity rate at a certain time $t$ is always affected by the previous spike time. An inhomogeneous Markov interval (IMI) process model was proposed by Mark and Miller (1992) and Kass and Ventura (2001), in which the intensity rate is assumed to be a function of the current time and the backward recurrence period after the immediately preceding spike. Estimating the two-dimensional intensity function is a non-trivial problem. A bivariate Bayesian free-knot spline sitting method is suggested, which is an extension of the free-knot curve fitting method of DiMatteo (2001) to the two dimensional case. This method uses reversible jump Markov chain Monte Carlo methods to sample the number and location of the knots of the spline basis functions from their posterior distributions. The simulation study shows the method captures features of the IMI process well. The method is also applied to the supplementary eye field data.

Back to the top of the page






Environmental genotoxicity evaluation: Bayesian approach

by
Angela Maria de Souza Bueno
Universidade Federal de Santa Catarina
Departamento de Biologia Celular
Embriologia e Genética, CCB, Florianópolis
SC, Brazil. 88040-900
Fax: 55 048 331 9672.
bueno@ccb.ufsc.br

Carlos Alberto de Bragança Pereira
Universidade de Sao Paulo
Departamento de Estatística, IME. Cx. Postal 66281
Sao Paulo, SP, Brazil. 05389-970
e-mail: cpereira@ime.usp.br

M. Nazareth Rabello-Gay
Instituto Butantan, Laboratório de Genética
Avenida Vital Brasil, 1500
Sao Paulo, SP, Brazil. 05503-900

Julio Michael Stern
Universidade de Sao Paulo
Departamento de Estatística, IME. Cx. Postal 66281
Sao Paulo, SP, Brazil. 05389-970
jstern@ime.usp.br

Abstract:

Two cytogenetic end-points were analyzed in three populations of a species of wild rodent - Akodon montensis - living in an industrial, an agricultural, and a preservation area in the Itajaí Valley, state of Santa Catarina, Brazil. The purpose was to evaluate the performance of the Mitotic Index and the MNPCE frequency - the frequency of micronucleated polychromatic erythrocytes - in the establishment of a genotoxic profile of each area. The polychromatic/normochromatic ratio is also analyzed to show that the three populations are in equal conditions concerning the influence of confounding factors such as animal age, health, nutrition status, presence of pathogens, and intra- and inter-populational genetic variability. The statistical models used in this paper are mixtures of Negative-Binomial and Poisson variables. The Poisson variables are used as approximations of Binomials for rare events. The mixing distributions are Beta densities. The statistical analyses are carried out under the Bayesian perspective, as an alternative to the frequentist analyses most often considered in the literature.

Key words: Cell proliferative indices; Micronucleated cells; Prior and posterior probabilities; Beta-(negative)Binomial distribution; Beta-Poisson distribution; Mixture of Beta distributions

Back to the top of the page






A Space-Time Model for Ozone Concentrations Using Process Convolutions

by
Catherine A. Calder, David Higdon, Christopher Holloman
ISDS, Duke University
Box 90251
Durham, NC 27708
kate@stat.duke.edu

Abstract:

Given daily ozone readings from 512 weather stations in the Eastern United States, we are interested in both predicting future ozone concentrations and in gaining insight into the space-time dependence structure of the data. We model ozone concentration as a process that moves across the region over time and exhibits spatial dependence locally in time. Our hope is to understand the latent mechanism that drives ozone concentration levels as well as to predict future levels. Process convolutions not only provide a framework for incorporating time dependence in spatial modeling, but also remain computationally tractable with large datasets. Standard dynamic linear modeling methods can be used to specify the time dependence allowing efficient posterior exploration. We consider a few variations of these space-time process convolution models that incorporate different levels of spatial dependence in time.
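
To make the process-convolution idea concrete, the sketch below builds a purely spatial, one-dimensional example: a smooth process obtained by convolving a small grid of latent white-noise values with a Gaussian kernel. The knot spacing, kernel bandwidth, and dimensions are assumptions for illustration; the space-time models in the poster let the latent values evolve over time via dynamic linear model updates.

```python
import numpy as np

rng = np.random.default_rng(2)

# Coarse grid of latent sites u_j with independent N(0,1) values x_j.
u = np.linspace(0, 10, 12)          # latent knot locations (assumed)
x = rng.normal(size=u.size)         # latent white-noise process

def gauss_kernel(d, bandwidth=1.0):
    return np.exp(-0.5 * (d / bandwidth) ** 2)

# Fine prediction grid: z(s) = sum_j k(s - u_j) x_j defines a smooth spatial process.
s = np.linspace(0, 10, 200)
K = gauss_kernel(s[:, None] - u[None, :])   # 200 x 12 kernel matrix
z = K @ x                                   # one realization of the convolved process

# In a space-time version, x_j would evolve over time (e.g. x_{j,t} = x_{j,t-1} + noise),
# giving spatial dependence locally in time while keeping the latent dimension small
# (12 knots here) even with hundreds of monitoring stations.
print(z[:5])
```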

Back to the top of the page






Inferring genetic and residual correlations between clinical mastitis in different periods of first lactation for Norwegian Cattle with a multivariate threshold model

by
Yu-mei Chang, Daniel Gianola
Department of Animal Sciences
University of Wisconsin, Madison
chang@calshp.cals.wisc.edu
Bjørg Heringstad and Gunnar Klemetsdal
Department of Animal Science
Agricultural University of Norway

Abstract:

Mastitis, an inflammation of the mammary gland, is the most frequent and costly disease affecting dairy cattle. Clinical mastitis records on 36,178 first-lactation cows from 5,286 herds, progeny of 245 sires, were analyzed. The opportunity for infection ranging from 30 days pre-calving to 300 days post-partum was divided into 11 periods of equal length. Within period, it was checked whether mastitis occurred or not. Mastitis incidence was 4.7% and 10.1% in the first two periods, respectively, and ranged between 1.4% and 2.2% subsequently. The objective was to infer genetic and residual correlations between mastitis in the eleven periods. An 11-variate analysis was carried out with a Bayesian threshold model, assuming that mastitis (presence vs. absence) was a different trait in each period. Using a multivariate normal link of 11 dimensions for each cow, unobserved liabilities (latent variables) were modeled as a linear function of year of calving, age-season of calving, herd, sire of the cow and residual effects. The Bayesian model assigned vague proper priors to all parameters, except for herd and sire effects. Since about 25% of the herds had no mastitis occurring at all in the 330-day study period, an 11-dimensional normal prior with a null mean was assigned to herd effects, to alleviate the "extreme category problem" (all cows scored as 0 in such herds). Herds were assumed independent, but effects of the same herd on liabilities were correlated between periods. Sire effects were also assumed to follow an 11-dimensional multivariate normal distribution. Using known pedigree information, genetic relationships between sires were incorporated into the (co)variance matrix. The vector of sire effects was augmented with male ancestors lacking female progeny in the data file. Under additive inheritance, the genetic correlation between periods is given by the ratio of the covariance between sire effects for a pair of periods, divided by the square root of the product of the corresponding variances. All residual variances were set equal to one, since these parameters cannot be identified. Gibbs sampling and a random walk Metropolis-Hastings algorithm (RWMH) were used to draw from posterior distributions of interest. Inferences were based on 210,000 samples, after discarding 40,000 iterations as burn-in. The acceptance rate for RWMH was about 32%. Heritability of liability to clinical mastitis was 0.12 before calving, and ranged between 0.05 and 0.08 after calving. Genetic correlations were positive and ranged between 0.13 and 0.55, suggesting that clinical mastitis resistance is not the same trait across periods. Residual correlations ranged between -0.12 and 0.36, and were smaller for non-adjacent intervals. These correlations may reflect carry-over effects of infections from period to period over and above similarity due to co-expression of the same genes in different periods. The information on genetic and residual correlations between different stages of lactation can be used to develop models for longitudinal analysis of mastitis data.
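
The 11-variate threshold model above is too large for a short example, but its central computational device, augmenting the binary records with latent liabilities drawn from truncated normals, is the standard Albert-Chib scheme. The sketch below shows a deliberately simplified univariate probit version with a single covariate and invented data; the poster's model adds correlated herd and sire effects across 11 periods and pedigree information.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Invented data: binary mastitis indicator and one standardized covariate per cow.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-1.0, 0.5])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

# Albert-Chib Gibbs sampler for a probit model with a N(0, 10^2 I) prior on beta.
prior_prec = np.eye(2) / 100.0
V = np.linalg.inv(X.T @ X + prior_prec)      # conditional posterior covariance of beta
beta = np.zeros(2)
draws = []
for it in range(2000):
    # 1. Latent liabilities: N(X beta, 1) truncated to (0, inf) if y = 1, (-inf, 0) if y = 0.
    mean = X @ beta
    lower = np.where(y == 1, 0.0 - mean, -np.inf)
    upper = np.where(y == 1, np.inf, 0.0 - mean)
    z = mean + stats.truncnorm.rvs(lower, upper, size=n, random_state=rng)
    # 2. Regression coefficients given the latents (conjugate normal update).
    beta = rng.multivariate_normal(V @ (X.T @ z), V)
    if it >= 500:
        draws.append(beta)

print("posterior mean of beta:", np.array(draws).mean(axis=0))
```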

Back to the top of the page







Was it a car or a cat I saw? An Analysis of Response Times for Word Recognition

by
Meng Chen, Mario Peruggia
Department of Statistics
The Ohio State University
and Trisha Van Zandt
Department of Psychology
Columbus, OH 43210
dream@stat.ohio-state.edu

Abstract:

We model the response times for word recognition collected in experimental trials conducted on five subjects. In a typical trial, a subject was presented with one of several lists of 32 words chosen at random from a master pool of 2000 common English words and given the opportunity to study it for a short time. Next, the subject was sequentially shown words from a scrambled list containing 20 words selected from the study list and 20 other words selected from the master pool. For each word, the subject was asked to classify it as belonging or not belonging to the study list and the response time was recorded.

Because of the sequential nature of the experiment and the fact that several replications of similar trials were conducted on each subject, the assumption of i.i.d. response times (often encountered in the psychology literature) is untenable. We consider Bayesian hierarchical models in which the response times are described as conditionally i.i.d. Weibull random variables given the parameters of the Weibull distribution. The sequential dependencies, as well as the effects of response accuracy, word characteristics, and subject-specific learning processes, are incorporated via a linear regression model for the logarithm of the scale parameter of the Weibull distribution.
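
A small sketch of the likelihood component just described may help: response times are Weibull, with the logarithm of the scale parameter modeled linearly in trial-level covariates. The design matrix, covariates, and parameter values below are invented placeholders; the full hierarchical model has subject-specific parameters and is fit by MCMC rather than by evaluating the likelihood at fixed values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

# Invented design: intercept, a lag-1 "previous response time" term (sequential
# dependence), and an accuracy indicator for each trial.
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.integers(0, 2, size=n)])
shape_true, gamma_true = 2.0, np.array([-0.5, 0.2, -0.3])
rt = stats.weibull_min.rvs(shape_true, scale=np.exp(X @ gamma_true), random_state=rng)

def log_lik(params):
    """Weibull response times with log(scale) = X @ gamma."""
    log_shape, gamma = params[0], params[1:]
    scale = np.exp(X @ gamma)
    return np.sum(stats.weibull_min.logpdf(rt, np.exp(log_shape), scale=scale))

# A Bayesian fit would combine this likelihood with priors on gamma and the shape
# parameter and sample the posterior; here we just evaluate it at the generating values.
print(log_lik(np.concatenate([[np.log(shape_true)], gamma_true])))
```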

We compare the inferences from our analysis with those obtained by means of instruments that are commonly used in the cognitive psychology arena. In both cases, we pay close attention to the quality of the fit, the adequacy of the assumptions, and their impact on the inferential conclusions. Finally, we discuss briefly the extent to which our approach can be generalized to other types of human response data.

Back to the top of the page






Bayesian Markov Chain Monte Carlo Oligogenic Segregation Analysis of Familial Prostate Cancer Pedigrees

by
Erin M. Conlon
Department of Statistics, Harvard University
One Oxford Street
Cambridge, MA 02138
conlon@stat.harvard.edu
Ellen M. Wijsman, Ellen L. Goode, Michael Badzioch and Gail P. Jarvik
University of Washington
Seattle, Washington
Mark Gibbs, Janet L. Stanford, Suzanne Kolb and Elaine A. Ostrander
Fred Hutchinson Cancer Research Center
Seattle Washington
Marta Janer and Leroy Hood
Institute for Systems Biology
Seattle, Washington

Abstract:

Previous studies have suggested strong evidence for a familial component to prostate cancer (PC) susceptibility. Here, we analyze 3,796 individuals in 263 prostate cancer families recruited as part of the ongoing Prostate Cancer Genetic Research Study (PROGRESS). We use Bayesian Markov chain Monte Carlo oligogenic segregation analysis to estimate the number of quantitative trait loci (QTLs) contributing to hereditary prostate cancer (HPC). We use age-at-diagnosis of HPC as the quantitative trait. We estimate the contribution of each QTL to the variance in age-at-diagnosis of HPC, and the mode of inheritance of each QTL. We find evidence that a mean of 4-5 QTLs contribute to the variance in age-at-diagnosis of HPC in these families. We find that genetic effects largely account for the variance in age-at-diagnosis of HPC, with environmental effects accounting for the remainder of the variance. Our findings for the number of QTLs contributing to HPC and the variance contribution of these QTLs will be instructive in mapping and identifying these genes.

Back to the top of the page






Working Memory Impairments in Schizophrenia Patients and their Relatives: A Bayesian Item Response Theory Analysis

by
Samantha Cook
Harvard University, Department of Statistics
One Oxford Street
Cambridge, MA 02138
cook@stat.harvard.edu

Abstract:

Several studies have shown that spatial working memory is impaired in schizophrenia patients. In our study, schizophrenia patients and normal controls participated in a memory test designed to measure both spatial and object working memory. The test items were designed to have differing levels of difficulty, making standard analyses inappropriate. The data were analyzed using a Bayesian Item Response Theory (IRT) model. Item response theory is a method for analyzing test scores in which the test items themselves are analyzed in addition to the test-takers' abilities. Analyzing the data in this way accounts for the fact that the questions were not all equally difficult, and also produces results which are more generalizable and less test-dependent. The analysis was carried out using Gibbs sampling, a Markov Chain Monte Carlo technique which improves upon standard EM methods for IRT models by producing standard error estimates which more accurately represent uncertainty about the parameters. This is joint work with John Barnard, Yungtai Lo, and Donald B. Rubin, of Harvard University, Department of Statistics, and Michael J. Coleman, Philip S. Holzman, Deborah J. Levy, and Steven Matthysse of McLean Psychiatric Hospital.

Back to the top of the page






Bayesian Hierarchical Modeling to Assess Pathogen Risk in Natural Water Supplies

by
Ciprian Crainiceanu, David Ruppert, Jery Stedinger, and Christopher Behr
Cornell University
Department of Statistical Science, 301 Malott Hall, Cornell University Ithaca NY 14853
cmc59@cornell.edu

Abstract:

Objectives/Hypothesis: Cryptosporidium parvum is a microscopic waterborne organism that once ingested can produce self-limiting gastrointestinal illness, and even death in individuals with a weakened immune system. In response to recent outbreaks (400,000 individuals are estimated to have been infected in Milwaukee in 1993), the Environmental Protection Agency (EPA) conducted national investigations of Cryptosporidium concentrations under the Information Collection Rule (ICR). The ultimate goal of the investigations was to develop revised water treatment standards. The basic ICR survey was conducted over a period of 18 months and included 350 major water sources corresponding to large systems. Sources of water in the survey included streams, reservoirs and ground water. Data was collected monthly and included: the number of Crypto oocysts, the number of Giardia cysts, total coliform bacteria, the volume of sampled water, and water turbidity. The statistical analysis of this data is challenging because of the discrete nature of the response variable (the number of oocysts actually counted), the high frequency of zero counts (90%), seasonality, regional effects, and missing observations.

Approach: Observed count data serve as the basis of a Generalized Linear Mixed Model (GLMM) with a hierarchical structure that includes sites, regions and an overall national average. Possible covariates include site characteristics, such as the category of the water source and the population served, and time-dependent covariates including sampling date, flow rate, and water turbidity. A fully Bayesian approach is used for modeling and subsequent risk analysis. Markov Chain Monte Carlo (MCMC) simulation is employed to compute the posterior distributions of the parameters. The flexible statistical software package WinBUGS is used for the Bayesian computations.

Results: Results illustrate the steps involved in parameter estimation, model selection, and risk assessment. The replicates generated by the simulation are used to describe parameter uncertainty and the predictive distribution of Cryptosporidium concentrations in the subsequent analysis of the cost-effectiveness of alternative EPA information collection strategies and treatment rules. Different distributions are used to model random effects (gamma or lognormal); some choices (gamma) allow the time-site effects to be integrated analytically (Poisson-gamma yields a negative-binomial distribution), which can affect the efficiency of the computations. Research addresses MCMC simulation performance. Examples are used to show the impact of hierarchical centering of site and regional random effects, centering and orthogonalization of covariates and the information content of data sets on MCMC mixing properties.
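
One specific claim above, that a gamma-distributed random effect integrates out of a Poisson count to give a negative binomial, is easy to check numerically. The sketch below simulates the Poisson-gamma mixture with assumed shape and scale values and compares the empirical frequencies with the matching negative binomial pmf.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

a, theta = 0.5, 3.0                      # gamma shape and scale for the random effect (assumed)
lam = rng.gamma(shape=a, scale=theta, size=200_000)
counts = rng.poisson(lam)                # Poisson counts given the gamma-distributed mean

# Marginally, counts ~ NegBin(n = a, p = 1 / (1 + theta)).
nb = stats.nbinom(a, 1.0 / (1.0 + theta))
for k in range(4):
    print(k, (counts == k).mean().round(4), nb.pmf(k).round(4))
```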

Keywords: Bayesian analysis, waterborne pathogens, Generalized Linear Mixed Model

Back to the top of the page






Screening Models for Down's Syndrome

by
J.M. Marin Diazaraque, R. Montes Diez and D. Rios Insua
ESCET. Universidad Rey Juan Carlos
28933 Mostoles, Madrid, España
rmontes@escet.urjc.es

Abstract:

New technology has improved the methods of antenatal detection of Down's syndrome by obtaining fetal tissue samples (amniocentesis). In addition to greatly increasing the cost of medical care, this invasive technique carries a slight amount of risk to the foetus, which makes it inappropriate to examine every pregnancy this way. Screening tests, based on the mother's age and the concentration of certain chemicals in the mother's blood (alpha-fetoprotein (AFP), human chorionic gonadotropin (HCG)), as well as certain risk factors (gestational age, maternal overweight, smoking, etc.), have been developed to try to identify those pregnancies with a high risk of Down's syndrome. A recent study (Muller et al., 1999) compares six software packages (Prenatal Interpretive Software, Prisca, DIANASoft, etc.) that calculate DS risk, and the authors conclude that substantial variations are observed between them. Their speculations about the software discrepancies consider, for instance, errors in the estimation of gestational age (Bishop et al., 1997) and differences in the estimates of the correlation coefficient between the markers AFP and HCG (Dustan et al., 1999). We have looked into this problem from a fully Bayesian point of view and have developed a decision analysis and screening procedure to aid the antenatal diagnosis of Down's syndrome.

Back to the top of the page






Risk-neutral Valuation of Financial Derivatives in a Bayesian Framework

by
Michele DiPietro
Department of Statistics
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
dipietro@stat.cmu.edu

Abstract:

In this poster I will consider the problem of finding the equilibrium, or risk-neutral, price of a financial instrument anchored to the Federal Funds rate (FedFunds). This series governs all other short rates, and is therefore central in finance. FedFunds are often modeled in continuous time, but they can only be observed discretely (my data consists of monthly averages from 1963 to 1998, as released by the Federal Reserve). This discrepancy creates significant inferential challenges, especially in a Bayesian context. On the other hand, the Bayesian approach is very appealing because it provides a simple and effective way of quantifying a key concept for financial operators, namely the uncertainty induced on the equilibrium price by both the stochastic differential equation (SDE) used to model the FedFunds and the data. I develop an inferential technique, which I term Normalized Hermite Polynomial, that enables easy MCMC posterior inference, both for the parameters of the SDE and for the price itself. I also consider several spot rate models (mean-reverting Ornstein-Uhlenbeck, Cox-Ingersoll-Ross and Ahn-Gao) as a way of assessing the sensitivity of the equilibrium price under various frameworks. The particular instrument considered in this poster is a pure discount bond, but the algorithm is very easily generalized to more complex derivatives.
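
For the simplest of the spot-rate models mentioned, the mean-reverting Ornstein-Uhlenbeck (Vasicek) process, the transition density between discrete observation times is Gaussian in closed form, so the likelihood of monthly data can be written directly, as sketched below with invented parameter values and a made-up rate series. The Normalized Hermite Polynomial machinery in the poster is aimed at handling models without such convenient closed-form transitions.

```python
import numpy as np
from scipy import stats

def ou_log_lik(rates, dt, kappa, mu, sigma):
    """Exact discrete-time likelihood of a mean-reverting OU (Vasicek) short-rate model:
    r_{t+dt} | r_t ~ N(mu + (r_t - mu) exp(-kappa dt), sigma^2 (1 - exp(-2 kappa dt)) / (2 kappa))."""
    prev, curr = rates[:-1], rates[1:]
    cond_mean = mu + (prev - mu) * np.exp(-kappa * dt)
    cond_var = sigma**2 * (1.0 - np.exp(-2.0 * kappa * dt)) / (2.0 * kappa)
    return np.sum(stats.norm.logpdf(curr, loc=cond_mean, scale=np.sqrt(cond_var)))

# Invented monthly short-rate series (dt = 1/12 years); in a Bayesian analysis this
# log-likelihood would be combined with priors on (kappa, mu, sigma) inside an MCMC sampler.
rng = np.random.default_rng(6)
r = 0.05 + 0.001 * rng.standard_normal(120).cumsum()
print(ou_log_lik(r, dt=1.0 / 12.0, kappa=0.8, mu=0.05, sigma=0.02))
```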

Back to the top of the page






Comparing Measures of Adult Attachment

by
Richard Evans and Helen Stein
Iowa State University and the Menninger Clinic
Ames IA

revans@iastate.edu

Abstract:

Adult attachment theory is an area of psychology that examines and classifies a subject's perceptions and expectations about close relationships. There is no unified theory of attachment, but generally adults can be classified as having secure or not-secure attachment styles. For example, one not-secure attachment style is the preoccupied style demonstrated by the actress Glenn Close in the 1987 movie "Fatal Attraction." In behavioral health it is important to assess the attachment style of new patients so that the clinician is able to establish appropriate and effective clinician-patient boundaries. In this paper, we test two correlated questionnaire measures of adult attachment to determine which is more accurate. One test has ordinal outcomes and the other has continuous outcomes. These data are the result of an ongoing Menninger Clinic Child and Family Center study to assess existing instruments and develop new instruments that measure adult attachment behavior. The instruments classify the subject by eliciting expectations and perceptions about how close relationships operate and how the subject typically functions in close relationships (Stein et al., 1998). The approach is to provide inference for the difference in the areas under the parametric receiver operating characteristic (ROC) curves using the posterior distribution of the difference.

Back to the top of the page






Multi-scale Modeling of 1-D Permeability Fields

by
Marco Ferreira, Zhuoxin Bi, Mike West, Herbie Lee and David Higdon
Duke University
ISDS - Old Chemistry Bldg
Durham, NC 27708-0251
marco@isds.duke.edu

Abstract:

Permeability plays an important role in subsurface fluid flow studies, being one of the most important quantities for the prediction of fluid flow patterns. The estimation of permeability fields is therefore critical and necessary for the prediction of the behavior of contaminant plumes in aquifers and the production of petroleum from oil fields. In the particular case of petroleum production, part of the available data for the estimation of permeability fields is a "production curve". In a formal statistical analysis to incorporate such information, the corresponding likelihood functions for the high-dimensional random field parameters representing the permeability field can be computed with the help of a fluid flow simulator (FFS). In addition, there usually exists information about the permeability fields relevant at different scales of resolution as a result of studies of the geological characteristics of the oil field, well tests, and laboratory measurements. Our work reported here uses a recently developed multi-scale model as a prior for 1-D permeability fields in order to incorporate the information available at the different scales of resolution. Estimation of the permeability field is then performed using an MCMC algorithm with an embedded FFS to incorporate the information given by the observed production curve. The performance of the proposed approach with respect to the recovery of the original permeability field is studied with simulated data.

Back to the top of the page






Detecting Fraud in Datasets Using Benford's Law

by
Christina L. Geyer
ISDS Duke University
Box 90251
Durham, NC 27708
cgeyer@stat.duke.edu

Abstract:

An important need of governments, for tax purposes, and corporations, for internal audits, is the ability to detect fraudulently reported financial data. We present Bayesian methods, based on Benford's Law, that can be used to identify suspicious corporate tax returns for further investigation of fraud. Benford's Law is a numerical phenomenon in which the leading digits of data sets that count or measure naturally occurring events follow a particular logarithmic distribution. A history of the origins of Benford's Law is given and some examples of the types of data sets expected to follow Benford's Law are presented. Two Bayesian approaches are developed and compared to a classical statistical detection method developed by Nigrini (1996) and an alternative classical approach.
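
As a rough sketch of the kind of comparison involved (not Nigrini's test or the poster's specific Bayesian procedures), the code below computes the Benford first-digit probabilities and a simple multinomial Bayes factor of a Benford model against a uniform first-digit model; the example amounts are invented.

```python
import numpy as np

# Benford's Law: P(first digit = d) = log10(1 + 1/d), d = 1..9.
digits = np.arange(1, 10)
benford_p = np.log10(1.0 + 1.0 / digits)
uniform_p = np.full(9, 1.0 / 9.0)

# Hypothetical reported amounts (e.g. line items from a tax return).
amounts = np.array([812, 23.5, 1450, 96, 310, 7.2, 1120, 45, 267, 38, 1900, 52])
first_digit = np.array([int(str(abs(a)).lstrip("0.")[0]) for a in amounts])
counts = np.array([(first_digit == d).sum() for d in digits])

# Simple-vs-simple Bayes factor: multinomial likelihood under Benford vs uniform digits.
log_bf = counts @ (np.log(benford_p) - np.log(uniform_p))
print(f"log Bayes factor (Benford vs uniform first digits): {log_bf:.2f}")
```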

Back to the top of the page






Rating Colleges Through Choice Modeling

by
Mark E. Glickman
Boston University
Department of Mathematics and Statistics
Boston, MA 02215
mg@math.bu.edu

Abstract:

Most published rankings of colleges and universities, such as those published by US News and World Report, Barron's, and Peterson's, analyze a school's quantitative characteristics to produce a summary rating. Despite the ad hoc nature of these approaches, they often produce ratings that generally agree with public perception. We consider an alternative model-based approach that is based on college admission and student matriculation choices. We demonstrate our approach on a dataset consisting of the applications of 2026 high school seniors to 30 competitive schools, and produce rankings of the schools. Our model assumes that the merit of each student and each college can be characterized by a scalar merit parameter. These parameters enter into our model through two major components. One component addresses the probability that a student is admitted to a college, and has close ties to the Bradley-Terry (1952) model for paired comparisons. The second component is a multinomial choice model for a student's decision to matriculate among the schools to which he/she is admitted. The student merit parameters can be treated as random effects, which, in combination with the probability model for the data, enables a straightforward Bayesian analysis via Markov chain Monte Carlo simulation from the posterior distribution. School rankings can then be based on posterior summaries of school merit parameters.
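
To make the admission component concrete, here is a small sketch in which a student with merit theta is admitted by a school with merit gamma with probability given by a logistic function of the merit difference, the Bradley-Terry form. The parameterization, merits, and outcomes below are invented for illustration and are not necessarily those used in the poster; the matriculation component would add a multinomial choice term over each student's set of admitting schools.

```python
import numpy as np

rng = np.random.default_rng(7)

def admit_prob(theta_student, gamma_school):
    """Bradley-Terry-style comparison: P(admit) is a logistic function of merit difference."""
    return 1.0 / (1.0 + np.exp(-(theta_student - gamma_school)))

# Invented merits for 5 students and 3 schools, plus simulated admission outcomes.
theta = rng.normal(size=5)
gamma = np.array([1.5, 0.0, -1.0])
applied = [(i, j) for i in range(5) for j in range(3)]          # everyone applies everywhere
admitted = {(i, j): rng.random() < admit_prob(theta[i], gamma[j]) for i, j in applied}

def log_lik(theta, gamma):
    """Admission component of the log-likelihood only."""
    return sum(
        np.log(p if admitted[(i, j)] else 1.0 - p)
        for i, j in applied
        for p in [admit_prob(theta[i], gamma[j])]
    )

print(log_lik(theta, gamma))
```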

Back to the top of the page






Bayesian Analysis of a Population HIV Dynamic Model

by
Cong Han
Division of Biostatistics
University of Minnesota
Minneapolis, MN 55414
congh@biostat.umn.edu
Kathryn Chaloner
School of Statistics
University of Minnesota
Minneapolis, MN 55414
kathryn@stat.umn.edu
Alan S. Perelson
Theoretical Biology and Biophysics
Los Alamos National Laboratory
Los Alamos, NM 87545
asp@t10.lanl.gov

Abstract:

A Bayesian analysis of an HIV dynamic model using data from Perelson et al. (1996, Science 271, 1582-1586) is presented. The data are repeated measurements of plasma HIV RNA concentrations of patients receiving protease inhibitor treatment. A nonlinear mixed-effects model is introduced that explicitly models variability between subjects. The prior distribution is based on the scientific literature prior to 1996. Point and interval estimates are reported for the rates of disappearance of viruses and virus-producing cells. Issues of outliers and sensitivity to prior distribution are also investigated.

Back to the top of the page






Hierarchical Bayes Models for Relating Particulate Matter Exposure Measures

by
Murali Haran, Bradley P. Carlin, John L. Adgate, Gurumurthy Ramachandran, Lance Waller and Alan E. Gelfand
School of Statistics, University of Minnesota
Division of Biostatistics and Division of Environmental and Occupational Health, School of Public Health, University of Minnesota,
Statistics Department, Emory University,
and Statistics Department, University of Connecticut
mharan@stat.umn.edu

Abstract:

Understanding the effects of pollutants on the health of individuals requires consideration of different pollution sources. However, it is well known that there are large discrepancies between exposure measurements taken by indoor pollutant monitors, and an individual's actual particulate matter exposure. In this paper, we study data from the Hazardous Air Pollutants (HAPs) study, which collected reported emissions of particulates smaller than 2.5 microns (PM2.5) in three neighborhoods in the Minneapolis-St. Paul metropolitan area. The data consist of measurements of personal exposure for each individual, exposure inside the individual's home, and ambient exposure in the individual's neighborhood. The three neighborhoods exemplify areas with different pollutant sources: multiple source, single source and ``no major'' source. This data set also features time activity diary information on the amount of time each individual spent inside the home, outside near the home, and elsewhere. We use a Dirichlet-normal hierarchical structure to model the relationships among the exposure levels measured by the different monitors. Implemented via a hybrid Gibbs-Metropolis algorithm, our model allows for relevant covariates for the various types of exposure, measurement error in the observations, the large proportion (roughly 60%) of missing data, and differences among exposure levels for different neighborhoods, seasons, and data monitoring sessions. We conclude with a discussion of our results, along with the limitations of (and possible improvements to) both our data set and analytic approach.

Back to the top of the page






Who benefits most from higher child care quality?

by
Jennifer Hill
Columbia University
School of Social Work
622 W. 113th St.
New York, NY 10027
jh1030@columbia.edu

Abstract:

Interest has been increasing in the past few years in valid ways of exploring the effects of post-treatment variables, such as dosage levels, so-called "mediating" variables, and surrogate endpoints, on final outcomes. Frangakis and Rubin (2001) present a framework called principal stratification within which valid estimands (i.e. ones that measure true causal effects) can be conceptualized. We use data from a randomized experiment where infants assigned to the treatment group received intensive child care and education from 12 to 36 months of age. Using the principal stratification framework as a guide, we implement both frequentist and Bayesian estimators to examine the role of post-randomization variables measuring child care choices in the control group. This allows us to investigate, for example, the differential causal impact of the intervention on children who would have stayed home with their mothers in the first few years of life in the absence of the intervention, as opposed to those who would have been placed in center-based care in the absence of the intervention. A comparison of the Bayesian and non-Bayesian approaches will be presented.

Back to the top of the page






Parallel Computing for Multi-scale Problems

by
Christopher Holloman, Dave Higdon, and Herbert Lee
Duke University
Box 90251
Durham, NC 27708-0251
chh2@duke.edu

Abstract:

Related data on several different scales naturally arise in many applications including hydrology, finance, and environmental problems. Our research focuses on a problem in hydrology in which we model the permeability properties of an aquifer given data from flow experiments. In this case, it is computationally convenient to fit the model on both coarse and fine scales, with improved efficiency on the coarse scale, but a better scientific model on the fine scale. In order to improve model fitting via MCMC, we take advantage of parallel computing by fitting models simultaneously on multiple scales.

Back to the top of the page






A Space-time model for Mexico City ozone levels

by
Gabriel Huerta, Bruno Sanso and Jonathan R. Stroud
CIMAT, Universidad Simon Bolivar and University of Chicago
Apartado 402, Guanajuato, Gto. 36000, México
Apartado 890000, Caracas 1080-A Venezuela
5734 S. University Ave., Chicago, IL 60637, U.S.A
ghuerta@cimat.mx, bruno@cesma.usb.ve, stroud@galton.uchicago.edu

Abstract:

We consider hourly readings of ozone concentrations over Mexico City and propose a model for spatial as well as temporal interpolation and prediction. The model is based on regressing the observed readings on a set of meteorological variables, such as temperature and humidity. A few harmonic components are added to account for the main periodicities that ozone presents during a given day. The model incorporates spatial covariance structure for the observations and the parameters that define the harmonic components. Using the Dynamic Linear model framework, we show how to compute smoothed means and predictive values for ozone and missing values of the covariates. The methodology is applied to analyze data corresponding to September of 1997.

Key Words: Ozone time series; Spatio-temporal models; Bayesian modeling; Dynamic Linear models; Smoothed means; Predictive values

Back to the top of the page






Bayesian Methodology at the Center for Devices and Radiological Health - Past, Present and Perspectives for the Future

by
Telba Z. Irony, Gene Pennello, and Greg Campbell
CDRH - FDA
TZI@CDRH.FDA.GOV

Abstract:

A couple of years ago, the Division of Biostatistics in the Center for Devices and Radiological Health (CDRH) at the FDA decided to explore the possibility of introducing Bayesian statistics to evaluate results of medical device clinical trials. The idea was that, in many cases, the Bayesian methodology could increase the efficiency of such trials, since prior information about medical devices is often available. The initiative was pioneered by Dr. Gregory Campbell, the director of the Division of Biostatistics, and supported by Dr. Bruce Burlington, who was, at that time, the director of CDRH.

In this presentation, we will report what has happened in the last four years, discuss the Bayesian techniques that have been successfully used, and consider the perspectives for the future. We will also discuss the advantages, difficulties, and appropriateness of the Bayesian approach in medical device clinical trials.

Back to the top of the page






Seamlessly Expanding a Randomized Phase II Trial to Phase III

by
Lurdes Y.T. Inoue, Peter F. Thall, Donald A. Berry
Department of Biostatistics, MD Anderson Cancer Center, The University of Texas
1515 Holcombe Boulevard
Houston, TX 77030
lurdes@odin.mdacc.tmc.edu

Abstract:

A sequential Bayesian phase II/III design is proposed for comparative clinical trials. The design is based on both survival time and discrete early events that may be related to survival, and assumes a parametric mixture model. Phase II involves a small number of centers. Patients are randomized between treatments throughout, and sequential decisions are based on predictive probabilities of concluding superiority of the experimental treatment. Whether to stop early, continue, or shift into phase III is assessed repeatedly in phase II. Phase III begins when additional institutions are incorporated into the ongoing phase II trial. Simulation studies in the context of a non-small-cell lung cancer trial indicate that the proposed method maintains overall size and power while usually requiring substantially smaller sample size and shorter trial duration, when compared to conventional group-sequential phase III designs.

Back to the top of the page






A Bayesian approach to reducing heterogeneity in laboratory performance measures: An illustration from schizophrenia research

by
Shane T. Jensen
Department of Statistics, Harvard University
One Oxford Street
Cambridge, MA 02138
jensen@fas.harvard.edu

Abstract:

Heterogeneity in the performance of persons affected with schizophrenia or schizotypy psychopathology on laboratory tasks has long been recognized for the challenges it poses for experimental psychopathology, genetic, and other investigations. Traditional techniques such as factor analysis, discriminant function analysis, and cluster analysis have all been deemed inadequate for resolving heterogeneity due to one or another statistical shortcoming or limitation. A group of experimental subjects was initially identified as schizotypic using the well-known Perceptual Aberration Scale. We present a Bayesian approach, involving a Gibbs sampling strategy, that enables one to effectively parse this experimental group in a manner that reduces heterogeneity and allows for the separation of what are termed "true" and "false positive" schizotypes. This study is complemented by a maximum likelihood approach, based on the expectation-maximization (EM) algorithm. The validity of our parsing strategy is supported by reference to other laboratory indexes of relevance to schizophrenia and schizotypy that were not included in the initial analyses.
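
The abstract mentions both a Gibbs-sampling approach and a complementary EM-based maximum likelihood analysis. The sketch below shows only the simpler EM side: fitting a two-component normal mixture to a single invented laboratory measure and reading off posterior membership probabilities, which is the flavor of the "true" versus "false positive" split; the actual analysis involves considerably more structure than this.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)

# Invented scores on one laboratory task for a schizotypic group: a mixture of a
# "true schizotype" component and a "false positive" component.
x = np.concatenate([rng.normal(-1.0, 0.8, 60), rng.normal(1.2, 0.8, 40)])

# EM for a two-component normal mixture.
w, mu, sd = 0.5, np.array([-0.5, 0.5]), np.array([1.0, 1.0])
for _ in range(200):
    # E-step: responsibility of component 0 for each subject.
    d0 = w * stats.norm.pdf(x, mu[0], sd[0])
    d1 = (1 - w) * stats.norm.pdf(x, mu[1], sd[1])
    r0 = d0 / (d0 + d1)
    # M-step: update weight, means, and standard deviations.
    w = r0.mean()
    mu = np.array([np.average(x, weights=r0), np.average(x, weights=1 - r0)])
    sd = np.array([
        np.sqrt(np.average((x - mu[0]) ** 2, weights=r0)),
        np.sqrt(np.average((x - mu[1]) ** 2, weights=1 - r0)),
    ])

print("mixture weight, means, sds:", round(w, 3), mu.round(3), sd.round(3))
print("posterior membership prob. of first 5 subjects:", r0[:5].round(3))
```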

Back to the top of the page






A Hierarchical Approach to Modeling Sperm Fitness in Insects

by
Beatrix Jones
Department of Statistics, Penn State University
311 Thomas Building
University Park PA 16801
trix@stat.psu.edu

Abstract:

The revolution in genetic typing technology has produced the ability to collect data which are potentially very informative about ecological and evolutionary processes. Relatively sophisticated stochastic modeling is often needed to get the most out of these data: this talk will examine a problem where this is the case. Laboratory experiments have suggested that when females of many insects mate with more than one successive male, sperm from the later matings displaces that from earlier matings, resulting in more offspring sired by the later-mating fathers. Experiments have also suggested that the ability of new sperm to displace the sperm already present varies across males. We will attempt to assess this variability in a natural population of flies in Sonoma County, California. Microsatellite markers were used to gauge multiple paternities. We then develop a hierarchical model for the fraction of already-present sperm displaced by each male. The parameters of this model are then inferred in a Bayesian framework.
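
As one hedged illustration of the hierarchical structure described (the abstract does not give the model's details), the sketch below treats each male's displacement fraction as Beta-distributed across males and the offspring sired by the last male as binomial, and fits the shared Beta parameters by maximizing the beta-binomial marginal likelihood. All counts are invented; a fully Bayesian version would put priors on the Beta parameters and sample the male-specific fractions as well.

```python
import numpy as np
from scipy import stats, optimize, special

rng = np.random.default_rng(9)

# Invented data: for each doubly-mated female, the number of offspring typed and the
# number sired by the second (last) male.
n_offspring = rng.integers(8, 20, size=30)
true_disp = rng.beta(4, 2, size=30)                     # each male's displacement ability
n_second = rng.binomial(n_offspring, true_disp)

def neg_log_marginal(params):
    """Beta-binomial marginal likelihood: displacement fractions ~ Beta(a, b) across males."""
    a, b = np.exp(params)                               # keep a, b positive
    ll = (special.betaln(n_second + a, n_offspring - n_second + b)
          - special.betaln(a, b)
          + np.log(special.comb(n_offspring, n_second)))
    return -np.sum(ll)

# Empirical-Bayes fit of the population-level Beta(a, b) describing between-male variability.
res = optimize.minimize(neg_log_marginal, x0=[0.0, 0.0], method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
print(f"estimated Beta(a, b) for displacement fractions: a = {a_hat:.2f}, b = {b_hat:.2f}")
```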

Back to the top of the page






Bayesian calibration of a resistance spot weld model

by
Marc Kennedy, David Higdon
National Institute of Statistical Sciences
PO Box 14006, Research Triangle Park, Durham, NC 27713
marc@niss.org

Abstract:

Complex computer codes are increasingly used to model real-world processes in science and engineering. Many difficulties are faced by users of such models. We consider the example of a finite element simulation of a resistance spot welding process. The inputs include controllable features (current, load, gauge) and an unknown `tuning' parameter t. The output is weld diameter. Data obtained from real weld measurements for 1mm and 2mm aluminium are combined with data from a designed computer experiment to perform a Bayesian calibration to learn about the unknown input t, and also about the bias function.

Two possible predictions are presented: the calibrated code prediction and the bias-corrected prediction of reality. Each takes account of all relevant uncertainty, including the uncertainty remaining in t after calibration. Posterior distributions of the tuning parameter and the bias function are also used to answer model validation questions.
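A toy sketch of the calibration idea, with a closed-form stand-in for the finite element code, a constant bias in place of the Gaussian-process bias function, and flat priors (all simplifying assumptions, not the authors' formulation):

    import numpy as np

    rng = np.random.default_rng(9)

    # Toy "simulator": weld diameter as a simple known function of current x and tuning t.
    def eta(x, t):
        return 1.0 + 0.5 * x + t * x**2

    # Hypothetical field measurements generated with true t = 0.3 plus a constant bias.
    x_f = np.linspace(0.5, 2.0, 8)
    y_f = eta(x_f, 0.3) + 0.4 + rng.normal(0, 0.1, size=x_f.size)

    # Grid posterior over the tuning parameter t and a constant bias b
    # (the real analysis uses Gaussian-process models for the simulator and the bias).
    t_grid = np.linspace(0.0, 0.6, 121)
    b_grid = np.linspace(-1.0, 1.0, 201)
    logpost = np.empty((t_grid.size, b_grid.size))
    for i, t in enumerate(t_grid):
        resid = y_f[None, :] - eta(x_f, t)[None, :] - b_grid[:, None]
        logpost[i] = -0.5 * np.sum((resid / 0.1) ** 2, axis=1)   # flat priors for brevity
    post = np.exp(logpost - logpost.max())
    post /= post.sum()

    print("posterior mean of t:", np.round(np.sum(post.sum(axis=1) * t_grid), 3))
    print("posterior mean bias:", np.round(np.sum(post.sum(axis=0) * b_grid), 3))

In this sketch the calibrated code prediction at a new input is eta(x_new, t) averaged over the posterior of t, while the bias-corrected prediction of reality adds the posterior bias.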

Back to the top of the page






Recalibration of a credit risk model using multiple data sources

by
Jacob Laading, Tore Anders Husebo and Thor Aage Dragsten
Den norske Bank
Stranden 21
Oslo, Norway
jacob.laading@dnb.no

Abstract:

After some experience with a credit risk (probability of default) model, it was found that the model was miscalibrated. The model is built in two segments (financial and non-financial), and while the two separate models rank-ordered well, the absolute level of a model combining the two gave a clearly skewed estimate of the portfolio risk. This work describes how short-term default data and expert opinions were used in re-estimating the model. Special emphasis is put on the modeling process, where the graphical-model approach and the elicitation of expert opinions were used for tutorial and trust-building purposes within the business organization.

Back to the top of the page






Statistical Modelling of Seedling Mortality

by
Michael Lavine, Brian Beckage and James Clark
Duke University
Box 90251
Durham NC 27708
michael@stat.duke.edu

Abstract:

Seedling mortality in tree populations limits population growth rates and controls the diversity of forests. To learn about seedling mortality, ecologists use repeated censuses of forest quadrats to determine the number of tree seedlings that have survived from the previous census and to find new ones. Typically, newly found seedlings are marked with flags. But flagging is labor-intensive and limits the spatial and temporal coverage of such studies. The alternative of not flagging has the advantage of ease but suffers from two main disadvantages: it complicates the analysis and loses information. The contributions of this paper are (i) to introduce a method for using unflagged census data to learn about seedling mortality and (ii) to quantify the information loss so ecologists can make informed decisions about whether to flag. Based on the results presented here, we believe that not flagging is often the preferred alternative. The labor saved by not flagging can be used to better advantage in extending the coverage of the study.

A seedling's survival probability is the chance that it survives from one year to the next. Seedlings that have survived through at least one winter can be identified by the presence of bud scale scars. We denote seedlings as either Old (having bud scale scars) or New (not having bud scale scars). Old and New seedlings have different survival probabilities, but, roughly speaking, the survival probability of Old seedlings is not further explained by age. We therefore adopt a model with two parameters of interest, p_Old and p_New, the survival probabilities for Old and New seedlings respectively.

The methods are illustrated on red maple (Acer rubrum) data from the Long Term Ecological Research (LTER) site at Coweeta, in the Appalachian Mountains of North Carolina. The data come from a collection of plots, each containing a collection of 1m^2 quadrats, which are the units of analysis. Quadrats within a plot differ slightly in seedling survival rates. Survival rates are also affected by factors such as altitude, the presence of light gaps and the presence of rhododendron cover. Our model contains fixed effects for altitude, light gaps and rhododendron cover, and random effects for quadrat and year.

The data are subject to errors of various sorts. For red maple, one of the most important is that seedlings sometimes emerge from the ground late in the Fall, after the census has been taken. Thus Old seedlings can be recorded in year j even though no New or Old seedlings were recorded in year j-1. Late emergence means that N_i,j, the true number of New seedlings in quadrat i in year j, is unknown. To accommodate the uncertainty in N_i,j we adopt a Poisson arrival model for New seedlings in which the rate is subject to fixed and random effects.
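A small forward simulation of the census process just described may help fix ideas (the rates and probabilities are arbitrary illustrative values, not the Coweeta estimates): New seedlings arrive as Poisson counts, survive their first winter with probability p_New, thereafter survive each year with probability p_Old, and an unflagged census records only the two totals.

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical parameter values for one quadrat.
    lam, p_new, p_old = 4.0, 0.3, 0.7
    years = 10

    old = 0
    records = []
    for j in range(years):
        new = rng.poisson(lam)              # New seedlings emerging in year j
        records.append((new, old))          # unflagged census: just the two totals
        # Survival to the next census: New seedlings survive with p_new, Old with p_old.
        old = rng.binomial(new, p_new) + rng.binomial(old, p_old)

    print(records)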

In the course of illustrating the method we evaluate sensitivity to the prior and to some modelling choices. We also quantify the information gained by flagging and find that not flagging is often the more attractive alternative. In the past it was not known how to extract useful mortality information from unflagged data. Now that a method has been illustrated, and because unflagged data are much easier to collect, we may begin to see data sets of greater spatial and temporal coverage that will increase our understanding of seedling survival and, ultimately, of the survival and spread of past, present and future populations of trees.

Much of the work has already been completed; it is available on the web as ISDS Discussion Paper #99-33 and will be published in JABES. Other parts are in Brian Beckage's completed PhD thesis in Botany and will also be published. The major work to be completed before September is the inclusion of flagged and unflagged data and multiple plots in a single analysis.

Back to the top of the page






A Flexible Convolution Approach To Modelling Spatial Processes In Porous Media

by
Herbert Lee, Dave Higdon
Duke University
ISDS, Box 90251
Durham, NC 27708
herbie@stat.duke.edu, higdon@stat.duke.edu

Abstract:

In situ cleanup of contaminated soil requires knowledge of the soil permeability, a spatial process. Here we take a Bayesian approach to allow straightforward estimation of uncertainty, and we demonstrate our methodology with data from an actual flow experiment. A spatial Gaussian Process can be represented as the convolution of a continuous white noise process and a smoothing kernel, where the choice of kernel relates to the covariogram of the process. In practice, a coarse discrete approximation to the white noise process gives an efficient and accurate method for generating realizations from the Gaussian process. We expand upon this model by allowing the underlying process to be other than white noise. For example, a Markov random field can be convolved with a smoothing kernel to produce a new spatial process.
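A minimal one-dimensional sketch of the process-convolution construction (the grid, kernel width, and kernel form are illustrative choices, not the authors'): independent normal variables on a coarse grid are convolved with a Gaussian kernel to produce a smooth spatial field.

    import numpy as np

    rng = np.random.default_rng(0)

    # Coarse grid of knot locations carrying independent N(0,1) variables.
    knots = np.linspace(0.0, 10.0, 21)
    w = rng.normal(size=knots.shape)

    # Gaussian smoothing kernel; its width controls the implied covariogram.
    def kernel(d, width=1.0):
        return np.exp(-0.5 * (d / width) ** 2)

    # Evaluate the convolution z(s) = sum_j k(s - u_j) w_j on a fine prediction grid.
    s = np.linspace(0.0, 10.0, 201)
    K = kernel(s[:, None] - knots[None, :])
    z = K @ w

    print(z[:5].round(3))
    # Replacing w with draws from a Markov random field on the knots gives the
    # non-white-noise extension mentioned in the abstract.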

Back to the top of the page






Evaluating the Impact of Environmental variables on Benthic Microinvertebrate Community via Bayesian Model Averaging

by
Ilya A. Lipkovich and Eric P. Smith
Department of Statistics, Virginia Polytechnic Institute and State University
406-A Hutcheson Hall
Blacksburg, VA 24061-0439 USA
ilipkovi@vt.edu

Abstract:

Variable selection is one of the most important and controversial issues in modern data analysis. In the study of relationships between biological communities and environmental conditions, variable selection is especially important because it guides decisions about environmental management. Using a case study from the Eastern Corn Belt Plains Ecoregion (Norton, 1999), we use Bayesian Model Averaging (BMA) to select interesting subsets of environmental variables (such as metal composition, silt level, etc.) that can affect the abundance of benthic microinvertebrate taxa. We implement BMA for a multivariate technique called Canonical Correspondence Analysis (CCA) and use its results to represent sites, species and selected environmental variables on a single ordination diagram (triplot), along with error bars representing uncertainty due to both sampling variability and model selection. BMA output can also be used to construct prediction regions for new observations, which allows the researcher to evaluate the limits of impact due to possible changes in benthic ecosystem variables.

BMA provides data analysts with an efficient tool for discovering promising models and obtaining estimates of their posterior probabilities via Markov chain Monte Carlo (MCMC). These probabilities are then used as weights for model-averaged predictions and estimates of the parameters of interest. As a result, variance components due to model selection can be estimated and accounted for, contrary to the practice of conventional data analysis. In our study we adopt an approach to BMA called Model Composition MCMC (MC^3; Madigan and Raftery, 1994), and we implement the BMA methodology by treating CCA within the general framework of reduced-rank regression, for which we develop a Bayes Information Criterion (BIC) approximation to posterior model probabilities in the spirit of MC^3.

In addition to applying BMA to the case study, we developed a general-purpose Visual Basic macro that allows the user to easily perform BMA with any data set of similar structure and to produce various useful outputs for both full and reduced-rank multivariate regression, such as individual model weights, variable activation probabilities, estimates of model selection uncertainty, and biplot and triplot diagrams with error bars representing the model selection uncertainty associated with projections of individual sites and taxa.
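To make the BIC-based weighting concrete, the generic sketch below scores candidate variable subsets in an ordinary linear model (a stand-in for the reduced-rank CCA setting, which is more involved), converts the BIC values into approximate posterior model probabilities, and computes variable activation probabilities; the data and model space are hypothetical.

    import numpy as np
    from itertools import combinations

    rng = np.random.default_rng(2)

    # Hypothetical data: response and three candidate environmental predictors.
    n = 60
    X = rng.normal(size=(n, 3))
    y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

    def bic(y, Xsub):
        X1 = np.column_stack([np.ones(len(y)), Xsub])
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        rss = np.sum((y - X1 @ beta) ** 2)
        return len(y) * np.log(rss / len(y)) + X1.shape[1] * np.log(len(y))

    # Score every non-empty subset of predictors.
    models = [c for r in (1, 2, 3) for c in combinations(range(3), r)]
    bics = np.array([bic(y, X[:, list(m)]) for m in models])

    # BIC -> approximate posterior model probabilities (weights).
    w = np.exp(-0.5 * (bics - bics.min()))
    w /= w.sum()

    # Variable activation probabilities: total weight of models containing each variable.
    activation = [sum(w[i] for i, m in enumerate(models) if j in m) for j in range(3)]
    print(np.round(activation, 3))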

Back to the top of the page






Hierarchical Bayesian Methods for Estimating Joint Contaminant Occurrence in Community Water Systems

by
J.R. Lockwood, Mark Schervish, Patrick Gurian, and Mitchell Small
The RAND Corporation (J.R. Lockwood), University of Texas at El Paso (Patrick Gurian)
and Carnegie Mellon University (Mark Schervish and Mitchell Small)

jlock@stat.cmu.edu, mark@stat.cmu.edu, gurian@andrew.cmu.edu, ms35@andrew.cmu.edu

Abstract:

The 1996 amendments to the U.S. Safe Drinking Water Act mandate revision of current maximum contaminant levels (MCLs) for various harmful substances in community drinking water supplies. The choice of an MCL for a given contaminant must balance the potential costs and benefits of lowering exposure, which requires detailed information about the occurrence of the contaminant and the costs and efficiencies of the available treatment technologies. Although community water systems must comply concurrently with the MCLs for over 80 regulated substances, regulations generally are set one contaminant at a time. The failure to consider the joint behavior of multiple contaminants during the regulatory process can lead to mischaracterization of the actual costs and benefits. In order to estimate more effectively the true costs and benefits of simultaneous compliance with standards for several contaminants, the U.S. Environmental Protection Agency is attempting to expand existing regulatory evaluation methods to account for multiple contaminants. Such an expansion requires not only joint consideration of treatment options, but also knowledge of the joint occurrence distributions of the contaminants. Our work focuses on the latter topic, extending existing methods for modeling the distribution of a single contaminant in community water system source waters to the simultaneous consideration of multiple contaminants. We consider alternatives for addressing the implementation difficulties inherent in the multivariate setting, providing solutions of general methodological interest. Through case studies involving arsenic, sulfate, magnesium and calcium, we show how jointly modeling contaminants provides better fit and predictive power than marginal models, emphasizing how inferences about important regulatory quantities can be improved through joint modeling. Our methods make significant progress in redressing several shortcomings of existing analyses.

Back to the top of the page






Hidden Markov Model Approach to Local and Global Protein or DNA Sequence Alignments (Pairwise and Multiple)

by
Tanya Logvinenko
Stanford University
tanyalog@stat.stanford.edu

Abstract:

Global and local sequence alignments are tools widely used in biomedical research, but despite their long history there are a number of shortcomings in existing methods. Dynamic programming methods yield a single optimal alignment that is highly dependent on the scoring matrix and gap penalties used. We describe Bayesian algorithms for local and global pairwise sequence alignment, based on Hidden Markov Models, which produce representative samples of alignments and give the posterior distribution over alignments while accounting for the set of different parameters considered. To show the potential of these methods, we apply them to identify regions of sequences conserved to different degrees. We also present an extension of the algorithms to the alignment of multiple sequences.

Back to the top of the page






Comovements and Contagion in Emergent Markets: Stock Indexes Volatilities

by
Hedibert F. Lopes and Helio S. Migon
Federal University of Rio de Janeiro
Caixa Postal 68530
21945-970, Rio de Janeiro - BRAZIL
hedibert[migon]@im.ufrj.br

Abstract:

The past decade has witnessed a series of well accepted and well defined periods of financial crisis in the world economy. Most of these events were country-specific and eventually spread across neighboring countries, with the concept of vicinity extending beyond geographic maps and into contagion maps. Unfortunately, what contagion represents and how to measure it are still unanswered questions.

In this article we measure the transmission of shocks by cross-market correlation coefficients following Forbes and Rigobon's (2000) notion of shift-contagion. Our main contribution relies upon the use of traditional factor model techniques combined with stochastic volatility models to study the dependence among Latin American stock price indexes and the North American index. More specifically, we concentrate on situations where the factor variances are modeled by a multivariate stochastic volatility structure.

From a theoretical perspective, we improve currently available methodology by allowing the factor loadings in the factor model structure to be time-varying, capturing changes in the series' weights over time. By doing this, we believe that the changes and interventions experienced by those five countries are well accommodated by our models, which learn and adapt reasonably quickly to those economic and idiosyncratic shocks.

We show empirically that the time-varying covariance structure can be modeled by one or two common factors, and that some sort of contagion is present in most of the series' covariances during periods of economic instability, or crisis. Open issues concerning real-time implementation and natural model comparisons are thoroughly discussed.

Back to the top of the page






The Hierarchical Rater Model: Accounting for Information Accumulation and Rater Behavior in Constructed Response Student Assessments

by
Louis T. Mariano
Carnegie Mellon University
Department of Statistics
Pittsburgh, PA 15213
ltm@stat.cmu.edu

Abstract:

Open-ended (i.e., constructed response) test items have become a stock component of standardized educational tests. Responses to open-ended items are usually evaluated by human ``raters'', often with multiple raters judging each response. In this paper we contrast the FACETS model (Linacre, 1989), a mixed-effects multivariate logistic regression model that has been a popular tool for modeling data from rated test items, with a fully hierarchical Bayes model for rating data (the hierarchical rater model, HRM, of Patz, Junker, Johnson and Mariano, 2000). The HRM makes more realistic assumptions about the dependence between multiple ratings of the same student work, and thus provides a more realistic view of the uncertainty of inferences on parameters and latent variables from rated test items. A rigorous treatment of the approach to dependence and uncertainty in each model is presented, followed by two new applications of the HRM. The first application uses simulated data to explore the accumulation of information under the HRM under various scenarios of rater performance (especially poor performance). The second application shows how the HRM can be used to make inferences about examinees, test items and raters in a statewide mathematics exam given in the State of Florida. In particular we explore the effect of modality (the design for distributing items among raters) on the severity and consistency of individual raters' performance.

Back to the top of the page






Assessing and Propagating Uncertainty in Model Inputs in Computer Traffic Simulators (CORSIM)

by
German Molina
Institute of Statistics and Decision Sciences, Duke University
Durham, NC 27708-0251, USA
german@stat.duke.edu
Susie Bayarri
Dept of Statistics and Operations Research, Universitat de Valencia
Burjassot, Valencia, 46100, SPAIN
bayarri@uv.es
James Berger
Institute of Statistics and Decision Sciences, Duke University
Durham, NC 27708-0251, USA
berger@stat.duke.edu

Abstract:

CORSIM is a large simulator for vehicular traffic, and is being studied with regard to its ability to successfully model and predict the behavior of traffic in a 36-block section of Chicago. Inputs to the simulator include information about street configuration, driver behavior, traffic light timing, turning probabilities at each corner, and distributions of traffic ingress into the system.

Data are available on the turning proportions in the actual neighborhood, as well as counts of vehicles entering the system and internal system counts, for a day in May 2000. Some of the data are accurate (video recordings), but some are inaccurate (observer counts of vehicles). The first goal is to incorporate both types of data so as to derive the posterior distribution of the turning probabilities and of the parameters of the CORSIM input distribution.

The vehicles passing through an intersection are modeled with a product multinomial distribution, with turning probabilities specific to each intersection. The accurate data are introduced as restrictions on the model, reducing the actual number of latent variables. We perform an MCMC analysis to learn about the turning probabilities at every intersection, latent counts at different locations, bias parameters for the observers, interarrival rates, and so on, adding up to about 200 parameters in the network.
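As a toy version of the turning-probability component (one intersection, accurately observed counts, hypothetical numbers), the multinomial likelihood combines with a Dirichlet prior to give a Dirichlet posterior; in the full analysis the inaccurate observer counts, latent counts, and network constraints are what make MCMC necessary.

    import numpy as np

    rng = np.random.default_rng(5)

    # Hypothetical counts of vehicles turning left / going straight / turning right
    # at one intersection during the observation period.
    counts = np.array([38, 181, 45])

    # Dirichlet(1,1,1) prior on the turning probabilities -> Dirichlet posterior.
    posterior_draws = rng.dirichlet(1 + counts, size=5000)

    print("posterior mean turning probabilities:", posterior_draws.mean(axis=0).round(3))
    print("95% interval for 'left':", np.percentile(posterior_draws[:, 0], [2.5, 97.5]).round(3))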

The posterior distribution on model inputs will then be used to study sensitivity of the computer model predictions. Studying the uncertainty in model predictions is complicated by the fact that the CORSIM model operates close to feasibility constraints, and these constraints must be built into the uncertainty propagation through the model.

Back to the top of the page






Multiscale Relationships Between Coarse Woody Debris and Presence/Absence of Western Hemlock in the Oregon Coast Range

by
Vicente J. Monleon
PNW Research Station
USDA Forest Service
1221 SW Yamhill
Portland, OR 97205
vjmonleon@fs.fed.us
Alix I. Gitelman
Department of Statistics
Oregon State University
44 Kidder Hall
Corvallis, OR 97331
gitelman@stat.orst.edu
Andrew Gray
PNW Research Station
USDA Forest Service
1221 SW Yamhill
Portland, OR 97205

Abstract:

This study examines the relationship between the abundance of coarse woody debris (CWD) and the establishment of western hemlock (Tsuga heterophylla) at two different scales, the microsite level and the stand level, within the Oregon Coast Range. Western hemlock is a key structural component of old-growth forests in the Pacific Northwest, typically providing a multilayered canopy and contributing to the diversity of tree ages. Forest managers are looking for ways to promote the establishment of hemlock in the hope of accelerating the development of old-growth characteristics. Most ecological processes operate at several scales. The establishment and survival of hemlock depend upon finding suitable sites at the microsite level ('safe sites'), which we hypothesize to be characterized by a greater amount of CWD than the rest of the stand. However, the total amount of CWD in the stand may in turn determine the abundance of safe sites, and a lack of CWD may result in hemlock growing in less desirable sites. We use a hierarchical model to determine the relationship between the amount of CWD and hemlock establishment at the microsite level, and whether this relationship itself depends upon the overall amount of CWD available in the stand.

In each of 15 mature, unmanaged forest stands in the Oregon Coast Range, points without hemlock saplings and points with hemlock saplings were randomly selected. Each of these points represents the microsite level of the study. Around each sampled point, the area covered by CWD was measured. In addition, a measurement of CWD for the entire stand was obtained for each of the 15 stands. To understand the relationship between CWD and hemlock presence/absence, we fit a series of hierarchical logistic regression models that account for CWD at the microsite level alone and at both the microsite and stand levels. The slope term of these regression models measures the relationship between the odds of hemlock sapling presence versus absence and the amount of CWD at the corresponding level or levels in the hierarchy.

There is a significant association between the amount of CWD and hemlock establishment at the microsite level, but this relationship does not seem to depend on the total amount of CWD available in the stand. On average, for each increase of $0.1 m^2$ of CWD per $m^2$ of area, the odds of finding a hemlock sapling are estimated to increase 2.45-fold (the 95\% posterior interval ranges from 1.47 to 3.96). This relationship varies across stands, from a low of an estimated 1.46-fold increase to a high of an estimated 5.37-fold increase. These results suggest that CWD can be used to help predict hemlock presence/absence, and that management practices that increase the amount of CWD in forest stands should be considered as potentially beneficial to hemlock establishment.
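The fold-change arithmetic reported above follows directly from the logistic-regression slope: an increase of 0.1 in CWD multiplies the odds by exp(0.1 beta). The sketch below uses simulated placeholder draws for beta, not the fitted posterior.

    import numpy as np

    rng = np.random.default_rng(8)

    # Hypothetical posterior draws of the microsite-level slope (per m^2 CWD per m^2 area).
    beta_draws = rng.normal(9.0, 2.5, size=4000)     # placeholder values only

    # Fold change in the odds of finding a hemlock sapling per 0.1-unit increase in CWD.
    fold = np.exp(0.1 * beta_draws)
    print("posterior mean fold change:", fold.mean().round(2))
    print("95% posterior interval:", np.percentile(fold, [2.5, 97.5]).round(2))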

Back to the top of the page






Borrowing Strength: Incorporating Information from Early Phase Cancer Clinical Studies into the Analysis of Large, Phase III Cancer Clinical Trials

by
Peter Mueller
Gary L. Rosner
The University of Texas, M.D. Anderson Cancer Center
Maria de Iorio
Duke University
pm@odin.mdacc.tmc.edu

Abstract:

Clinical studies during drug development progress in stages. Patients treated in early studies are necessarily monitored more closely, for obvious safety reasons. Aside from recording safety data, clinical investigators also often collect information on the pharmacokinetics of the agents under study. In later phase studies, especially large randomized phase III studies, there is usually less close monitoring of patients, either because of difficult logistics or cost, or because enough is known about the safety of the drug or drug combinations under study. Thus, early phase studies typically collect more data per patient but treat relatively few patients, compared to large randomized phase III studies. Methods for combining the fuller data collected on patients enrolled in earlier phase studies with the sparser data collected as part of a phase III study help us learn more about pharmacokinetic (PK) and pharmacodynamic (PD) variability in the population.

We describe the development of statistical tools for carrying out full Bayesian meta-analyses across studies with varying degrees of exchangeability from study to study. We analyze data from three studies carried out by the Cancer and Leukemia Group B (CALGB): studies 8881, 9160, and 8541. Using the data from the two earlier phase studies (8881 and 9160), in which relatively frequent blood count monitoring took place, we obtain more precise inference in a large phase III study of adjuvant breast cancer.

To achieve the desired borrowing of strength across studies we use hierarchical models with sub-models for each study. For each study we define a population PK/PD model, i.e., a hierarchical model that allows inference about PK/PD data across patients. As part of these models we use flexible non-parametric random effects distributions for the patient-specific random effects, to accommodate heterogeneity of the patient population and outliers. The random effects distributions in the studies are different, but it would be unreasonable to assume them a priori independent. Thus we need a model for dependent random probability measures, i.e., we require dependent non-parametric Bayesian models. We use a class of models based on the dependent Dirichlet process (DDP) proposed in MacEachern (2001).

Back to the top of the page






Bayesian Analysis of Essay Grading

by
Stephen Ponisciak, Valen Johnson
ISDS, Duke University
Box 90251, Durham, NC 27708
steve@stat.duke.edu

Abstract:

An interesting problem in educational research is the rating of essays by multiple raters, because each rater will tend to have a different opinion regarding the characteristics of a good essay. Our dataset consists of ratings assigned to essays written by 1200 subjects. Each essay received six ratings from each of six raters: one global rating and five sub-ratings. Each essay was rated by each rater in each category, so the data are fully observed. Our analysis employs hierarchical statistical methods with random effects, as described in Ordinal Data Modeling (Johnson, V.E., and Albert, J.H., 1999) and ``On Bayesian Analysis of Multirater Ordinal Data: An Application to Automated Essay Grading'' (Journal of the American Statistical Association, Johnson, V.E., 1996).

We used hierarchical nonlinear regression techniques with the student's grade from each rater in each category as the outcome. Associated with each rating is a "perceived" latent variable, which is assumed to be centered at the "true" latent ability variable for that individual in that category. These "true" ability variables are "perceived" with a different error variance for each rater in each category. We assumed that the "true" ability variables have a multivariate normal distribution whose covariance matrix has the form of a correlation matrix, so that the "true" ability variables are Normal(0,1) a priori.

MCMC was necessary in order to work with the posterior distributions, which are quite unwieldy. We used a method explained by Barnard, McCulloch and Meng (1997) to sample from the distribution of the covariance matrix. Our main interest was in the differences in the rater variances, and in examining the posterior distribution of the covariance matrix mentioned above. We conclude by showing some results.
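A compact generative sketch of the latent-variable structure described above (continuous ratings in place of the ordinal grades and cutpoints, and all settings hypothetical): each rater perceives a student's true ability with rater-specific error.

    import numpy as np

    rng = np.random.default_rng(11)

    n_students, n_raters = 1200, 6

    # "True" latent ability of each student in one category, standard normal a priori.
    ability = rng.normal(size=n_students)

    # Rater-specific perception error standard deviations.
    rater_sd = np.array([0.4, 0.6, 0.5, 0.9, 0.7, 0.5])

    # "Perceived" latent variable for each (student, rater) pair.
    perceived = ability[:, None] + rng.normal(size=(n_students, n_raters)) * rater_sd

    # In the full model the perceived values are tied to the observed ordinal grades
    # through rater-specific cutpoints, and all parameters are updated jointly by MCMC.
    print(np.corrcoef(perceived.T).round(2))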

Back to the top of the page






Multivariate Mixture Models: A Tool For Analyzing Gene Expression Data

by
Surajit Ray, Bruce Lindsay
Pennsylvania State University
325 Thomas Building
University Park, Pa-16801
EMAIL:surajit@stat.psu.edu

Abstract:

``Understanding the Human Genome shifts our medical attention from treating mere symptoms and alleviating pain to discovering and isolating the root cause of certain diseases.'' This was what Melissa Reyes, an 11th grader from Florida, had to say in response to the question ``How is the sequencing of the human genome relevant to you?''. DNA arrays have recently emerged as a powerful new experimental technique for large-scale analysis of gene expression and function, which are not yet understood at the molecular level. The Stanford yeast cell cycle data have been analyzed by scientists using hierarchical and model-based algorithms to estimate the number of clusters. In the mixture modeling literature, determination of the number of components is a classic convex optimization problem. The gradient check in Nonparametric Maximum Likelihood Estimation (NPMLE) routines provides an elegant tool for determining the number of components in the univariate case. In our recent project, we generalize the idea of the NPMLE to a multivariate NPMLE and extract the number of clusters in the high-dimensional expression data scenario. Assessment of the fitted model is also investigated through AIC, BIC and kernel-based quadratic distances.
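For the univariate case, the NPMLE gradient check mentioned above can be sketched as follows (hypothetical data and a fixed component standard deviation): if the directional derivative D(phi) exceeds zero for some candidate component location phi, the fit can be improved by adding a component there.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(10)

    # Hypothetical one-dimensional "expression" data from two clusters.
    x = np.concatenate([rng.normal(-1, 0.5, 40), rng.normal(2, 0.5, 40)])

    # Current candidate mixing distribution: support points and weights.
    support = np.array([-1.0, 2.0])
    weights = np.array([0.5, 0.5])
    fhat = (weights[None, :] * norm.pdf(x[:, None], support[None, :], 0.5)).sum(axis=1)

    # NPMLE gradient (directional derivative) at candidate component locations phi:
    # D(phi) = sum_i f(x_i; phi) / fhat(x_i) - n.  If it exceeds 0 anywhere, adding a
    # component there improves the fit; otherwise the NPMLE has been reached.
    phi_grid = np.linspace(-4, 5, 181)
    D = (norm.pdf(x[:, None], phi_grid[None, :], 0.5) / fhat[:, None]).sum(axis=0) - len(x)
    print("max gradient:", np.round(D.max(), 2), "at phi =", np.round(phi_grid[D.argmax()], 2))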

Back to the top of the page






Estimation of Fetal Growth and Gestation in Bowhead Whales

by
C. Shane Reese, James A. Calvin, John C. George, and Raymond J. Tarpley
Los Alamos National Laboratory, Texas A&M University, North Slope Borough, Texas A&M University
MS F600
Los Alamos, NM 87545
reese@lanl.gov

Abstract:

We consider the problem of estimating fetal growth and gestation for bowhead whales, Balaena mysticetus, of the Bering, Chukchi, and Beaufort Seas (BCBS) stock. This western Arctic population is subject to a subsistence hunt by Eskimo whale hunters, which is carefully monitored via a quota system established by the International Whaling Commission (IWC) and managed by the Alaska Eskimo Whaling Commission (AEWC). Quota determination is assisted by biological information, such as fetal growth and gestation, which is the basis of a population dynamics model (PDM) used to estimate the annual replacement yield (RY) of the stock. We develop a Bayesian hierarchical nonlinear model for fetal growth, with computation carried out via Markov chain Monte Carlo (MCMC) techniques. Our model allows for unique conception and parturition dates, and provides predictive distributions for both gestation length (mean of 14.0 months with 90% predictive interval of (13.0, 15.2)) and conception dates (mean 24 March with 90% predictive interval of (3 March, 13 April)). These results are also used to propose estimates of geographic locations for both conception and parturition. Finally, a sensitivity analysis indicated that caution should be exercised in specifying some parameters related to the growth rate, conception dates, and parturition dates.

Back to the top of the page






The clustering of SIV-infected cells in lymphatic tissue

by
Cavan Reilly, Ashley Haase, Timothy Schacker, David Krason, and Steve Wietgreft
University of Minnesota, Reilly-Division of Biostatistics,
Haase and Wietgreft-Department of Microbiology,
and Schacker and Krason-Department of Infectious Diseases

A460 Mayo Bldg, MMC 303, 420 Delaware St. S.E., Minneapolis, MN 55455
cavanr@biostat.umn.edu

Abstract:

While much research on the pathogenesis of HIV has been conducted, there has been no research aimed at uncovering the manner in which the virus spreads from one cell to another in an infected host. While some have postulated complicated mechanisms by which the infection spreads, we examined whether a simple model of local spread is consistent with lymph node samples obtained from a Rhesus macaque infected with SIV (a close relative of HIV with a similar pathogenesis) a known number of days prior to sample collection.

To investigate this issue, we treat the locations of infected cells as a realization of an inhomogeneous Poisson process and parameterize the intensity of this process as a linear combination of Gaussian densities. We then use the Metropolis algorithm to generate samples from the posterior distribution of the intensity. With these sampled intensities, we can compute how far infected cells spread out from the centers of clusters of infected cells, and since the lifespan of infected cells is well known, we can assess the validity of our model of viral spread by comparing the distances at which we witness clustering to what we expect under our model. Although Ripley's $K$ function is widely used as a method for examining the clustering of point processes, we show how this function can lead one astray for this application, so we develop novel descriptive statistics to investigate clustering. Moreover, with our technique we can assess whether the clustering is likely to have resulted from a homogeneous Poisson process, and our method easily allows for model diagnostics. We find that the simple model of viral spread is consistent with the data; in fact, at the time of the statistical analysis the number of days from infection to sample collection was withheld, but the method was able to uncover this time lag based only on the model and the data.
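A stripped-down one-dimensional version of this approach (hypothetical cell locations and a single Gaussian cluster, not the lymph-node analysis): the intensity is a single Gaussian bump, and a random-walk Metropolis sampler targets the posterior of the cluster spread, which measures how far infected cells scatter from the cluster center.

    import numpy as np

    rng = np.random.default_rng(4)

    # Hypothetical 1-D locations of infected cells clustered around a focus at 0.
    x = rng.normal(0.0, 0.5, size=60)

    # Intensity lambda(s) = A * Normal(s; 0, sigma); its integral over the line is A.
    A = 60.0

    def log_post(sigma):
        if sigma <= 0:
            return -np.inf
        loglam = np.log(A) - np.log(sigma) - 0.5 * np.log(2 * np.pi) - 0.5 * (x / sigma) ** 2
        # Poisson-process log likelihood plus a half-normal(scale 2) prior on sigma.
        return loglam.sum() - A - 0.5 * (sigma / 2.0) ** 2

    # Random-walk Metropolis on the cluster spread sigma.
    sigma, draws = 1.0, []
    lp = log_post(sigma)
    for it in range(5000):
        prop = sigma + rng.normal(0.0, 0.1)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            sigma, lp = prop, lp_prop
        if it >= 1000:
            draws.append(sigma)

    print("posterior mean spread of infected cells:", np.round(np.mean(draws), 3))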

Back to the top of the page






A Bayesian Analysis of Consumer Preferences

by
Marc Sobel and Indrajit Sinha
Dept of Statistics/Marketing; Fox School of Business and Management
1810 N. 13th Street; Temple University
Philadelphia, PA 19122
sobel@sbm.temple.edu

Abstract:

In a mall-intercept study of consumer preferences, each consumer fills out a questionnaire containing 17 Likert-scale multiple-choice questions regarding (possibly different) product categories. The questions are concerned with such variables as perceived risk, quality variation, deal proneness, and store versus national brand preference. We focus on modelling the relationship between store versus national brand preference and the other preference variables, and on predicting the former. Flexible classes of latent variable models are proposed, together with agreement measures that permit the selection of the optimal model for the data from these classes.

The study, analysis, and prediction of consumer preferences play a large role in Business today. Large parts of the field of data-mining are devoted to the analysis of questions like:

(i) What characterizes consumer preferences? (i.e., how can we profile consumers?)

(ii) What characterizes the relationship between different consumer preferences?

(iii) Which statistical models are appropriate for analyzing consumer preferences, and how can such 'appropriateness' be measured?

(iv) What role (if any) do consumer reliability and the prior preferences of consumers play in such an analysis?

In our paper we address these questions by focusing on the characterization and prediction of brand label preferences. This is achieved by formulating very general latent variable models together with a methodology for choosing the right ones. These latent variable models generalize other treatments of the subject by allowing consideration of consumer reliability and prior consumer preferences.

Back to the top of the page






A Bayesian Method for Using Administrative Records to Predict Census Day Residency

by
Elizabeth Stuart
Harvard University
Statistics Department
1 Oxford St.
Cambridge, MA 02138
stuart@stat.harvard.edu

Abstract:

Administrative records are a promising data source for estimating census coverage or identifying people missed in the census. An important unsolved problem in using records is determining which of them correspond to people actually resident on Census Day. We propose a Bayesian hierarchical model in which one level describes the migration process, and the other describes the probabilities of observation in each of the available record systems. The observation model uses the full information in the records, including the dates associated with the records and available covariate information, and can accommodate a variety of record types, such as tax records, Medicare claims, and school enrollment lists. In addition, multiple record systems can be modeled concurrently simply by multiplying the likelihood of observation for each type. A Gibbs sampler is used to obtain estimates of the in- and out-migration dates, and thus an estimate of the probability of residency in the area on Census Day. This work extends the use of Bayesian methodology in the context of capture-recapture population estimation, and could be useful in the context of an administrative records census, or as a way of expanding the role of administrative records in triple system estimation. This is joint work with Alan Zaslavsky.

Back to the top of the page






Use of the Bayesian approach in the analysis of clinical trials in patients with advanced lung cancer

by
Franz Torres, Gisela Gonzalez G., Tania Crombet, Agustin Lage
Center of Molecular Immunology
Calle 216 esq. 15 Atabey. C. Habana. Cuba
franz@ict.cim.sld.cu

Abstract:

Bayesian methods are very useful in small clinical trials: different kinds of information can be combined with the actual experimental results, and inferences are drawn from the posterior distributions of the parameters given the data. Two clinical trials of a promising vaccine for non-small cell lung cancer are analysed. In these trials two adjuvants are tested, and the second trial adds a pretreatment with cyclophosphamide. Immunogenicity and survival are the endpoints of interest. The log hazard ratio was analysed under uninformative, sceptical and enthusiastic prior distributions, and the corresponding posterior distributions were obtained by combining the data with each prior. The second trial used the posteriors of the first trial as its priors. Probabilities, under the posterior distribution, of more than 5%, 10% and 15% improvement in survival were calculated. The frequentist approach, using the log-rank test, was also applied for the survival comparisons. In the first trial there was mild evidence, with probability 0.52, of more than a 5% improvement in the log hazard ratio between the two adjuvants; median survivals of 8.07 and 8.00 months were obtained, with no difference by the log-rank test. Considering all treated patients against the historical controls, mild evidence was obtained, with probability 0.472 under the sceptical prior (medians of 8.00 and 5.67 months). Among the high-responder patients, there was strong evidence of more than a 5% improvement, with probability 0.799 relative to controls under the sceptical prior. Analysing the second trial in light of the accumulated evidence from the first, the comparison of the two adjuvants showed mild evidence, with probability 0.47, of more than a 5% improvement, and there was moderate to strong evidence of a 5% improvement comparing all treated patients with the controls. Among the high responders there is very strong evidence of a 5% and a 10% improvement, with probabilities of 0.799 and 0.922 under the uninformative and sceptical priors, in contrast to the frequentist approach, which found no statistically significant difference. Useful information about the response of the patients to the vaccines, as well as differences between the vaccines, is captured using the Bayesian approach with different scenarios. We propose using the frequentist and Bayesian approaches jointly for more complete research conclusions.
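The prior-to-posterior updating used here can be sketched with normal approximations on the log hazard ratio scale (all numbers below are placeholders, not the trial results): a sceptical prior centred at no effect is combined with the observed log hazard ratio and its standard error, and the posterior probability of a clinically relevant improvement is read off directly.

    import numpy as np
    from scipy.stats import norm

    # Hypothetical inputs: observed log hazard ratio and its standard error from a trial,
    # and a sceptical prior centred at no effect.
    obs_loghr, obs_se = -0.15, 0.20
    prior_mean, prior_sd = 0.0, 0.30       # sceptical prior

    # Conjugate normal update (precision-weighted average).
    post_prec = 1 / prior_sd**2 + 1 / obs_se**2
    post_mean = (prior_mean / prior_sd**2 + obs_loghr / obs_se**2) / post_prec
    post_sd = np.sqrt(1 / post_prec)

    # Probability of more than a 5% improvement in the hazard (log HR < log 0.95).
    p_improve = norm.cdf(np.log(0.95), loc=post_mean, scale=post_sd)
    print(round(p_improve, 3))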

Back to the top of the page






Disclosure Risk and Information Loss in an Attitudinal Survey

by
Mario Trottini
Departamento de Estadística e Investigación Operativa, Universitat de València
Avenida Dr. Moliner, 50
46100 Burjassot, Valencia, Spain
mario.trottini@uv.es
M. Jesús Bayarri
Departamento de Estadística e Investigación Operativa, Universitat de València
Avenida Dr. Moliner, 50
46100 Burjassot, Valencia, Spain
susie.bayarri@uv.es
Stephen E. Fienberg
Department of Statistics, Carnegie Mellon University
Pittsburgh, PA, 15213
fienberg@stat.cmu.edu

Abstract:

Disclosure limitation denotes a set of techniques aimed at protecting confidentiality in the release of statistical data. The problem is not trivial, since protection of confidentiality must be achieved in a way that is compatible with the agency's mission of providing data users with good quality data. Many alternative disclosure limitation techniques have been proposed, and a key problem is how to compare them efficiently. In this paper we address this problem in a small-scale simulation study. We use an adapted data set from an actual survey conducted by the Institute for Social Research at York University (Fienberg, Makov and Sanil, Journal of Official Statistics, Vol. 13, 1997). The data set consists of 662 respondents and 5 approximately continuous variables: Age, Civil-Liberties, Canadian-U.S. relationship, Income and Attitude (towards Jews). Alternative forms of data release are obtained by contaminating the original data with various amounts of bias and noise. The statistical agency has to decide which data are best to release. The choice requires suitable criteria to assess to what extent the released data can create harm for the data providers (i.e., the disclosure risk) and to what extent the released data can be beneficial to society (i.e., the data utility). We show that existing measures of disclosure risk and data utility, which model only the users' behavior, usually underestimate the agency's uncertainty, and that better and more general measures can be derived if a model for the statistical agency's behavior is also introduced.

Back to the top of the page






Accounting for Pile-Up in the Chandra X-ray Observatory

by
David van Dyk and Yaming Yu
Department of Statistics, Harvard University
Alanna Connors
Eureka Scientific
Aneta Siemiginowska and Vinay Kashyap
Smithsonian Astrophysical Observatory
Harvard University
vandyk@stat.harvard.edu

Abstract:

The Chandra X-ray Observatory was launched in July 1999 and boasts the world's most powerful X-ray telescope. Chandra records the binned time, energy, and location of high-energy photons that arrive at its detector. Pile-up occurs in such X-ray detectors when two or more (X-ray) photons arrive at the same location on the detector during the same time bin. Such coincident events are counted as a single higher-energy event, or are lost altogether if the total energy goes above the on-board discriminator. Thus, for bright sources pile-up can seriously distort both the count rate and the energy distribution. Accounting for pile-up is perhaps the most important outstanding data-analytic challenge for Chandra. In this poster, we describe how Bayesian hierarchical models can be designed to account for pile-up in X-ray detectors and how they can be fit via Markov chain Monte Carlo. To account for pile-up, we stochastically separate a subset of the observed photon counts into multiple counts of lower energy, based on the current iteration of the particular spectral/spatial model being fit. Because of the complexity of the pile-up process, this remains a challenging statistical task requiring simulation of highly structured multi-modal distributions. Nonetheless, the Bayesian framework is promising because it allows the inclusion of other sources of information. For example, event grades (i.e., a description of the likelihood of the degree of pile-up, based on the spatial distribution of the charge) can be used to improve the fit.

Back to the top of the page






Correction of Ocular Artifacts in the EEG using Bayesian Adaptive Regression Splines

by
Garrick Wallstrom and Robert Kass
Carnegie Mellon University
Department of Statistics
Pittsburgh, PA 15213
garrick@stat.cmu.edu

Abstract:

Ocular activity is a significant source of artifacts in the electroencephalogram (EEG). Regression upon the electrooculogram (EOG) is commonly used to correct the EEG. It is known, however, that this approach also removes high-frequency cerebral activity from the EEG. To counter this effect, we used Bayesian Adaptive Regression Splines (BARS) (DiMatteo (2001); DiMatteo, Genovese, and Kass (2001)) to adaptively filter the EOG of high-frequency activity before using the EOG for correction. In a simulation study, this approach reduced spectral error rates in higher frequency bands.

Back to the top of the page






Who Did Nader Really Raid? A Bayesian Analysis of Exit Poll Data from the 2000 US Presidential Elections

by
Lara J. Wolfson
Brigham Young University
Department of Statistics
Provo Utah 84602 USA
ljwolfson@byu.edu

Abstract:

How strongly was the close outcome of the 2000 US Presidential election influenced by the presence of third party candidates? In a presidential race where the margin of difference in the popular vote between the top two candidates was less than 1%, the votes that went to third-party candidates could have influenced the outcome of "swing" states in the electoral college. Media pundits have opined that the presence of Green Party candidate Ralph Nader on the ballot took such deciding votes away from Al Gore. These opinions derived both from popular wisdom about who the likely Green Party voters were and from the 2000 exit polls conducted by the Voter News Service (VNS). In this paper, a Bayesian model for using the VNS data to estimate the probable outcomes of the election in the absence of third party candidates is proposed, showing that careful examination of the exit poll data yields some startling results.

Back to the top of the page






Identifying differentially expressed genes in cDNA microarray experiments: an application of Bayesian methods using noninformative priors

by
Xiao Yang, Keying Ye and Ina Hoeschele
Department of Statistics, Virginia Tech
Blacksburg, VA 24060
xiyang@vt.edu

Abstract:

Recent advances in biotechnology have made it possible for researchers to study the regulation and interactions of thousands of genes simultaneously using DNA microarrays. Microarrays have enormous potential for application in pharmaceutical and clinical research. By comparing the transcriptional levels of genes in two different tissue samples, coordinated expression patterns revealed by microarrays provide clues about gene function and shed light on the complex biomolecular pathways and genetic circuits involved in complex traits in many organisms. One of the core objectives of microarray experiments is to identify differentially expressed genes through measured gene expression levels. However, the raw measures (intensities from two dyes) are affected by many sources of variation, which makes inference about the fold change of gene expression across samples almost infeasible based only on raw intensity measurements. Normalization (to eliminate systematic variation due to dyes, slides, etc.) has to be carried out before any statistical method can be applied. This paper proposes a Bayesian method, in the context of a generalized Fieller-Creasy problem using noninformative priors, in which the parameter of primary interest is the ratio of two population means. The paper is motivated by the facts that measurements taken from microarray experiments are usually not normally distributed, often heavy-tailed or skewed, and that the Fieller-Creasy problem fits the objectives of cDNA microarray experiments, since we are interested in inference about the fold change of gene expression across two tissue samples. We generalize the Fieller-Creasy problem to the case of non-normal distributions, such as the Student t family. Results are compared with those from other Bayesian methods, such as those of Newton et al. (2000) and Baldi and Long (2001). Implications for future studies are also discussed.
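A minimal sketch of posterior inference for the ratio of two means under noninformative priors (simulated placeholder intensities and a plain normal sampling model rather than the heavy-tailed extensions discussed above):

    import numpy as np

    rng = np.random.default_rng(6)

    # Hypothetical normalized log-intensities for one gene in two tissue samples.
    x = rng.normal(2.0, 0.4, size=8)    # sample 1
    y = rng.normal(1.5, 0.4, size=8)    # sample 2

    def posterior_mean_draws(data, n_draws=10000):
        # Under a normal model with the standard noninformative prior, the posterior of
        # the mean is a scaled, shifted Student t with n-1 degrees of freedom.
        n = len(data)
        return data.mean() + data.std(ddof=1) / np.sqrt(n) * rng.standard_t(n - 1, size=n_draws)

    ratio = posterior_mean_draws(x) / posterior_mean_draws(y)
    print("posterior median ratio of means:", np.round(np.median(ratio), 2))
    print("95% interval:", np.round(np.percentile(ratio, [2.5, 97.5]), 2))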

Back to the top of the page






Probabilistic Methods for Robotic Landmine Search

by
Yangang Zhang, Mark Schervish, Ercan U. Acar and Howie Choset
Carnegie Mellon University
Pittsburgh, PA 15213
yazhang,mark,eua,choset+@andrew.cmu.edu

Abstract:

One way to improve the efficiency of mine search, compared with a complete coverage algorithm, is to direct the search based on the spatial distribution of the minefield. The key to the success of this probabilistic approach is to efficiently extract the spatial distribution of the minefield during the search. In our research, we assume that the minefield follows a regular pattern belonging to a family of known patterns. A Bayesian approach to pattern extraction is developed to extract the underlying pattern of the minefield. The algorithm performs well at recovering the ``actual'' pattern when placement and detector errors exist, and it is efficient enough that online implementation on a mobile robot is possible. Compared to a likelihood approach, the advantage of the Bayesian approach is that it provides information about the uncertainty of the extracted ``actual'' pattern.
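A toy illustration of the pattern-posterior idea (a one-dimensional lane, two candidate spacings, and arbitrary error rates, none of which come from the authors' work): each candidate pattern implies mine/no-mine predictions at searched cells, and detections update the posterior over patterns while accounting for detector error.

    import numpy as np

    # Candidate regular patterns: mines every 3 cells or every 4 cells along a 12-cell lane.
    lane_length = 12
    patterns = {
        "every 3": {i for i in range(0, lane_length, 3)},
        "every 4": {i for i in range(0, lane_length, 4)},
    }

    # Detector error rates (hypothetical): miss probability and false-alarm probability.
    p_miss, p_false = 0.1, 0.05

    # Observations gathered so far: (cell searched, detection made?).
    observations = [(0, True), (3, True), (4, False), (6, True), (8, False)]

    # Discrete Bayes update over the candidate patterns (uniform prior).
    posterior = {name: 1.0 / len(patterns) for name in patterns}
    for cell, detected in observations:
        for name, mines in patterns.items():
            if cell in mines:
                lik = (1 - p_miss) if detected else p_miss
            else:
                lik = p_false if detected else (1 - p_false)
            posterior[name] *= lik
    total = sum(posterior.values())
    posterior = {name: p / total for name, p in posterior.items()}

    print(posterior)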

Back to the top of the page






Back to Bayes 01 Homepage