Scientific Background:
In developing countries, higher infant mortality is partially caused
by poor maternal and fetal nutrition. Clinical trials of micronutrient
supplementation are aimed at reducing the risk of infant mortality by
increasing birth weight. Because infant mortality is greatest among
low birth weight (LBW) infants (less than 2500 grams), an effective
intervention may need to increase birth weight among the smallest
babies. Although a trial conducted in Nepal demonstrated that
supplementation increases birth weight, the evidence that
supplementation improves infant survival is inconclusive. It has been
hypothesized that a potential benefit of the treatment on survival
among LBW infants is partly offset by null or even harmful effects
among the largest infants.
Data: The methods in this paper are motivated by a double-blind
randomized community trial in rural Nepal (Christian et al.,
2003a,b). The investigators administered an intervention program to
evaluate the benefits of the following micronutrient supplements: 1)
folic acid and vitamin A; 2) folic acid, iron, and vitamin A; 3)
folic acid, iron, zinc, and vitamin A; 4) multiple nutrients and
vitamin A. The control was vitamin A alone. Each micronutrient
supplement was administered weekly to 1000 pregnant women, who
ultimately delivered approximately 800 live-born infants. Details of
the study design are given in Christian et al. (2003a). The team
measured the birth weight within 72 hours of delivery and then
followed the infants for one year to determine whether or not they
survived. In addition, the team measured several characteristics of the
mother (maternal age, parity, maternal height, arm circumference, etc.)
and of the infant (weight, length, head and chest circumference).
Scientific Questions:
- Is there a causal effect of
the treatment on birth weight? Does this causal effect vary with
respect to the percentiles of the birth weight distribution? Is it
largest among the LBW infants?
- Is there a causal effect of the
treatment on survival? Does this causal effect vary with respect to
the percentiles of the birth weight distribution? Is it largest among
the LBW infants?
- Is the effect of the treatment on survival
mediated wholly or in part by increases in birth weight?
- Do these percentile-specific causal effects on birth weight and survival
differ across the four micronutrients? Are some of the studied
micronutrients harmful for the largest babies?
Complex Aspects of the Problem: The data analysis is
challenged by measurement error and informative missing data in the
main outcomes. In community-based interventions in developing
countries, a large proportion of births occur in the home without
assistance from trained birth attendants. Approximately 88% of the
babies are measured within 72 hours of delivery. The remainder
are measured between 72 and approximately 2000 hours after delivery.
Hence, weights are obtained at varying times following birth and are
therefore imprecise measures of the ``true weight at birth.'' In addition,
a high proportion of deaths of young infants occur in the first few
hours of birth. If there is a delay in reaching the mother and infant,
then many of these infants cannot be weighed because they have already
died. Approximately 7% of the birth weight measurements are missing
and, among this 7%, 34% of the babies died shortly after
delivery. These babies are likely to have been of lower birth weight
than those who survived to be weighed, and therefore the
missingness due to death is likely to be informative about birth
weight. To overcome these challenges, we plan to develop a Bayesian
measurement error model that allows us to estimate the ``true weight
at birth,'' both for measurements made after 72 hours and for
missing measurements, as a function of the mother's characteristics
and the vital status of the baby.
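The effect of this informative missingness can be illustrated with a small simulation. The sketch below is not the proposed Bayesian measurement error model; the weight distribution, death probabilities, and missingness probabilities are hypothetical values chosen only to show the direction of the bias when weights are missing mainly for infants who died.

```python
import random
import statistics

def simulate_observed_mean(n=20000, seed=0):
    """Return (mean of all true weights, mean of the weights actually observed)."""
    rng = random.Random(seed)
    all_weights, observed = [], []
    for _ in range(n):
        w = rng.gauss(2700.0, 400.0)            # hypothetical true birth weight (g)
        p_death = 0.60 if w < 2000.0 else 0.03  # assumed: smaller babies die more often
        died = rng.random() < p_death
        p_missing = 0.80 if died else 0.05      # assumed: early deaths are rarely weighed
        all_weights.append(w)
        if rng.random() >= p_missing:           # weight is recorded only if not missing
            observed.append(w)
    return statistics.fmean(all_weights), statistics.fmean(observed)

true_mean, observed_mean = simulate_observed_mean()
print(f"mean of all birth weights:      {true_mean:.0f} g")
print(f"mean of observed birth weights: {observed_mean:.0f} g")
```

Under these assumptions the observed mean overstates the true mean, because the lightest infants are the ones most often missing; an analysis restricted to the weighed infants would inherit that bias.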
Bayesian approach with data augmentation for estimating
percentile-specific causal effects: We plan to develop a Bayesian
approach for causal inference for this case-study of micronutrient
supplementation. Our approach integrates, for the first time, Bayesian
methods and data augmentation (Tanner and Wong, 1987; Tanner, 1991;
Albert and Chib, 1993; Chib and Greenberg, 1998) with a causal model with
counterfactuals and principal stratification (Rubin, 1978; Holland,
1986; Frangakis and Rubin, 2002). We first define causal parameters
that measure the effects of an intervention on a clinical outcome
(infant mortality) that are allowed to vary with the percentiles of
the post-treatment variable (birth weight). Second, we use the
causal statistical framework of principal stratification (Frangakis
and Rubin, 2002) to distinguish the ``direct'' causal effect of the
treatment on mortality from the causal effect of the treatment on
mortality that is ``mediated'' by post-treatment changes in birth
weight. A Bayesian approach to causal inference is attractive
because we can: 1) calculate the posterior distributions of
percentile-specific causal effects accounting for the uncertainty
about the missing counterfactuals, measurement error, and missing
data; and 2) investigate the sensitivity of causal inferences to key
assumptions for which there are no direct observations in the data
set.
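As a minimal illustration of what a percentile-specific effect on birth weight can mean, the sketch below compares birth weight quantiles between two randomized arms. It is one simple reading of the idea, not the Bayesian principal stratification model proposed here: the data are simulated, and the function names, sample sizes, and the shape of the treatment gain are assumptions made for illustration.

```python
import random
import statistics

def quantile_effects(treated, control, probs=(0.10, 0.25, 0.50, 0.75, 0.90)):
    """Treated-minus-control difference in birth weight at each percentile."""
    # With n=100, quantiles() returns the 99 cut points for percentiles 1..99.
    q_t = statistics.quantiles(treated, n=100, method="inclusive")
    q_c = statistics.quantiles(control, n=100, method="inclusive")
    return {p: q_t[round(100 * p) - 1] - q_c[round(100 * p) - 1] for p in probs}

def hypothetical_gain(weight):
    """Assumed treatment gain (g): largest for the smallest babies, zero above 3500 g."""
    return max(0.0, 150.0 - 0.1 * (weight - 2000.0))

rng = random.Random(0)
# Simulated arms of a randomized trial (not the Nepal data).
control = [rng.gauss(2700.0, 400.0) for _ in range(800)]
treated = [w + hypothetical_gain(w)
           for w in (rng.gauss(2700.0, 400.0) for _ in range(800))]

for p, effect in quantile_effects(treated, control).items():
    print(f"percentile {100 * p:.0f}: estimated effect {effect:+.0f} g")
```

A single average difference between the arms would hide the pattern built into this simulation, in which the gain is concentrated in the lower percentiles and vanishes for the heaviest infants.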
Timeline for the completion of the work: We have recently
submitted a manuscript (http://www.bepress.com/jhubiostat/paper68/)
in which we define percentile-specific causal effects and present results
of this case study for one supplement only (iron, folic acid, and
vitamin A). In the manuscript we will prepare for the 2005
Bayesian workshop, we plan to: 1) extend our approach to account
for measurement error and informative missingness in the birth weight
variable; 2) assess sensitivity of the results with respect to the
measurement error and the missing data models; 3) detail critical
decisions in the model building, including the definition of causal
percentile-specific parameters, the relationship between survival and
birth weight, and the Bayesian implementation of principal
stratification for estimating the direct and mediated effects; 4)
assess the sensitivity of the results to modelling assumptions
about the associations between the observed data and the
counterfactuals; 5) detail computational aspects of the Bayesian model
with data augmentation; and finally 6) present and contrast the
results across the four micronutrient supplements administered.
Public Health Impact: Currently recommendations exist for
supplementing women with iron-folic acid during pregnancy in
developing countries. This case study will provide critical
information toward the evaluation and planning of these public health
interventions. In fact, preliminary analyses of these data indicate
that it is important, at the very least, to recognize that the
beneficial effects of an intervention may differ depending on where
in the distribution the program participants fall, and that an overall
effect size may: 1) under-estimate the maximum likely benefit in the
most malnourished individuals; and 2) incorrectly suggest benefits
where none exist and potentially mask harm in the more well-nourished
individuals.
Non-statistician collaborators: Joanne Katz, Professor,
Department of International Health, and Parul Christian, Associate
Professor, Department of International Health, both at Johns
Hopkins University. Drs. Christian and Katz are the co-investigators
of the community-based trial in Nepal.
REFERENCES
Albert, J.H. and Chib, S. (1993). "Bayesian Analysis of Binary and
Polychotomous Response Data." Journal of the American Statistical Association, 88, 669--679.
Chib, S. and Greenberg, E. (1998). "Analysis of Multivariate
Probit Models." Biometrika, 85, 347--361.
Christian, P., Khatry, S., Katz, J., Pradhan, E., LeClerq, S.,
Shrestha, S., Adhikari, R., Sommer, A., and West, K. (2003a). "Effects
of alternative maternal micronutrient supplements on
low birth weight in rural Nepal: double blind randomised community
trial." British Medical Journal, 326, 1--6.
Christian, P., West, K., Khatry, S., LeClerq, S., Pradhan, E., Katz, J.,
Shrestha, S., and Sommer, A. (2003b). "Effects of maternal
micronutrient supplementation on fetal loss and infant mortality: a
cluster-randomized trial in Nepal." American Journal of Clinical
Nutrition, 78, 1194--1202.
Frangakis, C.E. and Rubin, D.B. (2002). "Principal Stratification in Causal Inference." Biometrics, 58, 1, 21--29.
Holland, P. (1986). "Statistics and Causal Inference." Journal
of the American Statistical Association, 81, 945--960.
Rubin, D.B. (1978). "Bayesian Inference for Causal Effects: The Role of
Randomization." The Annals of Statistics, 6, 34--58.
Tanner, M.A. (1991). Tools for Statistical Inference --
Observed Data and Data Augmentation Methods, vol. 67 of Lecture
Notes in Statistics. New York: Springer-Verlag.
Tanner, M.A. and Wong, W.H. (1987). "The calculation of posterior
distributions by data augmentation." Journal of the American
Statistical Association, 82, 398, 528--550.
|
Although ocean and atmosphere are similar in many respects --- they
are both geophysical fluids --- the ocean is less well understood
due principally to much lower data density in both space and time.
Historically, physical oceanographic data were collected from
ships, a labor-intensive, time-consuming and expensive process. Over the
past several decades, ocean data have additionally been collected
by satellites, current meters and a variety of floats and
drifters. These instruments have added immeasurably to our
understanding of the ocean, yet because of their relatively recent
deployment, inferences about long term changes in the ocean must
be founded on ship-based measurements. Inferences about long term
changes in the ocean are not only intrinsically interesting (to
oceanographers), they are crucial in the current scientific and
public policy debates about global warming. Calculations based on
the amount of anthropogenically released carbon suggest that the
earth's atmosphere should have warmed by an amount greater than
what has been observed. Are the calculations wrong, or has the
excess heat gone somewhere not accounted for? One possibility is
that the excess heat has gone into the ocean. This possibility is
being explored by climate modellers, physical oceanographers and
others. But the task is much more difficult for the ocean than for the
atmosphere, primarily because the data density is so much lower in
both space and time. Additionally, because water has such a large
heat capacity, heat content increases are manifested as small
temperature changes, relative to the expected temperature changes
in the atmosphere.
This paper begins with some simple figures describing the missing
heat problem and illustrating relative data densities for the
ocean, land and atmosphere. Then the bulk of the paper describes
two ways in which our research team (a physical oceanographer,
a statistician and several graduate students) has been looking
at the data to estimate changes in temperature, salinity and
vertical structure of the oceans over the last 50 years or so.
The phenomenon of interest is temporal change over large spatial
scales. Traditionally, physical oceanographers estimate such
temporal change by examining the data from repeated occupations of
a transect. That is, they look for instances where data-collecting
ships have sampled the same latitude or longitude line in
different decades. Two examples are the line of $24^\circ$N
latitude in the North Atlantic, which was occupied in 1957, 1981
and 1992, and the line of $53^\circ$W longitude, also in the North
Atlantic, which was occupied in 1956, 1983 and 1997. Their method
is to look at the three possible pairwise comparisons of years.
We will describe a spatio-temporal model that accounts not only
for data from the three occupations, but also for data from other
cruises that passed through or near the area of interest and in
years other than those of the occupations. Our model allows us to
view the temperature in the target region as a time series, and to
see the three occupation years as part of that time series,
leading to a more complete picture of temporal change.
In addition, we describe the importance of distinguishing property
changes on isobars from those on isopycnals. An isobar is
a surface of constant pressure, and hence lies at an approximately constant
depth. An isopycnal is a surface of constant density. Because an
isopycnal is gravitationally neutral, properties such as temperature
and salinity can flow or diffuse much more easily along an isopycnal than
along an isobar. When properties change temporally at a given location, it is
useful to decompose the change into two parts: one part due to the
meandering or heaving of an isopycnal past a given depth, which can
happen on relatively short time scales such as months, and one part
due to changes on a given isopycnal, which typically represents
structural change in the ocean occurring on a longer time scale.
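To first order, this decomposition can be written as follows. This is a sketch in notation introduced here for illustration, not necessarily the form used in the paper: $T$ is temperature, $z$ is depth (positive downward), $\rho$ labels an isopycnal, and $\Delta h$ is the change in depth of that isopycnal between two times.

```latex
\[
\underbrace{\Delta T\big|_{z}}_{\text{change at fixed depth}}
\;\approx\;
\underbrace{\Delta T\big|_{\rho}}_{\text{change on the isopycnal}}
\;-\;
\underbrace{\frac{\partial T}{\partial z}\,\Delta h}_{\text{heave of the isopycnal}}
\]
```

With temperature decreasing with depth ($\partial T/\partial z < 0$), a deepening isopycnal ($\Delta h > 0$) produces warming at a fixed depth even when nothing has changed on the isopycnal itself, which is why the two terms need to be separated before interpreting a change as structural.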
The second way we look for temporal change is through the so-called
mixed layer. The upper ocean (approximately 50--200 m, depending on
season and location) is well mixed vertically through convection, so
temperature and density are roughly constant with depth. The mixing is
biologically important because it brings nutrients to the surface
where there is also light, thus promoting the production of
phytoplankton, the primary component of ocean ecosystems. Again, this
is significant for global warming because primary production is a sink
for carbon. Below the mixed layer, temperature decreases and density
increases rapidly and convection is inhibited. The depth and
temperature of the mixed layer are subject to a solar-driven annual
cycle. The second part of our research focuses on long-term changes in
depth and temperature of the mixed layer. Because measurements are
almost never taken at the same location we employ a statistical model
with components for spatial, annual and long-term trends. Interest
centers on the long-term; the spatial and annual must be modelled in
order to handle the long-term accurately.
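One simple way to picture such a model is an additive regression with spatial, annual and long-term components. The sketch below is an assumption-laden stand-in, not the model used in this work: it takes the spatial component to be linear in latitude and longitude offsets, the annual cycle to be a single harmonic, and the long-term change to be a linear trend, and it fits them by ordinary least squares on simulated data.

```python
import math
import random

def design_row(dlat, dlon, t_years):
    """One design-matrix row: intercept, linear spatial terms,
    one annual harmonic, and a linear long-term trend (all assumed forms)."""
    phase = 2.0 * math.pi * t_years
    return [1.0, dlat, dlon, math.cos(phase), math.sin(phase), t_years]

def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations X'X beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

rng = random.Random(0)
# Simulated measurements scattered in space and time (hypothetical, not ocean data).
# Coordinates are offsets from the region centre; time is in years from the mid-point.
true_beta = [15.0, -0.2, 0.05, 3.0, 1.0, 0.01]  # last entry: trend in deg C per year
rows, temps = [], []
for _ in range(500):
    row = design_row(rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-25, 25))
    rows.append(row)
    temps.append(sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0.0, 0.5))

beta_hat = fit_ols(rows, temps)
print(f"estimated long-term trend: {beta_hat[5]:.4f} deg C per year")
```

Fitting the components jointly matters because the observations are unevenly scattered: if the annual cycle or the spatial gradient were left out, a shift over the decades in where or when ships sampled could be mistaken for a long-term trend.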
Modelling mixed layer depth leads to a novel development in
statistical theory: eliciting the likelihood function directly from
the oceanographer rather than deriving it from a sampling model for the data. One
would usually get the likelihood function from the distribution of
temperature as a function of mixed layer depth. Unfortunately, there
is no good model for that distribution. We will show, first, that the
standard physically-based data model does not fit the data well and,
second, that change-point models yield incorrect measures of
uncertainty. Therefore, we use direct assessment of the likelihood.
The oceanographer is queried about a small number of profiles; we then
construct an algorithm that mimics the oceanographer's assessments and
apply that algorithm to the thousands of profiles in the database.
This procedure raises both practical and philosophical concerns.
|