The Cox Proportional Hazards model is used in survival analysis,
which generally involves studying the effects of various variables
on survival time. For instance, a study of the Herero people
investigated the effect of gender on mortality. These data are in the
file anthro.dat
. The file contains a header as well as row names
(codes for each subject).
The study ended in 1989. After that, we have no information on how much longer the surviving subjects lived. Record the last year that each subject was seen, and a 0/1 variable for whether they died before 1989. Those who did not die before 1989 are considered ``censored''.
I generated a vector (year.last.seen
) which is the year
of death (for people who died before 1989) or 1989 (for people
who did not die before 1989). I then subtracted year of
birth from this to get the age at which a person was last seen
(age.last.seen
). I created a vector (is.dead
) with
deaths represented as 1's and people still alive at the end of the
study represented as 0's.
Question: How do you do this?
I used these two vectors to create a survival object:
> anthro.surv <- Surv(age.last.seen, is.dead) > anthro.surv[1:30] [1] 25+ 33+ 28+ 5+ 35+ 33+ 31+ 29+ 22 22+ 20+ 0 21+ 15+ 13+ 8+ 7+ 15+ 32+ [20] 21 23+ 19+ 15+ 3 38+ 36+ 34+ 32+ 28+ 25+A survival object consists of survival times, followed by a
+
if
that observation was censored (in this case meaning that they were still
alive at the end of the study). This means that the first person was 25
and still alive at the end of the study, the second person was 33 and still
alive, etc. The ninth person died at age 22, the 12th person died at
age 0, etc.
Now that we have a vector in the survival analysis format, we can
use that as the response variable in a Cox proportional hazards model.
This sort of model uses the same format as before. This time there is
a new argument included, na.action=na.omit
, to skip over the
several people whose gender was not recorded. The relevant variables
appear to be year of birth, number of living siblings, and gender:
> coxph(anthro.surv ~ birth.year + sibs + sex, na.action=na.omit) Call: coxph(formula = anthro.surv ~ birth.year + sibs + sex, na.action = na.omit) coef exp(coef) se(coef) z p birth.year -0.0323 0.968 0.00754 -4.29 1.8e-05 sibs 0.0623 1.064 0.02405 2.59 9.6e-03 sex 0.7681 2.156 0.15709 4.89 1.0e-06 Likelihood ratio test=52.2 on 3 df, p=2.73e-11 n=1262 (6 observations deleted due to missing)The above shows that
sex
does have a significant effect on
mortality. Sex is listed alphabetically (female, male) so men
have a higher hazard rate of dying than women.