next up previous
Next: About this document ... Up: Statistical Modeling in S-PLUS Previous: A Binomial Model (logistic

Cox Proportional Hazards

The Cox Proportional Hazards model is used in survival analysis, which generally involves studying the effects of various variables on survival time. For instance, a study of the Herero people investigated the effect of gender on mortality. These data are in the file anthro.dat. The file contains a header as well as row names (codes for each subject).

The study ended in 1989. After that, we have no information on how much longer the surviving subjects lived. Record the last year that each subject was seen, and a 0/1 variable for whether they died before 1989. Those who did not die before 1989 are considered ``censored''.

I generated a vector (year.last.seen) which is the year of death (for people who died before 1989) or 1989 (for people who did not die before 1989). I then subtracted year of birth from this to get the age at which a person was last seen (age.last.seen). I created a vector (is.dead) with deaths represented as 1's and people still alive at the end of the study represented as 0's.

Question: How do you do this?

I used these two vectors to create a survival object:

> anthro.surv <- Surv(age.last.seen, is.dead)
> anthro.surv[1:30]
 [1] 25+ 33+ 28+  5+ 35+ 33+ 31+ 29+ 22  22+ 20+  0  21+ 15+ 13+  8+  7+ 15+ 32+
[20] 21  23+ 19+ 15+  3  38+ 36+ 34+ 32+ 28+ 25+
A survival object consists of survival times, followed by a + if that observation was censored (in this case meaning that they were still alive at the end of the study). This means that the first person was 25 and still alive at the end of the study, the second person was 33 and still alive, etc. The ninth person died at age 22, the 12th person died at age 0, etc.

Now that we have a vector in the survival analysis format, we can use that as the response variable in a Cox proportional hazards model. This sort of model uses the same format as before. This time there is a new argument included, na.action=na.omit, to skip over the several people whose gender was not recorded. The relevant variables appear to be year of birth, number of living siblings, and gender:

> coxph(anthro.surv ~ birth.year + sibs + sex, na.action=na.omit)
Call:
coxph(formula = anthro.surv ~ birth.year + sibs + sex, na.action = na.omit)


              coef exp(coef) se(coef)     z       p
birth.year -0.0323     0.968  0.00754 -4.29 1.8e-05
      sibs  0.0623     1.064  0.02405  2.59 9.6e-03
       sex  0.7681     2.156  0.15709  4.89 1.0e-06

Likelihood ratio test=52.2  on 3 df, p=2.73e-11  n=1262 (6 observations deleted
due to missing)
The above shows that sex does have a significant effect on mortality. Sex is listed alphabetically (female, male) so men have a higher hazard rate of dying than women.
next up previous
Next: About this document ... Up: Statistical Modeling in S-PLUS Previous: A Binomial Model (logistic
Brian Junker 2002-08-26