Non-parametric survival

An Australian study by Caplehorn and Bell (1991) compared retention in two methadone treatment clinics for heroin addicts. A patient's survival time was determined as the time in days until the patient dropped out of the clinic or was censored at the end of the study. The two clinics differed according to their overall treatment policies. Interest lies in identifying factors that influence retention time: clinic, maximum daily methadone dose, and presence of a prison record.

Kaplan-Meier method

Figure 1 is a Kaplan-Meier survival curve showing the cumulative proportion of addicts retained in the clinics over time. Figure 1 shows that the rate of loss of patients over time is relatively constant and that approximately 15% remain in treatment by 1000 days post admission.

\TeX \begin{figure}[h]
\centering
\includegraphics[width=8.00cm]{../images/SA_addict_KM.png}
\caption{Kaplan-Meier survival curve showing the cumulative proportion of heroin addicts retained in two methadone treatment clinics (Caplehorn and Bell 1991).}
\label{fig:SA_addict_KM}
\end{figure}

Figure 1: Kaplan-Meier survival curve showing the cumulative proportion of heroin addicts retained in two methadone treatment clinics (Caplehorn and Bell 1991).

Figure 1 was produced using the following code:

library(survival)
dat <- read.table("addict.csv", header = TRUE, sep = ",")
head(dat)

id start stop status clinic prison dose
1 0 428 1 1 0 50
2 0 275 1 1 1 55
3 0 262 1 1 0 55
4 0 183 1 1 0 30
5 0 259 1 1 1 65
6 0 714 1 1 0 55

id: patient identifier.
start: day of entry to clinic.
stop: day of relapse.
status: 0 = censored, 1 = died.
clinic: 1 = clinic 1, 2 = clinic 2.
prison: 0 = prison record absent, 1 = prison record present.
dose: maximum daily methadone dose (mg/day).

The function Surv creates a survival object in R linking survival time and censoring:

Surv(dat$stop, dat$status)

The output above shows survival times for each subject. A plus sign after time indicates that the subject was censored. Plot the Kaplan-Meier survival function of days from discharge from clinic to relapse:

addict.km <- survfit(Surv(stop, status) ~ 1, conf.type = "none", type = "kaplan-meier", data = dat)
plot(addict.km, xlab = "Days to relapse", ylab = "S(t)")

Kaplan-Meier survival function with confidence intervals:

addict.km <- survfit(Surv(stop, status) ~ 1, type = "kaplan-meier", data = dat)
plot(addict.km, xlab = "Days to relapse", ylab = "S(t)", conf.int = TRUE)

Summary estimates of survival at different time points throughout the follow up period:

summary(addict.km, times = seq(from = 0, to = 1000, by = 250))

Kaplan-Meier survival function of days to relapse, stratifying by clinic:

addict.km <- survfit(Surv(stop, status) ~ clinic, type = "kaplan-meier", data = dat)
plot(addict.km, xlab = "Days to relapse", ylab = "S(t)", lty = c(1,2))
legend(x = "topright", legend = c("Clinic 1","Clinic 2"), lty = c(1,2), bty = "n")

Kaplan-Meier survival function of days to relapse, stratifying by methadone dose:

dat$cdose[dat$dose < 60] <- 0
dat$cdose[dat$dose >= 60] <- 1
addict.km <- survfit(Surv(stop, status) ~ cdose, type = "kaplan-meier", data = dat)
plot(addict.km, xlab = "Days to relapse", ylab = "S(t)", lty = c(1,2))
legend(x = "topright", legend = c("Low dose methadone","High dose methadone"), lty = c(1,2), bty = "n")

The survminer package can be used to make more attractive survival plots (Figure 2).

Kaplan-Meier survival function of days to relapse with confidence intervals:

library(survminer)
ggsurvplot(addict.km)

Add a dashed line to indicate median survival time. The setting surv.median.line = hv draws both horizontal and vertical lines on the plot:

ggsurvplot(addict.km, surv.median.line = "hv")

Add a table to show the number of individuals at risk at different follow-up times:

ggsurvplot(addict.km, surv.median.line = "hv",
   risk.table = TRUE, tables.height = 0.1, tables.theme = theme_cleantable(), risk.table.pos = "in",
   xlab = "Days to relapse", ylab = "S(t)")

\TeX \begin{figure}[h]
\centering
\includegraphics[width=8.00cm]{../images/SA_addict_KM_ggsurvplot.png}
\caption{Kaplan-Meier survival curve for the addict data plotted using the survminer package. The dashed line indicates the median time of relapse. The numbers of individuals at risk at different follow-up times are shown in red just above the horizontal axis.}
\label{fig:SA_addict_KM_ggsurvplot}
\end{figure}

Figure 2: Kaplan-Meier survival curve for the addict data plotted using the survminer package. The dashed line indicates the median time of relapse. The numbers of individuals at risk at different follow-up times are shown in red just above the horizontal axis.

Flemington-Harrington estimator

addict.km <- survfit(Surv(stop, status) ~ 1, type = "kaplan-meier", data = dat)
addict.fh <- survfit(Surv(stop, status) ~ 1, type = "fleming-harrington", data = dat)

par(pty = "s", mfrow = c(1,2))
plot(addict.km, xlab = "Days to relapse", ylab = "S(t)", conf.int = FALSE)
plot(addict.fh, xlab = "Days to relapse", ylab = "S(t)", conf.int = FALSE)

With this data set the difference between the Kaplan-Meier and the Flemington-Harrington estimate of survival is not obvious. A closer comparison of the two functions:

tmp <- as.data.frame(cbind(km = addict.km$surv, fh = addict.fh$surv))
head(tmp)
tail(tmp)

Instantaneous hazard

addict.km <- survfit(Surv(stop, status) ~ 1, conf.type = "none", type = "kaplan-meier", data = dat)

Work out the proportion that fail at each evaluated time period:

names(addict.km)
prop.fail <- addict.km$n.event/addict.km$n.risk

Work out the length of time over which these failure occur:

time <- addict.km$time
time
[1] 2, 7, 13, 17, ..., 1076

time0 <- c(0, time[-length(time)])
time0
[1] 0, 2, 7, 13, ..., 1052

Divide prop.fail by the time interval over which those failures occur (that is, time - time0) to get the probability of failing per unit time, i.e. the instantaneous hazard:

haz <- prop.fail/(time - time0)

Plot the result:

plot(time, haz, ylim = c(0,0.03), type = "s", xlab = "Days to relapse", ylab = "h(t)")
lines(lowess(time[-1], haz[-1], f = 0.10))

Tidier plot:

plot(time, haz, type = "n", xlab = "Days to relapse", ylab = "h(t)", ylim = c(0,0.03))
lines(lowess(time[-1], haz[-1], f = 0.10))

A simpler way to plot instantaneous hazard using the epiR package:

library(epiR)
addict.km <- survfit(Surv(stop, status) ~ 1, conf.type = "none", type = "kaplan-meier", data = dat)
addict.haz <- epi.insthaz(addict.km)

plot(addict.haz$time, addict.haz$est, xlab = "Days to relapse", ylab = "h(t)", ylim = c(0,0.03), type = "n")
lines(lowess(addict.haz$time, addict.haz$est, f = 0.10), lty = 1)
lines(lowess(addict.haz$time, addict.haz$lower, f = 0.10), lty = 2, col = "gray")
lines(lowess(addict.haz$time, addict.haz$upper, f = 0.10), lty = 2, col = "gray")

Cumulative hazard

A cumulative hazard plot shows the (average) cumulative number of events likely to be experienced by a subject as a function of time. A cumulative hazard of 2.0 at day 1000 means that, on average, a subject will have experienced 2.0 events by the time it reaches day 1000.

Plot the cumulative hazard of relapse:

addict.km <- survfit(Surv(stop, status) ~ 1, conf.type = "none", type = "kaplan-meier", data = dat)
addict.haz <- epi.insthaz(addict.km)

plot(addict.km, fun = "cumhaz", xlab = "Days to relapse", ylab = "H(t)", lty = c(1,2))

Compare cumulative hazard as a function of time with instantaneous hazard as a function of time:

par(pty = "s", mfrow = c(1,2))
plot(addict.km, fun = "cumhaz", xlab = "Days to relapse", ylab = "H(t)")
plot(addict.haz$time, addict.haz$est, xlab = "Days to relapse", ylab = "h(t)", ylim = c(0,0.06), type = "s")