Competing risks

Kyle (1993) studied 241 cases of monoclonal gammopathy identified at the Mayo Clinic before 1 January 1971, with between 20 and 35 years of total follow-up on each patient. The response variable is the time to the first of several adverse events: death (n = 130), multiple myeloma (n = 39), and 'other' (n = 20). Most subjects in the study were discovered incidentally in the process of being examined for other indications. The laboratory values (albumin, creatinine, etc.) may be related to the severity of those other indications, but have shown less relationship to monoclonal gammopathy of undetermined significance (MGUS) per se.

Reference: Kyle RA (1993) 'Benign' monoclonal gammopathy: after 20 to 35 years of follow-up. Mayo Clinic Proceedings, 68: 26-36.

In a competing risks analysis, we assign one stratum for each outcome type and all subjects appear in each stratum, so each subject contributes one row per outcome type. In the following example data set the first subject dies at day 760 and the second subject develops another lymphoproliferative disorder ('other') at day 2160.

id  time  status  endpoint  sex  age  hgb   creat  mspike
1   760   1       death     2    79   1.5   1.2    2.0
1   760   0       myeloma   2    79   1.5   1.2    2.0
1   760   0       other     2    79   1.5   1.2    2.0
2   2160  0       death     2    76   13.3  1.0    1.8
2   2160  0       myeloma   2    76   13.3  1.0    1.8
2   2160  1       other     2    76   13.3  1.0    1.8

id: patient identifier.
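time: number of days to the first event or to censoring.
status: 0 = censored for the endpoint named in this row, 1 = the endpoint named in this row occurred.
endpoint: outcome type for this row (death, myeloma, or other lymphoproliferative disorder).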
sex: 1 = male, 2 = female.
age: patient age at enrolment.
hgb: plasma haemoglobin concentration at enrolment.
creat: plasma creatinine concentration at enrolment.
mspike: size of monoclonal spike.
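The stacked layout described above can be built from a one-row-per-subject table. The following is a minimal sketch only: the input data frame `wide` and its column `first_event` are hypothetical names chosen for illustration and are not part of the original data set.

```r
# Hypothetical one-row-per-subject input (names and layout are assumptions)
wide <- data.frame(
  id = c(1, 2),
  time = c(760, 2160),
  first_event = c("death", "other"),
  sex = c(2, 2),
  age = c(79, 76),
  stringsAsFactors = FALSE
)

endpoints <- data.frame(endpoint = c("death", "myeloma", "other"),
                        stringsAsFactors = FALSE)

# merge() with no shared column names gives every subject-by-endpoint combination
long <- merge(wide, endpoints)

# status is 1 only in the row whose endpoint matches the subject's first event
long$status <- as.integer(long$first_event == long$endpoint)

# Sort by subject and endpoint, and keep the columns used in the analysis
long <- long[order(long$id, long$endpoint),
             c("id", "time", "status", "endpoint", "sex", "age")]
print(long)
```

Each subject ends up with three rows, at most one of which has status equal to 1.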

The code for a competing risks analysis is as follows:

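library(survival)

# Set the working directory to the folder that holds mgus2.csv
setwd("D:\\TEMP")
dat <- read.table("mgus2.csv", header = TRUE, sep = ",")

# Kaplan-Meier curves, one per endpoint (strata are ordered death, myeloma, other)
mgus2.km <- survfit(Surv(time, status) ~ endpoint, type = "kaplan-meier", data = dat)
# The default plot shows the proportion still free of each endpoint
plot(mgus2.km, xlab = "Days to endpoint", ylab = "Proportion remaining free of endpoint", lty = c(1,2,3), mark.time = FALSE)
legend(x = "topright", legend = c("Death","Myeloma","Other"), lty = c(1,2,3), bty = "n")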

Even though the data set contains three observations for each subject, the endpoint times are assumed to be independent, so it is valid to apply a log-rank test to compare survivorship for the three outcomes. This is similar to the situation with paired data where the two measurements are independent (or uncorrelated): a two-sample test would be used rather than a paired t-test.

survdiff(Surv(time, status) ~ endpoint, data = dat, na.action = na.omit, rho = 0)

Time to endpoint differs according to outcome (log-rank chi-squared test statistic = 111; df = 2; P < 0.001).
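As a check on the reported P-value, the upper tail of the chi-squared distribution can be evaluated directly from the test statistic and its degrees of freedom. This is a small sketch using the rounded values quoted above; the variable names are illustrative.

```r
# Values quoted in the text (rounded)
chisq <- 111
df <- 2  # number of endpoint groups minus one

# Upper-tail probability of the chi-squared distribution
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
p_value < 0.001

# With a stored survdiff object, the statistic is available as its $chisq component
```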

mgus2.cph01 <- coxph(Surv(time, status, type = "right") ~ sex + age + hgb + mspike + cluster(id) + strata(endpoint), data = dat)
summary(mgus2.cph01)

Variable          Observations  Failed  Regression coefficient (SE)  P       Hazard ratio (95% CI)
Sex               720           188     -0.3399 (0.1563)             0.03    0.71 (0.53 - 0.96)
Age               720           188      0.0514 (0.00735)            < 0.01  1.05 (1.04 - 1.07)
Haemoglobin       720           188     -0.1664 (0.0443)             < 0.01  0.85 (0.79 - 0.91)
Monoclonal spike  720           188     -0.0878 (0.1869)             0.63    0.92 (0.64 - 1.31)

R2 = 0.100.
Likelihood ratio test = 74.4 on 4 df, P < 0.01.

Table 1. Competing risks regression model showing the effect of sex, age, haemoglobin concentration and size of monoclonal spike on the daily hazard of 'endpoint'.
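The hazard ratios in Table 1 are the exponentiated regression coefficients, and the confidence limits follow from the coefficient plus or minus 1.96 standard errors on the log scale. A short sketch for the age row, using the rounded values printed in the table (variable names are illustrative):

```r
# Age row of Table 1 (rounded values as printed)
beta <- 0.0514
se <- 0.00735

# Hazard ratio per one-year increase in age
hr <- exp(beta)

# 95% confidence interval, computed on the log scale and then exponentiated
ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

round(c(hr = hr, lower = ci[1], upper = ci[2]), 2)

# Hazard ratio for a ten-year increase in age
hr_10 <- exp(10 * beta)
```

Because the model is multiplicative, a ten-year difference in age corresponds to exp(10 x 0.0514) rather than ten times the one-year hazard ratio.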

Note the use of cluster(id) to request a robust (sandwich) variance estimate, which corrects for multiple events experienced by the same individual. This correction is not required in the competing risks case when each subject can have at most one outcome event.
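The effect of cluster(id) can be seen by fitting the same model with and without it. The sketch below uses simulated data, not the MGUS data; every variable name and value in it is made up for illustration. The coefficients are unchanged and only the reported standard error differs.

```r
library(survival)

set.seed(1)
n <- 200

# Simulated one-row-per-subject data (illustrative values only)
wide <- data.frame(
  id = seq_len(n),
  time = rexp(n, rate = 0.01),
  first_event = sample(c("death", "myeloma", "censored"), n, replace = TRUE),
  age = rnorm(n, mean = 65, sd = 10),
  stringsAsFactors = FALSE
)

# Stack the data: one row per subject per endpoint
long <- merge(wide, data.frame(endpoint = c("death", "myeloma"),
                               stringsAsFactors = FALSE))
long$status <- as.integer(long$first_event == long$endpoint)

# Same model, without and with the robust variance
fit_naive <- coxph(Surv(time, status) ~ age + strata(endpoint), data = long)
fit_robust <- coxph(Surv(time, status) ~ age + strata(endpoint) + cluster(id),
                    data = long)

# Identical point estimates
coef(fit_naive)
coef(fit_robust)

# The robust fit reports both the model-based and the sandwich standard error
summary(fit_robust)$coefficients[, c("se(coef)", "robust se")]
```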

An advantage of the stacked data set (one row per subject per endpoint) is that it allows for easy estimation of within-event-type coefficients. For instance, one might ask whether the effect of age is the same for death as for the other outcomes, while controlling for the common effect of haemoglobin. This can be investigated by coding two endpoint-specific age variables:

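# Age for rows in the death stratum, zero elsewhere
dat$age1 <- dat$age * (dat$endpoint == "death")
# Age for rows in the myeloma and other strata, zero in the death stratum
dat$age2 <- dat$age * (dat$endpoint != "death")
mgus2.cph02 <- coxph(Surv(time, status, type = "right") ~ sex + age1 + age2 + hgb + mspike + strata(endpoint), data = dat)
summary(mgus2.cph02)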

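Age is a significant predictor of the overall death rate (Age 1 in Table 2: hazard ratio 1.08 per year, P < 0.01). Age is, however, of far less importance in predicting the likelihood of plasma cell malignancy, here the myeloma and 'other' endpoints combined (Age 2: hazard ratio 1.00, P = 0.83).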

Variable                  Observations  Failed  Regression coefficient (SE)  P       Hazard ratio (95% CI)
Sex                       720           188     -0.3274 (0.1554)             0.03    0.72 (0.53 - 0.98)
Age 1 (death)             720           188      0.0760 (0.0093)             < 0.01  1.08 (1.06 - 1.10)
Age 2 (myeloma or other)  720           188      0.0026 (0.0123)             0.83    1.00 (0.98 - 1.03)
Haemoglobin               720           188     -0.1613 (0.0446)             < 0.01  0.85 (0.78 - 0.93)
Monoclonal spike          720           188     -0.0887 (0.1870)             0.64    0.91 (0.63 - 1.32)

R2 = 0.128.
Likelihood ratio test = 96.7 on 5 df, P < 0.01.

Table 2. Competing risks regression model showing the effect of sex, age (for deaths or for all other reasons), haemoglobin concentration and size of monoclonal spike on the daily hazard of 'endpoint'.
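One way to ask formally whether the age effect differs between death and the other endpoints is to compare the two models with a likelihood ratio test, since the single-age model is the two-age model with both age coefficients constrained to be equal. The sketch below uses only the rounded likelihood ratio statistics reported under Tables 1 and 2. It assumes both models were fitted to the same rows, and it treats the stacked rows as independent, so the result is indicative rather than exact.

```r
# Likelihood ratio statistics as reported (each against the null model)
lrt_common <- 74.4    # Table 1: one age coefficient, 4 df
lrt_separate <- 96.7  # Table 2: separate age1 and age2 coefficients, 5 df

# The difference is the likelihood ratio statistic comparing the nested models
stat <- lrt_separate - lrt_common
df <- 5 - 4

p_value <- pchisq(stat, df = df, lower.tail = FALSE)
c(statistic = stat, df = df)
p_value < 0.001
```

A small P-value here is consistent with the pattern in Table 2, where the age coefficient for death is clearly positive and the coefficient for the other endpoints is close to zero.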