Pregunta

Problem

Given n-patient records with time and status variables (among others), I would like to obtain their survival risk in the time period they're within ie 2, 4, 6, 8, 10 years.

I have a division of 24 - 47 months (2 years), 48 - 83 months (4 years), 84 - 107 months (6 years), 108 - 119 months (8 years) and 120 - "up to what's available" months (10 years).

In an individual perspective, a patient with survival months of 30 months will be included in the two-year period and along with the other predictive variables I want to know this patient's survival risk within two years.

My method

I'm retrieving survival risk percentages for my data using the R code described in this thread.

km <- survfit(Surv(time, status)~1, data=mydata)
survest <- stepfun(km$time, c(1, km$surv))

The time variable is the survival months and the status has values 1 and 0 for alive and dead respectively.

The code outputs something like this (taken from here):

> survest(0:100)
 [1] 1.0000000 0.9854015 0.9781022 0.9708029 0.9635036 0.9635036 0.9635036
 [8] 0.9416058 0.9124088 0.9124088 0.8978102 0.8905109 0.8759124 0.8613139
 [15] 0.8613139 0.8467153 0.8394161 0.8394161 0.8175182 0.8029197 0.7883212
 [22] 0.7737226 0.7664234 0.7664234 0.7518248 0.7299270 0.7299270 0.7225540
 [29] 0.7225540 0.7151810 0.7004350 0.6856890 0.6856890 0.6783160 0.6783160

My question is: are these the actual survival estimates for my 300,000 individual records wherein I need to use survest(0:300000)? I tried survest(0:1000) but the result already converged to some value and this does not answer my problem.

¿Fue útil?

Solución

As mentioned in my comment, I don't think it is possible to get KM-estimates for individual patients. The KM-estimator gives the observed probability of survival at a certain timepoint on a population level. The observed survival probability for an individual, however, is either 0 (death) or 1 (alive), nothing in between.

Instead of observed survival probabilities you will have to use some sort of model (e.g. Cox PH, accelerated failure time model, neural network etc.) to get predicted survival probabilities. These probabilities inform you about the risk of an individual with that particular variable combination to be alive at a particular timepoint.

UPDATE: with example code based on code the OP provided here

library(pec) ; library(rms)

# Simulate data
set.seed(1)
examp.data <- SimSurv(3000)

# fit a Cox model with predictors X1+X2
coxmodel <- cph(Surv(time,status)~X1+X2, data=examp.data, surv=TRUE) 

# predicted survival probabilities can be extracted at selected time-points:
ttt <- quantile(examp.data$time)
ttt
#          0%          25%          50%          75%         100% 
#6.959458e-03 9.505409e+00 3.077284e+01 7.384565e+01 7.100556e+02 

# Get predicted survival probabilities at selected time-points:
preds <- predictSurvProb(coxmodel, newdata=examp.data, times=ttt)

# Store in original data
examp.data$predict.surv.prob.Q1 <- preds[,1] # pred. surv. prob. at  0.006959458
examp.data$predict.surv.prob.Q2 <- preds[,2] # pred. surv. prob. at  9.505409
examp.data$predict.surv.prob.Q3 <- preds[,3] # pred. surv. prob. at  30.77284
examp.data$predict.surv.prob.Q4 <- preds[,4] # pred. surv. prob. at  73.84565
examp.data$predict.surv.prob.Q5 <- preds[,5] # pred. surv. prob. at  710.0556

Now you have predictions of the survival probabilities at those 5 timepoints for each individual in your data. Of course, you do need to evaluate the predictive performance of your model in terms of discrimination (e.g. with the function cindex in pec-package) and calibration (with calibration plot, see rms-package).

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top