Question

My original post, which is currently unanswered and somewhat open-ended, can be found here.

I have been trying to figure out how to work with this, so I will share some sample data, how far I have gotten, and what my current issue is.

So, my data, or rather a brief sample of my data, looks like this:

 zipcode  xcoord       ycoord       age_age6574  age_age75plus  sex_female  stage_late  death_death  access  TruncTime
 51062    211253.4259  4733174.483  0            1              0           0           1            40      121
 51011    212255.621   4757938.874  0            1              0           0           0            43      121
 51109    215303.4471  4721047.303  0            1              1           1           0            21      121

This data has been preprocessed so that dummy (binary) variables stand in for the actual categories: age_age6574 and age_age75plus together make up one category (age), sex_female another, stage_late another, and death_death another. access is a continuous variable. TruncTime will be treated as a discrete time variable, and death_death will be used as the censoring (event) variable.
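The import step that follows reads the file with sep=',', while the sample above is shown space-aligned. As a minimal sketch, assuming the file on disk looks like the sample (whitespace-separated, header on the first line), the standard library can convert it to comma-separated text; the helper name whitespace_to_csv is hypothetical:

```python
import csv
import io

# The sample rows as shown in the question (whitespace-aligned).
SAMPLE = """\
zipcode xcoord        ycoord    age_age6574 age_age75plus   sex_female  stage_late  death_death access  TruncTime
 51062  211253.4259 4733174.483     0           1               0             0         1           40      121
 51011  212255.621  4757938.874     0           1               0             0         0           43      121
 51109  215303.4471 4721047.303     0           1               1             1         0           21      121
"""


def whitespace_to_csv(text: str) -> str:
    """Convert whitespace-aligned rows into comma-separated text (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if fields:  # skip blank lines
            writer.writerow(fields)
    return out.getvalue()


csv_text = whitespace_to_csv(SAMPLE)
print(csv_text.splitlines()[0])
```

Writing csv_text to the path you pass as newDataFile should then let DataFrame.from_csvfile read it. Splitting on any run of whitespace assumes no field contains spaces, which holds for this sample.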

Calling the functions:

So, to import everything I do the following:

 from rpy2 import robjects
 from rpy2.robjects.packages import importr
 from rpy2.robjects.vectors import DataFrame
 # Load R's survival package and grab the functions used below
 survival = importr('survival')
 coxph = survival.coxph
 Surv = survival.Surv
 # newDataFile is the path to the comma-separated data file
 theData = DataFrame.from_csvfile(newDataFile, header=True, sep=',')

So everything is set up to fit a Cox proportional hazards model, I think!

Within R I can do:

 coxph(formula = Surv(TruncTime, death_death) ~ age_age6574 +
 age_age75plus + sex_female + stage_late + access, data = theData, method = "breslow")

and everything works out fine.

When I do the same thing from Python, using everything described above, I get an error.

This is the function call:

  coxph(Surv('TruncTime', 'death_death'), 'age_age6574'+'age_age75plus'+'sex_female'+'stage_late'+'access', data = theData, method = 'breslow')

This is the error returned:

  Error in (function (time, time2, event, type = c("right", "left", "interval",  : 
  Time variable is not numeric

So, I'm wondering what I'm doing wrong in the function call (why does it return this error?) and how I should call it properly. I am also wondering whether there is a way to change the censoring from '0' to '1', that is, to reverse the way the censoring variable works.
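On reversing the censoring: the survival package's Surv() also accepts a logical status (TRUE = event), so a formula such as Surv(TruncTime, death_death == 0) should treat 0 as the event without touching the data. If you would rather recode the column before loading it, here is a minimal standard-library sketch, assuming the column holds only 0 and 1 as in the sample; the function name reverse_event_coding is hypothetical:

```python
import csv
import io


def reverse_event_coding(csv_text: str, column: str = "death_death") -> str:
    """Return CSV text in which the 0/1 values of `column` are swapped.

    Assumes the column holds only 0 or 1 (as in the sample data).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        value = int(row[column])
        if value not in (0, 1):
            raise ValueError(f"{column} must be 0 or 1, got {value}")
        row[column] = str(1 - value)  # 1 -> 0 and 0 -> 1
        writer.writerow(row)
    return out.getvalue()


print(reverse_event_coding("TruncTime,death_death\n121,1\n121,0\n"))
```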

::::UPDATE::::

So I found out that part of my problem was how I specified which columns/attributes to use. Apparently, rpy2 needs the actual column vectors rather than the column names as strings, and I can get those by numerical index. So:

 Surv(theData[9], theData[7])

for the survival part of the coxph call. That Surv part works.
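Indexing from Python is zero-based, so theData[9] is the tenth column (TruncTime) and theData[7] is the eighth (death_death); in R the same columns would be theData[[10]] and theData[[8]]. A small sketch, using only the header from the sample above, that looks the positions up by name instead of counting by hand (column_positions is a hypothetical helper):

```python
# Header of the sample data, in file order.
HEADER = ("zipcode xcoord ycoord age_age6574 age_age75plus "
          "sex_female stage_late death_death access TruncTime").split()


def column_positions(names):
    """Map each column name to its zero-based position (Python indexing)."""
    return {name: i for i, name in enumerate(names)}


pos = column_positions(HEADER)
print(pos["TruncTime"], pos["death_death"])  # 9 7
```

Positions computed this way still depend on the column order of the file. rpy2 data frames also offer name-based access through rx2, e.g. theData.rx2('TruncTime'), which avoids positions altogether.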

Now, I'm still trying to work out how to specify everything else. Namely:

- How do I specify the other variables to use in building the model? There is a problem with using + to link these variables together, and ~ doesn't work as it does in regular R. The following DOES NOT work:

  coxph(Surv(theData[9], theData[7])~theData[3]+theData[4]+theData[5]+theData[6]+theData[8], data = theData, method = 'breslow')

I also tried replacing the '~' with a ',', like this:

  coxph(Surv(theData[9], theData[7]), theData[3]+theData[4]+theData[5]+theData[6]+theData[8], data = theData, method = 'breslow')

- It is definitely having a problem with those + signs, and I'm not sure that replacing the ~ with a , has actually worked.
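One likely reason these attempts fail before R is even involved: in Python, ~ is only a unary (bitwise-not) operator, so an R-style formula written as bare Python code is a syntax error, and any + between the column objects is evaluated by Python itself rather than passed to R as model terms. The sketch below uses only the standard ast module and a hypothetical helper, parses_as_python, to check this; the same text is valid once it is a string, which is what R's formula parser needs.

```python
import ast


def parses_as_python(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# R-style formula written directly in Python: '~' cannot be used as a binary operator.
print(parses_as_python("Surv(t, e) ~ x1 + x2"))    # False
# The same text inside a string literal is fine; R can parse it later.
print(parses_as_python('"Surv(t, e) ~ x1 + x2"'))  # True
```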


Solution

It is better to use an R formula object (rpy2.robjects.Formula).

Would this work?

from rpy2.robjects import Formula

# Pass the whole model as one R formula; R resolves the names in `data`
coxph(Formula("Surv(TruncTime, death_death) ~ "
              "age_age6574 + age_age75plus + sex_female + stage_late + access"),
      data = theData, method = 'breslow')
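If the list of covariates changes often, the formula string can also be assembled in Python before being wrapped in Formula. A minimal sketch with a hypothetical helper, build_cox_formula; it pastes names in verbatim, so it assumes they are valid R column names:

```python
def build_cox_formula(time, event, covariates):
    """Assemble an R survival formula string (hypothetical helper).

    Names are inserted verbatim, so they must be valid R expressions.
    """
    if not covariates:
        raise ValueError("need at least one covariate")
    return f"Surv({time}, {event}) ~ " + " + ".join(covariates)


COVARIATES = ["age_age6574", "age_age75plus", "sex_female", "stage_late", "access"]
formula_text = build_cox_formula("TruncTime", "death_death", COVARIATES)
print(formula_text)
```

The result would then be used as coxph(Formula(formula_text), data = theData, method = 'breslow'). Passing event="death_death == 0" instead would build the reversed-censoring formula, relying on Surv accepting a logical status.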
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow