My original post, which is currently unanswered and somewhat open-ended, can be found here.
I have been trying to work out how to do this, so below I describe some sample data, how far I have gotten, and what my current issue is.
A brief sample of my data looks like this:
zipcode xcoord ycoord age_age6574 age_age75plus sex_female stage_late death_death access TruncTime
51062 211253.4259 4733174.483 0 1 0 0 1 40 121
51011 212255.621 4757938.874 0 1 0 0 0 43 121
51109 215303.4471 4721047.303 0 1 1 1 0 21 121
This data has been preprocessed so that dummy/binary variables stand in for the actual categories: age_age6574 and age_age75plus make up one category, sex_female another, stage_late another, and death_death another. The access column is a continuous variable. TruncTime will be treated as a discrete time variable. The variable that will be used as the censoring variable is death_death.
Calling the functions:
So, to import everything I do the following:
from rpy2 import robjects
from rpy2.robjects.packages import importr
from rpy2.robjects.vectors import DataFrame
survival = importr('survival')
coxph = survival.coxph
Surv = survival.Surv
theData = DataFrame.from_csvfile(newDataFile, header=True, sep=',')
So everything is all set up to do Cox Proportional Hazards - I think!
Within R I can do:
coxph(formula = Surv(TruncTime, death_death) ~ age_age6574 +
age_age75plus + sex_female + stage_late + access, method = "breslow")
and everything works out fine.
When I try the same thing within Python, using everything described above, I get an error.
This is the function call:
coxph(Surv('TruncTime', 'death_death'), 'age_age6574'+'age_age75plus'+'sex_female'+'stage_late'+'access', data = theData, method = 'breslow')
This is the error returned:
Error in (function (time, time2, event, type = c("right", "left", "interval", :
Time variable is not numeric
So, I'm wondering what I'm doing wrong in the function call (why it is returning this error) and how I go about properly calling it?
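One thing worth checking about the call above, offered as an observation rather than a confirmed diagnosis: Python evaluates the arguments before rpy2 hands anything to R. The + between quoted names is ordinary Python string concatenation, and 'TruncTime' reaches Surv as a character string rather than a numeric column, which would be consistent with the "Time variable is not numeric" message. The plain-Python sketch below (no rpy2 needed) shows what the arguments become.

```python
# Plain Python, no rpy2 required: what the arguments of the failing call evaluate to.
covariates = 'age_age6574' + 'age_age75plus' + 'sex_female' + 'stage_late' + 'access'

# Python has already joined the names into one string before R sees anything.
print(covariates)

# The first argument to Surv is likewise just text, not a numeric vector.
time_argument = 'TruncTime'
print(type(time_argument).__name__)
```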
Additionally, I'm also wondering whether there is a way to change the censoring from a '0' to a '1' (that is, reverse the way the censoring variable works).
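For the censoring question, one possible approach (a sketch, not a verified rpy2 recipe) is to flip the 0/1 indicator before it reaches Surv. The helper name flip_indicator is made up for illustration, and the values are the death_death entries from the three sample rows above.

```python
def flip_indicator(values):
    """Return a new list with every 0 turned into 1 and every 1 into 0."""
    flipped = []
    for v in values:
        if v not in (0, 1):
            raise ValueError("expected a 0/1 indicator, got %r" % (v,))
        flipped.append(1 - v)
    return flipped


death_death = [1, 0, 0]  # death_death values from the three sample rows
reversed_indicator = flip_indicator(death_death)
print(reversed_indicator)
```

On the R side, the same effect should come from writing 1 - death_death as the event argument of Surv.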
::::UPDATE::::
So I found out part of my problem was specifying which columns/attributes to use. Apparently, rpy2 needs numerical indices of columns to call the functions. So:
Surv(theData[9], theData[7])
for the survival part of coxph. The Surv part works.
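To avoid counting columns by hand, the positions can be looked up from the header. This is a plain-Python sketch using the header row from the sample data; it assumes the zero-based positions that theData[9] and theData[7] imply.

```python
# Header row copied from the sample data.
header = ("zipcode xcoord ycoord age_age6574 age_age75plus sex_female "
          "stage_late death_death access TruncTime").split()

# Map each column name to its zero-based position.
position = {name: i for i, name in enumerate(header)}

print(position['TruncTime'], position['death_death'])

# Positions of the covariates used in the model.
covariate_names = ['age_age6574', 'age_age75plus', 'sex_female', 'stage_late', 'access']
covariate_positions = [position[name] for name in covariate_names]
print(covariate_positions)
```

rpy2 vectors also expose name-based lookup through rx2 (for example theData.rx2('TruncTime')), which may be worth trying instead of numeric positions.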
Now, I'm still trying to work out how to specify everything else. Namely:
- How do I specify the other variables to use in building the model? There is a problem with using + to link these variables together. Also, ~ doesn't work as it does in regular R. The following DOES NOT work:
coxph(Surv(theData[9], theData[7])~theData[3]+theData[4]+theData[5]+theData[6]+theData[8], data = theData, method = 'breslow')
I also tried replacing the '~' with a ',', such as:
coxph(Surv(theData[9], theData[7]), theData[3]+theData[4]+theData[5]+theData[6]+theData[8], data = theData, method = 'breslow')
- It is definitely having a problem with those +, and I'm not sure replacing the ~ with a , has actually worked.
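A possible direction for the formula problem, sketched under assumptions: rpy2 provides robjects.Formula, which takes the whole R formula as one string, so + and ~ stay R syntax rather than Python operators. The helper build_formula is hypothetical, and the rpy2 part is left commented out because it needs R with the survival package and is untested against this data.

```python
def build_formula(time, event, covariates):
    """Assemble an R model formula as a plain string (hypothetical helper)."""
    return "Surv(%s, %s) ~ %s" % (time, event, " + ".join(covariates))


formula_text = build_formula(
    "TruncTime",
    "death_death",
    ["age_age6574", "age_age75plus", "sex_female", "stage_late", "access"],
)
print(formula_text)

# Untested sketch of the rpy2 side (requires R with the survival package;
# theData is the DataFrame loaded earlier in this post):
# from rpy2 import robjects
# from rpy2.robjects.packages import importr
# survival = importr('survival')
# fit = survival.coxph(robjects.Formula(formula_text), data=theData, method='breslow')
```

Because the column names live inside the formula string, R would resolve them against the data argument, so no numeric indices would be needed in this variant.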