Question

The question I am posting here is closely linked to another question I posted two days ago about Gompertz aging analysis.

I am trying to construct a survival object, see ?Surv, in R. This will hopefully be used to perform Gompertz analysis to produce an output of two values (see original question for further details).

I have survival data from an experiment in flies which examines rates of aging in various genotypes. The data is available to me in several layouts so the choice of which is up to you, whichever suits the answer best.

One dataframe (wide.df) looks like this, where each genotype (Exp, of which there are ~640) has a row, and the days run in sequence horizontally from day 4 to day 98, with counts of new deaths every two days.

Exp      Day4   Day6    Day8    Day10   Day12   Day14    ...
A        0      0       0       2       3       1        ...

I make the example using this:

wide.df2<-data.frame("A",0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2)
colnames(wide.df2)<-c("Exp","Day4","Day6","Day8","Day10","Day12","Day14","Day16","Day18","Day20","Day22","Day24","Day26","Day28","Day30","Day32","Day34","Day36")

Another version is like this, where each day has a row for each 'Exp' and the number of deaths on that day are recorded.

Exp     Deaths  Day     
A       0       4    
A       0       6
A       0       8
A       2       10
A       3       12
..      ..      ..

To make this example:

df2 <- data.frame(Exp = rep("A", 17),
                  Deaths = c(0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2),
                  Day = seq(4, 36, by = 2))

Each genotype has approximately 50 flies in it. What I need help with now is how to go from one of the above dataframes to a working survival object. What does this object look like? And how do I get from the above to the survival object smoothly?


Solution

After noting that the Deaths column totals 55 and that you said each genotype had approximately 50 flies, I decided the likely assumption was that this was a completely observed process, i.e. every fly was followed until it died and none were censored. So you need to replicate each row once per death recorded on that day, so that there is one row for each death, and assign an event marker of 1. The "long" format is clearly the preferred format. You can then create a Surv object from 'Day' and 'event':

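library(survival)  # provides Surv()
?Surv
# Repeat each row once per death on that day; rows with 0 deaths drop out
df3 <- df2[rep(rownames(df2), df2$Deaths), ]
str(df3)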
#---------------------
'data.frame':   55 obs. of  3 variables:
 $ Exp   : Factor w/ 1 level "A": 1 1 1 1 1 1 1 1 1 1 ...
 $ Deaths: num  2 2 3 3 3 1 3 3 3 4 ...
 $ Day   : num  10 10 12 12 12 14 16 16 16 18 ...
#----------------------
df3$event <- 1  # every death was observed, so status is 1 for all rows
str(with(df3, Surv(Day, event) ) )
#------------------
 Surv [1:55, 1:2] 10  10  12  12  12  14  16  16  16  18  ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "time" "status"
 - attr(*, "type")= chr "right"
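
If you would rather start from the wide layout, the same Surv object can be reached by first stacking the day columns into long form. The following is only a minimal sketch for the single-genotype example from the question; the names `day_cols`, `long.df`, `expanded` and `s` are made up for illustration, and it again assumes that every death was observed (no censoring).

```r
library(survival)

# Wide example from the question: one row per genotype, one column per day
wide.df2 <- data.frame("A",0,0,0,2,3,1,3,4,5,3,4,7,8,2,10,1,2)
colnames(wide.df2) <- c("Exp", paste0("Day", seq(4, 36, by = 2)))

# Stack the DayNN columns into long form (one row per genotype and day)
day_cols <- grep("^Day", names(wide.df2), value = TRUE)
long.df <- data.frame(
  Exp    = rep(wide.df2$Exp, times = length(day_cols)),
  Day    = rep(as.numeric(sub("^Day", "", day_cols)), each = nrow(wide.df2)),
  Deaths = unlist(wide.df2[day_cols], use.names = FALSE)
)

# One row per death; days with zero deaths contribute no rows
expanded <- long.df[rep(seq_len(nrow(long.df)), long.df$Deaths), ]
expanded$event <- 1  # assumption: all deaths observed, nothing censored

s <- with(expanded, Surv(Day, event))
str(s)
```

Because `unlist()` walks the day columns one after another, `Exp` is repeated as a whole vector and `Day` is repeated `each = nrow(wide.df2)` times, so the same code should carry over to a data frame with many genotype rows.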

Note: If this were being done in the coxph function, the expansion to individual lines of data might not have been needed, since that function allows specification of case weights. (I'm guessing that the other regression functions in the survival package would not have needed this to be done either.) In the past Terry Therneau has expressed puzzlement that people are creating Surv objects outside the formula interface of coxph. The intended use of this Surv object was not described in sufficient detail to know whether a weighted analysis without expansion would be possible.
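
To illustrate what a weighted analysis without expansion could look like, here is a hedged sketch that uses a Kaplan-Meier fit from survfit() as a stand-in, since the intended analysis is not known. The death counts are passed as case weights and the result is compared with the fit on the expanded data. The names `obs`, `fit_w` and `fit_e` are made up for this example, and it still assumes no censoring.

```r
library(survival)

df2 <- data.frame(
  Exp    = rep("A", 17),
  Deaths = c(0, 0, 0, 2, 3, 1, 3, 4, 5, 3, 4, 7, 8, 2, 10, 1, 2),
  Day    = seq(4, 36, by = 2)
)

# Weighted version: one row per day, with Deaths used as the case weight.
# Days with zero deaths are dropped so that every weight is positive.
obs <- df2[df2$Deaths > 0, ]
obs$event <- 1
fit_w <- survfit(Surv(Day, event) ~ 1, data = obs, weights = Deaths)

# Expanded version: one row per individual death
expanded <- df2[rep(seq_len(nrow(df2)), df2$Deaths), ]
expanded$event <- 1
fit_e <- survfit(Surv(Day, event) ~ 1, data = expanded)

# Compare the two estimated survival curves
all.equal(as.numeric(fit_w$surv), as.numeric(fit_e$surv))
```

Whether the same shortcut is valid for a Gompertz fit depends on whether the fitting function you end up using accepts case weights, so check its documentation before relying on it.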

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow