Question
This is really a follow-on from another question I posted here a few weeks ago and got an answer for.
In my initial question I wanted to find the number of days between runoff events in a dataset, as shown in the data sample below:
Date Runoff No_Days
01/01/1980 0 4
02/01/1980 0 3
03/01/1980 0 2
04/01/1980 0 1
05/01/1980 4.5 0
06/01/1980 2 0
07/01/1980 0 6
08/01/1980 0 5
09/01/1980 0 4
10/01/1980 0 3
11/01/1980 0 2
12/01/1980 0 1
13/01/1980 1.2 0
14/01/1980 0 4
15/01/1980 0 3
16/01/1980 0 2
17/01/1980 0 1
18/01/1980 0.8 0
I managed to get this result using the following code:
DF$No_Days <- unlist(lapply(rle(DF$Runoff > 0.05)$lengths, function(x) rev(seq(x:1))))
DF$No_Days <- ifelse(DF$Runoff > 0.05, 0, DF$No_Days)
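To see why this works, here is a minimal sketch on just the Runoff values from the single-series table (typed in by hand; the names runoff, event, r and no_days are illustrative only). rle() splits the event flags into runs, each run of length x becomes the countdown x, x-1, ..., 1, and the runoff days are then reset to 0.

```r
# Runoff values from the single-series sample above (one value per day)
runoff <- c(0, 0, 0, 0, 4.5, 2, 0, 0, 0, 0, 0, 0, 1.2, 0, 0, 0, 0, 0.8)

event <- runoff > 0.05   # TRUE on runoff days
r <- rle(event)          # runs of consecutive FALSE / TRUE values
print(r$lengths)         # 4 2 6 1 4 1
print(r$values)          # FALSE TRUE FALSE TRUE FALSE TRUE

# Each run of length x becomes x, x-1, ..., 1 (x:1 equals rev(seq(x:1)) here)
no_days <- unlist(lapply(r$lengths, function(x) x:1))

# Runoff days themselves are set to 0, as the ifelse() step does
no_days[event] <- 0
print(no_days)           # 4 3 2 1 0 0 6 5 4 3 2 1 0 4 3 2 1 0
```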
This all works well for a single dataset, i.e. one time series for one group. What I am struggling with now, however, is how to adapt the above code to do the same thing for a number of time series in the same data.table, according to a grouping variable (Soil), to get, for example:
Date Runoff No_Days Soil
01/01/1980 0 4 Clay
02/01/1980 0 3 Clay
03/01/1980 0 2 Clay
04/01/1980 0 1 Clay
05/01/1980 4.5 0 Clay
06/01/1980 2 0 Clay
07/01/1980 0 6 Clay
08/01/1980 0 5 Clay
09/01/1980 0 4 Clay
10/01/1980 0 3 Clay
11/01/1980 0 2 Clay
12/01/1980 0 1 Clay
13/01/1980 1.2 0 Clay
14/01/1980 0 4 Clay
15/01/1980 0 3 Clay
16/01/1980 0 2 Clay
17/01/1980 0 1 Clay
18/01/1980 0.8 0 Clay
01/01/1980 0 5 Sand
02/01/1980 0 4 Sand
03/01/1980 0 3 Sand
04/01/1980 0 2 Sand
05/01/1980 0 1 Sand
06/01/1980 2 0 Sand
07/01/1980 0 11 Sand
08/01/1980 0 10 Sand
09/01/1980 0 9 Sand
10/01/1980 0 8 Sand
11/01/1980 0 7 Sand
12/01/1980 0 6 Sand
13/01/1980 0 5 Sand
14/01/1980 0 4 Sand
15/01/1980 0 3 Sand
16/01/1980 0 2 Sand
17/01/1980 0 1 Sand
18/01/1980 0.8 0 Sand
Currently, if I run the code, it does not distinguish between the different soil types and therefore does not 'restart' the sequence at the start of each time series.
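A minimal sketch of when this bites, using the ungrouped code from the question on a made-up two-group data set (soils "A" and "B" are hypothetical). In the sample above each series happens to end on a runoff day, so the runs break at the group boundary anyway; the problem appears when a series ends on dry days, because rle() then merges them with the next series' leading dry days.

```r
# Hypothetical two-group data: series A ends on dry days, series B starts dry
df <- data.frame(
  Soil   = rep(c("A", "B"), each = 4),
  Runoff = c(1.0, 0, 0, 0,   0, 0, 2.0, 0)
)

# Ungrouped code from the question: rle() sees one long series
df$No_Days <- unlist(lapply(rle(df$Runoff > 0.05)$lengths, function(x) rev(seq(x:1))))
df$No_Days <- ifelse(df$Runoff > 0.05, 0, df$No_Days)
print(df)
# Rows 2-4 (series A) get 5 4 3: the countdown runs on into series B's dry days,
# whereas counting within series A alone would give 3 2 1
```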
From reading around, it seems that I may need to replace lapply() in the original code with by(). I think this will work as long as rle() is first grouped according to Soil, but I can't find any way of doing this.
So any help would be much appreciated!
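One way to make the "group first, then run rle()" idea work in base R is split() plus unsplit() rather than by(). A minimal sketch, assuming each series is already in date order; the helper countdown_one() and the small two-soil data set are hypothetical.

```r
# Hypothetical helper: days until the next runoff event for ONE series
countdown_one <- function(runoff, threshold = 0.05) {
  event <- runoff > threshold
  out <- unlist(lapply(rle(event)$lengths, function(x) x:1))
  out[event] <- 0   # runoff days themselves get 0
  out
}

# Small made-up data set with two soils
DF <- data.frame(
  Soil   = rep(c("Clay", "Sand"), each = 5),
  Runoff = c(0, 0, 4.5, 0, 0.8,   0, 0, 0, 2, 0)
)

# Split Runoff by Soil, run rle() inside each group, then put results back by row
DF$No_Days <- unsplit(lapply(split(DF$Runoff, DF$Soil), countdown_one), DF$Soil)
print(DF)
```

by(DF$Runoff, DF$Soil, countdown_one) would compute the same per-group vectors, but it returns them in group order, so unlisting it is only safe when the rows are already sorted by Soil; unsplit() writes each value back to its original row.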
Solution
If you use the data.table package, this is very easy:
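install.packages("data.table")
library(data.table)
DF = data.table(DF)
# Count down the days to the next runoff event separately within each Soil group
DF[, No_Days := unlist(lapply(rle(Runoff > 0.05)$lengths, function(x) x:1)), by = Soil]
# Days with runoff above the threshold are the events themselves: set them to 0
DF[Runoff > 0.05, No_Days := 0L]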
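A sketch that runs this approach on the question's Clay and Sand sample and checks the result, assuming data.table is installed. The data are typed in by hand (Date column omitted) and are assumed to be in date order within each Soil group; runoff days (Runoff > 0.05) are reset to 0, as the question's ifelse() does.

```r
library(data.table)

# Hypothetical reconstruction of the question's sample (Date column omitted)
DF <- data.table(
  Soil   = rep(c("Clay", "Sand"), each = 18),
  Runoff = c(rep(0, 4), 4.5, 2, rep(0, 6), 1.2, rep(0, 4), 0.8,   # Clay
             rep(0, 5), 2, rep(0, 11), 0.8)                       # Sand
)

# rle() is evaluated once per Soil group, so the countdown restarts per series
DF[, No_Days := unlist(lapply(rle(Runoff > 0.05)$lengths, function(x) x:1)), by = Soil]

# Runoff days (above the 0.05 threshold) are the events themselves
DF[Runoff > 0.05, No_Days := 0L]

print(DF[Soil == "Sand"])
```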
Other tips
If you are interested in doing this in base R as well, you can use ave() to get the same result. For convenience, I'll first define a helper function:
countdown <- function(events) {
  # For each run of equal values: zeros for an event run, otherwise l, l-1, ..., 1
  unlist(with(rle(events),
              Map(function(v, l) {
                if (v) rep.int(0, l)
                else l:1
              }, values, lengths)
  ))
}
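To see what the Map() step inside countdown() produces, here is a small sketch on a made-up logical vector (events and its values are illustrative only): each run becomes one list element, and unlist() joins them back into one value per day.

```r
# Made-up event flags: TRUE marks a runoff day
events <- c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
runs <- rle(events)   # lengths 2 1 3 1, values FALSE TRUE FALSE TRUE

# Same Map() call as inside countdown(): one piece per run
pieces <- Map(function(v, l) if (v) rep.int(0, l) else l:1,
              runs$values, runs$lengths)
str(pieces)

# unlist() glues the pieces back into one vector, one value per day
print(unlist(pieces))  # 2 1 0 3 2 1 0
```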
Then, ignoring soil type, you would get the answer with
DF <- transform(DF, No_Days=countdown(Runoff>0.05))
and to group by soil type, you could do
DF <- transform(DF, No_Days=ave(Runoff>0.05, Soil, FUN=countdown))
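A self-contained sketch of the ave() version on the question's Clay and Sand sample, with the Runoff values typed in by hand (Date column omitted, rows assumed in date order within each soil):

```r
countdown <- function(events) {
  unlist(with(rle(events),
              Map(function(v, l) if (v) rep.int(0, l) else l:1, values, lengths)))
}

# Hypothetical reconstruction of the question's Clay/Sand sample
DF <- data.frame(
  Soil   = rep(c("Clay", "Sand"), each = 18),
  Runoff = c(rep(0, 4), 4.5, 2, rep(0, 6), 1.2, rep(0, 4), 0.8,   # Clay
             rep(0, 5), 2, rep(0, 11), 0.8)                       # Sand
)

# ave() splits the logical vector by Soil, runs countdown() on each piece
# and writes each result back into its original rows
DF <- transform(DF, No_Days = ave(Runoff > 0.05, Soil, FUN = countdown))
print(DF[DF$Soil == "Clay", ])
```

Because ave() restores the original row positions, the rows do not need to be sorted by Soil, only in date order within each soil.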