Question

I am trying to extract the data between occurrences of a pattern. That is, once the pattern occurs, I want to subset all rows until the pattern occurs again. I then need to give each subset a number so that it is identifiable.

Using R.

Example data:

DF <- structure(list(
  date.time = structure(c(1374910680, 1374911040, 1374911160, 1374911580,
                          1374913380, 1374913500, 1374913620, 1374913740,
                          1374914160, 1374914400, 1374914520, 1374914940,
                          1374915000, 1374915120, 1374915240),
                        class = c("POSIXct", "POSIXt"), tzone = ""),
  aerial = structure(c(2L, 2L, 8L, 8L, 2L, 2L, 2L, 8L, 8L, 8L, 2L, 2L, 8L, 2L, 2L),
                     .Label = c("0", "1", "10", "11", "2", "3", "4", "5",
                                "6", "7", "8", "9", "m"),
                     class = "factor")),
  .Names = c("date.time", "aerial"),
  row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L),
  class = "data.frame")

Example pattern: two consecutive 1s (1, 1) in DF$aerial.

From the above, I want to extract the rows between occurrences of the pattern and label each group with the number of the occurrence it follows (the first occurrence, the second occurrence, and so on).

Desired output:

         date.time       aerial    occurrence
3  2013-07-27 08:46:00      5          1
4  2013-07-27 08:53:00      5          1
8  2013-07-27 09:29:00      5          2
9  2013-07-27 09:36:00      5          2
10 2013-07-27 09:40:00      5          2
13 2013-07-27 09:50:00      5          3

I can identify the pattern:

library(zoo)

pat <- c(1,1)

x <- rollapply(DF$aerial, length(pat), FUN=function(x) all(x == pat))

DF[which(x),]

and I can create an is.between function:

is.between <- function(x, a, b) {
  x > a & x < b
}

However, after this I get stuck.

Note: the data between the patterns may not always be aerial 5; that value is used only to simplify the example.

Help and pointers greatly appreciated!


Solution

It seems to be sufficient to exclude every run of 1s that is at least 2 long, so try this:

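# rle() is in base R, so no extra package is needed
a <- as.numeric(as.character(DF$aerial))    # factor labels -> numbers
r <- rle(a)                                 # run-length encoding of aerial
cond <- with(r, values != 1 | lengths < 2)  # TRUE for runs that are not the pattern
ok <- rep(cond, r$lengths)                  # expand to one flag per row
occur <- rep(cumsum(cond), r$lengths)       # running count of kept runs
cbind(DF, occur)[ok, ]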

which gives:

             date.time aerial occur
3  2013-07-27 03:46:00      5     1
4  2013-07-27 03:53:00      5     1
8  2013-07-27 04:29:00      5     2
9  2013-07-27 04:36:00      5     2
10 2013-07-27 04:40:00      5     2
13 2013-07-27 04:50:00      5     3

REVISION: Added occur column
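
One caveat, given the question's note that the values between patterns may vary: occur counts kept runs, so a gap holding several different runs (say 5, 1, 3) would receive several numbers. Below is a minimal sketch of a possible variant that counts pattern runs instead. The helper name label_between_runs and its arguments value and min_run are hypothetical, not part of the original answer, and the toy vector is made up for illustration.

```r
# Hypothetical helper (not from the original answer): flag rows that lie
# outside pattern runs and number them by how many pattern runs precede them.
label_between_runs <- function(x, value = 1, min_run = 2) {
  r <- rle(x)
  # A "pattern run" is a run of `value` that is at least `min_run` long
  is_pat <- r$values == value & r$lengths >= min_run
  keep <- rep(!is_pat, r$lengths)           # one flag per row: not in a pattern run
  occur <- rep(cumsum(is_pat), r$lengths)   # pattern runs seen so far, per row
  data.frame(keep = keep, occur = occur)
}

# Toy vector: the second gap mixes values and contains a lone 1
a <- c(1, 1, 5, 5, 1, 1, 1, 5, 1, 3, 1, 1, 7)
lab <- label_between_runs(a)
cbind(a, occur = lab$occur)[lab$keep, ]
```

With the question's data, the same call would be label_between_runs(as.numeric(as.character(DF$aerial))). Rows before the first pattern run get occur 0, and rows after the last pattern run are numbered even though no later pattern closes them, so they may need filtering depending on the intent.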

Licensed under: CC-BY-SA with attribution