Question

I have a data frame which looks like

z<-data.frame(a=c(seq(1990,1995,1), 1997,1998,1999,2001,2002,2003), b=seq(90,101,1))

I use the function (rollapply from the zoo package)

rollapply(z$b, 3, sd, align='right')

to calculate a rolling standard deviation over a window of three rows.

I want the calculation to stop whenever there is a gap between consecutive years and start again after the gap.

EDIT:

My sample output should look like this:

       a    b  c
 1  1990   90  NA
 2  1991   91  NA
 3  1992   92  sd(90,91,92)
 4  1993   93  sd(93,92,91)
 5  1994   94  sd(94,93,92)
 6  1995   95  sd(95,94,93)
 7  1997   96  NA
 8  1998   97  NA
 9  1999   98  sd(98,97,96)
10  2001   99  NA
11  2002  100  NA
12  2003  101  sd(101,100,99)

Solution

I think this does what you want:

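library(zoo)  # provides rollapply
my.roll <- function(x) rollapply(x, 3, sd, align='right', fill=NA, na.rm=TRUE)
# Label each block of consecutive years, then roll within each block
z$sd <- ave(z$b, c(0, cumsum(diff(z$a) - 1)), FUN=my.roll)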

Produces:

      a   b sd
1  1990  90 NA
2  1991  91 NA
3  1992  92  1
4  1993  93  1
5  1994  94  1
6  1995  95  1
7  1997  96 NA
8  1998  97 NA
9  1999  98  1
10 2001  99 NA
11 2002 100 NA
12 2003 101  1

Note how the first two entries of each block (at the start of the data and after each gap) are NA, because a window of three needs at least three values.

Basically, we use cumsum and diff to label each block of contiguous years, and then use ave to apply the rolling sd within each block. Note that this will break if a year is repeated (e.g. 1997 shows up two or more times) or if your data isn't sorted by year.
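
To make the grouping step concrete, here is a small base R sketch that builds the block labels on their own, using the years from the question. The names step and block are only for illustration and are not part of the answer's code.

# Years from the question; 1996 and 2000 are missing.
a <- c(1990:1995, 1997:1999, 2001:2003)

# diff(a) is 1 between consecutive years and larger across a gap,
# so diff(a) - 1 is 0 inside a block and positive at each gap.
step <- diff(a) - 1

# The running total goes up at every gap, giving one label per block.
# The leading 0 labels the first row, which diff() drops.
block <- c(0, cumsum(step))

print(block)
# [1] 0 0 0 0 0 0 1 1 1 2 2 2

# The years in each block, to check the labels.
print(split(a, block))

This vector is what ave receives as its grouping argument, so my.roll runs separately on rows 1-6, 7-9 and 10-12.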

OTHER TIPS

Convert the data frame to a zoo object, z, and merge it with a grid, g, of all years, including the ones not found in z. Apply rollapplyr to g and extract the original times:

library(zoo)

z <- read.zoo(DF, FUN = identity)
g <- merge(z, zoo(, start(z):end(z)))
r <- rollapplyr(g, 3, sd, fill = NA)[I(time(z))]

giving:

> r
1990 1991 1992 1993 1994 1995 1997 1998 1999 2001 2002 2003 
  NA   NA    1    1    1    1   NA   NA    1   NA   NA    1 

r is a zoo object: time(r) gives the times and coredata(r) gives the data.
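
To see why padding with the missing years produces NA right after each gap, here is a rough base R sketch of the same idea without zoo. It is only an illustration under the assumption that the data is sorted with unique years; the names grid, windows, roll_sd and res are made up for this example.

# Same data as DF in this answer, rebuilt so the sketch is self-contained.
DF <- data.frame(V1 = c(1990:1995, 1997:1999, 2001:2003), V2 = 90:101)

# Full grid of years; years missing from DF get NA in V2.
grid <- data.frame(V1 = min(DF$V1):max(DF$V1))
g <- merge(grid, DF, all.x = TRUE)

# Each row of embed(x, 3) holds one window of 3 consecutive grid years.
windows <- embed(g$V2, 3)

# sd() returns NA for any window containing an NA,
# i.e. for any window that reaches across a missing year.
roll_sd <- c(NA, NA, apply(windows, 1, sd))

# Keep only the years that were in the original data.
res <- roll_sd[g$V1 %in% DF$V1]
print(res)
# [1] NA NA  1  1  1  1 NA NA  1 NA NA  1

The result matches r above: the padded rows for 1996 and 2000 poison every window that touches them, which is what resets the calculation after a gap.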

Note: The input DF used above is:

DF <- structure(list(V1 = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 
  1997L, 1998L, 1999L, 2001L, 2002L, 2003L), V2 = 90:101), .Names = c("V1", 
  "V2"), class = "data.frame", row.names = c(NA, -12L))
Licensed under: CC-BY-SA with attribution