Question

I am trying to write R code which acts as a "moving window", just with memory (state). I have figured out (thanks to this question) how to apply a function to subsequent tuples of elements. For example, if I wish to write a (simple) moving average with a typical period 4, I would do the following:

mapply(myfunc, x[1:(length(x)-4)], x[2:(length(x)-3)], x[3:(length(x)-2)], x[4:(length(x)-1)])

where myfunc is a function with 4 arguments that calculates their mean (I cannot use mean directly, as it expects a single vector argument, and I don't know how to combine the 4 arguments into one vector). That's quite cumbersome, though, and if the period is 100, say, I am not sure how to do it.

So here's my first question: how do I generalize this?
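One way to generalize this, sketched here as an editor's note (roll_apply is a hypothetical helper, not from the original post), is to loop over the window's start indices and hand each window to the function as a single vector, so a plain vector function like mean works and the period can be 4 or 100 alike:

```r
# Hedged sketch: apply f to every length-k window of x,
# passing the whole window as one vector
roll_apply <- function(x, k, f) {
  sapply(seq_len(length(x) - k + 1), function(i) f(x[i:(i + k - 1)]))
}

roll_apply(1:6, 4, mean)
# returns c(2.5, 3.5, 4.5)
```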

But here's another issue: suppose I wish the applied function to be able to save state. A simple example would be to keep record of how many values it was applied on so far. Another example is the exponential moving average (EMA), which is not really a window function, but instead a function which works on single values but which keeps state (the last resulted mean).

How can I write a function which when applied to a vector, works on its values one by one, returning a vector of the same length, which is able to retain its last output every time, or save any other "state" during its calculations? In Python, for example, I'd use classes for that, but that's quite difficult in R.
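The usual R idiom for this, sketched here as an editor's aside (make_counter and make_ema are illustrative names, not from the original post), is a function factory: the returned function reads and updates variables in its enclosing environment with `<<-`, which plays the role a class would in Python:

```r
# Hedged sketch: stateful functions as closures
make_counter <- function() {
  n <- 0
  function() {
    n <<- n + 1  # update state in the enclosing environment
    n
  }
}

make_ema <- function(alpha) {
  last <- NULL  # state: the previous smoothed value
  function(x) {
    last <<- if (is.null(last)) x else alpha * x + (1 - alpha) * last
    last
  }
}

counter <- make_counter()
counter(); counter()       # 1, then 2
ema <- make_ema(0.5)
sapply(c(1, 2, 3), ema)    # c(1, 1.5, 2.25)
```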

Important note: I am not interested in auxiliary R packages like zoo or TTR to do the work for me. I am trying to learn R, and in any case the functions I wish to write, while having similarities with MA or EMA, are custom, and do not exist in any of these packages.


Solution

Regarding your first question,

n <- length(x)
k <- 4
r <- embed(x, n-k)[1:k, seq(n-k, 1)]
do.call("mapply", c("myfunc", split(r, 1:k)))
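To see what those four lines do, here is a small worked example; the commentary is the editor's reading of the answer, with a concrete myfunc supplied for illustration:

```r
myfunc <- function(a, b, c, d) mean(c(a, b, c, d))
x <- 1:6
n <- length(x)
k <- 4
# embed(x, n - k) has k + 1 rows; after reversing the columns with
# seq(n - k, 1), row i holds x[i], x[i + 1], ..., x[i + (n - k) - 1]
r <- embed(x, n - k)[1:k, seq(n - k, 1)]
# split(r, 1:k) recycles 1:k down the rows, returning the k rows as k
# vectors; mapply walks them in lockstep, so call j receives
# x[j], x[j + 1], ..., x[j + k - 1]
do.call("mapply", c("myfunc", split(r, 1:k)))
# the means of windows 1:4 and 2:5, i.e. c(2.5, 3.5)
```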

Regarding the second question, Reduce can be used to iterate over a vector while carrying state from one element to the next.
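For example, an EMA can be written with Reduce, where the accumulator is exactly the retained state (alpha = 0.5 is an arbitrary illustrative choice, not from the original answer):

```r
alpha <- 0.5
x <- c(1, 2, 3, 4)
# accumulate = TRUE returns every intermediate accumulator value, so the
# result has the same length as x; the first element seeds the state
Reduce(function(prev, cur) alpha * cur + (1 - alpha) * prev, x,
       accumulate = TRUE)
# c(1, 1.5, 2.25, 3.125)
```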

OTHER TIPS

For things like this you should consider using a plain for loop:

x <- runif(10000)
k <- 100
n <- length(x)
res <- numeric(n - k)

library(microbenchmark)
microbenchmark(times=5,
  for(i in 1:(n - k)) res[i] <- sum(x[i:(i + k - 1)]),
  {
    r <- embed(x, n-k)[1:k, seq(n-k, 1)]
    gg <- do.call("mapply", c("sum", split(r, 1:k)))    
  },
  flt <- stats::filter(x, rep(1, k))
)

Produces:

Unit: milliseconds
                    min        lq    median        uq       max neval
for            163.5403  164.4929  165.2543  166.6315  167.0608     5
embed/mapply  1255.2833 1307.3708 1338.2748 1341.5719 1405.1210     5
filter           6.7101    6.7971    6.8073    6.8161    6.8991     5

Now, the results are not all identical and I don't pretend to understand exactly what G. Grothendieck is doing with embed, but generally speaking for loops are just as fast as *apply functions as long as you initialize your result vector first. Windowed calculations don't lend themselves well to vectorization, so you might as well use a for loop.

EDIT: as several have pointed out in the comments, there is an internally implemented function, filter, that does this and is quite a bit faster, so it seems to be the best option (though you should confirm it actually does what you want, as again the results are not exactly identical and I am not personally familiar with the function; in its default configuration it appears to compute a rolling weighted sum, or a plain rolling sum if the weights are all 1, over a centered window).
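For reference, a short sketch of stats::filter's alignment behaviour (the sides argument is documented in ?filter: sides = 2, the default, centres the window, while sides = 1 uses only the current and past values):

```r
x <- 1:6
k <- 4
# sides = 1: y[i] = x[i] + x[i - 1] + ... + x[i - k + 1], i.e. a
# trailing rolling sum; the first k - 1 positions have no full window
as.numeric(stats::filter(x, rep(1, k), sides = 1))
# NA NA NA 10 14 18
```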

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow