Question

Suppose I need to apply an MA(5) filter to a batch of market data stored in an xts object. I can easily pull out the subset of data I want smoothed with xts subsetting:

x['2013-12-05 17:00:01/2013-12-06 17:00:00']

However, I need an additional 5 observations prior to the first one in my subset to "prime" the filter. Is there an easy way to do this?
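To make "priming" concrete: a one-sided 5-term filter returns NA for its first four outputs, so a window needs extra earlier observations (strictly, k - 1 = 4 of them for an MA(k) with k = 5) before every row in it gets a value. The following is a minimal sketch using plain numeric vectors; the toy data and the equal weights are illustrative assumptions, not the asker's data or weights.

```r
# Toy data and equal MA(5) weights, chosen only for illustration
v <- as.numeric(1:10)
w <- rep(1/5, 5)

# Filtering only the window v[6:10]: the first 4 outputs are NA
unprimed <- stats::filter(v[6:10], w, sides = 1)

# Prepending the 4 preceding observations "primes" the filter;
# afterwards only the last 5 outputs (the window itself) are kept
primed <- tail(as.numeric(stats::filter(v[2:10], w, sides = 1)), 5)
primed  # averages of v[2:6], v[3:7], ..., v[6:10], i.e. 4 5 6 7 8 up to rounding
```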

The only approach I have been able to figure out is really ugly, using explicit row numbers (here with the xts sample data):

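require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)

x$rn <- row(x[,1])                  # add a column holding each row's number
frst <- first(x['2007-05-18'])$rn   # row number of the first obs in the window
finl <- last(x['2007-06-09'])$rn    # row number of the last obs in the window
ans <- x[(frst-5):finl,]            # the window plus the 5 rows before it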

Can I just say bleah? Somebody help me.
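One tidier way to get the same rows is to ask xts for the row positions directly: the xts subset method accepts which.i = TRUE, which makes it return integer row indices instead of the data. This is a sketch on the same sample data, assuming the window's first row has at least some history before it (the max() call clamps at row 1 otherwise).

```r
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)

# Integer row positions of the requested window (which.i = TRUE
# returns indices instead of the subsetted data)
idx <- x["2007-05-18/2007-06-09", which.i = TRUE]

# Extend the window 5 rows back, without running past row 1
ans <- x[max(1, idx[1] - 5):idx[length(idx)], ]
```

No helper column is added to x, so the original object is left untouched.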

UPDATE: by popular request, a short example that applies an MA(5) to the daily data in sample_matrix:

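require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)$Close

calc_weights <- function(x) {
    ## replace rnorm with sophisticated analysis
    wgts <- matrix(rnorm(5,0,0.5), nrow=1)
    xts(wgts, index(last(x)))   # one row of 5 weights, stamped at the block's last obs
}

smooth_days <- function(x, wgts) {
    w <- wgts[index(last(x))]   # weights stamped at this block's last obs
    # stats:: avoids masking by e.g. dplyr::filter; one-sided 5-term filter
    out <- stats::filter(x, as.numeric(w), sides=1)
    xts(out, index(x))
}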

set.seed(1.23456789)
wgts <- apply.weekly(x, calc_weights)
lapply(split(x, f='weeks'), smooth_days, wgts)

For brevity, only the final week's output:

[[26]]
                [,1]
2007-06-25        NA
2007-06-26        NA
2007-06-27        NA
2007-06-28        NA
2007-06-29 -9.581503
2007-06-30 -9.581208

The NAs here are my problem. I want to recalculate my weights for each week of data and apply those new weights to the upcoming week. Rinse, repeat. In real life, I replace the lapply with some ugly code based on row indexes, but I'm sure there's a better way.

To define the problem clearly: there is a conflict between wanting to run the analysis on non-overlapping time periods (weeks, in this case) and needing overlapping periods of data (two weeks, in this case) to perform the calculation.


Solution

Here's one way to do this using endpoints() and a for loop. You could still use the which.i=TRUE suggestion from my comment (with which.i=TRUE, xts subsetting returns the matching row numbers instead of the data), but integer subsetting is faster.

y <- x*NA                   # pre-allocate result
ep <- endpoints(x,"weeks")  # last row of each week (ep[1] is 0)

set.seed(1.23456789)
for(i in seq_along(ep)[-(1:2)]) {   # skip week 1: nothing to prime it with
  rng1 <- (ep[i-1]+1):ep[i]      # this week's obs: calc weights, keep results
  rng2 <- (ep[i-2]+1):ep[i]      # previous week + this week: "prime" obs
  wgts <- calc_weights(x[rng1])
  # calc smooth_days on rng2, but only keep rng1 results
  y[rng1] <- smooth_days(x[rng2], wgts)[index(x[rng1])]
}
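The question says weights estimated on one week should be applied to the upcoming week, whereas the loop above estimates and applies weights within the same week. A variation along those lines could look like the sketch below. It assumes a weight estimator that returns a plain numeric vector of k = 5 weights (est_weights is a hypothetical stand-in that just draws rnorm values and ignores its input), and it primes each week with exactly k - 1 = 4 prior rows instead of a full prior week.

```r
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)$Close
k <- 5  # filter length, i.e. MA(5)

# Hypothetical stand-in for real weight estimation: returns k weights
est_weights <- function(x) rnorm(k, 0, 0.5)

y  <- x * NA                  # pre-allocated result
ep <- endpoints(x, "weeks")   # last row of each week (ep[1] is 0)

set.seed(1)
for (i in seq_along(ep)[-(1:2)]) {
  prev <- (ep[i - 2] + 1):ep[i - 1]   # last week: estimate weights here
  curr <- (ep[i - 1] + 1):ep[i]       # this week: apply them here
  w    <- est_weights(x[prev])
  # Prime with the k - 1 rows just before this week (clamped at row 1)
  rng  <- max(1, ep[i - 1] - (k - 2)):ep[i]
  sm   <- stats::filter(as.numeric(x[rng]), w, sides = 1)
  # Drop the leading NAs produced by the priming rows
  y[curr] <- tail(as.numeric(sm), length(curr))
}
```

Only the first week stays NA, since it has no earlier week to estimate weights from.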
Licensed under: CC-BY-SA with attribution