Question

Suppose I need to apply an MA(5) to a batch of market data stored in an xts object. I can easily pull the subset of data I want smoothed with xts subsetting:

x['2013-12-05 17:00:01/2013-12-06 17:00:00']

However, I need an additional 5 observations prior to the first one in my subset to "prime" the filter. Is there an easy way to do this?

The only thing I have been able to figure out is really ugly, with explicit row numbers (here using xts sample data):

require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)

x$rn <- row(x[,1])
frst <- first(x['2007-05-18'])$rn
finl <- last(x['2007-06-09'])$rn
ans <- x[(frst-5):finl,]

Can I just say bleah? Somebody help me.

UPDATE: by popular request, a short example that applies an MA(5) to the daily data in sample_matrix:

require(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)$Close

calc_weights <- function(x) {
    ##replace rnorm with sophisticated analysis
    wgts <- matrix(rnorm(5,0,0.5), nrow=1)
    xts(wgts, index(last(x)))
}

smooth_days <- function(x, wgts) {
    w <- wgts[index(last(x))]
    out <- filter(x, w, sides=1)
    xts(out, index(x))
}

set.seed(1.23456789)
wgts <- apply.weekly(x, calc_weights)
lapply(split(x, f='weeks'), smooth_days, wgts)

For brevity, only the final week's output:

[[26]]
                [,1]
2007-06-25        NA
2007-06-26        NA
2007-06-27        NA
2007-06-28        NA
2007-06-29 -9.581503
2007-06-30 -9.581208

The NAs here are my problem. I want to recalculate my weights for each week of data, and apply those new weights to the upcoming week. Rinse, repeat. In real life, I replace the lapply with some ugly stuff with row indexes, but I'm sure there's a better way.

To define the problem clearly: I want to run the analysis on non-overlapping time periods (weeks, in this case), but the calculation itself requires overlapping periods of data (2 weeks, in this case).
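
The priming requirement can be seen in isolation with a minimal base R sketch. It assumes equal weights of 1/5 purely for illustration (the weights in the question are random), and the names `vals`, `w` and `out` are made up for this example. A one-sided filter with 5 weights has no output until 5 observations are available, so the first 4 results are NA unless earlier data is supplied:

```r
# Toy series and equal weights, both assumed for illustration only
vals <- 1:6
w <- rep(1/5, 5)

# sides = 1 uses only the current and past values, as in smooth_days()
out <- stats::filter(vals, w, sides = 1)

# The first length(w) - 1 results have no full window behind them
print(sum(is.na(out)))
```

Feeding the filter the tail of the previous week and then discarding those leading rows is what removes the NAs from the week of interest.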

Solution

Here's one way to do this using endpoints and a for loop. You could still use the which.i=TRUE suggestion in my comment, but integer subsetting is faster.

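y <- x*NA                   # pre-allocate result
ep <- endpoints(x,"weeks")  # last row of each week; ep[1] is always 0

set.seed(1.23456789)
for(i in seq_along(ep)[-(1:2)]) {
  # week i-1 ends at row ep[i-1], so the current week starts one row later;
  # without the +1 each week would also overwrite the previous week's last row
  rng1 <- (ep[i-1]+1):ep[i]      # obs to calc weights (current week)
  rng2 <- (ep[i-2]+1):ep[i]      # previous week ("prime" obs) + current week
  wgts <- calc_weights(x[rng1])
  # calc smooth_days on rng2, but only keep rng1 results
  y[rng1] <- smooth_days(x[rng2], wgts)[index(x[rng1])]
}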
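
For the simpler case in the first part of the question (one date range plus a few earlier observations), the which.i=TRUE route might look like the sketch below. It uses the xts sample data. The names `idx`, `n_prime` and `start` are made up for this example, and 5 is the number of priming observations the question asks for:

```r
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)

# which.i = TRUE returns the integer row positions of the match, not the data
idx <- x["2007-05-18/2007-06-09", which.i = TRUE]

n_prime <- 5L                        # extra leading observations (assumed count)
start <- max(1L, idx[1] - n_prime)   # do not run past the first row
ans <- x[start:idx[length(idx)], ]
```

This avoids adding a helper column of row numbers to the object. The `max(1L, ...)` guard only matters when the requested range starts within the first few rows of the series.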
Licensed under: CC-BY-SA with attribution