Question

I need to run a bootstrap on a time series with non-standard dependence. So to do this I need to create a function that simulates the time series by making time by time adjustments.

testing<-function(){
  sampleData<-as.zoo(data.frame(index=1:1000,vol=(rnorm(1000))^2,x=NA))
  sampleData[,"x"]<-sampleData[,"vol"]+rnorm(1000) #treat this is completely exognenous and unknown in connection to vol
  sampleData<-cbind(sampleData,mean=rollmean(sampleData[,"vol"],k=3,align="right"))
  sampleData<-cbind(sampleData,vol1=lag(sampleData[,"vol"],k=-1),x1=lag(sampleData[,"x"],k=-1),mean1=lag(sampleData[,"mean"],k=-1))

  #get estimate
  mod<-lm(vol~vol1+x1+mean1,data=sampleData)

  res<-mod$residuals

  for(i in 5:1000){
    #recursively estimate
    sampleData[i,"vol"]<-as.numeric(predict(mod,newdata=data.frame(sampleData[i-1,])))+res[i-3]

    #now must update other paramaters
      #first our rolled average
      sampleData[i,"mean"]<-mean(sampleData[(i-3):i,"vol"])

      #reupdate our lagged variables
      sampleData[i,"vol1"]<-sampleData[i-1,"vol"]
      sampleData[i,"mean1"]<-sampleData[i-1,"mean"]

  }

  lm(vol~vol1+x1+mean1,data=sampleData)
}

When I run this code and measure the run time I get

system.time(testing())
user  system elapsed 
2.711   0.201   2.915 

This is a slight problem for me as a will be integrating this code to construct a bootstrap. This means any time taken here is multiplied by about 100 for each step. And I am updating this at a few thousand times. That means a single run will take hours (to days) to run.

Is there anyway to speed this code up?

Kind regards,

Matthew

Was it helpful?

Solution

Here's how to avoid the overhead of predict.lm. Also note that I used a matrix instead of a zoo object, which would be a tiny bit slower. You can see just how much this slowed down your code. That's the price you pay for convenience.

testing.jmu <- function() {
  if(!require(xts)) stop("xts package not installed")
  set.seed(21)  # for reproducibility
  sampleData <- .xts(data.frame(vol=(rnorm(1000))^2,x=NA), 1:1000)
  sampleData$x <- sampleData$vol+rnorm(1000)
  sampleData$mean <- rollmean(sampleData$vol, k=3, align="right")
  sampleData$vol1 <- lag(sampleData$vol,k=1)
  sampleData$x1 <- lag(sampleData$x,k=1)
  sampleData$mean1 <- lag(sampleData$mean,k=1)

  sampleMatrix <- na.omit(cbind(as.matrix(sampleData),constant=1))
  mod.fit <- lm.fit(sampleMatrix[,c("constant","vol1","x1","mean1")],
                    sampleMatrix[,"vol"])
  res.fit <- mod.fit$residuals

  for(i in 5:nrow(sampleMatrix)){
    sampleMatrix[i,"vol"] <-
      sum(sampleMatrix[i-1,c("constant","vol1","x1","mean1")] *
          mod.fit$coefficients)+res.fit[i-3]
    sampleMatrix[i,"mean"] <- mean(sampleMatrix[(i-3):i,"vol"])
    sampleMatrix[i,c("vol1","mean1")] <- sampleMatrix[i-1,c("vol","mean")]
  }

  lm.fit(sampleMatrix[,c("constant","vol1","x1","mean1")], sampleMatrix[,"vol"])
}
system.time(out <- testing.jmu())
#    user  system elapsed 
#    0.05    0.00    0.05 
coef(out)
#    constant        vol1          x1       mean1 
#  1.08787779 -0.06487441  0.03416802 -0.02757601

Add the set.seed(21) call to your function and you'll see that my function returns the same coefficients as yours.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top