Applying a rolling window regression to an XTS series in R

https://stackoverflow.com/questions/9351066

27-10-2019

Question

I have an xts of 1033 daily return points for 5 currency pairs on which I want to run a rolling window regression, but rollapply is not working for my own function, which uses lm(). Here is my data:

> head(fxr)
                 USDZAR        USDEUR       USDGBP        USDCHF        USDCAD
2007-10-18 -0.005028709 -0.0064079963 -0.003878743 -0.0099537170 -0.0006153215
2007-10-19 -0.001544470  0.0014275520 -0.001842564  0.0023058211 -0.0111410271
2007-10-22  0.010878027  0.0086642116  0.010599365  0.0051899551  0.0173792230
2007-10-23 -0.022783987 -0.0075236355 -0.010804304 -0.0041668499 -0.0144788687
2007-10-24 -0.006561223  0.0008545792  0.001024275 -0.0004261666  0.0049525483
2007-10-25 -0.014788901 -0.0048523001 -0.001434280 -0.0050425302 -0.0046422944

> tail(fxr)
                 USDZAR       USDEUR       USDGBP       USDCHF        USDCAD
2012-02-10  0.018619309  0.007548205  0.005526184  0.006348533  0.0067151342
2012-02-13 -0.006449463 -0.001055966 -0.002206810 -0.001638002 -0.0016995755
2012-02-14  0.006320364  0.006843933  0.006605875  0.005992935  0.0007001751
2012-02-15 -0.001666872  0.004319096 -0.001568874  0.003686840 -0.0015009759
2012-02-16  0.006419616 -0.003401364 -0.005194817 -0.002709588 -0.0019044761
2012-02-17 -0.004339687 -0.003675992 -0.003319899 -0.003043481  0.0000000000

I can easily run an lm on the whole data set to model USDZAR against the other pairs:

> lm(USDZAR ~ ., data = fxr)$coefficients
  (Intercept)        USDEUR        USDGBP        USDCHF        USDCAD 
-1.309268e-05  5.575627e-01  1.664283e-01 -1.657206e-01  6.350490e-01

However, I want to run a rolling 62-day window to get the evolution of these coefficients over time, so I create a function dolm that does this:

> dolm
function(x) {
  return(lm(USDZAR ~ ., data = x)$coefficients)
}

However, when I run rollapply on this I get the following:

> rollapply(fxr, 62, FUN = dolm)
Error in terms.formula(formula, data = data) : 
  '.' in formula and no 'data' argument

This happens even though dolm(fxr) on its own works fine:

> dolm(fxr)
  (Intercept)        USDEUR        USDGBP        USDCHF        USDCAD 
-1.309268e-05  5.575627e-01  1.664283e-01 -1.657206e-01  6.350490e-01

What is going on here? It seems to work fine if dolm is a simpler function, for example mean:

> dolm <- edit(dolm)
> dolm
function(x) {
  return(mean(x))
}
> rollapply(fxr, 62, FUN = dolm)
                  USDZAR        USDEUR        USDGBP        USDCHF        USDCAD
2007-11-29 -1.766901e-04 -6.899297e-04  6.252596e-04 -1.155952e-03  7.021468e-04
2007-11-30 -1.266130e-04 -6.512204e-04  7.067767e-04 -1.098413e-03  7.247315e-04
2007-12-03  8.949942e-05 -6.406932e-04  6.637066e-04 -1.154806e-03  8.727564e-04
2007-12-04  2.042046e-04 -5.758493e-04  5.497422e-04 -1.116308e-03  7.124593e-04
2007-12-05  7.343586e-04 -4.899982e-04  6.161819e-04 -1.057904e-03  9.915495e-04

Any help is much appreciated. Essentially, what I want is to get the weights for the regression USDZAR ~ USDEUR + USDGBP + USDCHF + USDCAD over a rolling 62-day window.

Solution

There are several problems here:

rollapply passes a matrix, but lm requires a data.frame.
rollapply applies the function to each column separately unless we specify by.column=FALSE.
You may or may not want the result to be right-aligned with the dates, but if you do, use rollapplyr.
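
A quick way to see the first two points is to pass a function that only reports what it receives. The sketch below uses a small simulated zoo matrix rather than the xts object from the question; the names z and describe are made up for illustration, and the behaviour with xts is assumed to be analogous rather than checked here.

```r
library(zoo)

# Hypothetical stand-in for fxr: 10 rows, 3 named columns
set.seed(1)
z <- zoo(matrix(rnorm(30), 10, 3, dimnames = list(NULL, c("a", "b", "c"))),
         as.Date("2020-01-01") + 0:9)

# Report the class and shape of whatever rollapply hands to FUN
describe <- function(x) {
  cat(class(x)[1], "with dim:", if (is.null(dim(x))) "NULL" else dim(x), "\n")
  0
}

invisible(rollapply(z, 4, describe))                     # default: by.column = TRUE
invisible(rollapply(z, 4, describe, by.column = FALSE))  # whole window at once
```

With the default, each call sees one column of one window, so a formula such as USDZAR ~ . has no other columns to expand into.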

1) Incorporating the above, we have:

dolm <- function(x) coef(lm(USDZAR ~ ., data = as.data.frame(x)))
rollapplyr(fxr, 62, dolm, by.column = FALSE)

2) An alternative to lm in the dolm above is to use lm.fit, which works directly with matrices and is also faster:

dolm <- function(x) coef(lm.fit(cbind(Intercept = 1, x[,-1]), x[,1]))
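
As a sanity check, the two variants can be compared on simulated data. This is a minimal sketch, assuming a zoo matrix with the response in the first column; the object z, the column names y, x1, x2 and the function names dolm1 and dolm2 are illustrative and not part of the question.

```r
library(zoo)

# Simulated stand-in for fxr (column names and sizes are illustrative only)
set.seed(42)
n <- 200
m <- matrix(rnorm(n * 3, sd = 0.01), n, 3,
            dimnames = list(NULL, c("y", "x1", "x2")))
z <- zoo(m, as.Date("2020-01-01") + seq_len(n) - 1)

# 1) formula interface: convert each window to a data.frame first
dolm1 <- function(x) coef(lm(y ~ ., data = as.data.frame(x)))
# 2) matrix interface: response in column 1, regressors in the rest
dolm2 <- function(x) coef(lm.fit(cbind(Intercept = 1, x[, -1]), x[, 1]))

r1 <- rollapplyr(z, 62, dolm1, by.column = FALSE)
r2 <- rollapplyr(z, 62, dolm2, by.column = FALSE)

dim(r1)  # one row per complete window, one column per coefficient
all.equal(coredata(r1), coredata(r2), check.attributes = FALSE)
```

The coefficient names differ between the two ("(Intercept)" versus "Intercept"), which is why the comparison ignores attributes.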

Other suggestions

New answer

G. Grothendieck's answer is correct, but you can do it faster with the rollRegres package, as the following example shows (the roll_regres.fit function is roughly 115 times faster in the benchmark below):

# simulate data
set.seed(101)
n <- 1000
wdth = 100
X <- matrix(rnorm(10 * n), n, 10)
y <- drop(X %*% runif(10)) + rnorm(n)
Z <- cbind(y, X)

# assign other function
dolm <- function(x)
  coef(lm.fit(x[, -1], x[, 1]))

# show that they yield the same
library(zoo)
library(rollRegres)
all.equal(
  rollapply(Z, wdth, FUN = dolm,
            by.column = FALSE,  align = "right", fill = NA_real_),
  roll_regres.fit(X, y, wdth)$coefs,
  check.attributes = FALSE)
#R [1] TRUE

# benchmark
library(compiler)
dolm <- cmpfun(dolm)

microbenchmark::microbenchmark(
  newnew = roll_regres.fit(X, y, wdth),
  prev   = rollapply(Z, wdth, FUN = dolm,
                     by.column = FALSE,  align = "right", fill = NA_real_),
  times = 10)
#R Unit: microseconds
#R expr        min         lq       mean     median         uq        max neval
#R newnew    884.938    950.914   1026.134   1025.581   1057.581   1242.075    10
#R   prev 111057.822 111903.649 118867.761 116857.726 122087.160 141362.229    10

You can also use the roll_regres function from the package if you want to use an R formula instead.
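
A minimal sketch of the formula interface follows, assuming the rollRegres package is installed and that roll_regres takes a formula, a data frame and a width and returns a list with a coefs matrix, as roll_regres.fit does above. The data frame df and its column names are made up for illustration.

```r
library(rollRegres)

# Simulated data; the names df, y, x1, x2, x3 are illustrative only
set.seed(101)
n <- 1000
wdth <- 100L
X <- matrix(rnorm(3 * n), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
df <- data.frame(y = drop(X %*% runif(3)) + rnorm(n), X)

# Formula interface; an intercept is included by default, as with lm()
fit <- roll_regres(y ~ x1 + x2 + x3, df, width = wdth)

# The last window should match an ordinary lm() on the same rows
all.equal(unname(fit$coefs[n, ]),
          unname(coef(lm(y ~ x1 + x2 + x3, df[(n - wdth + 1):n, ]))))
```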

Old answer

A third option would be to update the R matrix in a QR decomposition, as done in the following code. You can speed this up by doing it in C++, but then you will need the dchud and dchdd subroutines from LINPACK (or another function to update R).

library(SamplerCompare) # for LINPACK `chdd` and `chud`
roll_coef <- function(X, y, width){
  n <- nrow(X)
  p <- ncol(X)
  out <- matrix(NA_real_, n, p)

  is_first <- TRUE
  i <- width 
  while(i <= n){
    if(is_first){
      is_first <- FALSE
      qr. <- qr(X[1:width, ])
      R <- qr.R(qr.)

      # Use X^T for the rest
      X <- t(X)

      XtY <- drop(tcrossprod(y[1:width], X[, 1:width]))
    } else {
      x_new <- X[, i]
      x_old <- X[, i - width]

      # update R 
      R <- .Fortran(
        "dchud", R, p, p, x_new, 0., 0L, 0L, 
        0., 0., numeric(p), numeric(p), 
        PACKAGE = "SamplerCompare")[[1]]

      # downdate R
      R <- .Fortran(
        "dchdd", R, p, p, x_old, 0., 0L, 0L, 
        0., 0., numeric(p), numeric(p), integer(1),
        PACKAGE = "SamplerCompare")[[1]]

      # update XtY
      XtY <- XtY + y[i] * x_new - y[i - width] * x_old
    }

    coef.    <- .Internal(backsolve(R, XtY, p, TRUE, TRUE))
    out[i, ] <- .Internal(backsolve(R, coef., p, TRUE, FALSE))

    i <- i + 1
  }

  out
}
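
The update step in the else branch above can be made concrete without LINPACK. When the window moves by one row, X'X gains the outer product of the entering row and loses that of the leaving row, and X'y changes the same way. The sketch below is a simplified base R variant, not the method used in roll_coef: it slides X'X directly and refactors it with chol() at every step, which is less numerically robust than updating R itself. The name roll_coef_ne is made up here.

```r
# Simplified variant: slide X'X and X'y, then solve via a fresh Cholesky factor
roll_coef_ne <- function(X, y, width) {
  n <- nrow(X)
  out <- matrix(NA_real_, n, ncol(X))
  idx <- seq_len(width)
  XtX <- crossprod(X[idx, , drop = FALSE])
  XtY <- drop(crossprod(X[idx, , drop = FALSE], y[idx]))
  for (i in width:n) {
    if (i > width) {
      x_new <- X[i, ]
      x_old <- X[i - width, ]
      # add the row entering the window, remove the row leaving it
      XtX <- XtX + tcrossprod(x_new) - tcrossprod(x_old)
      XtY <- XtY + y[i] * x_new - y[i - width] * x_old
    }
    R <- chol(XtX)  # upper triangular, R'R = X'X
    # two triangular solves: R'z = X'y, then R b = z
    out[i, ] <- backsolve(R, backsolve(R, XtY, transpose = TRUE))
  }
  out
}

set.seed(101)
X <- matrix(rnorm(3 * 200), 200, 3)
y <- drop(X %*% c(1, -2, 0.5)) + rnorm(200)
fit <- roll_coef_ne(X, y, 50)
# the last window agrees with a direct least-squares fit on the same rows
all.equal(fit[200, ], unname(coef(lm.fit(X[151:200, ], y[151:200]))))
```

The two backsolve calls mirror the last two lines of the loop in roll_coef.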

# simulate data
set.seed(101)
n <- 1000
wdth = 100
X <- matrix(rnorm(10 * n), n, 10)
y <- drop(X %*% runif(10)) + rnorm(n)
Z <- cbind(y, X)

# assign other function
dolm <- function(x) 
  coef(lm.fit(x[, -1], x[, 1]))

# show that they yield the same
library(zoo)
all.equal(
  rollapply(Z, wdth, FUN = dolm,  
            by.column = FALSE,  align = "right", fill = NA_real_),
  roll_coef(X, y, wdth), 
  check.attributes = FALSE)
#R> [1] TRUE

# benchmark
library(compiler)
roll_coef <- cmpfun(roll_coef)
dolm <- cmpfun(dolm)
microbenchmark::microbenchmark(
  new =  roll_coef(X, y, wdth),
  prev = rollapply(Z, wdth, FUN = dolm,  
                   by.column = FALSE,  align = "right", fill = NA_real_), 
  times = 10)
#R> Unit: milliseconds
#R>  expr        min         lq       mean     median         uq       max neval cld
#R>   new   8.631319   9.010579   9.808525   9.659665   9.973741  11.87083    10  a 
#R>  prev 118.257128 121.734860 124.489826 122.882318 127.195410 135.21280    10   b

The solution above requires you to form the model.matrix and model.response first, but that is only three calls (one extra for model.frame) before the call to roll_coef.
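
Those three calls might look as follows. This is a sketch in base R on a made-up data frame (df, y, x1 and x2 are illustrative names); the final line only checks on the full sample that the resulting X and y reproduce lm.

```r
# Illustrative data frame; the names df, y, x1, x2 are assumptions
set.seed(1)
df <- data.frame(y = rnorm(50), x1 = rnorm(50), x2 = rnorm(50))

mf <- model.frame(y ~ x1 + x2, data = df)  # evaluates the formula on df
X  <- model.matrix(terms(mf), mf)          # design matrix, intercept included
y  <- model.response(mf)                   # response vector

# X and y now have the shape that roll_coef(X, y, width) expects.
# Sanity check on the full sample with lm.fit:
all.equal(coef(lm.fit(X, y)), coef(lm(y ~ x1 + x2, data = df)))
```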

Licensed under: CC-BY-SA with attribution
