Pergunta

The R package caret provides a handy function createFolds, which returns a list of indexes for training sets to be used in cross-validation:

set.seed(1)
require(caret)
x <- rnorm(10)
createFolds(x,k=5,returnTrain=TRUE)

$Fold1
[1]  1  2  5  6  7  8  9 10

$Fold2
[1]  1  3  4  5  6  8  9 10

$Fold3
[1]  1  2  3  4  5  7  8 10

$Fold4
[1] 1 2 3 4 6 7 8 9

$Fold5
[1]  2  3  4  5  6  7  9 10

I would like to create a similar function, except I want to return a list of indexes to be used in time-series cross validation. I found some example code in R, but I want to generalize and functionalize things more. Here's what I initially came up with:

createTSfolds <- function(y, Min=max(frequency(y),3)) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    starts <- rep(1,length(stops))
    out <- mapply(seq,starts,stops)
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4

$Fold3
[1] 1 2 3 4 5

$Fold4
[1] 1 2 3 4 5 6

$Fold5
[1] 1 2 3 4 5 6 7

$Fold6
[1] 1 2 3 4 5 6 7 8

$Fold7
[1] 1 2 3 4 5 6 7 8 9

(Min is the minimum number of observation needed to fit a model)

This function works pretty well for now, but I'd like to add 2 functions that Rob Hyndman discusses:

  1. Windowing: Instead of the training set extending back to the 1st observation, it extends back n observations.
  2. Variable forecast horizons: Instead adding 1 index to the training set each fold, add k to the training set each fold.

Here is how I implemented windowing:

createTSfolds <- function(y, Min=max(frequency(y),3), lookback=NA) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    if (is.na(lookback)) { 
        starts <- as.list(rep(1,length(stops)))
        out <- mapply(seq,starts,stops)
    } else {
        starts <- stops-Min+1
        out <- mapply(seq,starts,stops)
        out <- split(t(out),1:nrow(t(out)))
    }
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x,Min=4,lookback=4)

I can't figure out how to implement variable forecast horizons, which would look like this: For example if k=3:

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9

I'm looking for ways to improve my existing code, as well as ways to add variable increments to the training set each fold.

Thank you

Foi útil?

Solução

Here is one approach. It is not entirely robust, as I am not sure about the output you seek when both lookback and k are present. Let me know if this is what you were looking for.

 createTSfolds2 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = NA){
   out = llply(Min:(length(y) - 1), seq)
   if (!is.na(k)) {out = out[seq(1, length(out), k)]}
   if (!is.na(lookback)) {
     out = plyr::llply(out, function(z) z[(length(z) - lookback + 1):length(z)])
   }
   names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
   return(out)
 }

createTSfolds2(x, Min = 3, lookback = NA, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9

createTSfolds2(x, Min = 3, lookback = 3, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 4 5 6

$Fold3
[1] 7 8 9
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top