Question

The R package caret provides a handy function createFolds, which returns a list of indexes for training sets to be used in cross-validation:

set.seed(1)
require(caret)
x <- rnorm(10)
createFolds(x,k=5,returnTrain=TRUE)

$Fold1
[1]  1  2  5  6  7  8  9 10

$Fold2
[1]  1  3  4  5  6  8  9 10

$Fold3
[1]  1  2  3  4  5  7  8 10

$Fold4
[1] 1 2 3 4 6 7 8 9

$Fold5
[1]  2  3  4  5  6  7  9 10

I would like to create a similar function that instead returns a list of indexes for time-series cross-validation. I found some example code in R, but I want to generalize it and wrap it in a function. Here's what I initially came up with:

createTSfolds <- function(y, Min=max(frequency(y),3)) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    starts <- rep(1,length(stops))
    out <- mapply(seq,starts,stops)
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4

$Fold3
[1] 1 2 3 4 5

$Fold4
[1] 1 2 3 4 5 6

$Fold5
[1] 1 2 3 4 5 6 7

$Fold6
[1] 1 2 3 4 5 6 7 8

$Fold7
[1] 1 2 3 4 5 6 7 8 9

(Min is the minimum number of observations needed to fit a model.)

This function works pretty well for now, but I'd like to add two features that Rob Hyndman discusses:

  1. Windowing: Instead of the training set extending back to the 1st observation, it extends back only n observations.
  2. Variable forecast horizons: Instead of adding 1 index to the training set each fold, add k indexes to the training set each fold.

Here is how I implemented windowing:

createTSfolds <- function(y, Min=max(frequency(y),3), lookback=NA) {
    i <- seq(along=y)
    stops <- i[Min:(length(i)-1)]
    if (is.na(lookback)) {
        # Expanding window: every training set starts at observation 1
        starts <- rep(1,length(stops))
    } else {
        # Rolling window: keep only the last `lookback` observations
        starts <- pmax(stops-lookback+1, 1)
    }
    # SIMPLIFY=FALSE always returns a list, even when all folds have equal length
    out <- mapply(seq,starts,stops,SIMPLIFY=FALSE)
    names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
    out
}
createTSfolds(x,Min=4,lookback=4)

I can't figure out how to implement variable forecast horizons. For example, with k=3 the output would look like this:

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9

I'm looking for ways to improve my existing code, as well as ways to add variable increments to the training set each fold.

Thank you

Solution

Here is one approach. It is not entirely robust, as I am not sure about the output you seek when both lookback and k are present. Let me know if this is what you were looking for.

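createTSfolds2 <- function(y, Min = max(frequency(y), 3), lookback = NA, k = NA){
  # Expanding-window training sets: 1:Min, 1:(Min + 1), ..., 1:(length(y) - 1)
  out = plyr::llply(Min:(length(y) - 1), seq)
  # Keep every k-th fold so the training set grows by k observations per fold
  if (!is.na(k)) {out = out[seq(1, length(out), k)]}
  # Rolling window: keep at most the last `lookback` indexes of each fold
  if (!is.na(lookback)) {
    out = plyr::llply(out, function(z) tail(z, lookback))
  }
  names(out) <- paste("Fold", gsub(" ", "0", format(seq(along = out))), sep = "")
  return(out)
}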

createTSfolds2(x, Min = 3, lookback = NA, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 1 2 3 4 5 6

$Fold3
[1] 1 2 3 4 5 6 7 8 9

createTSfolds2(x, Min = 3, lookback = 3, k = 3)

$Fold1
[1] 1 2 3

$Fold2
[1] 4 5 6

$Fold3
[1] 7 8 9
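
For comparison, the same fold layout can be sketched in base R without plyr by computing the start and stop index of each fold directly. This is only an illustrative sketch: the helper name ts_fold_indices and its arguments n, min_train and step are hypothetical (they are not part of caret or plyr), and it assumes n - 1 >= min_train and that step plays the role of k above. Fold names here are not zero-padded.

```r
# Hypothetical helper (not from caret or plyr): training indexes for
# time-series cross-validation, using base R only.
ts_fold_indices <- function(n, min_train = 3, lookback = NA, step = 1) {
  # Last training index of each fold: min_train, min_train + step, ...
  stops <- seq(min_train, n - 1, by = step)
  # Expanding window starts at 1; rolling window keeps `lookback` points
  starts <- if (is.na(lookback)) {
    rep(1, length(stops))
  } else {
    pmax(stops - lookback + 1, 1)
  }
  # Map always returns a list, one index vector per fold
  out <- Map(seq, starts, stops)
  names(out) <- sprintf("Fold%d", seq_along(out))
  out
}

# Rolling window of 3 observations, advancing 3 observations per fold
folds <- ts_fold_indices(10, min_train = 3, lookback = 3, step = 3)
```

Working from the stop indexes means the step and the window length are handled independently, so it is easy to see what happens when both are supplied.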
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow