Question

Is it a good idea to use k-fold cross-validation with a recurrent neural network (RNN) to alleviate overfitting?

  • A potential solution could be L2/dropout regularization, but it might hurt RNN performance, as discussed here: it can affect the ability of RNNs to learn and retain information over longer time spans.
  • My dataset is strictly time series, i.e. auto-correlated in time and dependent on the order of events. Standard k-fold cross-validation holds out part of the data and trains the model on the rest, which destroys the time-series order. What would be an alternative solution?

Solution

TL;DR

Use Stacked Cross-Validation instead of traditional K-Fold Cross-Validation.

Stacked Cross-Validation

In scikit-learn, this is called TimeSeriesSplit (docs).

The idea is that instead of randomly shuffling all your data points and losing their order, as you pointed out, you split them in order (or in batches).

Below is a picture of traditional K-Fold vs. Stacked K-Fold. Both use K=4 (four iterations), but in traditional K-Fold all the data is used in every iteration, whereas in Stacked K-Fold only past data is used for training and the most recent block is used for testing.

[Figure: traditional K-Fold splits vs. Stacked (time-ordered) K-Fold splits, K=4]
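For reference, here is a minimal sketch of how TimeSeriesSplit produces these splits; the toy arrays are just placeholders for your own time-ordered data:

import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(20).reshape(-1, 1)   ### toy time-ordered features
y = np.arange(20)                  ### toy targets

### each split trains only on past indices and tests on the block that follows
tscv = TimeSeriesSplit(n_splits=4)
for train_idx, test_idx in tscv.split(X):
    print("train:", train_idx, "test:", test_idx)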

Strategy

  1. Split your data in a Stacked K-Fold fashion.
  2. At every iteration, store your model and measure its performance against the test set.
  3. After all iterations are over, pick the stored model with the highest performance (it might not even be the last one); a short sketch of this follows the note below.

Note: This is a common approach in training neural nets (batching with validation sets).
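A minimal sketch of steps 1-3, assuming an estimator that follows scikit-learn's fit/score convention; make_model is a hypothetical factory for whatever model (or wrapped RNN) you use:

from sklearn.model_selection import TimeSeriesSplit

def run_stacked_cv(make_model, X, y, n_splits=4):
    results = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = make_model()                           ### fresh model per iteration
        model.fit(X[train_idx], y[train_idx])          ### train on past data only
        score = model.score(X[test_idx], y[test_idx])  ### measure on the next block
        results.append((score, model))                 ### store the model and its score
    ### return the highest-scoring stored model (not necessarily the last one)
    return max(results, key=lambda r: r[0])

Calling run_stacked_cv with your own factory and data then gives back the best (score, model) pair across all iterations.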

OTHER TIPS

Besides stacked cross-validation/TimeSeriesSplit/multi-train-test split as explained by BrunoGL, you could use Walk Forward Validation. The idea is to move along the time axis and forecast the very next point in time, similar to how you would apply your model "live" or "online" in a real setting.

The amount of past data considered for each forecast can be varied (i.e. you can use the full past history or only a fixed number of recent examples).

As far as I know there is currently no implementation in scikit-learn, but it is straightforward to write yourself:

### Walk Forward Validation with full history, starting at starting_point
starting_point = 100
for i in range(starting_point, len(X)):
    X_train = X[0:i]       ### everything up to (but not including) time i
    X_test = X[i:i + 1]    ### the single next observation
    ### fit the model on X_train and forecast X_test here
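
If you only want a fixed amount of history, a sliding-window variant looks almost the same; window_size here is an arbitrary illustrative choice:

### Walk Forward Validation with a fixed-size sliding window
window_size = 100
for i in range(window_size, len(X)):
    X_train = X[i - window_size:i]   ### only the most recent window_size points
    X_test = X[i:i + 1]
    ### fit the model on X_train and forecast X_test here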

Since you're doing a lot of splits, it is computationally expensive, though. In a way you could see it as a time-series leave-one-out validation, since your test data is just a single data point.

A good read on this is How To Backtest Machine Learning Models for Time Series Forecasting.

Licensed under: CC-BY-SA with attribution