Question

I have been reading several papers, articles and blog posts about RNNs (LSTM specifically) and how we can use them to do time series prediction. In almost all examples and codes I have found, the problem is defined as finding the next x values of a time series based on previous data. What I am trying to solve is the following:

  • Assuming we have t values of a time series, what would its value be at time t+1?

So using different LSTM packages (deeplearning4j, keras, ...) that are out there, here is what I am doing right now:

  1. Create an LSTM network and fit it to t samples. My network has one input and one output, so the training data consists of the following input/target pairs:

    t_1,t_2

    t_2,t_3

    t_3,t_4

  2. The next step is to use, for example, t_4 as input and expect t_5 as output, then use t_5 as input and expect t_6 as output, and so on.

  3. When done predicting, I use the pair t_5,t_6 to update my model.
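The three steps above can be sketched in plain Python/NumPy (a minimal sketch with a window of one value, matching the t_1,t_2 pairs above; `predict` is a hypothetical stand-in for any fitted LSTM):

```python
import numpy as np

def make_pairs(series):
    """Step 1: turn a 1-D series into (input, target) pairs,
    predicting value i+1 from value i."""
    series = np.asarray(series, dtype=float)
    X = series[:-1].reshape(-1, 1, 1)  # (samples, timesteps, features), as LSTM layers expect
    y = series[1:].reshape(-1, 1)
    return X, y

def rolling_forecast(last_value, predict, n_ahead):
    """Step 2: feed each prediction back in as the next input."""
    preds = []
    current = last_value
    for _ in range(n_ahead):
        current = predict(current)
        preds.append(current)
    return preds

X, y = make_pairs([10.0, 11.0, 12.0, 13.0])
# X pairs with y as (10 -> 11), (11 -> 12), (12 -> 13)

# Stand-in "model" that just adds 1, to show the feedback loop:
preds = rolling_forecast(13.0, predict=lambda v: v + 1.0, n_ahead=3)
# preds == [14.0, 15.0, 16.0]
```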

My question: Is this the correct way of doing it? If yes, then I have no idea what batch_size means or why it is useful.

Note: An alternative that comes to mind is something similar to the examples that generate a sequence of characters, one character at a time. In that case, the batch would be a series of numbers, and I would expect the next series of the same size, where the value I'm looking for is the last number in that series. I am not sure which of the above-mentioned approaches is correct and would really appreciate any help in this regard.

Thanks

Solution

  1. The way you are doing it is fine: time series prediction is basically a regression problem. What you may have seen elsewhere about vectors usually refers to the size of the input, i.e. the feature vector. Now, assuming that you have t timesteps and want to predict time t+1, the best way to do it, whether with classical time series analysis methods or RNN models like LSTM, is to train your model on data up to time t to predict t+1. The prediction for t+1 then becomes the input for the next prediction, and so on. There is a good example here, based on an LSTM built with the pybrain framework.
  2. Regarding your question on batch_size: first you need to understand the difference between batch learning and online learning. Batch size indicates the subset of samples your algorithm uses in each gradient descent optimization step; it has nothing to do with how you input the data or what you expect your output to be. For more information on that I suggest you read this kaggle post.
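The point about batch size being purely an optimization detail can be shown with a toy gradient descent loop (a minimal sketch fitting y ≈ w·x; the function and variable names are illustrative, not from any library):

```python
import numpy as np

def minibatch_sgd(X, y, batch_size, epochs=1, lr=0.01):
    """Fit y ~ w*x with plain mini-batch SGD. batch_size only sets how
    many samples each gradient step averages over; it does not change
    what the inputs or outputs look like."""
    w = 0.0
    n = len(X)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            grad = np.mean(2 * xb * (w * xb - yb))  # d/dw of mean squared error
            w -= lr * grad
    return w

X = np.arange(1.0, 9.0)
y = 3.0 * X
w = minibatch_sgd(X, y, batch_size=2, epochs=200)
# w converges toward 3.0; batch_size=1 would be online learning,
# batch_size=len(X) would be full-batch learning, same data either way.
```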

OTHER TIPS

I recently did a lot of reading and code writing on LSTMs, so I'll try to pitch in and answer the question, though I'm not yet familiar with Theano, Keras, or deeplearning4j. From a quick scour of the Internet it seems that the meaning of batch_size is program-dependent, so it might be a good idea to check the online help files. From what I gather, though, it can refer to a few different things: 1) the number of cases fed into the training algorithm for processing before the next examination of inputs and/or outputs; 2) the number of iterations of the training algorithm over the last set of inputs before the next examination of inputs and/or outputs; or 3) the length of the sequences fed to the LSTM algorithms.

With that in mind, it seems almost certain that Theano and deeplearning4j are referring to #1 and/or #2, given the following links. On the other hand, I've recently run across journal articles where batch size referred to sequence size, so watch out for a potential mismatch between software and academic terminology:

  • This post at CrossValidated indicates that in deeplearning4j and Keras the batch_size should be set to 1 for online learning. In other words, batch_size controls the number of training cycles before the next input/output check.

  • In Keras, batch_size apparently sets the number of training cycles to execute, or the number of cases to train on, before checking the inputs and/or outputs again, according to this GitHub page.

  • This GitHub package also seems to use a different variable to denote sequence length.

  • According to Tony Kanh's answer at this StackOverflow page, in TensorFlow the batch_size determines the size of the output vector. This is probably identical to the number of items processed in each training cycle, since multiplying it by num_steps and size (probably the sequence size?) determines the dimensions of the output.
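That dimension arithmetic can be illustrated with a small NumPy sketch (the names batch_size, num_steps, and size follow the cited answer; the concrete values here are arbitrary assumptions):

```python
import numpy as np

# Hypothetical dimensions, following the naming in the cited answer:
batch_size = 4   # sequences processed per training step
num_steps = 5    # timesteps per sequence
size = 3         # features per timestep

# One training step consumes a tensor of this shape, so batch_size
# multiplied by num_steps and size fixes the total number of values:
batch = np.zeros((batch_size, num_steps, size))
total_values = batch_size * num_steps * size  # 60
```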

  • For closely related neural nets such as bidirectional RNNs, the batch size is apparently equivalent to the sequence size, not the number of cases fed into the training algorithm. See p. 5 of Berglund et al., "Bidirectional Recurrent Neural Networks as Generative Models," available at the Cornell University Library website without a paywall.

As I said, I haven't used Theano or deeplearning4j yet, but I hope this at least provides a starting point for finding the correct answer.

Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange