Question

I was following TensorFlow's own time series/LSTM tutorial, and there's something I don't quite understand about the whole process around Backpropagation Through Time (BPTT).

The resources I've found explain how prediction for a future time with an offset works (the "offset" in the tutorial, and the t+3rd time instance in the Wikipedia article).

What I don't understand is how this generalizes to the case where the input width is greater than 1.

So in the Wikipedia article we see the case for inputWidth=1, outputWidth=1 and offset=3.

I think I understand that if outputWidth>1, you'd use not only the last cell's prediction but also the previous cells' predictions as outputs. I also understand that if offset>1, you just unroll the network for offset steps (as shown on Wikipedia).

But what if inputWidth>1? Let's say inputWidth=k (k>1)!

In this case, do you unroll the same RNN multiple times in parallel and somehow average the weights? Or do you unroll the same RNN k times sequentially? What does that look like?


Solution

You unroll the same RNN multiple times, but not in parallel. The RNN needs both an input and the previous step's hidden state, so you unroll the same RNN for the first time step, then for the second, and so on.

For each step it is always the same RNN, so the weights are the same each time:

(Figure: the same RNN cell unrolled across successive time steps, with shared weights.)
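
To make the weight sharing concrete, here is a minimal NumPy sketch of unrolling a vanilla RNN cell over k input time steps; the names W_xh, W_hh, and b are made up for illustration, not taken from the tutorial:

```python
import numpy as np

k, input_dim, hidden_dim = 3, 4, 8          # inputWidth = k (hypothetical sizes)
rng = np.random.default_rng(0)

# One set of weights, created once and reused at every step
W_xh = rng.normal(size=(input_dim, hidden_dim))   # input -> hidden
W_hh = rng.normal(size=(hidden_dim, hidden_dim))  # hidden -> hidden
b = np.zeros(hidden_dim)

x = rng.normal(size=(k, input_dim))  # the k input time steps
h = np.zeros(hidden_dim)             # initial hidden state

for t in range(k):                   # sequential unrolling, not parallel
    # The same W_xh, W_hh, b are applied at every step
    h = np.tanh(x[t] @ W_xh + h @ W_hh + b)

print(h)  # final hidden state after consuming all k inputs
```

Note that nothing is averaged: during BPTT, the gradients from all k unrolled steps simply accumulate into the same W_xh and W_hh.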

Licensed under: CC-BY-SA with attribution