Question

A minibatch is a collection of examples that are fed into the network, example after example, with back-propagation performed after every single example. We then take the average of these gradients and update our weights. This completes the processing of one minibatch.
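
To make that understanding concrete, here is a minimal PyTorch sketch of the update scheme I just described; the model, data, and sizes are made up purely for illustration:

```python
import torch
import torch.nn as nn

# Made-up tiny model and data, just to illustrate the scheme above:
# one backward pass per example, gradients averaged, one weight update.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

minibatch = [(torch.randn(10), torch.randn(1)) for _ in range(8)]

optimizer.zero_grad()
for x, y in minibatch:
    loss = loss_fn(model(x), y)
    # backward() accumulates gradients; dividing by the batch size
    # makes the accumulated gradient the minibatch average
    (loss / len(minibatch)).backward()
optimizer.step()  # a single update for the whole minibatch
```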

I have read some related posts.

Question part a:

What would a minibatch look like for an LSTM? Say I want it to reproduce Shakespeare, letter by letter (30 characters to choose from).

I launch the LSTM, let it predict 200 characters of a poem, then perform back-propagation (hence my LSTM works with 200 timesteps). Does this mean my minibatch consists of 1 example whose length is 200?
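
For concreteness, here is how I imagine that single example would be shaped as an LSTM input in PyTorch; the hidden size of 128 is an arbitrary choice:

```python
import torch
import torch.nn as nn

vocab_size = 30   # 30 characters to choose from
seq_len = 200     # 200 timesteps
batch_size = 1    # the minibatch holds a single 200-character example

# One-hot encoded input, shape (batch, time, features) with batch_first=True
x = torch.zeros(batch_size, seq_len, vocab_size)
# ... fill x with the one-hot codes of the 200 characters ...

lstm = nn.LSTM(input_size=vocab_size, hidden_size=128, batch_first=True)
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([1, 200, 128])
```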

Question part b:

If I wanted to launch 63 other minibatches in parallel, would I just pick 63 extra poems? (Edit: the original answer doesn't mention this explicitly, but we don't train minibatches in parallel; we train on one minibatch at a time, processing its examples in parallel, as in the sketch below.)
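
A minimal sketch of what I mean, again in PyTorch with made-up sizes: 64 excerpts stacked into one minibatch and run through a single forward pass:

```python
import torch
import torch.nn as nn

vocab_size, seq_len = 30, 200
lstm = nn.LSTM(input_size=vocab_size, hidden_size=128, batch_first=True)

# 64 excerpts stacked along the batch dimension form ONE minibatch;
# its examples are processed in parallel inside a single forward pass.
minibatch = torch.zeros(64, seq_len, vocab_size)  # (batch, time, features)
output, _ = lstm(minibatch)
print(output.shape)  # torch.Size([64, 200, 128])
```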

Question part c:

If I wanted each minibatch to consist of 10 different examples, what would such examples be, and how would they differ from what I currently perceive as a minibatch?
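
If I had to guess, each example would be a different 200-character excerpt of the text. Here is a sketch of one common way to build such a minibatch; the sample text and the helper function are hypothetical, and a real vocabulary would have the full 30 characters:

```python
import torch

# Hypothetical training text; in practice this would be the collected poems.
text = "shall i compare thee to a summers day thou art more lovely " * 50
chars = sorted(set(text))                      # the character vocabulary
char_to_idx = {c: i for i, c in enumerate(chars)}

seq_len, batch_size = 200, 10

def one_hot_window(start):
    """One-hot encode the 200-character excerpt starting at `start`."""
    window = torch.zeros(seq_len, len(chars))
    for t, ch in enumerate(text[start : start + seq_len]):
        window[t, char_to_idx[ch]] = 1.0
    return window

# Ten different 200-character excerpts -> one minibatch of 10 examples,
# shape (10, 200, vocab_size)
starts = range(0, batch_size * seq_len, seq_len)
minibatch = torch.stack([one_hot_window(s) for s in starts])
print(minibatch.shape)
```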

