How do attention mechanisms in RNNs learn weights for a variable-length input
Question
Attention mechanisms are reasonably common in sequence-to-sequence RNN models.
I understand that the decoder learns a weight vector $\alpha$, which is used to form a weighted sum of the output vectors from the encoder network; this sum is then used to produce a new input (context) vector for the decoder.
What I don't understand is this: a learned weight vector $\alpha$ would seem to have to be of fixed size, since it is treated as a set of learned weights, and yet it is applied to a variable-length sequence.
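To make the question concrete, here is a minimal numpy sketch of the attention step as I understand it, assuming Bahdanau-style additive scoring; the parameter names (`W_h`, `W_s`, `v`) and shapes are my own assumptions, not from any particular library:

```python
import numpy as np

def attention_context(encoder_outputs, decoder_state, W_h, W_s, v):
    # encoder_outputs: (T, d_enc) -- T varies per input sequence
    # decoder_state:   (d_dec,)   -- current decoder hidden state
    # Additive scoring: e_t = v . tanh(W_h h_t + W_s s), one score per position
    scores = np.tanh(encoder_outputs @ W_h.T + decoder_state @ W_s.T) @ v  # (T,)
    # Softmax over the T encoder positions gives alpha of length T
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    # Context vector: weighted sum of encoder output vectors
    context = alpha @ encoder_outputs  # (d_enc,)
    return alpha, context

rng = np.random.default_rng(0)
d_enc, d_dec, d_att = 4, 3, 5
# Fixed-size learned parameters (the only things trained)
W_h = rng.standard_normal((d_att, d_enc))
W_s = rng.standard_normal((d_att, d_dec))
v = rng.standard_normal(d_att)

for T in (3, 7):  # two different input lengths, same parameters
    h = rng.standard_normal((T, d_enc))
    s = rng.standard_normal(d_dec)
    alpha, context = attention_context(h, s, W_h, W_s, v)
    print(T, alpha.shape, context.shape)
```

In this sketch $\alpha$ comes out with length $T$, which changes with the input, while the matrices `W_h`, `W_s` and the vector `v` are the fixed-size quantities. Is this the right way to think about where the "learning" happens?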
If someone could help me understand this particular mechanism, I'd appreciate it.
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange