Question

Attention mechanisms are reasonably common in RNN-based sequence-to-sequence models.

My understanding is that the decoder learns a weight vector $\alpha$, which is used to form a weighted sum of the encoder's output vectors. This weighted sum is then used to produce a new input vector for the decoder.
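
For concreteness, here is a rough sketch of the weighted sum I'm describing (a minimal NumPy illustration with hypothetical sizes and placeholder weights, since how $\alpha$ is actually produced is exactly the part I'm unsure about):

```python
import numpy as np

# Hypothetical sizes for illustration only
seq_len, hidden_size = 7, 16

# One output vector per encoder time step
encoder_outputs = np.random.randn(seq_len, hidden_size)

# Placeholder attention weights, one per encoder step,
# normalised so they sum to 1 (softmax-like)
alpha = np.random.rand(seq_len)
alpha = alpha / alpha.sum()

# Context vector: weighted sum of encoder outputs, shape (hidden_size,)
context = (alpha[:, None] * encoder_outputs).sum(axis=0)
print(context.shape)  # (16,)
```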

What I don't understand is how this works for variable-length input: if $\alpha$ is a set of learned weights, it must be a fixed-size vector, yet it is applied across a variable-length sequence of encoder outputs.

If someone could help me understand this particular mechanism, I'd appreciate it.

No correct solution
