Question

Attention mechanisms are reasonably common in RNN-based sequence-to-sequence models.

My understanding is that the decoder learns a weight vector $\alpha$, which is used to form a weighted sum of the encoder's output vectors. This weighted sum is then used to produce a new input vector for the decoder.
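
For concreteness, here is a rough sketch of the weighted sum I'm describing (a minimal NumPy illustration with hypothetical sizes and placeholder weights, since how $\alpha$ is actually produced is exactly the part I'm unsure about):

```python
import numpy as np

# Hypothetical sizes for illustration only
seq_len, hidden_size = 7, 16

# One output vector per encoder time step
encoder_outputs = np.random.randn(seq_len, hidden_size)

# Placeholder attention weights, one per encoder step,
# normalised so they sum to 1 (softmax-like)
alpha = np.random.rand(seq_len)
alpha = alpha / alpha.sum()

# Context vector: weighted sum of encoder outputs, shape (hidden_size,)
context = (alpha[:, None] * encoder_outputs).sum(axis=0)
print(context.shape)  # (16,)
```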

What I don't understand is how this works for variable-length input: if $\alpha$ is a set of learned weights, it must be a fixed-size vector, yet it is applied across a variable-length sequence of encoder outputs.

If someone could help me understand this particular mechanism, I'd appreciate it.

No correct solution
