Question

In attention, the context vector ($c$) is the sum of the encoder hidden states ($h$) weighted by the attention weights ($\alpha$), where the weights come from a softmax over alignment scores computed between the decoder hidden state and each encoder state.

$c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$

My question is: why compute this context vector at all, instead of simply forwarding the attention weights to the decoder? The weights already indicate how much to focus on each of the encoder states.

Could somebody explain the intuition behind this?
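For concreteness, here is a minimal NumPy sketch of the computation being asked about (dot-product alignment scores are one common choice; the sizes and random states are illustrative, not from any specific model). Note that the weights $\alpha$ are just $T_x$ scalars saying *where* to look, while the context vector carries the actual encoder content:

```python
import numpy as np

rng = np.random.default_rng(0)
T_x, d = 5, 8                    # source length, hidden size (illustrative)
h = rng.standard_normal((T_x, d))  # encoder hidden states h_j
s = rng.standard_normal(d)         # current decoder hidden state

# Alignment scores (dot-product variant), normalized by softmax -> weights
e = h @ s
alpha = np.exp(e - e.max())
alpha /= alpha.sum()               # alpha_ij, sums to 1 over j

# Context vector: weighted sum of encoder states, c_i = sum_j alpha_ij h_j
c = alpha @ h                      # shape (d,), same size as one encoder state

print(alpha.shape)  # (5,)  -- just T_x numbers, no content
print(c.shape)      # (8,)  -- a full hidden-size vector of encoder information
```

The shapes make the contrast visible: the weights alone are a length-$T_x$ probability vector, whereas the context vector is a fixed-size summary of the source that the decoder can consume regardless of the input length.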


Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange