What is a 'hidden state' in BERT output?
21-10-2020
Question
I'm trying to understand the workings and output of BERT, and I'm wondering how/why each layer of BERT has a 'hidden state'.
I understand that RNNs have a 'hidden state' that gets passed to each time step, which is a representation of previous inputs. But I've read that BERT isn't an RNN - it's a CNN with attention.
But you can output the hidden state for each layer of a BERT model. How is it that BERT has hidden states if it's not a RNN?
Answer
BERT is a transformer.
A transformer is made of several similar layers stacked on top of each other.
Each layer has an input and an output, so the output of layer n-1 is the input of layer n.
The hidden state you mention is simply the output of each layer.
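If it helps to see this concretely, here is a minimal sketch using the Hugging Face transformers library (my choice of library and checkpoint, not something specific to your setup) that retrieves one hidden state per layer:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# 'bert-base-uncased' is just an example checkpoint: 12 encoder layers, hidden size 768.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)

inputs = tokenizer("Hello, BERT!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states is a tuple with one tensor per "layer output":
# index 0 is the embedding layer, indices 1..12 are the 12 encoder layers.
hidden_states = outputs.hidden_states
print(len(hidden_states))        # 13 = embeddings + 12 layers
print(hidden_states[-1].shape)   # (batch_size, seq_len, 768): the last layer's output
```

Each tensor in that tuple is exactly the output of one layer, i.e. the "hidden state" you are asking about.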
You might want to take a quick look at this explanation of the Transformer architecture: https://jalammar.github.io/illustrated-transformer/
Note that BERT uses only encoders, no decoders.