Question

From reading around on the web, I understand that the output representation of the special [CLS] token captures a representation of the whole sentence (is that correct?).

My primary question is: what information does the output embedding of the [SEP] token (T_SEP) capture?

My other question: if I feed several sentences into BERT separated by [SEP], does the output embedding of [CLS] contain information about all of them?


Solution

You are right: the [CLS] token tries to capture a sentence-level representation, because during pretraining its output is used to decide whether two segments are contiguous or not (the next-sentence prediction task).

That said, the authors noted that [CLS] was not intended to be a general-purpose sentence representation, so it should be used carefully.
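
As a concrete illustration, here is a minimal sketch (assuming the Hugging Face transformers library and the bert-base-uncased checkpoint) of how the [CLS] output vector is usually extracted:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] token is always at position 0 of the sequence
cls_embedding = outputs.last_hidden_state[:, 0, :]   # shape: (1, hidden_size)

# pooler_output is the [CLS] vector passed through the linear + tanh layer
# that was trained on the next-sentence prediction objective
pooled = outputs.pooler_output                        # shape: (1, hidden_size)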


The [SEP] token is simply used to separate segments, to make it easier for BERT to tell that the input consists of several sentences. No pretraining objective is computed on top of the [SEP] output (unlike [CLS], which feeds the next-sentence prediction head), so its output embedding is not trained to represent anything in particular.
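
For instance (again a small sketch with the Hugging Face tokenizer, assuming bert-base-uncased), passing a sentence pair shows where [SEP] is inserted:

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("How old are you?", "I am fine.", return_tensors="pt")

# Roughly: ['[CLS]', 'how', 'old', 'are', 'you', '?', '[SEP]', 'i', 'am', 'fine', '.', '[SEP]']
print(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]))

# token_type_ids mark which segment each token belongs to (0 = first, 1 = second)
print(enc["token_type_ids"][0])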


About your last question: we don't know for sure.

During pretraining, the model only saw inputs made of two segments, so it has no learned behavior for inputs containing many sentences.

But if you finetune it, the model can learn new representations, and in particular it can learn to make the [CLS] token carry information about all the sentences.

For example, this code finetunes BERT with a new input pattern:

[CLS] Sen 1 [SEP] [CLS] Sen 2 [SEP] [CLS] Sen 3 [SEP] ...

Each [CLS] token is then used to represent the sentence that follows it. Because the model is finetuned on this pattern, BERT learns to encode each sentence into its corresponding [CLS] output.
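
Here is a rough sketch of what such an input could look like and how one [CLS] vector per sentence would be gathered (illustrative only, assuming bert-base-uncased; without the finetuning step described above these vectors are not meaningful sentence representations):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentences = ["The sky is blue.", "The grass is green.", "Snow is white."]

# Build "[CLS] sent1 [SEP] [CLS] sent2 [SEP] ..." by hand, without the default special tokens
ids = []
for sent in sentences:
    ids.append(tokenizer.cls_token_id)
    ids.extend(tokenizer.encode(sent, add_special_tokens=False))
    ids.append(tokenizer.sep_token_id)

input_ids = torch.tensor([ids])
with torch.no_grad():
    out = model(input_ids)

# One [CLS] position per sentence; gather the corresponding output vectors
cls_positions = (input_ids[0] == tokenizer.cls_token_id).nonzero(as_tuple=True)[0]
sentence_vectors = out.last_hidden_state[0, cls_positions]   # shape: (num_sentences, hidden_size)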

Licensed under: CC-BY-SA with attribution