Question

I'm training BERT for question answering (in Spanish) and I have a large context: the context alone exceeds 512 tokens, and the full question + context is around 10k tokens. I found that Longformer is a BERT-like model for long documents, but there is no pretrained Spanish version. Is there any way to get around BERT's limit?

What I tried is:

from transformers import BertConfig

config = BertConfig.from_pretrained(BERT_MODEL_PATH)
config.max_length = 4000
config.max_position_embeddings = 4000
config.output_hidden_states = True
model = MyBertModel(config)

but it still gives me a size mismatch error:

RuntimeError: Error(s) in loading state_dict for BertModel: size mismatch for bert.embeddings.position_embeddings.weight: copying a param with shape torch.Size([512, 768]) from checkpoint, the shape in current model is torch.Size([4000, 768]).


Solution

The maximum input length is a limitation of the model by construction. That number fixes the size of the positional embedding table, so you cannot feed in a longer input: the model simply has no positional embedding to look up for positions beyond the maximum.
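A quick way to see this limit (a minimal sketch, assuming the Spanish BETO checkpoint dccuchile/bert-base-spanish-wwm-cased; any BERT-base checkpoint behaves the same) is to inspect the positional embedding table directly:

from transformers import BertModel

model = BertModel.from_pretrained("dccuchile/bert-base-spanish-wwm-cased")
print(model.config.max_position_embeddings)               # 512
print(model.embeddings.position_embeddings.weight.shape)  # torch.Size([512, 768])

The checkpoint ships exactly 512 rows, which is also why your resized config produced the [4000, 768] vs. [512, 768] mismatch when loading the pretrained weights.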

This limitation, however, is not arbitrary; it has a deeper justification: in a vanilla Transformer, the memory requirements are quadratic in the input length, so limiting that length is necessary to keep the model's memory use manageable.
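To make the quadratic growth concrete, here is a rough back-of-the-envelope estimate (a sketch that only counts the attention score matrices of a BERT-base-sized model, nothing else):

# Memory for the n x n attention scores alone, fp32, BERT-base geometry.
n_tokens = 10_000
n_heads = 12
n_layers = 12
bytes_per_float = 4

scores_bytes = n_tokens ** 2 * n_heads * n_layers * bytes_per_float
print(f"{scores_bytes / 1e9:.1f} GB")  # ~57.6 GB

At 512 tokens the same quantity is roughly 0.15 GB, which is why the limit sits where it does.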

There are variants of the Transformer architecture designed to overcome the quadratic memory problem, such as Reformer, Linformer, Longformer and BigBird. Their weights, however, are not compatible with plain BERT weights, so someone would first need to pretrain one of those models on a masked language modeling task (potentially with multitask learning on the next sentence prediction task), on Spanish text in your case, before its weights could be reused for your problem.
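For illustration, such masked language modeling pretraining would look roughly like the sketch below using the Hugging Face Trainer. Everything here is a placeholder assumption, not a recipe that was actually run: the corpus file spanish_corpus.txt, the reuse of the English Longformer tokenizer (a real Spanish model would need its own vocabulary), and the tiny hyperparameters. Real pretraining requires a large Spanish corpus and substantial compute.

from transformers import (
    DataCollatorForLanguageModeling,
    LongformerConfig,
    LongformerForMaskedLM,
    LongformerTokenizerFast,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Placeholder: reuse the English Longformer tokenizer for simplicity.
tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")

# Fresh, randomly initialized Longformer that accepts ~4k-token inputs
# (4096 tokens + 2 offset positions, RoBERTa-style).
config = LongformerConfig(
    vocab_size=tokenizer.vocab_size,
    max_position_embeddings=4098,
    attention_window=512,
)
model = LongformerForMaskedLM(config)

# Placeholder corpus: one document per line in spanish_corpus.txt.
dataset = load_dataset("text", data_files={"train": "spanish_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: 15% of tokens are masked and must be predicted.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="longformer-es-mlm", per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()

Only after a pretraining run of this kind would you have Spanish weights that accept 4k-token inputs and could then be fine-tuned for question answering.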

Licensed under: CC-BY-SA with attribution