
Tags: pytorch, huggingface-transformers, bert-language-model, transformer-model

Why am I seeing unused parameters in position embeddings when using relative_key in BertModel?

I am training a BERT model using PyTorch and HuggingFace’s BertModel. The token sequences vary in length from 1 (just a CLS token) to 128. The model trains fine with absolute position embeddings, but when I switch to relative position embeddings (specifically, setting position_embedding_type="relative_key"), training fails with an error about unused parameters. When I investigate further (adding print statements as proposed in this thread), I find that the unused parameter is module.bert_model.embeddings.position_embeddings.weight.
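A minimal sketch of the setup that reproduces this (the MAX_LEN constant, dummy input, and gradient check below are illustrative assumptions, not the asker's actual training code; the "module.bert_model." prefix in the reported name comes from their own DDP/wrapper module):

```python
import torch
from transformers import BertConfig, BertModel

MAX_LEN = 128  # sequences range from 1 (just [CLS]) to 128 tokens (assumed)

config = BertConfig(
    max_position_embeddings=MAX_LEN,
    position_embedding_type="relative_key",  # switched from the default "absolute"
)
# add_pooling_layer=False keeps the gradient check below focused on the embeddings
bert_model = BertModel(config, add_pooling_layer=False)

# One forward/backward pass on dummy input, then list parameters that
# received no gradient, i.e. the "unused" parameters DDP complains about.
dummy_ids = torch.randint(0, config.vocab_size, (2, 16))
out = bert_model(input_ids=dummy_ids)
out.last_hidden_state.sum().backward()

for name, param in bert_model.named_parameters():
    if param.grad is None:
        print(name)  # expected: embeddings.position_embeddings.weight
```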