I want to learn how to use BERT with is_decoder=True and add_cross_attention=True, but I can't get it to work.
Here is my code:
import torch
from transformers import BertTokenizer, BertConfig, BertLMHeadModel

tokenizer: BertTokenizer = BertTokenizer.from_pretrained('google-bert/bert-base-uncased')
bertconfig = BertConfig.from_pretrained('google-bert/bert-base-uncased', is_decoder=True, add_cross_attention=True)
bert = BertLMHeadModel(bertconfig).to('cuda:0')
inputs = tokenizer('Hello, my dog is cute.', return_tensors='pt').to('cuda:0')
# cross_tensor has shape (batch_size, seq_len, hidden_size) = (1, 9, 768)
cross_tensor = bert.bert.embeddings.forward(inputs['input_ids'])
outputs = bert(**inputs, encoder_hidden_states=cross_tensor, encoder_attention_mask=torch.ones_like(cross_tensor).to('cuda:0'))
Here is the error information:
File d:\anaconda3\envs\torch_py38\lib\site-packages\transformers\models\bert\modeling_bert.py:352, in BertSelfAttention.forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    349 attention_scores = attention_scores / math.sqrt(self.attention_head_size)
    350 if attention_mask is not None:
    351     # Apply the attention mask is (precomputed for all layers in BertModel forward() function)
--> 352     attention_scores = attention_scores + attention_mask
    354 # Normalize the attention scores to probabilities.
    355 attention_probs = nn.functional.softmax(attention_scores, dim=-1)
RuntimeError: The size of tensor a (9) must match the size of tensor b (768) at non-singleton dimension 3
I tried changing the shape of the 'encoder_hidden_states' tensor, but that didn't work.
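From the error, the size 768 at dimension 3 matches the hidden size, so my current guess is that 'encoder_attention_mask' is supposed to be 2D with shape (batch_size, encoder_seq_len), masking token positions rather than hidden dimensions. Here is a minimal sketch of what I think the call should look like (unconfirmed, just my reading of the shapes):

# Guess: mask token positions only, shape (1, 9), not (1, 9, 768)
encoder_mask = torch.ones(cross_tensor.shape[:2], dtype=torch.long, device='cuda:0')
outputs = bert(**inputs, encoder_hidden_states=cross_tensor, encoder_attention_mask=encoder_mask)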
Can anyone confirm whether that is right, or share a working example of using BERT in this situation?